Module: HDFSiterator
Iterators for distributed iteration over the Hadoop Distributed Filesystem (HDFS).
The iterators in this module traverse data stored in an HDFS filesystem in a distributed manner. See HDFS.
- iter HDFSiter(path: string, type rec, regex: string)
Iterate through a file on the default configured HDFS server and yield the records that match a regular expression.
Serial and leader-follower versions of this iterator are available.
Arguments:
- path – the path to the file within the HDFS server
- rec – the type of the records to return
- regex – a regular expression with the same number of captures as the number of fields in the record type rec
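The relationship between regex and rec can be illustrated with a small model. The sketch below is written in Python rather than Chapel so that it runs without an HDFS installation; it is not the module's implementation. The `Review` type, the `iter_records` helper, the sample text, and the pattern are all hypothetical, and the model assumes that captures are assigned to fields in declaration order.

```python
import re
from dataclasses import dataclass, fields


@dataclass
class Review:
    # Hypothetical record type with two fields, so the regex needs two captures.
    name: str
    score: str


def iter_records(text, rec, regex):
    """Yield one `rec` per regex match, filling fields from captures in order.

    Hypothetical helper that models the documented constraint only; it reads
    from an in-memory string instead of an HDFS file.
    """
    pattern = re.compile(regex)
    n_fields = len(fields(rec))
    if pattern.groups != n_fields:
        raise ValueError(
            f"regex has {pattern.groups} captures but "
            f"{rec.__name__} has {n_fields} fields"
        )
    for match in pattern.finditer(text):
        yield rec(*match.groups())


# Made-up sample data standing in for the contents of a file.
sample = "name: Stout\nscore: 4.5\nname: Lager\nscore: 3.0\n"

records = list(iter_records(sample, Review, r"name: (.*)\nscore: (.*)"))
for r in records:
    print(r)
```

In this model a pattern with the wrong number of captures is rejected before any record is produced, which mirrors the requirement stated for regex above. How HDFSiter itself reports such a mismatch is not specified here.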