Iterators for distributed iteration over Hadoop Distributed Filesystem
Iterators that can iterate over distributed data in an HDFS filesystem
in a distributed manner. See
HDFSiter(path: string, type rec, regex: string)¶
Iterate through an HDFS file (available in the default configured HDFS server) and yield records matching a regular expression.
Serial and leader-follower versions of this iterator are available.
- path -- the path to the file within the HDFS server
- rec -- the type of the records to return
- regexp -- a regular expression with the same number of captures as
the number of fields in the record type