To read all the files in a directory in HDFS (Hadoop Distributed File System) using Java, you can use the FileSystem and FileStatus classes from the org.apache.hadoop.fs package.

Here's an example that lists all the files in a directory in HDFS using Java:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Main {
    public static void main(String[] args) throws Exception {
        // create a configuration object
        Configuration conf = new Configuration();

        // get the filesystem object
        FileSystem fs = FileSystem.get(conf);

        // get the path of the directory
        Path dirPath = new Path("/path/to/directory");

        // get the file statuses of the directory's contents
        FileStatus[] fileStatuses = fs.listStatus(dirPath);

        // iterate over the file statuses and print each path
        for (FileStatus fileStatus : fileStatuses) {
            Path filePath = fileStatus.getPath();
            System.out.println(filePath);
        }
    }
}
This code creates a Configuration object and obtains a FileSystem object with the FileSystem.get() method. It then creates a Path for the directory whose files you want to list and calls the FileSystem's listStatus() method, which returns an array of FileStatus objects, one for each entry in the directory. Finally, it iterates over the array, retrieves each entry's Path with getPath(), and prints it to the console.
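Listing only gives you the paths; to actually read each file's contents you can open it with fs.open(), which returns an FSDataInputStream. Below is a minimal sketch of that step, assuming the files contain plain UTF-8 text and reusing the same placeholder directory path as above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadDirectory {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path dirPath = new Path("/path/to/directory"); // placeholder path

        for (FileStatus fileStatus : fs.listStatus(dirPath)) {
            // skip subdirectories; fs.listFiles(dirPath, true) would recurse instead
            if (!fileStatus.isFile()) {
                continue;
            }
            // open the file and read it line by line, assuming text content
            try (FSDataInputStream in = fs.open(fileStatus.getPath());
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }
}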
Note that this code assumes the HDFS cluster is configured and running, and that you have the necessary permissions to access the directory. You may also need to set the HADOOP_CONF_DIR environment variable or specify the path to the Hadoop configuration files in the Configuration object.
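If the default configuration does not point at your cluster, one option is to set fs.defaultFS explicitly or load the cluster's configuration files by hand. Here is a minimal sketch; the NameNode address and file paths are hypothetical placeholders you would replace with your own:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConfiguredClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Option 1: point directly at the NameNode (hypothetical host and port)
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

        // Option 2: load the cluster's own configuration files (hypothetical paths)
        // conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        // conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());
    }
}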