This repository has been archived by the owner on Apr 25, 2024. It is now read-only.

This project provides an NFSv3 connector for Hadoop. Using the connector, Apache Hadoop and Apache Spark can use an NFSv3 server as their storage backend.


NetApp-Hadoop-NFS-Connector

This site is obsolete. Please use the NetApp FAS NFSConnector product from now.netapp.com

Overview

The Hadoop NFS Connector allows Apache Hadoop (2.2+) and Apache Spark (1.2+) to use an NFSv3 storage server as a storage endpoint. The NFS Connector supports two modes: (1) secondary filesystem, where Hadoop/Spark runs with HDFS as its primary storage and uses NFS as an additional storage endpoint, and (2) primary filesystem, where Hadoop/Spark runs entirely on an NFSv3 storage server.
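Which mode is active comes down to core-site.xml. As a minimal sketch using the property names from the Configuration section below (hostname and port are placeholders): registering the connector's filesystem class on its own makes nfs:// URIs usable alongside HDFS (secondary mode), while additionally pointing fs.defaultFS at the NFS server makes it the primary filesystem.

```
<!-- Secondary filesystem: register the connector so that nfs:// URIs
     resolve to the NFS server; fs.defaultFS keeps pointing at HDFS. -->
<property>
  <name>fs.nfs.impl</name>
  <value>org.apache.hadoop.fs.nfs.NFSv3FileSystem</value>
</property>

<!-- Primary filesystem: additionally make NFS the default filesystem. -->
<property>
  <name>fs.defaultFS</name>
  <value>nfs://<nfs-server-hostname>:2049</value>
</property>
```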

The connector is written so that existing applications do not have to change. All one has to do is copy the connector jar into the lib/ directory of Hadoop/Spark and modify core-site.xml to provide the necessary details.

NOTE: The code is in beta. We would love for you to try it out and give us feedback.

This is the first release and it does the following:

  • Connects to an NFSv3 storage server supporting the AUTH_NONE or AUTH_SYS authentication method.
  • Works with vanilla Apache Hadoop 2.2 or newer and with Hortonworks HDP 2.2 or newer.
  • Supports all operations defined by the Hadoop FileSystem interface (a sketch follows this list).
  • Pipelines READ/WRITE requests to utilize the underlying network (works well on both 1GbE and 10GbE networks).
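Because the connector implements the standard Hadoop FileSystem interface, it can be driven through the ordinary Hadoop API. The following is a minimal sketch, assuming the connector jar is on the classpath and fs.nfs.impl is registered in core-site.xml as described under Configuration; the hostname and paths are placeholders:

```
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NfsConnectorExample {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml from the classpath, which maps the
        // nfs:// scheme to the connector's NFSv3FileSystem class.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(
                URI.create("nfs://nfs-server-hostname:2049/"), conf);

        // Ordinary FileSystem calls work unchanged against NFS.
        Path dir = new Path("/tera/in");
        fs.mkdirs(dir);
        try (FSDataOutputStream out = fs.create(new Path(dir, "hello.txt"))) {
            out.writeBytes("hello from the NFS connector\n");
        }
        for (FileStatus status : fs.listStatus(dir)) {
            System.out.println(status.getPath() + " " + status.getLen());
        }
        fs.close();
    }
}
```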

We are planning to add these in the near future:

  • Ability to connect to multiple NFS endpoints (multiple IP addresses), allowing for even more bandwidth.
  • Integration with Hadoop user authentication.

How to use

Once the NFS connector is configured, you can invoke it from the command line using the Hadoop shell.

  console> bin/hadoop fs -ls nfs://<nfs-server-hostname>:2049/ (if using as secondary filesystem)
  console> bin/hadoop fs -ls / (if using as default/primary filesystem)

When submitting new jobs, you can provide an NFS path as the input path, the output path, or both:

  (assuming NFS is used as a secondary filesystem)
  console> bin/hadoop jar <path-to-examples-jar> terasort nfs://<nfs-server-hostname>:2049/tera/in /tera/out
  console> bin/hadoop jar <path-to-examples-jar> terasort /tera/in nfs://<nfs-server-hostname>:2049/tera/out
  console> bin/hadoop jar <path-to-examples-jar> terasort nfs://<nfs-server-hostname>:2049/tera/in nfs://<nfs-server-hostname>:2049/tera/out
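The same nfs:// URIs work from Apache Spark, since Spark reads and writes through the Hadoop FileSystem API. Below is a sketch using the Spark 1.x Java API, assuming the connector jar and core-site.xml are on Spark's classpath; the application name, hostname, and paths are placeholders:

```
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class NfsSparkExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("nfs-connector-demo");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Spark resolves the nfs:// scheme through the same Hadoop
        // configuration (core-site.xml) that registers the connector.
        JavaRDD<String> lines =
                sc.textFile("nfs://nfs-server-hostname:2049/tera/in");
        System.out.println("line count: " + lines.count());

        // Output can be written back to the NFS endpoint as well.
        lines.saveAsTextFile("nfs://nfs-server-hostname:2049/tera/copy");
        sc.stop();
    }
}
```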

Configuration

  1. Compile the project:

     ```
     console> mvn clean package
     ```

  2. Copy the jar file to the shared common library directory of your Hadoop installation. For example, for hadoop-2.4.1:

     ```
     console> cp target/hadoop-connector-nfsv3-1.0.jar $HADOOP_HOME/share/hadoop/common/lib/
     ```

  3. Add the NFSv3 connector parameters to core-site.xml in the Hadoop configuration directory (e.g., for hadoop-2.4.1: $HADOOP_HOME/conf). The hostname is a placeholder, and fs.nfs.configuration must point at the connector's JSON mapping file (a sketch follows these steps):

     ```
     <property>
       <name>fs.defaultFS</name>
       <value>nfs://<nfs-server-hostname>:2049</value>
     </property>
     <property>
       <name>fs.nfs.configuration</name>
       <value>/nfs-mapping.json</value>
     </property>
     <property>
       <name>fs.nfs.impl</name>
       <value>org.apache.hadoop.fs.nfs.NFSv3FileSystem</value>
     </property>
     <property>
       <name>fs.AbstractFileSystem.nfs.impl</name>
       <value>org.apache.hadoop.fs.nfs.NFSv3AbstractFilesystem</value>
     </property>
     ```

  4. Start Hadoop. NFS can now be used inside Hadoop.
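For reference, the file named by fs.nfs.configuration describes the NFS endpoint and its mount options in JSON. The layout below is a hypothetical sketch for a single endpoint; all field names are illustrative rather than authoritative (consult the connector source for the exact schema), and the address, export path, and credentials are placeholders:

```
{
  "spaces": [
    {
      "name": "default",
      "uri": "nfs://192.168.1.10:2049/",
      "options": {
        "nfsExportPath": "/",
        "nfsAuthScheme": "AUTH_SYS",
        "nfsUsername": "root",
        "nfsGroupname": "root",
        "nfsUid": 0,
        "nfsGid": 0,
        "nfsPort": 2049
      }
    }
  ]
}
```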
