Docker image for Apache Spark, with the complete Hadoop dependencies included.

Naitreey/docker-spark-hadoop

docker-spark-hadoop

Project status: alpha.

A Spark image built with the official instructions presents several problems:

  • It does not support s3a:// URLs for application and dependency jars, so each application has to build a custom Docker image that bundles its application jar.
  • It is built with Scala 2.11 rather than 2.12. Although Spark officially provides a Scala 2.12 distribution, that distribution does not include the Hadoop dependencies.

This image addresses these issues.
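
As a rough illustration of what s3a:// support allows, the sketch below prints a spark-submit command line that references the application jar directly by an s3a:// URL instead of baking it into a custom image. It assumes a Kubernetes cluster, which this README does not state explicitly, and every value in it (master URL, image tag, bucket, class name) is a hypothetical placeholder. S3 credentials and endpoint settings are left out and would have to be supplied separately.

```shell
#!/usr/bin/env bash

# All values below are hypothetical placeholders; override them via the environment.
K8S_MASTER="${K8S_MASTER:-k8s://https://kubernetes.example.com:6443}"
IMAGE="${IMAGE:-example/spark-hadoop:latest}"
MAIN_CLASS="${MAIN_CLASS:-com.example.MyApp}"
APP_JAR="${APP_JAR:-s3a://example-bucket/jars/my-app.jar}"

# Print the spark-submit command line, one argument per line, without running it.
print_submit_command() {
  printf '%s\n' \
    spark-submit \
    --master "$K8S_MASTER" \
    --deploy-mode cluster \
    --class "$MAIN_CLASS" \
    --conf "spark.kubernetes.container.image=$IMAGE" \
    "$APP_JAR"
}
```

Calling print_submit_command shows the arguments for review; they can then be passed to a real spark-submit once the placeholders are replaced.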

How to build

  1. Download and extract a Hadoop binary distribution (any version above 2.8) into the build/ directory. Rename the extracted directory to hadoop.

  2. Download and extract a Spark binary distribution (one without pre-packaged Hadoop dependencies) into the build/ directory. Rename the extracted directory to spark.

  3. Build Spark image:

    docker build -t <tag> -f Dockerfile .
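
The extract-and-rename part of the steps above can be sketched as a small helper. This is a minimal sketch, not part of the repository: the function name stage_dist is hypothetical, and it assumes gzip-compressed tarballs that were already downloaded and that contain a single top-level directory, as the Apache binary distributions normally do.

```shell
#!/usr/bin/env bash

# Hypothetical helper: extract a tarball into build/ and rename its
# top-level directory to the given name.
# Usage: stage_dist <tarball> <target-name>
stage_dist() {
  local tarball="$1" name="$2" top
  [ -n "$tarball" ] && [ -n "$name" ] || return 1
  mkdir -p build || return 1
  # Refuse to overwrite an existing target directory.
  if [ -e "build/$name" ]; then
    echo "build/$name already exists" >&2
    return 1
  fi
  # Name of the top-level directory, taken from the first archive entry.
  top="$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)"
  [ -n "$top" ] || return 1
  tar -xzf "$tarball" -C build || return 1
  if [ "$top" != "$name" ]; then
    mv "build/$top" "build/$name" || return 1
  fi
}

# Example usage (file names are placeholders for whatever was downloaded):
#   stage_dist hadoop-X.Y.Z.tar.gz hadoop
#   stage_dist spark-X.Y.Z-bin-without-hadoop.tgz spark
#   docker build -t <tag> -f Dockerfile .
```

Failing when build/hadoop or build/spark already exists is a deliberate choice here, so that a stale distribution is never silently mixed with a new one.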
