# apache-spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 1,662 public repositories matching this topic...

In this project, I build a real-time data streaming pipeline, covering each phase from data ingestion through processing to storage. It uses a powerful stack of tools and technologies, including Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra, all neatly containerized using Docker.

  • Updated May 23, 2024
  • Python

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

  • Updated May 23, 2024
  • Python

The Tech Canvas Experimenters Hub is an interdisciplinary repository for collaborative projects spanning fields such as hardware (e.g., Arduino UNO), financial engineering, machine learning, and natural language processing, together with the mathematical foundations underlying each field.

  • Updated May 21, 2024
  • Python

Created by Matei Zaharia

Released May 26, 2014

Followers
416 followers
Repository
apache/spark
Website
spark.apache.org
Wikipedia

Related Topics

  • hadoop
  • scala