
dataengineering

Here are 530 public repositories matching this topic...

OpenMetadata
orangutan-stem

An open-source project dedicated to constructing robust data pipelines and scalable software infrastructure. We leverage industry-standard tools favored by developers to enhance efficiency and reliability. Uniquely, these pipelines are field-tested on farms across Sumatra, Indonesia, ensuring real-world applicability and resilience.

  • Updated May 27, 2024
  • Python

In this project, I'll be building a real-time data streaming pipeline that covers each phase, from data ingestion to processing and finally storage. We'll use a stack of tools that includes Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra, all containerised with Docker.

  • Updated May 23, 2024
  • Python
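The three phases named in the description above (ingestion, processing, storage) can be illustrated with a minimal, stdlib-only sketch. This is not the repository's code: `queue.Queue` is an assumed stand-in for a Kafka topic, the hypothetical `transform` function stands in for the Spark job, and a plain dict keyed by `id` stands in for a Cassandra table. The record schema (`id`, `name`, `age`) is likewise invented for illustration.

```python
import queue
from typing import Dict, Iterable, List

# Hypothetical stand-ins, not the real stack:
#   topic     -> a Kafka topic (here an in-memory queue.Queue)
#   transform -> the Spark processing job
#   table     -> a Cassandra table keyed by "id" (here a dict)


def ingest(records: Iterable[dict], topic: "queue.Queue[dict]") -> int:
    """Ingestion phase: publish raw records to the queue; return how many were sent."""
    count = 0
    for record in records:
        topic.put(record)
        count += 1
    return count


def transform(record: dict) -> dict:
    """Processing phase: clean up one raw record (assumed schema: id, name, age)."""
    return {
        "id": record["id"],
        "name": record["name"].strip().title(),
        "age": int(record["age"]),
    }


def consume(topic: "queue.Queue[dict]", table: Dict[int, dict]) -> None:
    """Storage phase: drain the queue, transform each record, and upsert it by id."""
    while True:
        try:
            raw = topic.get_nowait()
        except queue.Empty:
            break
        row = transform(raw)
        # A later record with the same id overwrites the earlier one,
        # much as a Cassandra INSERT on an existing primary key does.
        table[row["id"]] = row


def run_pipeline(records: List[dict]) -> Dict[int, dict]:
    """Run ingestion, processing, and storage in sequence on a batch of records."""
    topic: "queue.Queue[dict]" = queue.Queue()
    table: Dict[int, dict] = {}
    ingest(records, topic)
    consume(topic, table)
    return table


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "  ada lovelace ", "age": "36"},
        {"id": 2, "name": "alan turing", "age": "41"},
        {"id": 1, "name": "ada lovelace", "age": "37"},
    ]
    print(run_pipeline(sample))
```

In the real stack, Airflow would schedule the ingestion step, Kafka (coordinated by Zookeeper) would decouple producers from consumers, and Spark would process the stream continuously rather than draining a finite batch as this sketch does.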
