The Internals of Spark SQL
Updated May 25, 2024
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
The Internals of Spark SQL
Open source platform for the machine learning lifecycle
Fully managed Apache Parquet implementation
REST API for Apache Spark on K8S or YARN
Simple and Distributed Machine Learning
A curated list of awesome tools and libraries for specific domains
lakeFS - Data version control for your data lake | Git for data
Astronomy Broker based on Apache Spark
Apache Spark-based framework for analyzing A/B experiments
Data transformation framework for ETL processing with SQL-like syntax and GIS extensions, based on Apache Spark
In this project, I'll build a real-time data streaming pipeline that covers each phase, from data ingestion through processing to storage. The stack includes Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra, all containerized using Docker.
Experiment tracking server focused on speed and scalability
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Analyzed Apple's dataset to determine how many customers bought AirPods after buying a Mac or an iPhone, then used ML and predictive analytics to forecast future outcomes.
Python package for working with demand-side grid projects, datasets and queries
The Proxima platform.
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming and DMS
Apache Spark-based 'Dist' utility to supplement the Data Cooker ETL tool
The Tech Canvas Experimenters Hub is an interdisciplinary repository for collaborative projects spanning fields such as hardware (e.g., Arduino UNO), financial engineering, machine learning, and natural language processing, along with the mathematical foundations of each.
Created by Matei Zaharia
Released May 26, 2014