Trying a best-case Apache Spark working environment for robust data pipelines
Updated Apr 1, 2023 - Python
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Writing dummy snippets of code to read, manipulate, and build a simple ML model with PySpark.
Distributed processing challenge
Practice with Spark on Azure Databricks

Real-time analysis pipeline
A parser for XML files using PySpark
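One common pattern for XML parsing with PySpark is a plain-Python parse function mapped over a distributed collection of records; the record shape and file name below are hypothetical:

```python
import xml.etree.ElementTree as ET

def parse_record(xml_str):
    """Parse one XML record string into a {tag: text} dict."""
    root = ET.fromstring(xml_str)
    return {child.tag: child.text for child in root}

# Hypothetical Spark usage: parse many records in parallel.
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# parsed = spark.sparkContext.textFile("records.xml").map(parse_record)

record = "<row><id>1</id><name>spark</name></row>"
print(parse_record(record))  # {'id': '1', 'name': 'spark'}
```

Keeping the parser a pure function makes it testable without a cluster and trivially parallelizable with `map()`.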
A forecasting project based on Apache Spark and implemented with a Naive Bayes classifier.
Gallery of Apache Zeppelin notebooks using Enth-Spark-AI.
An image for running Scala Jupyter notebooks and Apache Spark in the cloud on OpenShift
Final project for IDS 561 Big Data Analytics at the University of Illinois at Chicago: an Instagram search engine based on user biographies, built using Apache Spark with Python
Created by Matei Zaharia
Released May 26, 2014