Yelp Dataset Analysis using Apache Spark and Pig, with insights presented in the Zeppelin GUI
Updated Aug 3, 2017 - PigLatin
In this repository, Google Colab is paired with SparkSQL to determine key metrics about home sales data. Spark is also used to create temporary views, partition data, and cache/uncache a temporary table in the process.
Spark assignment using SparkSQL and Spark Streaming with Kafka to calculate spaceship consumption.
Spark with Scala, including RDDs, transformations, actions, HDFS, SparkSQL, DataFrames, and MLlib
Designed a machine learning model that takes a newsgroup dataset and performs binary classification to predict whether a given document has Atheistic or Christian sentiment. Used the LIME library and PySpark. Performed feature selection to improve the classifier’s performance.
Weather Data Analysis using Python, Pandas, SparkSQL, AutoRegression Model
Developing Spark applications using Scala
A fun place for me to blog about distributed databases, aerial arts, and life in general
Performance Benchmarking for Solr, Elastic, Impala, SparkSQL via SparkThrift -- using JMeter
The scope of this project is to calculate daily revenue from retail products
Home sale predictions using PySpark and SparkSQL
Dockerfile for spark-ubunt-scala-python3