caiomsouza/microsoft-azure-databricks-playground

Microsoft Azure Databricks Playground

Azure Databricks: A Brief Introduction
https://www.youtube.com/watch?v=cxyUy1bZ9mk

Intro to Machine Learning for Developers on Azure Databricks
https://databricks.com/intro-to-machine-learning-for-developers-on-azure-databricks

This repo is dedicated to Microsoft Azure Databricks sample code.

Azure Databricks Videos

https://github.com/caiomsouza/microsoft-azure-databricks-playground/blob/master/videos.md

Sample Code from Microsoft

https://notebooks.azure.com/caiomsouza/libraries/Azure-MachineLearningNotebooks/tree/databricks

Databricks Articles

Article Name and Link
Azure Databricks Videos
How to start an Azure Databricks Cluster
How to connect Azure Data Lake Store with Azure Databricks
How to connect Azure Databricks with Azure Blob Storage
Creating a Spark Temp Table using Spark SQL
Attach Event Hub Spark libraries to Azure Databricks (Spark) cluster
Processing Event Hubs Capture files (AVRO Format) using Spark (Azure Databricks), save to Parquet or CSV format
Getting tweets with keyword “Bolsonaro” and sending them to the Event Hub in real-time using Azure Databricks (Spark)
Predicting using Azure Databricks (Spark), SparkR and RandomForest
Consuming real-time sensor data using Azure Event Hub and Azure Databricks (Spark)
IoT Smart House Demo: Send real-time sensor data to Event Hub move to Data Lake Store and explore using Databricks
Read a CSV File from Azure Data Lake Store with Azure Databricks
This notebook reads a CDM folder, applies transformations to some of the entities and then writes out all entities including the modified ones to a new CDM folder
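
As a rough illustration of what the Blob Storage and temp table articles listed above deal with, the sketch below builds the two strings a notebook typically needs: the Spark configuration key that holds the storage account access key, and the `wasbs://` URI of a file. The account, container and path names are placeholders, and the two helper functions are hypothetical; they are not taken from the linked articles.

```python
def blob_account_key_conf(storage_account: str) -> str:
    """Return the Spark conf key under which the storage account access key is set."""
    return f"fs.azure.account.key.{storage_account}.blob.core.windows.net"


def wasbs_uri(container: str, storage_account: str, path: str) -> str:
    """Build a wasbs:// URI for a blob inside a container."""
    return (
        f"wasbs://{container}@{storage_account}.blob.core.windows.net/"
        f"{path.lstrip('/')}"
    )


# Placeholder names, for illustration only.
conf_key = blob_account_key_conf("mystorageaccount")
csv_uri = wasbs_uri("mycontainer", "mystorageaccount", "/data/sample.csv")

# Inside a Databricks notebook (where `spark` is predefined) these values
# would be used roughly like this:
#   spark.conf.set(conf_key, "<storage-account-access-key>")
#   df = spark.read.csv(csv_uri, header=True, inferSchema=True)
#   df.createOrReplaceTempView("sample")
#   spark.sql("SELECT COUNT(*) FROM sample").show()
print(conf_key)
print(csv_uri)
```

The Spark calls are left as comments so the snippet runs outside a cluster; in a real notebook the access key would normally come from a secret scope rather than being typed in.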

The power of Azure Databricks

I started in the Big Data world some years ago using pure Apache software, then Hortonworks and Cloudera. Over the last few years I have really enjoyed working with HDFS, MapReduce I and II, Storm, Pig, Hive, Cloudera Impala, Spark, etc.

Since joining the Microsoft world in April 2018, I have been looking with my open source eyes at what Microsoft Azure offers for delivering Big Data Science projects, and every day I like the Azure Databricks offering more and more.

I am very happy to see Microsoft moving more and more each day toward the Azure Databricks world (Apache Spark, Python, R, Scala and all the open source technologies). The combination of Microsoft and Databricks is incredible: a great product with support from both Microsoft and Databricks.

Basically, Azure Databricks gives you, in a single product, the power to run big data jobs and implement machine learning (in Python or R) from a notebook.

You can run an Azure Databricks notebook directly as a Job in Azure Databricks or schedule it from Azure Data Factory.
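
To make the Job option a bit more concrete, here is a minimal sketch of the request body that could be sent to the Databricks Jobs API `jobs/create` endpoint to run a notebook on a schedule. It assumes the Jobs API 2.1 request shape (`tasks`, `notebook_task`, `existing_cluster_id`, `schedule`); the job name, notebook path and cluster id are made-up placeholders, and the helper function is hypothetical. The snippet only builds and prints the JSON, it does not call any service.

```python
import json


def notebook_job_payload(name, notebook_path, cluster_id, cron=None, timezone="UTC"):
    """Build a request body for `jobs/create` (assumed Jobs API 2.1 shape)."""
    payload = {
        "name": name,
        "tasks": [
            {
                "task_key": "run_notebook",
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
    }
    # The schedule is optional; without it the job only runs when triggered.
    if cron is not None:
        payload["schedule"] = {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
        }
    return payload


# Placeholder values, for illustration only.
job = notebook_job_payload(
    name="daily-sample-notebook",
    notebook_path="/Users/someone@example.com/sample-notebook",
    cluster_id="<existing-cluster-id>",
    cron="0 0 6 * * ?",  # Quartz syntax: every day at 06:00
)
print(json.dumps(job, indent=2))
```

Scheduling through Azure Data Factory works differently: there the notebook is referenced from a pipeline activity, and the pipeline's own trigger decides when it runs.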

What is Azure Databricks

Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. Designed with the founders of Apache Spark, Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

More:
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/azure-databricks/what-is-azure-databricks.md

Introduction to Azure Databricks
https://www.slideshare.net/jamserra/introduction-to-azure-databricks-83448539

Benchmarking Big Data SQL Platforms in the Cloud
TPC-DS benchmarks demonstrate Databricks Runtime 3.0’s superior performance
https://databricks.com/blog/2017/07/12/benchmarking-big-data-sql-platforms-in-the-cloud.html

Papers

Spark: Cluster Computing with Working Sets
http://static.usenix.org/legacy/events/hotcloud10/tech/full_papers/Zaharia.pdf

Improving MapReduce Performance in Heterogeneous Environments
http://static.usenix.org/event/osdi08/tech/full_papers/zaharia/zaharia.pdf

MLlib: Machine Learning in Apache Spark
http://www.jmlr.org/papers/volume17/15-237/15-237.pdf

Spark AI Summit 2018 talks

https://databricks.com/sparkaisummit/europe/spark-summit-2018-keynotes

Sample Code (External)

https://github.com/hipic/biz_data_LA
https://docs.azuredatabricks.net/spark/latest/training/index.html