Updated Jan 2, 2017 - Scala
dataproc-cluster
Here are 22 public repositories matching this topic...
Kaggle - Outbrain Click Prediction (Oct-2016 - Jan-2017)
Updated Apr 21, 2017 - R
Counts how many times each word occurs in a dataset of textbooks, using a Dataproc cluster on Google Cloud Platform.
Updated Jul 28, 2017 - Java
Run Jupyter Notebooks (and store data) on Google Cloud Platform.
Updated Oct 6, 2017 - Python
Collection of personal resources on Google Cloud
Updated Dec 1, 2017
A Scala/Spark project for experimenting with map-reduce algorithms on graph-shaped big data
Updated Jul 13, 2018 - Scala
Customisable high-availability (HA) Dataproc cluster on Debian 9 with ZooKeeper, Kafka, BigQuery and other tools/jobs, provisioned with Terraform
Updated Feb 29, 2020 - HCL
Data Workflows with GCP Dataproc, Apache Airflow and Apache Spark
Updated Mar 4, 2020 - Python
GKE with Terraform, Dataproc with Terraform
Updated Mar 11, 2020 - HCL
Training a classification model as a Dataproc job and using a Kafka/PubSub connector for real-time prediction with pre-trained models
Updated Oct 11, 2020 - Jupyter Notebook
Creating a Dataproc cluster with gcloud using this GitHub Action
Updated Oct 18, 2020 - Shell
Updated Jun 26, 2021 - Jupyter Notebook
Yelp ETL Pipeline in Apache Spark on Google Cloud Dataproc
Updated Jul 10, 2021 - Jupyter Notebook
Creating a fully managed Hadoop ecosystem with Google Cloud Platform: the challenge is to process data using GCP's Dataproc product. The job counts the words in a book and reports how many times each word appears in it.
Updated Aug 4, 2021
Project for the course "Criando um Ecossistema Hadoop Totalmente Gerenciado com Google Cloud Dataproc" ("Creating a Fully Managed Hadoop Ecosystem with Google Cloud Dataproc") from the Digital Innovation One Data Engineer Bootcamp
Updated Aug 21, 2021 - Shell
Content on how to create big data ecosystems in the cloud
Updated Aug 28, 2021 - HTML
GCP_Data_Enginner
Updated Sep 4, 2021 - Shell
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Updated Sep 19, 2022 - Python
Deploying a production-ready environment for a Spark cluster
Updated Oct 30, 2022 - HCL
An educational project to build an end-to-end pipeline for near real-time and batch processing of data that is then used for visualisation and a machine learning model.
Updated May 19, 2023 - Python