A curated list of awesome System Design (A.K.A. Distributed Systems) resources.
Updated Mar 26, 2024
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Everything you need for a data engineering (DE) role
Life cycle: internal workings of HDFS, Sqoop, Hive, Spark, HBase, and Kafka, with code.
IBIS is a workflow-creation engine that abstracts the Hadoop internals of ingesting RDBMS data.
Instructions for setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04
Dockerfile for running Apache Knox (http://knox.apache.org/) in Docker
Analysis of YouTube data using the Hadoop MapReduce framework in Java.
Stores and analyzes big data on various customers using Hadoop and other tools such as Hive, ZooKeeper, HBase, and Sqoop. All customer details are analyzed and the results are reported. These results are very useful for companies.
The goal of this project is to identify flood-prone areas and estimate the probability of flooding in counties on a future date, using Spark MLlib.
Principles and hands-on practice for HDFS, MapReduce, Hive, and ZooKeeper
Built a large-scale distributed data processing system for streaming analytics using the Hadoop ecosystem (Apache Spark and HDFS) in the cloud, for real-time spatial analytics.
Practice programs in the Hadoop ecosystem, for reference
EMR 5.25.0 cluster single-node Hadoop Docker image, with Amazon Linux, Hadoop 2.8.5, and Hive 2.3.5
Hadoop 3.2 in single-node or cluster mode with the gotty web terminal, Spark, Jupyter with PySpark, Hive, and other ecosystem tools.
[Work in progress] Client library for simplified access to Apache Accumulo
This project focuses on analyzing movie data using PySpark, tailored for efficient data processing on the Hadoop Distributed File System (HDFS)