Skip to content

Count the number of times a word occurs in 1GB (Big Data) Dataset of books using hadoop map-reduce

Notifications You must be signed in to change notification settings

rishabhindoria/Google-Cloud-Platform-Hadoop-Map-Reduce

Repository files navigation

Google-Cloud-Platform-Hadoop-Map-Reduce

• Objective : To determine which words occur how many times in a dataset of textbooks Dataset size: 1GB

• Created a Google Dataproc Cluster

• Created a master and 3 worker nodes. Each node’s configuration was n1-standard-4 4vCPU 15 GB memory.

• Mapper class in java program was used to output key:value pairs wherein key was words and value was the textbook_id in which that particular word occurred.

• Reducer class in java program was used to grab the key:value pairs emitted by Mapper class to create the final output line for each word which stated which word occurred how many times in each of the textbook_id in the dataset(1GB) of textbooks.

• The java program took 1hr to run on 1GB dataset uploaded on Google Cloud Platform to create the final output.

Releases

No releases published

Packages

No packages published

Languages