Book-Search-Engine

Search Engine for Books (Java, Apache Lucene, crawler4j, Apache Spark)

  • Crawled about 100,000 web pages with crawler4j and ran link analysis on the resulting web graph by implementing PageRank with Apache Spark’s GraphX.
  • Indexed the crawled documents with Apache Lucene and ranked the results for each query by a combination of PageRank and TF-IDF scores.
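
The two ranking signals above can be illustrated with a small, dependency-free sketch. This is not the project's code: the project computes PageRank with GraphX and text scores with Lucene, while the plain-Java version below only shows the idea. The class name `RankingSketch`, the method names, the damping factor of 0.85, and the weighted-sum blend controlled by `alpha` are all assumptions for illustration; the README does not state how the two scores are actually combined.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and parameters are not taken from the project.
class RankingSketch {

    /**
     * Power-iteration PageRank on an adjacency list.
     * outLinks[i] holds the indices of the pages that page i links to.
     * Rank held by pages without out-links is spread evenly over all pages.
     */
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0;
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    dangling += rank[i]; // no out-links: redistribute below
                } else {
                    double share = rank[i] / outLinks[i].length;
                    for (int target : outLinks[i]) {
                        next[target] += share;
                    }
                }
            }
            for (int i = 0; i < n; i++) {
                next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    /**
     * Assumed blend: a weighted sum of a text relevance score and a PageRank score.
     * Both inputs are expected to be normalized to [0, 1] by the caller.
     */
    static double combinedScore(double textScore, double pageRankScore, double alpha) {
        return alpha * textScore + (1.0 - alpha) * pageRankScore;
    }

    public static void main(String[] args) {
        // Toy graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {2}
        int[][] graph = { {1, 2}, {2}, {0}, {2} };
        double[] ranks = pageRank(graph, 0.85, 50);
        double maxRank = Arrays.stream(ranks).max().getAsDouble();

        // Pretend text relevance scores for one query, already scaled to [0, 1].
        double[] textScores = { 0.2, 0.9, 0.4, 0.7 };
        for (int page = 0; page < ranks.length; page++) {
            double score = combinedScore(textScores[page], ranks[page] / maxRank, 0.7);
            System.out.printf("page %d: pagerank=%.4f combined=%.4f%n", page, ranks[page], score);
        }
    }
}
```

A weighted sum is only one possible blend. Multiplying the two scores, or applying PageRank as a boost on top of the text score, are equally plausible choices, and the right weight depends on how the scores are normalized.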