AkkaSearchEngine

A concurrent website crawler, content indexer, and search engine proof of concept built on the Akka actor model. Actors let scraping, indexing, and searching run concurrently, and the design is intended to scale to many concurrent users as well. Currently the actor system is limited to 10 Scrape actors at a time, but there is potential to scale this considerably.
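
The idea of capping concurrent scraping at 10 workers can be sketched without Akka. The example below is not the project's actor code: it swaps in a fixed thread pool and standard-library Futures as a stand-in for the pool of Scrape actors. The names `BoundedScrapeSketch`, `MaxScrapers`, and `scrapeAll` are hypothetical, and the `scrape` function is supplied by the caller.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedScrapeSketch {
  // Assumption: mirrors the README's limit of 10 Scrape actors at a time.
  val MaxScrapers = 10

  // Apply `scrape` to every URL with at most MaxScrapers jobs running at once.
  // Results come back in the same order as the input URLs.
  def scrapeAll[A](urls: Seq[String])(scrape: String => A): Seq[(String, A)] = {
    val pool = Executors.newFixedThreadPool(MaxScrapers)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val jobs = urls.map(url => Future(url -> scrape(url)))
      // Block until every job finishes; the one-minute timeout is arbitrary.
      Await.result(Future.sequence(jobs), 1.minute)
    } finally pool.shutdown()
  }
}
```

Raising `MaxScrapers` is the analogue of letting the actor system spawn more Scrape actors; in the real application the actors would also need to avoid overloading the sites being crawled.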

To Run

Download the package.
cd into the package directory and execute sbt run (you can run it in Docker too, but Akka logging hasn't been disabled yet).
Follow the instructions on the page.

Todos

Implement the search function through an Akka HTTP web service and return an HTML document with search results instead of a data map.
Store the inverted index in a Hadoop or Cassandra cluster and crawl every major news website.
Return the sentences in which the search terms appear on the page.
Connect a web front end to the Akka service.
Handle dead-letter issues in Akka actors on shutdown.
Implement more sophisticated ranking than the number of keyword hits.
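
Several of these items revolve around the inverted index, hit-count ranking, and returning matching sentences. A minimal sketch of those three pieces in plain Scala follows. It is an illustration under assumptions, not the code in this repository: the object name, the `Index` type, and the functions `tokenize`, `build`, `search`, and `matchingSentences` are all hypothetical, and tokenization is a naive split on non-alphanumeric characters.

```scala
object InvertedIndexSketch {
  // term -> (url -> number of times the term occurs on that page)
  type Index = Map[String, Map[String, Int]]

  // Lower-case the text and split on anything that is not a-z or 0-9.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSeq

  // Build the index from (url, page text) pairs.
  def build(pages: Seq[(String, String)]): Index =
    pages
      .flatMap { case (url, text) => tokenize(text).map(term => (term, url)) }
      .groupBy(_._1)
      .map { case (term, pairs) =>
        term -> pairs.groupBy(_._2).map { case (url, hits) => url -> hits.size }
      }

  // Rank pages by total keyword hits across all query terms, highest first.
  // Ties are broken alphabetically by URL.
  def search(index: Index, query: String): Seq[(String, Int)] =
    tokenize(query)
      .flatMap(term => index.getOrElse(term, Map.empty[String, Int]).toSeq)
      .groupBy(_._1)
      .map { case (url, hits) => url -> hits.map(_._2).sum }
      .toSeq
      .sortBy { case (url, score) => (-score, url) }

  // Sentences of a page that contain at least one query term.
  def matchingSentences(text: String, query: String): Seq[String] = {
    val terms = tokenize(query).toSet
    text.split("(?<=[.!?])\\s+").toSeq
      .map(_.trim)
      .filter(sentence => tokenize(sentence).exists(terms.contains))
  }
}
```

A more sophisticated ranking would replace the raw sum in `search` with a weighted score, for example one that discounts terms appearing on almost every page.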


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

project

project

src/main

src/main

target

target

.gitattributes

.gitattributes

Dockerfile

Dockerfile

README.md

README.md

build.sbt

build.sbt

entrypoint.sh

entrypoint.sh

Repository files navigation

AkkaSearchEngine

About

Releases

Packages

Languages

TRReeve/AkkaSearchEngine

Folders and files

Latest commit

History

Repository files navigation

AkkaSearchEngine

About

Topics

Resources

Stars

Watchers

Forks

Languages