File System Crawler for Elasticsearch

Welcome to the FS Crawler for Elasticsearch

This crawler helps index binary documents such as PDF, Open Office, and MS Office files.

Main features:

Crawls a local file system (or a mounted drive) to index new files, update existing ones, and remove old ones.
Crawls a remote file system over SSH/FTP.
Provides a REST interface that lets you "upload" your binary documents to Elasticsearch.
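
The local crawling feature listed above (index new files, update existing ones, remove old ones) can be pictured as a comparison between two scans of the same directory tree. The following is a minimal Python sketch under that assumption, not FS Crawler's actual implementation: the function names `snapshot` and `diff_snapshots` are hypothetical, and using the last-modified time as the change signal is an assumption made purely for illustration.

```python
import os
from typing import Dict, List, Tuple


def snapshot(root: str) -> Dict[str, float]:
    """Map each file under `root` (path relative to `root`) to its last-modified time."""
    files: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            files[os.path.relpath(full_path, root)] = os.path.getmtime(full_path)
    return files


def diff_snapshots(
    previous: Dict[str, float], current: Dict[str, float]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two scans and return (added, updated, removed) paths, each sorted.

    Assumption for this sketch: a file counts as updated when its
    last-modified time is newer than in the previous scan.
    """
    added = sorted(p for p in current if p not in previous)
    updated = sorted(
        p for p in current if p in previous and current[p] > previous[p]
    )
    removed = sorted(p for p in previous if p not in current)
    return added, updated, removed
```

In this sketch, a crawler loop would call `snapshot` on each run, pass the previous and current results to `diff_snapshots`, then index the added paths, re-index the updated ones, and delete the removed ones from the search index.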

Latest versions

Current "most stable" versions are:

Elasticsearch	FS Crawler	Released	Docs
6.x, 7.x, 8.x	2.10-SNAPSHOT		2.10-SNAPSHOT

The guide has been moved to ReadTheDocs.

Works on my machine, and yours! You can spin up a pre-configured, standardized dev environment of this repository with Gitpod.

Thanks to JetBrains for the IntelliJ IDEA License!

Thanks to SonarCloud for the free analysis!
