[Core] Indexing...? #74

Open
silasdavis opened this issue Jul 10, 2019 · 0 comments

Currently we have no native ability to search documents stored with Hoard. We need to decide whether Hoard should take on indexing responsibilities itself.

The opportunity exists for a Hoard instance to maintain its own index. Some thoughts:

  • Whether to index should be elective, chosen by the caller
  • We should encrypt the index itself
  • We could use something like: https://github.com/blevesearch/bleve
  • We could pipe to an external service like Solr/Elasticsearch, but this is a rather heavy dependency and makes it harder to keep the backing store and the index in step on deletion (when we implement deletion)
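To make the "encrypt the index itself" point concrete, here is a minimal sketch, assuming the index can be serialised to bytes and sealed as a whole. The `Index` type, `SealIndex` and `OpenIndex` are hypothetical names, not Hoard or Bleve APIs, and the toy map stands in for a real index. It uses AES-GCM from the Go standard library; key management is out of scope.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
)

// Index is a hypothetical stand-in for a real index (e.g. Bleve's):
// it maps a term to the references of the documents containing it.
type Index map[string][]string

// newGCM builds an AES-GCM AEAD; the key must be 16, 24 or 32 bytes.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// SealIndex serialises the index as JSON and encrypts it.
// The random nonce is prepended to the ciphertext.
func SealIndex(idx Index, key []byte) ([]byte, error) {
	plain, err := json.Marshal(idx)
	if err != nil {
		return nil, err
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plain, nil), nil
}

// OpenIndex reverses SealIndex and fails if the key is wrong
// or the sealed bytes have been tampered with.
func OpenIndex(sealed, key []byte) (Index, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed index too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	plain, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return nil, err
	}
	var idx Index
	if err := json.Unmarshal(plain, &idx); err != nil {
		return nil, err
	}
	return idx, nil
}

func main() {
	key := make([]byte, 32) // assumption: a 32-byte key supplied by the caller
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := SealIndex(Index{"invoice": {"ref-1", "ref-2"}}, key)
	if err != nil {
		panic(err)
	}
	restored, err := OpenIndex(sealed, key)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored["invoice"])
}
```

Sealing the whole index as one blob is the simplest option, but it means every update rewrites the entire ciphertext, which is one reason snapshots rather than per-document commits may be the practical unit.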

At some point index-as-a-file will become unwieldy, but if callers are able to specify an index, and therefore shard appropriately, it might go a long way. We could consider replicating indices over Tendermint.
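A rough sketch of what caller-specified indices might look like, assuming the caller passes an index name with each write and an empty name means "do not index". The `Shards` type and its methods are hypothetical and use a toy in-memory inverted index rather than Bleve:

```go
package main

import (
	"fmt"
	"strings"
)

// Shards holds one small inverted index (term -> references) per
// caller-chosen index name. Hypothetical; not part of Hoard's API.
type Shards map[string]map[string][]string

// Put records ref under every distinct whitespace-separated term in text,
// but only when the caller has named an index. An empty name opts out.
func (s Shards) Put(indexName, ref, text string) {
	if indexName == "" {
		return
	}
	shard, ok := s[indexName]
	if !ok {
		shard = make(map[string][]string)
		s[indexName] = shard
	}
	seen := make(map[string]bool)
	for _, term := range strings.Fields(strings.ToLower(text)) {
		if seen[term] {
			continue // avoid listing the same ref twice for one term
		}
		seen[term] = true
		shard[term] = append(shard[term], ref)
	}
}

// Search looks up a single term in one named shard only.
func (s Shards) Search(indexName, term string) []string {
	return s[indexName][strings.ToLower(term)]
}

func main() {
	s := Shards{}
	s.Put("contracts-2019", "ref-a", "Lease agreement")
	s.Put("", "ref-b", "Lease that is never indexed")
	s.Put("invoices", "ref-c", "Lease invoice")
	fmt.Println(s.Search("contracts-2019", "lease")) // prints [ref-a]
}
```

Because each named shard is independent, each one could be persisted, encrypted or replicated on its own, which keeps any single index file small as long as callers pick names sensibly.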

Bleve is a spiritual relation of Lucene. I would expect its indices to be rather compact.

Storing the index itself in Hoard seems appealing. This would probably need to be snapshots of the index, though that is not entirely a given (on IPFS, provided an index is DAG-friendly, it might be okay, which it won't be when encrypted of course...).

If we do not commit the index for each document we run the risk of irretrievably losing that index information, since we cannot do something like trawl data to re-index gaps. Though I suppose if we hung on to references to data we stored we could... Since we hold the secrets that encrypt grants for data stored with a particular Hoard instance, we could maintain a write-ahead log of references that have been indexed in memory but whose index has not been persisted, and recover that log after a crash....

@compleatang compleatang changed the title Hoard indexing [Core] Indexing...? Dec 10, 2020