Algolia index records reduction after an undefined amount of time #560
Hey @FilippoRezzonico, can you confirm you are using the latest version of the docsearch-scraper Docker image? The only case in which an index can be unavailable is at the end of a successful crawl: when the crawler runs, it stores records in a …
Hi @shortcuts,
Could you confirm the index is deleted when the search stops working? If that's the case, I suggest you contact support@algolia.com (also provide the link to this issue so you don't have to re-explain it) and they will be able to tell you why your index was deleted. With our scraper, we always keep the production index up and don't perform delete operations.
If the index is not deleted and only the search does not work, it might be related to some inconsistencies during the crawl. Could you please provide a gist with your config file so I can try it?
Hope this gives you hints :D
Actually, our indexes are not deleted, but it seems that their number of records gets reduced after some time. So I think that, as you said, it could be caused by some issues during the crawling. Here is our config:

```json
{
  "index_name": "mia-platform-docs",
  "start_urls": [
    "https://docs.mia-platform.eu"
  ],
  "stop_urls": [
    "/$"
  ],
  "selectors": {
    "text": "article p, article li",
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5",
    "lvl6": "article h6",
    "lvl0": {
      "selector": ".menu__link--sublist.menu__link--active",
      "global": true,
      "default_value": "Documentation"
    }
  },
  "sitemap_urls": [
    "https://docs.mia-platform.eu/sitemap.xml"
  ],
  "sitemap_alternate_links": true,
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  },
  "min_indexed_level": 0,
  "conversation_id": [
    "1280385092"
  ],
  "nb_hits": 12708
}
```

Do you see any possible problem with it?
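Before running a crawl, it can help to sanity-check a config like the one above locally. The sketch below is a hypothetical validator (the key list is based only on the fields visible in this config, not on the scraper's full schema), using nothing but the Python standard library:

```python
import json

# Keys that appear essential in the config shown above. This is an
# assumption for illustration, not the scraper's authoritative schema.
REQUIRED_KEYS = {"index_name", "start_urls", "selectors"}

def validate_config(raw: str) -> list:
    """Return a list of problems found in a DocSearch config string."""
    problems = []
    try:
        cfg = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    for key in REQUIRED_KEYS - cfg.keys():
        problems.append(f"missing required key: {key}")
    if not cfg.get("start_urls"):
        problems.append("start_urls is empty")
    if "lvl1" not in cfg.get("selectors", {}):
        problems.append("selectors has no lvl1 entry")
    return problems

# A trimmed-down version of the config above passes the checks.
sample = ('{"index_name": "mia-platform-docs",'
          ' "start_urls": ["https://docs.mia-platform.eu"],'
          ' "selectors": {"lvl1": "header h1", "text": "article p, article li"}}')
print(validate_config(sample))  # → []
```

An empty list means no obvious structural problems; anything else is worth fixing before blaming the crawl itself.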
As the issue is mostly about inconsistencies between crawls, it might have an impact, yes.
The new doc has been deployed since then; links are now at … On my side, I had …
There's no caching on our side, so it shouldn't have an impact.
For now, I will try to update our sitemap.xml file and see if this solves the problem once and for all.
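Since the config relies on `sitemap_urls`, a quick local check of what the sitemap actually advertises can rule out one source of crawl inconsistency. A minimal sketch, assuming a standard sitemaps.org-format `sitemap.xml` (the sample URLs below are placeholders, not the site's real sitemap contents):

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text: str) -> list:
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

# Hypothetical two-entry sitemap for illustration.
sample_sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.mia-platform.eu/</loc></url>
  <url><loc>https://docs.mia-platform.eu/docs/getting_started/</loc></url>
</urlset>"""
print(len(sitemap_urls(sample_sitemap)))  # → 2
```

Comparing this URL count across deploys would show whether the sitemap itself shrinks, which would explain a shrinking record count.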
Situation
The docsearch-scraper Docker image starts scraping my company's documentation website any time we deploy a new version of it, and it correctly updates our indexes. Using the search bar will then return the correct records.
The problem is that, after an undefined amount of time (it can be weeks or even months), our Algolia search bar will not return any records when used. Is it possible that indexes get deleted after a certain amount of time, or that the Algolia docsearch-scraper deletes them or returns no indexes if some condition occurs (maybe launching it multiple times within a short amount of time)?
Result
Our documentation search through Algolia does not return a single record until we run the docsearch-scraper Docker image again.
Workaround
Any time we notice that our Algolia search does not return any results, we run the docsearch-scraper Docker image again.
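The manual workaround above could be automated: read the live record count from the index (fetching it is not shown here), compare it against the count reported by the last successful crawl, and re-run the scraper when it drops. A minimal sketch of the decision logic; the function name and the 10% tolerance are arbitrary assumptions for illustration:

```python
def needs_rescrape(current_records: int, expected_records: int,
                   tolerance: float = 0.10) -> bool:
    """Return True when the live record count has dropped more than
    `tolerance` below the last known-good count. `expected_records`
    would come from the scraper's own output, e.g. the `nb_hits`
    value (12708 in the config above)."""
    if expected_records <= 0:
        # No baseline yet: nothing meaningful to compare against.
        return False
    return current_records < expected_records * (1 - tolerance)

print(needs_rescrape(0, 12708))      # → True  (empty index: re-run the scraper)
print(needs_rescrape(12500, 12708))  # → False (within tolerance)
```

Run on a schedule (e.g. cron), this would turn the "notice the search is broken, then re-scrape" loop into an automatic check.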