Getting Unreachable hosts error when trying to scrape data #576

Open
beeena opened this issue Apr 4, 2023 · 0 comments

I'm trying to scrape data using the following command.

docker run -it --env-file=./config/development/dev.env -e "CONFIG=$(cat ./config/config.json | jq -r tostring)" algolia/docsearch-scraper

I have verified that the API key and application ID are correct, but the scraper still fails with an "Unreachable hosts" error.
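
One thing that may be worth ruling out is the env file itself. `docker run --env-file` passes values through verbatim, so quotes or stray whitespace around a value end up inside the credential. The sketch below is only an approximation of that parsing, and it assumes the env file uses the variable names `APPLICATION_ID` and `API_KEY`; the helper names `parse_env_text` and `find_env_problems` are hypothetical and not part of the scraper.

```python
from typing import Dict, List

# Assumed variable names for the scraper's env file; adjust if yours differ.
REQUIRED_KEYS = ("APPLICATION_ID", "API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse VAR=VALUE lines, keeping each value exactly as written.

    This approximates how `docker run --env-file` treats the file:
    quotes are not stripped, so they become part of the value.
    """
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        # Skip blank lines, comments, and lines without an assignment.
        if not raw_line.strip() or raw_line.lstrip().startswith("#"):
            continue
        if "=" not in raw_line:
            continue
        key, value = raw_line.split("=", 1)
        values[key.strip()] = value
    return values


def find_env_problems(values: Dict[str, str]) -> List[str]:
    """Return a human-readable list of suspicious credential values."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        value = values.get(key)
        if value is None:
            problems.append(f"{key} is missing")
        elif value == "":
            problems.append(f"{key} is empty")
        elif value != value.strip() or value[0] in "'\"" or value[-1] in "'\"":
            problems.append(f"{key} has surrounding quotes or whitespace")
    return problems
```

To use it, read `./config/development/dev.env` into a string, pass it through `parse_env_text`, and print whatever `find_env_problems` returns. An empty list only means these particular checks passed, not that the credentials are valid.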

Error

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/src/index.py", line 119, in <module>
    run_config(environ['CONFIG'])
  File "/root/src/index.py", line 45, in run_config
    config.query_rules
  File "/root/src/algolia_helper.py", line 21, in __init__
    self.index_name_tmp
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/search_client.py", line 127, in copy_rules
    return self.copy_index(src_index_name, dst_index_name, request_options)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/search_client.py", line 94, in copy_index
    request_options,
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/http/transporter.py", line 35, in write
    return self.request(verb, hosts, path, data, request_options, timeout)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/http/transporter.py", line 72, in request
    return self.retry(hosts, request, relative_url)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/http/transporter.py", line 94, in retry
    raise AlgoliaUnreachableHostException("Unreachable hosts")
algoliasearch.exceptions.AlgoliaUnreachableHostException: Unreachable hosts
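
The traceback shows the exception being raised from the transporter's `retry` step, which suggests that none of the hosts it tried returned a usable response. A rough way to narrow this down is to check whether the Algolia host names for the application ID resolve at all from the same environment. The sketch below assumes the usual host naming pattern (`<app-id>.algolia.net`, `<app-id>-dsn.algolia.net`, `<app-id>-N.algolianet.com`); the function names `algolia_hosts` and `check_hosts` are hypothetical helpers, not part of the `algoliasearch` client.

```python
import socket
from typing import Callable, Dict, List


def algolia_hosts(app_id: str) -> List[str]:
    """Build the host names assumed to serve a given application ID."""
    return [
        f"{app_id}.algolia.net",
        f"{app_id}-dsn.algolia.net",
        f"{app_id}-1.algolianet.com",
        f"{app_id}-2.algolianet.com",
        f"{app_id}-3.algolianet.com",
    ]


def check_hosts(
    hosts: List[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Map each host to True if it resolves via DNS, False otherwise.

    `resolve` is injectable so the check can be exercised without
    network access.
    """
    results: Dict[str, bool] = {}
    for host in hosts:
        try:
            resolve(host)
            results[host] = True
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = False
    return results
```

If every host comes back `False`, the application ID is probably wrong or mangled, or the container has no DNS or outbound network access. If the hosts resolve, the problem is more likely on the request side, such as the API key or a proxy or firewall, and DNS alone will not reveal it.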

config.json


{
    "index_name": "dev_RESORTIFI_HELP",
    "start_urls": [
      "https://help.resortifi.com/"
    ],
    "sitemap_urls": [
      "https://help.resortifi.com/sitemap.xml"
    ],
    "sitemap_alternate_links": true,
    "stop_urls": [
      "/tests"
    ],
    "selectors": {
      "lvl0": {
        "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
        "type": "xpath",
        "global": true,
        "default_value": "Documentation"
      },
      "lvl1": "header h1",
      "lvl2": "article h2",
      "lvl3": "article h3",
      "lvl4": "article h4",
      "lvl5": "article h5, article td:first-child",
      "lvl6": "article h6",
      "text": "article p, article li, article td:last-child"
    },
    "strip_chars": " .,;:#",
    "custom_settings": {
      "separatorsToIndex": "_",
      "attributesForFaceting": [
        "language",
        "version",
        "type",
        "docusaurus_tag"
      ],
      "attributesToRetrieve": [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type"
      ]
    },
    "conversation_id": [
      "833762294"
    ],
    "nb_hits": 46250
  }