subdomain crawl with "allowedDomains" parameter crawls top domain, too #381

michaelpapesch · 2021-11-29T11:50:38Z

When crawling the subdomain "test.domain.com", result.response.url also includes URLs from "domain.com".
I tried it with both the subdomain name as a string and a regexp.
I don't understand why. Shouldn't the "allowedDomains" parameter prevent URLs on other domains from being crawled?

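const HCCrawler = require('headless-chrome-crawler');

// `domain`, `url` and `connection` are defined elsewhere in my script.
(async () => {
    const crawler = await HCCrawler.launch({
        headless: true,
        args: [
            '--ignore-certificate-errors',
            '--no-sandbox',
        ],
        // Expected to restrict the crawl to this host only.
        allowedDomains: [domain],
        maxDepth: 8,
        customCrawl: async (page, crawl) => {
            const result = await crawl();
            // Attach the rendered HTML to the result.
            result.content = await page.content();
            return result;
        },
        onSuccess: result => {
            // result.response.url is where the "domain.com" URLs show up.
            const values = [
                result.response.url
            ];
        },
    });
    await crawler.queue(url);
    await crawler.onIdle();
    await crawler.close().then(() => connection.end());
    console.log('Scan completed.');
})();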
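
As a stopgap, the results could be filtered by hostname inside onSuccess before they are stored. The following is only a minimal sketch: `isAllowedHost` is a hypothetical helper, not part of headless-chrome-crawler, and it assumes result.response.url is an absolute URL. It does not explain why the crawler reports those URLs in the first place. It also shows one possible pitfall with the regexp variant: an unanchored pattern such as /domain\.com/ matches "domain.com" as well as "test.domain.com".

```javascript
// Hypothetical helper (not part of headless-chrome-crawler): returns true only
// when the hostname of `urlString` matches one of the entries in `allowed`.
// Strings must match the hostname exactly; RegExp entries are tested against it.
function isAllowedHost(urlString, allowed) {
    let hostname;
    try {
        hostname = new URL(urlString).hostname;
    } catch (err) {
        return false; // not a parseable absolute URL
    }
    return allowed.some(entry =>
        entry instanceof RegExp ? entry.test(hostname) : entry === hostname
    );
}

// An anchored pattern accepts the subdomain only.
const subdomainOnly = /^test\.domain\.com$/;
```

In onSuccess this would be used as `if (!isAllowedHost(result.response.url, [domain])) return;` so that URLs on "domain.com" are skipped.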
