Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sitemap results rounding #145

Open
bomjastik opened this issue Jun 3, 2019 · 11 comments
Open

Sitemap results rounding #145

bomjastik opened this issue Jun 3, 2019 · 11 comments

Comments

@bomjastik
Copy link

bomjastik commented Jun 3, 2019

Hello.

I'm trying to create the site map and everything works well besides one thing.

When I'm trying to create the site map from the index with 29,248 rows, I'm getting 29,000 result rows in the site map. The exact 29,000.

I have tried to do the same for another index with 3,291 rows, and I have got the exact 3000 result rows in it.

For smaller indexes with less than 100 rows, I'm getting correct result rows number.

Is there any reason why my results rounding may happen?

My index.js:

const algoliaSitemap = require('algolia-sitemap');

const algoliaConfig = {
    appId: '',
    apiKey: '',
    indexName: '',
};

function hitToParams({pretty_url, objectID, station_name}) {
    const url = pretty_url => `some_url`;

    const loc = url(pretty_url);

    const lastmod = new Date().toISOString();

    const priority = 0.8;

    return {
        loc,
        lastmod,
        priority,
    };
}

algoliaSitemap({
    algoliaConfig,
    sitemapLoc: 'https://mydomain.com/sitemaps',
    outputFolder: 'sitemaps',
    hitToParams,
}).then(() => {
    console.log('Done generating sitemaps'); // eslint-disable-line no-console
})
    .catch(console.error);

Regards.

@Haroenv
Copy link
Contributor

Haroenv commented Jun 4, 2019

Is this happening with v2.2.0, did you try out if this also happens for you on 2.1.0? Thanks!

@alexpchin
Copy link

alexpchin commented Oct 4, 2020

Hi, @Haroenv I've just started to implement this too and I currently have 2,159 results but I'm only getting 2000 URLs in sitemap.0.xml?

@Haroenv
Copy link
Contributor

Haroenv commented Oct 4, 2020

do you have multiple sitemap files @alexpchin? Does this also happen with appId: 'latency', indexName: 'instant_search', apiKey: '8baa3791b46c2a52259007747e743e07'?

@alexpchin
Copy link

alexpchin commented Oct 4, 2020

Hi @Haroenv, I've just checked with:

algoliaSitemap({
    algoliaConfig: {
      appId: "latency",
      apiKey: "8baa3791b46c2a52259007747e743e07",
      indexName: "instant_search",
    },
    hitToParams: ({ objectID }) => {
      return {
        loc: `https://someurl.com/${objectID}`,
        lastmod: new Date().toISOString(),
        priority: 0.7,
      }
    },
  }),

Only one sitemaps/sitemap-index.xml and sitemaps/sitemap.0.xml are created. Inside sitemaps/sitemap.0.xml there are 21,000 URLs.

  const aggregator = async (args) => {
    let { hits, cursor } = args;
    do {
      if (!hits) {
        return;
      }
      batch = batch.concat(
        hits.reduce((entries, hit) => {
          const entry = hitToParams(hit);
          return entry ? entries.concat(entry) : entries;
        }, [])
      );
      if (batch.length > CHUNK_SIZE) {
        await flush();
      }
      ({ hits, cursor } = await index.browseFrom(cursor));
     **console.log('hits -->', hits.length)**
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 469
    } while (cursor);
    **console.log('batch ---> is 21,000', batch.length);**
    await handleSitemap(batch);
    const sitemapIndex = createSitemapindex(sitemaps);
    await saveSiteMap({
      sitemap: sitemapIndex,
      root: outputFolder,
      filename: 'sitemap-index',
    });
  };

*Aside, at some point we could potentially add a check if the output folder hasn't already been created and create it?

@Haroenv
Copy link
Contributor

Haroenv commented Oct 4, 2020

Thanks for checking, that indeed doesn't look right! Maybe the last batch doesn't have a cursor, and therefore the loop stops. We should switch it to a regular while loop and do the fetching the beginning of the loop to avoid it I think. The program flow will be much simpler

for your aside, that definitely makes sense I think mkdir now has a -p option in node that we can use

@alexpchin
Copy link

alexpchin commented Oct 4, 2020

Hi @Haroenv, sorry, yes I was going to follow this up with a suggestion for a solution as the problem is with the cursor but I was eating some noodles...

We could just solve quickly by using a run flag that updates at the start of the while loop if there is no curser?

  const aggregator = async (args) => {
    let { hits, cursor } = args;
    let run = true;
    do {
      run = !!cursor;
      if (!hits) {
        return;
      }
      batch = batch.concat(
        hits.reduce((entries, hit) => {
          const entry = hitToParams(hit);
          return entry ? entries.concat(entry) : entries;
        }, [])
      );
      if (batch.length > CHUNK_SIZE) {
        await flush();
      }
      ({ hits, cursor } = await index.browseFrom(cursor));
    } while (run);
    await handleSitemap(batch);
    const sitemapIndex = createSitemapindex(sitemaps);
    await saveSiteMap({
      sitemap: sitemapIndex,
      root: outputFolder,
      filename: 'sitemap-index',
    });
  };

  return index.browse(params).then(aggregator);
}

This would work but arguably a little more confusing in terms of flow.

@Haroenv
Copy link
Contributor

Haroenv commented Oct 5, 2020

I think that makes sense, but a regular while with the fetching first will be simpler probably :)

@alexpchin
Copy link

@Haroenv Did you want me to PR? I saw that there is also a number of chores outstanding too?

@Haroenv
Copy link
Contributor

Haroenv commented Oct 7, 2020

All PRs that were open were for devDependencies, so not urgent for this project. A PR would be well appreciated if you confirm that this is the source of the bug. Thanks @alexpchin !

@Pr00xxy
Copy link

Pr00xxy commented Sep 6, 2021

@Haroenv Hi was there any follow up on this issue? I have started prototyping an implementation using this library but this bug is a blocker for now

@Haroenv
Copy link
Contributor

Haroenv commented Sep 6, 2021

Does the fix suggested work for you @Pr00xxy?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants