[FEATURE] Add Sitemap handling #1755

Open
johnboxall opened this issue Apr 22, 2024 · 2 comments

Comments

@johnboxall
Collaborator

johnboxall commented Apr 22, 2024

Sitemaps are pretty cool! They tell crawlers which pages are available on a site. On large sites with complex catalogs, they help where crawlers might not be able to discover every page through plain navigation.

B2C Commerce includes sitemap management features: https://help.salesforce.com/s/articleView?id=cc.b2c_sitemap_overview.htm&type=5

And while it isn't too difficult to integrate them into PWA Kit, it would be nice if they were included out of the box.

At a high level, to integrate sitemaps into PWA Kit, folks must:

  1. Configure a Hostname Alias for their site in Business Manager.
  2. Run the system job to create the sitemap files.
  3. Check that the sitemaps are accessible. If you're using SFRA, you must include a SiteMap controller. Once complete, sitemaps should be accessible at https://$HOST/s/$SITE/$SITEMAP.
  4. Set up a handler in ssr.js to serve the sitemap file from the storefront domain.
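
Step 3 can be checked with a short script before writing any handler. This is a minimal sketch, assuming Node 18+ (for the global fetch). The function names `buildSitemapUrl` and `checkSitemap`, and the sample host and site ID, are hypothetical:

```javascript
// Builds the B2C Commerce sitemap URL described in step 3:
// https://$HOST/s/$SITE/$SITEMAP
function buildSitemapUrl(host, site, file) {
    return `https://${host}/s/${encodeURIComponent(site)}/${encodeURIComponent(file)}`
}

// Fetches the sitemap and reports whether it looks reachable.
// Requires Node 18+ for the global fetch.
async function checkSitemap(host, site, file = 'sitemap_index.xml') {
    const url = buildSitemapUrl(host, site, file)
    const response = await fetch(url)
    return {url, ok: response.ok, status: response.status}
}

// Example (hypothetical host and site ID):
// checkSitemap('example.com', 'RefArch').then(console.log)
```

If `ok` comes back false, the alias, the job run, or the SiteMap controller is the likely place to look before touching ssr.js.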

An example Express.js handler that infers the settings and handles errors could look like this:

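import {getConfig} from '@salesforce/pwa-kit-runtime/utils/ssr-config'
import {Readable} from 'stream'

// Proxies a sitemap file from the B2C Commerce instance to the storefront domain.
async function handleSiteMap(req, res) {
    const config = getConfig()
    // Infer the instance host from the 'ocapi' proxy. The `?.` avoids a throw
    // when no such proxy is configured.
    const host = config.ssrParameters.proxyConfigs.find((proxy) => {
        return proxy.path === 'ocapi'
    })?.host
    if (!host) {
        return res.status(500).send('Storefront host not configured.')
    }
    // Use req.path rather than req.originalUrl so query strings are not forwarded.
    const file = req.path.substring(1)
    const site = config.app.defaultSite
    const url = `https://${host}/s/${site}/${file}`
    let siteMapResponse
    try {
        siteMapResponse = await fetch(url)
    } catch (err) {
        return res.status(500).send('Error fetching sitemap.')
    }

    // Proxy the upstream status and content type.
    res.status(siteMapResponse.status)
    const contentType = siteMapResponse.headers.get('content-type')
    if (contentType) {
        res.set('Content-Type', contentType)
    }
    if (!siteMapResponse.body) {
        return res.end()
    }
    // Stream the body instead of buffering it; sitemap files can be large.
    Readable.fromWeb(siteMapResponse.body)
        .once('error', function handleSiteMapPipeError(err) {
            // Once streaming has started, the status can no longer change.
            if (res.headersSent) {
                return res.destroy(err)
            }
            res.status(500).send('Error fetching sitemap.')
        })
        .pipe(res)
}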

It is important to pipe the file and proxy the status and headers so the sitemap renders correctly, and quickly, since sitemap files can be big!

It would also be useful to add caching headers so the file is cached at the edge.
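
One way to add those caching headers, sketched under assumptions: set `Cache-Control` only for successful upstream responses so that errors aren't cached. The helper name `setSitemapCacheHeaders` and the one-hour `s-maxage` default are hypothetical; a sensible TTL depends on how often the sitemap job runs.

```javascript
// Hypothetical helper: adds caching headers to a sitemap response.
// `res` is an Express response; `upstreamStatus` is the status returned
// by the B2C Commerce instance.
function setSitemapCacheHeaders(res, upstreamStatus, maxAgeSeconds = 3600) {
    if (upstreamStatus >= 200 && upstreamStatus < 300) {
        // s-maxage applies to shared caches (the CDN); max-age to browsers.
        res.set('Cache-Control', `public, max-age=0, s-maxage=${maxAgeSeconds}`)
    } else {
        // Avoid caching error responses.
        res.set('Cache-Control', 'no-store')
    }
    return res
}
```

In the handler, this would be called right after `res.status(siteMapResponse.status)`, passing `siteMapResponse.status`.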

From there, you'd wire it up in ssr.js:

// ...

const runtime = getRuntime()
const {handler} = runtime.createHandler(options, (app) => {
  // ...
  app.get(/^\/sitemap(?:_index|\_(\d+))\.xml$/, handleSiteMap);
})
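
To see which paths that route pattern accepts, here is a quick check that runs in plain Node. The helper name `isSitemapPath` and the sample paths are illustrative only:

```javascript
// The route pattern from the ssr.js snippet above.
const SITEMAP_ROUTE = /^\/sitemap(?:_index|\_(\d+))\.xml$/

// Hypothetical helper wrapping the pattern.
function isSitemapPath(path) {
    return SITEMAP_ROUTE.test(path)
}

console.log(isSitemapPath('/sitemap_index.xml')) // true
console.log(isSitemapPath('/sitemap_0.xml')) // true
console.log(isSitemapPath('/sitemap.xml')) // false
console.log(isSitemapPath('/sitemap_a.xml')) // false
```

So the index file and numbered files are served, while a bare /sitemap.xml is not. Widen the pattern if your sitemap files use other names.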

This solution assumes it is desirable to serve the sitemap file from the storefront/MRT domain. If that isn't needed, you could use a handler that returns an HTTP redirect, or use the MRT redirects, to do this with less code or none at all.
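
A redirect-only handler could look roughly like this. It is a sketch, not a tested PWA Kit recipe: the factory name `createSitemapRedirectHandler` and its `instanceHost` and `siteId` parameters are hypothetical placeholders for your own values, and the 302 status is just one reasonable choice.

```javascript
// Hypothetical alternative: redirect to the B2C Commerce instance instead
// of proxying the file through the storefront.
function createSitemapRedirectHandler(instanceHost, siteId) {
    return function handleSiteMapRedirect(req, res) {
        // req.path is the URL path without the query string, e.g. /sitemap_index.xml
        const file = req.path.substring(1)
        res.redirect(302, `https://${instanceHost}/s/${siteId}/${file}`)
    }
}

// Wiring it up in ssr.js would mirror the proxy version:
// app.get(/^\/sitemap(?:_index|\_(\d+))\.xml$/, createSitemapRedirectHandler(host, site))
```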

@johnboxall
Collaborator Author

johnboxall commented May 13, 2024

One gotcha on the Sitemaps page in Business Manager: if your alias file defines a hostname in the settings key, then only that hostname will show up for selection.

If for some reason you can't get the hostname or links right, you can also rewrite them while streaming the file through an XML parser:

const { getConfig } = require("@salesforce/pwa-kit-runtime/utils/ssr-config");
const { Readable, Transform } = require("stream");
const sax = require("sax");

async function handleSiteMap(req, res) {
  const config = getConfig();
  // `?.` avoids a throw when no 'ocapi' proxy is configured.
  const host = config.ssrParameters.proxyConfigs.find((proxy) => {
    return proxy.path === "ocapi";
  })?.host;
  if (!host) {
    return res.status(500).send("Storefront host not configured.");
  }
  // Use req.path rather than req.originalUrl so query strings are not forwarded.
  const file = req.path.substring(1);
  const site = config.app.defaultSite;
  const url = `https://${host}/s/${site}/${file}`;
  let siteMapResponse;
  try {
    siteMapResponse = await fetch(url);
  } catch (err) {
    return res.status(500).send("Error fetching sitemap.");
  }

  res.status(siteMapResponse.status);
  const contentType = siteMapResponse.headers.get("content-type");
  if (contentType) {
    res.set("Content-Type", contentType);
  }
  if (!siteMapResponse.body) {
    return res.end();
  }

  // Once streaming has started, the status can no longer change, so end the
  // response instead of trying to send a 500.
  const fail = (message) => {
    if (res.headersSent) {
      return res.destroy();
    }
    res.status(500).send(message);
  };

  // Strict-mode SAX stream: passes the data through unchanged and emits
  // "error" on malformed XML.
  const parser = sax.createStream(true, { lowercase: true });
  parser.once("error", () => fail("Error parsing sitemap."));

  const linkRewriter = new Transform({
    transform(chunk, _, callback) {
      const strChunk = chunk.toString();
      // 👇 Rewrite here!!!
      // Note: a match split across two chunks is missed by this simple replace.
      const rewrittenChunk = strChunk.replace(/https:\/\//g, "http://");
      this.push(rewrittenChunk);
      callback();
    },
  });

  Readable.fromWeb(siteMapResponse.body)
    .once("error", () => fail("Error reading sitemap."))
    .pipe(parser)
    .pipe(linkRewriter)
    .pipe(res);
}
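
One caveat with the per-chunk `replace` in `linkRewriter`: a URL prefix that straddles two chunks won't be rewritten. A possible way around that, sketched here with hypothetical names (`createReplacer`, `createReplaceStream`), is to hold back any trailing partial match until the next chunk arrives. It assumes a non-empty search string and UTF-8 input.

```javascript
const { Transform } = require("stream");
const { StringDecoder } = require("string_decoder");

// Replaces every occurrence of `from` with `to` across successive pieces of
// text, holding back a trailing partial match until more text arrives.
// Assumes `from` is a non-empty string.
function createReplacer(from, to) {
  let tail = "";
  return {
    push(text) {
      const parts = (tail + text).split(from);
      const last = parts.pop(); // text after the final complete match
      // Longest suffix of `last` that could be the start of a match.
      let keep = Math.min(from.length - 1, last.length);
      while (keep > 0 && !from.startsWith(last.slice(-keep))) keep--;
      tail = last.slice(last.length - keep);
      const done = parts.length ? parts.join(to) + to : "";
      return done + last.slice(0, last.length - keep);
    },
    end() {
      const rest = tail;
      tail = "";
      return rest;
    },
  };
}

// Transform stream wrapper; StringDecoder keeps multi-byte characters intact.
function createReplaceStream(from, to) {
  const decoder = new StringDecoder("utf8");
  const replacer = createReplacer(from, to);
  return new Transform({
    transform(chunk, _encoding, callback) {
      callback(null, replacer.push(decoder.write(chunk)));
    },
    flush(callback) {
      callback(null, replacer.push(decoder.end()) + replacer.end());
    },
  });
}
```

In the handler above, `.pipe(createReplaceStream("https://", "http://"))` could then stand in for `.pipe(linkRewriter)`.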

@johnboxall
Collaborator Author

johnboxall commented May 15, 2024

If you're using eCDN (or any other stacked CDN) for routing with the shopper-facing vanity domain name, an alternative is to route traffic for the sitemap resource directly from the CDN:

  1. Update your B2C Commerce alias file with your domain.
  2. Run the sitemap job for that domain.
  3. Update your CDN routing expression to route requests for sitemaps (/^\/sitemap(?:_index|\_(\d+))\.xml$/) to the B2C Commerce instance.
