Revisit the feed concept #794

kptdobe · 2021-03-17T10:42:45Z

For theblog, I had to debug the feed concept, and I think it must be revisited (or killed?).

Cache issues

https://blog.adobe.com/feeds/jp.xml is cached empty. How can it be updated? The live version is correct: https://theblog--adobe.hlx.live/feeds/jp.xml.

More generally, if the xml pages are cached and we want the feeds to be up-to-date, then they must all be flushed on every query-index change (a topic could be added / removed, a page could be added / removed...).

ESI include issues

A feed is a collection of 10 ESI includes, which on hlx.page is impossible to get right: a timeout or some other error occurs and the xml stream is cut off in the middle. This should definitely be implemented differently.

cc @trieloff @davidnuescheler

kptdobe · 2021-03-17T10:53:15Z

Sidekick is out of scope. The problem is that the author visits neither the "source content" page (the xml definition in the git repo) nor the query-index. It should rather be a backend job that maintains all the feed pages.

rofe · 2021-03-17T10:57:30Z

Sidekick would only work as a browser extension on an xml or json document anyway. The bookmarklet won't load.

kptdobe · 2021-03-19T09:40:03Z

Some details on the ESI include issue.

The problem is easy to reproduce: just open https://theblog--adobe.hlx.page/feeds/jp.xml.

The definition of the feed is here: https://github.com/adobe/theblog/blob/master/feeds/jp.xml
The action code that executes the rendering is here: https://github.com/adobe/helix-pages/blob/master/cgi-bin/feed.js

The xml page ends up being a set of 10 ESI includes to individual {entry.id}.embed.html requests (the definition comes from an XLSX spreadsheet, but the idea is to dump an html block inside a <![CDATA[...]]> section). When it fails, you can flush the last (broken) include (curl -X PURGE {entry.id}.embed.html) and reload https://theblog--adobe.hlx.page/feeds/jp.xml: this usually produces a failure at a different place.
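To make the shape of such a page concrete, here is a minimal Python sketch of a renderer that emits one ESI include per entry inside a CDATA section. It is not the code from feed.js: the Atom-like element names, the `render_feed` function and the `id` / `title` / `updated` keys are assumptions for illustration, and entry ids are assumed to be plain URL paths.

```python
from xml.sax.saxutils import escape


def render_feed(title, entries):
    """Render an Atom-like feed whose entry bodies are ESI includes.

    `entries` is a list of dicts with the hypothetical keys `id`, `title`
    and `updated`. The CDN is expected to replace each <esi:include>
    with the response of `{id}.embed.html`.
    """
    parts = [
        '<?xml version="1.0" encoding="utf-8"?>',
        '<feed xmlns="http://www.w3.org/2005/Atom">',
        f"  <title>{escape(title)}</title>",
    ]
    for entry in entries:
        # Assumption: ids are URL paths without characters needing escaping.
        src = entry["id"] + ".embed.html"
        parts += [
            "  <entry>",
            f"    <id>{escape(entry['id'])}</id>",
            f"    <title>{escape(entry['title'])}</title>",
            f"    <updated>{escape(entry['updated'])}</updated>",
            f'    <content type="html"><![CDATA[<esi:include src="{src}"/>]]></content>',
            "  </entry>",
        ]
    parts.append("</feed>")
    return "\n".join(parts)
```

With 10 entries, the CDN has to resolve 10 separate sub-requests before the response is complete, so a single slow or failing include is enough to truncate the stream.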

I have added some logs that are visible in the AWS console (CloudWatch > LogGroups > pages--cgi-bin-feed). But this will probably not help for the ESI include debugging.

cc @stefan-guggisberg tell me if this is not clear or you need more info.

kptdobe · 2021-03-19T10:48:21Z

For the cache issue, @trieloff mentioned that lowering the TTL to 15 or 30 mins on the outer CDN for feeds "should" do the trick. Something we can explore if we solve the ESI issue.

stefan-guggisberg · 2021-03-23T15:15:01Z

I was able to boil the ESI include issue down to the following simple example:

I put the following static XML snippet with 5 ESI includes in my blog fork: https://github.com/stefan-guggisberg/theblog/blob/master/entries.xml

When I request it through Fastly (https://theblog--stefan-guggisberg.hlx.page/entries.xml) the response is always corrupted.

With fewer ESI includes it works, e.g. https://theblog--stefan-guggisberg.hlx.page/entries2.xml

The issue we're facing seems to be a combination of some Fastly timeout for ESI processing (to be verified) and slow delivery of included resources.

stefan-guggisberg · 2021-03-25T08:25:24Z

Regarding the caching issues:

AFAICU, in order to completely purge a feed the following steps are required (in this exact order):

1. purge the blog posts in the feed if they were modified
2. purge the cgi-bin request, e.g. /cgi-bin/feed.xml?src=/jp/query-index.json%3Flimit%3D10&id=path&title=title&updated=date
3. purge the feed request, e.g. /feeds/jp.xml

on

- inner CDN (hlx.page)
- outer CDN (hlx.live)
- Skyline (only the feed request, e.g. /feeds/jp.xml)
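The ordering above can be sketched as a small Python helper that only builds the list of purge requests and sends nothing. The `purge_plan` function and its parameters are hypothetical, and it assumes that all three steps are completed on one cache layer before moving outward, which the description above does not state explicitly.

```python
def purge_plan(feed_path, cgi_path, modified_posts, inner, outer, skyline):
    """Return (host, path) pairs in the order they should be purged.

    Assumption: each layer is purged completely (posts, then the
    cgi-bin request, then the feed) before the next layer is touched.
    """
    plan = []
    for host in (inner, outer):
        for post in modified_posts:
            plan.append((host, post))      # step 1: modified blog posts
        plan.append((host, cgi_path))      # step 2: cgi-bin request
        plan.append((host, feed_path))     # step 3: feed request
    plan.append((skyline, feed_path))      # Skyline: feed request only
    return plan


if __name__ == "__main__":
    # Host names are placeholders; the Skyline host in particular is a guess.
    for host, path in purge_plan(
        "/feeds/jp.xml",
        "/cgi-bin/feed.xml?src=/jp/query-index.json%3Flimit%3D10&id=path&title=title&updated=date",
        [],
        inner="theblog--adobe.hlx.page",
        outer="theblog--adobe.hlx.live",
        skyline="blog.adobe.com",
    ):
        print(f"curl -X PURGE 'https://{host}{path}'")
```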

See also https://github.com/adobe/project-helix/pull/540, which probably led to confusion.

@trieloff Please review

kptdobe · 2021-03-25T08:47:32Z

Thanks, this is helpful. Just one "detail": the request to a blog post for rendering in the feed uses the .embed selector. Do we really purge the path with this selector when we purge a blog post?

stefan-guggisberg · 2021-03-25T08:56:12Z

Do we really purge the path with this selector when we purge a blog post?

I seriously doubt it. @rofe might know for sure.
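If the selector variant is indeed not purged today, one option would be to expand every post path into both variants before purging. A minimal Python sketch, assuming the embed URL is formed by inserting `.embed` before the extension (as in `{entry.id}.embed.html`); the function names are hypothetical:

```python
import posixpath


def embed_variant(post_path):
    """Return the .embed selector variant of a post path."""
    base, ext = posixpath.splitext(post_path)
    # Extensionless paths are assumed to render as .html.
    return f"{base}.embed{ext or '.html'}"


def paths_to_purge(post_path):
    """Return the post path plus its embed variant, in that order."""
    return [post_path, embed_variant(post_path)]
```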

stefan-guggisberg · 2021-03-25T09:05:19Z

Re adobe/project-helix#540: I went ahead and applied the change to theblog--adobe.hlx.live

trieloff · 2021-03-25T09:36:45Z

I've been thinking that we would probably benefit from fetching the included posts on the server side using helix-fetch instead of ESI to make use of the greater concurrency and better error handling this affords us.

This would also reduce the number of Fastly caches in play.

kptdobe · 2021-03-25T10:38:54Z

That's what I suggested as an alternative if we cannot solve the ESI issue. I am just afraid of the 60s limit to retrieve those 10 responses.

trieloff · 2021-03-25T17:31:32Z

30 seconds. We can start all requests in parallel and skip the entries that are not fast enough.
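A rough sketch of that approach, written with Python's asyncio rather than helix-fetch to keep it self-contained: all requests start at once, and any entry that has not answered by the deadline (or that fails) is dropped from the feed. `fetch_entries` and the `fake_fetch` stand-in are hypothetical names; a real implementation would issue HTTP requests for the `.embed.html` resources.

```python
import asyncio


async def fetch_entries(ids, fetch, timeout=30.0):
    """Fetch all entries concurrently; skip those that fail or are too slow.

    `fetch` is any coroutine function taking an entry id and returning
    its rendered body. Results keep the order of `ids`.
    """
    if not ids:
        return []
    tasks = [asyncio.ensure_future(fetch(entry_id)) for entry_id in ids]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    # Give up on everything that missed the shared deadline.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    results = []
    for entry_id, task in zip(ids, tasks):
        if task in done and not task.cancelled() and task.exception() is None:
            results.append((entry_id, task.result()))
    return results


async def fake_fetch(entry_id):
    """Stand-in for an HTTP request, with one slow and one failing entry."""
    await asyncio.sleep(0.5 if entry_id == "/jp/slow" else 0)
    if entry_id == "/jp/broken":
        raise RuntimeError("upstream error")
    return f"<p>{entry_id}</p>"


if __name__ == "__main__":
    ids = ["/jp/a", "/jp/slow", "/jp/broken"]
    print(asyncio.run(fetch_entries(ids, fake_fetch, timeout=0.1)))
```

Because the deadline is shared by all requests, the total time is bounded by the timeout rather than by the sum of the individual response times.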

kptdobe mentioned this issue Mar 25, 2021

Japan feeds are not working adobe/theblog#631

Closed

2 tasks

trieloff mentioned this issue May 17, 2021

new repo: helix-xml-feed adobe/helix-home#200

Open

11 tasks
