-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: rewrite metadata urls #509
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the correct place nor the correct code to use. also I'm not sure if we should link rewrite the metadata....
split.forEach((v) => { | ||
try { | ||
const u = new URL(v); | ||
if ((previewSuffix && u.host.endsWith(previewSuffix)) | ||
|| (liveSuffix && u.host.endsWith(liveSuffix))) { | ||
rewrite.push(getAbsoluteUrl(state, `${u.pathname}${u.search}${u.hash}`)); | ||
} else { | ||
rewrite.push(v); | ||
} | ||
} catch (e) { | ||
rewrite.push(v); | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, we need to be careful not to change the behaviour of the canonical url until we decided how it should be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code only makes sure no ref--repo--owner.hlx.page|live
appears in meta on production, it does not change the canonical computation logic.
delete metadata[name]; | ||
} else if (Array.isArray(metadata[name])) { | ||
// make absolute URLs for arrays | ||
metadata[name] = metadata[name].map((value) => makeAbsoluteURLForMeta(state, value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could reuse rewriteUrls
here. That util is also used in the rewrite-urls
step and turns URLs like this into path-only URLs. Or do we really need absolute URLs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or do we really need absolute URLs here?
or add a flag makeAbsolute
to rewrite-urls
()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, it would probably be better to apply the rewriting in render, as the logic in metadata is already a bit complex:
helix-html-pipeline/src/steps/render.js
Lines 62 to 72 in c7b98c4
for (const [name, value] of Object.entries(meta.page)) { | |
const attr = name.includes(':') && !name.startsWith('twitter:') ? 'property' : 'name'; | |
if (Array.isArray(value)) { | |
for (const v of value) { | |
appendElement($head, createElement('meta', attr, name, 'content', v)); | |
} | |
} else { | |
appendElement($head, createElement('meta', attr, name, 'content', value)); | |
} | |
} | |
appendElement($head, createElement('link', 'rel', 'alternate', 'type', 'application/xml+atom', 'href', meta.feed, 'title', `${meta.title} feed`)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta are strings only (not anchor with href), I think it must stay absolute urls - you would need a special treatment to consume the meta if it can be a relative or an absolute url (if external link). In fact, making them relative might break all existing logic...
I initially started with rewriteUrls
but several things:
- it does not make absolute urls but relative - the extra
makeAbsolute
flag would be required and this adds extra logic which is currently isolated in "my"makeAbsoluteURLForMeta
function. - it does not deal with branches - if an author page a url with
custombranch--repo--owner.hlx.page
, it is not considered - this could be an improvement of the current method - because
rewriteUrls
is used in different places, I did not want to interfere and take the risk to change some behaviour
I can re-try with rewriteUrls
and see the impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, it would probably be better to apply the rewriting in render, as the logic in metadata is already a bit complex:
helix-html-pipeline/src/steps/render.js
Lines 62 to 72 in c7b98c4
for (const [name, value] of Object.entries(meta.page)) { const attr = name.includes(':') && !name.startsWith('twitter:') ? 'property' : 'name'; if (Array.isArray(value)) { for (const v of value) { appendElement($head, createElement('meta', attr, name, 'content', v)); } } else { appendElement($head, createElement('meta', attr, name, 'content', value)); } } appendElement($head, createElement('link', 'rel', 'alternate', 'type', 'application/xml+atom', 'href', meta.feed, 'title', `${meta.title} feed`));
But I assume we do not get the values from the metadata sheet in here... no ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dev branch for testing can be easily achieved in the client side code consuming the metadata: it needs the path only anyway and use the current host. And it is already the case today: the code cannot assume it gets the correct host anyway if the content is managed by an author.
The only problem that this solves is authors copy/pasting urls from any of the 3 hosts in the metadata of their Word / gdoc documents, as we tell them to do. If they paste a hlx.page url, we should just make sure it does not appear in production. All the rest is anyway out of scope and does to change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I do not think this is a "BREAKING CHANGE", slight change in behaviour, yes, but you have a URL today, you still get a URL after this. Different host, yes but points to same "content" in prod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra logic which is currently isolated in "my"
makeAbsoluteURLForMeta
function.
Ok, switching to relative links may be too breaking, but isn't similar behavior alreasy in makeCanonicalHtmlUrl
today? I wouldn really try to reuse existing rewriting logic as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. makeCanonicalHtmlUrl
is doing something else (only appending index.html
where needed).
We have a large collection of methods doing similar but different things. That's why it is hard to plug on top without breaking backward compatibility.
Also rewriteUrl
should be named makeRelative
because this is really what it is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I do not think this is a "BREAKING CHANGE",
yes it is. projects might expect .hlx.page or hlx.live urls, and might be not be able to deal with the production url.
This PR will trigger a minor release when merged. |
Did you see #501 ? It is already the case in production we have hlx.page urls referenced... depending on who write the consuming JS code, this could lead to requests to hlx.page from production. |
We probably a broader discussion on this: are we ok with having |
Fix #501
Will need several eyes on this one because it changes how URLs in metadata are treated. With the change, you should not have a url in the metadata in the form of
https://${ref}--${repo}--${owner}.hlx.page|live
in production. While this is a must-have in production, this will also be the case in preview and live.Any client-side code handling those meta should still be working because they anyway have to deal already with authors pasting any of the 3 host urls in the content. But at least, hlx.page and hlx.live urls will not appear in production.