Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: rewrite metadata urls #509

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

feat: rewrite metadata urls #509

wants to merge 2 commits into from

Conversation

kptdobe
Copy link

@kptdobe kptdobe commented Jan 23, 2024

Fix #501

Will need several eyes on this one because it changes how URLs in metadata are treated. With the change, you should not have a url in the metadata in the form of https://${ref}--${repo}--${owner}.hlx.page|live in production. While this is a must-have in production, this will also be the case in preview and live.
Any client-side code handling those meta should still be working because they anyway have to deal already with authors pasting any of the 3 host urls in the content. But at least, hlx.page and hlx.live urls will not appear in production.

Copy link
Contributor

@tripodsan tripodsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the correct place nor the correct code to use. also I'm not sure if we should link rewrite the metadata....

src/steps/utils.js Outdated Show resolved Hide resolved
Comment on lines +243 to +255
split.forEach((v) => {
try {
const u = new URL(v);
if ((previewSuffix && u.host.endsWith(previewSuffix))
|| (liveSuffix && u.host.endsWith(liveSuffix))) {
rewrite.push(getAbsoluteUrl(state, `${u.pathname}${u.search}${u.hash}`));
} else {
rewrite.push(v);
}
} catch (e) {
rewrite.push(v);
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, we need to be careful not to change the behaviour of the canonical url until we decided how it should be handled.

Copy link
Author

@kptdobe kptdobe Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code only makes sure no ref--repo--owner.hlx.page|live appears in meta on production, it does not change the canonical computation logic.

delete metadata[name];
} else if (Array.isArray(metadata[name])) {
// make absolute URLs for arrays
metadata[name] = metadata[name].map((value) => makeAbsoluteURLForMeta(state, value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could reuse rewriteUrls here. That util is also used in the rewrite-urls step and turns URLs like this into path-only URLs. Or do we really need absolute URLs here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do we really need absolute URLs here?

or add a flag makeAbsolute to rewrite-urls()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, it would probably be better to apply the rewriting in render, as the logic in metadata is already a bit complex:

for (const [name, value] of Object.entries(meta.page)) {
const attr = name.includes(':') && !name.startsWith('twitter:') ? 'property' : 'name';
if (Array.isArray(value)) {
for (const v of value) {
appendElement($head, createElement('meta', attr, name, 'content', v));
}
} else {
appendElement($head, createElement('meta', attr, name, 'content', value));
}
}
appendElement($head, createElement('link', 'rel', 'alternate', 'type', 'application/xml+atom', 'href', meta.feed, 'title', `${meta.title} feed`));

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta are strings only (not anchor with href), I think it must stay absolute urls - you would need a special treatment to consume the meta if it can be a relative or an absolute url (if external link). In fact, making them relative might break all existing logic...

I initially started with rewriteUrls but several things:

  1. it does not make absolute urls but relative - the extra makeAbsolute flag would be required and this adds extra logic which is currently isolated in "my" makeAbsoluteURLForMeta function.
  2. it does not deal with branches - if an author page a url with custombranch--repo--owner.hlx.page, it is not considered - this could be an improvement of the current method
  3. because rewriteUrls is used in different places, I did not want to interfere and take the risk to change some behaviour

I can re-try with rewriteUrls and see the impact.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, it would probably be better to apply the rewriting in render, as the logic in metadata is already a bit complex:

for (const [name, value] of Object.entries(meta.page)) {
const attr = name.includes(':') && !name.startsWith('twitter:') ? 'property' : 'name';
if (Array.isArray(value)) {
for (const v of value) {
appendElement($head, createElement('meta', attr, name, 'content', v));
}
} else {
appendElement($head, createElement('meta', attr, name, 'content', value));
}
}
appendElement($head, createElement('link', 'rel', 'alternate', 'type', 'application/xml+atom', 'href', meta.feed, 'title', `${meta.title} feed`));

But I assume we do not get the values from the metadata sheet in here... no ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dev branch for testing can be easily achieved in the client side code consuming the metadata: it needs the path only anyway and use the current host. And it is already the case today: the code cannot assume it gets the correct host anyway if the content is managed by an author.
The only problem that this solves is authors copy/pasting urls from any of the 3 hosts in the metadata of their Word / gdoc documents, as we tell them to do. If they paste a hlx.page url, we should just make sure it does not appear in production. All the rest is anyway out of scope and does to change here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I do not think this is a "BREAKING CHANGE", slight change in behaviour, yes, but you have a URL today, you still get a URL after this. Different host, yes but points to same "content" in prod.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra logic which is currently isolated in "my" makeAbsoluteURLForMeta function.

Ok, switching to relative links may be too breaking, but isn't similar behavior alreasy in makeCanonicalHtmlUrl today? I wouldn really try to reuse existing rewriting logic as much as possible.

Copy link
Author

@kptdobe kptdobe Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. makeCanonicalHtmlUrl is doing something else (only appending index.html where needed).
We have a large collection of methods doing similar but different things. That's why it is hard to plug on top without breaking backward compatibility.
Also rewriteUrl should be named makeRelative because this is really what it is doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I do not think this is a "BREAKING CHANGE",

yes it is. projects might expect .hlx.page or hlx.live urls, and might be not be able to deal with the production url.

Copy link

This PR will trigger a minor release when merged.

@kptdobe
Copy link
Author

kptdobe commented Jan 24, 2024

I don't think this is the correct place nor the correct code to use. also I'm not sure if we should link rewrite the metadata....

Did you see #501 ? It is already the case in production we have hlx.page urls referenced... depending on who write the consuming JS code, this could lead to requests to hlx.page from production.

@kptdobe
Copy link
Author

kptdobe commented Jan 24, 2024

We probably a broader discussion on this: are we ok with having hlx.page text links in our production markup ? On the same topic: #450

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some metadata URL not re-written
3 participants