Document fragments being removed from template element in html. #742

airblast-dev · 2024-03-18T17:07:53Z

I was working on a website using client side templating with handlebars. I noticed that none of my templates were working.

Turns out building the HTML with trunk caused the document fragments in template elements to disappear.
Building with/without minification, using release/debug mode did not change anything, and using verbose flag on the cli didn't give any useful information.

I have tried tracking down why this happens. From my understanding it comes from parse_document from html5ever from the nipper dependency (Its used when creating a Document).

I am relatively new to these kind of stuff so take that information with a grain of salt, and since I am not really trusting in my findings I thought it would be better to create an issue here first in case I am overlooking something on the Trunk side.

Input:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
    <template>
        <p> Whatever I write here, gets completely removed. We are just left with a blank document fragment. </p>
    </template>
</body>
</html>

Output:

<!DOCTYPE html><html><head>
</head>
<body>
    <template></template>


</body></html>

Expected result:

<!DOCTYPE html><html><head>
</head>
<body>
    <template> 
        <p> Whatever I write here, gets completely removed. We are just left with a blank document fragment. </p>
    </template>


</body></html>

The text was updated successfully, but these errors were encountered:

ctron · 2024-03-19T08:40:54Z

H, I've never worked with that. My expectation would also be that the content stays the same. It definitely needs some investigation.

airblast-dev · 2024-03-19T13:25:37Z

Ok, so I can confirm the issue is from Document::from at:

trunk/src/pipelines/html.rs

Line 89 in a7f0f47

let mut target_html = Document::from(&raw_html);

Slapping a dbg! before and after does show that the template contents are removed. See the debug statements below:

[src/pipelines/html.rs:90:9] &raw_html = "<!DOCTYPE html>\n<html>\n\n<head>\n  <meta charset=\"UTF-8\">\n  <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n</head>\n\n<body>\n  <template>\n      Hello world\n  </template>\n</body>\n\n</html>\n"
[src/pipelines/html.rs:92:9] target_html.html().to_string() = "<!DOCTYPE html><html><head>\n  <meta charset=\"UTF-8\">\n  <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n</head>\n\n<body>\n  <template></template>\n\n\n\n</body></html>"

I took a look at nipper's Document, there isn't a way to do anything to the underlying parser, see docs here. Looking at the implementation it completely functional with Default::defaults's for the arguments of parse_document, so I doubt its possible to fix this while using nippers's Document.

I am not even sure this behaviour is intended from the nipper side of things, even so the maintainer hasnt checked any PR's, or issues since 2021 (to see what I mean just take a look at their issues), so I doubt there's a chance the API will get more flexible for our use case even if we create a PR and put in the work.

From what I can tell the only way this could practically be solveable is through using html5ever directly, instead of calling their wrappers from nipper. This is especially simple in Document's case since it calls a single function with default arguments for the implementation for Document::From for &str.

Looking into html5ever various tags are parsed seperately, in a sense excluded from document. I'm not exactly sure but I from what I can tell "<template>"'s contents are treated as something that is unrelated to the document, and is treated as standalone document fragment.

TLDR;

We will probably have to directly use html5ever and call our own Document struct, with different html5ever::parse_document arguments, and do some manual parsing stuff.

I will attempt to solve it using the solutions above when I have some time. During my research I also found some other problems that could cause issues in the future, I will create an issue for them once I do more research 👍 .

ctron · 2024-03-19T13:52:10Z

Maybe it's worth checking out lol_html for replacing nipper. And maybe a good first step is to isolate all functions using nipper, putting them into separate functions.

ctron · 2024-03-19T13:54:30Z

It might be caused by this: servo/html5ever#464

ctron · 2024-03-20T07:50:21Z

Ok, I was able to fix this in nipper. Now I guess the only thing left doing is getting a PR merged with that change.

In some cases the <template> tags should not be replaced, but kept as they are. This allows Closes: trunk-rs/trunk#742

ctron · 2024-03-20T08:06:08Z

I created a PR (importcjj/nipper#30) … let's see how that goes.

airblast-dev · 2024-03-20T10:04:20Z

I hope Im wrong but I heavily doubt it will be accepted.

I instead made a fork replacing, nipper with lol_html.
Replacing nipper did fix the issue.

Should I create a PR, or will we wait for importcjj/nipper#30?

ctron · 2024-03-20T10:53:25Z

Well if you have a PR which would completely replace nipper would definitely want to check it out. I have the feeling that a lol_html replacement might make things much more complicated, as the model seem quite different. Then again, I also think that nipper isn't the best tool build on today (I saw quite some panics and usafe, and tendril, … ).

Then again, it mostly works an didn't cause issues so far. So if we can get away with a small fix, that wouldn't be bad. Still lol_html might be a good replacement.

airblast-dev · 2024-03-22T10:04:32Z

Fixed by: #743

ctron · 2024-03-22T10:10:36Z

I just checked it locally and wanted to follow up on this.

It looks like the output generated (at least for me) isn't what it should be.

<link> elements end up after the <head> section, and the <script> element it outside the <body>.

ctron · 2024-03-22T10:18:01Z

I think the reason for this would be:

    fn append_html(&mut self, selector: &str, html: &str) -> Result<()> {
        self.select_mut(selector, |el| {
            el.after(html, lol_html::html_content::ContentType::Html);
            Ok(())
        })
    }

It's selecting the element and then inserting "after" it. So appending html to body, results in adding that after body. I think that should "append to children", not sure how that would work.

ctron · 2024-03-22T11:42:56Z

Should be fixed by 357f3c0

airblast-dev · 2024-03-22T14:36:38Z

I think the reason for this would be:
    fn append_html(&mut self, selector: &str, html: &str) -> Result<()> {
        self.select_mut(selector, |el| {
            el.after(html, lol_html::html_content::ContentType::Html);
            Ok(())
        })
    }
It's selecting the element and then inserting "after" it. So appending html to body, results in adding that after body. I think that should "append to children", not sure how that would work.

Yes you are correct. Apologies for missing that.

Should be fixed by 357f3c0

I can confirm it is fixed on my end as well.

ctron · 2024-03-22T15:11:54Z

I'd leave this one open for a bit longer. Should we encounter something. It's a pretty deep change. Then again, it seems to just work. I am still unsure if this should simply go into 0.19.2, or wait for a new major release?!

airblast-dev · 2024-03-23T02:42:40Z

I think a new release that includes lol_html should be on hold for now.

I already found a two weird problems. Both when testing on the vanilla example.

Once trunk gets to the HTML minification the application can randomly hang. Spamming the trunk build --release command eventually makes it run.
Last script tag attribute isn't read correctly. This also feels random, but is consistent with whenever the issue mentioned above happens.

After peppering debug statements around, I found a bit of info on what is happening. For some reason if the 4th ID is read before the 5th, the minification locks up, and the 5th ID is never found (so non of the attributes are replaced in debug mode). I am not exactly sure why this is happening.

It is very late for me, I will look into them in more detail tomorrow and go over the changes that were made.
In the meanwhile it would be great if you could check things out on your end as well just to confirm its not an issue on my end.

airblast-dev · 2024-03-23T12:33:08Z

For some reason the lol_html rewriter is unable to find the last ID attribute. I confirmed it does exist, and is attempted to be removed from the element. The rewriter is just unable to find the element, even though I can see it exists for the element at that stage of the pipeline.

Since the whole thing felt random, I thought it could be related to the ordering of the pipelines execution (at the very least something about async). So I swapped out FuturesUnordered with FuturesOrdered in pipelines/html.rs , and processed the the pipelines in reverse order of insertion. Processing in their insertion order never works, reversing the pipeline (or rather pushing the futures to front) makes it work consistently. It feels like a very weird race condition without any mutability or really anything that can cause a race condition.

The solution mentioned above solves both of the previous problems, but leaves a bigger question of "Why is it happening?". For that I honestly don't have an idea.

airblast-dev · 2024-03-23T12:47:39Z

I can create a PR with the fix. However it does feel kind of janky since we dont know why it happened, and why it is solved. I can also try to collect more information on why it happens but I can't promise I will find anything.

airblast-dev · 2024-03-23T18:05:53Z

I found the problem and made a PR to fix it (it was related to the order the elements were modified, and the self-closing tag causing issues).

Perhaps its a good idea to make a separate issue to document the differences between nipper and lol_html.

For example:
When a script tag is self closed (such as <script src="..." />), lol_html processed the closed element, but will ignore the following content after it, expecting a closing tag. The HTML spec does require a closing tag for script tags, so it is likely this was intentional.

This means people using self-closing tags with script elements, will have to replace them with the spec compliant version. This would be turning <script src="src.js"/> into <script src="src.js"></script>. The self closing version is also used on the trunk.rs site on the script asset examples.

ctron · 2024-03-26T16:41:18Z

Thanks for taking a look at this. So yes, maybe it makes sense starting a new 0.20.x release cycle with a few alpha/beta/rc versions for this.

ctron · 2024-03-26T16:51:47Z

Just to be clear, I think having a "breaking change" of requiring a spec compliant HTML page is fine. But we should put it into a breaking change release (0.20.x) and add a nice little warning.

ctron · 2024-04-12T16:18:16Z

I think that's closed, but not released yet.

airblast-dev · 2024-04-12T17:59:28Z

I apologize as I wasn't able to take a look into this.

Though one thing I wanted to ask is would you prefer an error or just a warning?

Detecting the error is quite simple (a single conditional), but in case we want to provide extra information things would get quite messy. I think giving a warning would make things much simpler, as the user can just check the output to see where the truncation occurs to find the script tag that is causing the issue.

ctron · 2024-04-13T10:40:31Z

Though one thing I wanted to ask is would you prefer an error or just a warning?

Personally I would prefer an error, as it might break things. And I am sure people will ignore warnings. However, I could also understand that having an --ignore-script-error (or a better name) option might make sense in case of a false positive, or a case we did not think about.

ctron added help wanted Extra attention is needed investigate More investigation is needed labels Mar 19, 2024

ctron added the bug Something isn't working label Mar 19, 2024

ctron added a commit to ctron/nipper that referenced this issue Mar 20, 2024

feat: allow keeping templates as they are

aed7ca4

In some cases the <template> tags should not be replaced, but kept as they are. This allows Closes: trunk-rs/trunk#742

ctron mentioned this issue Mar 20, 2024

feat: allow keeping templates as they are importcjj/nipper#30

Open

airblast-dev mentioned this issue Mar 20, 2024

refactor: Replace the nipper dependency with lol_html. #743

Merged

airblast-dev closed this as completed Mar 22, 2024

ctron reopened this Mar 22, 2024

airblast-dev mentioned this issue Mar 23, 2024

fix: make manually inserted script tag, spec compliant. #749

Merged

ctron closed this as completed Apr 12, 2024

ctron added this to the 0.20.0 milestone Apr 12, 2024

airblast-dev mentioned this issue Apr 12, 2024

feat: raise error for self closing script tags. #772

Merged

airblast-dev mentioned this issue Apr 18, 2024

chore: clean up a few things around the script element check #776

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document fragments being removed from template element in html. #742

Document fragments being removed from template element in html. #742

airblast-dev commented Mar 18, 2024 •

edited

ctron commented Mar 19, 2024

airblast-dev commented Mar 19, 2024 •

edited

ctron commented Mar 19, 2024

ctron commented Mar 19, 2024

ctron commented Mar 20, 2024

ctron commented Mar 20, 2024

airblast-dev commented Mar 20, 2024

ctron commented Mar 20, 2024

airblast-dev commented Mar 22, 2024

ctron commented Mar 22, 2024

ctron commented Mar 22, 2024

ctron commented Mar 22, 2024

airblast-dev commented Mar 22, 2024 •

edited

ctron commented Mar 22, 2024

airblast-dev commented Mar 23, 2024 •

edited

airblast-dev commented Mar 23, 2024 •

edited

airblast-dev commented Mar 23, 2024

airblast-dev commented Mar 23, 2024

ctron commented Mar 26, 2024

ctron commented Mar 26, 2024

ctron commented Apr 12, 2024

airblast-dev commented Apr 12, 2024 •

edited

ctron commented Apr 13, 2024

Document fragments being removed from template element in html. #742

Document fragments being removed from template element in html. #742

Comments

airblast-dev commented Mar 18, 2024 • edited

ctron commented Mar 19, 2024

airblast-dev commented Mar 19, 2024 • edited

TLDR;

ctron commented Mar 19, 2024

ctron commented Mar 19, 2024

ctron commented Mar 20, 2024

ctron commented Mar 20, 2024

airblast-dev commented Mar 20, 2024

ctron commented Mar 20, 2024

airblast-dev commented Mar 22, 2024

ctron commented Mar 22, 2024

ctron commented Mar 22, 2024

ctron commented Mar 22, 2024

airblast-dev commented Mar 22, 2024 • edited

ctron commented Mar 22, 2024

airblast-dev commented Mar 23, 2024 • edited

airblast-dev commented Mar 23, 2024 • edited

airblast-dev commented Mar 23, 2024

airblast-dev commented Mar 23, 2024

ctron commented Mar 26, 2024

ctron commented Mar 26, 2024

ctron commented Apr 12, 2024

airblast-dev commented Apr 12, 2024 • edited

ctron commented Apr 13, 2024

airblast-dev commented Mar 18, 2024 •

edited

airblast-dev commented Mar 19, 2024 •

edited

airblast-dev commented Mar 22, 2024 •

edited

airblast-dev commented Mar 23, 2024 •

edited

airblast-dev commented Mar 23, 2024 •

edited

airblast-dev commented Apr 12, 2024 •

edited