Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document fragments being removed from template element in html. #742

Closed
airblast-dev opened this issue Mar 18, 2024 · 23 comments · May be fixed by importcjj/nipper#30
Closed

Document fragments being removed from template element in html. #742

airblast-dev opened this issue Mar 18, 2024 · 23 comments · May be fixed by importcjj/nipper#30
Labels
bug Something isn't working help wanted Extra attention is needed investigate More investigation is needed
Milestone

Comments

@airblast-dev
Copy link
Contributor

airblast-dev commented Mar 18, 2024

I was working on a website using client side templating with handlebars. I noticed that none of my templates were working.

Turns out building the HTML with trunk caused the document fragments in template elements to disappear.
Building with/without minification, using release/debug mode did not change anything, and using verbose flag on the cli didn't give any useful information.

I have tried tracking down why this happens. From my understanding it comes from parse_document from html5ever from the nipper dependency (Its used when creating a Document).

I am relatively new to these kind of stuff so take that information with a grain of salt, and since I am not really trusting in my findings I thought it would be better to create an issue here first in case I am overlooking something on the Trunk side.

Input:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
    <template>
        <p> Whatever I write here, gets completely removed. We are just left with a blank document fragment. </p>
    </template>
</body>
</html>

Output:

<!DOCTYPE html><html><head>
</head>
<body>
    <template></template>


</body></html>

Expected result:

<!DOCTYPE html><html><head>
</head>
<body>
    <template> 
        <p> Whatever I write here, gets completely removed. We are just left with a blank document fragment. </p>
    </template>


</body></html>
@ctron
Copy link
Collaborator

ctron commented Mar 19, 2024

H, I've never worked with that. My expectation would also be that the content stays the same. It definitely needs some investigation.

@ctron ctron added help wanted Extra attention is needed investigate More investigation is needed labels Mar 19, 2024
@airblast-dev
Copy link
Contributor Author

airblast-dev commented Mar 19, 2024

Ok, so I can confirm the issue is from Document::from at:

let mut target_html = Document::from(&raw_html);

Slapping a dbg! before and after does show that the template contents are removed. See the debug statements below:

[src/pipelines/html.rs:90:9] &raw_html = "<!DOCTYPE html>\n<html>\n\n<head>\n  <meta charset=\"UTF-8\">\n  <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n</head>\n\n<body>\n  <template>\n      Hello world\n  </template>\n</body>\n\n</html>\n"
[src/pipelines/html.rs:92:9] target_html.html().to_string() = "<!DOCTYPE html><html><head>\n  <meta charset=\"UTF-8\">\n  <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n</head>\n\n<body>\n  <template></template>\n\n\n\n</body></html>"

I took a look at nipper's Document, there isn't a way to do anything to the underlying parser, see docs here. Looking at the implementation it completely functional with Default::defaults's for the arguments of parse_document, so I doubt its possible to fix this while using nippers's Document.

I am not even sure this behaviour is intended from the nipper side of things, even so the maintainer hasnt checked any PR's, or issues since 2021 (to see what I mean just take a look at their issues), so I doubt there's a chance the API will get more flexible for our use case even if we create a PR and put in the work.

From what I can tell the only way this could practically be solveable is through using html5ever directly, instead of calling their wrappers from nipper. This is especially simple in Document's case since it calls a single function with default arguments for the implementation for Document::From for &str.

Looking into html5ever various tags are parsed seperately, in a sense excluded from document. I'm not exactly sure but I from what I can tell "<template>"'s contents are treated as something that is unrelated to the document, and is treated as standalone document fragment.

TLDR;

We will probably have to directly use html5ever and call our own Document struct, with different html5ever::parse_document arguments, and do some manual parsing stuff.

I will attempt to solve it using the solutions above when I have some time. During my research I also found some other problems that could cause issues in the future, I will create an issue for them once I do more research 👍 .

@ctron
Copy link
Collaborator

ctron commented Mar 19, 2024

Maybe it's worth checking out lol_html for replacing nipper. And maybe a good first step is to isolate all functions using nipper, putting them into separate functions.

@ctron ctron added the bug Something isn't working label Mar 19, 2024
@ctron
Copy link
Collaborator

ctron commented Mar 19, 2024

It might be caused by this: servo/html5ever#464

@ctron
Copy link
Collaborator

ctron commented Mar 20, 2024

Ok, I was able to fix this in nipper. Now I guess the only thing left doing is getting a PR merged with that change.

ctron added a commit to ctron/nipper that referenced this issue Mar 20, 2024
In some cases the <template> tags should not be replaced, but kept as
they are. This allows

Closes: trunk-rs/trunk#742
@ctron
Copy link
Collaborator

ctron commented Mar 20, 2024

I created a PR (importcjj/nipper#30) … let's see how that goes.

@airblast-dev
Copy link
Contributor Author

I hope Im wrong but I heavily doubt it will be accepted.

I instead made a fork replacing, nipper with lol_html.
Replacing nipper did fix the issue.

Should I create a PR, or will we wait for importcjj/nipper#30?

@ctron
Copy link
Collaborator

ctron commented Mar 20, 2024

Well if you have a PR which would completely replace nipper would definitely want to check it out. I have the feeling that a lol_html replacement might make things much more complicated, as the model seem quite different. Then again, I also think that nipper isn't the best tool build on today (I saw quite some panics and usafe, and tendril, … ).

Then again, it mostly works an didn't cause issues so far. So if we can get away with a small fix, that wouldn't be bad. Still lol_html might be a good replacement.

@airblast-dev
Copy link
Contributor Author

Fixed by: #743

@ctron ctron reopened this Mar 22, 2024
@ctron
Copy link
Collaborator

ctron commented Mar 22, 2024

I just checked it locally and wanted to follow up on this.

It looks like the output generated (at least for me) isn't what it should be.

<link> elements end up after the <head> section, and the <script> element it outside the <body>.

@ctron
Copy link
Collaborator

ctron commented Mar 22, 2024

I think the reason for this would be:

    fn append_html(&mut self, selector: &str, html: &str) -> Result<()> {
        self.select_mut(selector, |el| {
            el.after(html, lol_html::html_content::ContentType::Html);
            Ok(())
        })
    }

It's selecting the element and then inserting "after" it. So appending html to body, results in adding that after body. I think that should "append to children", not sure how that would work.

@ctron
Copy link
Collaborator

ctron commented Mar 22, 2024

Should be fixed by 357f3c0

@airblast-dev
Copy link
Contributor Author

airblast-dev commented Mar 22, 2024

I think the reason for this would be:

    fn append_html(&mut self, selector: &str, html: &str) -> Result<()> {
        self.select_mut(selector, |el| {
            el.after(html, lol_html::html_content::ContentType::Html);
            Ok(())
        })
    }

It's selecting the element and then inserting "after" it. So appending html to body, results in adding that after body. I think that should "append to children", not sure how that would work.

Yes you are correct. Apologies for missing that.

Should be fixed by 357f3c0

I can confirm it is fixed on my end as well.

@ctron
Copy link
Collaborator

ctron commented Mar 22, 2024

I'd leave this one open for a bit longer. Should we encounter something. It's a pretty deep change. Then again, it seems to just work. I am still unsure if this should simply go into 0.19.2, or wait for a new major release?!

@airblast-dev
Copy link
Contributor Author

airblast-dev commented Mar 23, 2024

I think a new release that includes lol_html should be on hold for now.

I already found a two weird problems. Both when testing on the vanilla example.

  • Once trunk gets to the HTML minification the application can randomly hang. Spamming the trunk build --release command eventually makes it run.
  • Last script tag attribute isn't read correctly. This also feels random, but is consistent with whenever the issue mentioned above happens.

After peppering debug statements around, I found a bit of info on what is happening. For some reason if the 4th ID is read before the 5th, the minification locks up, and the 5th ID is never found (so non of the attributes are replaced in debug mode). I am not exactly sure why this is happening.

It is very late for me, I will look into them in more detail tomorrow and go over the changes that were made.
In the meanwhile it would be great if you could check things out on your end as well just to confirm its not an issue on my end.

@airblast-dev
Copy link
Contributor Author

airblast-dev commented Mar 23, 2024

For some reason the lol_html rewriter is unable to find the last ID attribute. I confirmed it does exist, and is attempted to be removed from the element. The rewriter is just unable to find the element, even though I can see it exists for the element at that stage of the pipeline.

Since the whole thing felt random, I thought it could be related to the ordering of the pipelines execution (at the very least something about async). So I swapped out FuturesUnordered with FuturesOrdered in pipelines/html.rs , and processed the the pipelines in reverse order of insertion. Processing in their insertion order never works, reversing the pipeline (or rather pushing the futures to front) makes it work consistently. It feels like a very weird race condition without any mutability or really anything that can cause a race condition.

The solution mentioned above solves both of the previous problems, but leaves a bigger question of "Why is it happening?". For that I honestly don't have an idea.

@airblast-dev
Copy link
Contributor Author

I can create a PR with the fix. However it does feel kind of janky since we dont know why it happened, and why it is solved. I can also try to collect more information on why it happens but I can't promise I will find anything.

@airblast-dev
Copy link
Contributor Author

I found the problem and made a PR to fix it (it was related to the order the elements were modified, and the self-closing tag causing issues).

Perhaps its a good idea to make a separate issue to document the differences between nipper and lol_html.

For example:
When a script tag is self closed (such as <script src="..." />), lol_html processed the closed element, but will ignore the following content after it, expecting a closing tag. The HTML spec does require a closing tag for script tags, so it is likely this was intentional.

This means people using self-closing tags with script elements, will have to replace them with the spec compliant version. This would be turning <script src="src.js"/> into <script src="src.js"></script>. The self closing version is also used on the trunk.rs site on the script asset examples.

@ctron
Copy link
Collaborator

ctron commented Mar 26, 2024

Thanks for taking a look at this. So yes, maybe it makes sense starting a new 0.20.x release cycle with a few alpha/beta/rc versions for this.

@ctron
Copy link
Collaborator

ctron commented Mar 26, 2024

Just to be clear, I think having a "breaking change" of requiring a spec compliant HTML page is fine. But we should put it into a breaking change release (0.20.x) and add a nice little warning.

@ctron
Copy link
Collaborator

ctron commented Apr 12, 2024

I think that's closed, but not released yet.

@ctron ctron closed this as completed Apr 12, 2024
@ctron ctron added this to the 0.20.0 milestone Apr 12, 2024
@airblast-dev
Copy link
Contributor Author

airblast-dev commented Apr 12, 2024

I apologize as I wasn't able to take a look into this.

Though one thing I wanted to ask is would you prefer an error or just a warning?

Detecting the error is quite simple (a single conditional), but in case we want to provide extra information things would get quite messy. I think giving a warning would make things much simpler, as the user can just check the output to see where the truncation occurs to find the script tag that is causing the issue.

@ctron
Copy link
Collaborator

ctron commented Apr 13, 2024

Though one thing I wanted to ask is would you prefer an error or just a warning?

Personally I would prefer an error, as it might break things. And I am sure people will ignore warnings. However, I could also understand that having an --ignore-script-error (or a better name) option might make sense in case of a false positive, or a case we did not think about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed investigate More investigation is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants