SAXParseException when trying to add an unclosed, raw <link> tag into a head { ... } block #247

Closed
bitspittle opened this issue Nov 15, 2023 · 4 comments

Comments

@bitspittle

Specifically, org.xml.sax.SAXParseException; The element type "link" must be terminated by the matching end-tag "</link>".


Expected: kotlinx.html accepts an unclosed <link> tag inside the <head> element.
Actual: kotlinx.html throws a parse exception on the <link> element, even though <link> is a void element that does not need a closing tag in valid HTML (and unclosed is exactly how kotlinx.html itself generates it).

Repro steps

Here's a very simple example to show the issue:

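import kotlinx.html.*
import kotlinx.html.stream.createHTML

// Generate a <head> containing a single stylesheet link and print the resulting HTML string.
println(createHTML().head {
    link {
        rel = "stylesheet"
        href = "https://example.com/fake.css"
    }
})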

which outputs:

<head>
  <link rel="stylesheet" href="https://example.com/fake.css">
</head>

Writing the kotlinx.html code that feeds that same markup back in as a raw string...

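import kotlinx.html.*
import kotlinx.html.dom.append
import kotlinx.html.dom.document

// Build a DOM document and append the generated <link> markup as a raw, unescaped string.
document {
    append {
        head {
            unsafe {
                +"<link rel=\"stylesheet\" href=\"https://example.com/fake.css\">"
            }
        }
    }
}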

results in the unexpected stack trace:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 74; The element type "link" must be terminated by the matching end-tag "</link>".
        at java.xml/com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:261)
        at java.xml/com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
        at kotlinx.html.dom.HTMLDOMBuilder$UnsafeImpl$1.unaryPlus(dom-jvm.kt:98)
        at ...
@qwertukg

qwertukg commented Mar 12, 2024

You should close the tag like this: "<link rel=\"stylesheet\" href=\"https://example.com/fake.css\"/>" (note the / before the >).

@bitspittle
Author

bitspittle commented Mar 12, 2024

@qwertukg Ah, I think you're missing the point. My example shows that the very HTML that kotlinx.html itself generates, which means it is valid HTML, turns into a parse error when you feed it back into kotlinx.html.

The reason I ran into this is that I had a pipeline where one part generated HTML with kotlinx.html and another part consumed it. For the second part of the pipeline, the output of the first part is opaque and its consumption is automatic. You can't just edit it by hand because there's no human in the process.

I worked around it in an ugly way long ago, but this should not be a parse exception.

@severn-everett
Contributor

The issue is that you're not quite "feeding it back into itself". The document() function that you're using in the second example, along with createHTMLDocument(), is the Java-based builder for creating a full XML structure, so passing in a raw string that contains an unclosed tag causes the exception in your reproduction code. When you build the HTML structure directly through the DSL, this is not a problem, because the library reconciles the differences between HTML and XML. Passing a string in through unsafe bypasses that code and goes straight to the underlying XML parser, hence the SAXParseException. Given that the unsafe() function is primarily for dealing with JavaScript and CSS code, this issue might be out of scope for the library.
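
The behavior described above can be reproduced without kotlinx.html at all, using only the JDK's XML parser. The following is a minimal sketch, not the library's actual code: xmlParseError is a hypothetical helper, and the <root> wrapper element is an assumption made only so the fragment has a single root element.

```kotlin
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource
import org.xml.sax.SAXParseException
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper (not part of kotlinx.html): returns null when `fragment`
// is well-formed XML, or the parser's error message when it is not.
fun xmlParseError(fragment: String): String? {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    // DefaultHandler rethrows fatal errors without printing them to stderr.
    builder.setErrorHandler(DefaultHandler())
    return try {
        // Wrap the fragment so the parser always sees exactly one root element.
        builder.parse(InputSource(StringReader("<root>$fragment</root>")))
        null
    } catch (e: SAXParseException) {
        e.message
    }
}

fun main() {
    val unclosed = "<link rel=\"stylesheet\" href=\"https://example.com/fake.css\">"
    val selfClosed = "<link rel=\"stylesheet\" href=\"https://example.com/fake.css\"/>"

    // Valid HTML, but not well-formed XML: the parser reports a missing </link>.
    println("unclosed:    ${xmlParseError(unclosed) ?: "parsed"}")
    // The XML-style self-closing form is accepted.
    println("self-closed: ${xmlParseError(selfClosed) ?: "parsed"}")
}
```

An XML parser has no notion of HTML void elements, so the only forms it accepts are <link .../> or <link ...></link>.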

@bitspittle
Author

@severn-everett That's fair. I'll go ahead and mark the issue closed. I'm sure the team is busy, but you can of course reopen if the team wants to look into it more.

It's too bad SAX can't be configured to be more flexible about void elements (or if it can, it's probably really tricky to do). I remember fighting with SAX a decade ago, and I'm not advocating that the kotlinx.html team spend any time fighting it, since this doesn't seem to be a common problem :)

At some point, our codebase switched to using Jsoup to parse the incoming HTML text, which seems like a clean enough workaround to recommend in case anyone else comes across this thread in the future.
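
For anyone who wants a starting point, here is a rough sketch of one way Jsoup could be used for this. It is an assumption about the approach, not the code from the pipeline mentioned above: headAsXml is a hypothetical helper, and it assumes the org.jsoup:jsoup dependency is on the classpath. It parses the markup with Jsoup's lenient HTML parser and re-serializes the <head> contents in XML syntax, so that void elements come out self-closed.

```kotlin
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.nodes.Entities

// Hypothetical helper: parse HTML leniently, then emit the contents of <head>
// using XML syntax so that void elements such as <link> are self-closed.
fun headAsXml(html: String): String {
    val doc = Jsoup.parse(html)
    doc.outputSettings()
        .syntax(Document.OutputSettings.Syntax.xml)   // write void elements as <link ... />
        .escapeMode(Entities.EscapeMode.xhtml)        // avoid HTML-only named entities
        .prettyPrint(false)                           // keep whitespace as parsed
    return doc.head().html()
}

fun main() {
    val generated = "<head><link rel=\"stylesheet\" href=\"https://example.com/fake.css\"></head>"
    // The printed markup is XML-friendly and could then be passed to an unsafe { } block.
    println(headAsXml(generated))
}
```

The exact whitespace of the output may differ between Jsoup versions, so downstream code should not depend on it.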
