SAXParseException when trying to add an unclosed, raw `<link>` tag into a `head { ... }` block #247

bitspittle · 2023-11-15T23:51:22Z

Specifically, org.xml.sax.SAXParseException; The element type "link" must be terminated by the matching end-tag "</link>".

Expected: kotlinx.html can receive unclosed <link> tags in the <head> element.
Actual: A <link> element, which is a void element that does not need a closing tag in valid HTML (and is in fact emitted unclosed by kotlinx.html itself), triggers a parse exception when passed back into kotlinx.html.

Repro steps

Here's a very simple example to show the issue:

println(createHTML().head {
    link {
        rel = "stylesheet"
        href = "https://example.com/fake.css"
    }
})

which outputs:

<head>
  <link rel="stylesheet" href="https://example.com/fake.css">
</head>

Writing the kotlinx.html code to represent that...

document {
    append {
        head {
            unsafe {
                +"<link rel=\"stylesheet\" href=\"https://example.com/fake.css\">"
            }
        }
    }
}

results in the unexpected stack trace:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 74; The element type "link" must be terminated by the matching end-tag "</link>".
        at java.xml/com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:261)
        at java.xml/com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
        at kotlinx.html.dom.HTMLDOMBuilder$UnsafeImpl$1.unaryPlus(dom-jvm.kt:98)
        at ...

qwertukg · 2024-03-12T05:02:53Z

You should close the tag like this: "<link rel=\"stylesheet\" href=\"https://example.com/fake.css\"/>" (note the / before the >).

bitspittle · 2024-03-12T16:23:26Z

@qwertukg ah, I think you're missing the point. My example shows that the very html that kotlinx html itself generates, meaning it is valid html, turns into a parse error when you feed it back into itself.

The reason I ran into this is because I had a pipeline where one part generated html from kotlinx and then another part consumed it. For the second part of the pipeline, the output of the first part is opaque, and its consumption automatic. You can't just edit it by hand because there's no human in the process.

I worked around it in an ugly way long ago but this should not be a parse exception.

severn-everett · 2024-05-07T08:17:02Z

The issue is that you're not quite "feeding it back into itself". The document() function that you're using in the second example - along with createHTMLDocument() - is the Java-based builder for creating a full XML structure, so passing in a raw string that contains an unclosed tag is causing the exception in your reproduction code. When building the HTML structure directly, this is not a problem due to the code in this library rectifying the difference between HTML and XML; passing a string in directly bypasses the code and goes straight to the SAX Parser, hence the exception. Given the unsafe() function is primarily for dealing with Javascript and CSS code, this issue might be out of the scope of the library.
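A minimal sketch of that HTML-versus-XML difference, using only the JDK's built-in XML parser rather than kotlinx.html itself. The helper name `xmlParseError` is hypothetical, and the sketch assumes (based on the stack trace above) that the raw string ends up in a `DocumentBuilder`-style parser:

```kotlin
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource
import org.xml.sax.SAXParseException

// Hypothetical helper: returns null if `fragment` is well-formed XML,
// otherwise the parser's error message.
fun xmlParseError(fragment: String): String? {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    return try {
        builder.parse(InputSource(StringReader(fragment)))
        null
    } catch (e: SAXParseException) {
        e.message ?: "parse error"
    }
}

fun main() {
    // HTML-style void element: valid HTML, but not well-formed XML
    val htmlStyle = "<head><link rel=\"stylesheet\" href=\"https://example.com/fake.css\"></head>"
    // Self-closed form: well-formed XML
    val xmlStyle = "<head><link rel=\"stylesheet\" href=\"https://example.com/fake.css\"/></head>"

    println(xmlParseError(htmlStyle)) // non-null: complains about the unterminated "link"
    println(xmlParseError(xmlStyle))  // null: parsed without error
}
```

The JDK parser may also print a "[Fatal Error]" line to stderr for the first input; that comes from its default error handler and is separate from the exception caught here.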

bitspittle · 2024-05-07T15:35:17Z

@severn-everett That's fair. I'll go ahead and mark the issue closed. I'm sure the team is busy, but you can of course reopen if the team wants to look into it more.

It's too bad SAX can't be configured to be more flexible about void elements (or if it can, it's probably really tricky to do). I remember fighting with SAX a decade ago, and I'm not advocating that the kotlinx.html team spend any time fighting it, since this doesn't seem to be a common problem :)

At some point, our codebase switched to using Jsoup to parse the incoming html text, which seems like a clean enough workaround to recommend in case anyone else comes across this thread in the future.
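For anyone landing here, a rough sketch of what the Jsoup route can look like. It assumes the `org.jsoup:jsoup` dependency is on the classpath; the helper name `htmlToXmlSyntax` is made up for illustration, and this is not necessarily how our codebase does it. The idea is to let Jsoup parse the lenient HTML and re-serialize it with XML syntax, so void elements come out self-closed:

```kotlin
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

// Hypothetical helper: parses HTML leniently with Jsoup and returns the
// contents of <head> re-serialized using XML syntax rules.
fun htmlToXmlSyntax(html: String): String {
    val doc = Jsoup.parse(html)
    // Switch the serializer from HTML to XML syntax (void elements get "/>")
    doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml)
    return doc.head().html()
}

fun main() {
    // The same markup that kotlinx.html's createHTML() produced above
    val generated = """
        <head>
          <link rel="stylesheet" href="https://example.com/fake.css">
        </head>
    """.trimIndent()

    println(htmlToXmlSyntax(generated))
}
```

For a simple case like this one, the resulting string should be acceptable to an XML parser; more complex HTML (entities, inline scripts) may need extra handling.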

bitspittle closed this as completed May 7, 2024
