Bug when parsing <![CDATA[]]> tag which contains <> (angle brackets) #263

Open
navrkald opened this issue Feb 2, 2024 · 0 comments

How to reproduce the issue:

import { parse } from "node-html-parser";

console.log(
  parse(
    `<ac:structured-macro
          ac:name="code"
          ac:schema-version="1"
          ac:macro-id="some id">
            <ac:parameter ac:name="language">bash</ac:parameter>
            <ac:plain-text-body>
              <![CDATA[
              export AWS_ACCESS_KEY_ID=<your Access key ID> export AWS_SECRET_ACCESS_KEY=<your Secret access key>
              ]]>
            </ac:plain-text-body>
        </ac:structured-macro>
        <p><br/></p>`
  ).toString()
);

The output of this program is:

     <ac:structured-macro           ac:name="code"
              ac:schema-version="1"
              ac:macro-id="some id">
                <ac:parameter ac:name="language">bash</ac:parameter>
                
                  <![CDATA[
                  export AWS_ACCESS_KEY_ID=<your Access key ID> export AWS_SECRET_ACCESS_KEY=</your>
                  ]]>
                
            
            <p><br></p></ac:structured-macro>

The parser mangles the content of the CDATA section (<your Secret access key> becomes </your>), and it also gets confused and corrupts the rest of the HTML. The ac:plain-text-body tags are dropped entirely, and the closing </ac:structured-macro> tag, which should come immediately after </ac:plain-text-body>, is moved to the end of the HTML.

If I remove the angle brackets <> from the content of the CDATA section, the HTML is parsed and printed correctly.
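
Until this is fixed, one possible workaround is to keep the CDATA sections away from the parser entirely. The sketch below is only an illustration. protectCdata, restoreCdata and the __CDATA_n__ placeholder format are hypothetical names, not part of node-html-parser. It assumes that the placeholder text does not already occur in the input and that the parser passes plain text through unchanged.

```typescript
// Hypothetical workaround helpers; NOT part of node-html-parser.
// Matches a whole CDATA section, including content that spans several lines.
const CDATA_PATTERN = /<!\[CDATA\[[\s\S]*?\]\]>/g;

interface ProtectedHtml {
  html: string;       // input with each CDATA section replaced by a placeholder
  sections: string[]; // the original CDATA sections, in document order
}

// Swap every CDATA section for a placeholder token that contains no angle brackets.
function protectCdata(input: string): ProtectedHtml {
  const sections: string[] = [];
  const html = input.replace(CDATA_PATTERN, (section: string) => {
    sections.push(section);
    return `__CDATA_${sections.length - 1}__`;
  });
  return { html, sections };
}

// Put the original CDATA sections back into the serialized output.
// Unknown placeholder indexes are left untouched.
function restoreCdata(output: string, sections: string[]): string {
  return output.replace(
    /__CDATA_(\d+)__/g,
    (token: string, index: string) => sections[Number(index)] ?? token
  );
}
```

In use, the parse call from the reproduction would be wrapped as restoreCdata(parse(p.html).toString(), p.sections), where p is the result of protectCdata applied to the original HTML string.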

Expected results:

The parser should not try to interpret angle brackets inside a CDATA section in any way, and it should parse the surrounding HTML correctly.
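
The expectation can be written down as a small check: every CDATA section in the input should appear byte for byte in the serialized output. This is a rough sketch, and cdataSections and cdataPreserved are hypothetical helper names, not node-html-parser API. The two sample strings are shortened versions of the input and of the output reported in this issue.

```typescript
// Hypothetical check helpers; NOT part of node-html-parser.

// Collect all CDATA sections of a document, in order of appearance.
function cdataSections(html: string): string[] {
  return html.match(/<!\[CDATA\[[\s\S]*?\]\]>/g) ?? [];
}

// True when the output contains exactly the same CDATA sections as the input.
function cdataPreserved(input: string, output: string): boolean {
  const before = cdataSections(input);
  const after = cdataSections(output);
  return (
    before.length === after.length &&
    before.every((section, i) => section === after[i])
  );
}

// Shortened from the reproduction: what went in, and what came out.
const original = "<![CDATA[ export AWS_SECRET_ACCESS_KEY=<your Secret access key> ]]>";
const reported = "<![CDATA[ export AWS_SECRET_ACCESS_KEY=</your> ]]>";

console.log(cdataPreserved(original, original)); // true  (expected behavior)
console.log(cdataPreserved(original, reported)); // false (behavior reported here)
```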

Note:

This is just a small part of a large HTML page, which gets completely mangled because of this bug.
