Tables parsing issue - with <tr> inside <tr> case #200

Open
alecpl opened this issue Feb 7, 2021 · 4 comments

@alecpl
Contributor

alecpl commented Feb 7, 2021

While working with HTML generated by Microsoft Outlook, I found a case that the DOMDocument-based parser handles without problems but the HTML5 parser does not.

The minimal test case input is this:

```html
<table id="t1">
  <tr>
    <td>
      <table id="t2">
        <tr>
        <tr>
          <td></td>
        </tr>
        </tr>
      </table>
    </td>
  </tr>
  <tr><td></td></tr>
</table>
```

Note the `<tr>` element nested directly inside another `<tr>`. This causes the HTML5 parser to output:

```html
<table id="t1">
  <tr>
    <td>
      <table id="t2">
        <tr></tr>
        <tr>
          <td></td>
        </tr>
      </table>
    </td>
  </tr>
</table>
<tr><td></td></tr>
```

This is invalid: the outer table is closed too early, which leaves the next (here, the last) `<tr>` element outside of it.
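One way to see how the inner table can break the outer one: the output above is exactly what a tree builder produces if an end tag simply closes the nearest open element with the same name, without stopping at table boundaries. The extra `</tr>` after the inner row then walks up past `<table id="t2">` and closes the outer row, and the inner `</table>` closes `t1`. This is a hypothesis about the mechanism, not the html5-php code: `NaiveTreeBuilder`, `Node`, `serialize` and `build` are hypothetical names, the rule that a `<tr>` closes a currently open `<tr>` is an assumption, and Python's standard `html.parser` is used only as a tokenizer.

```python
from html.parser import HTMLParser


class Node:
    """Bare-bones element node: tag name, optional id, child list."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.id = dict(attrs or []).get("id")
        self.children = []


def serialize(node):
    # Whitespace-free serialization; the synthetic root emits only its children.
    inner = "".join(serialize(child) for child in node.children)
    if node.tag == "#root":
        return inner
    attr = f' id="{node.id}"' if node.id else ""
    return f"<{node.tag}{attr}>{inner}</{node.tag}>"


class NaiveTreeBuilder(HTMLParser):
    """Hypothetical builder: an end tag pops to the nearest open element
    with the same name, with no notion of table scope."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Assumed rule: a <tr> implicitly closes a <tr> that is the current node.
        if tag == "tr" and self.stack[-1].tag == "tr":
            self.stack.pop()
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Search upward, crossing table boundaries; unmatched end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return


def build(html):
    builder = NaiveTreeBuilder()
    builder.feed(html)
    builder.close()
    return serialize(builder.root)


SOURCE = (
    '<table id="t1"><tr><td><table id="t2"><tr><tr><td></td></tr></tr>'
    '</table></td></tr><tr><td></td></tr></table>'
)
print(build(SOURCE))
```

Running it prints `<table id="t1"><tr><td><table id="t2"><tr></tr><tr><td></td></tr></table></td></tr></table><tr><td></td></tr>`, which is the same structure as the broken output above with whitespace dropped.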

Reference: roundcube/roundcubemail#7356

@goetas
Member

goetas commented Feb 7, 2021

Since `<tr>` is not a valid child of `<tr>`, what would be the suggested solution here? What do browsers do?

@alecpl
Contributor Author

alecpl commented Feb 7, 2021

Both Firefox and Chrome convert the t2 table to:

```html
<table id="t2">
    <tbody>
        <tr></tr>
        <tr>
            <td></td>
        </tr>
    </tbody>
</table>
```

@alecpl
Contributor Author

alecpl commented Feb 7, 2021

Sorry, I wasn't clear. The t2 table is the same as in the HTML5 parser's output. The difference is that in browsers the outer table is not broken, i.e. its second row stays where it should be.

So the issue is not the content of the inner table but its impact on the outer table.
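For comparison, the HTML Standard's tree-construction rules only let table-part end tags such as `</tr>` and `</td>` match within "table scope", where the upward search stops at `html`, `table` or `template`. In this input the stray `</tr>` arrives in the "in table body" insertion mode, where the spec ignores it outright, so it can never reach the outer row. The following is a minimal sketch of that boundary check in a toy builder, not a full implementation of the spec: `ScopedTreeBuilder`, `Node`, `serialize` and `build` are hypothetical names, implicit `<tbody>` insertion is not modeled, only table-part end tags are scope-checked, and the rule that a `<tr>` closes a currently open `<tr>` is a simplification.

```python
from html.parser import HTMLParser

# Boundary elements of "table scope" in the HTML Standard.
TABLE_SCOPE_BOUNDARY = {"html", "table", "template"}
# Simplification: only these end tags get the table-scope check.
TABLE_PART_TAGS = {"tr", "td", "th", "tbody", "thead", "tfoot", "caption"}


class Node:
    """Bare-bones element node: tag name, optional id, child list."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.id = dict(attrs or []).get("id")
        self.children = []


def serialize(node):
    # Whitespace-free serialization; the synthetic root emits only its children.
    inner = "".join(serialize(child) for child in node.children)
    if node.tag == "#root":
        return inner
    attr = f' id="{node.id}"' if node.id else ""
    return f"<{node.tag}{attr}>{inner}</{node.tag}>"


class ScopedTreeBuilder(HTMLParser):
    """Hypothetical builder whose table-part end tags cannot cross a table."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Simplified rule: a <tr> implicitly closes a <tr> that is the current node.
        if tag == "tr" and self.stack[-1].tag == "tr":
            self.stack.pop()
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        scoped = tag in TABLE_PART_TAGS
        for i in range(len(self.stack) - 1, 0, -1):
            node = self.stack[i]
            if node.tag == tag:
                del self.stack[i:]
                return
            if scoped and node.tag in TABLE_SCOPE_BOUNDARY:
                # Not in table scope (e.g. stray </tr> inside t2): ignore the tag.
                return


def build(html):
    builder = ScopedTreeBuilder()
    builder.feed(html)
    builder.close()
    return serialize(builder.root)


SOURCE = (
    '<table id="t1"><tr><td><table id="t2"><tr><tr><td></td></tr></tr>'
    '</table></td></tr><tr><td></td></tr></table>'
)
print(build(SOURCE))
```

It prints `<table id="t1"><tr><td><table id="t2"><tr></tr><tr><td></td></tr></table></td></tr><tr><td></td></tr></table>`: the inner table keeps its empty row, as in the browsers, and the second row of `t1` stays inside `t1`.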

@goetas
Member

goetas commented Feb 8, 2021

Ah, I see. Indeed, that trailing `<tr><td></td></tr>` is wrong.
