setex heading after table #179

Open
UziTech opened this issue Feb 10, 2020 · 15 comments · May be fixed by #185

UziTech commented Feb 10, 2020

A setext heading is a block-level element, but it does not seem to interrupt a table.

Is this by design, since a setext heading cannot interrupt a paragraph? Or should it be able to interrupt a table without being preceded by a blank line?

Discussion: markedjs/marked#1598 (comment)

example

| abc | def |
| --- | --- |
| bar | foo |
| baz | boo |
title
=====

actual

abc def
bar foo
baz boo
title
=====

expected

abc def
bar foo
baz boo

title

@calculuschild

I would also like to know the answer to this. We have a couple of cases where this behavior seems to break protocol but we can't be sure if we should work around it or if it's intended.

UziTech commented Mar 6, 2020

@github @kivikakk any feedback on this?

kivikakk commented Mar 6, 2020

I no longer work at GitHub, so I can’t help, sorry!

UziTech commented Mar 6, 2020

@kivikakk thanks for the response. Do you happen to know who could help with this?

kivikakk commented Mar 6, 2020

@UziTech Unfortunately not :( Your best bet is likely to contact support.

mity commented Mar 7, 2020

I am not a cmark-gfm contributor either; I semi-regularly track its development, mainly as the maintainer of MD4C, for compatibility reasons.

Yet, as a person who has some experience with Markdown parser implementation, let me voice a strong doubt about whether allowing a setext header to interrupt tables is a good idea. The rationale follows.

Tables do not necessarily have to look as nice as the one in the first post. They may look like this:

head1 | head2
---|---
value1

(Notice that table body rows do not need to contain a pipe at all.)

This renders the same as

| head1  | head2  |
| ------ | ------ |
| value1 |        |
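
A minimal sketch of why the two notations come out the same, assuming a simplified model of the GFM table rule that short rows are padded with empty cells and excess cells are dropped. The helper name `split_row` is hypothetical and this is not cmark-gfm's actual code:

```python
import re
from typing import List


def split_row(line: str, ncols: int) -> List[str]:
    """Split one table body line into exactly `ncols` cells (simplified model)."""
    text = line.strip()
    # Drop one optional leading pipe and one optional (unescaped) trailing pipe.
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|") and not text.endswith("\\|"):
        text = text[:-1]
    # Split on pipes that are not escaped with a backslash.
    cells = [cell.strip() for cell in re.split(r"(?<!\\)\|", text)]
    # Pad short rows with empty cells, then drop any excess cells.
    cells += [""] * (ncols - len(cells))
    return cells[:ncols]


# Both notations of the body row yield the same two cells.
print(split_row("value1", 2))
print(split_row("| value1 |        |", 2))
```

Under this model a line with no pipe at all is still a valid row: it fills the first cell and the rest are empty.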

If we allow a setext header to interrupt the preceding table, we introduce a new problem: in general, you cannot tell which of the preceding lines still belong to the table and which should be part of the subsequent header.

Or, from another perspective, consider how the CommonMark specification defines a setext header: if the setext underline follows a paragraph, the whole paragraph becomes the header (and the underline itself gets eaten). Because the table extension allows the notation with the pipes stripped, tables cannot reasonably allow paragraphs to interrupt them.

Changing this would require either that tables with pipes stripped behave differently (imho a bad idea for the sake of consistency) or that only lines with at least one pipe in them are part of the table. That would make table parsing more complicated and slower, and I am not sure it would not expose other problems elsewhere.

Last but not least, imho, a table is in general something I informally call "a heavy-content block", similar in this respect to ordinary paragraphs: if we allow them in the text flow without any blank lines, they may be hard for a human eye to notice in the raw Markdown input unless they are formatted really nicely. Consider this from section 1.1 of the CommonMark specification:

> The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

Because of this, my gut feeling is that paragraphs (or setext headers, which syntactically are just a paragraph followed by an underline) should not be allowed to interrupt tables.

(I would also argue that, for the very same reason, the same holds for the opposite case, i.e. that tables should not be allowed to interrupt paragraphs either. Interestingly enough, cmark-gfm currently behaves inconsistently in this scenario, as reported in #180.)

EDIT: Additionally, consider that it would add new, very specific rules (exceptions), possibly complicating the implementation further, which would have to resolve crazy cases like this:

foo | bar
--- | ---
=====

UziTech commented Mar 10, 2020

@mity Thanks for the thoughtful response. Just to summarize your point, are you saying because setext headings are paragraphs with an underline and paragraphs take precedence over tables that it would be difficult to determine which is desired by the user in some of those crazy cases?

UziTech commented Mar 10, 2020

As a markdown parser maintainer, I feel that requiring a pipe in a table row is something that should be done anyway. I know it is not a requirement now, but I feel it should be, given the markdown design goal you quoted. It would make parsing tables much easier and less ambiguous.

As a markdown user I would expect something like the code below to be parsed as a table with just a header and a setext heading below the table.

foo | bar
--- | ---
asd
===

If someone wanted the last two lines to be part of the table they could add pipes. It is easier to visually parse and better subscribes to the markdown design goal.

foo | bar
--- | ---
asd |
=== |
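
A rough sketch comparing the two rules on the examples above, assuming a deliberately simplified model in which only a blank line ends a table today. The function `count_body_rows` and its `require_pipe` flag are hypothetical names, and the pipe requirement is the proposal made here, not anything GFM specifies:

```python
from typing import List


def count_body_rows(lines: List[str], require_pipe: bool) -> int:
    """Count how many leading `lines` (those after the delimiter row) are table rows."""
    count = 0
    for line in lines:
        if not line.strip():
            break  # a blank line ends the table under either rule
        if require_pipe and "|" not in line:
            break  # proposed rule only: a line without a pipe ends the table
        count += 1
    return count


without_pipes = ["asd", "==="]
with_pipes = ["asd |", "=== |"]

# Simplified current behavior: both lines are swallowed as table rows.
print(count_body_rows(without_pipes, require_pipe=False))
# Proposed rule: the table stops at once, leaving both lines free to form a setext heading.
print(count_body_rows(without_pipes, require_pipe=True))
# Adding pipes keeps both lines in the table under the proposed rule.
print(count_body_rows(with_pipes, require_pipe=True))
```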

UziTech commented Mar 10, 2020

Or maybe this should be undefined behavior, since there should be a blank line between the two blocks.

mity commented Mar 10, 2020

> Just to summarize your point, are you saying because setext headings are paragraphs with an underline and paragraphs take precedence over tables that it would be difficult to determine which is desired by the user in some of those crazy cases?

Yes, that's one of my arguments. Consider e.g. this:

A | B
---|---
line1
line2
line3
=====

You cannot reasonably determine which of the lines line1 to line3 are part of the table and which form the header after the table.
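
To make the ambiguity concrete, here is a small sketch that enumerates the readings in which the final `=====` closes a setext heading, assuming pipe-less lines may belong either to the table or to the heading. The function name `candidate_readings` is hypothetical:

```python
from typing import List, Tuple


def candidate_readings(lines: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return every (table_rows, heading_lines) split of the pipe-less lines.

    The heading needs at least one content line, so the table may take
    anywhere from zero lines up to all but the last one.
    """
    readings = []
    for k in range(len(lines)):
        readings.append((lines[:k], lines[k:]))
    return readings


for table_rows, heading_lines in candidate_readings(["line1", "line2", "line3"]):
    print("table rows:", table_rows, "| heading:", heading_lines)
```

For three lines this gives three equally plausible readings, in addition to the current behavior where all lines, including `=====`, are table rows.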

mity commented Mar 10, 2020

> If someone wanted the last two lines to be part of the table they could add pipes. It is easier to visually parse and better subscribes to the markdown design goal.

I might agree if such a change did not break tons of documents out there. But I guess it would, because GFM has supported rows without pipes for a long time.

UziTech commented Mar 10, 2020

Very good points. I see the ambiguity in the example with three lines. And I agree the change to require pipes would probably not work out well.

I think this is sufficiently resolved to close this issue.

@mity Thank you for your valuable input.

@UziTech UziTech closed this as completed Mar 10, 2020

calculuschild commented Mar 10, 2020

Is this something that should be added to the gfm spec then? I think that is the core issue here: the spec is contradictory or at a minimum ambiguous.

mity commented Mar 10, 2020

> Is this something that should be added to the gfm spec then?

Ideally, yes.

> I think that is the core issue here: the spec is contradictory or at a minimum ambiguous.

Imho, all the GFM extensions are quite under-documented in the specification. This is just one example. The current reality is that if you want to be reasonably compatible with GFM, you have to either study the cmark-gfm code or simply test how cmark-gfm parses problematic, ambiguous, or unspecified corner cases.
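
One way to do that kind of testing is to pipe a corner case through the `cmark-gfm` command-line tool with the table extension enabled. The sketch below assumes a `cmark-gfm` binary is on `PATH`; the helper names are hypothetical, and it returns `None` rather than failing when the binary is missing:

```python
import shutil
import subprocess
from typing import List, Optional


def cmark_gfm_command(binary: str = "cmark-gfm") -> List[str]:
    """Build the command line: HTML output with the table extension enabled."""
    return [binary, "--to", "html", "--extension", "table"]


def render(markdown: str, binary: str = "cmark-gfm") -> Optional[str]:
    """Render `markdown` via the CLI, or return None if the binary is not installed."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        cmark_gfm_command(binary),
        input=markdown,  # the tool reads from stdin when no file is given
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


corner_case = "A | B\n---|---\nline1\nline2\nline3\n=====\n"
html = render(corner_case)
print(html if html is not None else "cmark-gfm not found on PATH")
```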

@UziTech UziTech reopened this Mar 10, 2020
UziTech added a commit to UziTech/cmark-gfm that referenced this issue Mar 10, 2020
@UziTech UziTech linked a pull request Mar 10, 2020 that will close this issue

UziTech commented Mar 10, 2020

I created PR #185 to add an example showing that a setext heading does not break a table.

Let's hope someone at @github is actually watching this repo. 🤞
