Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserves em and strong markers in the AST #373

Merged
merged 2 commits into from May 9, 2023
Merged

Conversation

rpaul-stripe
Copy link
Contributor

@rpaul-stripe rpaul-stripe commented May 9, 2023

This PR addresses issue #369. It makes the parser capture the actual symbols used for em and strong nodes and preserve that information in the AST so that it can be used in the formatter to reproduce the content accurately.

  • Parser modifications
    • Modifies the parser to capture the emphasis markup for em and strong nodes and apply it to a node attribute called marker
    • Adds a Jasmine test case to ensure that emphasis nodes get the proper marker attribute
  • Schema modifications
    • Adds a non-rendering marker attribute to the strong and em node schemas
  • Formatter modifications
    • Modifies the formatter so that it renders em and strong nodes using the original emphasis marker from the source content and falls back on * or ** when not available
    • Adds a Jasmine test case to the formatter to ensure that the output matches the source

Closes #369

break;
}
case 'em': {
yield '_';
yield n.attributes.marker ?? '*';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing _ to *?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any harm in leaving it as _ (as a default convention)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so the issue is that _ is treated like regular text and not an emphasis delimiter when it's used inside of a text node, e.g. foo_bar_baz is rendered literally whereas foo*bar*baz treats the *bar* part as emphasized. Because the underscore behaves differently in this specific case, it's not safe to convert foo*bar*baz to foo_bar_baz—it changes the meaning of the content. So using the asterisk is a safer universal default than the underscore. It's a little unfortunate, I'd prefer the underscore otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, makes sense.

};

export const em: Schema = {
render: 'em',
children: ['strong', 's', 'link', 'code', 'text', 'tag'],
attributes: {
marker: { type: String, render: false },
},
};

export const s: Schema = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add it for Strikethrough (s) too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like strikethrough always uses the exact same marker syntax, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. Thanks!

Copy link
Contributor

@mfix-stripe mfix-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@mfix-stripe mfix-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can remove some of the character escape code I wrote now that we have captured the marker. Can you take a look at that briefly @rpaul-stripe (not a blocker for merging)?

@rpaul-stripe rpaul-stripe merged commit 1bf6f46 into main May 9, 2023
2 checks passed
@rpaul-stripe rpaul-stripe deleted the preserve-em-symbol branch May 9, 2023 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Emphasis written using * without leading and trailing whitespace is incorrectly formatted to use _
2 participants