Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework markdown parser to fix #4613 #4677

Merged
merged 6 commits into from May 7, 2024
Merged

Conversation

codesoap
Copy link
Contributor

@codesoap codesoap commented Feb 26, 2024

Description:

First of all: Sorry for the huge diff, without any previous conversation. I had an idea for a rework, but before suggesting it, I wanted to see if the idea would even work. Before I knew it, I fell into a rabbit hole and now here I am with this PR.

Inspired by #4613, I read through markdown.go and investigated goldmark a little bit (here's a tool to visualize goldmark's ASTs: goldmarkvis). I got an understanding of where spaces should be added, but couldn't find an easy way to incorporate my newfound knowledge into the existing design. I felt like it would be easier, if the AST were rendered with recursive code.

I went ahead and tried it out. With this approach, I was able to fix all the problems mentioned in #4613 (including the "split link"). As a side effect, the code also got a little more compact and slightly more readable (but that's highly subjective, of course).

I have also added a benchmark, which shows a slight decline of performance with the new implementation, but it's not too bad and I would assume that performance would have decreased anyway, no matter how #4613 would have been fixed:

  • Before: BenchmarkMarkdownParsing-4 8527 135023 ns/op 34864 B/op 306 allocs/op
  • After: BenchmarkMarkdownParsing-4 7798 149466 ns/op 35920 B/op 328 allocs/op

Since the new implementation adds leading spaces instead of trailing ones, and also adds new segments for some spaces and the end of paragraphs, I had to adapt the existing tests slightly.

Fixes #4613

Checklist:

  • Tests included.
  • Lint and formatter run with no errors.
  • Tests all pass.

The new version simplifies the code by using recursion instead of
walking the ast while keeping track of state.

It also improves the code by using a type switch.

Fixes fyne-io#4613
@coveralls
Copy link

coveralls commented Feb 26, 2024

Coverage Status

coverage: 65.013% (+0.2%) from 64.783%
when pulling 471f1cf on codesoap:markdown
into 6b7246b on fyne-io:develop.

@andydotxyz
Copy link
Member

Since the new implementation adds leading spaces instead of trailing ones, and also adds new segments for some spaces and the end of paragraphs, I had to adapt the existing tests slightly.

Moving where spaces are inserted seems OK, so the test changes where it is moved from one segment to another seems OK.
However adding trailing spaces to handle the newline doesn't seem right - because these spaces will be inserted into the content and the String() method no longer correct.

Unless I misunderstood what you meant?

@codesoap
Copy link
Contributor Author

Sorry, I didn't explain it enough. String() should not be affected. No superfluous spaces are inserted. Besides the swap from trailing to leading spaces, there are these two new special segments:

  1. goldmark seems to signal a single newline after emph/strong/link with an ast.Text node, which has no content (""). This is translated to a TextSegment with the Text: " ". We need to play along with this quirk of goldmark, because (as far as I can see) this is the only way we can distinguish *foo*\nbar from *foo*bar.
  2. Previously the last segment of a paragraph got modified with text.Style.Inline = false. As far as I can see, this won't work if the last element of a paragraph is, for example, a HyperlinkSegment, because this doesn't have a Style. Thus I now always add a TextSegment with Style: RichTextStyleParagraph and no Text at the end of paragraphs.

An example will probably explain it better:

*foo*
bar

Will now be parsed to the following 4 TextSegments:

  1. A segment with RichTextStyleEmphasis and Text: "foo".
  2. A segment with RichTextStyleInline and Text: " " (result of 1. from above).
  3. A segment with RichTextStyleInline and Text: "bar".
  4. A segment with RichTextStyleParagraph and Text: "" (result of 2. from above).

@andydotxyz
Copy link
Member

I think I follow.
To confirm this could you check in the tests that the newly inserted text segments for line break are indeed Text with "" and not " " just to be sure? Or at least some of them, just so we can't regress this special marker to something else in the future.

@codesoap
Copy link
Contributor Author

I have added an explicit test for the last segment of a paragraph. Is this OK? I can also adapt the existing test functions, if you'd prefer that.

@montovaneli
Copy link

montovaneli commented Apr 5, 2024

Sorry to jump into the discussion, but I'm really interested in this pull request.

I have a case using a list + new paragraph that doesn't seem right.
Testing on https://markdownlivepreview.com/

image

Now, using this pull request and the string s:= "- **List Item 1**: 1234\n\nNew paragraph\n\nAnother new paragraph":

image

Sorry if I misunderstood something, but it seems that the "New paragraph" is not a new paragraph at all.

@andydotxyz
Copy link
Member

Sorry if I misunderstood something, but it seems that the "New paragraph" is not a new paragraph at all.

To know this you should look at the object tree of Segments. I suspect it is an unrelated bug where the gap between a block element and a paragraph are not spaced enough. Try testing before and after this PR patch applied to see if it changes things or not. I think they are unrelated.

assert.True(t, text.Inline())
} else {
t.Error("Segment should be Text")
}
if text, ok := r.Segments[1].(*TextSegment); ok {
assert.Equal(t, "line2", text.Text)
assert.Equal(t, " line2", text.Text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just coming back to this apologies for the delay. Is this a desirable change? Putting a space at the leading of a string seems like it could cause a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously trailing spaces were inserted, now leading spaces are inserted instead. I cannot remember why I changed it, but I guess it made the implementation a little easier. I don't see how either choice would be better than the other. What kind of problem do you foresee?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem I foresee is that strings are indexed from the beginning, so if anyone was actually working with the content of a TextSegment then the characters inside them are now off-by-1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll try to revert back to trailing spaces when I find some time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now revisited the code and was able to return to trailing spaces instead of leading ones. In the process, the code even became a little more compact and the renderNode function requires one less argument.

I hope I didn't forget anything that I had considered when originally creating this pull request, but the tests cannot find any issues.

Copy link
Member

@andydotxyz andydotxyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for working this out

@andydotxyz andydotxyz merged commit 54154f9 into fyne-io:develop May 7, 2024
12 checks passed
@codesoap
Copy link
Contributor Author

codesoap commented May 7, 2024

Thanks for accepting this PR! I kinda feared it would be rejected because it changes too much and really appreciate you taking the time to consider it.

@andydotxyz
Copy link
Member

The benefit of a robust unit test suite is that the size of the change isn't usually a problem. It just takes longer.

I always go for the smallest possible fix, but in some cases refactoring is the way forward :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants