Interpret newline as space #165

Open
metasoarous opened this issue Jun 12, 2020 · 5 comments

Comments

@metasoarous
Contributor

I was a bit surprised to realize that markdown-clj by default treats a newline inside a paragraph as a no-op rather than as a space, in contrast with other markdown processing tools.

While I'd lobby for making newlines behave as spaces, for now is there a way to use parser customizations to achieve the desired result?

Thanks!

@metasoarous
Contributor Author

metasoarous commented Jun 12, 2020

Hmm... I'm now realizing that, in addition, space characters are being trimmed around newlines, so there doesn't seem to be any way to add a space at a line break in an md document without messing with the parser, let alone to stay compatible with other parsers.

Would you please consider addressing this?

Thanks again

@yogthos
Owner

yogthos commented Jun 13, 2020

Oh yeah, that's one of the limitations of how I originally wrote the parsing: it reads input line by line, and I never got around to improving that. I'm definitely open to improving it, but can't promise I'll have the time in the near future. I think the easiest approach would be to handle that here as lines are being read, and to keep reading until a blank line when inside a paragraph. A similar change would be needed for the cljs part as well.
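To make the "keep reading until a blank line" idea concrete, here is a rough sketch written as a standalone pre-processing step rather than as a change to the parser itself. `join-soft-breaks` is a hypothetical name and is not part of markdown-clj. It assumes every run of non-blank lines is a paragraph unless the run starts with four spaces or a tab, so it would still mangle lists, headings, block quotes and fenced code, and it drops the two trailing spaces that Markdown uses for a hard line break.

```clojure
(require '[clojure.string :as str])

(defn join-soft-breaks
  "Hypothetical helper, NOT part of markdown-clj.
  Joins consecutive non-blank lines with a single space, so that a newline
  inside a paragraph reads as a space. Blank (or whitespace-only) lines and
  runs that start with an indented line are passed through unchanged."
  [text]
  (->> (str/split-lines text)
       ;; group consecutive lines by whether they are blank
       (partition-by str/blank?)
       (mapcat (fn [group]
                 (cond
                   ;; a run of blank lines: keep as the paragraph separator
                   (str/blank? (first group)) group
                   ;; assumed to be an indented code block: leave untouched
                   (re-find #"^(    |\t)" (first group)) group
                   ;; assumed to be a paragraph: one line, joined by spaces
                   :else [(str/join " " (map str/trim group))])))
       (str/join "\n")))
```

The result could then be passed to `md-to-html-string` as usual. A proper fix would do the same kind of accumulation inside the line reader, where the parser state says whether the current line really is part of a paragraph.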

If anybody has time to take a look at this, I can help guide the PR and a release.

@gsinclair

I came here to report this, but found that it has already been reported, so I thought I'd at least contribute my minimal failing example.

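(require '[markdown.core :refer [md-to-html-string]])

;; s1: two consecutive lines of plain text
(let [s1 "Random text.\nRandom text.\n"]
  (md-to-html-string s1))
;; s2: the line after the code block is completely empty
(let [s2 "Random text.\n\n    code block\n\nRandom text.\nRandom text.\n"]
  (md-to-html-string s2))
;; s3: the line after the code block contains only spaces
(let [s3 "Random text.\n\n    code block\n     \nRandom text.\nRandom text.\n"]
  (md-to-html-string s3))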

If you execute those three forms in the REPL, you will find that in s1 and s2 all paragraphs parse correctly (i.e. newline converts to space) but in s3 the final paragraph does not parse as one would wish. It appears the superfluous space in the blank line after the code block somehow disrupts the subsequent parsing.

That is, the spaces in the line after the code block have an undesirable effect in the parsing of the subsequent paragraph.

@Oddsor
Contributor

Oddsor commented Mar 10, 2022

I took a stab at this issue, in particular the case mentioned by @gsinclair, with a PR here: #178

Reading line by line certainly causes some challenges here as @yogthos says! For example, indented code blocks in markdown usually have trailing newlines trimmed, but I'm not sure we can ensure that happens with the current parsing strategy.

I'm not exactly an expert in writing parsers, so I don't know what would be a better long term solution to handle these cases 😅

@yogthos
Owner

yogthos commented Mar 10, 2022

I think the solution is reasonable with the current state of things. :) The tests pass, so I'm going to say that's good enough, and if a new issue gets opened then we can add a new test and fix it then.
