Interpret newline as space #165

metasoarous · 2020-06-12T23:06:53Z

I was a bit surprised to realize that markdown-clj interprets a newline as a no-op by default rather than as a space, unlike other Markdown processing tools.

While I'd lobby for making newlines behave as spaces, is there a way for now to use parser customizations to achieve the desired result?

Thanks!


metasoarous · 2020-06-12T23:10:35Z

Hmm... I'm now realizing that, in addition, space characters are being trimmed at line breaks, so there doesn't seem to be any way to add space characters to a Markdown document without modifying the parser, let alone to stay compatible with other parsers.

Would you please consider addressing this?

Thanks again

yogthos · 2020-06-13T01:57:04Z

Oh yeah, that's one of the limitations of how I originally wrote the parser: it reads input line by line, and I never got around to improving that. I'm definitely open to changing it, but I can't promise I'll have the time in the near future. I think the easiest approach would be to handle this here as lines are being read, and to keep reading until a blank line when inside a paragraph. A similar change would be needed for the cljs part as well.
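A minimal sketch of the "keep reading until a blank line" idea, assuming the input has already been split into lines. The names `group-blocks` and `code-line?` are hypothetical and are not part of markdown-clj. The sketch treats a line containing only whitespace as blank, so it ends a block just like an empty line does; it keeps an indented block verbatim as code and joins the trimmed lines of any other block with a single space. It deliberately ignores the rest of Markdown (headings, lists, fenced code, and the two-trailing-space hard line break, which the trimming would discard) and assumes a code block is followed by a blank line.

```clojure
(ns example.paragraphs
  (:require [clojure.string :as str]))

;; Hypothetical helper, not part of markdown-clj's API.
(defn- code-line?
  "An indented code line starts with four spaces or a tab."
  [line]
  (or (str/starts-with? line "    ")
      (str/starts-with? line "\t")))

;; Hypothetical helper, not part of markdown-clj's API.
(defn group-blocks
  "Splits text into blocks separated by blank lines. str/blank? is true for
  whitespace-only lines too, so a line like \"     \" ends a block.
  A block whose first line is indented is kept verbatim as code; otherwise
  its lines are trimmed and joined with one space, so a newline inside a
  paragraph behaves like a space."
  [text]
  (->> (str/split-lines text)
       ;; group consecutive lines by whether they are blank
       (partition-by str/blank?)
       ;; drop the groups made of blank lines; they only separate blocks
       (remove #(str/blank? (first %)))
       (map (fn [lines]
              (if (code-line? (first lines))
                {:type :code :text (str/join "\n" lines)}
                {:type :paragraph
                 :text (str/join " " (map str/trim lines))})))))
```

For example, `(group-blocks "Random text.\nRandom text.\n")` yields a single paragraph whose text is `"Random text. Random text."`. Grouping whole paragraphs before rendering is what lets the newline become a space; a purely line-at-a-time renderer never sees both lines at once.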

If anybody has time to take a look at this, I can help guide the PR and a release.

gsinclair · 2021-11-21T09:08:05Z

I came here to report this, but found that it has already been reported, so I thought I'd at least contribute my minimal failing example.

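```clojure
(require '[markdown.core :refer [md-to-html-string]])
;; s1: two lines forming a single paragraph
(let [s1 "Random text.\nRandom text.\n"]
  (md-to-html-string s1))
;; s2: paragraph, indented code block, then a two-line paragraph
(let [s2 "Random text.\n\n    code block\n\nRandom text.\nRandom text.\n"]
  (md-to-html-string s2))
;; s3: like s2, but the line after the code block contains spaces
(let [s3 "Random text.\n\n    code block\n     \nRandom text.\nRandom text.\n"]
  (md-to-html-string s3))
```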

If you evaluate those three forms in the REPL, you will find that in s1 and s2 all paragraphs parse correctly (i.e. the newline converts to a space), but in s3 the final paragraph does not parse as one would wish. The only difference from s2 is that the otherwise blank line after the code block contains spaces, and those spaces appear to disrupt the parsing of the subsequent paragraph.

Oddsor · 2022-03-10T18:05:10Z

I took a stab at this issue, in particular the case mentioned by @gsinclair, in PR #178.

As @yogthos says, reading line by line certainly causes some challenges here! For example, indented code blocks in Markdown usually have trailing blank lines trimmed, but I'm not sure we can guarantee that with the current parsing strategy.
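To illustrate why this is awkward for a line-by-line reader: a blank line inside an indented code block only turns out to be "trailing" once the next unindented line arrives, so the code block's lines have to be buffered before anything is emitted. A minimal sketch of the trimming step, assuming such a buffer exists; `drop-trailing-blank-lines` is a hypothetical name, not a markdown-clj function.

```clojure
(ns example.code-block
  (:require [clojure.string :as str]))

;; Hypothetical helper, not part of markdown-clj's API.
(defn drop-trailing-blank-lines
  "Removes blank or whitespace-only lines from the end of a buffered code
  block. Blank lines between code lines and the indentation of the
  remaining lines are left untouched."
  [lines]
  (->> (reverse lines)
       (drop-while str/blank?)
       reverse
       vec))
```

For instance, `(drop-trailing-blank-lines ["    (foo)" "" "    (bar)" "   " ""])` keeps the inner empty line and returns `["    (foo)" "" "    (bar)"]`.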

I'm not exactly an expert in writing parsers, so I don't know what would be a better long term solution to handle these cases 😅

yogthos · 2022-03-10T23:09:21Z

I think the solution is reasonable given the current state of things :) The tests pass, so I'd say that's good enough; if a new issue gets opened, we can add a new test and fix it then.

yogthos added the enhancement and help wanted labels on Jun 13, 2020
