Support for semantic linefeeds? #17

jlevy · 2015-07-13T06:40:23Z

Thanks for the useful tool. In addition to the obvious benefits of consistency, I think it has the potential to help reduce merge and diff friction on GitHub when many people edit Markdown.

Have you considered support for semantic linefeeds? That is, if a flag is enabled, use heuristics to split lines based on punctuation (probably period, comma, and a few less common ones, at least in English).

I know it's a slightly unusual practice, and frowned on in some places (like Wikipedia) but it offers some useful benefits in the context of Markdown in Git, so thought I'd mention it here.

See jlevy/the-art-of-command-line#167 for some discussion. Having a tool for this convention (and perhaps some tunable variations, to allow experimentation on what conventions are best) might be helpful for the heavily-edited documents (like "awesome lists", https://github.com/jlevy/the-art-of-command-line, etc.) that are becoming increasingly common on GitHub.

dmitshur · 2016-03-16T08:24:38Z

Thanks for the suggestion. As the label implies, this is something I'm thinking about, but I currently don't have actionable plans that are worth executing.

There are some benefits to semantic linefeeds, but I also like the current model that doesn't insert newlines at all, and expects your text editor/viewer to render the text with word wrap on. That way, as you resize your text editor/viewer, all text reflows and there's no need to manually edit newline positions.

One extra factor is that this will require a flag, and I prefer to avoid having configuration.

I just wanted to reply and give you some more insight on my thoughts on this.

jlevy · 2016-07-12T20:33:39Z

Yes, definitely there are pros and cons to both approaches. In general I agree with flowing text for text documents, but when you have a GitHub workflow on Markdown, you begin thinking of Markdown as more like source code (with clear semantics and clean merges of commits) rather than flowing the way it will when it's later formatted. Editors often give previews anyway.

Also get that you want to avoid myriad configuration. That said, this whole discussion is one of those perennial problems and it may take some experimentation to find good solutions. It took decades to have large numbers of developers implement the "gofmt"-style non-negotiable formatting idea. I think having config settings "discouraged but possible" is one way to allow experimental features (e.g. don't add lots of flags, and require a special config file or something like that).

Anyway, thanks for the response! I'll update if I find a better solution to this problem.

dmitshur · 2016-07-17T16:21:59Z

> That said, this whole discussion is one of those perennial problems and it may take some experimentation to find good solutions.

I'm in full agreement there.

I do think that it's best for the person most interested in a certain experiment to run it themselves, to maximize the chance of it working out well. I'd be very glad to see you fork this project, for example, experiment with this, and we can later decide to merge the efforts if it makes sense.

I still don't see a viable way of making semantic newlines work well in the context of markdown files. My main problem is that, when used, it makes editing text anywhere other than "end" more difficult. Imagine you delete or add some text to the first line, it becomes shorter/longer, and all following lines need to be reflowed. Perhaps markdownfmt itself can do that for you, but wouldn't that still cause each line to have a large diff, defeating the purpose?
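To make the reflow concern concrete, here is a minimal Python sketch. It is not part of markdownfmt; the sample sentences, the 40-column width, and the `changed_lines` helper are all made up for illustration. It counts how many lines a line-level diff touches after a small edit to the first sentence, once for a hard-wrapped paragraph and once for a one-sentence-per-line layout.

```python
import difflib
import textwrap

def changed_lines(old: list[str], new: list[str]) -> int:
    """Count removed plus added lines in a line-level diff."""
    diff = difflib.ndiff(old, new)
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))

# Hypothetical sample text, one sentence per entry.
sentences = [
    "Semantic linefeeds put each sentence on its own line.",
    "Fixed-width wrapping fills every line up to a column limit.",
    "Editing early text shifts every later break.",
]
# The same text with a small edit at the start of the first sentence.
edited = ["So-called semantic" + sentences[0][len("Semantic"):]] + sentences[1:]

# Hard wrap: join into one paragraph and refill at 40 columns (assumed width).
hard_old = textwrap.wrap(" ".join(sentences), width=40)
hard_new = textwrap.wrap(" ".join(edited), width=40)

print("hard-wrapped lines touched:", changed_lines(hard_old, hard_new))
print("sentence-per-line lines touched:", changed_lines(sentences, edited))
```

With column-based wrapping the edit pushes every later break, so the diff touches most of the paragraph. With breaks at sentence boundaries only the edited sentence shows up.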

> Anyway, thanks for the response! I'll update if I find a better solution to this problem.

No problem, and please do. I'm also happy to keep discussing it here; the "thinking" label allows me to not worry about wanting to close this issue asap. :)

dmitshur · 2016-07-17T17:16:00Z

Also, FWIW, here's what a diff when editing a single paragraph can look like. IMO, it can be quite readable, since individual words that are changed are highlighted too:

jlevy · 2016-07-18T02:21:13Z

Sure, thanks. To continue the discussion: I get that you can see the difference sub-line with coloring (and git diff --color-words supports this too). That's not the real pain point. Rather, it is merge conflicts when many people edit one doc. If two people change the same line, the merge is then a conflict under standard merging rules. E.g., when paragraphs are single lines 5 sentences long, non-trivial merges are roughly 5 times more likely than if the sentences were split. (If we all wrote code with 300-char widths and 5 statements to a line, we'd have the same problem there, too.)
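The conflict argument can be illustrated with a toy model in Python. This is a deliberately simplified stand-in for line-level three-way merging, not git's actual algorithm (which is also sensitive to changes on nearby lines); the `conflicting_lines` helper and the sample sentences are hypothetical.

```python
def conflicting_lines(base, ours, theirs):
    """Return indices of base lines that both sides changed differently.

    Toy model: only in-place line edits are supported, so all three
    versions must have the same number of lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy model only handles in-place line edits")
    return [
        i
        for i, (b, o, t) in enumerate(zip(base, ours, theirs))
        if o != b and t != b and o != t
    ]

sentences = ["First point.", "Second point.", "Third point."]

# Layout A: the whole paragraph lives on one line.
base_a = [" ".join(sentences)]
ours_a = [base_a[0].replace("First", "Opening")]
theirs_a = [base_a[0].replace("Third", "Closing")]

# Layout B: one sentence per line, same two edits.
base_b = list(sentences)
ours_b = ["Opening point.", "Second point.", "Third point."]
theirs_b = ["First point.", "Second point.", "Closing point."]

print(conflicting_lines(base_a, ours_a, theirs_a))  # both edits hit line 0
print(conflicting_lines(base_b, ours_b, theirs_b))  # edits land on different lines
```

The two editors touch different sentences in both layouts; only the one-line-per-paragraph layout makes them collide.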

In your example, it sounds like you're thinking of regular word wrapping, e.g. on a column width, and yes, there reflows break everything too. What I was suggesting was a variant on the semantic breaks, where you break on something "stable" like sentence-ending periods and comma phrases more than a certain length. The rules can even be a little complex, as long as it's something deterministic and doesn't make the source ugly. Then, say, modifying one word, would only have "local" effect on a (much shorter) line, so conflicts would be less common.
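As a rough sketch of what such a rule could look like (an assumption for illustration, not the algorithm of markdownfmt or any existing tool), the Python below breaks after sentence-ending punctuation and after a comma only once the phrase so far has reached a minimum length. The `semantic_wrap` name and the `min_phrase` default of 40 are hypothetical, and the regexes ignore abbreviations such as "e.g.", which a real implementation would need to handle.

```python
import re

# Break points: whitespace that follows sentence-ending punctuation or a comma.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
COMMA = re.compile(r"(?<=,)\s+")

def semantic_wrap(paragraph: str, min_phrase: int = 40) -> list[str]:
    """Split a paragraph at sentence ends and at commas closing long phrases."""
    lines = []
    for sentence in SENTENCE_END.split(paragraph.strip()):
        current = ""
        for phrase in COMMA.split(sentence):
            current = f"{current} {phrase}" if current else phrase
            # Only break at a comma when the phrase is long enough to stand alone.
            if current.endswith(",") and len(current) >= min_phrase:
                lines.append(current)
                current = ""
        if current:
            lines.append(current)
    return lines

text = (
    "Short one. This is a considerably longer opening clause, "
    "followed by the rest of the sentence. Done, ok."
)
for line in semantic_wrap(text):
    print(line)
```

Because the breaks depend only on punctuation and phrase length, re-running the function on its own joined output gives the same lines, which is the "deterministic" property described above.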

Yes, perhaps I'll experiment with it, too (but I'll have to find time to pick up enough Go that I resist the temptation to redo it in Python 😉).

jlevy · 2018-06-13T00:06:50Z

For what it's worth, I've finally revisited this idea and written a new plugin for Atom that handles this need. I think the semi-semantic wrapping approach is preferable to #36's fixed line-length wrapping. It's new and I'll be experimenting with it more, so any feedback (and bugs) welcome!

runlow · 2021-03-15T06:57:17Z

@jlevy
using git diff --word-diff may be a better solution

If you change a word in a paragraph and rewrap (hard wrap, gwap in vim in normal mode) that paragraph - with this option the command will show only that one word changing. (The default diff output would show several lines.)
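The idea behind a word-level diff can be approximated in a few lines of Python. This is only a sketch of the concept using the standard `difflib` module, not how git implements `--word-diff`; the `word_changes` helper and the sample text are made up for illustration.

```python
import difflib

def word_changes(old_text: str, new_text: str):
    """Diff two texts word by word, ignoring where the line breaks fall."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [
        (tag, old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# One word changes and the paragraph is rewrapped, so every line differs.
old = "the quick brown\nfox jumps over\nthe lazy dog"
new = "the quick red fox\njumps over the\nlazy dog"

print(word_changes(old, new))
```

Splitting on whitespace discards the line breaks, so the rewrap disappears from the comparison and only the changed word is reported.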

As for hard-wrapping itself - it's probably better for reading raw text files in my opinion, although not sure yet I can explain why. HTML and markdown usually ignore newlines when rendered either way (except for pre/code blocks).

jlevy · 2021-10-15T18:35:52Z

Just an update: As a practical matter, we've been using this approach in flowmark for some time now in the process of publishing about a dozen books, and overall it's worked well.

git diff --word-diff is a nice idea for cases when you can't control the format. But note GitHub's UI and git merges don't work at word level—they operate at line level. So having content in a form that is readable, normalized, and merges cleanly has been helpful for revising complex docs and editorial workflows.

dmitshur added the thinking label Jul 13, 2015

dmitshur mentioned this issue Apr 4, 2017

Line wrap at N(=80) characters #36

Open

jlevy mentioned this issue Apr 1, 2019

Consider phrase breaks? jlevy/atom-flowmark#24

Open
