Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ruff_python_formatter: support reformatting Markdown code blocks #9030

Merged
merged 2 commits into from
Dec 7, 2023

Conversation

BurntSushi
Copy link
Member

@BurntSushi BurntSushi commented Dec 6, 2023

(This is not possible to actually use until #8854 is merged.)

This commit slots in support for formatting Markdown fenced code blocks1. With the refactoring done for reStructuredText previously, this ended up being pretty easy to add. Markdown code blocks are also quite a bit easier to parse and recognize correctly.

One point of contention in #8860 is whether to assume that unlabeled Markdown code fences are Python or not by default. In this PR, we make such an assumption. This follows what rustdoc does. The mitigation here is that if an unlabeled code block isn't Python, then it probably won't parse as Python. And we'll end up skipping it. So in the vast majority of cases, the worst thing that can happen is a little bit of wasted work.

Closes #8860

@BurntSushi BurntSushi added docstring Related to docstring linting or formatting formatter Related to the formatter labels Dec 6, 2023
@BurntSushi BurntSushi added this to the Formatter: Stable milestone Dec 6, 2023
When I renamed `line` to `trimmed`, I forgot one.

(Previously, I had rebound `line`, and I think that made
these sorts of bugs impossible.)

This regression is covered by new tests for Markdown supported
in a subsequent commit.
Copy link
Contributor

github-actions bot commented Dec 6, 2023

ruff-ecosystem results

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

return None;
}
if !original.line.trim_whitespace().is_empty()
&& indentation_length(original.line) < self.opening_fence_indent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this case? The code is under-indented compared to the opening of the code fence? I actually don't know what Markdown does in that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's a weird case. I believe it is technically valid Markdown. But when I allowed it, it led to an idempotency bug that seemed a little tricky to fix. Basically, it seemed like a bug that was probably okay to ship with, and we can prioritize fixing it based on user feedback.

Good call out though. I added this comment:

        // When a line in a Markdown fenced closed block is indented *less*
        // than the opening indent, we treat the entire block as invalid.
        //
        // I believe that code blocks of this form are actually valid Markdown
        // in some cases, but the interplay between it and our docstring
        // whitespace normalization leads to undesirable outcomes. For example,
        // if the line here is unindented out beyond the initial indent of the
        // docstring itself, then this causes the entire docstring to have
        // its indent normalized. And, at the time of writing, a subsequent
        // formatting run undoes this indentation, thus violating idempotency.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

My only fear is into how many weird edge cases we'll run where we would need a full markdown parser to parse nested structures correctly. But that's something we can figure out later.

I recommend releasing this soon under the formatter Beta where we have some more wiggle room to make breaking changes if any of our assumptions were wrong.

pass


def markdown_rst_directive():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test suite and I like how you documented the cases.

Should we add a test where the code block itself contains closing quotes at a different indentation level:

def test() {
	"""
  You can use this function to render markdown, even conditionally.

	```py
	if 5 == 4:
		a = """
			It supports **code blocks** 
			```py
				print("Hello World")
			```
  	"""

A simplified version that triggered me to make this example is that

def test() {
	"""
  Code block with incorrectly placed closing fences
	
	```python
	
		 ```
	  # The above fences should not close the code block
		a = 10
	```

	Now the code block is closed
	"""

Copy link
Member Author

@BurntSushi BurntSushi Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the first case, the """ ends the docstring and makes the entire thing invalid Python. I tried playing with it a bit to massage it into a more "useful" test, but couldn't get to anything I liked.

In the second case, we do actually allow the first set of closing fences to close the block. Markdown technically allows up to three spaces of indentation before the start of the fences, but I think that's a little tricky to enforce in this context given that we're in a docstring. So I ended up just deciding that Markdown fences could be indented as much as possible. This isn't as much of an issue here, since we specifically do not support (at least, not yet anyway) the sort of Markdown code block that is inserted purely by indentation and not fences. I added this as a test case though.

@BurntSushi
Copy link
Member Author

My only fear is into how many weird edge cases we'll run where we would need a full markdown parser to parse nested structures correctly. But that's something we can figure out later.

Yeah right, this PR only covers naked code fence blocks. It won't work with fenced code blocks inside of quote blocks for example. I do indeed feel like that's something we can add later if there's demand for it. And yeah, that may wind up requiring a full Markdown parser (pulldown-cmark perhaps).

I recommend releasing this soon under the formatter Beta where we have some more wiggle room to make breaking changes if any of our assumptions were wrong.

Yeah the plan is to get this out soon. Basically want to get this merged, get #8855 fixed up and then write a blog post. :-)

This commit slots in support for formatting Markdown fenced code
blocks[1]. With the refactoring done for reStructuredText previously,
this ended up being pretty easy to add. Markdown code blocks are also
quite a bit easier to parse and recognize correctly.

One point of contention in #8860 is whether to assume that unlabeled
Markdown code fences are Python or not by default. In this PR, we make
such an assumption. This follows what `rustdoc` does. The mitigation
here is that if an unlabeled code block isn't Python, then it probably
won't parse as Python. And we'll end up skipping it. So in the vast
majority of cases, the worst thing that can happen is a little bit of
wasted work.

Closes #8860

[1]: https://spec.commonmark.org/0.30/#fenced-code-blocks
@BurntSushi
Copy link
Member Author

Going to bring this in. As usual, if y'all have any more feedback, I'd be happy to address it in follow-up PRs.

@BurntSushi BurntSushi merged commit 04ec11a into main Dec 7, 2023
17 checks passed
@BurntSushi BurntSushi deleted the ag/fmt/markdown branch December 7, 2023 19:30
@ofek
Copy link
Contributor

ofek commented Dec 7, 2023

Is it supported to have additional text after the language marker? For example: https://github.com/pypa/hatch/blob/52a81d80cd03bfb90d99a5837c778971ea0c8c22/src/hatch/env/plugin/interface.py#L23-L41

BurntSushi added a commit that referenced this pull request Dec 7, 2023
@ofek asked [about this][ref]. I did specifically add support for it,
but neglected to add a test. This PR adds a test.

[ref]: #9030 (comment)
@BurntSushi
Copy link
Member Author

Ah yeah good catch. I did indeed specifically add support for that, but neglected to add a test. I added one in #9050.

BurntSushi added a commit that referenced this pull request Dec 8, 2023
@ofek asked [about this][ref]. I did specifically add support for it,
but neglected to add a test. This PR adds a test.

[ref]:
#9030 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docstring Related to docstring linting or formatting formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

docstring code formatter: add support for Markdown Python code snippets
4 participants