Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider unterminated f-strings in FStringRanges #8154

Merged
merged 1 commit into from Oct 27, 2023

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Oct 24, 2023

Summary

This PR removes the debug_assertion in the Indexer to allow unterminated f-strings. This is mainly a fix in the development build which now matches the release build.

The fix is simple: remove the debug_assertion which means that the there could be FStringStart and possibly FStringMiddle tokens without a corresponding f-string range in the Indexer. This means that the code requesting for the f-string index need to account for the None case, making the code safer.

This also updates the code which queries the FStringRanges to account for the None case. This will happen when the FStringStart / FStringMiddle tokens are present but the FStringEnd token isn't which means that the Indexer won't contain the range for that f-string.

Test Plan

cargo test

Taking the following code as an example:

f"{123}

This only emits a FStringStart token, but no FStringMiddle or FStringEnd tokens.

And,

f"\.png${

This emits a FStringStart and FStringMiddle token, but no FStringEnd token.

fixes: #8065

@dhruvmanila
Copy link
Member Author

dhruvmanila commented Oct 24, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

Comment on lines 95 to 97
if !self.start_locations.is_empty() {
debug!(
"Unterminated f-strings detected at: {:?}",
self.start_locations
);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this isn't really required because we'd be providing diagnostics for the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonnder if it would be useful to store the range with a missing and position instead of omitting it entirely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To extend on this: It might even be more correct to simply assume that everything after the fstring start token is inside the fstring if the end token is missing. However, what the correct interpretation is might depend on the specific usage. E.g. adding noqa comments might be a bad idea for unclosed fstring. Rules testing if the range is part of a multiline string could get away by assuming the string goes to the end of the file

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of the same in one of my proposed solution.

For rules, they're only being used to check W605 (invalid escape sequence) and ISC (implicit string concatenation) which will be fine. For the former, the fix will be created (see below) while for later the fix won't be created.

-f"\.png{
+rf"\.png{

The NoQA logic will need to be looked into though. We might have to somehow mark these ranges as "incomplete from source code point of view".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also fine with adding the entire range, but just omitting the range of an unterminated string seems wrong to me (I rather overestimate than underestimate the f-string range)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably avoid adding NoQA directives then otherwise it won't be lexed as a comment and the --add-noqa will keep on adding them.

f"\.png
  # noqa: W605
  # noqa: W605
  # ...

I'm not sure how to store this info. One solution could be to update the signature from BTreeMap<TextSize, TextRange> to BTreeMap<TextSize, FStringRange> where:

struct FStringRange {
	range: TextRange,
	// Is this f-string complete i.e., does it have both the start and end tokens?
	complete: bool
}

Or, while computing the NoQA mapping we can ignore these ranges by somehow checking if they're for a complete f-string or not.

@charliermarsh Any thoughts here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did change the type to TreeMap<TextSize, FStringRange>, where would we then use that information to avoid adding NOQA?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier I thought that ignoring such ranges here would work:

// For nested f-strings, we expect `noqa` directives on the last line of the
// outermost f-string. The last f-string range will be used to skip over
// the inner f-strings.
let mut last_fstring_range: TextRange = TextRange::default();
for fstring_range in indexer.fstring_ranges().values() {
if !locator.contains_line_break(*fstring_range) {
continue;
}
if last_fstring_range.contains_range(*fstring_range) {
continue;
}
let new_range = TextRange::new(
locator.line_start(fstring_range.start()),
fstring_range.end(),
);
fstring_mappings.push(new_range);
last_fstring_range = new_range;
}

But, I don't think so as then, by default, Ruff will add the directive at the end of the line.

I think we can update this function to return a Option<TextSize> instead:

pub(crate) fn resolve(&self, offset: TextSize) -> TextSize {
let index = self.ranges.binary_search_by(|range| {
if range.end() < offset {
std::cmp::Ordering::Less
} else if range.contains(offset) {
std::cmp::Ordering::Equal
} else {
std::cmp::Ordering::Greater
}
});
if let Ok(index) = index {
self.ranges[index].end()
} else {
offset
}
}

which will be used here and continue on None:

let noqa_offset = noqa_line_for.resolve(diagnostic.start());

Note that even if we somehow support this, it'll be impossible to ignore any violation on that line manually. For example,

# Unterminated f-string
f"\.png{x}  # noqa: W605

Here, the noqa comment is part of the f-string and is not a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan you described here makes sense to me. Thanks for walking me through it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've a local branch with a possible implementation for this but there are a few complexities involved. I'll proceed with merging this and open a new PR with that change instead.

@dhruvmanila dhruvmanila added the bug Something isn't working label Oct 24, 2023
@github-actions
Copy link

github-actions bot commented Oct 24, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

@konstin
Copy link
Member

konstin commented Oct 24, 2023

Shouldn't unterminated f-strings be caught in the parser?

@dhruvmanila
Copy link
Member Author

Shouldn't unterminated f-strings be caught in the parser?

Yes, they're being caught. The issue which was highlighted in the linked issue is a debug assertion present in constructing the f-string ranges in the indexer. And, the indexer uses the flattened tokens for that purpose which ignores the error tokens.

For the following code snippet,

f"{123}

We would've panicked but only in the debug build which is what this PR fixes.

Actually, the release build could panic as well with:

f"\.png${

And, running with:

$ pipx run "ruff==0.1.1" check --isolated --no-cache --select=W605,E999 ~/playground/ruff/fstring.py --fix
error: Panicked while linting /Users/dhruv/playground/ruff/fstring.py: This indicates a bug in Ruff. If you could open an issue at:

    https://github.com/astral-sh/ruff/issues/new?title=%5BLinter%20panic%5D

...with the relevant file contents, the `pyproject.toml` settings, and the following stack trace, we'd be very appreciative!

panicked at 'called `Option::unwrap()` on a `None` value', crates/ruff_linter/src/rules/pycodestyle/rules/invalid_escape_sequence.rs:175:18
Backtrace:    0: _rust_eh_personality
   1: _main
   2: _rust_eh_personality
   3: _rust_eh_personality
   4: _rust_eh_personality
   5: _rust_eh_personality
   6: __rjem_je_witnesses_cleanup
   7: __rjem_je_witnesses_cleanup
   8: _main
   9: _main
  10: _main
  11: _main
  12: _main
  13: _main
  14: _main
  15: _main
  16: start
  17: start
  18: _main

Comment on lines -113 to -124
(Tok::String { .. }, Tok::FStringStart) => (
*a_range,
indexer.fstring_ranges().innermost(b_range.start()).unwrap(),
),
(Tok::FStringEnd, Tok::String { .. }) => (
indexer.fstring_ranges().innermost(a_range.start()).unwrap(),
*b_range,
),
(Tok::FStringEnd, Tok::FStringStart) => (
indexer.fstring_ranges().innermost(a_range.start()).unwrap(),
indexer.fstring_ranges().innermost(b_range.start()).unwrap(),
),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These unwrap's aren't safe at all as I thought earlier. Ruff can panic on certain edge cases. It's better to account for the None case.

@konstin
Copy link
Member

konstin commented Oct 24, 2023

Yes, they're being caught. The issue which was highlighted in the linked issue is a debug assertion present in constructing the f-string ranges in the indexer. And, the indexer uses the flattened tokens for that purpose which ignores the error tokens.

Sorry, i missed that the index is built during parsing, not after

@charliermarsh
Copy link
Member

I believe the Indexer is built before parsing, but not as part of parsing itself. We need to build it regardless of whether the source code is valid, because we do support linting invalid programs (at least for the token-based rules).

@dhruvmanila dhruvmanila force-pushed the dhruv/unterminated-fstring branch 2 times, most recently from 1e980fd to 29fb86e Compare October 25, 2023 13:43
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we express the fact that unterminated f-strings are not stored in the return type of innermost rather than just a note but are fine moving forward with this solution.

@dhruvmanila dhruvmanila linked an issue Oct 27, 2023 that may be closed by this pull request
@dhruvmanila dhruvmanila reopened this Oct 27, 2023
@dhruvmanila dhruvmanila changed the title Allow unterminated f-strings in the indexer Consider unterminated f-strings in FStringRanges Oct 27, 2023
@dhruvmanila dhruvmanila enabled auto-merge (squash) October 27, 2023 11:08
@dhruvmanila dhruvmanila merged commit 097e703 into main Oct 27, 2023
33 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/unterminated-fstring branch October 27, 2023 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Linter panic] Trouble handling a syntax error Panic with unfinished f-string
4 participants