Detect and ignore Jupyter automagics #8398

Merged
charliermarsh merged 2 commits into main from charlie/magics on Nov 3, 2023

Conversation

charliermarsh
Member

Summary

LangChain is attempting to use Ruff over their Jupyter notebooks (https://github.com/langchain-ai/langchain/pull/12677/files), but is running into a bunch of syntax errors, the majority of which come from our inability to recognize automagics.

If you run this in a cell:

pip install requests

Jupyter will automatically treat that as:

%pip install requests

We need to ignore cells that use these automagics, since the parser doesn't understand them. (I guess we could support them in the parser, but that seems much harder?) The good news is that AFAICT Jupyter doesn't let you mix automagics with code, so by skipping these cells, we don't miss out on analyzing any Python code.
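For illustration, the detection described above might look roughly like the following. This is a minimal sketch, not the code in this PR: the `AUTOMAGICS` list is a small hand-picked subset, the helper names are hypothetical, and checking every line of the cell is an assumption.

```rust
/// Illustrative subset of IPython line magics that can be used as automagics.
/// (Assumption: the real list is much longer.)
const AUTOMAGICS: &[&str] = &["alias", "cd", "pip", "pwd", "who_ls", "whos", "xdel", "xmode"];

/// Returns `true` if the first whitespace-separated token on `line`
/// is one of the known automagic names.
fn is_automagic_line(line: &str) -> bool {
    line.split_whitespace()
        .next()
        .is_some_and(|token| AUTOMAGICS.contains(&token))
}

/// Returns `true` if any line of the cell looks like an automagic,
/// in which case the whole cell would be skipped.
fn should_skip_cell(source: &str) -> bool {
    source.lines().any(is_automagic_line)
}
```

With this sketch, `should_skip_cell("pip install requests")` is `true`, while a cell containing only ordinary Python such as `import requests` is not skipped.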

Test Plan

  1. cargo test
  2. Ran over LangChain and verified that there are no more errors relating to pip install automagics.

| "who_ls"
| "whos"
| "xdel"
| "xmode"
Member Author

Obviously some risk of this getting stale but I kind of doubt these change dramatically over time.

Member

I agree. There are user-defined magics as well but I think that's out of scope 😅

@charliermarsh charliermarsh force-pushed the charlie/magics branch 2 times, most recently from 414bbde to 87af9fa Compare October 31, 2023 23:29
@MichaReiser MichaReiser added cli Related to the command-line interface and removed cli Related to the command-line interface labels Oct 31, 2023
@charliermarsh charliermarsh added the bug Something isn't working label Oct 31, 2023
Member

@MichaReiser MichaReiser left a comment

LGTM, leaving it to @dhruvmanila, the IPython expert, to approve.


/// Returns `true` if a cell should be ignored due to the use of cell magics.
fn is_magic_cell(line: &str) -> bool {
let line = line.trim_start();
Member

Nit: Trim Python whitespace only?

// ```
//
// See: https://ipython.readthedocs.io/en/stable/interactive/magics.html
if line.split_whitespace().next().is_some_and(|token| {
Member

Nit: Split on whitespace only
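To make the two nits concrete: `str::trim_start` and `str::split_whitespace` use the Unicode `White_Space` property, which is broader than what Python treats as whitespace between tokens. A rough sketch of the narrower behavior, assuming "Python whitespace" means space, tab, and form feed, and using hypothetical helper names:

```rust
/// Assumption: Python whitespace is limited to space, tab, and form feed.
fn is_python_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\x0C')
}

/// Strips leading Python whitespace only, leaving other Unicode
/// whitespace (for example, a no-break space) in place.
fn trim_start_python_whitespace(line: &str) -> &str {
    line.trim_start_matches(is_python_whitespace)
}

/// Returns the first token of a single line (no line ending),
/// splitting on Python whitespace only.
fn first_token(line: &str) -> Option<&str> {
    trim_start_python_whitespace(line)
        .split(is_python_whitespace)
        .next()
        .filter(|token| !token.is_empty())
}
```

The difference shows up on input like `"\u{a0}pip"`: `trim_start` removes the no-break space, whereas the helper above leaves it, so the line would no longer be mistaken for a `pip` automagic.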

PR Check Results

Ecosystem

✅ ecosystem check detected no linter changes.

Member

@dhruvmanila dhruvmanila left a comment

I think this is a reasonable approach. In the future, we could alter the AST to store the magic command name separately from the rest of the command value. That would make it easier to add dedicated logic for specific magic commands.
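As a rough picture of that idea, a node could carry the magic name and its arguments as separate fields. The `MagicCommand` type and both functions below are hypothetical and are not part of Ruff's AST; the sketch only handles explicit single-`%` line magics.

```rust
/// Hypothetical node that stores the magic name apart from its arguments.
#[derive(Debug, PartialEq)]
struct MagicCommand<'a> {
    name: &'a str,
    args: &'a str,
}

/// Splits an explicit line magic such as `%pip install requests`
/// into its name and the remaining arguments.
fn split_magic(command: &str) -> Option<MagicCommand<'_>> {
    let rest = command.trim_start().strip_prefix('%')?;
    let (name, args) = match rest.split_once(char::is_whitespace) {
        Some((name, args)) => (name, args.trim_start()),
        None => (rest, ""),
    };
    // Reject a bare `%` and cell magics (`%%name`), which this sketch ignores.
    if name.is_empty() || name.starts_with('%') {
        return None;
    }
    Some(MagicCommand { name, args })
}

/// Example of logic dedicated to one specific magic command.
fn is_package_install(magic: &MagicCommand<'_>) -> bool {
    magic.name == "pip" && magic.args.split_whitespace().next() == Some("install")
}
```

With the name available as its own field, a check like `is_package_install` does not need to re-parse the raw command string.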


/// Returns `true` if a cell should be ignored due to the use of cell magics.
fn is_magic_cell(line: &str) -> bool {
let line = line.trim_start();
Member

The trim_start is required because the magics can be at any indentation level but it seems like the logic is different for auto-magics. For example, the following is invalid:

if True:
	pwd

But, then the following is valid:

	pwd
# ^^ unnecessary indentation

I think it's fine to go forward with this as it seems difficult to detect this without the surrounding context.

Screenshot 2023-11-01 at 08 52 43

Member Author

Can we still trim, but only test the first line, rather than all subsequent lines?

Member Author

Hmm, we actually run the risk of some false positives here...

For example, I guess we'd now flag alias + 1 as an automagic, incorrectly.

We may need to try parsing this cell...? And fall back to automagics?

Member

> Can we still trim, but only test the first line, rather than all subsequent lines?

That risks missing cases where there's Python code before the magic commands.

> For example, I guess we'd now flag alias + 1 as an automagic, incorrectly.

Are you referring to alias as a variable name? Yes, that's basically a risk for all escape commands 😅

> We may need to try parsing this cell...? And fall back to automagics?

How would this help?
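The trade-off in this thread can be sketched as follows. The function names and the short `KNOWN_AUTOMAGICS` list are hypothetical; the point is only to contrast checking every line with checking the first non-blank line, and to show that both flag `alias + 1`.

```rust
/// Illustrative subset of automagic names (assumption, not the full list).
const KNOWN_AUTOMAGICS: &[&str] = &["alias", "cd", "pip", "pwd"];

/// Returns `true` if the line's first token is a known automagic name.
fn starts_with_automagic(line: &str) -> bool {
    line.split_whitespace()
        .next()
        .is_some_and(|token| KNOWN_AUTOMAGICS.contains(&token))
}

/// Checks every line, so Python code before a magic does not hide it.
fn any_line_is_automagic(cell: &str) -> bool {
    cell.lines().any(starts_with_automagic)
}

/// Checks only the first non-blank line of the cell.
fn first_line_is_automagic(cell: &str) -> bool {
    cell.lines()
        .find(|line| !line.trim().is_empty())
        .is_some_and(starts_with_automagic)
}
```

For a cell like `import os` followed by `pwd`, only the every-line check matches. For `alias + 1`, where `alias` is a variable, both checks match, which is the false positive discussed above.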

crates/ruff_notebook/src/notebook.rs
@konstin
Member

konstin commented Nov 2, 2023

image

😵‍💫

@charliermarsh
Member Author

@dhruvmanila - Thought on this a bit more, and it seems like a reasonable approach for now. If a user does have a cell that consists of code, but starts with a standalone automagic, then the behavior of that cell will vary based on the order in which the cells are executed. So I think it's ok to assume it's an automagic. But I'm definitely open to refinement.

github-actions bot commented Nov 3, 2023

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@charliermarsh charliermarsh merged commit f64c389 into main Nov 3, 2023
16 checks passed
@charliermarsh charliermarsh deleted the charlie/magics branch November 3, 2023 01:14