Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling nosec for multi-line strings #915

Merged
merged 1 commit into from Feb 27, 2023

Conversation

kfrydel
Copy link
Contributor

@kfrydel kfrydel commented Jun 28, 2022

This commit improves handling nosecs
in multi-line strings, like:

1. nosec_not_working = f"""
2.     SELECT * FROM {table}
3. """  # nosec

Before this change, bandit was checking if there is
a nosec in line 1. Now, in the case of ast Constants
it searches for nosec in the last line of the expression.

This commit also moves detecting nosec without test number
to test phase from "pre-visit" phase.
It may increase the time of performing checks but avoids
counting the same nosec mark multiple times.
"pre-visit" phase is run separately for each part of multi-line
string split by FormattedValue items. Thus for the above example,
it would be run twice, the first time for "\n SELECT * FROM "
and the second time for "\n" making the nosec being counted twice.

Resolves: #880

Copy link
Member

@ericwb ericwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think using the last line of linerange will fail for Python 3.7. Currently Bandit incorrectly reports the last line number for Py37. See issue #820

@@ -6,6 +6,7 @@
import logging
import os.path
import sys
from _ast import Constant
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use ast.Constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it should be ast. Fixed.

@kfrydel kfrydel force-pushed the issue_880 branch 2 times, most recently from 06b7c6a to 80b337d Compare July 8, 2022 13:34
@kfrydel
Copy link
Contributor Author

kfrydel commented Jul 8, 2022

So I think using the last line of linerange will fail for Python 3.7. Currently Bandit incorrectly reports the last line number for Py37. See issue #820

Thank you for taking a look at my changes.
I'm not sure if I understand. Do you mean that my changes are not going to work in Python 3.7?
get_nosec function returns context['lineno'] for strings in Python 3.7. Why? Because in the case of Python 3.7 strings are parsed to ast.Str. In the case of Python 3.10 they are parsed to ast.Constant. It looks like ast.Constant is used from 3.8 according to https://docs.python.org/3/library/ast.html:

Changed in version 3.8: Class ast.Constant is now used for all constants.

I did the following test:
file: string_examples.py

a = "string 1"
b = f"f-string {a}"
c = """multi-line
string
1"""
d = f"""multi-line
f-string
{a}"""

file: ast_example.py

import ast

with open('string_examples.py', 'r') as f:
    ast_tree = ast.parse(f.read())

    for node in ast_tree.body:
        print(ast.dump(node))

And the output is:

  1. python 3.7:
Assign(targets=[Name(id='a', ctx=Store())], value=Str(s='string 1'))
Assign(targets=[Name(id='b', ctx=Store())], value=JoinedStr(values=[Str(s='f-string '), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1, format_spec=None)]))
Assign(targets=[Name(id='c', ctx=Store())], value=Str(s='multi-line\nstring\n1'))
Assign(targets=[Name(id='d', ctx=Store())], value=JoinedStr(values=[Str(s='multi-line\nf-string\n'), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1, format_spec=None)]))
  1. python 3.8:
Assign(targets=[Name(id='a', ctx=Store())], value=Constant(value='string 1', kind=None), type_comment=None)
Assign(targets=[Name(id='b', ctx=Store())], value=JoinedStr(values=[Constant(value='f-string ', kind=None), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1, format_spec=None)]), type_comment=None)
Assign(targets=[Name(id='c', ctx=Store())], value=Constant(value='multi-line\nstring\n1', kind=None), type_comment=None)
Assign(targets=[Name(id='d', ctx=Store())], value=JoinedStr(values=[Constant(value='multi-line\nf-string\n', kind=None), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1, format_spec=None)]), type_comment=None)
  1. python 3.10:
Assign(targets=[Name(id='a', ctx=Store())], value=Constant(value='string 1'))
Assign(targets=[Name(id='b', ctx=Store())], value=JoinedStr(values=[Constant(value='f-string '), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1)]))
Assign(targets=[Name(id='c', ctx=Store())], value=Constant(value='multi-line\nstring\n1'))
Assign(targets=[Name(id='d', ctx=Store())], value=JoinedStr(values=[Constant(value='multi-line\nf-string\n'), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1)]))

@frenzymadness
Copy link

Thank you for working on this. This also fixes #658 reported in December 2020.

@kfrydel
Copy link
Contributor Author

kfrydel commented Aug 23, 2022

Hi @ericwb , would you find a while to take a look at this pull request? If you think it should be fixed in another way, do not hesitate to say so. I'm willing to spend some more time on this if needed.

@sshishov
Copy link

Hello guys, can we expect this PR to be merged some time soon?

Copy link
Member

@sigmavirus24 sigmavirus24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include a test for implicit concatenation over multiple lines and the behavior for those cases? I'm not sure how python will report the lineno on 3.7 or the range on 3.8+. Also, uncertain where on those we should expect to see a nosec

This commit improves handling nosecs
in multi-line strings, like:

1. nosec_not_working = f"""
2.     SELECT * FROM {table}
3. """  # nosec

Before this change, bandit was checking if there is
a nosec in line 1. Now, it searches for nosec in all
lines of the expression.

In python 3.7, linerange for a multiline expression is sqeezed to
first line. Thus, if nosec is set  in the second or further line
then it is not taken into account by bandit.

This commit also moves detecting nosec without test number
to test phase from "pre-visit" phase.
It may increase the time of performing checks but avoids
counting the same nosec mark multiple times.
"pre-visit" phase is run separately for each part of multi-line
string split by FormattedValue items. Thus for the above example,
it would be run twice, the first time for "\n    SELECT * FROM "
and the second time for "\n" making the nosec being counted twice.

Resolves: PyCQA#880
@kfrydel
Copy link
Contributor Author

kfrydel commented Feb 27, 2023

I added tests for implicit concatenation cases and it turned out that my solution didn't work. I changed the approach and now I search for nosec in every line reported in linerange property. I also abandoned using lineno property and always use linerange so the code became a bit simpler.
nosec can be now placed at any line of "implicit concatenation" expression.
Except python 3.7. Indeed, as @ericwb said, we do not have the last line of the analyzed expression thus in the case of python 3.7 nosec has to be at the first line of the "implicit concatenation" expression. And to be honest, I have no idea on how to fix this.

In the case of a multiline string, as shown below, nosec is detected when it is placed at the last line:

f"""
SELECT *
FROM foo
WHERE id = {identifier}
"""  # nosec

independently from the python version.

@kfrydel kfrydel requested review from sigmavirus24 and removed request for ericwb February 27, 2023 12:10
@sigmavirus24 sigmavirus24 merged commit 72fa5a7 into PyCQA:main Feb 27, 2023
@kfrydel
Copy link
Contributor Author

kfrydel commented Feb 27, 2023

@sigmavirus24 Thank you!

@JRemitz
Copy link

JRemitz commented Mar 10, 2023

Not sure but this change appears to have broken our Python 3.7 runs. We had tests getting skipped and now none a portion of them are. Still investigating root cause.

@kfrydel
Copy link
Contributor Author

kfrydel commented Mar 10, 2023

@JRemitz Please let me know if you have examples that should be skipped and are not.

@JRemitz
Copy link

JRemitz commented Mar 10, 2023

Definitely and I think I'm eating my words... we had a reduction in "skipped" tests, but the new ones were not previously skipped, they were two of the new rules that were added. I am able to correlate those, I just need to dig up why the skipped tests count dropped. I'll let you know if that has any relationship to this change. Apologies for the noise!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

#nosec doesn't work with multi-line strings and Python 3.10
6 participants