Proposal: new hook to detect CVE-2021-42574 (Trojan Source Attacks) #683
Comments
I don't think that's true, but it is possible. I don't think this is easy to implement, especially if you are to handle homoglyphs (Α vs A), beyond banning all non-ASCII characters, which isn't productive. If you can come up with a concrete proposal it may be considered, but in the current state it isn't feasible.
I was wondering about this myself. Maybe we could do this by looking at Unicode character categories: I'd default to rejecting any code which contains characters in the control-sequence category. If that's still too aggressive because it rejects legitimate control characters in strings, and we're interested in this specifically for Python, I could look at writing something more targeted.
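For what it's worth, a check along those lines could be sketched with the standard library's `unicodedata` module. This is a rough illustration, not a worked-out hook, and `find_format_chars` is a name I made up:

```python
# Hypothetical sketch (not an existing hook): flag characters in the
# Unicode "format" category (Cf), which contains the BiDi controls.
import unicodedata


def find_format_chars(text):
    """Return (index, codepoint name) pairs for Cf-category characters."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# U+202E (RIGHT-TO-LEFT OVERRIDE) is invisible but gets reported here.
print(find_format_chars("access_level = 'user\u202e'"))
```

Note this would also flag other Cf characters such as zero-width spaces, which may or may not be what we want.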
Quoting from the paper, chapter V.F. Defenses:
So I'd say we can just implement that simplest defense approach, at least as a start. It can serve as a shield for any UTF-8-compatible language. For most projects this should be just fine, as there is usually a workaround, as explained in the quote. A string- and comment-aware variant could be the next iteration, but that depends on what counts as a string and a comment in each language, so it becomes more complex and language-specific.
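A rough sketch of that simplest defense, assuming the characters to reject are the nine BiDi controls discussed in the paper (my reading of it, not a vetted list):

```python
# The BiDi embedding, override, and isolate control characters.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202d\u202e\u202c"  # LRE, RLE, LRO, RLO, PDF
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def contains_bidi_controls(text):
    """True if any BiDi control character appears anywhere in the text."""
    return any(ch in BIDI_CONTROLS for ch in text)
```

This would reject BiDi controls everywhere, including legitimate uses in strings, which is the trade-off the quoted defense accepts.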
As stated above, that is insufficient.
@asottile, you mean it's insufficient for protecting against homoglyph confusion? That's true, but those attacks predate the recent realization that BiDi control characters can hide source. Most of our codebases are unlikely to contain directional control characters, even in strings and comments. I would like it best if such a hook were in this repo, but if not, I can write one separately.
just because it isn't "new" doesn't mean it isn't a problem:

```python
if user.type == 'ΑDMIN':  # the first letter is a Greek capital alpha, not 'A'
    ...
```
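(For anyone reading along: the condition above differs from `'ADMIN'` only in its first code point, which `unicodedata` can make visible.)

```python
# The two strings render identically in most fonts but compare unequal,
# because U+0391 is GREEK CAPITAL LETTER ALPHA, not LATIN CAPITAL LETTER A.
import unicodedata

assert "\u0391DMIN" != "ADMIN"
print(unicodedata.name("\u0391"))  # GREEK CAPITAL LETTER ALPHA
print(unicodedata.name("A"))       # LATIN CAPITAL LETTER A
```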
Sure, I won't argue that it isn't a problem, and maybe I shouldn't have brought up its age. But, as you said, there's nothing productive to do about that short of banning non-ASCII text. I don't see these two problems as inherently related. Are you worried that even if the check is named …?
I'm stating:
I think a check that verifies that files decode without errors in strict mode, with a given encoding supported by Python, would be useful to have on its own, not only for this purpose. For the thing at hand, configuring it to accept ASCII only would likely work just fine for many. For example, the hook could be called `check-encoding`.
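A sketch of what I mean; the function name and output format here are just for illustration:

```python
# Hypothetical hook body: fail if a file's bytes do not decode strictly
# with the configured codec (bytes.decode is strict by default).
def check_encoding(path, encoding="utf-8"):
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        print(f"{path}: not valid {encoding}: {exc}")
        return 1  # nonzero return fails the hook
    return 0
```

Configured with `encoding="ascii"`, it would reject any non-ASCII byte, which incidentally covers the BiDi controls too.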
`check-encoding` done in #685
I don't really think such a thing solves this problem -- most would use UTF-8 anyway.

I think the correct design for "solving" this issue is to come up with a set of glyph ranges that are banned -- but we need to decide what those are (and perhaps augment them in the future).

I think the best approach forward is to draft a set of characters to "ban" and build a checker around that (the checker itself would require strict UTF-8 as well). Thoughts?
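One possible shape for such a checker. The ranges below are only a strawman starting point, not the vetted list this thread is asking for:

```python
# Illustrative banned code-point ranges (inclusive); a real list would
# need review and could be extended over time.
BANNED_RANGES = (
    (0x202A, 0x202E),  # BiDi embeddings and overrides
    (0x2066, 0x2069),  # BiDi isolates
)


def banned_codepoints(raw):
    """Strictly decode as UTF-8, then report any banned code points."""
    text = raw.decode("utf-8")  # strict by default, as suggested above
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if any(lo <= ord(ch) <= hi for lo, hi in BANNED_RANGES)
    ]
```

Making the ranges configurable would let projects opt in to stricter or looser sets.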
Re check-encoding, no other thoughts besides repeating myself: I think such an encoding checker is useful on its own completely irrespective of this issue (I know I've missed one myself at times), and for some who can afford ASCII or want it for other reasons, it would work for catching this one as well. I'm not suggesting it as a complete fix for this issue. |
Any decent programming language already presents encoding issues as syntax errors, so I'm not sure where it would be useful.
This test is not limited to programming languages. |
I understand, but the vast majority of programming is writing code, so I don't think it's that useful.
Checking file encodings does not seem to be a solution to this issue. I write plenty of code that's all ASCII, but I also write code that uses non-ASCII characters. How would such a check help me?
This makes sense to me -- I can imagine how to implement it. Some thoughts:
As said above, "I'm not suggesting it as a complete fix for this issue." Even if it wouldn't help you, it could help others. Anyway, that's not the primary reason for the `check-encoding` check; it's more of a side effect. Let's continue discussing `check-encoding` in #685; I've already replied there to some of the related comments that were posted here.
As explained in https://trojansource.codes/ and https://www.python.org/dev/peps/pep-0672/, source code is susceptible to Trojan Source attacks, where strings contain invisible control characters that make a program behave in unexpected ways.
Since those characters are valid UTF-8 and invisible, they can slip past even manual source code review. We need automated tools to catch them.
This can affect many languages, so it would be great if a pre-commit hook were provided here that forbids committing such poisoned sources.