Analyses Byte code(.pyc) or source code(.py)? #27

Open
KarthickRaja2002 opened this issue May 11, 2023 · 1 comment

Comments


KarthickRaja2002 commented May 11, 2023

Hi @RootLUG,

I need a few clarifications on the questions below:

  • Your documentation states that Aura can analyse both binary and Python files. If I give it a source file (.py), does Aura perform its analysis by compiling the source code to bytecode (.py to .pyc), or does it work on the source code alone?

  • How is Aura able to construct the AST for both Python versions (py2k and py3k) in the same installation of Aura?

  • When I give source code as input, Aura correctly finds all the detections. However, when I give it the corresponding bytecode file, it shows zero (0) detections. Why is that?
    Sample case:
    If test.py is the input file, Aura finds 3 detections.
    Similarly, if test.pyc is the input file, Aura finds 0 detections.

Thank you.

Contributor

RootLUG commented May 26, 2023

Hello, there are different things to consider:

  1. The binary file detections, or raw file detections (a more appropriate term in this context), are intended for everything that is not Python source code, e.g. XML, HTML, txt, PNG, zip, exe and any other files. They are used to scan non-Python content for problems, vulnerabilities, or malicious intent, such as malformed XML, decompression bombs, and generic things via Yara rules. They are not intended to function as a Python bytecode analyzer; to put it differently, there is no specific bytecode analyzer built into Aura.
  2. Analyzing bytecode is a very hard problem. It is essentially analyzing a completely different language because of how it works (the Python VM is a stack-based interpreter). Every detection mechanism would need to be rewritten to work on bytecode, so it would be like adding support for a new language to Aura. There is a proof-of-concept script in this repository that is able to parse the bytecode, but it is not used for anything yet. There may be limited support for bytecode in the future, but that is not planned yet because of the massive amount of work it would require; contributions on this are welcome. The other alternative is to decompile/translate the bytecode back to Python source code, but that is also a no-go: it is very unreliable, hard to do across different Python versions, gives no guarantee that the resulting code is equivalent, makes the detection output confusing for users, etc. Most of the problems in decompiling are related to the famous "Halting problem", so it will work only for a subset of inputs at best.
  3. Aura is able to analyze py2k and py3k code by using the "interpreters" configuration in YAML. It takes the configured interpreters (or the default ones) and iterates over them to see which one is able to parse the given code via the built-in ast module. Once it finds the right Python version, that interpreter parses the code, extracts metadata and enriches the code with additional information, serializes the result to JSON, and sends it back to Aura for analysis. So, strictly speaking, Aura itself is not parsing the code; it spawns different Python versions in the background and uses them to parse it. It also does not execute the input code at any point, for security reasons (possible malware); it only parses the input via the built-in ast module and serializes the result. The code that is injected into the spawned Python interpreter to parse the code is located at aura/analyzers/python_src_inspector.py
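
To make point 3 more concrete, here is a minimal sketch of the general idea: try each interpreter in turn, let it parse the source with its own built-in `ast` module, and get the tree back as JSON. This is not Aura's actual code. The names `INSPECTOR`, `to_dict` and `parse_with_interpreters`, as well as the shape of the `interpreters` mapping, are made up for illustration; the real logic lives in the inspector file mentioned above.

```python
import json
import subprocess
import sys

# Hypothetical inspector script that gets injected into each candidate
# interpreter. It reads source from stdin, parses it with the built-in ast
# module (parsing only, the input is never executed) and prints JSON.
INSPECTOR = r'''
import ast, json, sys

def to_dict(node):
    # Recursively turn an AST node into JSON-friendly data.
    if isinstance(node, ast.AST):
        out = {"_type": type(node).__name__}
        for name, value in ast.iter_fields(node):
            out[name] = to_dict(value)
        return out
    if isinstance(node, list):
        return [to_dict(item) for item in node]
    if isinstance(node, (str, int, float, bool)) or node is None:
        return node
    return repr(node)  # bytes, complex, Ellipsis, ...

try:
    tree = ast.parse(sys.stdin.read())
except SyntaxError:
    sys.exit(1)  # this interpreter cannot parse the input
json.dump(to_dict(tree), sys.stdout)
'''


def parse_with_interpreters(source, interpreters):
    """Return (name, tree) from the first interpreter that parses `source`.

    `interpreters` maps a label to an executable path (assumed layout).
    Returns (None, None) if no interpreter accepts the source.
    """
    for name, path in interpreters.items():
        try:
            proc = subprocess.run(
                [path, "-c", INSPECTOR],
                input=source,
                capture_output=True,
                text=True,
                timeout=30,
            )
        except (OSError, subprocess.TimeoutExpired):
            continue  # interpreter missing or unresponsive, try the next one
        if proc.returncode == 0:
            return name, json.loads(proc.stdout)
    return None, None
```

For example, `parse_with_interpreters(code, {"python3": sys.executable})` returns the label of the interpreter that accepted the code together with a nested dict whose root has `"_type": "Module"`. Python 2 style input such as `print 'hello'` is rejected by a Python 3 interpreter, so the loop would move on to the next configured interpreter.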
