Support inlining of callees during decompilation #4595

Merged
ltfish merged 15 commits into master from feat/inlined-decompilation
May 24, 2024

Conversation

Member

@zardus zardus commented Apr 20, 2024

Traditionally, we decompile one function at a time, but this can be annoying with:

  • complex data flow through small handlers
  • wrapper functions
  • all sorts of cases where you don't want to keep clicking into things

This PR adds support to angr for inlining functions during decompilation, subjecting the resulting super-function to our optimizations (such as constant propagation, which can eliminate portions of inlined functions). The functionality is enabled by passing inline_functions, a set of Function objects (current FIXME: it just checks addresses), to Decompiler (which passes it on to Clinic).

TODOs:

  • add testcases
  • make inline_functions consistent with its type annotation
  • check to see if the callsite_maker hackiness is still needed now that we're more careful about optimizations

The core inlining code is based on original exploration by @mrT4ntr4 (which doesn't apply cleanly to modern Clinic, otherwise I'd have included the commits to preserve history) --- thanks, Suraj!

@zardus zardus force-pushed the feat/inlined-decompilation branch 3 times, most recently from 7665311 to 224858c Compare April 20, 2024 21:08
Member Author

zardus commented Apr 21, 2024

Usage example, compiled as a .so with -O1:

Source:

#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int __attribute__((noinline)) five()
{
        return 5;
}

void __attribute__((noinline)) bar(int x)
{
        char buf[1024];
        read(0, buf, 2);
        read(0, buf, x);
}

void __attribute__((noinline)) foo(int x)
{
        if (x == 1337)
        {
                foo(42);
                char buf[128];
                read(0, buf, 8);
                read(0, buf, *(int *)buf);
        }
        else
        if (x) puts("T");
        else puts("F");
        bar(x);
}

char * __attribute__((noinline)) mylloc(int size)
{
        return malloc(size);
}

int main()
{
        char buf[1024];
        puts(">");
        read(0, buf, 10);
        foo(1);
        read(0, mylloc(five()), 10);
        foo(0);
        foo(1337);
        mylloc(five());
        bar(five()+1);
        bar(five()+five()+five());
        bar(five());
        bar(five()+1);
        mylloc(five()+five()+five());
        bar(10);
}

Script:

import angr

p = angr.Project("a.so", auto_load_libs=False)
cfg = p.analyses.CFG(normalize=True)
f = p.kb.functions['main']
d = p.analyses.Decompiler(
    f, cfg,
    # callees to inline into main
    inline_functions={ p.kb.functions['mylloc'], p.kb.functions['five'] },
    # enable the first entry in the decompilation options list
    options=[ ( angr.analyses.decompiler.decompilation_options.options[0], True ) ]
)
print(d.codegen.text)

Results in:

int main()
{
    char v0;  // [bp-0x428]

    puts(">");
    read(0, &v0, 10);
    foo(1);
    read(0, malloc(5), 10);
    foo(0);
    foo(1337);
    malloc(5);
    bar(6);
    bar(15);
    bar(5);
    bar(6);
    malloc(15);
    bar(10);
    return 0;
}
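
The folding visible in this output (bar(five()+1) becoming bar(6), mylloc(five()) becoming malloc(5)) comes from inlining followed by constant propagation. The toy sketch below illustrates only that idea, on Python expression trees. It is not angr's implementation, which operates on the function's IR inside Clinic; the CALLEES table and the Inliner, Folder, and inline_and_fold names are hypothetical and exist only for this illustration.

```python
import ast  # ast.unparse requires Python 3.9+

# Hypothetical callee table: function name -> the constant expression it returns.
# angr inlines whole callee graphs; this toy only handles constant-returning callees.
CALLEES = {"five": "5"}


class Inliner(ast.NodeTransformer):
    """Replace zero-argument calls to known callees with their return expression."""

    def visit_Call(self, node):
        self.generic_visit(node)  # inline inside the arguments first
        if (isinstance(node.func, ast.Name)
                and node.func.id in CALLEES
                and not node.args):
            return ast.parse(CALLEES[node.func.id], mode="eval").body
        return node


class Folder(ast.NodeTransformer):
    """Fold additions whose operands are both integer constants."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold nested additions first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            return ast.Constant(node.left.value + node.right.value)
        return node


def inline_and_fold(expr: str) -> str:
    """Inline known callees into expr, then fold the resulting constants."""
    tree = ast.parse(expr, mode="eval")
    tree = Inliner().visit(tree)
    tree = Folder().visit(tree)
    return ast.unparse(ast.fix_missing_locations(tree))


print(inline_and_fold("bar(five() + 1)"))                # bar(6)
print(inline_and_fold("bar(five() + five() + five())"))  # bar(15)
print(inline_and_fold("mylloc(five())"))                 # mylloc(5)
```

Calls that are not in the callee table (bar and mylloc here) are left untouched, which mirrors how foo and bar stay as calls in the output above when only mylloc and five are passed in inline_functions.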

Or inline them all:

import angr

p = angr.Project("a.so", auto_load_libs=False)
cfg = p.analyses.CFG(normalize=True)
f = p.kb.functions['main']
d = p.analyses.Decompiler(
    p.kb.functions['main'], cfg,
    inline_functions=f.functions_reachable(),
    options=[ ( angr.analyses.decompiler.decompilation_options.options[0], True ) ]
)
print(d.codegen.text)

Results in:

int main()
{
    char v0;  // [bp-0x26c0]
    char v1;  // [bp-0x22a0]
    char v2;  // [bp-0x1e88]
    char v3;  // [bp-0x1a70]
    unsigned long long v4;  // [bp-0x1660]
    char v5;  // [bp-0x1658]
    unsigned long v6;  // [bp-0x1248]
    char v7;  // [bp-0x1238]
    char v8;  // [bp-0xe20]
    unsigned long v9;  // [bp-0xd90]
    char v10;  // [bp-0xd88]
    unsigned long long v12;  // [bp-0x8e0]
    char v13;  // [bp-0x8d8]
    char v15;  // [bp-0x428]
    unsigned long v17;  // rbx
    unsigned long long v18;  // r12
    unsigned long long v19;  // rbx
    unsigned long v20;  // rbp

    puts(">");
    read(0, &v15, 10);
    puts("T");
    read(0, &v13, 2);
    __read_chk(0, &v13, 1, 0x400);
    read(0, malloc(5), 10);
    v12 = 5;
    puts("F");
    read(0, &v10, 2);
    __read_chk(0, &v10, 0, 0x400);
    v9 = v12;
    foo(42);
    read(0, &v8, 8);
    __read_chk(0, &v8, *((int *)&v8), 128);
    read(0, &v7, 2);
    __read_chk(0, &v7, 1337, 0x400);
    v17 = v9;
    malloc(v17);
    v18 = v17 + 1;
    v6 = v17;
    read(0, &v5, 2);
    __read_chk(0, &v5, v18, 0x400);
    v19 = v6;
    v20 = v19 * 3;
    v4 = v17;
    read(0, &v3, 2);
    __read_chk(0, &v3, v20, 0x400);
    read(0, &v2, 2);
    __read_chk(0, &v2, v4, 0x400);
    read(0, &v1, 2);
    __read_chk(0, &v1, v18, 0x400);
    malloc(v20);
    read(0, &v0, 2);
    __read_chk(0, &v0, 10, 0x400);
    return 0;
}
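
For readers wondering what "everything reachable from main" covers here, the sketch below computes a transitive callee set over a hand-written call graph taken from the C source above. It is an illustration of the concept only, not how functions_reachable() is implemented in angr; the CALL_GRAPH dictionary and the reachable helper are assumptions made for this example, and whether angr's method includes the starting function itself is not something this sketch settles.

```python
from collections import deque

# Call graph transcribed by hand from the C source in this thread
# (caller -> set of direct callees). Library functions are leaves.
CALL_GRAPH = {
    "main": {"puts", "read", "foo", "mylloc", "five", "bar"},
    "foo": {"foo", "read", "puts", "bar"},  # foo calls itself via foo(42)
    "bar": {"read"},
    "mylloc": {"malloc"},
    "five": set(),
}


def reachable(graph, start):
    """Return every function reachable from start through one or more calls."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already expanded; this also terminates on recursion
        seen.add(name)
        queue.extend(graph.get(name, ()))
    return seen


print(sorted(reachable(CALL_GRAPH, "main")))
print(sorted(reachable(CALL_GRAPH, "foo")))
```

The recursive edge from foo to itself is why a visited set is needed. In the fully inlined output above, the recursive call still appears as foo(42) rather than being expanded again.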

Member Author

zardus commented Apr 21, 2024

Not sure why test_decompiling_fmt_main is failing, but it passes for me on 2 different machines with 3 different python versions, so I'm going to declare it Not My Problem.

@ltfish interested in your advice on https://github.com/angr/angr/pull/4595/files#diff-adef200972def43293b03d09ffab132238dc517eebb0d69250ddd3e5a5a31e91R339 . This is where I remove calls and their windups, but I'm hardcoding the cases here.

Member

ltfish commented Apr 21, 2024

@zardus

 ======================================================================
ERROR: test_inlining (tests.analyses.decompiler.test_decompiler.TestDecompiler)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/angr/angr/build/src/angr/tests/analyses/decompiler/test_decompiler.py", line 3537, in test_inlining
    proj = angr.Project(bin_path, auto_load_libs=False)
  File "/__w/angr/angr/build/src/angr/angr/project.py", line 142, in __init__
    raise Exception("Not a valid binary file: %s" % repr(thing))
Exception: Not a valid binary file: '/__w/angr/angr/build/src/angr/tests/../../binaries/tests/x86_64/inline_gym.so'

Did you commit the inline_gym.so file?

@ltfish ltfish self-assigned this Apr 21, 2024
Member Author

zardus commented Apr 21, 2024

Yep, with the branch named the same. Not sure why that's not being picked up either.

Member

ltfish commented Apr 21, 2024

@zardus Just push to master. PRs no longer pick up branches from other repos with the same name.

@zardus zardus force-pushed the feat/inlined-decompilation branch from 1e661ef to f242d6a Compare April 22, 2024 23:01
Member

mahaloz commented Apr 24, 2024

@zardus have you tested with Python 3.8 (that's the version running in CI IIRC)?

@zardus zardus force-pushed the feat/inlined-decompilation branch 2 times, most recently from 520ac58 to a7f4583 Compare April 27, 2024 03:01
@ltfish ltfish force-pushed the feat/inlined-decompilation branch from a16a0a6 to 1a1b5b2 Compare May 24, 2024 22:00
@ltfish ltfish added the enhancement label (Some subsystem of angr needs tweaking) May 24, 2024
@ltfish ltfish merged commit 6222a6c into master May 24, 2024
17 checks passed
@ltfish ltfish deleted the feat/inlined-decompilation branch May 24, 2024 22:31
Member

mahaloz commented May 24, 2024

Noooo @zardus officially beat me.

Member Author

zardus commented May 24, 2024

It be like that sometimes
