feat(r2): backtrace and control flow #1213

chinggg · 2022-08-24T02:21:05Z

Checklist

Which kind of PR do you create?

This PR only contains minor fixes.
This PR contains major feature update.
This PR introduces a new function/api for Qiling Framework.

Coding convention?

The new code conforms to Qiling Framework naming convention.
The imports are arranged properly.
Essential comments are added.
The reference of the new code is pointed out.

Extra tests?

No extra tests are needed for this PR.
I have added enough tests for this PR.
Tests will be added after some discussion and review.

Changelog?

This PR doesn't need to update Changelog.
Changelog will be updated after some proper review.
Changelog has been updated in my PR.

Target branch?

The target branch is dev branch.

One last thing

I have read the contribution guide

Now I have implemented callstack and backtrace, though the features have not been fully tested yet. I am currently working on deflat as a show case of control flow analysis ability

wtdcode · 2022-08-24T19:02:14Z

examples/extensions/r2/hello_r2.py

@@ -35,6 +35,7 @@ def my_sandbox(path, rootfs):
    ql.hook_address(func, r2.functions['main'].offset)
    # enable trace powered by r2 symsmap
    # r2.enable_trace()
+    r2.bt(0x401906)


Why do we need an extra argument?

The extra argument is the target address for setting the bt_hook, so we can print backtrace without emu_error(). I will test _backtrace_fuzzy without argument in emu_error()

wtdcode · 2022-08-25T14:12:09Z

qiling/extensions/r2/r2.py

+        insts = [Instruction(**dic) for dic in self._cmdj(f"pdj {n} @ {addr}")]
+        return insts
+
+    def _backtrace_fuzzy(self, at: int = None, depth: int = 128) -> Optional[CallStack]:


More arch support

elicn · 2022-09-01T13:57:47Z

I am failing to understand why you had to change the memory module..
Was there something missing or not behaving as expected?

wtdcode · 2022-09-01T13:58:47Z

I am failing to understand why you had to change the memory module.. Was there something missing or not behaving as expected?

By changing the memory module, Qiling and r2 will share the same memory and thus solve the previously mentioned memory sync problem.

elicn · 2022-09-01T14:23:16Z

By changing the memory module, Qiling and r2 will share the same memory and thus solve the previously mentioned memory sync problem.

Pfff.. thigs are getting more and more complicated. Having the memory content being managed in Python code is not a great idea.. I guess that is going to come with a great performance hit and tons of bugs. As a first exmaple, the memory range splitting is handling it wrong (ranges are splitted, but their data isn't).

wtdcode · 2022-09-01T14:28:44Z

By changing the memory module, Qiling and r2 will share the same memory and thus solve the previously mentioned memory sync problem.

Pfff.. thigs are getting more and more complicated. Having the memory content being managed in Python code is not a great idea.. I guess that is going to come with a great performance hit and tons of bugs. As a first exmaple, the memory range splitting is handling it wrong (ranges are splitted, but their data isn't).

For performance, I don't think it will have any performance overhead as we are passing the raw pointer to Unicorn. The only difference is that now the memory is allocated by Python. @chinggg maybe you could do a small benchmark on uc_mem_map and uc_mem_map_ptr with uc_mem_write/read

For bugs, the memory splitting handling is buggy indeed (and probably results in the CI crash @chinggg) but can be fixed easily. I don't think it will introduce too many bugs as we have enough tests. Note even unicorn wrote the memory split on its own instead of using any qemu code.

chinggg · 2022-09-01T15:20:38Z

I know little about Unicorn internal and cannot find methods to pass uc mem to r2, so we can either allocate memory in r2 or Python code if we want to sync memory between them. I admit it is weird to deal with memory management stuff in Python since that should be the strength of Unicorn. But it seems to be more acceptable than to malloc in r2 after making PoC of both approaches.
Now the memory is not handled well so all the CIs failed, but what is promising is that most simulations can still work without too much refactoring. I hope it can be fixed easily and the performance is not bad as @wtdcode assumed.
I have another concern for r2's functionality. Since we just pass mapped memory instead of the whole file, it may miss much information about symbols and file structures. But the static information is really what we want at first.

in addition to 'invalid' instruction

wtdcode · 2022-09-11T06:51:40Z

qiling/os/memory.py

            return

        if mem_info is not None:
-            self.map_info[info_idx] = (tmp_map_info[0], tmp_map_info[1], tmp_map_info[2], mem_info, tmp_map_info[4])
+            self.map_info[info_idx] = (tmp_map_info[0], tmp_map_info[1], tmp_map_info[2], mem_info, tmp_map_info[4], tmp_map_info[5])

    def get_mapinfo(self) -> Sequence[Tuple[int, int, str, str, str]]:


Data is missing.

Thanks for pointing out this, though now there is no existing usage of change_mapinfo that passes data as argument. When code execute into this branch, it is likely to be a change of label without any modification of perm and content since the last if body will return

chinggg · 2022-09-11T15:28:20Z

It seems now there leaves only two errors in test_android, but I cannot figure out why, maybe some multithreading issue. Hoping someone can have a quick review on it so I can solve it easily.

BREAKING CHANGE: mem is managed in Python instead of uc BREAKING CHANGE: MapInfoEntry now has 6 elements instead of 5 BREAKING CHANGE: r2 map io from ql.mem, no full binary, now missing symbols BREAKING CHANGE: del_mapinfo and change_mapinfo recreate and remap mem Add unit tests for ql mem operations Also fix potential bug in syscall_munmap

chinggg · 2022-09-15T12:34:21Z

qiling/os/memory.py

+                data = self.read(lbound, ubound - lbound)  # read instead of using data from map_info to avoid error
+                mem_dict['ram'].append((lbound, ubound, perm, label, data))


Actually, data in map_info will be different from the return value of uc.read here, which causes error in ql.mem.save/restore, so I made a hack here by reading mem again to avoid errors temporarily. But it seems to cover some unseen inconsistency between map_info and uc.mem, which could potentially be the source of other errors.

BUG: mips32 uc map 0x9000000 become 0x1000000

chinggg · 2022-09-17T05:17:48Z

I add assert_mem_equal check to ql mem operations and syscalls like brk and mmap so more errors occured. According to the CI testing result https://github.com/qilingframework/qiling/actions/runs/3065273326/jobs/4949204918, there seems to be 3 kind of error in test_elf:

For binary of MIPS architecture, the map address is wrong when exceeding 0x80000000 Wrong address range when mem_map on MIPS architecture unicorn-engine/unicorn#1708
the one in ld.so Memory region 3 0x47e1000 - 0x47e2000 != map_info[3] ld-linux.so.2
the one in libc.so Memory region 10 0x7fffb81c1000 - 0x7fffb81c3000 != map_info[10] [mmap] libc.so.6

I find that execution may still be successful if these assertions are removed, since there is only bytes level difference between uc mem and ql map_info according to the dump result.

c_byte to c_ubyte to avoid error when r_buf_new

chinggg force-pushed the r2-dev branch from e270ae2 to beea471 Compare August 24, 2022 02:27

wtdcode reviewed Aug 24, 2022

View reviewed changes

wtdcode reviewed Aug 25, 2022

View reviewed changes

feat(r2): CallStack and fuzzy backtrace hook

a77adac

chinggg force-pushed the r2-dev branch from beea471 to a77adac Compare August 25, 2022 14:47

feat(r2): interactive shell

023c3a3

chinggg force-pushed the r2-dev branch 2 times, most recently from d003c2e to 25f6ad7 Compare September 2, 2022 07:16

fix(r2): skip ill instruction in disassembler

cb7ca60

in addition to 'invalid' instruction

chinggg force-pushed the r2-dev branch 2 times, most recently from 3cd7ad8 to 99d21af Compare September 5, 2022 11:55

wtdcode reviewed Sep 11, 2022

View reviewed changes

chinggg force-pushed the r2-dev branch 4 times, most recently from b6f23fc to 1ce1ad0 Compare September 11, 2022 12:46

chinggg force-pushed the r2-dev branch from 1ce1ad0 to fce695d Compare September 12, 2022 00:46

chinggg commented Sep 15, 2022

View reviewed changes

chinggg added 4 commits September 16, 2022 01:53

feat(utils): assert_mem_equal between uc and ql

037dadb

debug(syscall): asser_mem_equal on mmap and brk

b76a9b4

feat(utils): assert_mem_equal dump bin

95c8faf

test(mem): mmap2 syscall

4f260a2

BUG: mips32 uc map 0x9000000 become 0x1000000

chinggg added 2 commits September 18, 2022 14:12

refactor: use stored cdata to setup mem for r2

1f206f3

c_byte to c_ubyte to avoid error when r_buf_new

feat(r2): oba to load bininfo and update flags

0ea4fa8

chinggg mentioned this pull request Oct 13, 2022

feat: R2Qiling with refactored memory and de-flatten plugin #1244

Open

15 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(r2): backtrace and control flow #1213

feat(r2): backtrace and control flow #1213

chinggg commented Aug 24, 2022

wtdcode Aug 24, 2022

chinggg Aug 25, 2022 •

edited

wtdcode Aug 25, 2022

elicn commented Sep 1, 2022

wtdcode commented Sep 1, 2022

elicn commented Sep 1, 2022

wtdcode commented Sep 1, 2022

chinggg commented Sep 1, 2022 •

edited

wtdcode Sep 11, 2022

chinggg Sep 11, 2022 •

edited

chinggg commented Sep 11, 2022 •

edited

chinggg Sep 15, 2022 •

edited

chinggg commented Sep 17, 2022 •

edited

		data = self.read(lbound, ubound - lbound) # read instead of using data from map_info to avoid error
		mem_dict['ram'].append((lbound, ubound, perm, label, data))

feat(r2): backtrace and control flow #1213

Are you sure you want to change the base?

feat(r2): backtrace and control flow #1213

Conversation

chinggg commented Aug 24, 2022

Checklist

Which kind of PR do you create?

Coding convention?

Extra tests?

Changelog?

Target branch?

One last thing

wtdcode Aug 24, 2022

Choose a reason for hiding this comment

chinggg Aug 25, 2022 • edited

Choose a reason for hiding this comment

wtdcode Aug 25, 2022

Choose a reason for hiding this comment

elicn commented Sep 1, 2022

wtdcode commented Sep 1, 2022

elicn commented Sep 1, 2022

wtdcode commented Sep 1, 2022

chinggg commented Sep 1, 2022 • edited

wtdcode Sep 11, 2022

Choose a reason for hiding this comment

chinggg Sep 11, 2022 • edited

Choose a reason for hiding this comment

chinggg commented Sep 11, 2022 • edited

chinggg Sep 15, 2022 • edited

Choose a reason for hiding this comment

chinggg commented Sep 17, 2022 • edited

chinggg Aug 25, 2022 •

edited

chinggg commented Sep 1, 2022 •

edited

chinggg Sep 11, 2022 •

edited

chinggg commented Sep 11, 2022 •

edited

chinggg Sep 15, 2022 •

edited

chinggg commented Sep 17, 2022 •

edited