Feature request: Heap scanning with data structure detection #15

Open
blechschmidt opened this issue Aug 28, 2016 · 27 comments

@blechschmidt

As soon as memory scanning is implemented, an additional feature for detecting simple data structures would be great.

For example, one could hook all malloc calls using the LD_PRELOAD environment variable to detect allocated blocks and outline them graphically in the memory viewer. Furthermore, if a byte sequence within an allocated block represents a valid heap or stack address, it could be highlighted as a possible pointer.
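
A minimal sketch of the pointer-highlighting idea, assuming the allocated block has already been read into a bytes object and that the heap and stack ranges are known (for example from /proc/$pid/maps). The function name, the sample range, and the 8-byte aligned, little-endian word size are illustrative assumptions, not part of PINCE:

```python
import struct

def find_possible_pointers(block, valid_ranges, pointer_size=8):
    """Return (offset, value) pairs for aligned words in `block` whose value
    falls inside one of the half-open (start, end) address ranges."""
    fmt = "<Q" if pointer_size == 8 else "<I"  # little-endian words
    hits = []
    for offset in range(0, len(block) - pointer_size + 1, pointer_size):
        (value,) = struct.unpack_from(fmt, block, offset)
        if any(start <= value < end for start, end in valid_ranges):
            hits.append((offset, value))
    return hits

# Hypothetical heap range and a 24-byte block that holds one heap address.
heap = (0x55550000, 0x55560000)
block = struct.pack("<QQQ", 42, 0x55551230, 0)
for offset, value in find_possible_pointers(block, [heap]):
    print(f"offset {offset}: possible pointer {hex(value)}")
```

A memory viewer could use the returned offsets to decide which bytes to highlight. Unaligned pointers would be missed by this sketch.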

Thank you for the efforts which you put into this great project.

@korcankaraokcu
Owner

korcankaraokcu commented Aug 28, 2016

Sure, why not, it looks very useful. But this might be implemented in the very last phases, because I'm planning to finish the debugger and code injection engine first in order to give the scanmem team more time to develop libscanmem. Also, there are missing features in libscanmem; I'll try to help develop it when this project reaches the code scanning phase. My current plan is as follows:

setup.py --> refactoring of libPINCE for OOP usage --> basic debugger --> breakpoints --> code injection (single line, then code cave injection) --> signal bypassing & anti-anti-debugger tricks in general --> final GUI tweaks/refactoring --> memory scanning --> pointer scanning & this feature

> Thank you for the efforts which you put into this great project

Well, someone had to get the boulder rolling 😃

@sriemer
Contributor

sriemer commented Aug 29, 2016

Hooking all malloc() calls produces a huge amount of data, and backtracing takes quite some time. So if you want to do this, you should know what to look for and filter before backtracing. Otherwise, real-time libs like OpenGL notice a problem and exit the game. ugtrain already has dynamic memory discovery/hacking/adaption based on malloc() hooking and LD_PRELOAD. It has awesome Chromium B.S.U., Cube 2: Sauerbraten and Warzone 2100 examples based on this.

@korcankaraokcu
Owner

Thanks @sriemer, I'll keep that in mind. I also have a few concerns about the LD_PRELOAD trick. Firstly, you have to restart the game, which is a huge drawback for games that have different state-saving mechanisms (some games even prevent you from quitting, see the OneShot RPG for instance); we should find a runtime solution for that. Secondly, some games have protected binary loaders, and they might easily detect libraries loaded via LD_PRELOAD by checking /proc/$pid/maps for non-trusted paths.

@kekeimiku

I made a pointer scanner that doesn't rely on LD_PRELOAD, a debugger, or hooks, so it will not be detected by the game. It only needs a memory dump file, so the game does not even need to be running. Maybe it will help you: scanmem/scanmem#431

@korcankaraokcu
Owner

korcankaraokcu commented May 4, 2023

@kekeimiku That looks very cool! But integrating it into PINCE is a bit unlikely since it's a direct extension of the scanmem functionality and it feels like it should be integrated into scanmem instead

If you would like to integrate it as a 3rd party tool, maybe we could look into changing PointerSearcher-X output format to PINCE cheat table format so they would be compatible. If you are up for it, I can create a new issue with detailed info on the format for this kind of integration. It's up to you

@kekeimiku

kekeimiku commented May 5, 2023

@korcankaraokcu

The PINCE cheat table doesn't seem to support resolving something like libhello+0x1234 as a base address?

@korcankaraokcu
Owner

PINCE uses gdb in the background for symbol resolving, and gdb supports symbols such as function names. You also have to stop the process to use any gdb functionality. PINCE internally uses the gdb API function parse_and_eval to evaluate anything you give it, but apparently it doesn't support resolving shared libraries

More info on the symbols and gdb expressions:
https://github.com/korcankaraokcu/PINCE/wiki/About-GDB-Expressions

Maybe the command info sharedlibrary could be used for this purpose. I'd either have to extend examine_expression functionality or create a new function specifically for this purpose. If you would like to implement this on your own without any debugger interference, you can also parse pmap output to find base addresses. Which method would you like to proceed with?

@kekeimiku

kekeimiku commented May 6, 2023

I think parsing /proc/pid/maps is more efficient. We only need to find the first memory area named xxx with read permission and get its start address.
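
A rough sketch of that lookup, assuming the standard Linux maps line layout (address range, permissions, offset, device, inode, pathname). The function name and the sample maps text are made up for illustration; in a live process the text would come from reading /proc/<pid>/maps:

```python
import os

def find_base_address(maps_text, name):
    """Return the start address of the first readable region whose file
    name equals `name`, or None when no such region exists."""
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname to match against
        addr_range, perms, path = fields[0], fields[1], fields[5]
        if "r" in perms and os.path.basename(path) == name:
            return int(addr_range.split("-")[0], 16)
    return None

# Hypothetical maps content for demonstration only.
SAMPLE_MAPS = """\
55d0c0a00000-55d0c0a01000 r--p 00000000 08:01 100 /home/user/game
7f1c2a000000-7f1c2a001000 r--p 00000000 08:01 200 /home/user/libhello.so
7f1c2a001000-7f1c2a002000 r-xp 00001000 08:01 200 /home/user/libhello.so
7ffd1c000000-7ffd1c021000 rw-p 00000000 00:00 0 [stack]
"""

base = find_base_address(SAMPLE_MAPS, "libhello.so")
print(hex(base + 0x1234))
```

Matching on the file name rather than the full path is a design choice of this sketch; two libraries with the same name in different directories would both match, and the first one wins.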

@korcankaraokcu
Owner

The question was more about which project should implement symbol resolving for shared libraries. But on second thought, it makes sense for PINCE to have this functionality, because otherwise you'd have to launch PointerSearcher every time to create a new cheat table

> I think parsing /proc/pid/maps is more efficient. We only need to find the first memory area named xxx with read permission and get its start address

Yeah I agree, PINCE already uses a package called psutil for parsing this kind of information. It could be done via that. I'll be looking into this soon. Meanwhile, you can work on converting pointer search results into cheat tables. Here's a detailed explanation of the cheat table format:

PINCE stores cheat tables with the .pct extension. The Save button click is handled by pushButton_Save_clicked in PINCE.py. It calls read_address_table_recursively, which reads the entire table. The function responsible for item conversion is read_address_table_entries. It serializes items and makes them ready for copying or for turning into a cheat table. It basically returns a list of description, address_expr, value_type. I'll explain further with an example. Below is a cheat table that contains two pointers:

[["No Description", ["0x561be37b2529", [12]], [2, 10, true, 0], []], ["No Description", ["0x561be37cb604", [4, 32]], [2, 10, true, 0], []]]

Save this as a pct file and load it in PINCE. You can also view it in here for clarity

Both entries have "No Description" as their description. The first entry has the base address "0x561be37b2529" and only one offset, which is 12 (0xC). The second entry has the base address "0x561be37cb604" and two offsets, 4 and 32, in that order. Both entries have the Int32 type, which is indicated by [2, 10, true, 0]. You can copy and paste this for now; I can also explain it further if you wish. Any questions?
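
For converting pointer search results, a sketch along these lines might be enough. It assumes the .pct file is plain JSON with the entry layout shown in the example above; make_entry and to_pct_json are hypothetical helper names, not PINCE functions, and the value_type is simply copied from the example:

```python
import json

# value_type copied verbatim from the example table (Int32 there).
INT32 = [2, 10, True, 0]

def make_entry(base, offsets, description="No Description"):
    """Build one childless pointer entry in the layout shown above."""
    return [description, [hex(base), list(offsets)], list(INT32), []]

def to_pct_json(chains):
    """Serialize (base_address, offsets) pairs into cheat table JSON text."""
    return json.dumps([make_entry(base, offsets) for base, offsets in chains])

chains = [(0x561be37b2529, [12]), (0x561be37cb604, [4, 32])]
print(to_pct_json(chains))
```

With these two chains the printed text is the same table as the example, so writing it to a file with the pct extension should give something PINCE can load.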

@kekeimiku

kekeimiku commented May 7, 2023

Why is Int32 indicated by [2, 10, true, 0]? How are other types indicated? And in [2, 10, true, 0], []]], what is the last empty array?

@brkzlr
Collaborator

brkzlr commented May 8, 2023

Because that first array is the value_type representation in the json format.

The first value in the array is the VALUE_INDEX which you can find in libpince/type_defs.py at line 157.

@kekeimiku

Thx

@korcankaraokcu
Owner

@brkzlr Thanks for the explanation. I'll add a little more information on this

value_index: Type of the value
length: Length of the entry, only used if the entry has length, defaults to 10
zero_terminate: Determines if the string is zero terminated, only used for strings
value_repr: Representation of the value, can be found in type_defs.py. Determines if the value is being shown as unsigned, signed or hexadecimal
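
To make the four positions easier to read, here is a small sketch that gives them names. The field order is taken from the list above and the [2, 10, true, 0] example; the ValueType class itself is only an illustration and is not the type PINCE uses internally:

```python
from typing import NamedTuple

class ValueType(NamedTuple):
    value_index: int      # type of the value
    length: int           # only used if the entry has a length
    zero_terminate: bool  # only used for strings
    value_repr: int       # unsigned, signed or hexadecimal display

# The array from the example table, unpacked into named fields.
vt = ValueType(*[2, 10, True, 0])
print(vt)
print(list(vt))  # back to the plain list form stored in the table
```

The meaning of each numeric value_index and value_repr is defined in type_defs.py and is deliberately not reproduced here.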

> what is the last empty array?

It's the children of the entry. The table has the structure of a tree. The one I sent you is basically a list, so it has no child entries. The table below has an entry that has exactly one child. Load it in PINCE and observe for yourself:

[["No Description", ["0x561be37b2529", [12]], [2, 10, true, 0], []], ["No Description", ["0x561be37cb604", [4, 32]], [2, 10, true, 0], [["No Description", "printf", [2, 10, true, 1], []]]]]
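
The tree structure can also be inspected without PINCE. The sketch below assumes only what the example shows: each entry is a four-element list whose last element is a list of child entries, and address_expr is either a [base, offsets] pair or a plain string such as a symbol name. The walk function is a hypothetical helper:

```python
import json

TABLE = json.loads(
    '[["No Description", ["0x561be37b2529", [12]], [2, 10, true, 0], []], '
    '["No Description", ["0x561be37cb604", [4, 32]], [2, 10, true, 0], '
    '[["No Description", "printf", [2, 10, true, 1], []]]]]'
)

def walk(entries, depth=0):
    """Yield (depth, address_expr) for every entry, parents before children."""
    for description, address_expr, value_type, children in entries:
        yield depth, address_expr
        yield from walk(children, depth + 1)

# Print the table as an indented outline.
for depth, address_expr in walk(TABLE):
    print("  " * depth + str(address_expr))
```

The printf entry is printed one level deeper than the two pointers, which matches the single child described above.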

@korcankaraokcu
Owner

@kekeimiku I've realized something about memory pages while working on your request. Not everything is an .so file; there are multiple pages with different file extensions. For instance, kwidgetsaddons5_qt.qm. Do you want me to include everything or just .so files? Which pages exactly do you search while searching for pointers?

@kekeimiku

kekeimiku commented May 11, 2023

Currently, the pointer search only considers regions that have read permission, whose path does not contain /usr or /dev, and that meet one of the following rules: the region is [stack], the region is [heap], the path is a binary file, or the path is empty.

For PINCE, you only need to find the first ELF file with the specified name in /proc/pid/maps, according to the input, and then get its starting address.

Example: maps

0x200001-0x3000008 r-- /home/aabb/hihihi
...
0x300001-0x4000008 r-- /home/aabb/hello.so
0x4000008-0x3000008 rw- /home/aabb/hello.so

Output of pointersearch: hello.so+0x1

It should be parsed as 0x300002, that is, 0x300001+0x1.

Output of pointersearch: hihihi+0x1

It should be parsed as 0x200002, that is, 0x200001+0x1.
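
The arithmetic above can be sketched as follows, assuming the base addresses have already been collected into a dictionary keyed by file name. The resolve_expression name, the "name+0xOFFSET" input form, and raising KeyError for an unknown name are assumptions made for this illustration:

```python
def resolve_expression(expr, base_addresses):
    """Resolve 'name+0xOFFSET' using a {name: base address} mapping."""
    name, sep, offset = expr.rpartition("+")
    if not sep:  # no '+' present: treat the whole expression as a name
        name, offset = expr, "0"
    if name not in base_addresses:
        raise KeyError(f"no mapped file named {name!r}")
    return base_addresses[name] + int(offset, 16)

# Base addresses taken from the example maps above.
bases = {"hihihi": 0x200001, "hello.so": 0x300001}
print(hex(resolve_expression("hello.so+0x1", bases)))  # 0x300002
print(hex(resolve_expression("hihihi+0x1", bases)))    # 0x200002
```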

My English is terrible/bad. please feel free to contact me if anything is unclear.

@korcankaraokcu
Owner

> My English is terrible/bad. please feel free to contact me if anything is unclear

Your English is very clear, don't worry

> path is empty

But, how are we going to reference such region? As I understand, we are going to parse the path and get the library name. If there's no path, how are we supposed to reference it? Did I miss something? Or did you mean to exclude those?

@kekeimiku

If there is no path, we can ignore it. We can return an error if an ELF named xxx cannot be found.

@korcankaraokcu
Owner

So, do we exclude those rules then? I mean, ignore if [stack] [heap] path is binary path is empty

@kekeimiku

kekeimiku commented May 11, 2023

We only need areas where the pathname is a binary file. Others can be ignored.

@korcankaraokcu
Owner

Aight, thanks for clearing it up

@kekeimiku

kekeimiku commented May 11, 2023

How do you feel about doing this in pointersearch, then just call scanmem/pointersearch.

I mean resolve the address of the pointer chain.

@kekeimiku

Maybe we can move all pointer search related functions to scanmem, so PINCE only needs to focus on scanmem.

@korcankaraokcu
Owner

korcankaraokcu commented May 11, 2023

> How do you feel about doing this in pointersearch, then just call scanmem/pointersearch

Users will eventually want to use .so symbols in their scripts, it makes sense for libpince to have this kind of symbol recognition. Don't worry, I'll most likely finish this by tomorrow. I was focused on some visual bugs that I noted in the past but I'm done with them now

@korcankaraokcu
Owner

I've finished it but need to optimize it a bit before releasing, sorry for the delay

@korcankaraokcu
Owner

Aight, I've finished it. Enjoy using this new feature. psutil was a bit slower than I expected: 30ms on the first call, which is a bit slow for what it is. I can also parse it myself if this becomes a problem in the future or if we don't use the extras of psutil

There's one caveat about this feature. examine_expression handles all of the symbol recognition; this new feature was implemented inside it because that makes sense design-wise. However, examine_expression uses gdb to resolve symbols, so you'll have to stop the process in order to use this feature. I'll try to change the behavior of PINCE in the near future to make it usable even when the process isn't stopped

@kekeimiku

How does PINCE resolve pointer chains? It seems to be different than expected.

For example: [["No Description", ["0x7f08fa222050", [0, 24, 16]], [2, 10, true, 0], []]]
I expected it to read "ptr1" from "0x7f08fa222050+0", then "ptr2" from "ptr1+24", and finally use "ptr2+16" as the target.
In pseudocode:

proc = OpenProcess(pid)
base_address = 0x7f08fa222050
buf = [0;8] //A 8-byte pointer-sized buf
proc.read(buf, base_address + 0) // read 8 bytes from `base_address + 0`
ptr1 = uint64(buf) // convert 8 bytes of buf to uint64
proc.read(buf, ptr1 + 24) // read 8 bytes from `ptr1 + 24`
ptr2 = uint64(buf) // convert 8 bytes of buf to uint64
target = ptr2 + 16
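
The same expectation as a runnable Python sketch. This illustrates the semantics described in this comment, not how PINCE actually resolves chains. It assumes 8-byte little-endian pointers, and the `read` callback is a stand-in for whatever reads process memory (for instance /proc/<pid>/mem); here it is backed by a small fake memory dictionary so the example runs on its own:

```python
import struct

def resolve_chain(read, base_address, offsets):
    """Follow a pointer chain: dereference at every offset except the last,
    which is only added to reach the target address."""
    if not offsets:
        return base_address
    address = base_address
    for offset in offsets[:-1]:
        (address,) = struct.unpack("<Q", read(address + offset, 8))
    return address + offsets[-1]

# Fake memory standing in for the target process (hypothetical values).
memory = {0x1000: 0x2000, 0x2018: 0x3000}

def fake_read(address, size):
    """Return the 8-byte little-endian pointer stored at `address`."""
    return struct.pack("<Q", memory[address])

# base+0 holds ptr1 (0x2000), ptr1+24 holds ptr2 (0x3000), target is ptr2+16.
print(hex(resolve_chain(fake_read, 0x1000, [0, 24, 16])))  # 0x3010
```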

@korcankaraokcu
Owner

This conversation has been moved to Discord so as not to derail the original subject further
