Version check makes potentially invalid assumptions about ELF layout #174

Open
1 task done
jhance opened this issue May 13, 2024 · 8 comments · May be fixed by #183

Comments

@jhance

jhance commented May 13, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I have been trying to debug why pystack thinks I am using Python version 12.41, and it turns out that my Python binary has a different layout. This problem seems to occur only (or at least I have only noticed it so far) when pystack attempts to find the value of Py_Version. The version of Python supplied by Ubuntu has an ELF section for .rodata that looks like this (obtained with readelf -S):

  [18] .rodata           PROGBITS         0000000000312000  00312000
       000000000008806d  0000000000000000   A       0     0     32

Mine looks like this:

  [16] .rodata           PROGBITS         00000000008741c0  004741c0
       000000000035f330  0000000000000000   A       0     0     64

Expected Behavior

The proper way to look up the address would be something like

<addr of Py_Version> - 0x00000000008741c0 + 0x004741c0

Since the section's virtual address and file offset are the same in most Python binaries, the issue would go unnoticed. I am not sure whether this is a guarantee that the normal Python build process makes, so this could also bite regular Python versions later.
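
The calculation above can be sketched in Python. This is only an illustration of the arithmetic, not pystack's code: the function name `section_vaddr_to_offset` is hypothetical, and the constants are the .rodata values from the two readelf -S listings in this report, plus the Py_Version address 0x0000000000ba2c68 that this issue gives for the custom binary.

```python
def section_vaddr_to_offset(vaddr, sh_addr, sh_offset, sh_size):
    """Translate a virtual address inside a section to a file offset.

    Hypothetical helper: sh_addr, sh_offset and sh_size are the Address,
    Offset and Size columns that readelf -S prints for the section.
    """
    if not sh_addr <= vaddr < sh_addr + sh_size:
        raise ValueError(f"{vaddr:#x} is not inside the section")
    return vaddr - sh_addr + sh_offset


# .rodata of the custom build from this issue.
custom = dict(sh_addr=0x8741C0, sh_offset=0x4741C0, sh_size=0x35F330)
# .rodata of the Ubuntu-supplied binary: address and offset coincide.
ubuntu = dict(sh_addr=0x312000, sh_offset=0x312000, sh_size=0x8806D)

# Address that pystack tries to read, as reported in this issue.
print(hex(section_vaddr_to_offset(0xBA2C68, **custom)))  # 0x7a2c68

# With equal address and offset the translation is the identity, which is
# why the bug stays hidden; the section start is used as a sample address.
print(hex(section_vaddr_to_offset(0x312000, **ubuntu)))  # 0x312000
```

For the Ubuntu layout, treating the virtual address as a file offset happens to give the right answer; for the custom layout it is off by 0x400000.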

I was able to validate that the calculation above would work for my binary, where 0x0000000000ba2c68 is the address that pystack is attempting to look up.

 dd if=(path to python) bs=1 skip=$((0x0000000000ba2c68-0x00000000008741c0+0x004741c0)) count=8| hexdump
16+0 records in
16+0 records out
16 bytes copied, 8.1777e-05 s, 196 kB/s
0000000 04f0 030b 0000 0000
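
To make the hexdump output easier to read, here is a small sketch that decodes it. It assumes a little-endian machine, hexdump's default format of 16-bit words, and that Py_Version is an 8-byte unsigned long using the PY_VERSION_HEX bit layout; the helper names are made up for this example.

```python
import struct


def decode_hexdump_words(text):
    """Rebuild raw bytes from hexdump's default 16-bit little-endian words."""
    return b"".join(struct.pack("<H", int(word, 16)) for word in text.split())


def decode_version_hex(value):
    """Split a PY_VERSION_HEX-style integer into its fields."""
    return {
        "major": (value >> 24) & 0xFF,
        "minor": (value >> 16) & 0xFF,
        "micro": (value >> 8) & 0xFF,
        "releaselevel": (value >> 4) & 0xF,  # 0xF means a final release
        "serial": value & 0xF,
    }


# The data words from the hexdump line above, without the leading offset.
raw = decode_hexdump_words("04f0 030b 0000 0000")
(value,) = struct.unpack("<Q", raw)  # assumed: 8-byte little-endian integer

print(hex(value))  # 0x30b04f0
print(decode_version_hex(value))
```

Under those assumptions the bytes decode to major 3, minor 11, micro 4, which is consistent with the Python 3.11 build described in this report.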

Steps To Reproduce

I am not sure how you would easily reproduce this issue, as you would need to produce a Python binary that has .rodata addresses like the ones in my example. If you are able to do that, the issue reproduces very easily and all functionality of pystack will fail.

Pystack Version

1.3.0

Python Version

3.11

Linux distribution

Ubuntu

Anything else?

No response

@jhance jhance added the bug Something isn't working label May 13, 2024
@pablogsal
Member

Hi @jhance, and thanks for opening the issue. We will take a look soon. Meanwhile, could you tell us what version of Ubuntu you are using and how you are obtaining Python (deadsnakes, main repo, pyenv...)?

Also is this when analysing a live process or a core file?

@pablogsal
Member

Also, just to ensure we use what you already debugged: what's making your rodata section different from a regular binary's? (It's not immediately clear from your comment.)

@jhance
Author

jhance commented May 13, 2024

I am compiling Python from source myself with a crosstool targeting a non-distribution-provided version of glibc. As such, I don't expect many people to be following the same process. I am not sure what is causing this difference, though; maybe it is because I use gold instead of ld?

I was analyzing a core file while pointing to the same python binary that the core file was extracted from.

I recently found a workaround: essentially disable the Py_Version check and rely on the RSS check, which seems to work and correctly detects 3.11.

@pablogsal
Member

pablogsal commented May 13, 2024

My suspicion here is that what's going on is that you have a first PT_LOAD segment that doesn't map to the start of the file.

When the linker loads the file into memory, the sections (such as .rodata) don't matter anymore; the only thing the linker sees is the LOAD segments. For this reason, we just need to find where the first LOAD segment (the mount point) is in the file and correct by that (which we aren't doing at the moment).

To corroborate this, do you mind sending the output of readelf -a over the binary?

@jhance
Author

jhance commented May 13, 2024

https://pastebin.com/4h7wPER6

(I cut out some sections that contain literally all of the Python symbols).

We pass -Wl,-I to the linker to set a custom ld.so as the interpreter, maybe that is the cause for having an offset before the first LOAD segment?

@pablogsal
Member

Ah there you go:

  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000008a8860 0x00000000008a8860  R E    

I will try to make a patch this week
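
For reference, the correction discussed in this thread (translating through a LOAD segment instead of treating the virtual address as a file offset) could look roughly like the sketch below. It is not the actual patch: `LoadSegment` and `vaddr_to_file_offset` are hypothetical names, the lookup of the containing segment is one possible way to generalise "correct by the first LOAD segment", and the only input data is the LOAD entry quoted above (columns Offset, VirtAddr, PhysAddr, FileSiz, MemSiz).

```python
from typing import NamedTuple


class LoadSegment(NamedTuple):
    """The fields of a PT_LOAD program header that matter for this lookup."""
    p_offset: int  # where the segment starts in the file
    p_vaddr: int   # where the segment is mapped in memory
    p_filesz: int  # how many bytes of the segment are backed by the file


def vaddr_to_file_offset(vaddr, segments):
    """Return the file offset backing vaddr, using the segment that contains it."""
    for seg in segments:
        if seg.p_vaddr <= vaddr < seg.p_vaddr + seg.p_filesz:
            return vaddr - seg.p_vaddr + seg.p_offset
    raise ValueError(f"{vaddr:#x} is not backed by any LOAD segment")


# The LOAD entry from the readelf output quoted above.
segments = [LoadSegment(p_offset=0x0, p_vaddr=0x400000, p_filesz=0x8A8860)]

# Address of Py_Version reported at the top of this issue.
print(hex(vaddr_to_file_offset(0xBA2C68, segments)))  # 0x7a2c68
```

The result, 0x7a2c68, is the same offset that the dd command in the original report computes from the .rodata section header, so the two approaches agree for this binary.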

@pablogsal
Member

We pass -Wl,-I to the linker to set a custom ld.so as the interpreter, maybe that is the cause for having an offset before the first LOAD segment?

Yeah, that's also my guess, but I have seen this in the wild as well, so I don't think it's unique to this situation.

@jhance
Author

jhance commented May 13, 2024

Thanks for the quick help. I am hardly an ELF expert, so I ended up reading a lot of docs today to figure out why this was not working...
