Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaying a recording fails with assert error on g_main_context_acquire or segfault #1391

Open
jstarink opened this issue Dec 5, 2023 · 4 comments

Comments

@jstarink
Copy link

jstarink commented Dec 5, 2023

Description

I am trying to get a basic record new or replay mechanism running. Following the documentation, I believe this should be either just a call to panda.run_replay or panda.record_cmd. However, there seems to be some nondeterminism in replaying the previously recorded traces with (Py)PANDA when doing so. Recording works fine, but replaying results in either a segfault or other assertion errors. This behavior seems consistent with both i386 and x86_64 images of both Linux and Windows. Am I doing something wrong or is this a bug?

How to reproduce

  1. Run a new panda instance from docker hub
    $ docker run --rm -it -v $(pwd):/local pandare/panda /bin/bash
  2. Create a new basic test.py file in /local with the following contents:
    from pandare import Panda
    import os
    
    panda = Panda(generic='x86_64')
    
    @panda.queue_blocking
    def driver():
        recording_name = "linux64recording"
        # Does a recording exist? If so, replay it. Otherwise, make a new recording.
        if os.path.isfile(f"{recording_name}-rr-snp"):
            print("Replaying")
            panda.run_replay(recording_name)
        else:
            print("Recording")
            panda.revert_sync('root')
            panda.record_cmd("whoami", recording_name=recording_name)
        
        panda.stop_run()
    
    panda.run()
  3. Run the script once and observe normal execution with output similar to the following:
    root@5a4e2fa4f486:/local# python test.py
    using generic x86_64
    Downloading required file: https://www.dropbox.com/s/4avqfxqemd29i5j/bionic-server-cloudimg-amd64-noaslr-nokaslr.qcow2?dl=1
    /root/.panda/bionic-server-cloud 100%[========================================================>]   2.84G  22.2MB/s    in 2m 33s
    Validating file hash
    os_name=[linux-64-ubuntu:4.15.0-72-generic-noaslr-nokaslr]
    PANDA[core]:os_familyno=2 bits=64 os_details=ubuntu:4.15.0-72-generic-noaslr-nokaslr
    [PYPANDA] Panda args: [/usr/local/lib/python3.8/dist-packages/pandare/data/x86_64-softmmu/libpanda-x86_64.so -L /usr/local/share/panda /root/.panda/bionic-server-cloudimg-amd64-noaslr-nokaslr.qcow2 -display none -m 1024 -serial unix:/tmp/pypanda_sf2yccvkw,server,nowait -monitor unix:/tmp/pypanda_mxmdsta9o,server,nowait]
    Replaying
    loading snapshot
    ... done.
    opening nondet log for read :	./linux64recording-rr-nondet.log
    ./linux64recording-rr-nondet.log:     3253417 instrs total.
    Replay completed successfully
    Exiting cpu_handle_execption loop
    
  4. Now run the script a second time, and notice it crashing. I have seen several variations of output for this, but the following two seem to be the most prevalent:
    root@5a4e2fa4f486:/local# python test.py
    using generic x86_64
    os_name=[linux-64-ubuntu:4.15.0-72-generic-noaslr-nokaslr]
    PANDA[core]:os_familyno=2 bits=64 os_details=ubuntu:4.15.0-72-generic-noaslr-nokaslr
    [PYPANDA] Panda args: [/usr/local/lib/python3.8/dist-packages/pandare/data/x86_64-softmmu/libpanda-x86_64.so -L /usr/local/share/panda /root/.panda/bionic-server-cloudimg-amd64-noaslr-nokaslr.qcow2 -display none -m 1024 -serial unix:/tmp/pypanda_sifghicd1,server,nowait -monitor unix:/tmp/pypanda_m8xy0ak1w,server,nowait]
    Replaying
    loading snapshot
    loading snapshot
    rdev-iowatch-serial0: Illegal RAM offset ffc000
    rdev-iowatch-serial0: error while loading state section id 4(ram)
    Failed to load vmstate
    Failed to start replay
    rdev-iowatch-serial0: Unknown savevm section 4
    Failed to load vmstate
    Failed to start replay
    free(): corrupted unsorted chunks
    Segmentation fault (core dumped)
    
    root@5a4e2fa4f486:/local# python test.py
    using generic x86_64
    os_name=[linux-64-ubuntu:4.15.0-72-generic-noaslr-nokaslr]
    PANDA[core]:os_familyno=2 bits=64 os_details=ubuntu:4.15.0-72-generic-noaslr-nokaslr
    [PYPANDA] Panda args: [/usr/local/lib/python3.8/dist-packages/pandare/data/x86_64-softmmu/libpanda-x86_64.so -L /usr/local/share/panda /root/.panda/bionic-server-cloudimg-amd64-noaslr-nokaslr.qcow2 -display none -m 1024 -serial unix:/tmp/pypanda_s079elggc,server,nowait -monitor unix:/tmp/pypanda_mdbfu5knm,server,nowait]
    Replaying
    loading snapshot
    loading snapshot
    ... done.
    opening nondet log for read :	./linux64recording-rr-nondet.log
    ... done.
    opening nondet log for read :	./linux64recording-rr-nondet.log
    ./linux64recording-rr-nondet.log:     3253417 instrs total.
    Replay completed successfully
    Exiting cpu_handle_execption loop
    **
    GLib:ERROR:../../../glib/gmain.c:3375:g_main_context_acquire: assertion failed: (context->owner_count == 0)
    Bail out! GLib:ERROR:../../../glib/gmain.c:3375:g_main_context_acquire: assertion failed: (context->owner_count == 0)
    Aborted (core dumped)
    

Interestingly enough, the second output seems to indicate the replay completed successfully, but then still crashes afterwards.

Additional Info

Host OS: MANJARO 23.1.0
Kernel: 6.1.64
Docker Version: 24.0.7
panda-system-x86_64 --version: QEMU emulator version 2.9.1 (-dirty)
CPU: Intel i7-11800H
RAM: 16GB

@AndrewFasano
Copy link
Member

I have a guess as to what's going wrong. When the recording exists, you call run_replay which runs the replay. Then you still hit the call to run which will try to resume the emulation from where things left off after the replay which will be something weird. If that's the issue, I think we should raise an error instead of crashing in this strange way.

Does the issue go away if you refactor things to avoid the call to run after run_replay, like this:

from pandare import Panda
import os

panda = Panda(generic='x86_64')
recording_name = "linux64recording"

if os.path.isfile(f"{recording_name}-rr-snp"):
    print("Replaying")
    panda.run_replay(recording_name)
else:
    @panda.queue_blocking
    def driver():
          print("Recording")
          panda.revert_sync('root')
          panda.record_cmd("whoami", recording_name=recording_name)
          panda.stop_run()
    panda.run()

@jstarink
Copy link
Author

jstarink commented Dec 5, 2023

Perfect, that seems to have fixed the problem!

This makes me a bit confused about the naming of the available functions though. Considering that there are other run_xxx methods (such as run_serial_cmd) provided in the examples folder that are actually put into a panda.queue_blocking tagged function. Is run_replay substantially different from these other run methods? When should something be put in a queue and when shouldn't it be queued?

Also, indeed, a more descriptive error would've been nice :).

@AndrewFasano
Copy link
Member

The run method runs a live guest while run_replay runs a replay, generally you want one or the other, not both.

I definitely see what you're saying where it's confusing how the run_serial_cmd method has a similar style name and being totally different! Open to suggestions for how we could improve these names in the future!

@jstarink
Copy link
Author

jstarink commented Dec 6, 2023

@AndrewFasano Thanks for the explanation, that definitely clears it up!

I definitely see what you're saying where it's confusing how the run_serial_cmd method has a similar style name and being totally different! Open to suggestions for how we could improve these names in the future!

Some ideas come to mind:

  • I would consider distinguishing between running a PANDA analysis, and actually executing a program in the guest VM, perhaps using the run_ and exec_/execute_ prefixes.
  • I would also consider renaming run() to something like run_live_guest(), to make it more explicit and no confusion can happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants