Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with old plugin not unloading correctly. (TLS + Linux) #35

Open
Neopallium opened this issue Mar 29, 2019 · 15 comments
Open

Issues with old plugin not unloading correctly. (TLS + Linux) #35

Neopallium opened this issue Mar 29, 2019 · 15 comments

Comments

@Neopallium
Copy link
Contributor

I found a way to catch panics from Rust guest plugins. But there seems to be a problem when cr does a reload after a rollback. Maybe a segfault happens in the cr reload code and it tries to siglongjmp back to cr_plugin_main()?

Might be useful to have an option to disable the setjmp/exception handling for a plugin.

This is the commit I was using:
Neopallium/cr-sys@8b7853c

With CR_DEBUG defined in src/host.cpp.

Output running under valgrind:
cr-sys-rollback-reload.log

@Neopallium
Copy link
Contributor Author

Looks like a crash from cr_plugin_sections_reload during a reload after a rollback. I disabled the signal handler.

  1. start host. guest version (1)
  2. compile new guest version (2)
  3. compile a panicking version (3)
  4. cr does rollback to version (2)
  5. compile a 'fixed version (4)
  6. cr tries to reload new version (4) crashes (Tries to longjmp back to cr_plugin_main?)

out-no-longjmp.log

@Neopallium
Copy link
Contributor Author

I found the problem.

  1. host starts with guest version 1 libraw_guest0.so
  2. re-compile with panic.
  3. reload guest version 2 libraw_guest1.so
  4. panic.
  5. rollback "unload" libraw_guest1.so
  6. reload previous guest version 1 libraw_guest0.so
  7. re-compile without panic.
  8. reload guest version 2 (cr reuses the version of the panicked guest). libraw_guest1.so

At step 5 libraw_guest1.so is not fully unloaded. It is still mapped. At step 8 the new version doesn't seem to be mapped or loaded correctly.

I did a quick hack in cr.h to make sure the file names are not re-used. If the file already exists it skips to the next version number. This fixes the crash on reload after a rollback.

@fungos
Copy link
Owner

fungos commented Mar 30, 2019

Interesting find. I think we may be missing something when closing the handle or it may be something related to file inodes. Not sure, I'll try to do some research on this, but looking at this, it is bad. We need to find the source and fix there.

Can you please confirm you're able to reproduce this only with the cr c samples?

Just to be sure this is nothing related to how rustc is building the app/libs.

@Neopallium
Copy link
Contributor Author

I was thinking it was do to TLS usage causing dlclose to not really unload the shared library. But I just found this stack overflow answer about -fno-gnu-unique

@Neopallium
Copy link
Contributor Author

I haven't been able to cause a C guest to stay loaded. I was able to load a Rust guest plugin in the basic_host c++ sample. The Rust plugin wouldn't unload. So it seems to be caused by how Rust shared libraries are created.

@Neopallium
Copy link
Contributor Author

Neopallium commented Mar 31, 2019

Final made a Rust plugin that unloads correctly:

use std::os::raw::{c_int, c_void};

#[no_mangle]
pub fn cr_main(_ctx: *mut c_void, _cr_op: c_int) -> c_int {
    //println!("From Rust!");  // this print will prevent the plugin from unloading.
    return 1; // change number with each compile.
}

I read somewhere that print! in Rust uses TLS. Still not sure how that is keeping it from unloading.

@Neopallium Neopallium changed the title Corrupted setjmp/longjmp state after rollback and reload? Issues with old plugin not unloading correctly. Apr 1, 2019
@Neopallium Neopallium changed the title Issues with old plugin not unloading correctly. Issues with old plugin not unloading correctly. (Rust + TLS + Linux) Apr 1, 2019
@Neopallium
Copy link
Contributor Author

This problem doesn't happen on Windows. The Rust plugin rollback and reload seem to work just fine. I haven't tested MacOS. Maybe this is a Rust only issue on Linux.

@Neopallium Neopallium changed the title Issues with old plugin not unloading correctly. (Rust + TLS + Linux) Issues with old plugin not unloading correctly. (TLS + Linux) Apr 1, 2019
@Neopallium
Copy link
Contributor Author

Final made a c++ example that uses TLS destructors that has the same issue:

#include <string>
#include <stdio.h>
#include "../cr.h"

thread_local std::string tls_str("hello tls");

CR_EXPORT int cr_main(struct cr_plugin *ctx, enum cr_op operation) {
    fprintf(stdout, "tls_str = '%s'\n", tls_str.c_str());
    return 0;
}

@Neopallium
Copy link
Contributor Author

Compiling the C++ TLS plugin with -fno-use-cxa-atexit flag fixes the unload problem for that plugin. Now to find the same option for Rust plugins. I haven't checked if the TLS destructors are called on unload.

@Neopallium
Copy link
Contributor Author

FYI, found that compiler flag on this stackoverflow answer

@Neopallium
Copy link
Contributor Author

I can't find anyway to disable __cxa_thread_atexit usage in Rust. There doesn't seem to be a compiler flag for Rust. Atleast C++ module can avoid the problem. I might look into manually calling __cxa_thread_atexit from the plugin to see if that releases the shared object.

But I think the only safe option right now is to either not unload the plugin or use unique file names.

@fungos
Copy link
Owner

fungos commented Apr 1, 2019

That is a really interesting finding. Yeah, so I'd imagine that you'll need to provide this symbol in your crate in someway. This may help: rust-lang/rust#36826

@Neopallium
Copy link
Contributor Author

I tried defining __cxa_thread_atexit and __cxa_thread_atexit_impl in the guest, host, guest and host.

Only defining it in the host (disabling TLS destructors for the whole program) will make it work for Rust plugins.

C++ plugins can work with __cxa_thread_atexit defined in the guest, then it doesn't need the -fno-use-cxa-atexit compiler flag.

But doing those hack will disable thread_local destructors, which could be useful. Until I find a way to force dlclose to run the destructors or manually intercept/manage them for each plugin, I think the only work around right now is to use unique file names for each plugin version.

Also I found out why rollback + reload was crashing before when the file name was reused. The rollback didn't unload the old version (still had live references from TLS), so the next dlopen returned a reference to the old library, but cr.h had already changed the contents (file copy). So dlsym returned an invalid pointer for cr_main (loaded from the new copy?).

P.S. I did find a way to detect if a shared library used TLS data. Call dlinfo and request RTLD_DI_TLS_DATA.

@fungos
Copy link
Owner

fungos commented Apr 3, 2019

The pointer returned from dlsym was really invalid or was the same from the previous version that did not completely unload? Knowing this may be helpful in a way.

Anyway, good job investigating all this another c++ foot gun instance. In the end, C reload is basically unaware of C++ idiosyncrasies, unfortunately this seems to pretty much map into Rust domain too.

@Neopallium
Copy link
Contributor Author

It really was invalid. I was even tracking the cr_main pointer for each version and comparing it to /proc/<host pid>/maps address spaces of each loaded plugin file. The previous version was still loaded in the same address space, it didn't change.

It almost seems to be a bug in dlsym, since it doesn't load the new plugin file into a new address space (each loaded SO always gets a new random address, ASLR security). I haven't checked if the cr_main address should have been valid for the new plugin. Valgrind doesn't show any bad memory accesses, just the crashing invalid jump to cr_main.

I think the next step would be to dive into dlopen/dlclose/dlsym to see what is really going on. I don't really have time to go that deep right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants