Apply MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS for the flextGL globals as well #636

ManifoldFR · 2024-03-13T12:27:11Z

Hi!

In the following line in flextGL.h, the flextGL global variable has the attribute FLEXTGL_EXPORT which is the empty CORRADE_VISIBILITY_STATIC when the build is static (hence the symbol is hidden, following the -fvisibility compile flag):

magnum/src/MagnumExternal/OpenGL/GL/flextGL.h

Line 2629 in b141901

extern FLEXTGL_EXPORT FlextGL flextGL;

But since it's a global and we could make a shared library against a static build of Magnum, shouldn't we check for MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS just like in GL/Context.cpp ?

magnum/src/Magnum/GL/Context.cpp

Lines 672 to 684 in b141901

    
           #if defined(MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS) && !defined(CORRADE_TARGET_WINDOWS) 
        
           /* On static builds that get linked to multiple shared libraries and then used 
        
              in a single app we want to ensure there's just one global symbol. On Linux 
        
              it's apparently enough to just export, macOS needs the weak attribute. 
        
              Windows handled differently below. */ 
        
           CORRADE_VISIBILITY_EXPORT 
        
               #ifdef CORRADE_TARGET_GCC 
        
               __attribute__((weak)) 
        
               #else 
        
               /* uh oh? the test will fail, probably */ 
        
               #endif 
        
           #endif 
        
           Context* currentContext = nullptr;

I tried the following change where the global's visibility is set to default when we require unique globals:

extern
#if defined(MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS) && !defined(CORRADE_TARGET_WINDOWS)
    CORRADE_VISIBILITY_EXPORT
#else
    FLEXTGL_EXPORT
#endif
FlextGL flextGL;

Or maybe the FLEXTGL_EXPORT macro should be changed? For me, the above fixed a segmentation fault issue I was having when instantiating a custom shader defined in my shared library (linked against a static build of Magnum, working on Ubuntu with clang). I saw that there's a mention of FlextGL symbols in #453.

The text was updated successfully, but these errors were encountered:

mosra · 2024-03-13T14:11:23Z

Hi, thanks for a very detailed report!

There's two different things happening here. One is that the symbol isn't exported if Magnum is built as static, which is how it should be (no other "usual" symbols are exported from static builds either). So if you have a static build of Magnum linked into a shared library, the symbols are reachable just from the static library itself and not from outside, i.e. from another shared lib or an executable. Using Magnum from there means linking to its static libraries again, which is what you did (and if you wouldn't, the build would end with a linker error). The consequence of linking the static library to two shared libraries is that its code gets duplicated in both shared libraries, which is wasteful from a code size perspective, but other than that it's a valid usage. For the most part.

The other thing is that, unfortunately, even after I tried very hard to avoid mutable global state, there's still some left, in this particular case for the current GL context and the GL entrypoints. (Another mutable global state is debug/error output redirection or static plugin management in the plugin manager, the in-progress Vulkan wrapper OTOH doesn't rely on any global state.) Apart from this global state needing to be thread-local to avoid races, the main problem is that the globals get duplicated when you link the same static library to two different shared libraries, like you do in your case. So one shared library sees a global that's different from what the other sees, thus if a GL::Context setup populates the flextGL entrypoints in one shared library, the other shared library still has its own copy that's all untouched null pointers and 💥

There's various platform-specific ways to solve this, what the other globals rely on is __attribute__((weak)), which in all my testing ensured that just one copy of the global gets picked for all shared libraries. That's just Unix-like systems tho, on Windows this is worked around with having to call GetProcAddress() at runtime and hoping that the call picks the same symbol every time (with some more hacks piled on top because on Windows you can have a thing either thread-local or exported but not both, and GetProcAddress() needs to explicitly get the name of the DLL it looks into, there's no concept of "look just anywhere" like with dlsym()), then this is further complicated for example in case of Python bindings where I had to modify the Python's internal library loading flags in order to make them share the symbols instead of isolating every loaded module into its own namespace. Which then caused nasty conflicts with e.g. Pytorch, that started crashing in case something in the pile of code loaded into Python was built against a different libstdc++ than the rest. Etc., etc. So even with all those workarounds, it's still rather brittle, I don't trust it, and projects that use this kind of setup of static libraries linked into multiple shared libraries I'm trying to convince to switch to a fully shared build.

What didn't get this "global deduplication" treatment so far, and what #453 mentions in particular, is this flextGL struct, and the reason is performance. Because, compared to debug output redirection or shared plugin manager state which is accessed relatively seldom, the application may call into GL thousand times a frame, and the extra indirection caused by these workarounds, thread-local state + GetProcAddress() in particular, would affect the performance quite a lot I fear. So I didn't do that so far and was waiting for a better idea to arrive. And I'm still waiting.

But maybe being a bit slow is still better than causing nasty segfaults, heh. I'll think about this.

That it did work for you with just exporting the symbol is just a coincidence, I fear. With a slightly different linking order or a more complex library setup it can still result in a different global being picked for every shared lib, so there really needs to be something explicitly for ensuring the deduplication. Any other ideas? :)

ManifoldFR · 2024-03-13T16:52:53Z

Wow, thanks for this very detailed and in-depth answer! I learned a lot of things.

I thought I had the visibility thing figured out but definitely did not think about the overhead when sharing global state.
So in the end the lesson is that linking static to shared might be bad practice because downstream dependents could blow up?

I guess I will switch to a build with shared libraries.

mosra · 2024-03-13T17:33:32Z

I think one solution to your use case would be a static build of Magnum with all symbols exported, that gets linked only to one of your shared libraries, and the rest of the project / other shared libraries would then link to that shared library, not to (static) Magnum. Because linking to static Magnum would cause the duplicate globals issue to appear again.

This could be doable if I would provide a way to supply a custom value of CORRADE_VISIBILITY_STATIC from the command line. But that is also rather brittle I fear, you'd have to make the rest of your buildsystem think that your shared library is actually Magnum, which means not being able to use any of the CMake support that Magnum provides.

Generally, linking static libraries to shared ones is a valid use case for example if you have a middleware product that ships as a shared library, and you don't want its internals to be known, or used from outside. Then, if a user of given shared library would want to use Magnum on their end as well, they'd have to build its own and link to it again. The two instances (assuming [CORRADE,MAGNUM]_BUILD_STATIC_UNIQUE_GLOBALS is turned off in the builds) would then be completely independent of each other, having no shared state. This was the case for one project I worked on in the past, where we shipped a shared library that used Magnum for GPU data processing in a thread (and it wasn't desirable for third parties to peek into it), but also shipped a sample application that used Magnum for the GUI.

But besides this one use case, if it's desired to have Magnum used across shared libraries, the best is to either have it build as shared as well, or as static, but limit your code to just one shared library / just one executable. Lot less headaches that way :)

In any case, I'll attempt to fix the flextGL global in some way, eventually. One project that uses the static+shared build currently works around that in a particularly nasty way, by calling flextGLInit() directly from "the other" shared library, but that's nothing I would recommend either.

ManifoldFR · 2024-03-14T17:16:31Z

Generally, linking static libraries to shared ones is a valid use case for example if you have a middleware product that ships as a shared library, and you don't want its internals to be known, or used from outside. Then, if a user of given shared library would want to use Magnum on their end as well, they'd have to build its own and link to it again. The two instances (assuming [CORRADE,MAGNUM]_BUILD_STATIC_UNIQUE_GLOBALS is turned off in the builds) would then be completely independent of each other, having no shared state. This was the case for one project I worked on in the past, where we shipped a shared library that used Magnum for GPU data processing in a thread (and it wasn't desirable for third parties to peek into it), but also shipped a sample application that used Magnum for the GUI.

But besides this one use case, if it's desired to have Magnum used across shared libraries, the best is to either have it build as shared as well, or as static, but limit your code to just one shared library / just one executable. Lot less headaches that way :)

I think I'll go with the shared library option, because we'd want others to be able to look at the internals and extend functionality so it makes sense

Cheers, thanks for answering my questions and enlightening me :)

mosra added this to TODO in GL via automation Mar 13, 2024

mosra added this to the 2024.0a milestone Mar 13, 2024

mosra changed the title ~~flextGL: shouldn't MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS apply to the FlextGL global?~~ Apply MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS for the flextGL globals as well Mar 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Apply MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS for the flextGL globals as well #636

Apply MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS for the flextGL globals as well #636

ManifoldFR commented Mar 13, 2024 •

edited

mosra commented Mar 13, 2024 •

edited

ManifoldFR commented Mar 13, 2024

mosra commented Mar 13, 2024

ManifoldFR commented Mar 14, 2024

Apply MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS for the flextGL globals as well #636

Apply MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS for the flextGL globals as well #636

Comments

ManifoldFR commented Mar 13, 2024 • edited

mosra commented Mar 13, 2024 • edited

ManifoldFR commented Mar 13, 2024

mosra commented Mar 13, 2024

ManifoldFR commented Mar 14, 2024

ManifoldFR commented Mar 13, 2024 •

edited

mosra commented Mar 13, 2024 •

edited