Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread-safe sf::err() #1724

Draft
wants to merge 2 commits into
base: 2.6.x
Choose a base branch
from
Draft

Thread-safe sf::err() #1724

wants to merge 2 commits into from

Conversation

Bromeon
Copy link
Member

@Bromeon Bromeon commented Dec 6, 2020

Very naive approach for fixing second problem in #1711.

sf::Err is not thread-safe, and since it's used as a global logger across many different classes, this can quickly lead to problems when the user doesn't have a single-threaded SFML application.

This implementation is not ready for merging, we need to discuss at least:

  • how sf::ThreadLocalPtr should be used properly
  • how to clean up (currently, there is a memory leak for each thread)

@Bromeon Bromeon changed the title Keep thread-local instances of sf::err() backing implementation (WIP) Thread-safe sf::err() Dec 6, 2020
@Bromeon
Copy link
Member Author

Bromeon commented Dec 9, 2020

@dec05eba if you find the time, could you check if this branch would also fix the problem you had?

@dec05eba
Copy link

dec05eba commented Dec 9, 2020

@Bromeon Confirmed that the crash is gone in this branch

@Bromeon
Copy link
Member Author

Bromeon commented Dec 10, 2020

Thank you for testing!

@SFML Team:
Before we can merge this, we would need to address the above-mentioned points:

  • how sf::ThreadLocalPtr should be used properly
  • how to clean up (currently, there is a memory leak for each thread)

@Bromeon
Copy link
Member Author

Bromeon commented Feb 28, 2021

Does anyone have a good idea how to address this without memory leak?

@Bromeon
Copy link
Member Author

Bromeon commented Sep 29, 2021

Experimented a bit more with this.
Newest version:

  1. Every time sf::err() is used from a new thread, a ThreadLocalErr instance is lazily initialized and assigned to the ThreadLocalPtr. Once the thread has its instance, it is reused.
  2. To counter the memory leak, I maintain a global GC list with all thread-local instances. I register an atexit hook that iterates through the list, deleting each instance.
  3. To make sure that sf::err() accesses after deletion don't wreak havoc, I return a no-op stream. Since I can't rely on global variables here (we're in the middle of global exit), I leak memory. This is a last resort that shouldn't happen anyway.
  4. The book-keeping variables, which exist once globally, are protected by a mutex.

At this point all that is quite hacky, but so are the rules of C++ global variables and initialization order. I'm only wondering if we shouldn't just leave everything as-is and use a mutex. But let's see how it works out.


Now we have a small example code for testing, but I don't get it to work -- not even the version that the issue author @dec05eba said would be OK. So I'm stuck and it would be nice if someone else could also look into it.

Reproducing code (?)

I used the one @dec05eba provided here, just changed the array to a vector and made the variables local.

For some reason I always get a stack overflow (in both my commits and the original SFML version).
Does that happen for anyone else?

#include <SFML/Graphics/Image.hpp>
#include <thread>

int main()
{
    const int num_images = 3;
    const std::string image_paths[3] = { "working-image.png", "non-existing-image.png", "Tennis.cpp" };

    const int num_threads = 4;
    std::vector<std::thread> threads;

    for(int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&]() {
            while(true) {
                for(int i = 0; i < num_images; ++i) {
                    sf::Image image;
                    image.loadFromFile(image_paths[i]);
                }
            }
        });
    }

    std::this_thread::sleep_for(std::chrono::seconds(10000000));
    return 0;
}

@Bromeon
Copy link
Member Author

Bromeon commented Oct 11, 2021

Any feedback?

@binary1248
Copy link
Member

My opinion is that this change just shifts the problem from one place to another. Global initialization was/is/will always be a tricky subject in C++. While the problem has gotten less bad (yes... it still isn't good) with modern C++ (e.g. thread_local which is just compiler syntactic sugar/magic for doing things that are also possible by hand using platform-specific methods if you really want to), the problem hasn't gone away.

Over the years I have just trained myself to not access any non-POD, non-zero-initializable data that lives in namespace scope in my own code before main() starts. Whenever possible, complex initialization has to happen "downstream" from main() which allows me to specify deterministic construction/destuction ordering. Messing around with objects with complex initialization in global scope whose initialization order is not guaranteed is the literal definition of undefined behaviour. The same applies to destruction ordering even though it is less talked about. Even though it might not lead to noticeable effects in most cases, in my book all UB including relying on the global initialization order (i.e. accessing before main()) of complex objects should always be avoided at any cost. If we cannot guarantee correctness of our code (even if non-/correctness has no visible side-effects) then the program is not fit for distribution, which is ultimately what most people aim for.

In the current changeset, currentErr and activePtrs (and mutex if you go full pedantic mode, std::mutex's constructor is actually constexpr for this very reason) are technically non-POD with complex initialization (even relying in each case on dynamic allocation) and they are in namespace scope. In the unlucky scenario these objects are not yet initialized by the time err() is called you will have similar problems to what is possible now, just shifted to another location. An easy fix would be to lazily, dynamically allocate these objects as well before they are first accessed. Pointers are PODs and if they can be statically initialized to NULL, they will land in the .data segment and are guaranteed to have those values when the process image is loaded and before any code is executed.

I haven't evaluated the effort required, but I think instead of piling one workaround on top of another, we should just adopt air-tight habits and avoid indirect calls to err() from SFML library code in namespace scope whenever it is feasible.

@Bromeon
Copy link
Member Author

Bromeon commented Oct 12, 2021

An easy fix would be to lazily, dynamically allocate these objects as well before they are first accessed.

The thing is, we have two issues at hand: global initialization and multi-threaded access. If we use a global pointer that holds the lazy-initialized book-keeping logic, we run into the "who accesses first" problem. If two sf::err() calls happen around the same time, it's possible that this global pointer is initialized twice. So I don't see a way around a global sf::Mutex instance (given C++98) if we want to maintain thread safety.

But otherwise I agree that it's ugly, and I wish C++ had more idiomatic ways to address this. Ultimately it looks like all possible implementations are just best-effort and have certain shortcomings, as such we might as well argue that fixing the common case (multithreaded err() access during main()) is enough. After all, we also advertise to use SFML resources only during main().

TLDR: maybe we can trade off a "works in 95% of cases" problem with a "works in 99.5%" one.
Not great, not terrible.

@eXpl0it3r
Copy link
Member

eXpl0it3r commented Oct 12, 2021

If most of the complexity just comes from allowing usage in a global scope, then I too would say, this shouldn't really be done.
None of the SFML classes should really be used in a global scope. In Visual Studio any usage of sfml-graphics classes will lead to a crash in sf::Mutex. Other compilers may be less forgiving, but it's all UB. SFML simply doesn't work at a global scope, as such I don't think it's too much asked, to also extend this to sf::err().

@Bromeon
Copy link
Member Author

Bromeon commented Oct 12, 2021

If most of the complexity just comes from allowing usage in a global scope, then I too would say, this shouldn't really be done. None of the SFML classes should really be used in a global scope.

This is not the case here. If you look at the description of the original issue #1711 and the linked example, you see that the code is inside main() (the example uses global strings/ints which have nothing to do with the problem at hand, and they can be easily moved into main()).

The problem is in the title: sf::err() is not thread-safe.
Loading unrelated SFML resources even during main() leads to race conditions.

The global init issue is just part of the attempt to fix it. If I can rely on the assumption that no SFML code (i.e. any sf::err() calls) will be run before or after main(), I can simplify this implementation -- the mutex will always be ready by the time it is accessed, and std::atexit() will always be called after it is accessed.

@Bromeon
Copy link
Member Author

Bromeon commented Nov 2, 2021

So, how do you want me to implement this?

A discussed, I can go with the assumption that no SFML code (i.e. sf::err()) will be run before or after main(), that would simplify quite a bit. Namely:

  • the mutex will always be live
  • std::atexit() is guaranteed to be run after the last sf::err() call
  • I don't need any NullBuffer or any logic that deals with after-main sf::err() calls

Is the assumption always valid? Probably not.
Is it better than the current, broken implementation? Yes.

@Bromeon
Copy link
Member Author

Bromeon commented Dec 4, 2021

Could I get some feedback on this?
Do we want to port it to 2.6.x? I would say it's worth fixing for SFML 2.
Also, if we target master, #1863 will lead to extra conflicts, because we waited too long with this PR.

I would say we go with the last suggestion -- assumption no sf::err() will be called before/after main().

@eXpl0it3r
Copy link
Member

If you think it's important enough for 2.6, then lets add it there.
I personally consider any application that uses SFML objects at global scope invalid, as for most classes it really is UB and very often crashes with VS. Of course sf::Image and some other classes are pretty standalone, but even then using them non-globally is a better option.
Will the implementation with such an assumption lead to UB at global scope or explicitly crash?

For me personally, any solution that improves the situation, especially for multi-threading, seems okay.

@Bromeon
Copy link
Member Author

Bromeon commented Dec 4, 2021

I personally consider any application that uses SFML objects at global scope invalid, as for most classes it really is UB and very often crashes with VS.

As mentioned further above, that's not the issue I'm primarily trying to solve -- we have a race condition, when using SFML objects only in main().

We can just fix the cases where objects are used in main(), and that would already help.
It's not possible to change the target branch of a PR, is it? Otherwise I'd need to create a new one.

@eXpl0it3r
Copy link
Member

It's not possible to change the target branch of a PR, is it? Otherwise I'd need to create a new one.

Of course it is, just click "edit" at the top and there you can change it. Of course need to make sure not to include any changes from master

@Bromeon Bromeon changed the base branch from master to 2.6.x December 4, 2021 14:54
@eXpl0it3r
Copy link
Member

I think it's a lot easier to just make a decision to go one way, make assumptions that seem plausible and then based on that having a discussion, than to try and brain storm solutions on a GitHub issue. 😄
The assumption makes sense to me, but as your explainers after my comments show, I seem to not fully understand the situation... 😅

@eXpl0it3r eXpl0it3r linked an issue May 14, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sf::err() is not thread-safe
5 participants