Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prange/OpenMP issues under the free-threading CPython build #6198

Open
lysnikolaou opened this issue May 13, 2024 · 10 comments
Open

prange/OpenMP issues under the free-threading CPython build #6198

lysnikolaou opened this issue May 13, 2024 · 10 comments

Comments

@lysnikolaou
Copy link
Contributor

Describe your issue

Using multiple threads with prange and OpenMP leads to issues under the free-threading build. There's a few test failures, but the easier one to reproduce is this:

import cython
from cython.parallel import prange

def prange_with_gil(n: cython.int, x):
    i: cython.int
    s: cython.int = 0

    for i in prange(n, num_threads=3, nogil=True):
        with cython.gil:
            s += x * i

    return s

Running this with the gil enabled is okay:

>>> import prange_with_gil
<frozen importlib._bootstrap>:488: RuntimeWarning: The global interpreter lock (GIL) has been enabled to load module 'prange_with_gil', which has not declared that it can run safely without the GIL. To override this behavior and keep the GIL disabled (at your own risk), run with PYTHON_GIL=0 or -Xgil=0.
>>> prange_with_gil.prange_with_gil(10, 3)
135

But, running it with the GIL disabled leads to issues:

>>> import prange_with_gil
>>> prange_with_gil.prange_with_gil(10, 3)
prange_with_gil.c: prange_with_gil()
REFNANNY: Too many decrefs on line 2281, reference acquired on lines []
135

There are multiple issues here that need discussion:

  1. If we remove the with cython.gil line, Cython starts complaining about operations not being allowed without holding the GIL. Those errors should probably be removed in the context of the free-threading build.
  2. Running with the GIL disabled and doing with cython.gil altogether does not make much sense. What do we do in those cases? We continue to support with cython.gil and let it be a no-op when the GIL is disabled? Do we warn against it? Is it an error?
  3. And then there's the question of the thread safety issues that could arise. We need to implement locking around such cases.
  4. The refnanny itself assumes that it's holding the GIL by just calling PyThreadState_Get, which needs changing. We should be locking around it as well.
@da-woods
Copy link
Contributor

I suggest we ignore 1 and (most of) 2 for now. They're future questions about adding new functionality rather than getting existing functionality to work. For now I'd let with cython.gil be a no-op.

And then there's the question of the thread safety issues that could arise. We need to implement locking around such cases.

My feeling is that there's basically 4 internal variables that Cython uses to keep trace of state inside a prange loop:

PyObject *__pyx_parallel_exc_type = NULL, *__pyx_parallel_exc_value = NULL, *__pyx_parallel_exc_tb = NULL;
int __pyx_parallel_why;

These are assumed to be guarded by the GIL and so we need to put some locking around those in free-threading mode. I think doing that would solve a large chunk of problems (not everything, but possibly everything specific to prange)

The refnanny itself assumes that it's holding the GIL by just calling PyThreadState_Get, which needs changing. We should be locking around it as well.

refnanny is only an internal testing tool and is disabled while running user code. That means we could do something crude like:

  • disable it for free-threading builds
  • put some fairly coarse locking into it on free-threading builds

@lysnikolaou
Copy link
Contributor Author

put some fairly coarse locking into it on free-threading builds

I'd go with this option. Having a functional and correct refnanny will be useful in testing the free threading build and avoid introducing reference counting errors. Performance isn't a big issue, as long as the refnanny remains usable and doesn't become a blocker for CI.

@da-woods
Copy link
Contributor

I've had a quick go in #6199. With that said it's untested because I don't have a free-threading build installed anywhere, so I don't expect it to work.

@da-woods
Copy link
Contributor

I think to make things generally thread safe is going to be quite hard. Consider:

    cdef object val
    for i in prange(n, nogil=True):
        with cython.gil:
            val = i

What outcome you get is obviously arbitrary (and that isn't Cython's problem), but to make the reference counting of val thread-safe we're going to need to do all assignments to it atomically (even just to avoid problems where pointers are read in a half-written state). The same will apply to module globals, any closure variable, any class member. Function locals without a prange/parallel block are probably safe though.

I don't know how far we want to go here? Does it only apply PyObject* or do we need to make anything dealing with integers atomic too? That's a pretty drastic change.

For now I'm tempted to add an import-time warning to any module with with gil when running in free-threading mode to discourage people from doing things we know are unsafe.

@lysnikolaou
Copy link
Contributor Author

lysnikolaou commented May 16, 2024

This is correct. The reference counting itself can be considered thread-safe at the CPython level, but the fact that this stores pointers in temporaries and then calls DECREF on them means that the pointers might have changed by the time we update the reference count.

Also, locking inside the refnanny is probably not enough, right? If we have a shared variable that gets assigned the result of calling a C API that returns a strong reference and we call __Pyx_GOTREF on it, there's still the danger that it might have changed before we actually lock inside the GOTREF implementation.

@da-woods
Copy link
Contributor

Also, locking inside the refnanny is probably not enough, right?

Locking inside of refnanny is only really intended to make the refnanny internal state consistent. I wasn't imagining that it made the references that it acts on thread-safe.

Looking at something simple like x = call() we end up with:

__pyx_t_1 = __pyx_f_2gr_call(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF_SET(__pyx_v_x, __pyx_t_1);
__pyx_t_1 = 0;

I'm fairly sure the temps (e.g. __pyx_t_1 are thread-local in prange) so the __Pyx_GOTREF should probably be safe.

I think it's just __Pyx_DECREF_SET(__pyx_v_x, __pyx_t_1); that needs to be made atomic in this case. Although there are definitely lots of different variants of this scattered in the code.

@da-woods
Copy link
Contributor

I had a go at testing this, and couldn't get it to give me any refnanny errors unfortunately (I did manage to convince myself that it was running multiple threads and it was running without the GIL). So I don't know if my proposed changes make it better or worse.

@da-woods
Copy link
Contributor

2. We continue to support with cython.gil and let it be a no-op when the GIL is disabled? Do we warn against it? Is it an error?

I think we still need with cython.gil because it isn't actually a no-op. We've spun up a bunch of new C threads and it does (or at least should...) ensure that each one has a Python thread state associated with it.

@lysnikolaou
Copy link
Contributor Author

I think we still need with cython.gil

Yes, this is true, although in that case the naming is a bit confusing. It's more like cython.thread_state_ensure at that point, right? I'm not suggesting to rename, but we might want to explicitly mention that in the docs.

@da-woods
Copy link
Contributor

[...] in the docs.

I definitely plan to add some documentation for the current state of the freethreading build. I think I'd like to wait for either the Python 3.13 release or a Cython release to do that (and wait to see what we're documenting).


I think the main thing we're missing is a way to do gilstate-ensure once per thread rather than once per iteration. I imagine it's a bit expensive so we should eventually provide that. But I'd rather get existing features working first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants