Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: spatial: ensure thread-safety for KDTree #20655

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

andfoy
Copy link
Contributor

@andfoy andfoy commented May 6, 2024

Reference issue

What does this implement/fix?

This PR adds a quick check to ensure that concurrent calls to KDTree do not cause any unexpected errors. This is specially handy in the context of No-GIL CPython.

Additional information

In principle, concurrent access to KDTree is allowed out-of-the-box, since there are no methods that can mutate the state of the tree besides the constructor.

@github-actions github-actions bot added scipy.spatial maintenance Items related to regular maintenance tasks labels May 6, 2024
@sturlamolden
Copy link
Contributor

sturlamolden commented May 7, 2024

I am a bit uncertain if property .tree is thread-safe or not. It might be sufficiently serialized by the GIL though. It can mutate the inner state.

@sturlamolden
Copy link
Contributor

sturlamolden commented May 7, 2024

In context of No-GIL CPython, there are several things that are not thread-safe. For example:

  • Calling query methods or pickling before __init__ has completed.
  • Property .tree
  • Serialization (depickling), I think
  • Rouge thread making call to __init__ (even extra calls cause problems, even now)

@andfoy
Copy link
Contributor Author

andfoy commented May 7, 2024

Thanks for the pointers @sturlamolden. I have the following questions and comments:

  1. Just to clarify, should we be locking the access to .tree before it gets initialized? If I understood correctly, the potential race condition is that T1 creates a KDTree instance with many points so the construction is deferred, then other thread T2 is created with a reference to the same tree as T1 and tries to call any method while the construction takes place in T1? I don't know if this is possible under No-GIL Python, maybe @ngoldbaum has more ideas about this kind of scenario?
  2. Is there a mechanism by which the tree state gets modified during a query call? I took a peek at some of the function definitions, and while concurrent calls would be sharing access to the tree values, they are not being modified on any capacity, unless there is some detail that we are overlooking here?
  3. Serialization is a good point regarding concurrency, however, since there's not a method that is able to add new points or modify the tree (assuming that 2 holds), there shouldn't be any issue regarding state storage. The only problem would be multiple threads trying to write to the same file, which should be (in my opinion) responsibility of the end user

@sturlamolden
Copy link
Contributor

The issue with .tree is that it is an expensive structure to create, and was added for the purpose of introspection and testing, i.e. to verify that a kd-tree is created as expected or for education (show students how it partitions space and data). Since it is normally not needed, its creation is deferred to when someone actually asks for it.

@sturlamolden
Copy link
Contributor

sturlamolden commented May 7, 2024

The more serious issue is that __init__ is serialized by the GIL. Also it needs to complete before query functions are called. If we do not have the GIL, we need to serialize it in such a way that is has completed before queries are allowed. However, we should not serialize query functions, only have them wait for the __init__ to finish, so I guess we could set a threading.Event upon finishing __init__ and have query functions wait for it. Normally it will be set before they are called. We must also use a threading.Lock within __init__. It can also be used to serialize access to .tree.

@andfoy
Copy link
Contributor Author

andfoy commented May 7, 2024

Now I understand everything, so, basically we should lock on tree, and also add an atomic flag that signals if the tree structure is ready to use.

@sturlamolden
Copy link
Contributor

__init__ is a callable Python function. So it must be serialized by a lock and it must set an atomic flag upon completion.

.tree must wait for the flag and also be internally serialized by an rlock.

Query functions must wait for the atomic flag set by __init__.

@andfoy
Copy link
Contributor Author

andfoy commented May 7, 2024

Thanks for the helpful discussion @sturlamolden!

@sturlamolden
Copy link
Contributor

No problem. When I wrote the code I depended on the GIL for serialization. For example it ensures that __init__ has completed before queries can be called.

@sturlamolden
Copy link
Contributor

sturlamolden commented May 7, 2024

Actually it is already broken as __init__ now calls Python functions instead of the NumPy C API directly, so context switch on the GIL is possible while running __init__. It is hard to create though, but it could result in a race conditon or segfault. Which means it needs the serialization described for No-GIL CPython.

@andfoy
Copy link
Contributor Author

andfoy commented May 7, 2024

Great! Then this PR will fix the concurrency issues in KDTree, let me add the changes and update the title

@andfoy andfoy changed the title TST: Add test to check for thread-safety in KDTree ENH: Ensure thread-safety for KDTree May 7, 2024
@andfoy andfoy changed the title ENH: Ensure thread-safety for KDTree ENH: spatial: Ensure thread-safety for KDTree May 7, 2024
@andfoy andfoy changed the title ENH: spatial: Ensure thread-safety for KDTree ENH: spatial: ensure thread-safety for KDTree May 7, 2024
@lucascolley lucascolley added enhancement A new feature or improvement no-GIL Items related to supporting free-threaded builds of CPython and removed maintenance Items related to regular maintenance tasks labels May 7, 2024
@sturlamolden
Copy link
Contributor

Note that my comments refer to cKDTree. KDTree is a Python wrapper on top of it so it will be necessary to fix both.

@ngoldbaum
Copy link

I don't know if this is possible under No-GIL Python, maybe @ngoldbaum has more ideas about this kind of scenario?

Yes, in free-threaded python when the GIL is disabled then something needs to explicitly lock access to prevent data races. If the data race happens in python-level code, then a threading.Lock or other python synchronization primitive is needed.

@sturlamolden
Copy link
Contributor

sturlamolden commented May 7, 2024

We already make use of the threading module in cKDTree´s Cython code so it is not just for Python code. The problems I mentioned happens in Cython level code, which should have been protected by the GIL.

@ngoldbaum
Copy link

Right, C-level code will need to use C-level locking primitives.

@sturlamolden
Copy link
Contributor

Use the primitives in threading like the rest of cKDTree, not cpython.pythread.

@sturlamolden
Copy link
Contributor

sturlamolden commented May 7, 2024

Also you cannot use a bool as an atomic flag. It must be an atomic variable, which a bool is not. Either you use platform-dependet C++ code for atomic read and write on a std::atomic<bool>, or you use threading.Event which takes care of these details. Basically to the latter, because we do not need that kind of microoptimizations here. Also if you use atomic read and write, you must provide your own spinlocks in the query functions, which I am certain we do not want to mess with. Just use threading.Event.

Copy link
Contributor

@sturlamolden sturlamolden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am roughly -inf on the changes to _ckdtree.pyx.

Just use the threading module. The important optmisations are in the C++ code.

scipy/spatial/_ckdtree.pyx Outdated Show resolved Hide resolved
scipy/spatial/_ckdtree.pyx Outdated Show resolved Hide resolved
scipy/spatial/_ckdtree.pyx Outdated Show resolved Hide resolved
scipy/spatial/_ckdtree.pyx Outdated Show resolved Hide resolved
scipy/spatial/_ckdtree.pyx Outdated Show resolved Hide resolved
scipy/spatial/_ckdtree.pyx Outdated Show resolved Hide resolved
scipy/spatial/_ckdtree.pyx Outdated Show resolved Hide resolved
scipy/spatial/_ckdtree.pyx Outdated Show resolved Hide resolved
@andfoy
Copy link
Contributor Author

andfoy commented May 10, 2024

@sturlamolden After an internal discussion with the Python free-threaded team, it has been decided that all the free-threaded work assumes that __init__ is effectively serialized and no reference leaks cannot take place (unless the implementation is changed manually) before __init__ has stopped execution, thus the proposed race condition related to initialization doesn't hold up.

Given this, the only potential race condition would be accessing the tree property concurrently.

cc @ngoldbaum @colesbury

@andfoy andfoy force-pushed the test_kdtree_concurrency branch 2 times, most recently from 1222137 to de4ff55 Compare May 14, 2024 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement no-GIL Items related to supporting free-threaded builds of CPython scipy.spatial
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants