Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: interpolate: fix concurrency issues throughout #20671

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

andfoy
Copy link
Contributor

@andfoy andfoy commented May 8, 2024

Reference issue

See gh-20669

What does this implement/fix?

This PR will add concurrency tests for the main interpolation classes in SciPy. This is done in order to potentially discover any concurrency, race conditions or deadlocks that may appear when using multithreaded Python.

Additional information

  • BSpline: State is initialized in constructor, it shouldn't have any concurrency issues at first glance.
  • PPoly (and family): State (aka polynomial coefficients) can mutate under extend method, all other calls depend on the state
  • RbfInterpolator: There are no functions that can mutate an instance state, thus is thread-safe, as long KDTree is safe: ENH: spatial: ensure thread-safety for KDTree #20655
  • Rbf: There are no functions that can mutate an instance state.
  • NdBSpline: State is initialized in constructor, no concurrency issues found
  • NdPPoly: State is initialized in constructor, no concurrency issues found
  • NearestNDInterpolator: State is initialized in constructor, no mutation methods are available. It depends on KDTree being thread-safe: ENH: spatial: ensure thread-safety for KDTree #20655
  • RegularGridInterpolator: A wrapped NdBSpline instance can mutate during the call, introducing unwanted side effects on a concurrent scenario. Fixed on 7eb6d59
  • LinearNDInterpolator/CloughTocherInterpolator: The NDInterpolatorBase does not have any state mutation methods. Since they depend on QHull access, concurrent calls will be serialized ENH: spatial: serialize concurrent calls to QHull #20619
  • UnivariateSpline/_BivariateSplineBase (and family): According to Mark fitpack sources as recursive #16053 (comment), all FORTRAN code that is declared with the recursive keyword is reentrant and thread safe, as long as there are not any variables declared with the save keyword. The port of FITPACK used by SciPy is not using save, and all routines are declared as recursive. Classes that use the FITPACK backend base classes (aka UnivariateSpline and _BivariateSplineBase) do not have any state-mutating public function different from the constructor, so they should be thread-safe in principle.

@github-actions github-actions bot added scipy.interpolate maintenance Items related to regular maintenance tasks labels May 8, 2024
@andfoy andfoy changed the title TST: Add concurrency tests throughout scipy.interpolate ENH: Fix concurrency issues throughout scipy.interpolate May 14, 2024
Comment on lines +1016 to +787
with self._c_lock:
pass

Copy link
Contributor Author

@andfoy andfoy May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an optimistic-lock; it expects that multiple threads to be calling non-mutating functions most of the time, allowing them to run concurrently. It behaves like a turnstile, allowing threads that arrived first to pass, and then blocks if another thread calls extend, until it finishes updating the polynomial coefficients.

While this solution may yield dirty-reads for threads that execute before extend, it allows for better concurrency by not serializing each call that depends on the coefficients, which would slowdown the overall execution time when multiple threads are calling instructions different from extend.

An alternative would be to signal in the docs that the API is not thread-safe with respect to extend and let the user handle the concurrency explicitly when a mixture of extend and other calls happen

@Kai-Striega
Copy link
Member

@andfoy this looks like a good initiative. Let me know when you require someone to review.

@andfoy
Copy link
Contributor Author

andfoy commented May 16, 2024

Thanks @Kai-Striega, I'll expect to complete the overview by next week, however, feel free to comment and review on any of the current changes

@ev-br
Copy link
Member

ev-br commented May 16, 2024

The case of PPoly.extend is a curious one indeed.

  • the method mutates the state of self;
  • the only way to get into trouble in the nogil world is for a user to explicitly use multithreading on the python level.
  • the user load should be quite specific, maybe even contrived (N threads __call__ and (N+1)-th thread calls .extend, something like this)

Question: who's responsible for locking, 1) the library or 2) a user? More specifically, if there's a threading issue in user code, where the bug is: in the library or in the userland?

If 1), then every library in the ecosystem needs to add locking to all mutating methods. Thus any threading violation is valid bug report which we ought to fix.
If 2), it would be great to have a piece of documentation to point users to, ideally that docs contains a proper threading incantation with how to lock ("copy-paste this snippet and add level of indentation to your code").

ISTM there are two questions here, in fact. The first one is "who's responsible for locking" and the answer is, I strongly suspect is "it depends".
The second question is what is the default responsibility. If a method is mutating and is not explicitly documented as thread-safe or thread-unsafe, which one it is? I think explicit documentation is for exceptional cases, so the default is rather important to establish at least SciPy-wide, maybe even ecosystem-wide.

@lucascolley lucascolley changed the title ENH: Fix concurrency issues throughout scipy.interpolate ENH: interpolate: fix concurrency issues throughout May 17, 2024
@rgommers
Copy link
Member

ISTM there are two questions here, in fact. The first one is "who's responsible for locking" and the answer is, I strongly suspect is "it depends".

Not really - the library is responsible. Wherever we do something that isn't thread-safe, it requires a lock. That was already true before free-threaded CPython (e.g. we had a lock around messagestream in scipy._lib); the difference now is that a lot more code is potentially unsafe because the GIL no longer provides implicit locking for calls that go via a Python layer.

The second question is what is the default responsibility. If a method is mutating and is not explicitly documented as thread-safe or thread-unsafe,

For explicitly mutating methods (which should be very rare in SciPy), then of course the user has to be careful - doing that in multiple threads is not going to happen in a predefined order. I'm not sure why you're asking though - can you give an example of a SciPy function/method that is relevant here?

@ev-br
Copy link
Member

ev-br commented May 22, 2024

not sure why you're asking though - can you give an example of a SciPy function/method that is relevant here?

The original comment ( motivated by an offline discussion with @andfoy ) was about PPoly.extend. But in fact it's simpler:
PPoly.c is a public attribute, so a user may modify it in one thread and evaluate in another.

Basically, this PR adds locks all around. This guards against all sorts of contrived scenarios. And I'm talking about exactly that: what are the lengths we want to go to guard against contrived usage.

@ev-br
Copy link
Member

ev-br commented May 22, 2024

Now that it's testing threaded workloads, it'd be great to pick up tests cases from
#14162 and #11828

This is basically to double-check these issues do not make a comeback in a nogil world :-). This will also give us more confidence in general f2py wrapper fortran parts in various submodules.

@andfoy
Copy link
Contributor Author

andfoy commented May 22, 2024

The Alpine test segfault is fixed if scipy-openblas is used. I'll skip the test until #20585 gets merged

@rgommers
Copy link
Member

I'll skip the test until #20585 gets merged

That's done now at long last:)

@lucascolley lucascolley added enhancement A new feature or improvement and removed maintenance Items related to regular maintenance tasks labels May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants