
Is there a way to share context among threads, if not why? #306

Open
menglin0320 opened this issue Aug 4, 2021 · 10 comments

Comments

@menglin0320

Basically, I want to achieve concurrent work with multithreading; my current inference code uses pycuda + tensorrt.

Why I want to do so:
I'm trying to optimize inference throughput for a model with dynamic input. The size difference between samples can be quite significant, so I want to avoid padding but still do something similar to batching: run several samples concurrently with the same engine. The inference time will still be bottlenecked by the biggest sample in the batch, but a lot of FLOPs are saved, and it also avoids the possible performance drop from padding too much.

My current understanding of the problem:
From what I understand, if work is submitted in different CUDA contexts there is no real parallelism, just better scheduling. Also, a process can only have one CUDA context, but threads can share a context. This may not be true for pycuda, so I need to check. But I haven't found anything yet about how to share one context among threads.

I found the official example for using multithreading with pycuda here: link

Device.make_context()
There's not much difference between multithreading and multiprocessing then. If each thread owns its own context, then there is no real concurrent work.
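
For reference, the pattern in that example is roughly the sketch below (abbreviated; the array size and thread count are just placeholders). Each thread creates, uses, and then pops its own context, so nothing is shared between threads:

```python
import threading

import numpy as np
import pycuda.driver as cuda

cuda.init()

class GPUThread(threading.Thread):
    def __init__(self, device_id, array):
        super().__init__()
        self.device_id = device_id
        self.array = array

    def run(self):
        dev = cuda.Device(self.device_id)
        ctx = dev.make_context()      # context is created in (and owned by) this thread
        try:
            gpu_buf = cuda.mem_alloc(self.array.nbytes)
            cuda.memcpy_htod(gpu_buf, self.array)
            # ... launch kernels / run inference against gpu_buf here ...
        finally:
            ctx.pop()                 # detach the context before the thread exits

threads = [GPUThread(0, np.random.randn(1024).astype(np.float32)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```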

My question:
I just wonder if my understanding of contexts is right, and whether there is a way to share a context between different threads. I feel it should be possible; if it is not possible with pycuda, can anyone briefly explain why?

@menglin0320
Author

menglin0320 commented Aug 4, 2021

I just learned about the GIL. So we have to use multiprocessing, and I can only dive into MPI to solve my problem if I want to stick with Python?

@inducer
Owner

inducer commented Aug 7, 2021

This also came up recently in #305 (comment). PyCUDA currently assumes that each context can only be active in a single thread. It appears that this was true up until CUDA 4, but this restriction was then lifted. I would welcome a PR that removes this restriction. It might be as simple as deleting the check for uniqueness of activation.
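
For concreteness, here is a hypothetical sketch of what sharing one context across threads could look like if that check were removed (with current PyCUDA, pushing the same context from more than one thread is expected to be rejected; the buffer size and thread count are placeholders):

```python
import threading

import numpy as np
import pycuda.driver as cuda

cuda.init()
ctx = cuda.Device(0).make_context()   # created (and made current) in the main thread
ctx.pop()                             # deactivate it so worker threads can push it

def worker(data):
    ctx.push()                        # make the shared context current in this thread
    try:
        buf = cuda.mem_alloc(data.nbytes)
        cuda.memcpy_htod(buf, data)
        # ... enqueue work here, e.g. on a per-thread stream ...
    finally:
        ctx.pop()

threads = [threading.Thread(target=worker, args=(np.zeros(256, np.float32),))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

ctx.push()
ctx.detach()                          # drop the reference and destroy the context
```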

@menglin0320
Author

Yes, I also saw it. I may switch to polygraphy instead. I don't know much about CUDA wrappers, and I chose pycuda only because the official tensorrt example used it. But the test code in tensorrt uses polygraphy instead, and it seems like polygraphy hides all the details about contexts. Hope that it can work.

@menglin0320
Author

menglin0320 commented Aug 9, 2021

The NVIDIA guys told me that tensorrt inference releases the GIL. That's good news; if the new feature gets added, it would be useful in this case.

@inducer
Owner

inducer commented Aug 21, 2021

How come this got closed? The question you raised is a real concern to my mind, and I wouldn't be opposed to the issue staying open.

menglin0320 reopened this Aug 21, 2021
@menglin0320
Author

Okay, just one quick question: I found that pycuda is a lot quicker than polygraphy when doing memcpy. Do you know the reason?

@inducer
Owner

inducer commented Aug 21, 2021

PyCUDA isn't doing anything special with memcpy. It just calls the corresponding CUDA function. For an additional speed boost, you can use "page-locked" memory (on the host side).
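
For example, something along these lines (buffer size and dtype are arbitrary placeholders):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a default context
import pycuda.driver as cuda

# Allocate a page-locked (pinned) host buffer instead of a plain numpy array.
host_buf = cuda.pagelocked_empty(1 << 20, dtype=np.float32)
host_buf[:] = np.random.randn(host_buf.size).astype(np.float32)

dev_buf = cuda.mem_alloc(host_buf.nbytes)
stream = cuda.Stream()

# Pinned host memory speeds up transfers and lets async copies on a stream
# actually overlap with other work.
cuda.memcpy_htod_async(dev_buf, host_buf, stream)
cuda.memcpy_dtoh_async(host_buf, dev_buf, stream)
stream.synchronize()
```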

@menglin0320
Author

k, I'll try to read the source code myself...

@zacario-li

@menglin0320 I'm in a similar situation, and I have a solution:
How to perform different models in different gpu simultaneously

@bmerry
Contributor

bmerry commented Sep 29, 2023

Would fixing this also make it possible for CUDA objects to be safely garbage collected in threads where the context is not current?
