
Offering the option to use an existing wgpu device and queue in session initialisation #123

Open
philpax opened this issue Jul 30, 2022 · 4 comments
Labels: enhancement (New feature or request)

Comments

@philpax
Contributor

philpax commented Jul 30, 2022

Is your feature request related to a problem? Please describe.
Currently, the API for creating wonnx Sessions requests the device and queue for you, and does not let you pass in your own. I'm looking at using wonnx as part of an existing wgpu context, and would like to reuse the resources I already have initialised.

Describe the solution you'd like
I'd like variants of the Session constructors, or a minor rearrangement of the API, so that users can pass in existing device and queue instances.

Describe alternatives you've considered
Trying to instantiate the session anyway. I'm not entirely sure what would happen if you request the device twice, and it may end up using the wrong device if the host application has explicitly chosen another device to run wgpu operations on.

Additional context
I am also not sure if this is a supported use-case to begin with (embedding wonnx into an existing wgpu application). Are there any potential issues with doing so?

@haixuanTao
Collaborator

Have you considered the risk of conflicts between several concurrent pipelines? To return the result of inference, we need to manage the device.poll(wgpu::Maintain::Wait); state, which might worsen the performance of the second pipeline. Naming can also conflict, and there are many resource limits that can be hit as well...

Have you tried the alternative of using several Sessions and seen any performance bottlenecks or adapter issues? I have personally tried this and did not see any significant issues, but I could be wrong...
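
For context, the blocking readback pattern referred to above looks roughly like this (a minimal sketch assuming wgpu's 0.13-style map_async callback API, not wonnx's actual code):

```rust
// Minimal sketch of a blocking buffer readback on a shared wgpu device.
fn read_back(device: &wgpu::Device, staging: &wgpu::Buffer) -> Vec<u8> {
    let slice = staging.slice(..);
    slice.map_async(wgpu::MapMode::Read, |result| {
        result.expect("failed to map staging buffer");
    });
    // This waits for *all* work submitted to the device, not just the
    // inference pass, so sharing a device with another pipeline can
    // stall that pipeline too.
    device.poll(wgpu::Maintain::Wait);
    let data = slice.get_mapped_range().to_vec();
    staging.unmap();
    data
}
```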

@SludgePhD
Contributor

One long-term feature I'd like to see is having wonnx expose its input wgpu::Buffer so that the application can use the GPU to produce the inference input instead of always uploading it from the CPU. This would be helpful for my use case, which involves some image preprocessing (which currently happens on the CPU, but should be moved to the GPU at some point).
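
Roughly, the idea would be something like this (purely illustrative; the exposed input buffer is an assumption, not an existing wonnx API):

```rust
// Illustrative sketch: `input` stands for a GPU buffer that wonnx would
// hypothetically expose as the model's input; no such API exists today.
fn preprocess_on_gpu(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    source: &wgpu::Buffer,  // e.g. image data already resident on the GPU
    input: &wgpu::Buffer,   // hypothetical wonnx-exposed input buffer
    size: wgpu::BufferAddress,
) {
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
    // A real application would run a compute pass here (resize, normalise,
    // tensor packing, ...); a plain GPU-to-GPU copy stands in for it.
    encoder.copy_buffer_to_buffer(source, 0, input, 0, size);
    queue.submit(Some(encoder.finish()));
    // No CPU round trip: the data never leaves the GPU.
}
```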

@philpax
Contributor Author

philpax commented Jul 30, 2022

> Have you considered the risk of conflicts between several concurrent pipelines? To return the result of inference, we need to manage the device.poll(wgpu::Maintain::Wait); state, which might worsen the performance of the second pipeline. Naming can also conflict, and there are many resource limits that can be hit as well...
>
> Have you tried the alternative of using several Sessions and seen any performance bottlenecks or adapter issues? I have personally tried this and did not see any significant issues, but I could be wrong...

Honestly, I'm not sure. My plan is to integrate it into a Bevy application, which uses wgpu behind the scenes, and I figured I'd need to crowbar it in somehow. I'll see if I can give it a shot at some point and see how it works out.

> One long-term feature I'd like to see is having wonnx expose its input wgpu::Buffer so that the application can use the GPU to produce the inference input instead of always uploading it from the CPU. This would be helpful for my use case, which involves some image preprocessing (which currently happens on the CPU, but should be moved to the GPU at some point).

This would also be very nice to have, for similar reasons. For example, one of the applications I have in mind is using ResNet/CLIP/similar to automatically classify models by rendering them to a texture and then running inference on the texture. Avoiding the unnecessary CPU copy would be superb.
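
A hedged sketch of what that path could look like if such a GPU-side input existed (field types follow recent wgpu releases; the exposed input buffer and any f32 conversion pass are assumptions):

```rust
// Illustrative only: copies a rendered RGBA8 texture into a GPU buffer that a
// hypothetical wonnx-exposed input would read from. A real pipeline would also
// need a compute pass to convert/normalise the pixels into the model's f32
// tensor layout, and bytes_per_row must be padded to a multiple of 256.
fn texture_to_input(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    rendered: &wgpu::Texture,  // offscreen render target holding the model view
    input: &wgpu::Buffer,      // hypothetical GPU-visible inference input
    width: u32,
    height: u32,
) {
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
    encoder.copy_texture_to_buffer(
        rendered.as_image_copy(),
        wgpu::ImageCopyBuffer {
            buffer: input,
            layout: wgpu::ImageDataLayout {
                offset: 0,
                bytes_per_row: Some(4 * width),
                rows_per_image: Some(height),
            },
        },
        wgpu::Extent3d { width, height, depth_or_array_layers: 1 },
    );
    queue.submit(Some(encoder.finish()));
}
```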

@haixuanTao
Collaborator

So, defining the input as a Buffer is totally conceivable, and normally fairly fast to implement (I haven't looked at the code for quite some time and want to do a big refactoring to integrate more tract code, so I can't help you out at the moment).

I think the general consensus would be to limit wonnx to the compute pipeline and leave wgpu manipulation to the user, making it more low-level, with more leverage for optimisation. This is all fine with me. It will probably need some refactoring.

I am personally all in for a minimal wonnx library, wrapping as little wgpu as possible.

@pixelspark added the enhancement (New feature or request) label on Mar 6, 2023