
Offering the option to use an existing wgpu device and queue in session initialisation #123

Open
philpax opened this issue Jul 30, 2022 · 4 comments
Labels: enhancement (New feature or request)

Comments

@philpax
Contributor

philpax commented Jul 30, 2022

Is your feature request related to a problem? Please describe.
Currently, the API for creating wonnx Sessions requests the device and queue for you, and does not let you pass in your own. I'm looking at using wonnx as part of an existing wgpu context, and would like to reuse the resources I already have initialised.

Describe the solution you'd like
I'd like variants of the Session constructors, or a minor rearrangement of the API, so that users can pass in existing device and queue instances.

Describe alternatives you've considered
Trying to instantiate the session anyway. I'm not entirely sure what would happen if you request the device twice, and it may end up using the wrong device if the host application has explicitly chosen another device to run wgpu operations on.

Additional context
I am also not sure if this is a supported use-case to begin with (embedding wonnx into an existing wgpu application). Are there any potential issues with doing so?

@haixuanTao
Collaborator

Have you considered the risk of conflicts between several concurrent pipelines? To return the result of inference, we need to manage the device.poll(wgpu::Maintain::Wait); state, which might worsen the performance of the second pipeline. Naming can also conflict, and there are many resource limits that can be hit as well...

Have you tried the alternative of using several Sessions and seen any performance bottlenecks or adapter issues? I have personally tried this and did not see any significant issues, but I could be wrong...
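
For context, the blocking readback pattern referred to above looks roughly like this (a minimal sketch assuming wgpu's 0.13-style map_async callback API, not wonnx's actual code):

```rust
// Minimal sketch of a blocking buffer readback on a shared wgpu device.
fn read_back(device: &wgpu::Device, staging: &wgpu::Buffer) -> Vec<u8> {
    let slice = staging.slice(..);
    slice.map_async(wgpu::MapMode::Read, |result| {
        result.expect("failed to map staging buffer");
    });
    // This waits for *all* work submitted to the device, not just the
    // inference pass, so sharing a device with another pipeline can
    // stall that pipeline too.
    device.poll(wgpu::Maintain::Wait);
    let data = slice.get_mapped_range().to_vec();
    staging.unmap();
    data
}
```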

@SludgePhD
Contributor

One long-term feature I'd like to see is having wonnx expose its input wgpu::Buffer so that the application can use the GPU to produce the inference input instead of always uploading it from the CPU. This would be helpful for my use case, which involves some image preprocessing (which currently happens on the CPU, but should be moved to the GPU at some point).
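
Roughly, the idea would be something like this (purely illustrative; the exposed input buffer is an assumption, not an existing wonnx API):

```rust
// Illustrative sketch: `input` stands for a GPU buffer that wonnx would
// hypothetically expose as the model's input; no such API exists today.
fn preprocess_on_gpu(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    source: &wgpu::Buffer,  // e.g. image data already resident on the GPU
    input: &wgpu::Buffer,   // hypothetical wonnx-exposed input buffer
    size: wgpu::BufferAddress,
) {
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
    // A real application would run a compute pass here (resize, normalise,
    // tensor packing, ...); a plain GPU-to-GPU copy stands in for it.
    encoder.copy_buffer_to_buffer(source, 0, input, 0, size);
    queue.submit(Some(encoder.finish()));
    // No CPU round trip: the data never leaves the GPU.
}
```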

@philpax
Contributor Author

philpax commented Jul 30, 2022

> Have you considered the risk of conflicts between several concurrent pipelines? To return the result of inference, we need to manage the device.poll(wgpu::Maintain::Wait); state, which might worsen the performance of the second pipeline. Naming can also conflict, and there are many resource limits that can be hit as well...
>
> Have you tried the alternative of using several Sessions and seen any performance bottlenecks or adapter issues? I have personally tried this and did not see any significant issues, but I could be wrong...

Honestly, I'm not sure. My plan is to integrate it into a Bevy application, which uses wgpu behind the scenes, and I figured I'd need to crowbar it in somehow. I'll see if I can give it a shot at some point and see how it works out.

> One long-term feature I'd like to see is having wonnx expose its input wgpu::Buffer so that the application can use the GPU to produce the inference input instead of always uploading it from the CPU. This would be helpful for my use case, which involves some image preprocessing (which currently happens on the CPU, but should be moved to the GPU at some point).

This would also be very nice to have, for similar reasons. For example, one of the applications I have in mind is using ResNet/CLIP/similar to automatically classify models by rendering them to a texture and then running inference on the texture. Avoiding the unnecessary CPU copy would be superb.
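
A hedged sketch of what that path could look like if such a GPU-side input existed (field types follow recent wgpu releases; the exposed input buffer and any f32 conversion pass are assumptions):

```rust
// Illustrative only: copies a rendered RGBA8 texture into a GPU buffer that a
// hypothetical wonnx-exposed input would read from. A real pipeline would also
// need a compute pass to convert/normalise the pixels into the model's f32
// tensor layout, and bytes_per_row must be padded to a multiple of 256.
fn texture_to_input(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    rendered: &wgpu::Texture,  // offscreen render target holding the model view
    input: &wgpu::Buffer,      // hypothetical GPU-visible inference input
    width: u32,
    height: u32,
) {
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
    encoder.copy_texture_to_buffer(
        rendered.as_image_copy(),
        wgpu::ImageCopyBuffer {
            buffer: input,
            layout: wgpu::ImageDataLayout {
                offset: 0,
                bytes_per_row: Some(4 * width),
                rows_per_image: Some(height),
            },
        },
        wgpu::Extent3d { width, height, depth_or_array_layers: 1 },
    );
    queue.submit(Some(encoder.finish()));
}
```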

@haixuanTao
Collaborator

So, defining the input as a Buffer is totally conceivable, and normally fairly fast to implement (I haven't looked at the code for quite some time and want to do a big refactoring to integrate more tract code, so I can't help you out at the moment).

I think the general consensus would be to limit wonnx to the compute pipeline and leave wgpu manipulation to the user, making it more low-level, with more leverage for optimisation. This is all fine with me. It will probably need some refactoring.

I am personally all in for a minimal wonnx library, wrapping as little wgpu as possible.

@pixelspark added the enhancement (New feature or request) label on Mar 6, 2023