running float16 model on the CPU #199

Open

sniklaus opened this issue Feb 1, 2024 · 0 comments

sniklaus commented Feb 1, 2024

Thank you for this library, really great tool!

I have a mixed-precision ONNX model that relies on Cast nodes (OnnxCast modules after conversion) here and there. This works fine with GPU inference, but when trying to run it on the CPU there are various issues. Specifically, I am getting the following:

RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
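For context, a minimal sketch of the setup that hits this error (the model path and input shape here are placeholders, not the actual model):

```python
import torch
import onnx2torch

# Placeholder path; the real model is mixed-precision with ONNX Cast
# nodes switching between float16 and float32 regions.
model = onnx2torch.convert('mixed_precision_model.onnx')

# CPU forward pass fails as soon as a half-precision conv is reached.
model(torch.zeros(1, 3, 256, 256))
```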

This makes sense: some of the layers use half precision, and those kernels are not implemented in the CPU backend. So the next logical step is to cast the model parameters to float32 using model.float(). However, this now yields the following error:

RuntimeError: Input type (c10::Half) and bias type (float) should be the same
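The failing attempt looks roughly like this (same placeholders as above):

```python
# Parameters and buffers become float32, but activations can still be
# cast back to float16 inside the graph, producing the dtype mismatch.
model = onnx2torch.convert('mixed_precision_model.onnx').float()
model(torch.zeros(1, 3, 256, 256))
```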

I didn't dig too deep into this, but my hypothesis is that the OnnxCast nodes still convert activations to float16 even though all the model parameters are now float32. I tried modifying the FX graph to turn the OnnxCast nodes into no-ops (see the sketch below), but I haven't been able to make that work yet. Maybe adding an argument to onnx2torch.convert would help with this scenario, for example something like cast=False (with cast=True as the default). Just a thought.
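For illustration, here is roughly what I mean by turning the OnnxCast nodes into no-ops. This is only a sketch: it assumes onnx2torch returns a torch.fx.GraphModule and represents Cast nodes as call_module nodes on OnnxCast instances, and it would also strip legitimate casts (e.g. to integer dtypes), which may be wrong for some models:

```python
import torch
import onnx2torch

model = onnx2torch.convert('mixed_precision_model.onnx').float()

# Rewire every call to an OnnxCast module so its output is just its
# input, then delete the node. Iterate over a copy of the node list
# since we erase nodes while walking the graph.
for node in list(model.graph.nodes):
    if node.op == 'call_module':
        submodule = model.get_submodule(node.target)
        if type(submodule).__name__ == 'OnnxCast':
            node.replace_all_uses_with(node.args[0])
            model.graph.erase_node(node)

model.recompile()
model(torch.zeros(1, 3, 256, 256))  # should now run fully in float32 on the CPU
```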
