running float16 model on the CPU #199

Open

sniklaus opened this issue Feb 1, 2024 · 0 comments

sniklaus commented Feb 1, 2024

Thank you for this library, really great tool!

I have a mixed-precision ONNX model that relies on Cast nodes (OnnxCast modules after conversion) here and there. This works fine with GPU inference, but when trying to run it on the CPU there are various issues. Specifically, I am getting the following:

RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
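For context, a minimal sketch of the setup that hits this error (the model path and input shape here are placeholders, not the actual model):

```python
import torch
import onnx2torch

# Placeholder path; the real model is mixed-precision with ONNX Cast
# nodes switching between float16 and float32 regions.
model = onnx2torch.convert('mixed_precision_model.onnx')

# CPU forward pass fails as soon as a half-precision conv is reached.
model(torch.zeros(1, 3, 256, 256))
```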

This makes sense: some of the layers use half precision, and those kernels are not implemented in the CPU backend. So the next logical step is to cast the model parameters to float32 using model.float(). However, this now yields the following error:

RuntimeError: Input type (c10::Half) and bias type (float) should be the same
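The failing attempt looks roughly like this (same placeholders as above):

```python
# Parameters and buffers become float32, but activations can still be
# cast back to float16 inside the graph, producing the dtype mismatch.
model = onnx2torch.convert('mixed_precision_model.onnx').float()
model(torch.zeros(1, 3, 256, 256))
```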

I didn't dig too deep into this, but my hypothesis is that the OnnxCast nodes still convert activations to float16 even though all the model parameters are now float32. I tried modifying the FX graph to turn the OnnxCast nodes into no-ops (see the sketch below), but I haven't been able to make that work yet. Maybe adding an argument to onnx2torch.convert would help with this scenario, for example something like cast=False (with cast=True as the default). Just a thought.
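For illustration, here is roughly what I mean by turning the OnnxCast nodes into no-ops. This is only a sketch: it assumes onnx2torch returns a torch.fx.GraphModule and represents Cast nodes as call_module nodes on OnnxCast instances, and it would also strip legitimate casts (e.g. to integer dtypes), which may be wrong for some models:

```python
import torch
import onnx2torch

model = onnx2torch.convert('mixed_precision_model.onnx').float()

# Rewire every call to an OnnxCast module so its output is just its
# input, then delete the node. Iterate over a copy of the node list
# since we erase nodes while walking the graph.
for node in list(model.graph.nodes):
    if node.op == 'call_module':
        submodule = model.get_submodule(node.target)
        if type(submodule).__name__ == 'OnnxCast':
            node.replace_all_uses_with(node.args[0])
            model.graph.erase_node(node)

model.recompile()
model(torch.zeros(1, 3, 256, 256))  # should now run fully in float32 on the CPU
```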
