
[ONNX][TorchToLinalg] Add support for dynamic dims in Interpolate lowering #3351

Merged: 4 commits into llvm:main on May 17, 2024

Conversation

zjgarvey (Collaborator)

Addresses Shark-Turbine #196

Related tracker Shark-Turbine #566

Related onnx.Resize issues Shark-Turbine #616

AmosLewis (Collaborator) commented May 16, 2024

python ./run.py --torchmlirbuild ../../torch-mlir/build --tolerance 0.001 0.001 --cachedir ./huggingface_cache --ireebuild ../../iree-build -f onnx -g models --mode onnx --report --tests onnx/models/RRDB_ESRGAN_vaiq_int8 --torchtolinalg

Have you tested with the SHARK-TestSuite? On my local test it still fails:

LLVM ERROR: checking for an interface (`mlir::ReifyRankedShapedTypeOpInterface`) that was promised by dialect 'tensor' but never implemented. This is generally an indication that the dialect extension implementing the interface was never registered.

It would also be better to test with the other resize-op-related models to make sure they all pass.

zjgarvey (Collaborator, Author)

> python ./run.py --torchmlirbuild ../../torch-mlir/build --tolerance 0.001 0.001 --cachedir ./huggingface_cache --ireebuild ../../iree-build -f onnx -g models --mode onnx --report --tests onnx/models/RRDB_ESRGAN_vaiq_int8 --torchtolinalg
>
> Have you tested with the SHARK-TestSuite? On my local test it still fails:
>
> LLVM ERROR: checking for an interface (`mlir::ReifyRankedShapedTypeOpInterface`) that was promised by dialect 'tensor' but never implemented. This is generally an indication that the dialect extension implementing the interface was never registered.
>
> It would also be better to test with the other resize-op-related models to make sure they all pass.

Here is the issue filed for this; it is unrelated to this patch:

#3352
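For context on what that error refers to, here is a minimal sketch (not torch-mlir's actual setup) of how an MLIR-based tool registers dialect extensions so that interfaces "promised" by a dialect, such as `ReifyRankedShapedTypeOpInterface` for `tensor`, are actually implemented at run time. The catch-all helpers `registerAllDialects` and `registerAllExtensions` are upstream MLIR utilities used here only for illustration; a real tool would register a narrower set.

```cpp
// Sketch only: illustrates the kind of registration the LLVM ERROR above is
// complaining about. Torch-mlir's real entry points are narrower than this.
#include "mlir/IR/DialectRegistry.h"
#include "mlir/IR/MLIRContext.h"
#include "mlir/InitAllDialects.h"
#include "mlir/InitAllExtensions.h"

int main() {
  mlir::DialectRegistry registry;
  mlir::registerAllDialects(registry);   // dialect definitions (tensor, linalg, ...)
  mlir::registerAllExtensions(registry); // extensions / external interface models
  mlir::MLIRContext context(registry);
  // With all upstream dialects and extensions registered, interface queries
  // such as ReifyRankedShapedTypeOpInterface on tensor ops should not hit the
  // "promised but never implemented" error.
  return 0;
}
```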

AmosLewis (Collaborator)

I cherry-picked this patch and tested locally. It looks like some previously passing models fail again with this PR:

- (half_pixel, linear)
  - DeepLabV3_resnet50_vaiq_int8: passed
  - FCN_vaiq_int8: passed
  - LRASPP_vaiq_int8: passed -> failed
  - U-2-Net_vaiq_int8: passed -> failed
- (asymmetric, nearest)
  - pytorch-3dunet_vaiq_int8
  - RRDB_ESRGAN_vaiq_int8
  - YoloNetV3_vaiq_int8: passed
  - yolov8n_vaiq_int8: passed -> failed

python ./run.py --torchmlirbuild ../../torch-mlir/build --tolerance 0.001 0.001 --cachedir ./huggingface_cache --ireebuild ../../iree-build -f onnx -g models --mode onnx --report --tests onnx/models/U-2-Net_vaiq_int8 --torchtolinalg

| tests                        | model-run   | onnx-import   | torch-mlir   | iree-compile   | inference   |
|:-----------------------------|:------------|:--------------|:-------------|:---------------|:------------|
| onnx/models/LRASPP_vaiq_int8 | passed      | passed        | failed       | notrun         | notrun      |
| onnx/models/U-2-Net_vaiq_int8 | passed      | passed        | passed       | failed         | notrun      |
| onnx/models/yolov8n_vaiq_int8 | passed      | passed        | failed       | notrun         | notrun      |
LRASPP_vaiq_int8.default.torch-onnx.mlir:195:12: error: failed to legalize operation 'torch.aten.convolution' that was explicitly marked illegal
    %191 = torch.operator "onnx.Conv"(%178, %184, %190) {torch.onnx.dilations = [1 : si64, 1 : si64], torch.onnx.group = 16 : si64, torch.onnx.kernel_shape = [3 : si64, 3 : si64], torch.onnx.pads = [1 : si64, 1 : si64, 1 : si64, 1 : si64], torch.onnx.strides = [1 : si64, 1 : si64]} : (!torch.vtensor<[1,16,112,112],f32>, !torch.vtensor<[16,1,3,3],f32>, !torch.vtensor<[16],f32>) -> !torch.vtensor<[1,16,112,112],f32> 
           ^
LRASPP_vaiq_int8.default.torch-onnx.mlir:195:12: note: see current operation: %562 = "torch.aten.convolution"(%550, %552, %561, %238, %238, %238, %45, %240, %36) : (!torch.vtensor<[1,16,112,112],!torch.qint8>, !torch.vtensor<[16,1,3,3],!torch.qint8>, !torch.vtensor<[16],si32>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int) -> !torch.vtensor<[1,16,112,112],si32>
yolov8n_vaiq_int8.default.torch-onnx.mlir:262:12: error: failed to legalize operation 'torch.aten.convolution' that was explicitly marked illegal
    %258 = torch.operator "onnx.Conv"(%245, %251, %257) {torch.onnx.dilations = [1 : si64, 1 : si64], torch.onnx.group = 1 : si64, torch.onnx.kernel_shape = [3 : si64, 3 : si64], torch.onnx.pads = [1 : si64, 1 : si64, 1 : si64, 1 : si64], torch.onnx.strides = [1 : si64, 1 : si64]} : (!torch.vtensor<[1,16,160,160],f32>, !torch.vtensor<[16,16,3,3],f32>, !torch.vtensor<[16],f32>) -> !torch.vtensor<[1,16,160,160],f32> 
           ^
U-2-Net_vaiq_int8.default.onnx.linalg.mlir:7124:12: error: 'func.func' op exceeded stack allocation limit of 32768 bytes for function. Got 204800 bytes
    %866 = linalg.generic {indexing_maps = [#map1], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} outs(%668 : tensor<1x512x20x20xf32>) {
           ^
U-2-Net_vaiq_int8.default.onnx.linalg.mlir:10:3: note: called from
  func.func @torch_jit(%arg0: tensor<1x3x320x320xf32>) -> (tensor<1x1x320x320xf32>, tensor<1x1x320x320xf32>, tensor<1x1x320x320xf32>, tensor<1x1x320x320xf32>, tensor<1x1x320x320xf32>, tensor<1x1x320x320xf32>, tensor<1x1x320x320xf32>) {
  ^

zjgarvey (Collaborator, Author)

> I cherry-picked this patch and tested locally. It looks like some previously passing models fail again with this PR:
>
> - (half_pixel, linear)
>   - DeepLabV3_resnet50_vaiq_int8: passed
>   - FCN_vaiq_int8: passed
>   - LRASPP_vaiq_int8: passed -> failed
>   - U-2-Net_vaiq_int8: passed -> failed
> - (asymmetric, nearest)
>   - pytorch-3dunet_vaiq_int8
>   - RRDB_ESRGAN_vaiq_int8
>   - YoloNetV3_vaiq_int8: passed
>   - yolov8n_vaiq_int8: passed -> failed

Hi @AmosLewis, thanks for testing this out.

The convolution op failures are happening because the following PRs have not been merged yet:

torch-mlir PR #3341, which depends on the upstream llvm-project PR #92136.

This is not an issue related to this particular patch; it likely came about due to the work on improving operand quantization in #3327 and #3332.

I'm not sure exactly what causes the stack allocation limit issue. It seems to happen during some dequant ops, but this should not be new as far as I am aware. I can focus my attention on these issues if you'd like, but again, I don't think that issue is likely to be specific to this patch.

A good comparison would be to run those same tests at head and compare to this branch.

zjgarvey (Collaborator, Author)

@AmosLewis

Also for reference, a few days ago, I ran all of the onnx model tests and triaged the torch-mlir failures:

Test onnx/models/VideoResNet_vaiq_int8 failed [torch-mlir]
    onnx.constant??
Test onnx/models/MobileNetV3_small_vaiq_int8 failed [torch-mlir]
    grouped q convolution
Test onnx/models/RegNet_y_8gf_vaiq_int8 failed [torch-mlir]
    grouped q convolution
Test onnx/models/Inception_v4_vaiq_int8 failed [torch-mlir]
    average Pool
Test onnx/models/pytorch-3dunet_vaiq_int8 failed [torch-mlir]
    resize
Test onnx/models/ShuffleNet_v2_x2_0_vaiq_int8 failed [torch-mlir]
    grouped q convolution
Test onnx/models/MNASNet_1_3_vaiq_int8 failed [torch-mlir]
    grouped q convolution
Test onnx/models/LRASPP_vaiq_int8 failed [torch-mlir]
    grouped q convolution
Test onnx/models/RRDB_ESRGAN_vaiq_int8 failed [torch-mlir]
    resize
Test onnx/models/KeypointRCNN_vaiq_int8 failed [torch-mlir]
    onnx if
Test onnx/models/EfficientNet_v2_s_vaiq_int8 failed [torch-mlir]
    grouped q convolution
Test onnx/models/retinanet_resnet50_fpn_vaiq_int8 failed [torch-mlir]
    onnx if
Test onnx/models/ConvNeXt_vaiq_int8 failed [torch-mlir]
    grouped q convolution

All of the ones marked "grouped q convolution" have a fix incoming.

This list and the flags used to run them are in my most recent comment in this issue.

AmosLewis (Collaborator) commented May 17, 2024

> A good comparison would be to run those same tests at head and compare to this branch.

Makes sense; we need to test along with that patch, #3341.

> I don't think that issue is likely to be specific to this patch.

Agreed, but we still need to test to double-check.

> stack allocation limit issue

Could you run it on your machine? It might be because my VM is running out of memory.

zjgarvey (Collaborator, Author)

> stack allocation limit issue
>
> Could you run it on your machine? It might be because my VM is running out of memory.

I'm not sure exactly what the guard is in place for. I was reading someone else's similar issue recently: iree issue.

It might be possible to remove the guard by adding the flag `--iree-llvmcpu-fail-on-out-of-bounds-stack-allocation=false` to iree-compile, as Mahesh mentioned in that issue. When I tried this for RAFT_vaiq_int8, iree-compile just sat there for about 30 minutes.
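For illustration only, a hedged sketch of where that flag would go in an iree-compile invocation; the target backend and output name below are placeholders, not taken from this thread. Only the stack-allocation flag comes from the discussion above.

```shell
# Hypothetical invocation; file and backend names are assumptions.
iree-compile \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-fail-on-out-of-bounds-stack-allocation=false \
  U-2-Net_vaiq_int8.default.onnx.linalg.mlir \
  -o U-2-Net_vaiq_int8.vmfb
```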

rsuderman merged commit 6cba93b into llvm:main on May 17, 2024. 3 checks passed.
BaneTrifa pushed a commit to BaneTrifa/torch-mlir that referenced this pull request on May 24, 2024:

[ONNX][TorchToLinalg] Add support for dynamic dims in Interpolate lowering (llvm#3351)

Addresses Shark-Turbine #196 (nod-ai/SHARK-TestSuite#196)

Related tracker Shark-Turbine #566 (nod-ai/SHARK-Turbine#566)

Related onnx.Resize issues Shark-Turbine #616 (nod-ai/SHARK-Turbine#616)