
I get an error when trying to use yolov8m.onnx in wonnx-wasm-example #174

Open
vitiok123 opened this issue Jul 17, 2023 · 9 comments

@vitiok123

Describe the bug
Hi,
I am trying to use your example (https://github.com/webonnx/wonnx-wasm-example).
When I try to use yolov8m.onnx (a COCO-dataset model exported to ONNX with YOLOv8), I get this error:
SessionError 'IR error: output node for output /model.0/conv/Conv_output_0 not found'

To Reproduce
Steps to reproduce the behavior:

  1. const [modelBytes, initResult] = await Promise.all([fetchBytes("./data/models/yolov8m.onnx"), init()])
  2. const session = await Session.fromBytes(modelBytes)

Expected behavior
The model loads without an error.

Screenshots
(two screenshots of the error attached)

Desktop (please complete the following information):

  • OS: Win10 Pro (10.0.19045 Build 19045)
  • Browser: Chrome
  • Version: 114.0.5735.199
@pixelspark
Collaborator

Can you share the specific onnx file you are using?

The error in general means that the output is missing somehow. If it is in the ONNX file and properly connected, there may be an issue in the optimizer.

@vitiok123
Author

Hi,
you can find the onnx file in this repository (file: yolov8m.onnx)
https://github.com/AndreyGermanov/yolov8_onnx_python

I used Python and YOLOv8 to export this file. The export accepts several arguments; the list is here:
https://docs.ultralytics.com/modes/export/#arguments
Maybe this will help determine whether the problem is in the export settings.

@pixelspark
Collaborator

Hm, the file linked does not have all its shapes inferred (nnx prepare is unable to infer all shapes, but that is expected as shape inference for Conv is not yet supported).

After simplifying with onnx-simplifier (see README) there are still issues as the outputs of some Resize nodes are not inferred yet:

[2023-07-18T18:55:43Z ERROR nnx::info] Node '/model.10/Resize' input '' has unknown shape
[2023-07-18T18:55:43Z ERROR nnx::info] Node '/model.13/Resize' input '' has unknown shape

The issue seems to be that this node has no name specified for one of its inputs (this is allowed for optional inputs, as roi is in this case):

(screenshot of the Resize node's inputs attached)

This should however not pose an issue since the optimizer will move inputs to attributes for Resize and in that process, ignore the optional roi input.

So my suggestion would be to try again with the optimized version (obtained using python3 -m onnxsim ./model.onnx ./simplified.onnx).
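Not part of the original thread, but to make the attribute-transfer idea above concrete, here is a stdlib-only Python sketch of such an optimizer pass. The dict-based node representation is a hypothetical simplification of wonnx's IR, not its real API: constant inputs of `Resize` (such as `scales`) become attributes, and the empty name standing in for the omitted optional `roi` input is simply dropped.

```python
# Hypothetical, simplified IR: a node is a dict of op_type, inputs, attributes.
# Mimics the pass described above: for Resize, constant inputs become
# attributes and the empty optional `roi` input is ignored.

def move_resize_inputs_to_attributes(node, initializers):
    """Return a copy of `node` with optional/constant inputs folded into attributes."""
    if node["op_type"] != "Resize":
        return node
    new_inputs = []
    attrs = dict(node.get("attributes", {}))
    for name in node["inputs"][1:]:       # input 0 (X) always stays an input
        if name == "":                    # empty name = omitted optional input (roi)
            continue
        if name in initializers:          # constant input -> attribute
            attrs[name] = initializers[name]
            continue
        new_inputs.append(name)
    return {"op_type": "Resize",
            "inputs": node["inputs"][:1] + new_inputs,
            "attributes": attrs}

# A node shaped like /model.10/Resize in the thread: X, empty roi, constant scales.
node = {"op_type": "Resize",
        "inputs": ["/model.9/Concat_output_0", "", "scales_10"],
        "attributes": {}}
initializers = {"scales_10": [1.0, 1.0, 2.0, 2.0]}

optimized = move_resize_inputs_to_attributes(node, initializers)
print(optimized["inputs"])      # only the data input remains
print(optimized["attributes"])  # scales moved to an attribute
```

After this pass the node has a single runtime input, which is why the empty `roi` name should not matter.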

@vitiok123
Author

Wow, Cool, Thanks a lot.
I will try and give you feedback.

@vitiok123
Author

After running python3 -m onnxsim ./model.onnx ./simplified.onnx, these are the statistics:

Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃            ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Add        │ 15             │ 14               │
│ Concat     │ 19             │ 19               │
│ Constant   │ 189            │ 183              │
│ Conv       │ 84             │ 84               │
│ Div        │ 2              │ 1                │
│ Gather     │ 1              │ 0                │
│ MaxPool    │ 3              │ 3                │
│ Mul        │ 80             │ 78               │
│ Reshape    │ 5              │ 5                │
│ Resize     │ 2              │ 2                │
│ Shape      │ 1              │ 0                │
│ Sigmoid    │ 78             │ 78               │
│ Slice      │ 2              │ 2                │
│ Softmax    │ 1              │ 1                │
│ Split      │ 9              │ 9                │
│ Sub        │ 2              │ 2                │
│ Transpose  │ 1              │ 1                │
│ Model Size │ 99.0MiB        │ 98.9MiB          │
└────────────┴────────────────┴──────────────────┘

Using the simplified model I get this in the console:

Info logs

transferring input split for op Split to i64 attribute (initializer data type: I64): [48, 48]
applying padding optimization to tensor model.2.m.0.cv1.conv.weight: strides data is 82944 bytes before, 110592 bytes after
applying padding optimization to tensor model.2.m.0.cv2.conv.weight: strides data is 82944 bytes before, 110592 bytes after
applying padding optimization to tensor model.2.m.1.cv1.conv.weight: strides data is 82944 bytes before, 110592 bytes after
applying padding optimization to tensor model.2.m.1.cv2.conv.weight: strides data is 82944 bytes before, 110592 bytes after
transferring input split for op Split to i64 attribute (initializer data type: I64): [96, 96]
applying padding optimization to tensor model.4.m.0.cv1.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.0.cv2.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.1.cv1.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.1.cv2.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.2.cv1.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.2.cv2.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.3.cv1.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.3.cv2.conv.weight: strides data is 331776 bytes before, 442368 bytes after
transferring input split for op Split to i64 attribute (initializer data type: I64): [192, 192]
applying padding optimization to tensor model.6.m.0.cv1.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.0.cv2.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.1.cv1.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.1.cv2.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.2.cv1.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.2.cv2.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.3.cv1.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.3.cv2.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
transferring input split for op Split to i64 attribute (initializer data type: I64): [288, 288]
applying padding optimization to tensor model.8.m.0.cv1.conv.weight: strides data is 2985984 bytes before, 3981312 bytes after
applying padding optimization to tensor model.8.m.0.cv2.conv.weight: strides data is 2985984 bytes before, 3981312 bytes after
applying padding optimization to tensor model.8.m.1.cv1.conv.weight: strides data is 2985984 bytes before, 3981312 bytes after
applying padding optimization to tensor model.8.m.1.cv2.conv.weight: strides data is 2985984 bytes before, 3981312 bytes after

And then this error:

panicked at 'internal error: entered unreachable code', wonnx/src/optimizer.rs:95:67

Stack:

Error
    at imports.wbg.__wbg_new_abda76e883ba8a5f (http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx.js?v=a126f01e:481:21)
    at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[1080]:0x14a444
    at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[2887]:0x18e73a
    at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[1666]:0x17502e
    at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[1812]:0x17b84f
    at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[2232]:0x187b4c
    at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[2441]:0x18b798
    at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[2273]:0x188936
    at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[180]:0x1b83e
    at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[189]:0x34dbb
Uncaught (in promise) RuntimeError: unreachable
    at wonnx_bg.wasm:0x175068
    at wonnx_bg.wasm:0x17b84f
    at wonnx_bg.wasm:0x187b4c
    at wonnx_bg.wasm:0x18b798
    at wonnx_bg.wasm:0x188936
    at wonnx_bg.wasm:0x1b83e
    at wonnx_bg.wasm:0x34dbb
    at wonnx_bg.wasm:0xef85f
    at wonnx_bg.wasm:0x3492e
    at wonnx_bg.wasm:0xef85f

@pixelspark
Collaborator

Good news and bad news:

The above does seem to be a bug in the optimizer: it appears to attempt constant folding on the missing node. I just committed 5d20e96 to fix that. Unfortunately, I now get a different issue:

RUST_LOG=wonnx=debug RUST_BACKTRACE=1 cargo run --release -- infer ~/Downloads/yolov8m-simplified-2.onnx
[2023-07-18T19:50:08Z DEBUG wonnx::gpu] sequence tensor onnx::Split_180 (outputs readable=false)
[2023-07-18T19:50:08Z WARN  wonnx::gpu] initializers with int64 data type are not supported, converting into int32 initializer
[2023-07-18T19:50:08Z INFO  wonnx::gpu] creating buffer: onnx::Split_180 8b
[2023-07-18T19:50:08Z DEBUG wonnx::gpu] sequence op: /model.2/Split_output_0 (Split) (outputs readable=false)
thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_bind_group
      note: label = `/model.2/Split_output_0`
    Number of bindings in bind group descriptor (4) does not match the number of bindings defined in the bind group layout (3)

It does appear the split input (number 2) is properly transferred to an attribute:

[2023-07-18T19:55:15Z DEBUG wonnx::optimizer] locally_optimized_node_with NodeIdentifier(0x600001c81b40, "/model.2/Split_output_0") op: /model.2/Split (Split)
[2023-07-18T19:55:15Z INFO  wonnx::optimizer] transferring input split for op Split to i64 attribute (initializer data type: I64): [48, 48]

So for some reason it thinks there should be four buffers in one place but three in another. In the generated shader code, it has three (as expected: the split input is moved to an attribute earlier):

@group(0) @binding(0)
var<storage, read> input_0: Array;

@group(0) @binding(1)
var<storage, read_write> output_0: Array;

@group(0) @binding(2)
var<storage, read_write> output_1: Array;

Hence, there must still be two inputs in the IR (even after split is moved to an attribute) while only one is ever used by the shader (it expects all other inputs to have been moved to attributes), which leads to the error.
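To make the 4-vs-3 mismatch concrete, here is a stdlib-only Python sketch (a hypothetical simplification of wonnx's internals, not its real API): the bind group descriptor is sized from the IR node's buffers, while the layout is sized from the generated shader's `@binding` declarations, and wgpu requires the two to agree.

```python
# Hypothetical simplification: wgpu validates that the bind group descriptor
# (built from the IR node's input/output buffers) matches the bind group
# layout (built from the shader's @binding declarations).

def descriptor_bindings(ir_node):
    # one buffer per IR input and per IR output
    return len(ir_node["inputs"]) + len(ir_node["outputs"])

def layout_bindings(shader_source):
    # one binding per @binding(...) declaration in the generated WGSL
    return shader_source.count("@binding(")

ir_node = {  # split was moved to an attribute, but its input was never removed
    "inputs": ["/model.2/cv1/act/Mul_output_0", "onnx::Split_180"],
    "outputs": ["/model.2/Split_output_0", "/model.2/Split_output_1"],
}
shader = """
@group(0) @binding(0) var<storage, read> input_0: Array;
@group(0) @binding(1) var<storage, read_write> output_0: Array;
@group(0) @binding(2) var<storage, read_write> output_1: Array;
"""

print(descriptor_bindings(ir_node))  # 4 -- what wgpu is handed
print(layout_bindings(shader))       # 3 -- what the shader declares
```

If the optimizer also pruned the now-redundant `onnx::Split_180` input from the IR node, both counts would be 3 and validation would pass.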

This needs some further investigation (I don't have the time for it now) but at least we know where to look.

@vitiok123
Author

Cool, good to know.
No problem; it'll be done when it's done :)

Thank you a lot for your super fast help and answer.

@mersinvald

Hi @pixelspark, I've encountered the same error trying to run YOLOv8 via wonnx. Have you had a chance to look into this issue yet?

If you don't have time for that, but could offer some guidance in debugging, that would be very much appreciated too :)

@pixelspark
Collaborator

I haven't (and frankly don't have the time), unfortunately.

If I were you I would start by investigating whether your ONNX file also has the issue with the Split operator, and check how many inputs it has. You might be able to rewrite the ONNX file (using the Python onnx package) into something wonnx accepts. Another option would be to tweak the ONNX opset version (perhaps the issue arises because Split takes different forms depending on the opset version).
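Following that suggestion, here is a hedged, stdlib-only Python sketch of such a rewrite. The dict-based graph is again a hypothetical simplification rather than the real onnx protobuf API; with the actual onnx package you would edit the node's input list and attributes instead. The idea: Split carrying the split sizes as a second input (the newer opset form) is rewritten into the attribute form the generated shader expects.

```python
# Sketch of the workaround suggested above: detect Split nodes that carry
# `split` as a second input and rewrite them to the attribute form, so the
# IR's buffer count matches the shader's bindings. Hypothetical dict-based
# representation, not the real onnx protobuf API.

def split_input_to_attribute(node, initializers):
    """Rewrite a Split node from input form to attribute form, if needed."""
    if node["op_type"] != "Split" or len(node["inputs"]) < 2:
        return node  # already attribute form (or not a Split)
    data_input, split_input = node["inputs"][0], node["inputs"][1]
    attrs = dict(node.get("attributes", {}))
    attrs["split"] = initializers[split_input]   # e.g. [48, 48] from the logs
    return {"op_type": "Split", "inputs": [data_input], "attributes": attrs}

node = {"op_type": "Split",
        "inputs": ["/model.2/cv1/act/Mul_output_0", "onnx::Split_180"],
        "attributes": {"axis": 1}}
initializers = {"onnx::Split_180": [48, 48]}

fixed = split_input_to_attribute(node, initializers)
print(fixed["inputs"])      # single data input, as the generated shader expects
print(fixed["attributes"])  # {'axis': 1, 'split': [48, 48]}
```

Whether this is enough for wonnx to accept the model is untested here; it only illustrates the rewrite direction pixelspark describes.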
