Running ONNX Resnet18 model gets stuck with command ‘-O 99’ #324

Open
Alwinnnn opened this issue Jan 12, 2024 · 0 comments

Hi,
I have implemented Rocket64b1gem16 on my FPGA with the default configuration and 8GiB of DDR3.
With '-O 99', the ONNX ResNet18 model sometimes runs and produces the correct result, but sometimes it gets stuck.
With the optimization flag '-O 1', the model runs successfully every time, but it takes longer.
In addition, the Chipyard Spike simulator always runs this model correctly with both '-O 1' and '-O 99'.
The program also always runs correctly on Rocket64b1gem8, including with '-O 99'.
Here are the results for comparison.

Below is the output on Rocket64b1gem16 with '-O 99' from a run that completed correctly. With '-O 99', the model only occasionally runs to completion.

debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem16 -1 detection_quanV2.onnx -i images/2.jpg -x 2 -O 99
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
Number of inputs = 1
Input 0 : name=input.1, type=1, num_dims=4: [1, 3, 256, 256, ]
Number of outputs = 1
Output 0 : name=231, type=1, num_dims=4: [1, 21, 64, 64, ]
yolox init
pose init
Loading image
Image dimensions: 256 256 3
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
0
11
2
0
0
0
84 140 207 264 0
resize took 0 cycles 5.487418 s
normalize_transpose took 0 cycles 3.011997 s
Done! Pre Process 1 took 0 cycles 8.499517 s
Done! Inference 1 took 0 cycles 5.220877 s
Done! Pre Process 1 took 0 cycles 1.010774 s

Below is the output on Rocket64b1gem16 with '-O 99' from a run that got stuck. The model sometimes hangs, and when it does, it stops at the same place.

debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem16 -1 detection_quanV2.onnx -2 pose_quanV2.onnx -i images/2.jpg -x 2 -O 99
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
yolox init
Loading image
Image dimensions: 256 256 3
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic

Below is the output on Rocket64b1gem16 with '-O 1'. With '-O 1', the model always runs correctly.

debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem16 -1 detection_quanV2.onnx -i images/2.jpg -x 2 -O 1
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
yolox init
Loading image
Image dimensions: 256 256 3
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 25600, 147)
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 16)
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 144)
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 288)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 288)
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 288)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 32)
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 288)
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 576)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 576)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 64)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 1152)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 1152)
Called into systolic add
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (256, 100, 128)
Called into systolic matmul!
Using accelerated matmul with dimensions (512, 25, 2304)
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 25, 512)
Called into systolic matmul!
Using accelerated matmul with dimensions (256, 9, 1152)
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 9, 256)
Called into systolic matmul!
Using accelerated matmul with dimensions (256, 4, 1152)
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 4, 256)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 1, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 1, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 4, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 9, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 25, 4608)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 100, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 400, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 1, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 4, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 9, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 25, 4608)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 100, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 400, 576)
0
11
2
0
0
0
84 140 207 264 0
resize took 0 cycles 5.440022 s
normalize_transpose took 0 cycles 2.139706 s
Done! Pre Process 1 took 0 cycles 7.579837 s
Done! Inference 1 took 0 cycles 17.962803 s
Done! Pre Process 1 took 0 cycles 1.224211 s

I also ran this model on Rocket64b1gem8. There it always runs correctly with '-O 99', and its inference time (1.93 s) is much shorter than on gem16 (5.22 s), which seems strange.
Below is the output on Rocket64b1gem8 with '-O 99'.

debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem8 -1 detection_quanV2.onnx -i images/2.jpg -x 2 -O 99
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
yolox init
Loading image
Image dimensions: 256 256 3
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
0
11
2
0
0
0
84 140 207 264 0
resize took 0 cycles 1.830045 s
normalize_transpose took 0 cycles 1.073210 s
Done! Pre Process 1 took 0 cycles 2.903357 s
Done! Inference 1 took 0 cycles 1.933709 s
Done! Pre Process 1 took 0 cycles 0.445910 s
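
To put numbers on the gem16 vs. gem8 difference, the short Python sketch below simply divides the timings printed by the three successful runs above (the values are copied from the logs; nothing is re-measured, and the dictionary labels are just names chosen for this sketch). It shows gem16 inference taking about 2.7x as long as gem8 at '-O 99', and '-O 1' taking about 3.4x as long as '-O 99' on gem16. It also shows that the resize step, which presumably runs on the CPU rather than on Gemmini (an assumption, not confirmed by the logs), is about 3x slower on the gem16 build as well.

```python
# Timings in seconds, copied verbatim from the three successful runs above.
timings = {
    "gem16 -O 99": {"resize": 5.487418, "normalize_transpose": 3.011997,
                    "inference": 5.220877},
    "gem16 -O 1": {"resize": 5.440022, "normalize_transpose": 2.139706,
                   "inference": 17.962803},
    "gem8 -O 99": {"resize": 1.830045, "normalize_transpose": 1.073210,
                   "inference": 1.933709},
}


def slowdown(a, b, stage):
    """Return how many times longer `stage` took in run `a` than in run `b`."""
    return timings[a][stage] / timings[b][stage]


if __name__ == "__main__":
    # Same flag ('-O 99'), different Gemmini mesh size.
    for stage in ("resize", "normalize_transpose", "inference"):
        ratio = slowdown("gem16 -O 99", "gem8 -O 99", stage)
        print(f"{stage}: gem16 / gem8 (-O 99) = {ratio:.2f}x")
    # Same design (gem16), different optimization level.
    ratio = slowdown("gem16 -O 1", "gem16 -O 99", "inference")
    print(f"inference: gem16 -O 1 / -O 99 = {ratio:.2f}x")
```

Running it prints one ratio per stage; values above 1.0 mean the first run was slower.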

I also switched to 2GiB of DDR3 and got the same result: the model gets stuck at the same place.
What might be the problem?
Thanks!
