
TFLite GPUv2: ADD(x, 1e-5) results in severely wrong output #67216

Open
gustavla opened this issue May 9, 2024 · 1 comment
Assignees
Labels
comp:lite TF Lite related issues TF 2.16 TFLiteGpuDelegate TFLite Gpu delegate issue type:bug Bug

Comments

Contributor

gustavla commented May 9, 2024

System information

  • Samsung Galaxy S23 / Android 13 / Snapdragon® 8 Gen 2 | SM8550
  • GPUv2 delegate
  • TFLite 2.16.1

Assets:

Please take a look at two outputs in particular of this network:

  • key = "model_13/featurefusion_network/encoder/query_layer/norm/LayerNormalization/moments/variance" (variance)
  • key2 = "model_13/featurefusion_network/encoder/query_layer/norm/LayerNormalization/batchnorm/add" (add)

The tensor variance is fed into ADD(x, 0.000009999999747378752) (that constant is 1e-5 rounded to float32) and comes out as add.
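As a sanity check (my own, not part of the model): the ADD op should shift every variance element by almost exactly 1e-5, which is what the CPU output below shows. A minimal sketch using the first few values from the debugger session:

```python
import numpy as np

# The ADD constant in the graph is 1e-5 rounded to float32
eps = np.float32(1e-5)
assert np.isclose(float(eps), 0.000009999999747378752)

# First four CPU "variance" values from the debugger output below
variance = np.array([0.8386347, 0.8353483, 0.83554685, 0.8366282],
                    dtype=np.float32)

# First four CPU "add" values from the debugger output below
cpu_add = np.array([0.83864474, 0.8353583, 0.83555686, 0.8366382],
                   dtype=np.float32)

# The CPU result matches variance + eps elementwise, as expected
assert np.allclose(variance + eps, cpu_add, atol=1e-7)
```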


I ran this on the CPU (xnnpack) and the GPU (GPUv2) and got totally different results.

variance looks like this across CPU and GPU (so far consistent):

(Pdb) p cpu[key][0].ravel()
array([0.8386347 , 0.8353483 , 0.83554685, 0.8366282 , 0.8377434 ,
       0.8369055 , 0.8419936 , 0.8433927 , 0.83845955, 0.83644855,
       0.8404068 , 0.8368349 , 0.8335228 , 0.8401757 , 0.83619094,
       0.8386446 ], dtype=float32)
(Pdb) p gpu[key][0].ravel()
array([0.8378906 , 0.83496094, 0.8354492 , 0.8364258 , 0.83691406,
       0.8364258 , 0.8417969 , 0.84375   , 0.83740234, 0.8354492 ,
       0.84033203, 0.83691406, 0.8334961 , 0.83984375, 0.8359375 ,
       0.8383789 ], dtype=float32)

add looks like this across CPU and GPU:

(Pdb) p cpu[key2][0].ravel()
array([0.83864474, 0.8353583 , 0.83555686, 0.8366382 , 0.8377534 ,
       0.8369155 , 0.8420036 , 0.84340274, 0.83846956, 0.83645856,
       0.8404168 , 0.8368449 , 0.8335328 , 0.8401857 , 0.83620095,
       0.83865464], dtype=float32)
(Pdb) p gpu[key2][0].ravel()
array([-0.78222656,  0.20910645, -0.72802734,  0.32421875, -0.78125   ,
        0.20947266, -0.72753906,  0.3244629 , -0.7817383 ,  0.2097168 ,
       -0.7265625 ,  0.3251953 , -0.78125   ,  0.2097168 , -0.72802734,
        0.3244629 ], dtype=float32)

Here, the values on the GPU have gone completely off the rails. They do not look random, though: there is a periodicity to the output (the error alternates between roughly 1.6 and 0.6).
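To make that periodicity concrete, here is a quick offline check (array literals copied from the debugger output above; only the first eight elements shown):

```python
import numpy as np

# First eight CPU and GPU "add" values from the debugger output above
cpu_add = np.array([0.83864474, 0.8353583, 0.83555686, 0.8366382,
                    0.8377534, 0.8369155, 0.8420036, 0.84340274],
                   dtype=np.float32)
gpu_add = np.array([-0.78222656, 0.20910645, -0.72802734, 0.32421875,
                    -0.78125, 0.20947266, -0.72753906, 0.3244629],
                   dtype=np.float32)

err = cpu_add - gpu_add
# Even positions are off by ~1.6, odd positions by ~0.5-0.6: the error
# has a regular structure rather than looking like random noise.
assert np.all(err[0::2] > 1.5)
assert np.all(err[1::2] > 0.4)
assert np.all(err[1::2] < 0.7)
```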

Standalone code to reproduce the issue
This should be simple to set up through the benchmark tool or any other way of running GPUv2 directly. I ran it through Qualcomm's AI Hub (https://aihub.qualcomm.com), so I'm attaching the script I used as a reference. It also shows how the example inputs can be loaded in Python.

import numpy as np
import qai_hub as hub

# Example inputs attached to this issue
inputs = np.load("67216_post_add_numerical_issues_inputs.npz")

model = hub.upload_model("67216_post_add_numerical_issues.tflite")
device = hub.Device("Samsung Galaxy S23")
input_data = hub.upload_dataset({
    "image": [inputs["image"]],
    "feature_template": [inputs["feature_template"]],
    "pos_template": [inputs["pos_template"]],
    "pos_search": [inputs["pos_search"]],
})

# Same model, same inputs: once on CPU (XNNPACK) ...
job_cpu = hub.submit_inference_job(
    model,
    device=device,
    inputs=input_data,
    options="--compute_unit cpu",
)

# ... and once on GPU (GPUv2 delegate)
job_gpu = hub.submit_inference_job(
    model,
    device=device,
    inputs=input_data,
    options="--compute_unit gpu",
)

cpu = job_cpu.download_output_data()
gpu = job_gpu.download_output_data()

# Intermediate tensors of interest: the variance and variance + epsilon
key = "model_13/featurefusion_network/encoder/query_layer/norm/LayerNormalization/moments/variance"
key2 = "model_13/featurefusion_network/encoder/query_layer/norm/LayerNormalization/batchnorm/add"

print(gpu[key2][0].ravel())
@gustavla gustavla added the comp:lite TF Lite related issues label May 9, 2024
@tilakrayal tilakrayal added TFLiteGpuDelegate TFLite Gpu delegate issue TF 2.16 type:bug Bug labels May 9, 2024
@sawantkumar

Hi @gustavla ,

I replicated your issue using Qualcomm AI Hub and got the same results as you. Let me verify the same through an Android app and I will get back to you.


4 participants