
gpuarray.dot() works too slow at the first calling #309

Open
decoli opened this issue Aug 25, 2021 · 4 comments

Comments

@decoli

decoli commented Aug 25, 2021

I found that the first call to gpuarray.dot() takes much longer than later calls.
Here is my code:

...
# the first time calling
start.record()
# res_gpu = gpuarray.dot(coef_gpu, image_gpu)
gpuarray.dot(coef_gpu, image_gpu)
end.record()
end.synchronize()
secs = start.time_till(end)
print("\ntime cost: {:.3f}ms\n".format(secs)) # time cost: 813.931ms


# the second time calling
start.record()
# res_gpu = gpuarray.dot(coef_gpu, image_gpu)
gpuarray.dot(coef_gpu, image_gpu)
end.record()
end.synchronize()
secs = start.time_till(end)
print("\ntime cost: {:.3f}ms\n".format(secs)) # time cost: 0.056ms
...

Why does this happen, and how can I avoid it?

@inducer
Owner

inducer commented Aug 25, 2021

That's because the first time the function is called, a few kernels are compiled behind the scenes to do the work. The basic assumption is that your program will run for long enough (otherwise, why are you using a GPU to speed it up?) that this cost will be more than amortized. Also, that cost should only be incurred once. The kernels should be in the disk cache after that, making them quick to load.

@decoli
Author

decoli commented Aug 26, 2021

Thanks for your reply.
I guess I could call gpuarray.dot() once before actually using it, just to get the kernels compiled, like this:

...
gpuarray.dot(like_coef_gpu, like_image_gpu)  # warm-up call, only to trigger kernel compilation
...
...
gpuarray.dot(coef_gpu, image_gpu)  # the real call

Is this a good solution?

@inducer
Owner

inducer commented Aug 26, 2021

If that works for your use case, then yes, that should avoid compilation/module load delays on subsequent runs of the kernel.
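To keep the one-off cost out of timings like the ones in the question, it helps to time only the calls made after a warm-up. The sketch below is generic Python with a hypothetical `benchmark` helper built on `time.perf_counter`. For GPU work, the timed function must itself wait for the kernel to finish (for example by recording and synchronizing CUDA events, as the question's code does), otherwise only the asynchronous launch is measured. If the kernels are specialized on dtype, the warm-up arguments should share the real arguments' dtypes; their size can be small.

```python
import statistics
import time


def benchmark(fn, *args, warmup=1, repeat=5):
    """Time fn(*args), excluding the first `warmup` calls.

    Returns (warmup_ms, steady_ms): a list with the duration of each
    warm-up call, and the median of the `repeat` timed calls that follow,
    all in milliseconds.
    """
    def once():
        start = time.perf_counter()
        fn(*args)
        return (time.perf_counter() - start) * 1000.0

    warmup_ms = [once() for _ in range(warmup)]  # absorbs one-off setup cost
    steady_ms = statistics.median([once() for _ in range(repeat)])
    return warmup_ms, steady_ms


_state = {"ready": False}


def slow_first_call():
    # Simulates a function that does expensive setup on its first call only.
    if not _state["ready"]:
        time.sleep(0.1)
        _state["ready"] = True


warm, steady = benchmark(slow_first_call, warmup=1, repeat=5)
print(f"warm-up: {warm[0]:.1f} ms, steady state: {steady:.3f} ms")
```

Reporting the median of several post-warm-up runs, rather than a single call, also smooths out timer noise.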

@decoli
Author

decoli commented Aug 26, 2021

Oh... I found that gpuarray.dot() is different from numpy.dot().
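The difference can be shown on the CPU with NumPy. This sketch assumes, as PyCUDA's reduction-based implementation suggests, that `gpuarray.dot(a, b)` on two equally sized arrays returns a single value, the sum of their element-wise products, whereas `numpy.dot` on 2-D arrays is a matrix product. The arrays are illustrative only.

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
b = np.ones((3, 2), dtype=np.float32)

# numpy.dot on 2-D arrays: matrix product, (2, 3) x (3, 2) -> (2, 2).
matrix_product = np.dot(a, b)

# Assumed gpuarray.dot semantics: one scalar, the sum of element-wise
# products of two arrays with the same number of elements.
c = np.ones_like(a)
inner_product = np.sum(a * c)

# Equivalent formulation on the flattened arrays.
flat_inner = np.dot(a.ravel(), c.ravel())
```

So replacing numpy.dot() with gpuarray.dot() for a matrix multiplication would silently compute something else.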

It seems that

import skcuda.linalg as linalg
linalg.dot()

can serve as a GPU counterpart of numpy.dot() that works with pycuda.

However, it raises this error:
CUSOLVER library only available in CUDA 7.0 and later

New problem...
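One quick check, offered as a diagnostic rather than a known fix: the message suggests scikit-cuda did not detect a usable cuSOLVER library, so it may help to see whether the system's dynamic loader can find one at all. The snippet uses only the standard library (`ctypes.util.find_library` and `ctypes.CDLL`); the base name `cusolver` is an assumption about how the CUDA toolkit names the library (for example `libcusolver.so.*` on Linux), and on Windows the versioned DLL names may not be found this way.

```python
import ctypes
import ctypes.util


def find_cusolver():
    """Return the cuSOLVER library name the loader can see, or None.

    Assumes the library's base name is "cusolver" (e.g. libcusolver.so.*).
    """
    return ctypes.util.find_library("cusolver")


def try_load(name):
    """Attempt to load a shared library by name; return True on success."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


name = find_cusolver()
if name is None:
    print("No cuSOLVER library found on the loader search path")
else:
    print(f"Found {name}; loadable: {try_load(name)}")
```

If a library is found and loads, the problem is more likely in how the installed scikit-cuda version detects it than in the CUDA installation itself.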
