Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3D Textures not working with uint8 type #388

Open
ichlubna opened this issue Oct 24, 2022 · 5 comments
Open

3D Textures not working with uint8 type #388

ichlubna opened this issue Oct 24, 2022 · 5 comments
Labels

Comments

@ichlubna
Copy link

ichlubna commented Oct 24, 2022

I have edited the test example and changed the type from float to uint8, expecting to use it as unsigned char in the kernel. The output values are, however not the same, they seem to be bit-shifted or misaligned when being read from the memory. Below is the code, the expected result is that the printed values at the end are the same (simplified to copying of just the first value). Now I am getting the expected 20 and the wrong 161. I apologize if this is my lack of understanding but I haven't found anything regarding this topic in the docs and thus I consider this a bug.

import numpy as np
import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule

shape = (2, 4, 8)

a = np.asarray(np.full(shape,20), dtype=np.uint8, order="F")
descr = drv.ArrayDescriptor3D()
descr.width = shape[0]
descr.height = shape[1]
descr.depth = shape[2]
descr.format = drv.dtype_to_array_format(a.dtype)
descr.num_channels = 1
descr.flags = 0
ary = drv.Array(descr)
copy = drv.Memcpy3D()
copy.set_src_host(a)
copy.set_dst_array(ary)
copy.width_in_bytes = copy.src_pitch = a.strides[1]
copy.src_height = copy.height = shape[1]
copy.depth = shape[2]
copy()

mod = SourceModule(
    """
texture<unsigned char, 3, cudaReadModeElementType> mtx_tex;

__global__ void copy_texture(unsigned char *dest)
{
  if(threadIdx.x + threadIdx.y + threadIdx.z == 0)
      dest[0] = tex3D(mtx_tex, 0, 0, 0);
}
"""
)
copy_texture = mod.get_function("copy_texture")
mtx_tex = mod.get_texref("mtx_tex")
mtx_tex.set_array(ary)
dest = np.zeros(shape, dtype=np.uint8, order="F")
copy_texture(drv.Out(dest), block=shape, texrefs=[mtx_tex])
print(a[0][0][0])
print(dest[0][0][0])

This seems to be related to the memory alignment rules. When the descriptor is set to uint32 (descr.format = drv.dtype_to_array_format(np.uint32)) instead of the input array type, the right values can be obtained when accessed as unsigned char in the kernel. This is a workaround, or a correct solution? In case of 4 8bit channels, it would be necessary to pass the values reinterpreted as an array of integers?

@ichlubna ichlubna added the bug label Oct 24, 2022
@ichlubna
Copy link
Author

Here is the working example of both one and four channels. Is this really the right approach?

import numpy as np
import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule

width = 100
height = 200
count = 5
blockSize = (256,1,1)

def copyAndGetArr(a):
    descr = drv.ArrayDescriptor3D()
    descr.width = width
    descr.height = height
    descr.depth = count
    descr.format = drv.dtype_to_array_format(np.uint32)
    descr.num_channels = 1
    descr.flags = 0
    ary = drv.Array(descr)
    copy = drv.Memcpy3D()
    copy.set_src_host(a)
    copy.set_dst_array(ary)
    copy.width_in_bytes = copy.src_pitch = a.strides[1]
    copy.src_height = copy.height = height
    copy.depth = count
    copy()
    return ary

def one():
    a = np.full((count, height, width), 20, np.uint8)
    ary = copyAndGetArr(a)

    mod = SourceModule(
        """
    texture<unsigned char, 3, cudaReadModeElementType> mtx_tex;

    __global__ void copy_texture()
    {
      if(threadIdx.x == 0)
          printf("\\n %d \\n", tex3D(mtx_tex, 0, 2, 1));
    }
    """
    )
    copy_texture = mod.get_function("copy_texture")
    mtx_tex = mod.get_texref("mtx_tex")
    mtx_tex.set_array(ary)
    shape = (256,1,1)
    copy_texture(block=blockSize, texrefs=[mtx_tex])

def four():
    channels = 4
    a = np.full((count, height, width, channels), 20, np.uint8)
    a[0][0][0][0] = 1;
    a[0][0][0][1] = 2;
    a[0][0][0][2] = 3;
    a[0][0][0][3] = 4;
    ary = copyAndGetArr(a.view(">u4"))

    mod = SourceModule(
        """
    texture<int, 3, cudaReadModeElementType> mtx_tex;

    __global__ void copy_texture()
    {
      if(threadIdx.x == 0)
      {
        auto px{tex3D(mtx_tex, 0, 0, 0)};
        auto p{*reinterpret_cast<uchar4*>(&px)};
        printf("\\n %d %d %d %d \\n", p.x, p.y, p.z, p.w);
      }
    }
    """
    )
    copy_texture = mod.get_function("copy_texture")
    mtx_tex = mod.get_texref("mtx_tex")
    mtx_tex.set_array(ary)
    copy_texture(block=blockSize, texrefs=[mtx_tex])

one()
four()

@inducer
Copy link
Owner

inducer commented Oct 24, 2022

What format comes out of descr.format = drv.dtype_to_array_format(a.dtype)? Is that the correct one in your view?

@ichlubna
Copy link
Author

ichlubna commented Oct 25, 2022

@inducer Yes, the line is taken from the test example, right? I would expect it to simply contain the type of one element since we can also provide number of channels etc. It is this in the first code I posted:

print(a.dtype)
descr.format = drv.dtype_to_array_format(a.dtype)
print(descr.format)

Output:

uint8
UNSIGNED_INT8

I expected the descriptor to deal with the alignment since we need to set all the attributes.

Where is the problem? Is the second example I posted the way it should be used? Why can't I access the elements in kernel as uchar4 directly? Or how to do so?

@inducer
Copy link
Owner

inducer commented Oct 25, 2022

That looks OK. Could you reference the Nvidia docs on how this is intended to work? It's been years since I've directly worked with textures, I don't remember.

@ichlubna
Copy link
Author

Hmm it seems like the plain Cuda way is slightly different, looking here and here. What I would expect in pycuda is this:

a = np.full((count, height, width), 20, np.uint8)
descr.format = a.dtype
descr.num_channels = 1
...
texture<unsigned char, 3, cudaReadModeElementType> mtx_tex

We have a texture of unsigned char values. The type is set to the right type and one channel. When reading the values with tex3D, I would expect to see the one channel value as the result. Now I am getting 161 instead of 20 which I don't understand, since all values are bytes - how would alignment break such values?

Here:

channels = 4
a = np.full((count, height, width, channels), 20, np.uint8)
descr.format = a.dtype
descr.num_channels = channels
...
texture<uchar4, 3, cudaReadModeElementType> mtx_tex

I would expect the classic RGBA texture. Right now, it seems like the num_channels doesn't do much to the result. The only way I made it work are the examples above.

My concern or misunderstanding is about the combination of descr.format and descr.num_channels in regards to the texture definition and reading in the kernel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants