3D Textures not working with uint8 type #388

ichlubna · 2022-10-24T11:00:30Z

I have edited the test example and changed the type from float to uint8, expecting to use it as unsigned char in the kernel. The output values are, however not the same, they seem to be bit-shifted or misaligned when being read from the memory. Below is the code, the expected result is that the printed values at the end are the same (simplified to copying of just the first value). Now I am getting the expected 20 and the wrong 161. I apologize if this is my lack of understanding but I haven't found anything regarding this topic in the docs and thus I consider this a bug.

import numpy as np
import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule

shape = (2, 4, 8)

a = np.asarray(np.full(shape,20), dtype=np.uint8, order="F")
descr = drv.ArrayDescriptor3D()
descr.width = shape[0]
descr.height = shape[1]
descr.depth = shape[2]
descr.format = drv.dtype_to_array_format(a.dtype)
descr.num_channels = 1
descr.flags = 0
ary = drv.Array(descr)
copy = drv.Memcpy3D()
copy.set_src_host(a)
copy.set_dst_array(ary)
copy.width_in_bytes = copy.src_pitch = a.strides[1]
copy.src_height = copy.height = shape[1]
copy.depth = shape[2]
copy()

mod = SourceModule(
    """
texture<unsigned char, 3, cudaReadModeElementType> mtx_tex;

__global__ void copy_texture(unsigned char *dest)
{
  if(threadIdx.x + threadIdx.y + threadIdx.z == 0)
      dest[0] = tex3D(mtx_tex, 0, 0, 0);
}
"""
)
copy_texture = mod.get_function("copy_texture")
mtx_tex = mod.get_texref("mtx_tex")
mtx_tex.set_array(ary)
dest = np.zeros(shape, dtype=np.uint8, order="F")
copy_texture(drv.Out(dest), block=shape, texrefs=[mtx_tex])
print(a[0][0][0])
print(dest[0][0][0])

This seems to be related to the memory alignment rules. When the descriptor is set to uint32 (descr.format = drv.dtype_to_array_format(np.uint32)) instead of the input array type, the right values can be obtained when accessed as unsigned char in the kernel. This is a workaround, or a correct solution? In case of 4 8bit channels, it would be necessary to pass the values reinterpreted as an array of integers?

The text was updated successfully, but these errors were encountered:

ichlubna · 2022-10-24T15:39:17Z

Here is the working example of both one and four channels. Is this really the right approach?

import numpy as np
import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule

width = 100
height = 200
count = 5
blockSize = (256,1,1)

def copyAndGetArr(a):
    descr = drv.ArrayDescriptor3D()
    descr.width = width
    descr.height = height
    descr.depth = count
    descr.format = drv.dtype_to_array_format(np.uint32)
    descr.num_channels = 1
    descr.flags = 0
    ary = drv.Array(descr)
    copy = drv.Memcpy3D()
    copy.set_src_host(a)
    copy.set_dst_array(ary)
    copy.width_in_bytes = copy.src_pitch = a.strides[1]
    copy.src_height = copy.height = height
    copy.depth = count
    copy()
    return ary

def one():
    a = np.full((count, height, width), 20, np.uint8)
    ary = copyAndGetArr(a)

    mod = SourceModule(
        """
    texture<unsigned char, 3, cudaReadModeElementType> mtx_tex;

    __global__ void copy_texture()
    {
      if(threadIdx.x == 0)
          printf("\\n %d \\n", tex3D(mtx_tex, 0, 2, 1));
    }
    """
    )
    copy_texture = mod.get_function("copy_texture")
    mtx_tex = mod.get_texref("mtx_tex")
    mtx_tex.set_array(ary)
    shape = (256,1,1)
    copy_texture(block=blockSize, texrefs=[mtx_tex])

def four():
    channels = 4
    a = np.full((count, height, width, channels), 20, np.uint8)
    a[0][0][0][0] = 1;
    a[0][0][0][1] = 2;
    a[0][0][0][2] = 3;
    a[0][0][0][3] = 4;
    ary = copyAndGetArr(a.view(">u4"))

    mod = SourceModule(
        """
    texture<int, 3, cudaReadModeElementType> mtx_tex;

    __global__ void copy_texture()
    {
      if(threadIdx.x == 0)
      {
        auto px{tex3D(mtx_tex, 0, 0, 0)};
        auto p{*reinterpret_cast<uchar4*>(&px)};
        printf("\\n %d %d %d %d \\n", p.x, p.y, p.z, p.w);
      }
    }
    """
    )
    copy_texture = mod.get_function("copy_texture")
    mtx_tex = mod.get_texref("mtx_tex")
    mtx_tex.set_array(ary)
    copy_texture(block=blockSize, texrefs=[mtx_tex])

one()
four()

inducer · 2022-10-24T19:52:34Z

What format comes out of descr.format = drv.dtype_to_array_format(a.dtype)? Is that the correct one in your view?

ichlubna · 2022-10-25T06:22:45Z

@inducer Yes, the line is taken from the test example, right? I would expect it to simply contain the type of one element since we can also provide number of channels etc. It is this in the first code I posted:

print(a.dtype)
descr.format = drv.dtype_to_array_format(a.dtype)
print(descr.format)

Output:

uint8
UNSIGNED_INT8

I expected the descriptor to deal with the alignment since we need to set all the attributes.

Where is the problem? Is the second example I posted the way it should be used? Why can't I access the elements in kernel as uchar4 directly? Or how to do so?

inducer · 2022-10-25T14:28:00Z

That looks OK. Could you reference the Nvidia docs on how this is intended to work? It's been years since I've directly worked with textures, I don't remember.

ichlubna · 2022-10-26T09:08:20Z

Hmm it seems like the plain Cuda way is slightly different, looking here and here. What I would expect in pycuda is this:

a = np.full((count, height, width), 20, np.uint8)
descr.format = a.dtype
descr.num_channels = 1
...
texture<unsigned char, 3, cudaReadModeElementType> mtx_tex

We have a texture of unsigned char values. The type is set to the right type and one channel. When reading the values with tex3D, I would expect to see the one channel value as the result. Now I am getting 161 instead of 20 which I don't understand, since all values are bytes - how would alignment break such values?

Here:

channels = 4
a = np.full((count, height, width, channels), 20, np.uint8)
descr.format = a.dtype
descr.num_channels = channels
...
texture<uchar4, 3, cudaReadModeElementType> mtx_tex

I would expect the classic RGBA texture. Right now, it seems like the num_channels doesn't do much to the result. The only way I made it work are the examples above.

My concern or misunderstanding is about the combination of descr.format and descr.num_channels in regards to the texture definition and reading in the kernel.

ichlubna added the bug label Oct 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

3D Textures not working with uint8 type #388

3D Textures not working with uint8 type #388

ichlubna commented Oct 24, 2022 •

edited

ichlubna commented Oct 24, 2022

inducer commented Oct 24, 2022

ichlubna commented Oct 25, 2022 •

edited

inducer commented Oct 25, 2022

ichlubna commented Oct 26, 2022

3D Textures not working with uint8 type #388

3D Textures not working with uint8 type #388

Comments

ichlubna commented Oct 24, 2022 • edited

ichlubna commented Oct 24, 2022

inducer commented Oct 24, 2022

ichlubna commented Oct 25, 2022 • edited

inducer commented Oct 25, 2022

ichlubna commented Oct 26, 2022

ichlubna commented Oct 24, 2022 •

edited

ichlubna commented Oct 25, 2022 •

edited