Not bitwise operator causing PTX compile error #339

Open
Benco11-developement opened this issue Feb 21, 2024 · 1 comment

Comments

@Benco11-developement

Describe the bug

PTX compilation fails when the bitwise NOT operator "~" is applied to an int. The generated PTX and the resulting error are:

.version 7.6
.target sm_61
.address_size 64

.visible .entry s0_t0_invert_arrays_intarray_arrays_intarray_4096(.param .u64 .ptr .global .align 8 kernel_context, .param .u64 .ptr .global .align 8 a, .param .u64 .ptr .global .align 8 b, .param .align 8 .u64 size) {
        .reg .s64 rsd<3>;
        .reg .u32 rui<5>;
        .reg .u64 rud<7>;
        .reg .pred rpb<2>;
        .reg .s32 rsi<8>;

BLOCK_0:
        ld.param.u64    rud0, [kernel_context];
        ld.param.u64    rud1, [a];
        ld.param.u64    rud2, [b];
        mov.u32 rui0, %nctaid.x;
        mov.u32 rui1, %ntid.x;
        mul.wide.u32    rud3, rui0, rui1;
        cvt.s32.u64     rsi0, rud3;
        mov.u32 rui2, %tid.x;
        mov.u32 rui3, %ctaid.x;
        mad.lo.s32      rsi1, rui3, rui1, rui2;

BLOCK_1:
        mov.s32 rsi2, rsi1;
LOOP_COND_1:
        setp.lt.s32     rpb0, rsi2, 4096;
        @!rpb0 bra      BLOCK_3;

BLOCK_2:
        add.s32 rsi3, rsi2, 6;
        cvt.s64.s32     rsd0, rsi3;
        shl.b64 rsd1, rsd0, 2;
        add.u64 rud4, rud1, rsd1;
        ld.global.s32   rsi4, [rud4];
        add.u64 rud5, rud2, rsd1;
        not.rn.b32      rsi5, rsi4;
        st.global.s32   [rud5], rsi5;
        add.s32 rsi6, rsi0, rsi2;
        mov.s32 rsi2, rsi6;
        bra.uni LOOP_COND_1;

BLOCK_3:
        ret;
}

[TornadoVM-PTX-JNI] ERROR : cuModuleLoadData -> Returned: 218
PTX to cubin JIT compilation failed! (218)
PTX JIT compilation failed!
[Bailout] Running the sequential implementation. Enable --debug to see the reason.
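
The instruction that stands out in the listing above is "not.rn.b32". As far as the PTX ISA describes it, "not" takes only a type suffix (.pred, .b16, .b32 or .b64) and no rounding modifier, so the ".rn" is the probable reason the JIT rejects the module. This has not been confirmed against the TornadoVM code generator. Below is a minimal sketch of a check that flags such lines in a dumped kernel; the names PtxNotCheck and findSuspectLines are hypothetical and not part of TornadoVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of TornadoVM: flags PTX lines where a
// bitwise `not` carries a rounding modifier (.rn, .rz, .rm, .rp).
final class PtxNotCheck {

    // Matches e.g. "not.rn.b32" at the start of an instruction line.
    // Predicated forms such as "@p not..." are deliberately not handled.
    private static final Pattern NOT_WITH_ROUNDING =
            Pattern.compile("^\\s*not\\.(rn|rz|rm|rp)\\.\\w+\\b");

    static List<String> findSuspectLines(String ptx) {
        List<String> hits = new ArrayList<>();
        for (String line : ptx.split("\\R")) {
            if (NOT_WITH_ROUNDING.matcher(line).find()) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Three lines copied from BLOCK_2 of the kernel in this report.
        String ptx = String.join("\n",
                "        ld.global.s32   rsi4, [rud4];",
                "        not.rn.b32      rsi5, rsi4;",
                "        st.global.s32   [rud5], rsi5;");
        System.out.println(findSuspectLines(ptx));
    }
}
```

Run on the full kernel, this should report only the single "not.rn.b32" line in BLOCK_2.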

How To Reproduce

Run the following code:

import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.types.arrays.IntArray;

public class Main {

    public static void invert(IntArray a, IntArray b, int size) {
        for (@Parallel int i = 0; i < size; i++) {
            b.set(i, ~a.get(i));
        }
    }

    public static void main(String[] args) {
        int size = 4096;

        IntArray a = new IntArray(size);
        IntArray b = new IntArray(size);
 
        a.init(1);
        b.init(0);

        TaskGraph graph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
                .task("t0", Main::invert, a, b, size)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, b);

        ImmutableTaskGraph immutableTaskGraph = graph.snapshot();
        TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph);

        executionPlan.execute();
    }
}

Expected behavior

The kernel should compile with the PTX backend and run on the GPU without falling back to the sequential implementation.
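
Concretely, since a is filled with 1, every element of b should end up as ~1, which is -2 in two's complement. The sketch below is a plain-Java reference that uses int[] in place of IntArray so that it runs without TornadoVM. The class name InvertReference is made up for illustration.

```java
import java.util.Arrays;

// Sequential reference for the kernel in this report, using int[] instead of
// TornadoVM's IntArray so it runs on a plain JVM. The class name is illustrative.
final class InvertReference {

    static void invert(int[] a, int[] b, int size) {
        for (int i = 0; i < size; i++) {
            b[i] = ~a[i]; // bitwise NOT: flips all 32 bits
        }
    }

    public static void main(String[] args) {
        int size = 4096;
        int[] a = new int[size];
        int[] b = new int[size];
        Arrays.fill(a, 1); // mirrors a.init(1) in the reproducer

        invert(a, b, size);

        // Every element must be ~1 == -2.
        for (int i = 0; i < size; i++) {
            if (b[i] != -2) {
                throw new AssertionError("b[" + i + "] = " + b[i]);
            }
        }
        System.out.println("all " + size + " elements equal " + b[0]);
    }
}
```

Because the log shows a bailout to the sequential implementation, the reproducer itself should still produce these values. Only the GPU path is affected.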

Computing system setup (please complete the following information):

  • OS: Windows 10
  • CUDA: 12.3
  • Latest commit 77dfc9b

Additional context

The same code works with the OpenCL backend.


@jjfumero
Member

Hi @Benco11-developement, thank you for the report. We will take a look.
