
Inconsistent result with --sparsification-and-bufferization and tensor.empty #92069

Open
Anonymous15592 opened this issue May 14, 2024 · 1 comment

Consider the following MLIR program:
a.mlir:

module {
  func.func @tensor_i32(%arg0: tensor<1xi32>) -> i32 {
    %idx0 = index.constant 0
    %0 = tensor.extract %arg0[%idx0] : tensor<1xi32>
    return %0 : i32
  }
  func.func @func1() {
    %c1_i32 = arith.constant 1 : i32
    %c0_i32 = arith.constant 0 : i32
    %c0 = arith.constant 0 : index
    %5 = tensor.empty() : tensor<1xi32> // using empty
    // %5 = tensor.from_elements %c0_i32 : tensor<1xi32>
    
    %inserted_28 = tensor.insert %c1_i32 into %5[%c0] : tensor<1xi32>
    %31 = call @tensor_i32(%inserted_28) : (tensor<1xi32>) -> i32
    %308 = tensor.extract %5[%c0] : tensor<1xi32>
    // vector.print %31 : i32
    vector.print %308 : i32
    return
  }
}

It produces two different results depending on which of the following two pass sequences is applied:
pass sequence 1: --sparsification-and-bufferization --tensor-bufferize --func-bufferize --convert-func-to-llvm --convert-index-to-llvm --convert-vector-to-llvm --finalize-memref-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts
pass sequence 2: --tensor-bufferize --func-bufferize --convert-func-to-llvm --convert-index-to-llvm --convert-vector-to-llvm --finalize-memref-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts

The executable built with pass sequence 1 prints 1, while the one built with pass sequence 2 prints 0.
The only difference between the two sequences is the additional --sparsification-and-bufferization at the beginning of pass sequence 1.
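For completeness, the two results can be reproduced along these lines. This is a sketch, not necessarily the original setup: the pass flags are the ones quoted above, but the mlir-cpu-runner invocation, its entry-point flags, and the runner-utils library path are assumptions about a typical MLIR build.

# Lower a.mlir with each pass sequence.
mlir-opt a.mlir --sparsification-and-bufferization --tensor-bufferize --func-bufferize \
  --convert-func-to-llvm --convert-index-to-llvm --convert-vector-to-llvm \
  --finalize-memref-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts -o seq1.mlir
mlir-opt a.mlir --tensor-bufferize --func-bufferize \
  --convert-func-to-llvm --convert-index-to-llvm --convert-vector-to-llvm \
  --finalize-memref-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts -o seq2.mlir

# JIT-run @func1 from each module; the first prints 1, the second prints 0.
mlir-cpu-runner seq1.mlir -e func1 -entry-point-result=void \
  -shared-libs=build/lib/libmlir_c_runner_utils.so
mlir-cpu-runner seq2.mlir -e func1 -entry-point-result=void \
  -shared-libs=build/lib/libmlir_c_runner_utils.so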

I further analyzed the output of these two pipeline prefixes:
pass1: --sparsification-and-bufferization --tensor-bufferize
pass2: --tensor-bufferize
The result of pass1 is:

module {
  func.func @tensor_i32(%arg0: memref<1xi32>) -> i32 {
    %idx0 = index.constant 0
    %0 = memref.load %arg0[%idx0] : memref<1xi32>
    return %0 : i32
  }
  func.func @func1() {
    %c1_i32 = arith.constant 1 : i32
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<1xi32>
    memref.store %c1_i32, %alloc[%c0] : memref<1xi32>
    %0 = call @tensor_i32(%alloc) : (memref<1xi32>) -> i32
    %1 = memref.load %alloc[%c0] : memref<1xi32>
    vector.print %1 : i32
    return
  }
}

The result of pass2 is:

module {
  func.func @tensor_i32(%arg0: tensor<1xi32>) -> i32 {
    %0 = bufferization.to_memref %arg0 : memref<1xi32>
    %idx0 = index.constant 0
    %1 = memref.load %0[%idx0] : memref<1xi32>
    return %1 : i32
  }
  func.func @func1() {
    %c1_i32 = arith.constant 1 : i32
    %c0_i32 = arith.constant 0 : i32
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<1xi32>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<1xi32>
    memref.copy %alloc, %alloc_0 : memref<1xi32> to memref<1xi32>
    memref.store %c1_i32, %alloc_0[%c0] : memref<1xi32>
    %0 = bufferization.to_tensor %alloc_0 : memref<1xi32>
    %1 = call @tensor_i32(%0) : (tensor<1xi32>) -> i32
    %2 = memref.load %alloc[%c0] : memref<1xi32>
    vector.print %2 : i32
    return
  }
}

It seems that --sparsification-and-bufferization --tensor-bufferize treats the operand and the result of tensor.insert as the same tensor (memref) when the operand of tensor.insert is created by tensor.empty: in the pass1 output the store is performed in place on %alloc, so the later read of %5 observes the inserted 1, while in the pass2 output the store goes to a copy (%alloc_0) and the read of %5 sees the untouched, uninitialized %alloc, which happens to print 0.

If I replace the tensor.empty with tensor.from_elements, or simply wrap the tensor.empty in a function, the modified MLIR program prints the same result under both pass sequences.
The modified program:

module {
  func.func @gen_tensor_i32() -> tensor<1xi32> {
    %c0_i32 = arith.constant 0 : i32
    %5 = tensor.empty() : tensor<1xi32>
    return %5 : tensor<1xi32>
  }
  func.func @tensor_i32(%arg0: tensor<1xi32>) -> i32 {
    %idx0 = index.constant 0
    %0 = tensor.extract %arg0[%idx0] : tensor<1xi32>
    return %0 : i32
  }
  func.func @func1() {
    %c1_i32 = arith.constant 1 : i32
    %c0_i32 = arith.constant 0 : i32
    %c0 = arith.constant 0 : index
    %5 = call @gen_tensor_i32() : () -> tensor<1xi32>
    // %5 = tensor.empty() : tensor<1xi32> // using empty
    // %5 = tensor.from_elements %c0_i32 : tensor<1xi32>
    
    %inserted_28 = tensor.insert %c1_i32 into %5[%c0] : tensor<1xi32>
    %31 = call @tensor_i32(%inserted_28) : (tensor<1xi32>) -> i32
    %308 = tensor.extract %5[%c0] : tensor<1xi32>
    // vector.print %31 : i32
    vector.print %308 : i32
    return
  }
}

I wonder if there is something wrong with --sparsification-and-bufferization and tensor.empty.
Then again, this inconsistency may not be a real problem, because tensor.empty only carries shape information: its contents are unspecified, so a program that reads from it has no single well-defined result.
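To make that concrete, here is a minimal sketch (hypothetical SSA names, same ops as the reproducer): only the extract from the tensor.insert result is guaranteed a value; the extract from the tensor.empty result may legally yield anything, which is exactly the freedom the in-place bufferization exploits.

%c0 = arith.constant 0 : index
%c1_i32 = arith.constant 1 : i32
// Shape-only allocation: the element values are unspecified.
%empty = tensor.empty() : tensor<1xi32>
%filled = tensor.insert %c1_i32 into %empty[%c0] : tensor<1xi32>
// Well-defined: always yields 1.
%ok = tensor.extract %filled[%c0] : tensor<1xi32>
// Unspecified: may yield 0, 1, or anything else, depending on bufferization.
%maybe = tensor.extract %empty[%c0] : tensor<1xi32>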

git version: 2163ae761808ca0e5478357384f6ddbacce279eb


llvmbot commented May 14, 2024

@llvm/issue-subscribers-mlir

Author: anonymous (Anonymous15592)

