Compilation of Quantized Model Failed #3220

Open · alexsifivetw opened this issue Apr 24, 2024 · 1 comment

@alexsifivetw

Issue Description

I'm trying to compile the int8 quantized bert-large-uncased model and encountered the following error:

  File "/XXX/py3.11/lib/python3.11/site-packages/torch_mlir/__init__.py", line 460, in compile
    run_pipeline_with_repro_report(
  File "/XXX/py3.11/lib/python3.11/site-packages/torch_mlir/compiler_utils.py", line 73, in run_pipeline_with_repro_report
    raise TorchMlirCompilerError(trimmed_message) from None
torch_mlir.compiler_utils.TorchMlirCompilerError: Lowering TorchScript IR -> Torch Backend IR failed with the following diagnostics:


python exception: Failure while executing pass pipeline:
error: unknown: unsupported by backend contract: module initializers
note: unknown: see current operation: "torch.initialize.global_slots"(%161, %165, %169, %173, %179, %183, %189, %193, %197, %201, %207, %211, %217, %221, %225, %229, %235, %239, %245, %249, %253, %257, %263, %267, %273, %277, %281, %285, %291, %295, %301, %305, %309, %313, %319, %323, %329, %333, %337, %341, %347, %351, %357, %361, %365, %369, %375, %379, %385, %389, %393, %397, %403, %407, %413, %417, %421, %425, %431, %435, %441, %445, %449, %453, %459, %463, %469, %473, %477, %481, %487, %491, %497, %501, %505, %509, %515, %519, %525, %529, %533, %537, %543, %547, %553, %557, %561, %565, %571, %575, %581, %585, %589, %593, %599, %603, %609, %613, %617, %621, %627, %631, %637, %641, %645, %649, %655, %659, %665, %669, %673, %677, %683, %687, %693, %697, %701, %705, %711, %715, %721, %725, %729, %733, %739, %743, %749, %753, %757, %761, %767, %771, %777, %781, %785, %789, %795, %799, %805, %809, %813, %817, %823, %827, %834, %839) <{slotSymNames = [@model.bert.encoder.layer.0.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.0.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.0.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.0.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.0.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.0.output.dense._packed_params._packed_params, @model.bert.encoder.layer.1.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.1.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.1.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.1.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.1.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.1.output.dense._packed_params._packed_params, @model.bert.encoder.layer.2.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.2.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.2.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.2.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.2.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.2.output.dense._packed_params._packed_params, @model.bert.encoder.layer.3.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.3.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.3.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.3.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.3.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.3.output.dense._packed_params._packed_params, @model.bert.encoder.layer.4.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.4.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.4.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.4.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.4.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.4.output.dense._packed_params._packed_params, @model.bert.encoder.layer.5.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.5.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.5.attention.self.value._packed_params._packed_params, 
@model.bert.encoder.layer.5.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.5.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.5.output.dense._packed_params._packed_params, @model.bert.encoder.layer.6.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.6.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.6.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.6.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.6.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.6.output.dense._packed_params._packed_params, @model.bert.encoder.layer.7.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.7.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.7.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.7.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.7.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.7.output.dense._packed_params._packed_params, @model.bert.encoder.layer.8.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.8.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.8.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.8.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.8.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.8.output.dense._packed_params._packed_params, @model.bert.encoder.layer.9.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.9.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.9.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.9.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.9.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.9.output.dense._packed_params._packed_params, @model.bert.encoder.layer.10.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.10.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.10.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.10.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.10.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.10.output.dense._packed_params._packed_params, @model.bert.encoder.layer.11.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.11.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.11.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.11.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.11.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.11.output.dense._packed_params._packed_params, @model.bert.encoder.layer.12.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.12.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.12.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.12.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.12.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.12.output.dense._packed_params._packed_params, 
@model.bert.encoder.layer.13.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.13.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.13.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.13.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.13.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.13.output.dense._packed_params._packed_params, @model.bert.encoder.layer.14.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.14.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.14.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.14.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.14.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.14.output.dense._packed_params._packed_params, @model.bert.encoder.layer.15.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.15.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.15.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.15.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.15.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.15.output.dense._packed_params._packed_params, @model.bert.encoder.layer.16.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.16.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.16.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.16.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.16.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.16.output.dense._packed_params._packed_params, @model.bert.encoder.layer.17.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.17.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.17.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.17.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.17.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.17.output.dense._packed_params._packed_params, @model.bert.encoder.layer.18.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.18.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.18.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.18.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.18.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.18.output.dense._packed_params._packed_params, @model.bert.encoder.layer.19.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.19.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.19.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.19.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.19.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.19.output.dense._packed_params._packed_params, @model.bert.encoder.layer.20.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.20.attention.self.key._packed_params._packed_params, 
@model.bert.encoder.layer.20.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.20.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.20.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.20.output.dense._packed_params._packed_params, @model.bert.encoder.layer.21.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.21.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.21.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.21.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.21.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.21.output.dense._packed_params._packed_params, @model.bert.encoder.layer.22.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.22.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.22.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.22.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.22.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.22.output.dense._packed_params._packed_params, @model.bert.encoder.layer.23.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.23.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.23.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.23.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.23.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.23.output.dense._packed_params._packed_params, @model.cls.predictions.transform.dense._packed_params._packed_params, @model.cls.predictions.decoder._packed_params._packed_params]}> : (!torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, 
!torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams) -> ()
note: unknown: this is likely due to InlineGlobalSlots being unable to inline a global slot

For Torch-MLIR developers, the error can be reproduced with:
$ torch-mlir-opt -pass-pipeline='builtin.module(torchscript-module-to-torch-backend-pipeline{backend-legal-ops=aten.flatten.using_ints,aten.adaptive_avg_pool1d extra-library=})' /tmp/HuggingFaceModel.mlir
Add '-mlir-print-ir-after-all -mlir-disable-threading' to get the IR dump for debugging purpose.

Any ideas?

Steps to Reproduce

Run the following Python script:

import torch.nn
import torch_mlir
from transformers import BertTokenizer, BertForMaskedLM
from torch.quantization import quantize_dynamic

class HuggingFaceModel(torch.nn.Module):
    def __init__(self, model_name, quant):
        super().__init__()
        self.model = BertForMaskedLM.from_pretrained(model_name)
        if quant == "f16":
            self.model.to(torch.half)
        elif quant == "int8":
            self.model = quantize_dynamic(
                self.model,        # the model to quantize
                {torch.nn.Linear}, # the types of layers to quantize
                dtype=torch.qint8, # the data type to quantize to
            )
        self.model.eval()

    def forward(self, inputs, attention):
        return self.model(input_ids=inputs, attention_mask=attention).logits

pytorch_model = HuggingFaceModel("bert-large-uncased", "int8")
mlir_model = torch_mlir.compile(
    pytorch_model,
    [torch.tensor([[0] * 384]), torch.tensor([[0] * 384])], # dummy input_ids/attention_mask; not important for this issue
    output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
    use_tracing=True)

Attachments

HuggingFaceModel.mlir: https://gist.github.com/alexsifivetw/4a233ebe923aeb88451e4d701809e0e9

@stellaraccident (Collaborator)

Another user hitting an old API that definitely will not work. We need to update the docs. Discussing on Discord.
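
For reference, recent torch-mlir releases replace torch_mlir.compile with an importer built on torch.export. The following is a minimal sketch of that path, assuming a build that ships torch_mlir.fx.export_and_import (exact module layout and supported output types vary by release) and reusing the HuggingFaceModel wrapper from the repro script above; whether the int8 dynamically quantized variant imports cleanly is not guaranteed, so the sketch uses the plain fp32 model:

import torch
from torch_mlir import fx  # available in newer torch-mlir wheels; layout varies by release

# Reuses the HuggingFaceModel wrapper defined in the repro script above.
# Passing a quant value other than "f16"/"int8" leaves the model in fp32.
model = HuggingFaceModel("bert-large-uncased", "f32")

example_input_ids = torch.zeros(1, 384, dtype=torch.int64)  # dummy token ids
example_attention = torch.ones(1, 384, dtype=torch.int64)   # dummy attention mask

# export_and_import traces the module via torch.export and emits an MLIR module;
# output_type selects how far the result is lowered.
module = fx.export_and_import(
    model,
    example_input_ids,
    example_attention,
    output_type="linalg-on-tensors",
)
print(module)
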
