Compilation of Quantized Model Failed #3220

Open · alexsifivetw opened this issue Apr 24, 2024 · 1 comment

@alexsifivetw

Issue Description

I'm trying to compile the int8 quantized bert-large-uncased model and encountered the following error:

  File "/XXX/py3.11/lib/python3.11/site-packages/torch_mlir/__init__.py", line 460, in compile
    run_pipeline_with_repro_report(
  File "/XXX/py3.11/lib/python3.11/site-packages/torch_mlir/compiler_utils.py", line 73, in run_pipeline_with_repro_report
    raise TorchMlirCompilerError(trimmed_message) from None
torch_mlir.compiler_utils.TorchMlirCompilerError: Lowering TorchScript IR -> Torch Backend IR failed with the following diagnostics:


python exception: Failure while executing pass pipeline:
error: unknown: unsupported by backend contract: module initializers
note: unknown: see current operation: "torch.initialize.global_slots"(%161, %165, %169, %173, %179, %183, %189, %193, %197, %201, %207, %211, %217, %221, %225, %229, %235, %239, %245, %249, %253, %257, %263, %267, %273, %277, %281, %285, %291, %295, %301, %305, %309, %313, %319, %323, %329, %333, %337, %341, %347, %351, %357, %361, %365, %369, %375, %379, %385, %389, %393, %397, %403, %407, %413, %417, %421, %425, %431, %435, %441, %445, %449, %453, %459, %463, %469, %473, %477, %481, %487, %491, %497, %501, %505, %509, %515, %519, %525, %529, %533, %537, %543, %547, %553, %557, %561, %565, %571, %575, %581, %585, %589, %593, %599, %603, %609, %613, %617, %621, %627, %631, %637, %641, %645, %649, %655, %659, %665, %669, %673, %677, %683, %687, %693, %697, %701, %705, %711, %715, %721, %725, %729, %733, %739, %743, %749, %753, %757, %761, %767, %771, %777, %781, %785, %789, %795, %799, %805, %809, %813, %817, %823, %827, %834, %839) <{slotSymNames = [@model.bert.encoder.layer.0.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.0.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.0.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.0.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.0.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.0.output.dense._packed_params._packed_params, @model.bert.encoder.layer.1.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.1.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.1.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.1.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.1.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.1.output.dense._packed_params._packed_params, @model.bert.encoder.layer.2.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.2.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.2.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.2.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.2.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.2.output.dense._packed_params._packed_params, @model.bert.encoder.layer.3.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.3.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.3.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.3.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.3.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.3.output.dense._packed_params._packed_params, @model.bert.encoder.layer.4.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.4.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.4.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.4.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.4.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.4.output.dense._packed_params._packed_params, @model.bert.encoder.layer.5.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.5.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.5.attention.self.value._packed_params._packed_params, 
@model.bert.encoder.layer.5.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.5.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.5.output.dense._packed_params._packed_params, @model.bert.encoder.layer.6.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.6.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.6.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.6.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.6.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.6.output.dense._packed_params._packed_params, @model.bert.encoder.layer.7.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.7.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.7.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.7.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.7.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.7.output.dense._packed_params._packed_params, @model.bert.encoder.layer.8.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.8.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.8.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.8.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.8.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.8.output.dense._packed_params._packed_params, @model.bert.encoder.layer.9.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.9.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.9.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.9.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.9.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.9.output.dense._packed_params._packed_params, @model.bert.encoder.layer.10.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.10.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.10.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.10.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.10.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.10.output.dense._packed_params._packed_params, @model.bert.encoder.layer.11.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.11.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.11.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.11.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.11.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.11.output.dense._packed_params._packed_params, @model.bert.encoder.layer.12.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.12.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.12.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.12.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.12.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.12.output.dense._packed_params._packed_params, 
@model.bert.encoder.layer.13.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.13.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.13.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.13.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.13.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.13.output.dense._packed_params._packed_params, @model.bert.encoder.layer.14.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.14.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.14.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.14.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.14.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.14.output.dense._packed_params._packed_params, @model.bert.encoder.layer.15.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.15.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.15.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.15.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.15.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.15.output.dense._packed_params._packed_params, @model.bert.encoder.layer.16.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.16.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.16.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.16.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.16.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.16.output.dense._packed_params._packed_params, @model.bert.encoder.layer.17.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.17.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.17.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.17.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.17.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.17.output.dense._packed_params._packed_params, @model.bert.encoder.layer.18.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.18.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.18.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.18.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.18.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.18.output.dense._packed_params._packed_params, @model.bert.encoder.layer.19.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.19.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.19.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.19.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.19.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.19.output.dense._packed_params._packed_params, @model.bert.encoder.layer.20.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.20.attention.self.key._packed_params._packed_params, 
@model.bert.encoder.layer.20.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.20.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.20.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.20.output.dense._packed_params._packed_params, @model.bert.encoder.layer.21.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.21.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.21.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.21.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.21.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.21.output.dense._packed_params._packed_params, @model.bert.encoder.layer.22.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.22.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.22.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.22.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.22.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.22.output.dense._packed_params._packed_params, @model.bert.encoder.layer.23.attention.self.query._packed_params._packed_params, @model.bert.encoder.layer.23.attention.self.key._packed_params._packed_params, @model.bert.encoder.layer.23.attention.self.value._packed_params._packed_params, @model.bert.encoder.layer.23.attention.output.dense._packed_params._packed_params, @model.bert.encoder.layer.23.intermediate.dense._packed_params._packed_params, @model.bert.encoder.layer.23.output.dense._packed_params._packed_params, @model.cls.predictions.transform.dense._packed_params._packed_params, @model.cls.predictions.decoder._packed_params._packed_params]}> : (!torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, 
!torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams, !torch.LinearParams) -> ()
note: unknown: this is likely due to InlineGlobalSlots being unable to inline a global slot

For Torch-MLIR developers, the error can be reproduced with:
$ torch-mlir-opt -pass-pipeline='builtin.module(torchscript-module-to-torch-backend-pipeline{backend-legal-ops=aten.flatten.using_ints,aten.adaptive_avg_pool1d extra-library=})' /tmp/HuggingFaceModel.mlir
Add '-mlir-print-ir-after-all -mlir-disable-threading' to get the IR dump for debugging purpose.

Any ideas?

Steps to Reproduce

Run the following Python script:

import torch.nn
import torch_mlir
from transformers import BertTokenizer, BertForMaskedLM
from torch.quantization import quantize_dynamic

class HuggingFaceModel(torch.nn.Module):
    def __init__(self, model_name, quant):
        super().__init__()
        self.model = BertForMaskedLM.from_pretrained(model_name)
        if quant == "f16":
            self.model.to(torch.half)
        elif quant == "int8":
            self.model = quantize_dynamic(
                self.model,        # the model to quantize
                {torch.nn.Linear}, # the types of layers to quantize
                dtype=torch.qint8, # the data type to quantize to
            )
        self.model.eval()

    def forward(self, inputs, attention):
        return self.model(input_ids=inputs, attention_mask=attention).logits

pytorch_model = HuggingFaceModel("bert-large-uncased", "int8")
mlir_model = torch_mlir.compile(
    pytorch_model,
    [torch.tensor([[0] * 384]), torch.tensor([[0] * 384])], # dummy input_ids/attention_mask; not important for this issue
    output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
    use_tracing=True)

Attachments

HuggingFaceModel.mlir: https://gist.github.com/alexsifivetw/4a233ebe923aeb88451e4d701809e0e9

@stellaraccident (Collaborator)

Another user hitting an old API that definitely will not work. We need to update the docs. Discussing on Discord.
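
For reference, recent torch-mlir releases replace torch_mlir.compile with an importer built on torch.export. The following is a minimal sketch of that path, assuming a build that ships torch_mlir.fx.export_and_import (exact module layout and supported output types vary by release) and reusing the HuggingFaceModel wrapper from the repro script above; whether the int8 dynamically quantized variant imports cleanly is not guaranteed, so the sketch uses the plain fp32 model:

import torch
from torch_mlir import fx  # available in newer torch-mlir wheels; layout varies by release

# Reuses the HuggingFaceModel wrapper defined in the repro script above.
# Passing a quant value other than "f16"/"int8" leaves the model in fp32.
model = HuggingFaceModel("bert-large-uncased", "f32")

example_input_ids = torch.zeros(1, 384, dtype=torch.int64)  # dummy token ids
example_attention = torch.ones(1, 384, dtype=torch.int64)   # dummy attention mask

# export_and_import traces the module via torch.export and emits an MLIR module;
# output_type selects how far the result is lowered.
module = fx.export_and_import(
    model,
    example_input_ids,
    example_attention,
    output_type="linalg-on-tensors",
)
print(module)
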
