import torch
import torch_mlir
from transformers import BertTokenizer, BertForMaskedLM
from torch.quantization import quantize_dynamic

class HuggingFaceModel(torch.nn.Module):
    def __init__(self, model_name, quant):
        super().__init__()
        self.model = BertForMaskedLM.from_pretrained(model_name)
        if quant == "f16":
            self.model.to(torch.half)
        elif quant == "int8":
            self.model = quantize_dynamic(
                self.model,          # the model to quantize
                {torch.nn.Linear},   # the types of layers to quantize
                dtype=torch.qint8,   # the data type to quantize to
            )
        self.model.eval()

    def forward(self, inputs, attention):
        return self.model(input_ids=inputs, attention_mask=attention).logits

pytorch_model = HuggingFaceModel("bert-large-uncased", "int8")
mlir_model = torch_mlir.compile(
    pytorch_model,
    # dummy example inputs of sequence length 384; not important for this issue
    [torch.tensor([[0] * 384]), torch.tensor([[0] * 384])],
    output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
    use_tracing=True)
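For context, `torch.quantization.quantize_dynamic` replaces each float `nn.Linear` in the model with a dynamically quantized int8 version, which is what the traced graph then has to represent. A minimal sketch of that swap in isolation (`TinyModel` is a made-up stand-in for BERT, not part of the issue's script):

```python
import torch
import torch.nn as nn
from torch.quantization import quantize_dynamic

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(x)

model = TinyModel().eval()
# Quantize only nn.Linear layers to int8; weights are stored quantized,
# activations are quantized dynamically at runtime.
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# The float Linear is replaced by a dynamically quantized version,
# but the module still accepts and returns float tensors.
print(repr(qmodel.fc))
print(tuple(qmodel(torch.zeros(1, 8)).shape))
```

The quantized module keeps a float interface, so the forward pass is unchanged; only the internal weight representation and matmul kernels differ, which is why the quantized ops only surface once the model is traced and lowered.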
Issue Description
I'm trying to compile the int8-quantized bert-large-uncased model and encountered the following error:

Any ideas?
Steps to Reproduce
Run the Python script above.
Attachments
HuggingFaceModel.mlir: https://gist.github.com/alexsifivetw/4a233ebe923aeb88451e4d701809e0e9