RuntimeError: you can only change requires_grad flags of leaf variables. #5754

Cindytjj opened this issue on Mar 8, 2024 · 0 comments

Describe the bug:
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
When I use NNI to prune my custom transformer, the speedup initially looks fine and prints messages such as:
[2024-03-08 11:11:12] Update indirect mask for call_function: truediv,
[2024-03-08 11:11:12] Update indirect mask for call_function: sqrt,
[2024-03-08 11:11:12] Update indirect mask for call_function: getitem_13,
[2024-03-08 11:11:12] Update indirect mask for call_function: getattr_3,
[2024-03-08 11:11:12] Update indirect mask for call_method: transpose_2, output mask: 0.0000
[2024-03-08 11:11:12] Update indirect mask for call_method: view_2, output mask: 0.0000
[2024-03-08 11:11:12] Update indirect mask for call_module: encoder_encoder_layers_0_attention_value_projection, weight: 0.0000 bias: 0.0000 , output mask: 0.0000
until it throws the following error:
Traceback (most recent call last):
File "F:\研究生学习文件\研二\时序预测算法\transformer\pythonProject2\0305\4.py", line 219, in
ModelSpeedup(model, dummy_input, masks).speedup_model()
File "E:\ANACONDA\Anaconda\envs\torch\lib\site-packages\nni\compression\speedup\model_speedup.py", line 435, in speedup_model
self.update_indirect_sparsity()
File "E:\ANACONDA\Anaconda\envs\torch\lib\site-packages\nni\compression\speedup\model_speedup.py", line 306, in update_indirect_sparsity
self.node_infos[node].mask_updater.indirect_update_process(self, node)
File "E:\ANACONDA\Anaconda\envs\torch\lib\site-packages\nni\compression\speedup\mask_updater.py", line 160, in indirect_update_process
output = getattr(model_speedup, node.op)(node.target, args_cloned, kwargs_cloned)
File "E:\ANACONDA\Anaconda\envs\torch\lib\site-packages\torch\fx\interpreter.py", line 289, in call_method
return getattr(self_obj, target)(*args_tail, **kwargs)
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().

I get this error when using NNI to prune a transformer model I defined myself. I tried both L1NormPruner and MovementPruner, and I followed NNI's official transformer pruning example (without the knowledge distillation used in that example), but every attempt fails with the error above during speedup. I cannot tell whether my NNI configuration is wrong or whether my custom transformer does not meet NNI's requirements, so I am asking for help.
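
For context, the error itself is generic PyTorch behaviour rather than something NNI-specific: requires_grad can only be toggled on leaf tensors, so the speedup pass is presumably touching a non-leaf (computed) tensor somewhere in the traced graph. A minimal sketch that reproduces the same RuntimeError outside of NNI (the tensor names are made up for illustration):

import torch

x = torch.randn(3, requires_grad=True)  # leaf tensor: its requires_grad flag can be changed
y = x * 2                               # non-leaf tensor, produced by an autograd op

try:
    y.requires_grad = False             # raises the same RuntimeError as in the traceback above
except RuntimeError as e:
    print(e)

y_no_grad = y.detach()                  # the workaround the error message itself suggests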

Environment:

  • NNI version: 3.0
  • Training service (local|remote|pai|aml|etc): local
  • Python version: 3.8.0
  • PyTorch version: 2.1.2
  • CPU or CUDA version: CUDA 12.2

Reproduce the problem

  • Code|Example:

Model pruning:

import torch  # model, evaluator and device are defined earlier in the script (omitted here)
from nni.compression.pruning import MovementPruner
from nni.compression.speedup import ModelSpeedup
from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer

print(model)

# Prune the Linear layers inside the first encoder attention block.
config_list = [{
    'op_types': ['Linear'],
    'op_names_re': ['encoder.encoder_layers.0.attention.*'],
    'sparse_threshold': 0.1,
    'granularity': [4, 4]
}]
pruner = MovementPruner(model, config_list, evaluator, warmup_step=10, cooldown_begin_step=20, regular_scale=20)
pruner.compress(40, 4)
print(model)
pruner.unwrap_model()
masks = pruner.get_masks()

# Two dummy inputs (encoder input and decoder input), each of shape (32, 16, 1).
dummy_input = (torch.randint(0, 1, (32, 16, 1)).to(device).float(),
               torch.randint(0, 1, (32, 16, 1)).to(device).float())

replacer = TransformersAttentionReplacer(model)

ModelSpeedup(model, dummy_input, masks).speedup_model()
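
Note that the TransformersAttentionReplacer above is constructed but never handed to ModelSpeedup. In NNI's BERT pruning example the replacer is passed in via the customized_replacers argument; whether that is related to this error is unclear, but for reference, a sketch of that variant (parameter name taken from that example, not verified against this model):

ModelSpeedup(model, dummy_input, masks, customized_replacers=[replacer]).speedup_model()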

  • How to reproduce:
  • this is my custom Transformer structure; I use it to predict time series

CustomTransformer(
  (embedding): Linear(in_features=1, out_features=64, bias=True)
  (positional_encoding): PositionalEncoding(
    (dropout): Dropout(p=0, inplace=False)
  )
  (encoder): Encoder(
    (encoder_layers): ModuleList(
      (0): Encoderlayer(
        (attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (linear): Linear(in_features=64, out_features=64, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (linear_layers): ModuleList(
      (0): Linear(in_features=64, out_features=64, bias=True)
    )
  )
  (decoder): Decoder(
    (decoder_layers): ModuleList(
      (0): Decoderlayer(
        (self_attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (cross_attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm3): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (linear1): Linear(in_features=64, out_features=256, bias=True)
        (linear2): Linear(in_features=256, out_features=64, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
  )
  (fc_in): Linear(in_features=64, out_features=64, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.1, inplace=False)
  (fc_out): Linear(in_features=64, out_features=1, bias=True)
)
