[bugfix] enable faster rcnn and sd model with oneflow backend #10439

Open: wants to merge 2 commits into base: master

crazy-JiangDongHua
Contributor

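With the oneflow backend hooked into torch compile, the faster rcnn and sd models now run successfully with dynamic shapes both disabled and enabled. Related issue: oneflow backend integration with torch compile, running faster rcnn

Main changes:

  1. Fix a bug where some torch.nn.functional.func calls failed to convert when translating between oneflow and torch models
  2. Enable dynamic shape support for nn.Graph in the oneflow backend, with environment variables aligned with oneflow compile
  3. Add flow.no_grad for inference scenarios in the oneflow backend, which avoids the compile-time error: RuntimeError: The gradient function for op fused_multi_head_attention_inference is not found. Please check whether it has been implemented and registered correctly.
  4. Complete the handling of the different return data types of nn.Graph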
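
Item 4 does not show the actual handling code, so the following is only a minimal sketch of one way to cover mixed return types: walk the output recursively and convert each tensor leaf while preserving the container structure. `FakeTensor` and `map_outputs` are hypothetical names used for illustration; they are not part of oneflow or of this PR.

```python
from typing import Any, Callable


class FakeTensor:
    """Hypothetical stand-in for a oneflow tensor (illustration only)."""

    def __init__(self, value: float) -> None:
        self.value = value


def map_outputs(out: Any, convert: Callable[[FakeTensor], Any]) -> Any:
    """Apply `convert` to every tensor leaf and keep the container layout."""
    if isinstance(out, FakeTensor):
        return convert(out)
    if isinstance(out, dict):
        return {key: map_outputs(val, convert) for key, val in out.items()}
    if isinstance(out, (list, tuple)):
        converted = [map_outputs(val, convert) for val in out]
        if isinstance(out, tuple) and hasattr(out, "_fields"):
            # namedtuple: rebuild with positional fields
            return type(out)(*converted)
        return type(out)(converted)
    # None, ints, strings and other non-tensor values pass through unchanged.
    return out


# A detection-style output: a tuple holding a tensor, a dict, and None.
graph_output = (FakeTensor(1.0), {"boxes": [FakeTensor(2.0)], "count": 3}, None)
result = map_outputs(graph_output, lambda t: t.value)
```

In a real backend the `convert` callback would be whatever turns a oneflow tensor into a torch tensor; here it just unwraps the stored value so the sketch runs without either framework.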


of_g = OfGraph()
of_g._dynamic_input_graph_cache.set_cache_size(9)
of_g._dynamic_input_graph_cache.enable_shared(True)
Contributor

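Do these two parameters correspond to the size and dynamic parameters in the options of the compile_from_torch interface? The torch.compile interface has a dynamic parameter, and my understanding is that we should use the dynamic value passed in by the user rather than the fixed value True. size is set to the default of 9 here; please define a constant for it instead of using a magic number.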

Contributor Author

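They basically correspond. size can indeed be changed; I will add a constant for it. I don't think dynamic needs to change, for two reasons: first, the user's argument is passed to torch, so the oneflow backend cannot see it; second, because torch compile sits in front as the frontend, hard-coding dynamic to True here is equivalent to using the value passed in by the user.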
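
The constant suggested in this exchange could look roughly like the sketch below. `DEFAULT_DYNAMIC_GRAPH_CACHE_SIZE`, `FakeGraphCache`, and `configure_cache` are hypothetical names; only the two method names `set_cache_size` and `enable_shared` come from the diff, and the stand-in class merely records what it is given so the example runs without oneflow.

```python
# Hypothetical name for the value that appeared as the literal 9 in the diff.
DEFAULT_DYNAMIC_GRAPH_CACHE_SIZE = 9


class FakeGraphCache:
    """Stand-in exposing the two calls seen in the diff; not oneflow code."""

    def __init__(self) -> None:
        self.size = None
        self.shared = None

    def set_cache_size(self, size: int) -> None:
        self.size = size

    def enable_shared(self, shared: bool) -> None:
        self.shared = shared


def configure_cache(cache, size=DEFAULT_DYNAMIC_GRAPH_CACHE_SIZE, dynamic=True):
    """Apply the cache settings; `dynamic` stays True unless a caller overrides it."""
    cache.set_cache_size(size)
    cache.enable_shared(dynamic)
    return cache


cache = configure_cache(FakeGraphCache())
```

Keeping `dynamic` as a defaulted argument leaves room to forward a user value later if the backend ever gains access to it, while matching the hard-coded True the author argues is equivalent today.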

levi131 requested a review from strint March 6, 2024 03:30
return self.fx_md(*args, **kwargs)
if self.fx_md.training:
    return self.fx_md(*args, **kwargs)
with flow.no_grad():
Contributor

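In principle, distinguishing training from inference with flow.no_grad should not happen here in the build function; it belongs in the user's model code. For the error mentioned in the issue, please confirm whether the corresponding backward op is really missing; if so, the problem can be solved by adding the backward op.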

Contributor Author

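I asked Juncheng, who developed fused_multi_head_attention_inference; he said the op only implements the forward pass and has no backward implementation. If we don't add this inside build, would we have to modify the code in the test compile repository instead? In my tests, calling model.eval() alone does not avoid the error mentioned in the issue.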
