Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FR] simplify gazelle_python.yaml #1890

Open
hunshcn opened this issue May 10, 2024 · 4 comments
Open

[FR] simplify gazelle_python.yaml #1890

hunshcn opened this issue May 10, 2024 · 4 comments
Labels

Comments

@hunshcn
Copy link
Contributor

hunshcn commented May 10, 2024

🚀 feature request

Relevant Rules

gazelle_python.yaml

Description

A clear and concise description of the problem or missing capability...

gazelle_python.yaml will list all libs on packages

    matplotlib.backend_tools: matplotlib
    matplotlib.backends: matplotlib
    matplotlib.backends.backend_agg: matplotlib
    matplotlib.backends.backend_cairo: matplotlib
    matplotlib.backends.backend_gtk3: matplotlib
    matplotlib.backends.backend_gtk3agg: matplotlib
    matplotlib.backends.backend_gtk3cairo: matplotlib
    matplotlib.backends.backend_gtk4: matplotlib
    matplotlib.backends.backend_gtk4agg: matplotlib
    matplotlib.backends.backend_gtk4cairo: matplotlib
    matplotlib.backends.backend_macosx: matplotlib
    matplotlib.backends.backend_mixed: matplotlib
    matplotlib.backends.backend_nbagg: matplotlib
    matplotlib.backends.backend_pdf: matplotlib
    matplotlib.backends.backend_pgf: matplotlib
...

I have a three lines only requirements.in

numpy
requests
seaborn

20 packages requirements.txt

but received a 110KB gazelle_python.yaml which has 2517 keys at .manifest.modules_mapping

It is too big.

Describe the solution you'd like

If you have a solution in mind, please describe it.

Can we simplify it and make gazelle_python.yaml have only top level package name?

In my shallow cognition, such a long imports seems meaningless.

I'd like to support it if I can.

Describe alternatives you've considered

Have you considered any alternative solutions or workarounds?

no

@aignas
Copy link
Collaborator

aignas commented May 13, 2024

I am not sure if the complexity is worth it as the package imports may overlap at the top level. It could be possible to improve it, but I am not sure how realistic it is.

Let's imagine a case where you write code

import foo.bar

and the foo.bar import is only provided by foo-bar-plugin python package, but what you have in the gazelle_python.yaml is data that tells gazelle that foo maps to foo python package, so gazelle will be happy and will add the wrong dependency instead of erroring here.

I think this is related to #1877, but might not be.

If somebody was to implement it, I think it should:

  • Do the right thing in the case outlined above and I am not sure what that right thing is yet.
  • Handle cases where we have platform specific imports.
  • The smarter gazelle plugin that can work with the smaller gazelle_python.yaml should still work with the previous gazelle_python.yaml.

@aignas aignas added help wanted gazelle Gazelle plugin related issues labels May 13, 2024
@hunshcn
Copy link
Contributor Author

hunshcn commented May 13, 2024

One possibility is that we only simplify the generated yaml. During the generation, we find the top bifurcation and then cut the branches. And this is well compatible.

for example

    foo: foo
    foo.bar : for-bar
    foo.bar.abc: for-bar
    foo.bar.xyz: for-bar
    foo.bar.xyz.sub: for-bar

to

    foo: foo
    foo.bar : for-bar

@aignas
Copy link
Collaborator

aignas commented May 17, 2024

One extra idea how we could simplify the gazelle_python.yaml file by a lot is to only include the dependencies that the project is referencing directly. For example, if my project has a pyproject.toml file and I only specify airflow as the dependency in the project, gazelle_python.yaml will be enormous because of all of the transitive airflow dependencies that rules_python knows about. However, the only things that we should actually include in the gazelle_python.yaml are the things that the requirements.in or pyproject.toml mentions.

Such thing could be achieved in two ways:

  1. Modify the pip extension to return a list of whls that are the direct references of the project. This would mean that the pip.parse should accept the requirements.in or the pyproject.toml and use that to generate a list of whl targets that are the direct references. That way the modules_mapping file will only create gazelle_python.yaml entries that are the direct references.
  2. Modify the gazelle tooling to accept a pyproject.toml and/or requirements.in file in order to filter out the unnecessary wheels.

I personally think that 1. might be the better solution, but a little bit more involved, because we could also set different visibility values in the hub repo for transitive and non-transitive dependencies of the project to enforce that all of the direct dependencies of the project are in the requirements.in and/or pyproject.toml. However, this is something that may be a matter of taste and I am not sure if this is a good thing to enforce on everyone.

The second approach would be more doable as golang could be definitely taught how to parse pyproject.toml and that would be something that the user can opt into.

That way we may be able to cut the file size by a lot without doing any more advanced techniques like what was suggested in the comment above.

@hunshcn
Copy link
Contributor Author

hunshcn commented May 20, 2024

One extra idea how we could simplify the gazelle_python.yaml file by a lot is to only include the dependencies that the project is referencing directly. For example, if my project has a pyproject.toml file and I only specify airflow as the dependency in the project, gazelle_python.yaml will be enormous because of all of the transitive airflow dependencies that rules_python knows about. However, the only things that we should actually include in the gazelle_python.yaml are the things that the requirements.in or pyproject.toml mentions.我们如何大幅简化 gazelle_python.yaml 文件的一个额外想法是仅包含项目直接引用的依赖项。例如,如果我的项目有一个 pyproject.toml 文件,并且我只指定 airflow 作为项目中的依赖项,则 gazelle_python.yaml 将非常庞大,因为所有传递性 airflow rules_python 知道的依赖项。但是,我们实际上应该在 gazelle_python.yaml 中包含的唯一内容是 requirements.inpyproject.toml 提到的内容。

Such thing could be achieved in two ways:这样的事情可以通过两种方式实现:

  1. Modify the pip extension to return a list of whls that are the direct references of the project. This would mean that the pip.parse should accept the requirements.in or the pyproject.toml and use that to generate a list of whl targets that are the direct references. That way the modules_mapping file will only create gazelle_python.yaml entries that are the direct references.修改 pip 扩展以返回作为项目直接引用的whls列表。这意味着 pip.parse 应该接受 requirements.inpyproject.toml 并使用它来生成 whl 目标列表,这些目标是直接参考。这样, modules_mapping 文件将仅创建作为直接引用的 gazelle_python.yaml 条目。
  2. Modify the gazelle tooling to accept a pyproject.toml and/or requirements.in file in order to filter out the unnecessary wheels.修改瞪羚工具以接受 pyproject.toml 和/或 requirements.in 文件,以过滤掉不必要的轮子。

I personally think that 1. might be the better solution, but a little bit more involved, because we could also set different visibility values in the hub repo for transitive and non-transitive dependencies of the project to enforce that all of the direct dependencies of the project are in the requirements.in and/or pyproject.toml. However, this is something that may be a matter of taste and I am not sure if this is a good thing to enforce on everyone.我个人认为 1. 可能是更好的解决方案,但涉及更多一点,因为我们还可以在 hub 存储库中为传递和非传递依赖项设置不同的 visibility 值项目的所有直接依赖项都位于 requirements.in 和/或 pyproject.toml 中。然而,这可能是一个品味问题,我不确定这对每个人来说是否是一件好事。

The second approach would be more doable as golang could be definitely taught how to parse pyproject.toml and that would be something that the user can opt into.第二种方法更可行,因为 golang 绝对可以被教导如何解析 pyproject.toml 并且用户可以选择这样做。

That way we may be able to cut the file size by a lot without doing any more advanced techniques like what was suggested in the comment above.这样我们就可以大幅削减文件大小,而无需使用任何更高级的技术,例如上面评论中建议的技术。

Maybe I missed something.

Whether it is 1 or 2, users need to ensure that all direct dependencies need to live in requirements.in, otherwise we will not be able to resolve this dependency. This requirement is reasonable, but it is difficult to achieve (most projects should not be able to do it because python does not have the same visibility restrictions as bzlmod).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants