Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: sys.path weirdness: local files/dirs shadow third-party packages #175

Open
georgevreilly opened this issue Aug 11, 2023 · 2 comments
Labels
bug Something isn't working
Milestone

Comments

@georgevreilly
Copy link

What happened?

I'm migrating a large codebase from rules_python to rules_py. One problem that I've had to address multiple times is that directory names and filenames are now shadowing third-party packages.

  • I had to rename the langchain directory not to conflict with https://pypi.org/project/langchain/. import langchain.llms found our langchain directory first in sys.path.
  • Similarly, for a pyspark directory.
  • The built-in copy module ended up importing from our org.py:

When using rules_py, a number of tests fail with a callstack like this:

Traceback (most recent call last):
  File "/mnt/tmpfs/BAZEL_OUTPUT/sandbox/processwrapper-sandbox/33663/execroot/blah_blah/bazel-out/k8-fastbuild/bin/src/python/pytask/foo/bar/quux.runfiles/blah_blah/tools/build_rules/py/private/pytest_loader.py", line 6, in <module>
    import pytest
...
  File "/mnt/tmpfs/BAZEL_OUTPUT/sandbox/processwrapper-sandbox/33663/execroot/blah_blah/bazel-out/k8-fastbuild/bin/src/python/pytask/foo/bar/quux.runfiles/py_deps_main/attrs/attrs/attr/_make.py", line 5, in <module>
    import copy
  File "<frozen zipimport>", line 259, in load_module
  File "/mnt/tmpfs/BAZEL_OUTPUT/execroot/blah_blah/external/python_interpreter/lib/python38.zip/copy.py", line 60, in <module>
  File "//mnt/tmpfs/BAZEL_OUTPUT/sandbox/processwrapper-sandbox/33663/execroot/blah_blah/bazel-out/k8-fastbuild/bin/src/python/pytask/foo/bar/quux.runfiles/blah_blah/tasks/tasks/org.py", line 3, in <module>

If you look at the copy.py source, it's actually trying to do from org.python.core import PyStringMap. This is Jython support; a handled ImportError is raised in CPython. I worked around this by renaming org.py.

sys.path is substantially longer in rules_py. There are 1193 entries for the above test, versus 484 for rules_python. As far as I can tell, the primary difference is that every directory in our source tree that we import from seems to be in sys.path when using rules_py.

Version

Development (host) and target OS/architectures:
Linux x86_64, Ubuntu 20.02
Output of bazel --version:
bazel 6.2.0
Version of the Aspect rules, or other relevant rules from your
WORKSPACE or MODULE.bazel file:
0.3.0
Language(s) and/or frameworks involved:
Python, Scala, Java, ...

How to reproduce

No response

Any other information?

No response

@georgevreilly georgevreilly added the bug Something isn't working label Aug 11, 2023
@github-actions github-actions bot added the untriaged Requires traige label Aug 11, 2023
@alexeagle
Copy link
Member

Thanks for the feedback @georgevreilly ! We've finally found some funding to get started on a 1.0 release for rules_py and hope to reach that around BazelCon. Is your employer interested in sponsoring the work?
(If not, excellent contributions like issue reports and verifying the fixes are much appreciated, of course)

@alexeagle alexeagle removed the untriaged Requires traige label Aug 29, 2023
@alexeagle alexeagle added this to the 1.0 milestone Aug 29, 2023
@alexeagle
Copy link
Member

I think this is a consequence of a fundamental design decision. rules_python puts the name of your repository on the sys.path and therefore your local top-level folders were namespaced by that, and didn't collide with third-parties. However, that was a Bazel-only convention, not how idiomatic Python is written. (it also causes problems with bzlmod where "the name of your repository" depends on the context where the name is accessed, see https://bazelbuild.slack.com/archives/C014RARENH0/p1695268845206649)

Most clients I've seen have a repeated folder structure for their top-level, so something like my-repo/my-repo/path/to/file so that import my-repo.path.to works correctly under Python outside of Bazel. I think you'll have to do a one-time refactor like this.

If you decide to do this, note that one option is to use git filter-repo to "relocate" your commits under that folder (see https://blog.aspect.dev/otherrepo-to-monorepo) so that you don't pollute the blame layer with a giant file move.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: No status
Development

No branches or pull requests

2 participants