Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Packaging binaries with runfiles tree #579

Closed
mvukov opened this issue May 6, 2022 · 14 comments · Fixed by #754
Closed

Packaging binaries with runfiles tree #579

mvukov opened this issue May 6, 2022 · 14 comments · Fixed by #754
Assignees
Labels
P1 An issue that must be resolved. Must have an assignee
Milestone

Comments

@mvukov
Copy link

mvukov commented May 6, 2022

I have a simple use-case (I believe) where I want to package a binary with runfiles tree correctly set up inside the archive -- pretty much exactly what rules_docker does to package a binary in a Docker layer. pkg_zip flattens everything and that doesn't work for me (at least that was the case when I tried to use an older version of this repo).

So, I took what I need from app_layer from rules_docker and created rule binary_pkg_tar. This is interesting when one wants to cross-compile a binary, package it (with all deps, data), transfer to a target device, unpack and run the binary.

I just wanted to share this here in hope folks find it useful.

@aiuto
Copy link
Collaborator

aiuto commented May 7, 2022 via email

@mvukov
Copy link
Author

mvukov commented May 27, 2022

The goal of this little project was to create a rule that creates a valid runfiles folder for a packaged binary; simplest case: one wants to package a cc_binary/py_binary with a bit more complex deps, e.g. with data fields. At the time I created this (~1 year ago), pkg_tar would flatten the binary structure, which simply couldn't work -- in particular with py_binary targets which have external pip deps. On the other hand, rules_docker had exactly what I needed -- they basically copy+adjust runfiles folder of a binary target in /app folder inside an archive that represents a Docker layer. I took what I needed from rules_docker repo and created a new rule for my purpose. In essence, I don't create a Docker layer, but reuse the layer creation logic used to package binaries correctly in archives.

My starting point was app_layer that, among other things, creates an archive (layer) given the binary. The logic inside that rule (that I copied) eventually calls build_tar.py script to build an archive.

rules_pkg has a very similar build_tar.py, but I didn't have time to instigate the diffs and use that one. I think the script from rules_docker in fact reuses some logic from this repo (c.f. bazelbuild/rules_docker#1556).

There is a high chance the code I put together could use build_tar.py from this repo and work equally well. If there is enough interest, the new rule I cooked could become a part of this repo.

Please let me know if you need more info.

@aiuto
Copy link
Collaborator

aiuto commented May 28, 2022

A few thoughts

  1. Don't think about build_tar.py. All new features should be in terms of pkg_files and building a provider structure that gets the layout we want. Then we can load to pkg_tar or pkg_zip or whatever.
  2. What is your plan for separating the data runfiles from the binary runfiles?

Consider a python binary that contains some SWIGed up C++ code and some data files.

some/app/BUILD:

cc_library(
  name = "foo",
  srcs = ["..."],
  data = ["messages.cat"],
  link = shared,
)

swig_wrapper(name = "foo_wrapped", target = "foo")

py_binary(name = "my_app", srcs=["my_app.py", ...], deps=["foo_wrapped"])

Bazel would normally lay that out something like

some/app/my_app.par

which is a self extracting script that gives you

some/app/my_app.runfiles/some/app/__init__.py
some/app/my_app.runfiles/some/app/my_app.py
some/app/my_app.runfiles/some/app/libfoo.so
some/app/my_app.runfiles/some/app/messages.cat

A pleasing packaging might be something like

/usr/share/my_app/bin/my_app.py
/usr/share/my_app/bin/libfoo.so
/usr/share/my_app/lib/messages.cat

where the binary parts go in .../bin and the data goes in .../lib.

Bazel does not distinguish between the runfiles which are data vs those that are part of the binary. How would we remap things in a nice way? Now think of a real app, where the data dependencies might come in from hundreds of transitive deps.

@kaplan2539
Copy link

kaplan2539 commented Jun 3, 2022

l have a very similar use-case and love your binary_pkg_tar() rule @mvukov.
It would be awesome if there's an even more generic solution.
Taking a few steps back, IMHO this boils down to the following problem(s):

  • Awesome, I can use bazel to build!
  • Even better, by typing bazel run I can magically run what created with py_binary() (or whatever_binary())
  • But wait, can I somehow transfer this to a different computer that does not have bazel and my full mono repo? 🧐
  • Oh, nice there are rules_docker which allow me to upload everything to a Docker daemon as a Docker image
  • Hmm but I don't have/want Docker on my development machine
  • And even worse it is impossible to run Docker on the different computer I need to run my software on built with bazel
  • Now what? There are the rules_pkg but in when using e.g. pkg_tar I have to take additional steps to preserve the runfile tree structure created by e.g. py_binary()
  • If I understand correctly, with @mvukov's binary_pkg_tar() rule I get a tar file that preserves the runfile tree structure and adds a launcher script. I can now extract that tar file on a different computer that does not have bazel or Docker, and simply launch my program by running the launcher script. Ideally it packs everything needed to run the app, e.g. python interpreter, python modules, their native .so dependencies etc.

Now I imagine the following, even more generic work flow :

  • We let py_binary() or whatever_binary() define the runfiles tree structure
  • If that's not good enough, I create my own custom_py_binary() which defines the runfiles tree structure that works
  • And then there's a rule maybe called pkg_runfiles(), that produces a pkg_files provider structure
    which can be fed into pkg_tar(), pkg_zip(), ....
  • Oh have I mentioned how cool a pkg_sqfs() rule would be?

I have only recently started to work with bazel, so maybe the above is totally crazy. But I'd love to hear your thoughts @aiuto @mvukov. I've also found https://github.com/dropbox/dbx_build_tools and they have dbx_pkg_sqfs() but I haven't tried it out yet.

@mvukov
Copy link
Author

mvukov commented Jun 5, 2022

I agree that generalization of binary_pkg_tar would be a rule that at some intermediate step returns a provider with all info to dump into a provider that can be fed into another function that assembles an archive.

The rule as-is supports at the moment pip-deps and binaries. In fact, the use-case that I have in rules_ros repo is that the final app is a py_binary that has a number of pip-deps (via deps field) and a number of C++ apps as data-deps (via data field); it doesn't matter whether there are 5 or 100 pip deps. I believe Python extensions would work equally well (it would be, as you pointed out, a yet another data-dep).

Regarding the "pleasing packaging" to /usr/bin and /usr/lib is something I didn't consider TBH. If am I not wrong, this would mean splitting the binary' runfiles tree into multiple folders.

Bazel does not distinguish between the runfiles which are data vs those that are part of the binary.

I don't see any problem with this. Can you clarify, please?

OK, all in all it looks like I would need to wrap my head around providers in rules_pkg.

@mvukov
Copy link
Author

mvukov commented Jun 5, 2022

binary_pkg_tar() rule I get a tar file that preserves the runfile tree structure and adds a launcher script. I can now extract that tar file on a different computer that does not have bazel or Docker, and simply launch my program by running the launcher script.

Yes, that's the gist :) I tested the rule with cc_binary and py_library, but I don't see why wouldn't this work with any _binary.

And then there's a rule maybe called pkg_runfiles(), that produces a pkg_files provider structure
which can be fed into pkg_tar(), pkg_zip(), ....

Yes, I envision something like this would do the job correctly.

@aiuto aiuto added this to the runfiles milestone Jul 26, 2022
@iangudger
Copy link

iangudger commented Apr 19, 2023

It looks like rules_docker is now deprecated in favor of rules_oci, which relies on rules_tar for packaging. As a result, this feature is needed now more than ever.

@purkhusid
Copy link

purkhusid commented May 30, 2023

I'm hitting the same issue when trying to adopt rules_oci. My applications expects to be able to use the runfiles libs to locate runfiles but pkg_tar does not package them in a way that works for the runfiles libs. @thesayyn @alexeagle do you know of any workaround for this issue when using rules_oci?

@aiuto
Copy link
Collaborator

aiuto commented Jun 1, 2023 via email

@iangudger
Copy link

@purkhusid I have been using https://gist.github.com/pauldraper/7bc811ffbef6d as a workaround.

@aiuto
Copy link
Collaborator

aiuto commented Sep 19, 2023

I've ben playing with runfiles over the weekend. I have a PR (#754) that does runfiles as per my example above. That is, you get something just like the bazel layout. For example.

some/app/my_app.runfiles/some/app/__init__.py
some/app/my_app.runfiles/some/app/my_app.py
some/app/my_app.runfiles/some/app/libfoo.so
some/app/my_app.runfiles/some/app/messages.cat

And this is great, because it provides a layout where the runfiles from every app are isolated from each other. This is necessary for correctness.

But... it's a breaking change from the existing runfiles, so I'm going to have to implement a legacy mode. That will be a phased migration of

  • pkg_tar.runfiles_legacy_mode(default=True)
  • old style runfiles
  • N months or releases. TBD
    • users must migrate over that time
  • filp the default

Another approach I am exploring is doing legacy and correct at the same time. That will create multiple copies of the runfiles. One under the app .runfiles folder, and one at the package path. It might also be possible to just add symlinks from legacy location to the new one, so the file size does not explode. I have not looked yet.

The first step will be some refactoring of code to make it easier to pick alternate behaviors.

@alexeagle
Copy link
Collaborator

Since pkg_tar runfiles have been broken in the past, I'd expect approx. no one is using them, so changing the layout would not be a breaking change which requires that degree of care and a prolonged rollout period. Do you have some way to know who relies on the legacy mode?

@aiuto
Copy link
Collaborator

aiuto commented Sep 26, 2023

A few hundred instances inside Google rely on the legacy mode. Those are the only ones I know of.
It's not hard to support both for a while. The refactoring PR has been out for reveiw for a while. With that in place I can make both modes work without making pkg_tar_impl too ugly. In a few months, I can migrate my users to the new style and drop the legacy mode.

@aiuto aiuto added the P1 An issue that must be resolved. Must have an assignee label Oct 16, 2023
@purkhusid
Copy link

purkhusid commented Oct 24, 2023

Wanted to point out that the issue of the repo mapping manifest(If you have bzlmod enabled) not being accessable in starlark will also have to handled for this feature to be useful for bzlmod users.

More context here:
bazelbuild/bazel#19937
bazelbuild/bazel#17941

aiuto added a commit to aiuto/rules_pkg that referenced this issue Nov 22, 2023
The bulk of the change is to have pkg_tar use pkg_files%add_label_list to
process srcs.  That brings it in conformance with pkg_files and pkg_zip.
This changes the behavior of runfiles to be useful, but it is a breaking
change.

Fixes: bazelbuild#153
Fixes: bazelbuild#579

Other bits:
- do not fail verify_archive_test without a min_size or max_size
- change `data` file in the sample executable from BUILD to foo.cc so
  that it is easier to disambiguate from other files.
- fix bug in runfiles support where files were not added to file_deps list.
- fix bug in runfiles support where the runfiles base was not computed
  correctly.
- buildifier to make the linters happy.
aiuto added a commit that referenced this issue Nov 28, 2023
* Bring tar runfiles up to feature parity with pkg_files.runfiles.

The bulk of the change is to have pkg_tar use pkg_files%add_label_list to process `srcs`.  That brings it in conformance with pkg_files and pkg_zip. This changes the behavior of runfiles to be useful, but it is a breaking
change.

Fixes: #153
Fixes: #579

Other bits:
- do not fail verify_archive_test without a min_size or max_size
- change `data` file in the sample executable from BUILD to foo.cc so
  that it is easier to disambiguate from other files.
- fix bug in runfiles support where files were not added to file_deps list.
- fix bug in runfiles support where the runfiles base was not computed
  correctly.
- some buildifier fixes to make the linters happy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 An issue that must be resolved. Must have an assignee
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants