-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Running and publishing workflows in Cromwell on GCP with GCR images exposes developers to unexpected egress costs #6442
Comments
|
Thanks for pointing out the Cromwell/PAPI v2 beta support for docker image caches, I wasn't aware of this feature. After reading the documentation at https://cromwell.readthedocs.io/en/develop/backends/Google/#docker-image-cache-support, I have a few additional questions about how this could help with the problem described in this issue. It looks like docker image caching needs to be configured at the Cromwell server level with a manifest file indicating the location of images. Are the images listed in the manifest truly global (as indicated in their paths, ie For an external user already running their own cromwell instance in a different region who wants to run our pipeline, we'd need to publish an image caching configuration stanza and manifest file along with our workflow, and the administrator of the cromwell server would need to modify their server configuration to point to our manifest file and restart the server -- is that correct? If their cromwell server was already configured with a different image cache manifest, is there a way to add a second manifest, or would they have to edit their own cache manifest to include the entries in ours? If I'm right in my understanding (please correct me if I'm not) it seems like this solution could help, but is quite cumbersome and requires very savvy external users who are willing to take extra steps to help prevent saddling us with egress charges. It would be great if a cache manifest could be configured at the workflow level -- perhaps in the wdl or the workflow options. |
I feel like overall WDL developers need to have some way to control that docker images be cached within the WDL. Tasks that are not scattered are likely not relevant here, as one single download is unlikely to incur large egress charges. On the other hand for scattered tasks there should be a way for the WDL developer to demand caching, rather than relying on the user to do the right thing. Ideally this should all be handled by Google and container images, when downloaded, should be cached for a pre-determined amount of time. There is something called |
I've been working on this issue in parallel. My general recommendation would be to not use GCR for public images as there is a perpetual risk of egress charges to the image owner. GCR is great for keeping more of your workflow infrastructure inside Google Cloud, but egress charges will be billed to you as the owner of the storage bucket. Artifact registry seems to have its own set of egress charges, similar to GCS egress charges. Alternatively, services like quay.io offer unlimited storage and serving of public repos. The consequence of not using GCR is that docker images are now hosted outside of Google Cloud, meaning workflows will need to make an external network call to download the image. The external call will require VMs to have an external IP address. Large parallel workflows will need quota for several external IP addresses, and may run into quota limits. To alleviate this, workflow runners could be instructed to make a copy of the image into their own GCR. This also has the advantage that the workflow runner can use a repository in the same location as their VMs. Workflow publishers should include instructions on how to upload the image to a private GCR. How does that sound as a set of guidelines for the community? |
It seems to me very cumbersome to ask users to think this through. Currently my WDLs have a variable:
But I have dockers uploaded on each of |
I've been looking into a solution that uses VPC SC settings to restrict bucket egress to specific locations. The gist of it is the owner of the docker image puts the container registry in a project that is within a VPC SC perimeter. The docker image owner will need to configure the perimeter such that only VMs from specific ipRanges can access the bucket/docker image. I've put the details and instructions in this doc that I've currently shared with the Broad. |
@mcovarr I also echo @cwhelan that the solution provided is quite cumbersome. What exactly would be the complexity in devising a solution where instead you could simply define a variable like |
As discussed in #6235, developers of workflows for GCP who store their images in Google Container Repositories can be exposed to large Google GCS egress charges when users attempt to run workflows in different continental regions, resulting in many trans-continental container pulls. There currently does not seem to be a satisfactory way to guard against this:
Some possible ideas were suggested by @freeseek in #6235:
The text was updated successfully, but these errors were encountered: