Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete Container Registry images left after Functions deployment #3439

Merged
merged 10 commits into from Jun 21, 2021

Conversation

inlined
Copy link
Member

@inlined inlined commented Jun 1, 2021

Woohoo! Functions customers will no longer have a $0.02 bill that is both annoying and can cause major problems when their credit card refuses the bill and GCP locks their billing profile for failed payment.

@inlined inlined requested a review from joehan June 1, 2021 18:59
@google-cla google-cla bot added the cla: yes Manual indication that this has passed CLA. label Jun 1, 2021
@inlined
Copy link
Member Author

inlined commented Jun 1, 2021

So I was reviewing some notes on ArtifactRegistry. ArtifactRegistry also implements the Docker API as well as serving its own One Platform API. I'm wondering if gcp/containerregistry.ts should be smething more like gcp/docker.ts and it can take a URL. Then I can move the domain calculations into containerCleaner.ts where it will pivot wether to use GCR or AR based on a preview flag artifactregistry.

Copy link
Contributor

@joehan joehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally like the idea of switching gcp/containerregistry.ts to a more generic gcp.docker.ts - however, I have two concerns:
1 - Will the AR still use the weird extra fields on tag that GCR has? If not, I could see the code getting pretty sharded & ugly.
2 - Since the Docker API is so complex, would it be simpler to just use the OnePlatform API for AR, and then eventually get rid of the Docker/GCR code?

logger.debug("Failed to delete container registry artifacts with error", err);
utils.logLabeledWarning(
"functions",
"Unhnandled error cleaning up build files. This could result in a small monthly bill if not corrected"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to include a console link to where you can clean these up, plus a list of the files that were not cleaned up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I've split the difference between what I think you're asking for and what I think I can reliably deliver (since I can't do a recursive LS to find all files that need to be deleted when I can't trust API calls to succeed).

I've made the code try to resume from as many errors as possible by catching and throwing at the end of each process. Since this can lead to multiple errors, I throw a random error but log all errors. When there are any errors, I print the top-level directory for GCF's images in that multi-region. I have a list format for 2 or more regions and an inline format for one region. I've manually tested a case where there was:

  1. No errors
  2. An error in one region
  3. An error in two regions on the same multi-region
  4. An error in multiple regions that are stored in multiple multi-regions

logger.debug("Failed to delete container registry artifacts with error", err);
utils.logLabeledWarning(
"functions",
"Unhnandled error cleaning up build files. This could result in a small monthly bill if not corrected"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in unhandled

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// Let children ("directories") be cleaned up in parallel while we clean
// up the "files" in this location.

const deleteTags = stat.tags.map((tag) => this.client.deleteTag(path, tag));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to delete all tags before starting to delete the images? If not, we could combine these await Promise.all()'s and do them in parallel

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a little concerned about how this behaves if just one call fails - for example, if the first Promise.all rejects, this errors and we never try to deleteImages... but we already made our recursive call, so that has started but will get 'cut off' at some arbitrary point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Tags pin an image and must be removed before an image can be removed. I've added comments to clarify this.
  2. Agreed. I didn't feel super warm and fuzzy about these errors. I've made rm recurse as far as it can irrespective of errors and throw a random error it encounters. Rather than investing in a multi-error type to aggregate FirebaseErrors, I'm just debug logging the error for future reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me - this is definitely a tricky one to handle and it seems likely that if one call fails, the rest are high likelihood to as well. I feel better about this strategy tho.

it("Handles cleanup of first function in the region", async () => {
const cleaner = new containerCleaner.ContainerRegistryCleaner();

// Any cast because the stub apparently isn't stubbing getNode as a priavte member.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in private

it("Handles cleanup of second function in the region", async () => {
const cleaner = new containerCleaner.ContainerRegistryCleaner();

// Any cast because the stub apparently isn't stubbing getNode as a priavte member.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in private

it("Leaves other directories alone", async () => {
const cleaner = new containerCleaner.ContainerRegistryCleaner();

// Any cast because the stub apparently isn't stubbing getNode as a priavte member.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in private

// Let children ("directories") be cleaned up in parallel while we clean
// up the "files" in this location.

const deleteTags = stat.tags.map((tag) => this.client.deleteTag(path, tag));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me - this is definitely a tricky one to handle and it seems likely that if one call fails, the rest are high likelihood to as well. I feel better about this strategy tho.

@inlined inlined merged commit 90c9ff2 into master Jun 21, 2021
@inlined inlined deleted the inlined.gcr-cleanup branch June 21, 2021 19:33
@ghost
Copy link

ghost commented Jun 22, 2021

Do I need to redeploy functions for this to take effect?

@inlined
Copy link
Member Author

inlined commented Jun 22, 2021

Yes, run npm install -i firebase-tools to get 9.14.0 or greater and re-deploy. The CLI will only clean up images for functions that are still in your deployment; we don't just rm -rf everything.

@c0dification
Copy link

c0dification commented Jun 29, 2021

@inlined - I upgraded to 9.14 last week and did a few deployments since then. But since yesterday morning, when I deploy a function, the script successfully deploys the function and then just sits on the last line, i functions: cleaning up build files..., and never terminates. I have to cntr-c to get my terminal back. I've let it sit for up to an hour in the terminal without any change. Any thoughts on this?

Here's an example.

firebase deploy --only functions:scheduledFirestoreExport

=== Deploying to 'vizzn-poc'...

i  deploying functions
Running command: npm --prefix "$RESOURCE_DIR" run lint

> functions@3.22.0 lint
> tslint --project tsconfig.json

Running command: npm --prefix "$RESOURCE_DIR" run build

> functions@3.22.0 build
> rimraf ./lib && ./node_modules/.bin/tsc && cpx "./src/**/*.hbs" ./lib --clean

✔  functions: Finished running predeploy script.
⚠  functions: package.json indicates an outdated version of firebase-functions.
Please upgrade using npm install --save firebase-functions@latest in your functions directory.
i  functions: ensuring required API cloudfunctions.googleapis.com is enabled...
i  functions: ensuring required API cloudbuild.googleapis.com is enabled...
✔  functions: required API cloudbuild.googleapis.com is enabled
✔  functions: required API cloudfunctions.googleapis.com is enabled
i  functions: preparing functions directory for uploading...
i  functions: packaged functions (2.02 MB) for uploading
i  pubsub: ensuring required API pubsub.googleapis.com is enabled...
i  scheduler: ensuring required API cloudscheduler.googleapis.com is enabled...
✔  scheduler: required API cloudscheduler.googleapis.com is enabled
✔  pubsub: required API pubsub.googleapis.com is enabled
✔  functions: functions folder uploaded successfully
i  functions: updating Node.js 10 function scheduledFirestoreExport(us-central1)...
✔  functions[scheduledFirestoreExport(us-central1)]: Successful upsert schedule operation. 
✔  functions[scheduledFirestoreExport(us-central1)]: Successful update operation. 
i  functions: cleaning up build files...

inlined added a commit that referenced this pull request Jul 3, 2021
* Unbreak build (#3463)

* Unbreak build

* linter changed its mind

* firestore:delete getConfirmationMessage should include current project (#3457)

The firestore:delete command should notify the user of the current project. This should help users minimize chances of accidental deletions when switching between projects.

* Add asia-southeast1 to RTDB CLI (#3460)

* Fix init database without projectId (#3446)

* Fix init database (#3445)

* Added value check for "feature" parameter in init (#3449)

When the optional feature parameter is provided in a `firebase init
[feature]` command, this checks that its value is a valid choice before
attempting any other initialization.

* Bump trim-newlines from 3.0.0 to 3.0.1 (#3471)

Bumps [trim-newlines](https://github.com/sindresorhus/trim-newlines) from 3.0.0 to 3.0.1.
- [Release notes](https://github.com/sindresorhus/trim-newlines/releases)
- [Commits](https://github.com/sindresorhus/trim-newlines/commits)

---
updated-dependencies:
- dependency-name: trim-newlines
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump ws from 7.2.3 to 7.4.6 (#3428)

* add node16 to tests (#3462)

* Import/export download tokens (#3444)

* Fixes Storage Emulator startup errors (#3478)

* Bump normalize-url from 4.5.0 to 4.5.1 (#3476)

Bumps [normalize-url](https://github.com/sindresorhus/normalize-url) from 4.5.0 to 4.5.1.
- [Release notes](https://github.com/sindresorhus/normalize-url/releases)
- [Commits](https://github.com/sindresorhus/normalize-url/commits)

---
updated-dependencies:
- dependency-name: normalize-url
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fixes download tokens missing when uploading files via Cloud SDK (#3479)

* Follow up to #3420 (#3437)

* Increase waitForPortClosed timeout to 60s (#3483)

* Added validation logic to allow selectResource param type in extensions.yaml (#3489)

* Fix background functions in functions:shell (#3491)

* 9.13.0

* [firebase-release] Removed change log and reset repo after 9.13.0 release

* Fix ext:update issue where local extension is incorrectly inferred as published extension (#3499)

* Add missing changelog entry for #3499 (#3500)

* Fix init hosting:github (#3503)

* 9.13.1

* [firebase-release] Removed change log and reset repo after 9.13.1 release

* Avoid emulator data loss when there an error during export (#3511)

* Ask before overwriting storage.rules (#3510)

* Update CONTRIBUTING.md (#3513)

added note to run `npm install` before `npm link` the first time

* Release Cloud Firestore Emulator v1.13.0. (#3515)

* Basic create support

This change adds support for `firebase --open-sesame golang`.

After running this command, `firebase init` will support Go 1.13
as a langauge for Cloud Functions.

Limitations:

1. .gitignore is empty
2. Customers cannot mix Node and Go code (WAI)
3. There is little validation being done of customer code
4. The actual deployed function params are hard coded; SDK incoming

* Use vendoring to fetch SDK

* Update sample code

* Simplify unarchive pipe

* TSLint

* PR feedback

* Delete Container Registry images left after Functions deployment (#3439)

* Delete Container Registry images left after Functions deployment

* Simplify caching

* Improve error handling and report next steps to users

* lint fixes

* Fix typo

* Increase max function ID length to 63 (#3521)

* Fix crash when deploying zero functions. (#3520)

Previously most code read the desired backend from
`options.config.get("functions.backend")` which was set to the
empty backend correctly. Code that depended on payload.functions.backend
crashed because payload.functions was null when the backend was
empty.

Since optins.config should be firebase.json data, this change
normalizes on payload.functions.backend and ensures that it
is never null while options.config.get('functions') is present
(i.e. when the customer has functions to deploy).

* Use proper replace and get commands

* Update changelog with my recent pushes (#3522)

* 9.14.0

* [firebase-release] Removed change log and reset repo after 9.14.0 release

* Bump glob-parent from 5.0.0 to 5.1.2 (#3472)

* Added deferred provisioning check for Storage and Authentication during extension install (#3497)

Implemented provisioning check helper which checks whether products use by the extension are fully provisioned.

* Generate JSON Schema for firebase.json (#3505)

* Fetch from newly public GH repo

* PR feedback

* Format

Co-authored-by: davidbrenner <david.a.brenner@gmail.com>
Co-authored-by: Fred Zhang <fredzqm@google.com>
Co-authored-by: Sam Stern <samstern@google.com>
Co-authored-by: Andrew Heard <andrew@wizheard.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Bryan Kendall <bkend@google.com>
Co-authored-by: Abe Haskins <abeisgreat@abeisgreat.com>
Co-authored-by: Enrico Graziani <mrenrich84@gmail.com>
Co-authored-by: Pavel Jbanov <pavelgj@gmail.com>
Co-authored-by: Google Open Source Bot <firebase-oss-bot@google.com>
Co-authored-by: huangjeff5 <64040981+huangjeff5@users.noreply.github.com>
Co-authored-by: davidbielik <davidbielik@users.noreply.github.com>
Co-authored-by: Yuchen Shi <yuchenshi@google.com>
@inlined
Copy link
Member Author

inlined commented Jul 10, 2021

That seems quite peculiar. Have you had many different functions by many different names in this project in the past? Unfortunately the GCR layout is to have each function in a different UUID named "directory" so we have to peek into every directory before we even find the ones we want to clean up. Could you check out the "gcf" directory in https://console.cloud.google.com/gcr and see how many images are there? You might need to manually clean them up to get unstuck. Unfortunately you cannot delete by directory, only by image, because the API is very chatty and the console team decided not to implement a feature with unbounded complexity.

devpeerapong pushed a commit to devpeerapong/firebase-tools that referenced this pull request Dec 14, 2021
…ebase#3439)

* Delete Container Registry images left after Functions deployment

* Simplify caching

* Improve error handling and report next steps to users

* lint fixes

* Fix typo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Manual indication that this has passed CLA.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants