Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use logging library with json support in cmctl (part 2) #6119

Merged
merged 1 commit into from Aug 1, 2023

Conversation

inteon
Copy link
Member

@inteon inteon commented May 30, 2023

Pull Request Motivation

Continuation of what we started in #5828.
Changes cmctl to use the logging library used in all our other components.

NOTE: I had to split this PR into 3 parts: part 1, part 2 and part 3 because cmctl uses a pinned cert-manager version. This PR contains some remaining changes to cert-manager/cert-manager.

Kind

/kind cleanup

Release Note

You can find the release notes in #6119

NONE

@jetstack-bot jetstack-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 30, 2023
@inteon
Copy link
Member Author

inteon commented May 30, 2023

Tests are failing because #6108 has not been merged yet.

@inteon inteon requested a review from maelvls June 15, 2023 15:34
@maelvls
Copy link
Member

maelvls commented Jun 23, 2023

Hey, I'll look at the PR today. Do you think it is still ready to be reviewed? A few Prow jobs are failing Ah, that's OK, it is because of #6108

@jetstack-bot jetstack-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 23, 2023
@inteon inteon added this to the 1.13 milestone Jul 4, 2023
@jetstack-bot jetstack-bot added area/testing Issues relating to testing and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 27, 2023
cmd/ctl/main.go Outdated
Comment on lines 36 to 39
fmt.Fprintf(os.Stderr, "%s\n", err)
logf.Log.Error(err, "error executing command")
Copy link
Member

@maelvls maelvls Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: It seems the error messages aren't consistently formatted. Some errors show as simple strings, some others as klog-formated strings.

For example, a simple string:

$ cmctl status certificate
error: the name of the Certificate has to be provided as argument
$ cmctl status certificate foo
error: error when getting Certificate resource: certificates.cert-manager.io "foo" not found

But other errors show in a klog-formatted string:

$ cmctl --foo
E0727 17:51:22.137918   20450 main.go:39] "cert-manager: error executing command" err="unknown flag: --foo"

Simple commands like check api (that are "one off", not meant to be run over time) are shown using the klog format. For example:

$ cmctl check api
I0728 16:18:00.842254   44411 api.go:143] "cert-manager/checkAPI: The cert-manager API is ready"

Is there something we can do? I think we should stick to using klog whenever we have a server, not for one-off operations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new behavior:

$ cmctl status certificate
error: the name of the Certificate has to be provided as argument
$ cmctl status certificate foo
error: error when getting Certificate resource: the server could not find the requested resource (get certificates.cert-manager.io foo)
$ cmctl --foo
Usage: cmctl [command]

Available Commands:
  approve      Approve a CertificateRequest
  check        Check cert-manager components
  completion   Generate completion scripts for the cert-manager CLI
  convert      Convert cert-manager config files between different API versions
  create       Create cert-manager resources
  deny         Deny a CertificateRequest
  experimental Interact with experimental features
  help         Help about any command
  inspect      Get details on certificate related resources
  renew        Mark a Certificate for manual renewal
  status       Get details on current status of cert-manager resources
  upgrade      Tools that assist in upgrading cert-manager
  version      Print the cert-manager CLI version and the deployed cert-manager version

Flags:
  -h, --help                           help for cmctl
      --log-flush-frequency duration   Maximum number of seconds between log flushes (default 5s)
      --logging-format string          Sets the log format. Permitted formats: "json" (gated by LoggingBetaOptions), "text". (default "text")
  -v, --v Level[=2]                    number for the log level verbosity
      --vmodule pattern=N,...          comma-separated list of pattern=N settings for file-filtered logging (only works for text log format)

Use "cmctl [command] --help" for more information about a command.

error: unknown flag: --foo
$ cmctl  check api
E0731 21:28:30.980017   61308 api.go:116] "cert-manager/checkAPI: Not ready" err="the cert-manager CRDs are not yet installed on the Kubernetes API server"
error: the cert-manager CRDs are not yet installed on the Kubernetes API server
$ echo "$?"
1
$ cmctl  check api --v=6
I0731 21:44:00.011195   70438 loader.go:373] Config loaded from file:  /home/tramlot/.kube/config
I0731 21:44:00.016759   70438 round_trippers.go:553] GET https://127.0.0.1:33735/api 200 OK in 4 milliseconds
I0731 21:44:00.017420   70438 round_trippers.go:553] GET https://127.0.0.1:33735/apis 200 OK in 0 milliseconds
E0731 21:44:00.017557   70438 api.go:122] "cert-manager/checkAPI: Not ready" err="the cert-manager CRDs are not yet installed on the Kubernetes API server" underlyingError="error finding the scope of the object: failed to get restmapping: failed to find API group \"cert-manager.io\""
error: the cert-manager CRDs are not yet installed on the Kubernetes API server
$ echo "$?"
1
$ cmctl  check api
The cert-manager API is ready
$ echo "$?"
0
$ cmctl  check api --v --wait=3s
E0731 21:48:57.371593   72469 api.go:118] "cert-manager/checkAPI: Not ready" err="the cert-manager CRDs are not yet installed on the Kubernetes API server" underlyingError="error finding the scope of the object: failed to get restmapping: failed to find API group \"cert-manager.io\""
E0731 21:48:57.373122   72469 api.go:118] "cert-manager/checkAPI: Not ready" err="the cert-manager CRDs are not yet installed on the Kubernetes API server" underlyingError="error finding the scope of the object: failed to get restmapping: failed to find API group \"cert-manager.io\""
E0731 21:49:02.376113   72469 api.go:118] "cert-manager/checkAPI: Not ready" err="the cert-manager CRDs are not yet installed on the Kubernetes API server" underlyingError="error finding the scope of the object: failed to get restmapping: failed to find API group \"cert-manager.io\""
E0731 21:49:02.376142   72469 api.go:137] "cert-manager/checkAPI: Timed out" err="context deadline exceeded" after="3s"
error: the cert-manager CRDs are not yet installed on the Kubernetes API server
$ echo "$?"
124

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ cmctl  check api --v=6
I0731 21:52:59.286162   74607 loader.go:373] Config loaded from file:  /home/tramlot/.kube/config
I0731 21:52:59.291059   74607 round_trippers.go:553] GET https://127.0.0.1:33735/api 200 OK in 4 milliseconds
I0731 21:52:59.291935   74607 round_trippers.go:553] GET https://127.0.0.1:33735/apis 200 OK in 0 milliseconds
I0731 21:52:59.292752   74607 round_trippers.go:553] GET https://127.0.0.1:33735/apis/cert-manager.io/v1 200 OK in 0 milliseconds
I0731 21:52:59.296707   74607 round_trippers.go:553] POST https://127.0.0.1:33735/apis/cert-manager.io/v1/namespaces/default/certificates?dryRun=All 201 Created in 3 milliseconds
The cert-manager API is ready
$ echo "$?"
0

@maelvls
Copy link
Member

maelvls commented Jul 31, 2023

Hey, I tried the new logging. I am not convinced, I think we should stick to using klog whenever we have a server, not for one-off operations.

@jetstack-bot jetstack-bot added the area/acme Indicates a PR directly modifies the ACME Issuer code label Jul 31, 2023
return nil
Run: func(cmd *cobra.Command, args []string) {
cmdutil.CheckErr(o.Complete())
cmdutil.CheckErr(o.Run(ctx))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These cmdutil.CheckErr functions are also used in most of our other sub-commands.

@@ -38,12 +38,13 @@ var (
)

const (
crdsMappingError = `error finding the scope of the object: failed to get restmapping: no matches for kind "Certificate" in group "cert-manager.io"`
crdsMapping1Error = `error finding the scope of the object: failed to get restmapping: failed to find API group "cert-manager.io"`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a chanage in controller-runtime that caused this regex to stop matching.

Copy link
Member

@maelvls maelvls Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we had some form of testing that would catch this kind of sublety, but I am also conscious of the effort it would take to fake the API server to respond with these errors... so I'm OK with the current state of things 😅

@maelvls
Copy link
Member

maelvls commented Aug 1, 2023

Summary:

  • The new -v behavior is great and I approve!
  • I'd like this new behavior to be advertised in the --help output (either as part of this PR or in a follow-up PR).

Details:

I have tested the new -v behaviour and I am happy with it!

It is still possible to use -v without a number, which means we are not introducing any breaking change.

It is now possible to use -v=6, -v=7, and -v=8 to see the HTTP calls made. This use-case makes sense in the context of cmctl check api and cmctl upgrade migrate-api-version which both have complex logic happening. For example, in #6212, reading the error message wasn't informative enough and the user wouldn't have any way of knowing that the problem occurs when cmctl tries to create a certificate.

Although -v=6, -v=7, and -v=8 are a useful addition, I think the main focus should be on making sure the error messages give enough information to debug the error without having to result to inspecting the HTTP traffic. We don't have any tests ensuring that the "error mappings", which are super important to end-users, are still working over time.

I approve the idea of keeping the same approach as kubectl: anything that is printed as a human string stays the same when changing the level. Although it doesn't look great for log parsers, I think it is a "good enough" approach that won't surprise users. For example:

$ cmctl check api
The cert-manager API is ready

A minor note is that the behavior of -v changed. Before, it used to add a timestamp to every log line. It doesn't do that anymore. I don't think it matters to anyone as these timestamp weren't super useful anyways; -v=6, -v=7, and -v=8 are much more helpful.

# Before:
$ cmctl check api -v
2023/07/31 18:24:19 The cert-manager API is ready

# After:
$ cmctl check api -v
The cert-manager API is ready

The new -v=6 is super useful to understand where the cmctl command gets stuck or what it does. For example:

$ cmctl check api -v=6 --logging-format=json
{"ts":1690876954501.681,"caller":"clientcmd/loader.go:373","msg":"Config loaded from file:  /Users/mvalais/.kube/config\n","v":6}
{"ts":1690876954510.209,"caller":"transport/round_trippers.go:553","msg":"GET https://127.0.0.1:59109/api 200 OK in 7 milliseconds\n","v":0}
{"ts":1690876954512.226,"caller":"transport/round_trippers.go:553","msg":"GET https://127.0.0.1:59109/apis 200 OK in 1 milliseconds\n","v":0}
{"ts":1690876954513.974,"caller":"transport/round_trippers.go:553","msg":"GET https://127.0.0.1:59109/apis/cert-manager.io/v1 200 OK in 1 milliseconds\n","v":0}
{"ts":1690876954523.876,"caller":"transport/round_trippers.go:553","msg":"POST https://127.0.0.1:59109/apis/cert-manager.io/v1/namespaces/default/certificates?dryRun=All 201 Created in 9 milliseconds\n","v":0}
The cert-manager API is ready

Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
cmd/ctl/main.go Outdated
defer logf.FlushLogs()

// In cmctl, we are using cmdutil.CheckErr, which will call os.Exit(1) if it receives an error.
// Instead, we want to do a soft exit, and use SetExitCode to set the correct exit code.
Copy link
Member

@maelvls maelvls Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: The comment describes what the code does (i.e., use a custom exiting logic instead of CheckErr) but it doesn't explain why. The "why" seems to be key here and I can't quite understand why we are doing what we are doing here 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cmdutil.CheckErr function was used already in code, I think it might be because it contains some logic to "beatify" some of the errors and display more human-readable errors.

Comment on lines 122 to 123
log.Error(err, "Not ready")
lastError = err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why do we both log and return the error here? Having a log line followed by the same message in plain text is weird:

$ cmctl check api
E0801 10:48:30.286495   76871 api.go:116] "cert-manager/checkAPI: Not ready" err="the cert-manager webhook deployment is not ready yet"
error: the cert-manager webhook deployment is not ready yet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set --wait=5s, there will be multiple tries and each try will have its own log line. Only the last try is returned as a plain text message.

Copy link
Member Author

@inteon inteon Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is OK, we are just showing some extra information about what is going on using logs, but the final message contains the final result.

Copy link
Member

@maelvls maelvls Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, it doesn't look OK to handle the error but still return it afterwards. There is no point in displaying the error at the end if we already displayed it.

Also, why do we print a message to show the progress? The --wait parameter already shows the maximum wait time, so I don't understand the purpose of showing intermediate log lines. kubectl wait doesn't either:

$ kubectl wait deploy -n cert-manager cert-manager-webhook --for=condition=Ready --timeout=30s
error: timed out waiting for the condition on deployments/cert-manager-webhook

suggestion: don't show intermediate log lines. If people really want to have some intermediate output (e.g., to prevent CI's streaming from getting stale), they can use -v=6.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move this comment to #6247.

@jetstack-bot jetstack-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 1, 2023
@jetstack-bot jetstack-bot added release-note-none Denotes a PR that doesn't merit a release note. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 1, 2023
@maelvls
Copy link
Member

maelvls commented Aug 1, 2023

Now that most of the changes were moved to #6247, the release notes of this PR are off I think.

I wish we had kept this PR as "part 3" and created an intermediate "part 2" to avoid losing the context on conversations.

Could you also add some description to the PR to explain what #6119 achieves and why? I understand the change with the error message matching, but I don't understand the exit code changes.

@jetstack-bot jetstack-bot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 1, 2023
Copy link
Member

@maelvls maelvls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! All makes sense now.

/lgtm

@jetstack-bot jetstack-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 1, 2023
@jetstack-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: inteon, maelvls

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot merged commit b93ec2f into cert-manager:master Aug 1, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/acme Indicates a PR directly modifies the ACME Issuer code area/testing Issues relating to testing dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants