Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libgit2: add contextual logging to subtransports #778

Merged
merged 2 commits into from Jun 14, 2022

Conversation

pjbgf
Copy link
Member

@pjbgf pjbgf commented Jun 9, 2022

Debugging connection issues can be extremely difficult, even more so at scale or when concurrent connections are required to trigger specific issues.

Changes:

  • Add a correlation identifier (reconcileID) for each reconciliation, which allows for greater traceability when going through all the reconciliation operations - including at transport level.
  • Add transportType to segregate HTTP and SSH transport logging.
  • SSH operations are now enriched with addr containing server address, and HTTP url.

Example of stdout output when grepping by a specific TransportID:

{"level":"Level(-2)","ts":"2022-06-13T07:27:09.781+0100","logger":"controller.gitrepository","msg":"creating new ssh session","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"ssh-ed25519-bitbucket","namespace":"flux-system","cid":"a44976a7-9461-4af7-88f4-b3d481fa5ca7","transportType":"ssh","addr":"bitbucket.org:22"}
{"level":"Level(-2)","ts":"2022-06-13T07:27:09.863+0100","logger":"controller.gitrepository","msg":"run on remote","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"ssh-ed25519-bitbucket","namespace":"flux-system","cid":"a44976a7-9461-4af7-88f4-b3d481fa5ca7","transportType":"ssh","addr":"bitbucket.org:22","cmd":"git-upload-pack '/pjbgf/flux-testing.git'"}
{"level":"Level(-2)","ts":"2022-06-13T07:27:10.074+0100","logger":"controller.gitrepository","msg":"sshSmartSubtransport.Close()","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"ssh-ed25519-bitbucket","namespace":"flux-system","cid":"a44976a7-9461-4af7-88f4-b3d481fa5ca7","transportType":"ssh","addr":"bitbucket.org:22"}
{"level":"Level(-2)","ts":"2022-06-13T07:27:10.074+0100","logger":"controller.gitrepository","msg":"session.Close()","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"ssh-ed25519-bitbucket","namespace":"flux-system","cid":"a44976a7-9461-4af7-88f4-b3d481fa5ca7","transportType":"ssh","addr":"bitbucket.org:22"}
{"level":"Level(-2)","ts":"2022-06-13T07:27:10.074+0100","logger":"controller.gitrepository","msg":"close client","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"ssh-ed25519-bitbucket","namespace":"flux-system","cid":"a44976a7-9461-4af7-88f4-b3d481fa5ca7","transportType":"ssh","addr":"bitbucket.org:22"}
{"level":"Level(-2)","ts":"2022-06-13T07:27:10.074+0100","logger":"controller.gitrepository","msg":"sshSmartSubtransport.Close()","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"ssh-ed25519-bitbucket","namespace":"flux-system","cid":"a44976a7-9461-4af7-88f4-b3d481fa5ca7","transportType":"ssh","addr":"bitbucket.org:22"}
{"level":"Level(-2)","ts":"2022-06-13T07:27:10.074+0100","logger":"controller.gitrepository","msg":"close client","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"ssh-ed25519-bitbucket","namespace":"flux-system","cid":"a44976a7-9461-4af7-88f4-b3d481fa5ca7","transportType":"ssh","addr":"bitbucket.org:22"}
{"level":"info","ts":"2022-06-13T07:27:10.075+0100","logger":"controller.gitrepository","msg":"no changes since last reconcilation: observed revision 'main/d35781accf1ea8078de18e154622486042bca4cc'","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"ssh-ed25519-bitbucket","namespace":"flux-system","cid":"a44976a7-9461-4af7-88f4-b3d481fa5ca7"}
{"level":"info","ts":"2022-06-13T07:27:10.075+0100","msg":"no changes since last reconcilation: observed revision 'main/d35781accf1ea8078de18e154622486042bca4cc'","name":"ssh-ed25519-bitbucket","namespace":"flux-system","reconciler kind":"GitRepository","reason":"GitOperationSucceeded","annotations":null}
{"level":"Level(-2)","ts":"2022-06-13T07:27:10.080+0100","logger":"controller.gitrepository","msg":"sshSmartSubtransport.Close()","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"ssh-ed25519-bitbucket","namespace":"flux-system","cid":"a44976a7-9461-4af7-88f4-b3d481fa5ca7","transportType":"ssh","addr":"bitbucket.org:22"}
{"level":"Level(-2)","ts":"2022-06-13T07:27:10.080+0100","logger":"controller.gitrepository","msg":"close client","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"ssh-ed25519-bitbucket","namespace":"flux-system","cid":"a44976a7-9461-4af7-88f4-b3d481fa5ca7","transportType":"ssh","addr":"bitbucket.org:22"}

@pjbgf pjbgf added the area/git Git related issues and pull requests label Jun 9, 2022
@pjbgf
Copy link
Member Author

pjbgf commented Jun 9, 2022

@darkowlzz following up from your comments (#776 (comment)), this PR now is only based on contextual logging. If there are no further changes required, I will go on and do the same for HTTP.

@stefanprodan
Copy link
Member

IMO this should print "level":"trace" instead of Level(-2). I think we have this in IAC.

@pjbgf
Copy link
Member Author

pjbgf commented Jun 9, 2022

IMO this should print "level":"trace" instead of Level(-2). I think we have this in IAC.

Yep, it feels like this is something upstream, as currently we do a very similar approach for both debug and trace, and the former works fine. I will take a look.

addr := net.JoinHostPort(u.Hostname(), port)

if t.ctx == nil {
t.ctx = opts.Context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI seems to be failing because the test doesn't sets the context in

AddTransportOptions(transportOptsURL, TransportOptions{
, and this opts.Context remains nil.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this, now both ctx and logger have safe defaults.

@darkowlzz
Copy link
Contributor

darkowlzz commented Jun 9, 2022

IMO this should print "level":"trace" instead of Level(-2). I think we have this in IAC.

@stefanprodan looks like it's the same in IAC:

{"level":"debug","ts":"2022-06-09T18:53:33.827+0530","logger":"controller.imageupdateautomation","msg":"updating with setters according to image policies","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"update-test","namespace":"image-auto-test-y1dxf","count":1,"manifests-path":"/tmp/image-auto-test-y1dxf-image-auto-test-ktspo3193755675"}
{"level":"Level(-2)","ts":"2022-06-09T18:53:33.827+0530","logger":"controller.imageupdateautomation","msg":"found policy","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"update-test","namespace":"image-auto-test-y1dxf","namespace":"image-auto-test-y1dxf","name":"policy-zfkh7","latest-image":"helloworld:v1.0.0"}

After some investigation, it looks like this is because zap doesn't know what -2 level means https://github.com/uber-go/zap/blob/e06e09a6d396031c89b87383eef3cad6f647cf2c/zapcore/level.go#L88 .
In fluxcd/pkg/runtime, we set trace level to be debug level - 1 https://github.com/fluxcd/pkg/blob/3357f39663bad8cb6315edb0d2a6b97f560bdba9/runtime/logger/logger.go#L36.
In zap, debug level is already -1. So -1 - 1 = -2. And the zap level stringer doesn't know about this level, resulting in Level(-2).

And looks like it was known at the time of implementing it based on https://github.com/fluxcd/pkg/blob/3357f39663bad8cb6315edb0d2a6b97f560bdba9/runtime/logger/logger.go#L32-L35 .

pkg/git/libgit2/managed/http.go Outdated Show resolved Hide resolved
pkg/git/libgit2/managed/http.go Outdated Show resolved Hide resolved
@pjbgf pjbgf force-pushed the improved-logging branch 2 times, most recently from 6a2d56f to 6a4741c Compare June 10, 2022 10:03
@pjbgf pjbgf force-pushed the improved-logging branch 3 times, most recently from b877bc2 to e66548f Compare June 10, 2022 16:07
@pjbgf pjbgf added this to the GA milestone Jun 10, 2022
@pjbgf pjbgf changed the title libgit2: improve subtransport logging libgit2: add contextual logging to subtransports Jun 10, 2022
pkg/git/libgit2/managed/http.go Outdated Show resolved Hide resolved
pkg/git/libgit2/managed/http.go Outdated Show resolved Hide resolved
pkg/git/libgit2/managed/ssh.go Outdated Show resolved Hide resolved
pkg/git/libgit2/managed/ssh.go Outdated Show resolved Hide resolved
Debugging connection issues can be extremely difficult, even more so at scale or when
concurrent connections are required to trigger specific issues.

Changes:
- Add a correlation identifier for each reconciliation, which allows for greater traceability when
going through all the reconciliation operations - including at transport level.
- Add transportType to segregate HTTP and SSH transport logging.
- SSH operations are now enriched with addr containing server address, and HTTP url.

Signed-off-by: Paulo Gomes <paulo.gomes@weave.works>
Copy link
Member

@aryan9600 aryan9600 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

"transportType", "http",
"url", opts.TargetURL)
}
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine how it is, but I'd expect such one time configurations to be done at the very beginning of the function when we have all the things we need to do it.

Copy link
Contributor

@darkowlzz darkowlzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
It'd be good to add the correlation ID to other reconcilers as well.

@pjbgf
Copy link
Member Author

pjbgf commented Jun 13, 2022

@darkowlzz added an additional commit that rolls out correlation ID across the other reconcilers.

@hiddeco
Copy link
Member

hiddeco commented Jun 13, 2022

How does the roll out of the CID combine with the upstream controller-runtime changes around logging that's still ahead of us? As this adds a reconcile ID.

xref: kubernetes-sigs/controller-runtime#1827

controllers/bucket_controller.go Outdated Show resolved Hide resolved
@pjbgf pjbgf force-pushed the improved-logging branch 2 times, most recently from 8d14f0c to 628a9e3 Compare June 13, 2022 15:31
@pjbgf pjbgf requested a review from hiddeco June 13, 2022 15:58
GitRepository introduced correlation ID to improve
transport level logging. This change aligns the other
reconcilers to the same approach.

Signed-off-by: Paulo Gomes <paulo.gomes@weave.works>
@pjbgf pjbgf merged commit 19292d4 into fluxcd:main Jun 14, 2022
@pjbgf pjbgf deleted the improved-logging branch June 14, 2022 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/git Git related issues and pull requests
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants