Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFE-986: Reload router when defaultDestinationCA is updated #537

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bharath-b-rh
Copy link

PR has the changes for enhancement proposed to support custom CA bundle to be used by the router to verify the server's certificate for the reencrypt termination type when the destinationCA is not configured. A CA certificate bundle with service CA and the custom CA certificates is made available to router as DefaultDestinationCA and router should watch the CA bundle file and reload whenever an updated is detected.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 17, 2023
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 17, 2023

@bharath-b-rh: This pull request references CFE-986 which is a valid jira issue.

In response to this:

PR has the changes for enhancement proposed to support custom CA bundle to be used by the router to verify the server's certificate for the reencrypt termination type when the destinationCA is not configured. A CA certificate bundle with service CA and the custom CA certificates is made available to router as DefaultDestinationCA and router should watch the CA bundle file and reload whenever an updated is detected.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bharath-b-rh
Copy link
Author

/hold openshift/enhancements#1514

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 17, 2023
@lihongan
Copy link
Contributor

cc @ShudiLi


if err := r.watchVolumeMountDir(caBundleDir, reloadFn); err != nil {
log.V(0).Error(err, "failed to establish watch on CA bundle certificate directory")
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this return the error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should, but the other invocations of watchVolumeMountDir isn't returning error. So I assumed here too no need for router to exit with file watcher error. Let me update to return the error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I didn't look at how other invocations worked. Perhaps not returning an error is the right thing to do here. If the existing usage is correct then we should comment why return nil is the right thing to do.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the code comment relates to my earlier review comment, in which I suggest adding a check for empty r.defaultDestinationCAPath. If I understand you correctly, the length check is not strictly necessary, and the current logic just logs an error if the path is empty. However, an explicit length check would make the logic easier to verify in my opinion and would avoid a scary "failed to establish watch" error message on startup, so I think it is better to add the check.

(In any case, I'm fine having return nil here.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion incorporated.

@bharath-b-rh
Copy link
Author

/retest

@bharath-b-rh
Copy link
Author

/retest e2e-agnostic

Copy link
Contributor

openshift-ci bot commented Nov 21, 2023

@bharath-b-rh: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

  • /test e2e-agnostic
  • /test e2e-aws-serial
  • /test e2e-upgrade
  • /test images
  • /test unit
  • /test verify

The following commands are available to trigger optional jobs:

  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-metal-ipi-ovn-router
  • /test perfscale-aws-ingress-perf

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-router-master-e2e-agnostic
  • pull-ci-openshift-router-master-e2e-aws-serial
  • pull-ci-openshift-router-master-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-router-master-e2e-upgrade
  • pull-ci-openshift-router-master-images
  • pull-ci-openshift-router-master-unit
  • pull-ci-openshift-router-master-verify

In response to this:

/retest e2e-agnostic

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bharath-b-rh
Copy link
Author

/test e2e-agnostic

@bharath-b-rh
Copy link
Author

/test e2e-upgrade

@bharath-b-rh
Copy link
Author

/cc @Miciah

@openshift-ci openshift-ci bot requested a review from Miciah November 27, 2023 15:11
@ShudiLi
Copy link
Member

ShudiLi commented Nov 28, 2023

Tested it with 4.15.0-0.test-2023-11-28-002931-ci-ln-z8rzn72-latest
`

  1. updated cm admin-ca-bundle by deleted it and created it again with new certifications

  2. deleted cm admin-ca-bundle

  3. check the route log
    % oc -n openshift-ingress logs router-int1-86c8bf4447-8475m --tail=5
    I1128 03:36:53.035226 1 router.go:393] template "msg"="got watch event from fsnotify" "operation"="CHMOD" "path"="/var/run/configmaps/ca-trust/..2023_11_28_03_35_37.1258454796"
    E1128 03:36:53.046496 1 router.go:1503] template "msg"="failed to reload router after detecting changes in CA bundle certificate directory" "error"="error reloading router: exit status 1\n[NOTICE] (116) : haproxy version is 2.6.13-234aa6d\n[NOTICE] (116) : path to executable is /usr/sbin/haproxy\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:217] : 'server be_secure:default:myreen2/pod:apach-srv2:sec-apach2:sec-apach2:10.129.2.11:8443' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (too long).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:264] : 'server be_secure:openshift-console:console/pod:console-866fb864f6-gmfgw:console:https:10.128.0.71:8443' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (too long).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:265] : 'server be_secure:openshift-console:console/pod:console-866fb864f6-zgc5n:console:https:10.130.0.40:8443' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (bad object header).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:322] : 'server be_secure:openshift-monitoring:alertmanager-main/pod:alertmanager-main-1:alertmanager-main:web:10.128.2.17:9095' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (nested asn1 error).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:323] : 'server be_secure:openshift-monitoring:alertmanager-main/pod:alertmanager-main-0:alertmanager-main:web:10.131.0.15:9095' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (too long).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:341] : 'server be_secure:openshift-monitoring:prometheus-k8s/pod:prometheus-k8s-0:prometheus-k8s:web:10.128.2.18:9091' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (too long).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:342] : 'server be_secure:openshift-monitoring:prometheus-k8s/pod:prometheus-k8s-1:prometheus-k8s:web:10.131.0.14:9091' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (too long).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:360] : 'server be_secure:openshift-monitoring:prometheus-k8s-federate/pod:prometheus-k8s-0:prometheus-k8s:web:10.128.2.18:9091' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (too long).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:361] : 'server be_secure:openshift-monitoring:prometheus-k8s-federate/pod:prometheus-k8s-1:prometheus-k8s:web:10.131.0.14:9091' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (too long).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:379] : 'server be_secure:openshift-monitoring:thanos-querier/pod:thanos-querier-8c994b889-9grgc:thanos-querier:web:10.128.2.15:9091' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (too long).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : [/var/lib/haproxy/conf/haproxy.config:380] : 'server be_secure:openshift-monitoring:thanos-querier/pod:thanos-querier-8c994b889-hpvpp:thanos-querier:web:10.131.0.9:9091' : Couldn't open the ca-file '/var/run/configmaps/ca-trust/ca-bundle.crt' (too long).\n[ALERT] (116) : config : 'ca-file' : unable to load /var/run/configmaps/ca-trust/ca-bundle.crt.\n[ALERT] (116) : config : Error(s) found in configuration file : /var/lib/haproxy/conf/haproxy.config\n[ALERT] (116) : config : Fatal errors found in configuration.\n"
    I1128 05:48:25.039996 1 router.go:393] template "msg"="got watch event from fsnotify" "operation"="CHMOD" "path"="/var/run/configmaps/ca-trust/..2023_11_28_05_48_25.2078201465"
    I1128 05:48:25.064399 1 router.go:652] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
    I1128 05:48:25.064420 1 router.go:1506] template "msg"="router was reloaded after detecting changes in CA bundle certificate directory"

  4. check the log
    % oc -n openshift-ingress logs router-int1-86c8bf4447-8475m | grep "router was reloaded"
    I1128 02:34:57.130922 1 router.go:1506] template "msg"="router was reloaded after detecting changes in CA bundle certificate directory"
    I1128 02:41:39.123943 1 router.go:1506] template "msg"="router was reloaded after detecting changes in CA bundle certificate directory"
    I1128 03:35:37.100220 1 router.go:1506] template "msg"="router was reloaded after detecting changes in CA bundle certificate directory"
    I1128 05:48:25.064420 1 router.go:1506] template "msg"="router was reloaded after detecting changes in CA bundle certificate directory"

  5. check the ca-bundle.crt under the router pod, which is updated as expected.
    % oc -n openshift-ingress rsh router-int1-86c8bf4447-8475m
    sh-4.4$ env | grep -i dest
    DEFAULT_DESTINATION_CA_PATH=/var/run/configmaps/ca-trust/ca-bundle.crt
    sh-4.4$ cat /var/run/configmaps/ca-trust/ca-bundle.crt
    -----BEGIN CERTIFICATE-----
    MIIDUTCCAjmgAwIBAgIIaBqBoB4eWkcwDQYJKoZIhvcNAQELBQAwNjE0MDIGA1UE
    Awwrb3BlbnNoaWZ0LXNlcnZpY2Utc2VydmluZy1zaWduZXJAMTcwMTEzMzg1NjAe
    Fw0yMzExMjgwMTEwNTZaFw0yNjAxMjYwMTEwNTdaMDYxNDAyBgNVBAMMK29wZW5z
    aGlmdC1zZXJ2aWNlLXNlcnZpbmctc2lnbmVyQDE3MDExMzM4NTYwggEiMA0GCSqG
    SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDHJIzrDwikWgCR9t+Db1/XIgxAmZej+xX9
    fqCdg25AvurBQGBf6H0FmuTd4KPXofyJIn2StTTSl3Mn0bU4w6mocZOshB8r3yw8
    NS3SZr6EttHFFl/qkUCU9R8+IHtrGoERrdGewbT+MVTfnprWqyFf/hNW1HBpd9Tl
    RDSCSWGb+sydmByhMdes9ZCYMLrBIPn5kfDTNVsQmfyb2f3LQib4spAbNA5VN2gE
    Y2hnKkAcH9k8pUKRnfHLWFHvjwerht4aYMVIhsFw2MNzmCs4P1wl8bJe8bogfvQb
    yga9LbNhB0E0x5ry6ve8tRi/wbKd7wD8cyoqNIZz2/B53xTuEj8lAgMBAAGjYzBh
    MA4GA1UdDwEB/wQEAwICpDAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBRWgpRk
    LRr8h+01ubDTGuGw1uOQ+zAfBgNVHSMEGDAWgBRWgpRkLRr8h+01ubDTGuGw1uOQ
    +zANBgkqhkiG9w0BAQsFAAOCAQEAwwfG/ahwwBLMTvTbYDQfiMsns+Pm2u4zMxbD
    mYKAGBHFL5Ftp/A/JuN1lSzVHyG8nebV0yaG9zfu4NFpFzZAAH4Gws1NwYqMLhfI
    afSiCJijGgh6OcxUixR9sQzo/XrSDVKjqvF8xLBPaQPU+5RblWY7BVBtUHsjx+ho
    PJFm4U8ID5gAdEmO/xDybs3OjiAw+b7vMSM4vtgYSGah2IfXVQ1+O7aIBMPJLKup
    BQH0nOrsoVOQ5ISX2i0GxHvz7HVLruIBV2mVnNeD9X3uNr6etfzrMU2IxAf+1E4Q
    r0AXh0n8+UvpaMB7DKW+qgIEs7uSXOaZVuDbHv7lRfAuCo+l2Q==
    -----END CERTIFICATE-----
    sh-4.4$
    `

@ShudiLi
Copy link
Member

ShudiLi commented Nov 28, 2023

/label qe-approved
thanks

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Nov 28, 2023
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 28, 2023

@bharath-b-rh: This pull request references CFE-986 which is a valid jira issue.

In response to this:

PR has the changes for enhancement proposed to support custom CA bundle to be used by the router to verify the server's certificate for the reencrypt termination type when the destinationCA is not configured. A CA certificate bundle with service CA and the custom CA certificates is made available to router as DefaultDestinationCA and router should watch the CA bundle file and reload whenever an updated is detected.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bharath-b-rh
Copy link
Author

Thank you @ShudiLi!

@snarayan-redhat
Copy link

/label docs-approved

@openshift-ci openshift-ci bot added the docs-approved Signifies that Docs has signed off on this PR label Nov 28, 2023
@candita
Copy link
Contributor

candita commented Dec 6, 2023

/assign @gcs278
/assign @Miciah

Copy link
Contributor

openshift-ci bot commented Dec 14, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from gcs278. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gcs278
Copy link
Contributor

gcs278 commented Dec 14, 2023

Thanks for the update. This is a pretty simple PR. As far as my review go:
/lgtm

But I'll let @Miciah provide the approve.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 14, 2023
@bharath-b-rh
Copy link
Author

Thank you @gcs278!

// watchCABundleCert watches the directory containing the CA bundle certificate
// and reloads the router if the directory contents change.
func (r *templateRouter) watchCABundleCert() error {
caBundleDir := filepath.Dir(r.defaultDestinationCAPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r.defaultDestinationCAPath could be empty.

Suggested change
caBundleDir := filepath.Dir(r.defaultDestinationCAPath)
if len(r.defaultDestinationCAPath) == 0 {
return nil
}
caBundleDir := filepath.Dir(r.defaultDestinationCAPath)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion incorporated.

caBundleDir := filepath.Dir(r.defaultDestinationCAPath)

reloadFn := func() {
log.V(0).Info("router slated for reload after detecting changes in CA bundle certificate directory")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit more verbose than the log message for reloading the default certificate ("reloading to get updated default certificate") or client CA CRL ("reloading to get updated client CA CRL"). Is there a reason not to keep it simple (e.g., "reloading to get updated default destination CA certificate bundle")?

I do like that the client CA CRL reload function also logs the file name; that might be useful here too.

log.V(0).Info("reloading to get updated client CA CRL", "name", crl.CRLFilename, "have CRLs", haveCRLs)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion incorporated.


if err := r.watchVolumeMountDir(caBundleDir, reloadFn); err != nil {
log.V(0).Error(err, "failed to establish watch on CA bundle certificate directory")
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the code comment relates to my earlier review comment, in which I suggest adding a check for empty r.defaultDestinationCAPath. If I understand you correctly, the length check is not strictly necessary, and the current logic just logs an error if the path is empty. However, an explicit length check would make the logic easier to verify in my opinion and would avoid a scary "failed to establish watch" error message on startup, so I think it is better to add the check.

(In any case, I'm fine having return nil here.)

Copy link
Contributor

openshift-ci bot commented Jan 19, 2024

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 19, 2024
@bharath-b-rh
Copy link
Author

/retest

Copy link
Contributor

openshift-ci bot commented Jan 23, 2024

@bharath-b-rh: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2024
@bharath-b-rh
Copy link
Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. docs-approved Signifies that Docs has signed off on this PR jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants