OCPBUGS-33181: Fixed audit-logs sigterm failing to terminate gracefully #3972

Joseph-Goergen
Contributor

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 1, 2024
@openshift-ci-robot

@Joseph-Goergen: This pull request references Jira Issue OCPBUGS-33181, which is invalid:

  • expected the bug to target either version "4.16." or "openshift-4.16.", but it targets "4.15" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci openshift-ci bot requested review from csrwng and hasueki May 1, 2024 21:42
@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release and removed do-not-merge/needs-area labels May 1, 2024
@@ -967,3 +966,12 @@ func buildKonnectivityVolumeClusterCerts(v *corev1.Volume) {
DefaultMode: pointer.Int32(0640),
}
}

func renderAuditLogScript(auditLogFilePath string) string {
var script = `#!/bin/bash
Contributor

Why are you starting a new script when the command calling this is /bin/bash? Do you really need #!/bin/bash?

Contributor Author

Probably not. I was mainly following what others do when calling /bin/bash -c, like:

    - args:
      - -c
      - |
        #!/bin/bash
        set -euo pipefail
        /usr/bin/oc get istag -n hypershift hypershift-operator:latest -ojsonpath='{.image.dockerImageReference}' > /installer-data/image-ref
      command:
      - /bin/bash

There are quite a few more examples throughout the hypershift operator, so I thought it was commonplace for this repo.

Contributor

But in this case, the previous code doesn't do that. So unless there's a good reason, please continue the pattern of the previous audit container setup.

Contributor

Previously, it was calling tail directly, so it couldn't have had those options. If we're going to use bash, it seems prudent to continue with set -euo pipefail; it will reduce the chance that we introduce any of the common bugs those options guard against.
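To make the point about `set -euo pipefail` concrete, here is a small sketch of what each option changes. The helper `run_snippet` and the variable `UNSET_VAR_FOR_DEMO` are made up for illustration and do not appear in the PR.

```shell
#!/bin/bash
# Illustrative only: run_snippet is a hypothetical helper, not part of the PR.

# Run a snippet in a fresh bash with the given options and print its exit status.
run_snippet() {
  local opts="$1" snippet="$2" status=0
  bash -c "${opts}; ${snippet}" >/dev/null 2>&1 || status=$?
  echo "${status}"
}

# Default: a pipeline reports the status of its last command, so the failure is hidden.
echo "no pipefail: $(run_snippet ":" "false | true")"
# pipefail: a failure in any stage fails the whole pipeline.
echo "pipefail:    $(run_snippet "set -o pipefail" "false | true")"
# errexit: the script stops at the first failing command instead of carrying on.
echo "errexit:     $(run_snippet "set -e" "false; true")"
# nounset: expanding an unset variable is an error rather than an empty string.
echo "nounset:     $(run_snippet "set -u" 'echo "${UNSET_VAR_FOR_DEMO}"')"
```

Only the first line should report status 0; the other three report a non-zero status because the option turns a silent problem into a failure.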

@jeffnowicki

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 2, 2024
@openshift-ci-robot

@jeffnowicki: This pull request references Jira Issue OCPBUGS-33181, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)



func renderAuditLogScript(auditLogFilePath string) string {
var script = `#!/bin/bash
trap exit SIGTERM
Contributor

Are you relying on implicit behavior of bash when you call exit? Could we instead explicitly forward SIGTERM to children on SIGINT?

Contributor

#!/usr/bin/env bash

set -o errexit
set -o nounset
set -o pipefail

function cleanup() {
  for child in $( jobs -p ); do
    kill "${child}"
  done
  wait
}
trap cleanup EXIT

for (( i = 0; i < 999; i++ )); do
  # Run each child in the background so the shell can handle signals while it waits.
  child.sh "${i}" &
  wait $!
done

Contributor

Something like this would work
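As a rough, self-contained illustration of the pattern suggested above: a supervising script starts its child in the background, waits on it, and stops it from an EXIT trap when SIGTERM arrives. Here `sleep 30` stands in for a long-running child such as tail, and `demo_forwarding` is a hypothetical helper name; this is a sketch, not the code that was merged.

```shell
#!/usr/bin/env bash
# Sketch only: "sleep 30" stands in for a long-running child such as tail,
# and demo_forwarding is a hypothetical helper name.

demo_forwarding() {
  local out_file supervisor
  out_file="$(mktemp)"

  # The supervisor script: start a background child, wait on it, clean up on exit.
  bash -c '
    set -o errexit -o nounset -o pipefail
    cleanup() {
      # Stop every background job, then reap them before reporting.
      for child in $(jobs -p); do
        kill "${child}" 2>/dev/null || true
      done
      wait || true
      echo "children stopped"
    }
    trap cleanup EXIT
    trap "exit 0" TERM   # turn SIGTERM into a normal exit so the EXIT trap runs
    sleep 30 &
    wait $!
  ' >"${out_file}" 2>/dev/null &
  supervisor=$!

  sleep 1                          # let the supervisor install its traps
  kill -TERM "${supervisor}"
  wait "${supervisor}" || true
  cat "${out_file}"
  rm -f "${out_file}"
}

demo_forwarding
```

The explicit TERM trap is a choice made for this sketch: it installs a handler for the signal and routes it through the same cleanup path as any other exit.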

Contributor Author

Updated and tested (using a 4.15 release branch).

@@ -967,3 +966,23 @@ func buildKonnectivityVolumeClusterCerts(v *corev1.Volume) {
DefaultMode: pointer.Int32(0640),
}
}

func renderAuditLogScript(auditLogFilePath string) string {
Contributor

Could we please export the function and re-use it in the OAPI deployment?

Contributor Author

Done!

@Joseph-Goergen Joseph-Goergen force-pushed the fix-audit-logs-SIGTERM-master branch 3 times, most recently from cafc72c to 5052adc Compare May 15, 2024 15:46
Contributor

@stevekuznetsov stevekuznetsov left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 15, 2024
@Joseph-Goergen
Contributor Author

Joseph-Goergen commented May 15, 2024

@stevekuznetsov
If/when this merges, the PR for #3994 may have to rebase (if this gets in before it) and use the RenderAuditLogScript function for the audit-logs container. If that PR merges first, I'll have to rework this one.

@Joseph-Goergen
Contributor Author

/retest-required

@rtheis
Contributor

rtheis commented May 16, 2024

/retest-required

@rtheis
Contributor

rtheis commented May 16, 2024

/retest-required

@rtheis
Contributor

rtheis commented May 16, 2024

/test e2e-kubevirt-aws-ovn

done
wait
}
trap cleanup EXIT
Contributor

I tested this and it doesn't work unless the trap is set on SIGTERM rather than EXIT.

Contributor Author

@rtheis I tested this using the openshift-apiserver

# time kubectl delete pod -n master-coskjif10c6fhqphi0hg openshift-apiserver-55476ddfc-h4ldv
pod "openshift-apiserver-55476ddfc-h4ldv" deleted

real	1m47.992s
user	0m0.273s
sys	0m0.109s

Contributor

@Joseph-Goergen My point exactly: it shouldn't take 1 minute 47 seconds to fully terminate the pod.
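The thread does not say what caused the slow termination, but two documented behaviors are worth keeping in mind. First, bash defers a trap while it waits for a foreground command and only runs it once that command finishes, whereas a signal interrupts the `wait` builtin immediately. Second, on Linux a process running as PID 1 in a PID namespace (typical for a container entrypoint) only receives signals for which it has installed a handler, so a script that traps only EXIT may never see SIGTERM. The sketch below demonstrates the first point only; `time_to_exit` is a hypothetical helper and `sleep 4` stands in for the real child.

```shell
#!/usr/bin/env bash
# Sketch only: "sleep 4" stands in for the real child and time_to_exit is a
# hypothetical helper. Timings are whole seconds and approximate.

# Start a bash script, send it SIGTERM after one second, and print how many
# seconds it then takes to exit.
time_to_exit() {
  local script="$1" pid start end
  bash -c "${script}" >/dev/null 2>&1 &
  pid=$!
  sleep 1
  start="$(date +%s)"
  kill -TERM "${pid}"
  wait "${pid}" || true
  end="$(date +%s)"
  echo $(( end - start ))
}

# Foreground child: bash runs the trap only after "sleep 4" has finished.
foreground='trap "exit 0" TERM; sleep 4; true'

# Background child plus wait: the signal interrupts wait and the trap runs at once.
background='trap "kill \$! 2>/dev/null; exit 0" TERM; sleep 4 & wait $!'

echo "foreground child: exits after about $(time_to_exit "${foreground}")s"
echo "background child: exits after about $(time_to_exit "${background}")s"
```

The foreground variant should take a few seconds to exit after the signal, while the background variant should exit almost immediately.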

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 16, 2024
@rtheis
Contributor

rtheis commented May 20, 2024

/retest-required

@Joseph-Goergen
Contributor Author

/test e2e-kubevirt-aws-ovn

@Joseph-Goergen
Contributor Author

/test e2e-kubevirt-aws-ovn

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 20, 2024
Contributor

@rtheis rtheis left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 20, 2024
@Joseph-Goergen
Contributor Author

/retest-required

@rtheis
Contributor

rtheis commented May 21, 2024

/retest-required

@rtheis
Contributor

rtheis commented May 21, 2024

/test e2e-kubevirt-aws-ovn

@Joseph-Goergen
Contributor Author

@stevekuznetsov May I get a re-review? I had to rebase

@sjenning
Contributor

/approve

Contributor

openshift-ci bot commented May 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Joseph-Goergen, rtheis, sjenning, stevekuznetsov


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 22, 2024
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 0d10c82 and 2 for PR HEAD 82df7b3 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD b5576e4 and 1 for PR HEAD 82df7b3 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 4148e4d and 0 for PR HEAD 82df7b3 in total

Contributor

openshift-ci bot commented May 23, 2024

@Joseph-Goergen: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name         | Commit  | Details | Required | Rerun command
ci/prow/e2e-azure | 82df7b3 | link    | false    | /test e2e-azure


@rtheis
Contributor

rtheis commented May 23, 2024

/test e2e-kubevirt-aws-ovn

@openshift-merge-bot openshift-merge-bot bot merged commit 239a333 into openshift:main May 23, 2024
12 of 13 checks passed
@openshift-ci-robot

@Joseph-Goergen: Jira Issue OCPBUGS-33181: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-33181 has been moved to the MODIFIED state.


@openshift-bot

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-hypershift-container-v4.17.0-202405240410.p0.g239a333.assembly.stream.el9 for distgit hypershift.
All builds following this will include this PR.

@Joseph-Goergen
Contributor Author

/cherry-pick release-4.15

@openshift-cherrypick-robot

@Joseph-Goergen: new pull request created: #4089


@rtheis
Contributor

rtheis commented May 24, 2024

/cherry-pick release-4.16

@openshift-cherrypick-robot

@rtheis: new pull request created: #4090


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.