Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix --pause-on-fail #2036

Merged
merged 3 commits into from
Oct 13, 2023
Merged

Fix --pause-on-fail #2036

merged 3 commits into from
Oct 13, 2023

Conversation

joestringer
Copy link
Member

cilium connectivity test --pause-on-fail wasn't working before because the
individual tests would fail out of the goroutine and the wrapping logic
wouldn't properly detect and enforce the pause. Add it into the fail/fatal
commands to make sure it takes effect, even if that means blocking an
individual test goroutine on user input.

Fixes: #2033

@joestringer joestringer requested a review from a team as a code owner October 12, 2023 17:23
@joestringer joestringer temporarily deployed to ci October 12, 2023 17:23 — with GitHub Actions Inactive
@michi-covalent
Copy link
Contributor

i mildly prefer deferring the whole post-failure logic like this 8000548 so that we also get printing flows and capturing a sysdump on failure 💭

@joestringer
Copy link
Member Author

👍 I'm not opinionated at all. That commit doesn't work for me, but happy to try any alternative proposal.

@michi-covalent
Copy link
Contributor

alright let me try and figure out why it's not working for you 💭

@michi-covalent
Copy link
Contributor

alright let me try and figure out why it's not working for you 💭

sorry i'm going back and forth on this, now i like this better than 8000548. could you update this pr:

@joestringer
Copy link
Member Author

Hmm, I think it's pausing twice with that proposal:

[=] Test [to-cidr-external]
  ℹ️  📜 Applying CiliumNetworkPolicy 'client-egress-to-cidr' to namespace 'cilium-test'..
  [-] Scenario [to-cidr-external/pod-to-cidr]
  [.] Action [to-cidr-external/pod-to-cidr/external-1111-0: cilium-test/client-84bfddc76b-njg5j (10.244.3.22) -> external-1111 (1.1.1.1:443)]
  Pausing after action failure, press the Enter key to continue:

  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --output /dev/null --connect-timeout 2 --max-time 10 --retry 3 --retry-all-errors --retry-delay 3 https://1.1.1.1:443" failed: command terminated with exit code 28
  ℹ️  curl output:


  📄 No flows recorded for peer cilium-test/client-84bfddc76b-njg5j during action external-1111-0
  📄 No flows recorded for peer external-1111 during action external-1111-0
  Pausing after action failure, press the Enter key to continue:

@joestringer joestringer temporarily deployed to ci October 12, 2023 21:29 — with GitHub Actions Inactive
@joestringer
Copy link
Member Author

(updated to follow your latest feedback)

This first pause is actually not great, because it's pausing before printing the debuggability steps:

  [.] Action [to-cidr-external/pod-to-cidr/external-1111-0: cilium-test/client-84bfddc76b-njg5j (10.244.3.22) -> external-1111 (1.1.1.1:443)]
  Pausing after action failure, press the Enter key to continue:

  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --output /dev/null --connect-timeout 2 --max-time 10 --retry 3 --retry-all-errors --retry-delay 3 https://1.1.1.1:443" failed: command terminated with exit code 28
  ℹ️  curl output:

@michi-covalent
Copy link
Contributor

oh yeah i think we can just delete ace4127#diff-e3c06da3c6d8b3d471002757d5e992af824a412431f22e7f18f5f245f35552adR217. no need to call failCommon() from there 💡

@joestringer
Copy link
Member Author

Hmm, unfortunately that's the one that is printing after printing the other output that's relevant to debugging the test. Deleting that one will prevent there being double pause, but it also deletes the pause at the most useful time..

@joestringer
Copy link
Member Author

For what it's worth, I think that this model where individual tests know about the control flow of the overall testing and run Fatal() to try to interrupt the test flow breaks the abstraction a bit.. a test should either fail or not and then leave it up to the CLI infrastructure to appropriately terminate and implement all the other bells and whistles (like the pause on failure, sysdump, etc.)

@michi-covalent
Copy link
Contributor

would it help if we swap these 2 lines => https://github.com/cilium/cilium-cli/pull/2036/files#diff-66a6f82a060f753d91cfb453548d1f41fd695abde4debdb8cc5b982b6b1260beR314-R315

Signed-off-by: Joe Stringer <joe@cilium.io>
Move the --pause-on-fail and --collect-sysdump-on-failure flag support
into Test.failCommon() and call into there from other common paths.

Signed-off-by: Joe Stringer <joe@cilium.io>

Signed-off-by: Joe Stringer <joe@cilium.io>
If the user does --pause-on-fail and then does ctrl+c, it should
terminate the program rather than waiting for the user to input a line
first.

Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer
Copy link
Member Author

joestringer commented Oct 12, 2023

I'm not sure about the significance of moving the flush until after the log, but swapping the order has the desired effect of seeing the failure first then pausing.

@joestringer joestringer temporarily deployed to ci October 12, 2023 21:40 — with GitHub Actions Inactive
Copy link
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 13, 2023
@michi-covalent michi-covalent merged commit 8a38f71 into main Oct 13, 2023
19 checks passed
@michi-covalent michi-covalent deleted the pr/joe/pause-on-fail branch October 13, 2023 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cilium connectivity test --pause-on-fail doesn't work
3 participants