Skip to content

Commit

Permalink
PR feedback: remove new stat, use real sleep in integration test
Browse files Browse the repository at this point in the history
The real sleep helps avoid a ~1/500 chance of a test flake due to a
race.

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
  • Loading branch information
mpuncel committed Dec 16, 2020
1 parent 49e689d commit a7cfa59
Show file tree
Hide file tree
Showing 4 changed files with 13 additions and 5 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,6 @@ If health check is configured, the cluster has an additional statistics tree roo
network_failure, Counter, Number of health check failures due to network error
verify_cluster, Counter, Number of health checks that attempted cluster name verification
healthy, Gauge, Number of healthy members
goaway_error, Counter, Number of health check failures due to a HTTP/2 GOAWAY frame with a code other than NO_ERROR from the upstream (subset of network_failure)

.. _config_cluster_manager_cluster_stats_outlier_detection:

Expand Down
1 change: 0 additions & 1 deletion source/common/upstream/health_checker_base_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,6 @@ namespace Upstream {
#define ALL_HEALTH_CHECKER_STATS(COUNTER, GAUGE) \
COUNTER(attempt) \
COUNTER(failure) \
COUNTER(goaway_error) \
COUNTER(network_failure) \
COUNTER(passive_failure) \
COUNTER(success) \
Expand Down
1 change: 0 additions & 1 deletion source/common/upstream/health_checker_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -323,7 +323,6 @@ void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onGoAway(
handleFailure(envoy::data::core::v3::NETWORK);
}

parent_.stats_.goaway_error_.inc();
if (client_) {
expect_reset_ = true;
client_->close();
Expand Down
15 changes: 13 additions & 2 deletions test/integration/health_check_integration_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -319,9 +319,20 @@ TEST_P(HttpHealthCheckIntegrationTest, SingleEndpointGoAwayError) {
test_server_->waitForCounterGe("cluster.cluster_1.health_check.failure", 1);
EXPECT_EQ(0, test_server_->counter("cluster.cluster_1.health_check.success")->value());
EXPECT_EQ(1, test_server_->counter("cluster.cluster_1.health_check.failure")->value());
test_server_->waitForCounterGe("cluster.cluster_1.health_check.goaway_error", 1);

// Advance time to cause another health check.
// Now we want to advance time to cause another health check. Ideally the
// waitForDisconnect() step above would guarantee that the health checker has
// seen the GOAWAY, closed the health check connection, and finally enabled
// the health check interval timer so we can advance simulated time to cause
// another check. However, due to special handling for protocol errors, there
// is a race where both the downstream (health checker) and upstream (fake
// upstream) close the connection, so we may exit waitForDisconnect() before
// the health checker has actually seen the GOAWAY and enabled the timer. To
// address this, we use a small real sleep before advancing thee simulated
// time system to give the health checker time to catch up. If we advance time
// before the health check timer is enabled, the next health check will never
// happen.
timeSystem().realSleepDoNotUseWithoutScrutiny(std::chrono::milliseconds(100));
timeSystem().advanceTimeWait(std::chrono::milliseconds(500));

ASSERT_TRUE(clusters_[cluster_idx].host_upstream_->waitForHttpConnection(
Expand Down

0 comments on commit a7cfa59

Please sign in to comment.