investigate flaky test-crypto-timing-safe-equal-benchmarks #38226

Closed
Trott opened this issue Apr 13, 2021 · 22 comments
Labels
crypto Issues and PRs related to the crypto subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI.


@Trott (Member) commented Apr 13, 2021

pummel/test-crypto-timing-safe-equal-benchmarks seems to fail frequently on LinuxONE. That's often a sign that a test (or some Node.js internal code) is not prepared to run on very fast CPUs. (In CI, our LinuxONE host is very fast and often finds race conditions of the "oh, I didn't expect it to run that quickly" variety).

Is this a bug in crypto? A bug in the test? Something else?

@nodejs/crypto @nodejs/testing

There doesn't seem to be a LinuxONE platform team, so I'm pinging a few IBM/Red Hat folks who are active in the Build team: @richardlau @AshCripps @mhdawson. Probably not exactly the right people, but they likely know who to loop in.

  • Test: test-crypto-timing-safe-equal-benchmarks
  • Platform: linuxone
  • Console Output:
00:09:15 not ok 2944 pummel/test-crypto-timing-safe-equal-benchmarks
00:09:18   ---
00:09:18   duration_ms: 2.136
00:09:18   severity: fail
00:09:18   exitcode: 1
00:09:18   stack: |-
00:09:18     node:assert:402
00:09:18         throw err;
00:09:18         ^
00:09:18     
00:09:18     AssertionError [ERR_ASSERTION]: timingSafeEqual should not leak information from its execution time (t=5.715867808741522)
00:09:18         at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel7-s390x/test/pummel/test-crypto-timing-safe-equal-benchmarks.js:109:1)
00:09:18         at Module._compile (node:internal/modules/cjs/loader:1108:14)
00:09:18         at Object.Module._extensions..js (node:internal/modules/cjs/loader:1137:10)
00:09:18         at Module.load (node:internal/modules/cjs/loader:988:32)
00:09:18         at Function.Module._load (node:internal/modules/cjs/loader:828:14)
00:09:18         at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:76:12)
00:09:18         at node:internal/main/run_main_module:17:47 {
00:09:18       generatedMessage: false,
00:09:18       code: 'ERR_ASSERTION',
00:09:18       actual: false,
00:09:18       expected: true,
00:09:18       operator: '=='
00:09:18     }
00:09:18   ...
@Trott Trott added crypto Issues and PRs related to the crypto subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI. labels Apr 13, 2021
@Trott (Member Author) commented Apr 29, 2021

This has now started failing somewhat consistently on LinuxONE in CI. I don't know if that means the test just isn't reliable or if it means there's a timing issue with faster processors or what. @nodejs/crypto

not ok 2961 pummel/test-crypto-timing-safe-equal-benchmarks
  ---
  duration_ms: 2.12
  severity: fail
  exitcode: 1
  stack: |-
    node:assert:412
        throw err;
        ^
    
    AssertionError [ERR_ASSERTION]: timingSafeEqual should not leak information from its execution time (t=7.219867083599476)
        at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel7-s390x/test/pummel/test-crypto-timing-safe-equal-benchmarks.js:109:1)
        at Module._compile (node:internal/modules/cjs/loader:1109:14)
        at Object.Module._extensions..js (node:internal/modules/cjs/loader:1138:10)
        at Module.load (node:internal/modules/cjs/loader:989:32)
        at Function.Module._load (node:internal/modules/cjs/loader:829:14)
        at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:76:12)
        at node:internal/main/run_main_module:17:47 {
      generatedMessage: false,
      code: 'ERR_ASSERTION',
      actual: false,
      expected: true,
      operator: '=='
    }
  ...

https://ci.nodejs.org/job/node-test-commit-linuxone/nodes=rhel7-s390x/26963/consoleText

@Trott (Member Author) commented Apr 29, 2021

Stress tests for bisecting:

Bisecting: 70 revisions left to test after this (roughly 6 steps)
[4f11b8b] doc: fix typo in buffer.md

Bisecting: 35 revisions left to test after this (roughly 5 steps)
[f27b7cf] fs: aggregate errors in fsPromises to avoid error swallowing

Bisecting: 17 revisions left to test after this (roughly 4 steps)
[87aca07] doc: clarify that fs.Dir async iterator closes automatically

Bisecting: 8 revisions left to test after this (roughly 3 steps)
[896e5af] tools: remove fixer for non-ascii-character ESLint custom rule

Bisecting: 4 revisions left to test after this (roughly 2 steps)
[1adcae9] doc: add try/catch in http2 respondWithFile example

Bisecting: 1 revision left to test after this (roughly 1 step)
[7919ced] lib: harden lint checks for globals

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[4af15df] src: fix validation of negative offset to avoid abort

7919ced is the first bad commit
commit 7919ced
Author: Antoine du Hamel duhamelantoine1995@gmail.com
Date: Mon Apr 26 18:03:54 2021 +0200

lib: harden lint checks for globals

PR-URL: https://github.com/nodejs/node/pull/38419
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>

:040000 040000 8080a76b162a40209ec5b9397db3945fa36ee143 247e5c50613229df3f9776ce9637237146682dd0 M

@Trott (Member Author) commented Apr 29, 2021

I know they're not active in this repo anymore, but ping @not-an-aardvark: Hey, does this look like an issue with the crypto timing-safe test? Or does it look like a bug? The machine that it's failing on is very fast, if that helps. Would increasing numTrials from 1e5 to a larger value help, or does that defeat the purpose of the test?

@not-an-aardvark (Contributor) commented Apr 29, 2021

From looking at a few of the failing cases, it seems like the error message is always "timingSafeEqual should not leak information from its execution time (t=<some positive number>)". Assuming that the test is working properly, this would be a signal that timingSafeEqual is consistently running faster for equal inputs than for unequal inputs (at least on some hardware/builds), which would be bad and might be worth investigating. That said, in the past this test has had various false positives due to artifacts of the test setup itself.

The test fails if it determines that a timing difference is statistically significant. In general, increasing numTrials would make the test more sensitive to small timing differences (i.e. maybe even more likely to fail), since it would have a larger dataset to do statistical analysis on.

In this case, it seems like the t-values are routinely 9 and above, which wouldn't plausibly happen by chance. So we can be pretty confident that it's either (a) a legitimate bug in crypto.timingSafeEqual, or (b) a bug in the test setup, which wouldn't be fixed by changing numTrials.
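For context, the test's statistic is a pooled two-sample t-value, as seen in its helpers. A minimal standalone sketch (ours, not the test's exact code; the helper names `mean`, `pooledStd`, and `tValue` are illustrative):

```javascript
// Pooled two-sample t statistic: (mean1 - mean2) / standard error,
// where the standard error uses the combined (pooled) standard deviation.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function pooledStd(xs, ys) {
  const mx = mean(xs), my = mean(ys);
  const ssx = xs.reduce((a, x) => a + (x - mx) ** 2, 0);
  const ssy = ys.reduce((a, y) => a + (y - my) ** 2, 0);
  return Math.sqrt((ssx + ssy) / (xs.length + ys.length - 2));
}

function tValue(xs, ys) {
  const se = pooledStd(xs, ys) * Math.sqrt(1 / xs.length + 1 / ys.length);
  return (mean(xs) - mean(ys)) / se;
}

console.log(tValue([1, 2, 3, 4], [1, 2, 3, 4])); // → 0 (identical samples)
console.log(tValue([10, 11, 12, 13], [1, 2, 3, 4])); // large |t|: clearly different means
```

With larger sample sizes (bigger numTrials), the standard error shrinks, so even a tiny real timing difference produces a large |t|, which is why increasing numTrials makes the test more sensitive, not less.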


Incidentally, when we were originally implementing crypto.timingSafeEqual in #8040, we considered using a "double HMAC" implementation rather than CRYPTO_memcmp, but decided on CRYPTO_memcmp because we didn't want to leak the boolean condition of whether the comparison was successful.*

Since then, there have been significant advances in side-channel attacks (Meltdown/Spectre and others), which has led me to believe that double HMAC could be a good defense-in-depth measure against increasingly powerful compilers and CPU branch prediction that could optimize away CRYPTO_memcmp.

It might be worth updating crypto.timingSafeEqual to use both double HMAC and CRYPTO_memcmp (i.e. given buffers a and b, return the result of CRYPTO_memcmp(sha256hmac(randomKey, a), sha256hmac(randomKey, b))).

*I also have some doubts about whether it's really possible to avoid leaking the boolean result, due to branch prediction. This might also be why this test keeps failing. In any case, combining CRYPTO_memcmp and double HMAC wouldn't make anything worse in that regard.

@Trott (Member Author) commented Apr 29, 2021

Bisect results:

7919ced0c97e9a5b17e6042e0b57bc911d23583d is the first bad commit
commit 7919ced0c97e9a5b17e6042e0b57bc911d23583d
Author: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date:   Mon Apr 26 18:03:54 2021 +0200

    lib: harden lint checks for globals
    
    PR-URL: https://github.com/nodejs/node/pull/38419
    Reviewed-By: James M Snell <jasnell@gmail.com>
    Reviewed-By: Darshan Sen <raisinten@gmail.com>

:040000 040000 8080a76b162a40209ec5b9397db3945fa36ee143 247e5c50613229df3f9776ce9637237146682dd0 M

@Trott (Member Author) commented Apr 29, 2021

The bisect results are certainly surprising. Going to run a bunch more tests to confirm them.

Stress tests against 7919ced:

Stress tests on 4af15df (the last change before the one above that is being tentatively blamed for the problem):

@Trott (Member Author) commented Apr 30, 2021

In this case, it seems like the t-values are routinely 9 and above, which wouldn't plausibly happen by chance.

When I increased numTrials to 1e6, I got a whopping t-value of 37.871320438939634, which I imagine is not moving in the right direction.

@Trott (Member Author) commented Apr 30, 2021

The bisect results are certainly surprising. Going to run a bunch more tests to confirm them.

Stress tests against 7919ced:

Stress tests on 4af15df (the last change before the one above that is being tentatively blamed for the problem):

Those sure seem consistent with 7919ced making the problem much worse, presumably as some indirect side effect. I wonder if the next step is to bisect the individual line changes in that commit (since, outside of the lint tool changes, they are mostly self-contained) to get to the bottom of which one (if it's indeed only one) is causing the issue.

@Trott (Member Author) commented Apr 30, 2021

So, I bisected on individual file changes, and the one causing the issue is Trott@845adb7, which is totally puzzling to me, but maybe @not-an-aardvark or @aduh95 would have a reasonable explanation (assuming my results hold up to further scrutiny, which I will definitely apply to confirm).

Trott added a commit to Trott/io.js that referenced this issue Apr 30, 2021
@Trott Trott reopened this Apr 30, 2021
Trott added a commit to Trott/io.js that referenced this issue Apr 30, 2021
Add HMAC to crypto.timingSafeEqual(). This makes things slower but also
makes them far more timing safe.

Refs: nodejs#38226 (comment)
Fixes: nodejs#38226
targos pushed a commit that referenced this issue May 3, 2021
Fixes: #38226

PR-URL: #38476
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Ruy Adorno <ruyadorno@github.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@mhdawson (Member):
@Trott should we exclude the failing test until we resolve the issue? Sounds like you are on top of the PRs that might fix it, but seems like it may still take a little while until there is agreement on the fix?

I'd be happy to open a PR to exclude for now.

@mhdawson (Member):
What I meant is to mark it as flaky.

@mhdawson (Member) commented Sep 9, 2021

Failing consistently when testing this PR: #39977 (comment)

@mhdawson (Member) commented Sep 9, 2021

@Trott I'm happy to submit a PR to mark it as flaky if the fix you mention is not going to land soon. Any objections to that?

@richardlau (Member):
Opened a PR to mark the test as flaky: #40065

nodejs-github-bot pushed a commit that referenced this issue Sep 12, 2021
PR-URL: #40065
Refs: #38226
Reviewed-By: Ash Cripps <acripps@redhat.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Trott added a commit to Trott/io.js that referenced this issue May 27, 2022
Add HMAC to crypto.timingSafeEqual(). This makes things slower but also
makes them far more timing safe.

Refs: nodejs#38226 (comment)
Fixes: nodejs#38226
@tniessen (Member):
I tried rewriting the test to avoid branching within the benchmark entirely, to get rid of V8 optimizations etc. It only benchmarks calls that return false because there is no guarantee that returning true is going to take exactly the same time. (Also, this is usually what matters for security: the timing should give no information about what part of the buffer was incorrect.)

diff --git a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
index 3c6e31e829..db8f85169e 100644
--- a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
+++ b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
@@ -9,61 +9,60 @@ if (!common.enoughTestMem)
 const assert = require('assert');
 const crypto = require('crypto');
 
-function runOneBenchmark(compareFunc, firstBufFill, secondBufFill, bufSize) {
-  return eval(`
-      const firstBuffer = Buffer.alloc(bufSize, firstBufFill);
-      const secondBuffer = Buffer.alloc(bufSize, secondBufFill);
-
-      const startTime = process.hrtime();
-      const result = compareFunc(firstBuffer, secondBuffer);
-      const endTime = process.hrtime(startTime);
-
-      // Ensure that the result of the function call gets used, so it doesn't
-      // get discarded due to engine optimizations.
-      assert.strictEqual(result, firstBufFill === secondBufFill);
-
-      endTime[0] * 1e9 + endTime[1];
-    `);
-}
+const numTrials = 1e5;
+const bufSize = 16384;
+const readOnlyTestBuffer = crypto.randomBytes(bufSize);
 
 function getTValue(compareFunc) {
-  const numTrials = 1e5;
-  const bufSize = 10000;
   // Perform benchmarks to verify that timingSafeEqual is actually timing-safe.
 
-  const rawEqualBenches = Array(numTrials);
-  const rawUnequalBenches = Array(numTrials);
-
-  for (let i = 0; i < numTrials; i++) {
-    if (Math.random() < 0.5) {
-      // First benchmark: comparing two equal buffers
-      rawEqualBenches[i] = runOneBenchmark(compareFunc, 'A', 'A', bufSize);
-      // Second benchmark: comparing two unequal buffers
-      rawUnequalBenches[i] = runOneBenchmark(compareFunc, 'B', 'C', bufSize);
-    } else {
-      // Flip the order of the benchmarks half of the time.
-      rawUnequalBenches[i] = runOneBenchmark(compareFunc, 'B', 'C', bufSize);
-      rawEqualBenches[i] = runOneBenchmark(compareFunc, 'A', 'A', bufSize);
-    }
+  // Store all results in a single array and separate it later to avoid
+  // branching in the benchmark loop.
+  const measurements = Array(2 * numTrials).fill(0);
+  const testBuffer = Buffer.from(readOnlyTestBuffer);
+
+  // Run the actual benchmark. Avoid all branching (except the loop condition)
+  // to avoid conditional V8 optimizations.
+  let n = 0;
+  for (let i = 0; i < 2 * numTrials; i++) {
+    // Modify either the first or last byte of the copy of the test buffer.
+    const j = (bufSize - 1) * (i % 2);
+    testBuffer[j] ^= 1 | (i & 0xff);
+    // Call the comparison function and coerce the result into a number.
+    const startTime = process.hrtime.bigint();
+    n += compareFunc(testBuffer, readOnlyTestBuffer);
+    const endTime = process.hrtime.bigint();
+    measurements[i] = Number(endTime - startTime);
+    // Restore the original byte.
+    testBuffer[j] = readOnlyTestBuffer[j];
   }
 
-  const equalBenches = filterOutliers(rawEqualBenches);
-  const unequalBenches = filterOutliers(rawUnequalBenches);
+  // The comparison function should have returned false in every iteration, but
+  // we only check that here to avoid explicit branching above.
+  assert.strictEqual(n, 0);
+
+  // A simple comparison would be fast for even i and slow for odd i.
+  const rawFastBenches = measurements.filter((_, i) => i % 2 === 0);
+  const rawSlowBenches = measurements.filter((_, i) => i % 2 !== 0);
+  const fastBenches = filterOutliers(rawFastBenches);
+  const slowBenches = filterOutliers(rawSlowBenches);
 
   // Use a two-sample t-test to determine whether the timing difference between
   // the benchmarks is statistically significant.
   // https://wikipedia.org/wiki/Student%27s_t-test#Independent_two-sample_t-test
 
-  const equalMean = mean(equalBenches);
-  const unequalMean = mean(unequalBenches);
+  const fastMean = mean(fastBenches);
+  const slowMean = mean(slowBenches);
+
+  const fastLen = fastBenches.length;
+  const slowLen = slowBenches.length;
 
-  const equalLen = equalBenches.length;
-  const unequalLen = unequalBenches.length;
+  const combinedStd = combinedStandardDeviation(fastBenches, slowBenches);
+  const standardErr = combinedStd * Math.sqrt(1 / fastLen + 1 / slowLen);
 
-  const combinedStd = combinedStandardDeviation(equalBenches, unequalBenches);
-  const standardErr = combinedStd * Math.sqrt(1 / equalLen + 1 / unequalLen);
+  console.log(JSON.stringify({ rawFastBenches, rawSlowBenches, fastMean, slowMean, t: (fastMean - slowMean) / standardErr }));
 
-  return (equalMean - unequalMean) / standardErr;
+  return (fastMean - slowMean) / standardErr;
 }
 
 // Returns the mean of an array

The variable names fast and slow refer to a simple comparison function that would be fast when the start of the buffer does not match and slow when the end of the buffer does not match. We expect timingSafeEqual to take the same time in both cases, so we expect to find no statistically significant difference between fast and slow when the comparison function is timingSafeEqual.
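That early-exit behavior can be made concrete with a small illustration (ours, not code from the test): a naive byte-by-byte comparison returns at the first mismatch, so its running time reveals the mismatch position.

```javascript
// Naive comparison with a step counter to make the timing leak visible:
// the number of loop iterations equals the position of the first mismatch.
function naiveEqual(a, b) {
  let steps = 0;
  for (let i = 0; i < a.length; i++) {
    steps++;
    if (a[i] !== b[i]) return { equal: false, steps }; // early exit leaks position
  }
  return { equal: true, steps };
}

const base = Buffer.from([0, 1, 2, 3, 4, 5, 6, 7]);
const firstByteDiffers = Buffer.from([9, 1, 2, 3, 4, 5, 6, 7]);
const lastByteDiffers = Buffer.from([0, 1, 2, 3, 4, 5, 6, 9]);

console.log(naiveEqual(base, firstByteDiffers).steps); // 1 — the "fast" case
console.log(naiveEqual(base, lastByteDiffers).steps);  // 8 — the "slow" case
```

A timing-safe comparison must do the same amount of work in both cases, which is exactly what the rewritten benchmark checks.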


Stress test on master: 44.9 % failed (build)
Stress test with this patch: 2.3 % failed (build)

Second stress test on master: 29.0 % failed (build)
Second stress test with this patch: 1.3 % failed (build)


So this seems to help, but the failure rate doesn't go down to 0. I added some logging and re-ran the stress test with the above patch (build). I copied the raw measurements (rawFastBenches and rawSlowBenches) from the logging data produced during the first failure (line 6517 in the console output of the build), here it is as a CSV file. (This is just one randomly chosen failure, so all of my findings below might be coincidental or wrong.)

The test script had reported t=15.965637751534858. R produces the same value:

d <- read.csv('fast-slow.csv')
t.test(d$fast, d$slow)
	Welch Two Sample t-test

data:  d$fast and d$slow
t = 15.966, df = 196380, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 55.84268 71.47212
sample estimates:
mean of x mean of y
 9680.232  9616.575

The test does include some code to remove "large outliers" from the measurements, that is, values that are at least 50 times larger than the mean.

max(d$fast) / mean(d$fast)
5.823827
max(d$slow) / mean(d$slow)
8.201985

So none of the measurements are close to being considered "large outliers" by the test.
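A sketch of the "large outlier" filter as described above (a reconstruction under the stated 50×-mean rule; the test's actual implementation may differ in detail):

```javascript
// Drop measurements that are at least 50 times larger than the mean of
// the sample. Because the mean itself includes the outliers, only truly
// enormous values in large samples ever get removed.
function filterOutliers(measurements) {
  const mean = measurements.reduce((a, b) => a + b, 0) / measurements.length;
  return measurements.filter((v) => v < mean * 50);
}

// 99 small values plus one huge one: the huge value exceeds 50x the mean
// and is dropped.
const data = Array(99).fill(1).concat([1e6]);
console.log(filterOutliers(data).length); // 99
```

This also shows why the filter is so permissive here: with a maximum only 5.8–8.2 times the mean, as measured above, nothing comes close to the 50× cutoff.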


But there certainly are outliers (which are not removed by the test), and there are certainly also large differences even between successive measurements.

# Relative difference between successive "fast" measurements, as percentages.
rel.diff <- 100 * abs(diff(d$fast)) / d$fast[1:99999]

quantile(rel.diff, probs = c(0.90, 0.99))
     90%      99%
11.68831 41.46492

This means that 1 in 10 measurements differed by at least 11.6 % from the previous measurement, and 1 in 100 measurements differed by at least 41.5 % from the previous measurement.


Plotting all 200,000 data points:

library(ggplot2)
library(reshape2)
library(dplyr)

d$index <- 1:nrow(d)
plottable <- melt(d, id.vars = "index", measure.vars = c("fast", "slow"))

ggplot(plottable, aes(index, value, colour = variable)) + geom_point(alpha = 0.4, size = 1)

plot-all-data-fast-slow

It doesn't look that bad. (Most of the fast data points are hidden behind the slow data points, there are 100,000 each.)

There are certainly more outliers in the first part. So let's try the t test again, using the second half of the graph only:

t.test(d$fast[50000:100000], d$slow[50000:100000])
	Welch Two Sample t-test

data:  d$fast[50000:1e+05] and d$slow[50000:1e+05]
t = 3.6058, df = 99338, p-value = 0.0003113
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  7.657665 25.896144
sample estimates:
mean of x mean of y 
 9589.558  9572.781 

The test uses T_THRESHOLD = 3.892, so t = 3.6058 is below the threshold. In other words, the second half of the benchmark would have passed the test, but barely. This might indicate some relation to just-in-time optimization (but note that the patched test seems to have a failure rate as low as 2 %).

100 * (mean(d$fast[1:50000]) / mean(d$fast[50000:100000]) - 1)
1.89088
100 * (mean(d$slow[1:50000]) / mean(d$slow[50000:100000]) - 1)
0.9149779

So the fast benchmark was about 2 % slower during the first half than during the second half, and the slow benchmark was about 1 % slower during the first half. This difference is statistically significant:

t.test(d$fast[1:50000], d$fast[50000:100000])
	Welch Two Sample t-test

data:  d$fast[1:50000] and d$fast[50000:1e+05]
t = 30.313, df = 89327, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 169.6027 193.0513
sample estimates:
mean of x mean of y 
 9770.885  9589.558 

Given that there is a statistically significant difference between the first and second half of the same benchmark, it is not surprising that there is also a statistically significant difference between the two benchmarks.


The graph also hints at another issue. Let's only consider the fast benchmark, and only consider measurements that are at most one standard deviation away from the mean. In other words, the distribution of measurements around the mean:

ggplot(d, aes(index, fast)) +
  geom_point(alpha=0.1, size=0.5) +
  ylim(mean(d$fast) - sd(d$fast), mean(d$fast) + sd(d$fast)) +
  geom_hline(yintercept = mean(d$fast), color = "red")

fast-distribution-around-mean

This does not look like it's close to being a normal distribution; the measurements below the mean are much closer together than those above the mean.

This becomes more apparent when considering how many samples fall within x times the standard deviation. This is what we should be seeing:

library(purrr)
samples <- rnorm(length(d$fast), mean = mean(d$fast), sd = sd(d$fast))
f <- function(x) length(samples[abs(samples - mean(samples)) <= x * sd(samples)]) / length(samples)
map_dbl(c(0.5, 1, 2, 3), f)
0.38253 0.68167 0.95490 0.99747

But this is what we see:

samples <- d$fast
map_dbl(c(0.5, 1, 2, 3), f)
0.85619 0.89095 0.96773 0.97943

The t test assumes that both sets of samples are drawn from normal distributions, and since that is not true here, the t test is not necessarily meaningful.
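The same coverage check can be sketched in JavaScript (a standalone illustration, ours rather than the test's; `coverage` and `randn` are hypothetical helper names):

```javascript
// Fraction of samples within k standard deviations of the mean. For a
// normal distribution this should be close to 0.383, 0.683, 0.954, 0.997
// for k = 0.5, 1, 2, 3.
function coverage(samples, k) {
  const n = samples.length;
  const mean = samples.reduce((a, b) => a + b, 0) / n;
  const sd = Math.sqrt(samples.reduce((a, x) => a + (x - mean) ** 2, 0) / n);
  return samples.filter((x) => Math.abs(x - mean) <= k * sd).length / n;
}

// Box-Muller standard-normal generator for a sanity check.
function randn() {
  const u = 1 - Math.random(); // in (0, 1], so Math.log(u) is finite
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

const normal = Array.from({ length: 1e5 }, randn);
console.log([0.5, 1, 2, 3].map((k) => coverage(normal, k).toFixed(3)));
```

Running `coverage` over the benchmark measurements instead of `normal` would reproduce the skewed 0.856/0.891/0.968/0.979 pattern observed above.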


My main takeaway from this (at least for the moment) is that timingSafeEqual is not constant-time. It guarantees that the time it takes does not significantly depend on the input data because the underlying algorithm is a constant-time algorithm (and this is what matters to users), but the function itself does not always take the same amount of time, and the deviations from the expected time do not approximate a normal distribution.

This does not explain why the failures are much more frequent on rhel72-ppc64 than on other machines though. We can probably fine-tune this test to reduce the failure probability enough to re-enable the test, but I am not sure we are addressing the real problem. (cc @Trott)

@anonrig (Member) commented Oct 18, 2023
nodejs-github-bot pushed a commit that referenced this issue Oct 20, 2023
PR-URL: #50232
Refs: #38226
Reviewed-By: Ruy Adorno <ruyadorno@google.com>
Reviewed-By: Filip Skokan <panva.ip@gmail.com>
@tniessen (Member) commented Apr 4, 2024

Refs: #52341

Not sure if this change might make it better or worse.

@mhdawson (Member):
Seems to be failing a lot on macOS now. An example:

https://github.com/nodejs/node/actions/runs/8822600805/job/24221172295?pr=52671

=== release test-crypto-timing-safe-equal-benchmarks ===
Path: pummel/test-crypto-timing-safe-equal-benchmarks
Error: --- stderr ---
node:assert:408
    throw err;
    ^

AssertionError [ERR_ASSERTION]: timingSafeEqual should not leak information from its execution time (t=8.54045763079109)
    at Object.<anonymous> (/Users/runner/work/node/node/test/pummel/test-crypto-timing-safe-equal-benchmarks.js:109:1)
    at Module._compile (node:internal/modules/cjs/loader:1476:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1555:10)
    at Module.load (node:internal/modules/cjs/loader:1288:32)
    at Module._load (node:internal/modules/cjs/loader:1104:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:191:14)
    at node:internal/main/run_main_module:30:49 {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

nodejs-github-bot pushed a commit that referenced this issue May 1, 2024
PR-URL: #52751
Refs: #38226
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Filip Skokan <panva.ip@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
@lpinca (Member) commented May 1, 2024

b876e00 landed, I'm closing this.

@lpinca lpinca closed this as completed May 1, 2024