
With privacy.resistFingerprinting set to true in Firefox, Speedometer shows the result Infinity #399

Open
julienw opened this issue Mar 13, 2024 · 9 comments · May be fixed by #406

julienw (Contributor) commented Mar 13, 2024

This was reported by some people on social networks. We found that setting privacy.resistFingerprinting to true was triggering it, but it might happen in some other cases too.

With that preference enabled, some subtests take 0 ms on every run, which may mess up the results.
Any chance we can fix this?

bgrins (Contributor) commented Mar 13, 2024

The simplest fix here, IMO, would be to detect an Infinity score and show an error message instead.
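
For illustration, a minimal sketch of such a guardrail (the function and element names are placeholders, not the actual Speedometer code):

```js
// Hypothetical guardrail: refuse to display a non-finite or NaN score.
// `scoreElement` stands in for whatever element the results UI actually writes to.
function displayScore(score, scoreElement) {
    if (!Number.isFinite(score)) {
        scoreElement.textContent = "Error: invalid score (is the timer resolution too coarse?)";
        return;
    }
    scoreElement.textContent = score.toFixed(2);
}
```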

bgrins (Contributor) commented Mar 13, 2024

I could imagine a warmup step that looks for anomalies and shows a warning, like MotionMark does with refresh rate, but this case is likely rare in practice, so a guardrail on the score display may be sufficient.

julienw (Contributor, Author) commented Mar 13, 2024

But I don't understand why one single substep being at 0 would yield an Infinity result. I'm not great at statistics, but I smell something fishy here.

camillobruni self-assigned this Mar 13, 2024
julienw (Contributor, Author) commented Mar 13, 2024

(Update: I got it wrong, so I rewrote the comment.)

> But I don't understand why one single substep being at 0 would yield an Infinity result.

The problem isn't a single substep, but all substeps of one test being 0. I've seen it happen for TodoMVC-Preact-Complex-DOM, for example, but it could happen for svelte-complex and lit-complex too.

The geomean is based on a product, so a suite total of 0 poisons the geomean for this run.
Then, I think, we geomean the geomeans to get the result for all iterations.
Then the score is 1000 / geomean, so we get Infinity.
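
For illustration, a sketch of that chain with made-up suite totals (not the actual Speedometer code):

```js
// The geometric mean is the nth root of the product, so a single 0 collapses it to 0.
const geomean = (values) => Math.pow(values.reduce((product, v) => product * v, 1), 1 / values.length);

const suiteTotals = [120.5, 98.3, 0 /* every step reported 0 ms */, 143.2];
const runGeomean = geomean(suiteTotals); // 0
const score = 1000 / runGeomean;         // Infinity
```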

The Wikipedia page for the geometric mean says:

> The geometric mean can be an unreliable measure of central tendency for a dataset where one or more values are extremely close to zero in comparison to the other members of the dataset.

Therefore I wonder whether the use of the geomean is appropriate.
Geomeaning the scores between runs, or the tests between runs, makes sense. But geomeaning the various tests together, I'm not so sure. I guess this ship has sailed for v3 though.

A quick fix might be to replace 0 by 1 when adding values in

```js
const suiteTotal = this._measuredValues.tests[suiteName].total;
product *= suiteTotal;
```

(possibly also in

```js
add(value) {
    if (typeof value !== "number")
        throw new Error(`Adding invalid value=${value} to metric=${this.name}`);
    this.values.push(value);
}
```

but after a quick look I don't think the geomean calculated here is actually used).
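
For illustration, that clamp could look roughly like this (a sketch on top of the first snippet above, not a tested patch):

```js
// Sketch: substitute 1 for a zero suite total so it cannot zero out the geomean product.
const suiteTotal = this._measuredValues.tests[suiteName].total;
product *= suiteTotal === 0 ? 1 : suiteTotal;
```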

What do you think, @camillobruni?

julienw (Contributor, Author) commented Mar 13, 2024

Note: I also noticed errors with NaN values in the SVG rendering of the Details view; I believe they come from the same issue.

julienw (Contributor, Author) commented Mar 13, 2024

I find our score computation very difficult to follow: we do the same thing in different ways in different files (benchmark-runner.mjs vs metric.mjs vs Statistics.mjs), with no clear comment about the formula we use to calculate the score. It's hard to tell where the geomean that we use as the basis for the score is actually computed.

camillobruni (Contributor) commented
You are absolutely right :) I kind of kept some backwards compatibility with v2.1, but ultimately we should all migrate to metrics.mjs (Statistics.mjs is from the original v2.1 version, kept so as not to alter any existing scoring). So:

- Geomean over all the different tests makes sense, to not favour the longer-running ones disproportionally (we could spend some effort to make each test run the same length in the future?).
- For everything else, the arithmeticMean makes sense IMO.

Given that this is something that should never happen, we could also just fail and not display anything if the sum of all subtests equals 0.
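
For illustration, a sketch of such a fail-fast check (the variable names and data shape are assumptions, not the actual code path):

```js
// Hypothetical validation before scoring: a suite whose subtest totals sum to 0
// points at a clamped timer, so refuse to produce a score for it.
const subtestTotals = Object.values(measuredTests).map((test) => test.total);
const suiteSum = subtestTotals.reduce((sum, total) => sum + total, 0);
if (suiteSum === 0)
    throw new Error(`Suite "${suiteName}" reported 0 ms in total; refusing to compute a score.`);
```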

julienw (Contributor, Author) commented Mar 13, 2024

Yeah indeed, 0 is an absurd number in this case. Is it practical to ignore the test instead of failing outright?

rniwa (Member) commented Mar 13, 2024

The point of using the geometric mean across suites is to give each suite the same weight. If we used the arithmetic mean, we would give more weight to the long-running / slow suites.
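
For illustration, a quick numeric comparison with made-up suite totals:

```js
const arithmeticMean = (values) => values.reduce((sum, x) => sum + x, 0) / values.length;
const geomean = (values) => Math.pow(values.reduce((product, x) => product * x, 1), 1 / values.length);

// Baseline: one fast suite (50 ms) and one slow suite (500 ms).
// Make the fast suite 2x faster:
arithmeticMean([25, 500]); // 275 -> 262.5 (~5% change)
geomean([25, 500]);        // ~158 -> ~112 (~29% change)
// Make the slow suite 2x faster instead:
arithmeticMean([50, 250]); // 275 -> 150 (~45% change)
geomean([50, 250]);        // ~158 -> ~112 (the same ~29% change)
```

With the arithmetic mean, the same 2x improvement moves the result by ~5% or ~45% depending on which suite it lands in; with the geomean it moves it by ~29% either way.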

julienw added the v3.1 label Mar 20, 2024
camillobruni linked a pull request Apr 8, 2024 that will close this issue