Update histogram math #13680

Maniktherana · 2024-03-01T12:15:55Z

Addresses #11269
Bucket count is now represented by area rather than height

Height is now = count / range, creating a height proportional to frequency density

@beorn7

Signed-off-by: Manik Rana <manikrana54@gmail.com>

Maniktherana · 2024-03-01T12:17:43Z

Screen.Recording.2024-03-01.at.5.46.20.PM.mov

juliusv · 2024-03-01T13:13:32Z

Nice. I'd like @beorn7 to be the one to judge the histogram logic bits, but happy to review / :+1: it from a UI code perspective after that :)

beorn7 · 2024-03-01T16:14:55Z

Great stuff. I'll have a detailed look ASAP (which might be Tuesday). Thank you very much.

beorn7 · 2024-03-05T17:09:05Z

Sorry, drowning in work, it will take another day or so.

beorn7 · 2024-03-07T19:32:52Z

I'm playing with this now. It looks generally very nice, but there seem to be some quirks.

Here my observations (really just as an outside observer, without any insight into the code, as I'm not familiar with React):

The text version is sorted fine now and displays the zero bucket just fine, too.

In the graphical version, the zero bucket is displayed, but "left of zero", while it should be centered on zero (it would probably be fine to only display the half "right of zero" if there are no negative buckets):

(You can see the little tick mark for zero between the bar and the popup.)

With the normalization of the height of the buckets, the number next to the Y axis appear pretty useless. Maybe we can just omit them? Showing the count in the bucket seems to be a fine way. (And there is always the text representation underneath for precise numbers and for those that cannot hover the mouse pointer for accessibility reasons.)

Speaking of the height of the bars, they seem off. See the following two screenshots:

The 1st one highlights a bucket with 44 observations, the 2nd one a bucket with 82 buckets. The numbers are also displayed in the some way in the text part. However, the bucket with count of 44 is higher than the bucket with a count of 82. It should be the other way round, and that by a lot (almost factor 2). Remember, with the exponential rendering, all buckets have the same width in the rendering, so their height should be proportional to their count (because ultimately the area of each bar has to be proportional to the count). So there is effectively no normalization necessary (you could even display the counts directly on the y axis.)

I haven't checked the linear plot in detail yet. Let's first get the exponential plot right. However, I can say that this example app creates a normal distribution of observations. So the linear plot should result in a nice Gauss bell shape. Maybe that's even working, if squinting at

But as said, let's take care of it once the exponential plot works.

Another note for the zero bucket: By default, it is very narrow (and it could be set to a width of zero in some use cases). That means that the normalization for the linear plot would result in an extreme height (or even infinity in case of a zero width zero bucket). There should be some limitation on the displayed height of the zero bucket, maybe something like "display it if it is at most 10x higher than the highest other bucket, but just cut it off at the max height of any other bucket otherwise". When I deliberately put observations in a zero bucket with the narrow default width, I see an overly high bucket even in the exponential plot, which tells me that some normalisation is happening (but it should not, see above):

And here the funny result with a zero bucket of width zero that actually contains a few observations of exactly zero:

Final topic: gaps between buckets. There can be gaps between buckets (or in other word: empty buckets, that just aren't explicitly stated in the query result, because native histograms are "sparse"), and they seem to be displayed nicely in the linear view, but they aren't displayed at all in the exponential view. I think they should. Imagine the gaps are filled with buckets of population zero and following the same exponential bucketing schema. (A more subtle topic is the possible gap between the zero bucket and the first regular bucket, but it's probably fine to not draw this one, at least not in the exponential plot.)

Sorry for all the comments. The zero bucket is really a beast to plot because it breaks the regularity. So I would propose the following plan:

Let's first get the height of the bars solved in the exponential view.
Then let's get the gaps between buckets right in the exponential view.
Then let's figure out how to render the zero bucket properly.
And finally we can find out if the height in the linear plot is correct.

juliusv · 2024-03-07T20:13:52Z

(@beorn7 Side note: I never had the expectation that the area under each bar in the exponential view had to be adjusted to be proportional to the number of observations and was surprised to hear that. Is that a commonly expected thing when displaying exponetial histograms on an exponential axis? If we didn't do that, then the Y-axis would keep making sense...)

beorn7 · 2024-03-07T20:27:46Z

I never had the expectation that the area under each bar in the exponential view had to be adjusted to be proportional to the number of observations and was surprised to hear that

That's the essence of the histogram (as it was used on paper, before computers were invented…).

However, the basic idea with the exponential plot is that every bucket will be rendered with the same width, so now not only the area but also the height is proportional to the bucket counte. And indeed, in that case we can keep the Y axis.

beorn7 · 2024-03-07T20:29:20Z

https://en.wikipedia.org/wiki/Histogram#/media/File:Travel_time_histogram_total_n_Stata.png is the proverbial histogram from Wikipedia . Note the unit on the y axis (people per time interval).

Maniktherana · 2024-03-08T03:56:49Z

With the normalization of the height of the buckets, the number next to the Y axis appear pretty useless. Maybe we can just omit them?

Yes I'll remove them. One of your screenshots show the scales to be in the order of trillion ^ 2 😆 which doesn't seem useful.

In the graphical version, the zero bucket is displayed, but "left of zero", while it should be centered on zero:

this could be fixed by simply updating the position of the 0 tick:

 const zeroAxisLeft =
    scale === 'linear'
      ? ((0 - rangeMin) / (rangeMax - rangeMin)) * 100 + '%'
      : (closestToZero.closestIdx / (buckets ? buckets.length : 1)) * 100 + '%';

On an exponential scale, it currently finds the bucket closest to zero and puts the 0 marker on that bucket's lower bounds. This is incorrect, I'll fix it and put the 0 tick on true zero.

(it would probably be fine to only display the half "right of zero" if there are no negative buckets)

This bit is accounted for

{rangeMin < 0 && <div style={{ position: 'absolute', left: zeroAxisLeft }}>0</div>}

The 1st one highlights a bucket with 44 observations, the 2nd one a bucket with 82 buckets. The numbers are also displayed in the some way in the text part. However, the bucket with count of 44 is higher than the bucket with a count of 82. It should be the other way round, and that by a lot (almost factor 2). Remember, with the exponential rendering, all buckets have the same width in the rendering, so their height should be proportional to their count (because ultimately the area of each bar has to be proportional to the count). So there is effectively no normalization necessary (you could even display the counts directly on the y axis.)

Okay so for this one, I made the assumption that these are frequency density histograms. This means that the height is is calculated by count / range (frequency density). I went ahead with this as on multiplying the height with the width, we'd end up with an area that represents the actual count. This means that height is proportional to frequency density, not height.

So if we went ahead an calculated the heights for the buckets you mentioned, you'd see that they're actually consistent:

44 / (0.000030517578125 - 0.0000152587890625) = 28,83,584
82 / (0.00006103515625 - 0.000030517578125) = 26,86,976

here, bucket with count 44 is actually taller than that with count 82.

However, the basic idea with the exponential plot is that every bucket will be rendered with the same width, so now not only the area but also the height is proportional to the bucket counte. And indeed, in that case we can keep the Y axis.

here's my rationale: Yes the widths do appear equal to eachother, but they are on an exponential scale. If we calculate the width of some bucket x on the linear scale, it would have the same value as bucket x on the exponential scale. The values of the range/width are the same, they simply appear differently on each scale.

Because of this, height is some proportion of count/range (frequency density) on both scales.

Do let me know if there are gaps in my theory.

Then let's get the gaps between buckets right in the exponential view

Hmm, yes I'll try and figure something out for this.

However, I can say that this example app creates a normal distribution of observations. So the linear plot should result in a nice Gauss bell shape. Maybe that's even working, if squinting at

That's the same example I'm using. Question tho (my statistics is fuzzy), are these random values supposed to exactly fit in a bell curve on linear scales? or should it look somewhat like a bell curve with some outliers. If yes to the latter, I belive it would then act somewhat like a galton board, and this is what's happening already happening. Again, do let me know if I'm wrong.

Another note for the zero bucket: By default, it is very narrow (and it could be set to a width of zero in some use cases). That means that the normalization for the linear plot would result in an extreme height (or even infinity in case of a zero width zero bucket). There should be some limitation on the displayed height of the zero bucket, maybe something like "display it if it is at most 10x higher than the highest other bucket, but just cut it off at the max height of any other bucket otherwise". When I deliberately put observations in a zero bucket with the narrow default width, I see an overly high bucket even in the exponential plot, which tells me that some normalisation is happening (but it should not, see above):

And here the funny result with a zero bucket of width zero that actually contains a few observations of exactly zero:

I didn't anticipate this 😅. I'll revisit the zero bucket.

beorn7 · 2024-03-13T15:51:08Z

Sorry for the delay in my response.

Let's clarify the scaling of the height of the buckets first:

I think the linear case is pretty clear, and your implementation might be correct already. In the linear case, I think we should not have values on the Y axis, as they will generally be weird to read and understand. The hover info will tell the user about the number of observations in the bucket, and that's good enough. I do believe that, within statistical noise, the graphical appearance of the the histogram in linear view should approach a Gaussian bell curve.

But let's do a final verification of all of that once we have nailed the exponential view, because the latter will be the one that users are using normally.

So now about the exponential view: What you use as input in your calculations is the range directly, but since we have a logarithmic x axis, you also need to take the logarithm of the range. If you do that, you'll find out that now everything cancels and the height of the buckets is proportional to the count in the bucket. That's only the case because the buckets are exponential. (Once custom bucket layouts land here, we have to revisit the rendering.) In other words: The actually displayed area of the bars (or the "numbers of pixels" in the bar) should be proportional to the count in the bucket. So for the current bucket schemas, the exponential view is very simple: height == count, and you can put the count on the y axis.
The only odd one out here is the zero bucket, because it doesn't comply with the logarithmic x axis (it would be infinitely far away from any bucket if we did the x axis logarithmic all the way). That's why you should just render the zero bucket as wide as all the other buckets and again make it's height == count.

Yet another comparison: Imagine a linear bucket layout (one bucket from 0 to 1, next from 1 to 2, next from 2 to 3). If you render such a histogram on a linear x axis, again height would be proportional to count. Exponential buckets rendered on a linear x axis need the multiplication math for their correct area, but if the x axis is logarithmic itself, it all cancels out and you are back to height being proportional to the count.

beorn7 · 2024-03-17T15:10:24Z

Another thought: You could also do the multiplication thing in the exponential view "strictly correct" right now (I assume you have to change the current code just slightly to take into account the logarithmic nature of the x axis by doing an exponentiation at the right place). This will appear needlessly complicated for now, but will help once custom bucket definitions are a thing.

Maniktherana · 2024-03-27T12:53:41Z

Sorry I've been away 😬

@beorn7 I'll get back to you on this soon (before the weekend or so)

Signed-off-by: Manik Rana <manikrana54@gmail.com>

Maniktherana · 2024-03-28T11:00:00Z

@beorn7 Again, sorry about the delay

I have now set height proportional to count for the exponential scale and kept everything same for linear. Also removed the y axis labels for linear.

beorn7 · 2024-03-28T13:26:05Z

Height looks good now.

The only thing left for exponential view is to position the bars correctly. I try to summarize our discussion we had on Slack:

I don't know a lot about how to code this all in CSS and Typescript etc., so I make a few fundamental assumption to create some pseudocode. Please fix my assumptions where necessary.

I assume the leftmost point on the x-axis in the rendering is 0 and the rightmost point is 1 (maybe it's something like 0% to 100%, but formulating it in terms of 0 to 1 is easier below).

The "standard bucket width" bw is defined as

bw := abs(log(abs(upper_bound_of_any_bucket)) - log(abs(lower_bound_of_any_bucket)))

This should yield the same result, no matter what bucket you pick, as long as you don't pick the zero bucket. (Note: There is the special case where only the zero bucket is populated, which has to be rendered just as a single bar centered on zero.)

Every bucket will be rendered with this width bw.

What's left is to map bucket boundaries to points on the x axis. For that, let's define the following values:

start_neg := -log(abs(lower_bound_of_lowest_negative_bucket))
end_neg := -log(abs(upper_bound_of_highest_negative_bucket))
start_pos := log(lower_bound_of_lowest_positive_bucket))
end_pos := log(upper_bound_of_highest_positive_bucket))
width_neg := end_neg - start_neg
width_pos := end_pos - start_pos
width_total := width_neg + bw + width_pos

Now it's easy to calculate the position of bucket boundaries etc. on that 0 to 1 scale for the whole panel:

The zero mark is at ( width_neg + 0.5 * bw ) / width_total
The zero bucket starts at ( width_neg ) / width_total
The zero bucket ends at ( width_neg + bw ) / width_total
Any boundary of a negative bucket is it -(log(abs(boundary)) - start_neg) / width_total
Any boundary of a positive bucket is it (log(boundary) - start_pos + bw + width_neg) / width_total

In case negative or positive buckets don't exist, just set their respective start and end value to zero, and things should just work out.

beorn7 · 2024-03-28T13:30:06Z

Note that the above always puts the zero bucket directly adjacent to the highest negative bucket and the lowest positive bucket, no matter if there is a gap in between. I think that's fine (as in most cases there will be a gap anyway, and it might be more confusing to show it than to gloss over it), but we can revisit that decision if needed.

bboreham · 2024-05-14T11:42:01Z

Hello from the bug-scrub! Do you think you will come back to this @Maniktherana ?

Maniktherana · 2024-05-14T12:06:32Z

Hello from the bug-scrub! Do you think you will come back to this @Maniktherana ?

Hey there, yes I'm working on it now 😬

Signed-off-by: Manik Rana <manikrana54@gmail.com>

beorn7

What happens with these changes? (I haven't tried myself… 🦆)

beorn7 · 2024-05-15T16:40:28Z

web/ui/react-app/src/pages/graph/HistogramChart.tsx

+  const minPositive = buckets ? parseFloat(buckets[closestToZero.closestIdx + 1][1]) : 0;
+  const maxNegative = buckets ? parseFloat(buckets[closestToZero.closestIdx - 1][2]) : 0;
+  const minNegative = buckets ? parseFloat(buckets[0][1]) : 0;
+  const startNegative = buckets ? Math.log(Math.abs(minNegative)) : 0; //start_neg


Let's do the following like in my latest update:

Suggested change

const startNegative = buckets ? Math.log(Math.abs(minNegative)) : 0; //start_neg

const startNegative = buckets ? -Math.log(Math.abs(minNegative)) : 0; //start_neg

beorn7 · 2024-05-15T16:40:39Z

web/ui/react-app/src/pages/graph/HistogramChart.tsx

+  const maxNegative = buckets ? parseFloat(buckets[closestToZero.closestIdx - 1][2]) : 0;
+  const minNegative = buckets ? parseFloat(buckets[0][1]) : 0;
+  const startNegative = buckets ? Math.log(Math.abs(minNegative)) : 0; //start_neg
+  const endNegative = buckets ? Math.log(Math.abs(maxNegative)) : 0; //end_neg


Suggested change

const endNegative = buckets ? Math.log(Math.abs(maxNegative)) : 0; //end_neg

const endNegative = buckets ? -Math.log(Math.abs(maxNegative)) : 0; //end_neg

beorn7 · 2024-05-15T16:41:20Z

web/ui/react-app/src/pages/graph/HistogramChart.tsx

+  const startPositive = buckets ? Math.log(minPositive) : 0; //start_pos
+  const endPositive = buckets ? Math.log(maxPositive) : 0; //end_pos
+
+  const widthNegative = startNegative - endNegative; //width_neg


With the changes above, this becomes again the more intuitive:

Suggested change

const widthNegative = startNegative - endNegative; //width_neg

const widthNegative = endNegative - startNegative; //width_neg

beorn7 · 2024-05-15T16:42:58Z

web/ui/react-app/src/pages/graph/HistogramChart.tsx

+        Math.log(Math.abs(parseFloat(buckets[buckets.length - 1][2]))) -
+          Math.log(Math.abs(parseFloat(buckets[buckets.length - 1][1])))


If there are only negative buckets, or even only a zero bucket, this will break (just a note for the final code).

beorn7 · 2024-05-15T16:47:39Z

web/ui/react-app/src/pages/graph/HistogramChart.tsx

+          scale === 'linear'
+            ? ((left - rangeMin) / (rangeMax - rangeMin)) * 100 + '%'
+            : left < 0
+            ? (Math.log(Math.abs(left)) - startNegative / widthTotal) * 100 + '%' // negative buckets boundary


With recent updates and fixed parenthesis:

Suggested change

? (Math.log(Math.abs(left)) - startNegative / widthTotal) * 100 + '%' // negative buckets boundary

? -(Math.log(Math.abs(left)) - startNegative) / widthTotal * 100 + '%' // negative buckets boundary

beorn7 · 2024-05-15T16:48:02Z

web/ui/react-app/src/pages/graph/HistogramChart.tsx

+            ? ((left - rangeMin) / (rangeMax - rangeMin)) * 100 + '%'
+            : left < 0
+            ? (Math.log(Math.abs(left)) - startNegative / widthTotal) * 100 + '%' // negative buckets boundary
+            : (Math.log(left) - startPositive + bw + widthNegative / widthTotal) * 100 + '%'; // positive buckets boundary


Suggested change

: (Math.log(left) - startPositive + bw + widthNegative / widthTotal) * 100 + '%'; // positive buckets boundary

: (Math.log(left) - startPositive + bw + widthNegative) / widthTotal * 100 + '%'; // positive buckets boundary

Maniktherana · 2024-05-15T17:19:55Z

What happens with these changes? (I haven't tried myself… 🦆)

getting closer
negative buckets are now appearing on the chart
the leftmost bucket is at left: 55.5556%; width: 2.77778%;

Signed-off-by: Manik Rana <manikrana54@gmail.com>

Maniktherana · 2024-05-15T17:27:41Z

Apologies, missed the last suggested change. changing the formula for positive buckets boundary is having this weird effect where negative buckets are overflowing to the right.

Screen.Recording.2024-05-15.at.10.55.48.PM.mov

Signed-off-by: Manik Rana <manikrana54@gmail.com>

Maniktherana · 2024-05-15T17:36:23Z

- ? -(Math.log(Math.abs(left)) - startNegative) / widthTotal * 100 + '%' // negative buckets boundary
+ ? -(Math.log(Math.abs(left)) + startNegative) / widthTotal * 100 + '%' // negative buckets boundary

forgot to account for this lil bit. looks like it fixed the negative buckets position

beorn7 · 2024-05-16T12:13:30Z

Things look quite good.

So this is a perfect example, as far as I can see:

The zero bucket is unpopulated here, but rendered correctly as an empty bucket centered on the 0 x-axis mark.

However, with fewer and less populated buckets, things look weirdly off:

The zero tick is not centered (although there is a centered line visible), and the lowest positive bucket is rendered over the highest negative bucket (the one closest to zero). Note that in this example, the buckets are not contiguous. There is a gap between the lowest and second lowest positive bucket. Maybe that's triggering some weird bug.

Here is an example with an actually populated zero bucket (and, for extra difficulty, the zero bucket does not conicide with a usual bucket boundary, so that the buckets next to the zero bucket have a different width):

It seem that the "closest to zero" detection has a problem. In different news: This is a case where the buckets do not all have the same width, even without custom bucket boundaries. But maybe we don't need to handle this quite special case.

Maniktherana · 2024-05-16T12:26:41Z

In this last case, are they truly of different width or are they overlapping?

Also looks like the zero bucket goes from negative to positive ranges. Don't think that one is explicitly handled

beorn7 · 2024-05-16T12:20:02Z

web/ui/react-app/src/pages/graph/HistogramChart.tsx

+  const minPositive = buckets ? parseFloat(buckets[closestToZero.closestIdx + 1][1]) : 0;
+  const maxNegative = buckets ? parseFloat(buckets[closestToZero.closestIdx - 1][2]) : 0;


If the zero bucket is not populated, it isn't included in the buckets list, so the closestIdx ± 1 will fail.

beorn7 · 2024-05-16T12:22:16Z

web/ui/react-app/src/pages/graph/HistogramChart.tsx

+      }
+    }
+
+    return { closest, closestIdx };


You are not really using the returned closest, are you?

I think this should instead return the index of the closest-to-zero negative bucket and the index of the closest-to-zero positive bucket separately, so you don't need the ±1 magic above (which fails if there is no zero bucket in buckets).

beorn7 · 2024-05-16T12:35:46Z

web/ui/react-app/src/pages/graph/HistogramChart.tsx

+  const widthPositive = endPositive - startPositive; //width_pos
+  const widthTotal = widthNegative + expBucketWidth + widthPositive; //width_total
+
+  function findClosestToZero(numbers: number[]): closestToZeroType {


I suspect you need to pass in the full buckets (or the upper limit for the negative buckets and the lower limit for the positive buckets), and you must not get confused by the zero bucket (which might or might not exist).

beorn7 · 2024-05-16T12:37:32Z

In this last case, are they truly of different width or are they overlapping?

They are truly different width. Buckets are never really overlapping.

Also looks like the zero bucket goes from negative to positive ranges. Don't think that one is explicitly handled

Yes, that's the very definition of the zero bucket (it goes from positive to negative, or it is just zero with zero width).

beorn7 · 2024-05-16T12:39:14Z

Another idea for findClosestToZero: Let it just find the zero bucket (the one that goes from negative to positive), and return its lower and upper limit. If there is no zero bucket, or its width is zero, just return {0, 0}. Then the other parts of the math should just work, and you would even render a possible gap between the zero bucket and the first positive/negative bucket correctly. (The name of the function should then be changed, maybe to findZeroBucket).

Maniktherana · 2024-05-16T13:20:06Z

In this last case, are they truly of different width or are they overlapping?

They are truly different width. Buckets are never really overlapping.

In the case I'm getting, they're overlapping. It can be verified on inspecting the element. Not sure in which case a bucketWidth may change as we're passing a single source of truth defined from the parent component (that is, bw and width_total are defined once and supplied to all buckets). Still, I'll try to see if I get a case where the bucket widths aren't equal

Maniktherana · 2024-05-16T13:21:19Z

Another idea for findClosestToZero: Let it just find the zero bucket (the one that goes from negative to positive), and return its lower and upper limit. If there is no zero bucket, or its width is zero, just return {0, 0}. Then the other parts of the math should just work, and you would even render a possible gap between the zero bucket and the first positive/negative bucket correctly. (The name of the function should then be changed, maybe to findZeroBucket).

agreed, closestToZero was made with a flawed premise and needs to be updated

Maniktherana · 2024-05-16T14:02:13Z

Another interesting observation: if there are a certain amount of buckets, for example 20, then we get bw = 100/20 = 5%. And this is expected

however, for cases like where number of buckets = 7, we should be getting bw = 100/7 = 14.28% or so but I'm getting 16.6667%. And I believe this also affects the left positioning for buckets around and including the zero bucket.

beorn7 · 2024-05-16T14:06:15Z

WRT overlapping zero bucket: If you look at the numbers in your screenshot above, you'll see that no buckets are overlapping.

This is a "feature" of the zero bucket. If you set it such that it doesn't coincide with the "natural" bucket boundary of the neighboring buckets, those buckets are cut short, and that's the one case where you get smaller buckets with an otherwise standard exponential schema.

There are custom buckets in the making, and those will have variable bucket length anyway. So eventually, we have to handle them. Right now, we can ignore it for now, but we need to get that special case around the zero bucket solved. I think with the formulas developed so far, it should "just work", and the artifacts we see are caused by some implementation problems.

Maniktherana · 2024-05-16T14:09:10Z

WRT overlapping zero bucket: If you look at the numbers in your screenshot above, you'll see that no buckets are overlapping.

I'm not saying that the ranges of buckets are overlapping. I mean on the chart itself they are physically overlapping. the middle one has 2 buckets. and on the console to the right you'll see 2 elements have the same left position. in this chart there should be 7 buckets all adjascent to eachother.

beorn7 · 2024-05-16T14:11:56Z

Yes, and I think that needs to be fixed (and it's unexpected from the math).

Maniktherana · 2024-05-16T14:16:59Z

Yes, and I think that needs to be fixed (and it's unexpected from the math).

I played around a bit and found that updating this nudged it in the right direction:

-const widthTotal = widthNegative + expBucketWidth + widthPositive; //width_total
+const widthTotal = widthNegative + 3 * expBucketWidth + widthPositive; //width_total

beorn7 · 2024-05-16T15:04:57Z

But isn't that just ad-hoc compensation for a mistake that we have done elsewhere?

I think with a fixed closestToZero, based on the bounds of the zero bucket, things should all fall into place nicely.

Maniktherana · 2024-05-16T16:08:58Z

But isn't that just ad-hoc compensation for a mistake that we have done elsewhere?

I think with a fixed closestToZero, based on the bounds of the zero bucket, things should all fall into place nicely.

You are correct. I also believe the same when it comes to positioning the buckets. Zero bucket calculation might also be affecting the widths

Signed-off-by: Manik Rana <manikrana54@gmail.com>

Maniktherana · 2024-05-29T09:23:15Z

@beorn7 updated some calculations:

minPositive and maxNegative now use zero bucket's index if available
Updated zero marker position
if zero bucket exists, it is positioned at (width_neg) / width_total

It fixes some things but but theres still the issue of these overlapped buckets:

but in some cases, theres no such issue:

finally, sometimes a bucket may have ranges with very small boundaries (order of 10^-8 or so) and such buckets seem to occupy random positions. in this case, the highlighted bucket should be the right side of the zero marker:

beorn7 · 2024-05-30T18:47:45Z

At first glance, I would say your first screenshot is already the correct behavior. Maybe we should do the "frequency density thing" in that case, but maybe not. A zero bucket overlapping with the neighboring buckets is a scenario that should be avoided generally, and I'm not sure there is a good way of treating it graphically. I will look at the details ASAP to come to a conclusion (which will be early next week, hopefully).

I will also look into the issue with the last screenshot then. Floating point precision issues are probably hard to avoid in certain situations, and maybe they even play out a bit differently in the Javascript VM than in a native program… However, 1e-8 isn't that small in terms of IEEE 754. As said, I'll try to get to the bottom of it.

And if anyone beats me to it before the start of next week, even better.

Signed-off-by: Manik Rana <manikrana54@gmail.com>

Maniktherana · 2024-05-31T06:33:24Z

At first glance, I would say your first screenshot is already the correct behavior. Maybe we should do the "frequency density thing" in that case, but maybe not. A zero bucket overlapping with the neighboring buckets is a scenario that should be avoided generally, and I'm not sure there is a good way of treating it graphically. I will look at the details ASAP to come to a conclusion (which will be early next week, hopefully).

We made the assumption that once we calculate bw for one bucket, it would be the same width for each of the bars. Now excluding the zero bucket (as thats a sepcial case) I found out that the buckets adjascent to the zero bucket in the first screenshot actually don't have the same bucket width as the rest of the bars. And you are correct, that screenshot is graphically correct if we calculate bw for each of the buckets (except zero bucket).

So I went ahead and calculated the true bw for all buckets (except zero bucket) and this is what I got:

Its the same as the first screenshot, only now the buckets dont overlap in the HTML

Maniktherana added 4 commits March 1, 2024 11:23

refac: histogram count is represented by area instead of height

3e350d7

Signed-off-by: Manik Rana <manikrana54@gmail.com>

refac: remove unused code

db33e7b

Signed-off-by: Manik Rana <manikrana54@gmail.com>

chore: add commentds

0ed061b

Signed-off-by: Manik Rana <manikrana54@gmail.com>

chore: update comments

6b0d2e3

Signed-off-by: Manik Rana <manikrana54@gmail.com>

Maniktherana requested a review from juliusv as a code owner March 1, 2024 12:15

Maniktherana marked this pull request as draft March 1, 2024 12:21

beorn7 self-requested a review March 1, 2024 16:14

Maniktherana marked this pull request as ready for review March 7, 2024 18:07

beorn7 mentioned this pull request Mar 19, 2024

ui (histograms): Render Native Histograms in Graph view #11268

Open

refac: use count as height for exponential scale

d70f4be

Signed-off-by: Manik Rana <manikrana54@gmail.com>

broken: try out new formulae

8ab6569

Signed-off-by: Manik Rana <manikrana54@gmail.com>

beorn7 requested changes May 15, 2024

View reviewed changes

Maniktherana added 2 commits May 15, 2024 22:50

fix(broken): update formulae

7068830

Signed-off-by: Manik Rana <manikrana54@gmail.com>

chore(broken): apply suggested changes

50b1b10

Signed-off-by: Manik Rana <manikrana54@gmail.com>

fix: update negative bucket boundary

d9a66f3

Signed-off-by: Manik Rana <manikrana54@gmail.com>

beorn7 requested changes May 16, 2024

View reviewed changes

feat: add proper zero bucket

5d33e31

Signed-off-by: Manik Rana <manikrana54@gmail.com>

feat: experimental bucket width calculation

ee71a57

Signed-off-by: Manik Rana <manikrana54@gmail.com>

	const startNegative = buckets ? Math.log(Math.abs(minNegative)) : 0; //start_neg
	const startNegative = buckets ? -Math.log(Math.abs(minNegative)) : 0; //start_neg

	const endNegative = buckets ? Math.log(Math.abs(maxNegative)) : 0; //end_neg
	const endNegative = buckets ? -Math.log(Math.abs(maxNegative)) : 0; //end_neg

	const widthNegative = startNegative - endNegative; //width_neg
	const widthNegative = endNegative - startNegative; //width_neg

		Math.log(Math.abs(parseFloat(buckets[buckets.length - 1][2]))) -
		Math.log(Math.abs(parseFloat(buckets[buckets.length - 1][1])))

	? (Math.log(Math.abs(left)) - startNegative / widthTotal) * 100 + '%' // negative buckets boundary
	? -(Math.log(Math.abs(left)) - startNegative) / widthTotal * 100 + '%' // negative buckets boundary

	: (Math.log(left) - startPositive + bw + widthNegative / widthTotal) * 100 + '%'; // positive buckets boundary
	: (Math.log(left) - startPositive + bw + widthNegative) / widthTotal * 100 + '%'; // positive buckets boundary

		const minPositive = buckets ? parseFloat(buckets[closestToZero.closestIdx + 1][1]) : 0;
		const maxNegative = buckets ? parseFloat(buckets[closestToZero.closestIdx - 1][2]) : 0;

Update histogram math #13680

Are you sure you want to change the base?

Update histogram math #13680

Conversation

Maniktherana commented Mar 1, 2024

Maniktherana commented Mar 1, 2024

juliusv commented Mar 1, 2024

beorn7 commented Mar 1, 2024

beorn7 commented Mar 5, 2024

beorn7 commented Mar 7, 2024

juliusv commented Mar 7, 2024

beorn7 commented Mar 7, 2024

beorn7 commented Mar 7, 2024

Maniktherana commented Mar 8, 2024 • edited

beorn7 commented Mar 13, 2024

beorn7 commented Mar 17, 2024

Maniktherana commented Mar 27, 2024

Maniktherana commented Mar 28, 2024

beorn7 commented Mar 28, 2024 • edited

beorn7 commented Mar 28, 2024

bboreham commented May 14, 2024

Maniktherana commented May 14, 2024

beorn7 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Maniktherana commented May 15, 2024

Maniktherana commented May 15, 2024

Maniktherana commented May 15, 2024

beorn7 commented May 16, 2024

Maniktherana commented May 16, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

beorn7 commented May 16, 2024

beorn7 commented May 16, 2024

Maniktherana commented May 16, 2024

Maniktherana commented May 16, 2024 • edited

Maniktherana commented May 16, 2024

beorn7 commented May 16, 2024

Maniktherana commented May 16, 2024 • edited

beorn7 commented May 16, 2024

Maniktherana commented May 16, 2024

beorn7 commented May 16, 2024

Maniktherana commented May 16, 2024

Maniktherana commented May 29, 2024 • edited

beorn7 commented May 30, 2024

Maniktherana commented May 31, 2024

Maniktherana commented Mar 8, 2024 •

edited

beorn7 commented Mar 28, 2024 •

edited

Maniktherana commented May 16, 2024 •

edited

Maniktherana commented May 16, 2024 •

edited

Maniktherana commented May 16, 2024 •

edited

Maniktherana commented May 29, 2024 •

edited