Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update histogram math #13680

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

Maniktherana
Copy link
Contributor

Addresses #11269
Bucket count is now represented by area rather than height

Height is now = count / range, creating a height proportional to frequency density

@beorn7

Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <manikrana54@gmail.com>
@Maniktherana
Copy link
Contributor Author

Screen.Recording.2024-03-01.at.5.46.20.PM.mov

@Maniktherana Maniktherana marked this pull request as draft March 1, 2024 12:21
@juliusv
Copy link
Member

juliusv commented Mar 1, 2024

Nice. I'd like @beorn7 to be the one to judge the histogram logic bits, but happy to review / :+1: it from a UI code perspective after that :)

@beorn7 beorn7 self-requested a review March 1, 2024 16:14
@beorn7
Copy link
Member

beorn7 commented Mar 1, 2024

Great stuff. I'll have a detailed look ASAP (which might be Tuesday). Thank you very much.

@beorn7
Copy link
Member

beorn7 commented Mar 5, 2024

Sorry, drowning in work, it will take another day or so.

@Maniktherana Maniktherana marked this pull request as ready for review March 7, 2024 18:07
@beorn7
Copy link
Member

beorn7 commented Mar 7, 2024

I'm playing with this now. It looks generally very nice, but there seem to be some quirks.

Here my observations (really just as an outside observer, without any insight into the code, as I'm not familiar with React):

The text version is sorted fine now and displays the zero bucket just fine, too.

In the graphical version, the zero bucket is displayed, but "left of zero", while it should be centered on zero (it would probably be fine to only display the half "right of zero" if there are no negative buckets):
image
(You can see the little tick mark for zero between the bar and the popup.)

With the normalization of the height of the buckets, the number next to the Y axis appear pretty useless. Maybe we can just omit them? Showing the count in the bucket seems to be a fine way. (And there is always the text representation underneath for precise numbers and for those that cannot hover the mouse pointer for accessibility reasons.)

Speaking of the height of the bars, they seem off. See the following two screenshots:
image
image

The 1st one highlights a bucket with 44 observations, the 2nd one a bucket with 82 buckets. The numbers are also displayed in the some way in the text part. However, the bucket with count of 44 is higher than the bucket with a count of 82. It should be the other way round, and that by a lot (almost factor 2). Remember, with the exponential rendering, all buckets have the same width in the rendering, so their height should be proportional to their count (because ultimately the area of each bar has to be proportional to the count). So there is effectively no normalization necessary (you could even display the counts directly on the y axis.)

I haven't checked the linear plot in detail yet. Let's first get the exponential plot right. However, I can say that this example app creates a normal distribution of observations. So the linear plot should result in a nice Gauss bell shape. Maybe that's even working, if squinting at
image

But as said, let's take care of it once the exponential plot works.

Another note for the zero bucket: By default, it is very narrow (and it could be set to a width of zero in some use cases). That means that the normalization for the linear plot would result in an extreme height (or even infinity in case of a zero width zero bucket). There should be some limitation on the displayed height of the zero bucket, maybe something like "display it if it is at most 10x higher than the highest other bucket, but just cut it off at the max height of any other bucket otherwise". When I deliberately put observations in a zero bucket with the narrow default width, I see an overly high bucket even in the exponential plot, which tells me that some normalisation is happening (but it should not, see above):
image

And here the funny result with a zero bucket of width zero that actually contains a few observations of exactly zero:

image

Final topic: gaps between buckets. There can be gaps between buckets (or in other word: empty buckets, that just aren't explicitly stated in the query result, because native histograms are "sparse"), and they seem to be displayed nicely in the linear view, but they aren't displayed at all in the exponential view. I think they should. Imagine the gaps are filled with buckets of population zero and following the same exponential bucketing schema. (A more subtle topic is the possible gap between the zero bucket and the first regular bucket, but it's probably fine to not draw this one, at least not in the exponential plot.)

Sorry for all the comments. The zero bucket is really a beast to plot because it breaks the regularity. So I would propose the following plan:

  • Let's first get the height of the bars solved in the exponential view.
  • Then let's get the gaps between buckets right in the exponential view.
  • Then let's figure out how to render the zero bucket properly.
  • And finally we can find out if the height in the linear plot is correct.

@juliusv
Copy link
Member

juliusv commented Mar 7, 2024

(@beorn7 Side note: I never had the expectation that the area under each bar in the exponential view had to be adjusted to be proportional to the number of observations and was surprised to hear that. Is that a commonly expected thing when displaying exponetial histograms on an exponential axis? If we didn't do that, then the Y-axis would keep making sense...)

@beorn7
Copy link
Member

beorn7 commented Mar 7, 2024

I never had the expectation that the area under each bar in the exponential view had to be adjusted to be proportional to the number of observations and was surprised to hear that

That's the essence of the histogram (as it was used on paper, before computers were invented…).

However, the basic idea with the exponential plot is that every bucket will be rendered with the same width, so now not only the area but also the height is proportional to the bucket counte. And indeed, in that case we can keep the Y axis.

@beorn7
Copy link
Member

beorn7 commented Mar 7, 2024

https://en.wikipedia.org/wiki/Histogram#/media/File:Travel_time_histogram_total_n_Stata.png is the proverbial histogram from Wikipedia . Note the unit on the y axis (people per time interval).

@Maniktherana
Copy link
Contributor Author

Maniktherana commented Mar 8, 2024

With the normalization of the height of the buckets, the number next to the Y axis appear pretty useless. Maybe we can just omit them?

Yes I'll remove them. One of your screenshots show the scales to be in the order of trillion ^ 2 😆 which doesn't seem useful.

In the graphical version, the zero bucket is displayed, but "left of zero", while it should be centered on zero:

this could be fixed by simply updating the position of the 0 tick:

 const zeroAxisLeft =
    scale === 'linear'
      ? ((0 - rangeMin) / (rangeMax - rangeMin)) * 100 + '%'
      : (closestToZero.closestIdx / (buckets ? buckets.length : 1)) * 100 + '%';

On an exponential scale, it currently finds the bucket closest to zero and puts the 0 marker on that bucket's lower bounds. This is incorrect, I'll fix it and put the 0 tick on true zero.

(it would probably be fine to only display the half "right of zero" if there are no negative buckets)

This bit is accounted for

{rangeMin < 0 && <div style={{ position: 'absolute', left: zeroAxisLeft }}>0</div>}

The 1st one highlights a bucket with 44 observations, the 2nd one a bucket with 82 buckets. The numbers are also displayed in the some way in the text part. However, the bucket with count of 44 is higher than the bucket with a count of 82. It should be the other way round, and that by a lot (almost factor 2). Remember, with the exponential rendering, all buckets have the same width in the rendering, so their height should be proportional to their count (because ultimately the area of each bar has to be proportional to the count). So there is effectively no normalization necessary (you could even display the counts directly on the y axis.)

Okay so for this one, I made the assumption that these are frequency density histograms. This means that the height is is calculated by count / range (frequency density). I went ahead with this as on multiplying the height with the width, we'd end up with an area that represents the actual count. This means that height is proportional to frequency density, not height.

So if we went ahead an calculated the heights for the buckets you mentioned, you'd see that they're actually consistent:

44 / (0.000030517578125 - 0.0000152587890625) = 28,83,584
82 / (0.00006103515625 - 0.000030517578125) = 26,86,976

here, bucket with count 44 is actually taller than that with count 82.

However, the basic idea with the exponential plot is that every bucket will be rendered with the same width, so now not only the area but also the height is proportional to the bucket counte. And indeed, in that case we can keep the Y axis.

here's my rationale: Yes the widths do appear equal to eachother, but they are on an exponential scale. If we calculate the width of some bucket x on the linear scale, it would have the same value as bucket x on the exponential scale. The values of the range/width are the same, they simply appear differently on each scale.

Because of this, height is some proportion of count/range (frequency density) on both scales.

Do let me know if there are gaps in my theory.

Then let's get the gaps between buckets right in the exponential view

Hmm, yes I'll try and figure something out for this.

However, I can say that this example app creates a normal distribution of observations. So the linear plot should result in a nice Gauss bell shape. Maybe that's even working, if squinting at

That's the same example I'm using. Question tho (my statistics is fuzzy), are these random values supposed to exactly fit in a bell curve on linear scales? or should it look somewhat like a bell curve with some outliers. If yes to the latter, I belive it would then act somewhat like a galton board, and this is what's happening already happening. Again, do let me know if I'm wrong.

Another note for the zero bucket: By default, it is very narrow (and it could be set to a width of zero in some use cases). That means that the normalization for the linear plot would result in an extreme height (or even infinity in case of a zero width zero bucket). There should be some limitation on the displayed height of the zero bucket, maybe something like "display it if it is at most 10x higher than the highest other bucket, but just cut it off at the max height of any other bucket otherwise". When I deliberately put observations in a zero bucket with the narrow default width, I see an overly high bucket even in the exponential plot, which tells me that some normalisation is happening (but it should not, see above):

And here the funny result with a zero bucket of width zero that actually contains a few observations of exactly zero:

I didn't anticipate this 😅. I'll revisit the zero bucket.

@beorn7
Copy link
Member

beorn7 commented Mar 13, 2024

Sorry for the delay in my response.

Let's clarify the scaling of the height of the buckets first:

I think the linear case is pretty clear, and your implementation might be correct already. In the linear case, I think we should not have values on the Y axis, as they will generally be weird to read and understand. The hover info will tell the user about the number of observations in the bucket, and that's good enough. I do believe that, within statistical noise, the graphical appearance of the the histogram in linear view should approach a Gaussian bell curve.

But let's do a final verification of all of that once we have nailed the exponential view, because the latter will be the one that users are using normally.

So now about the exponential view: What you use as input in your calculations is the range directly, but since we have a logarithmic x axis, you also need to take the logarithm of the range. If you do that, you'll find out that now everything cancels and the height of the buckets is proportional to the count in the bucket. That's only the case because the buckets are exponential. (Once custom bucket layouts land here, we have to revisit the rendering.) In other words: The actually displayed area of the bars (or the "numbers of pixels" in the bar) should be proportional to the count in the bucket. So for the current bucket schemas, the exponential view is very simple: height == count, and you can put the count on the y axis.
The only odd one out here is the zero bucket, because it doesn't comply with the logarithmic x axis (it would be infinitely far away from any bucket if we did the x axis logarithmic all the way). That's why you should just render the zero bucket as wide as all the other buckets and again make it's height == count.

Yet another comparison: Imagine a linear bucket layout (one bucket from 0 to 1, next from 1 to 2, next from 2 to 3). If you render such a histogram on a linear x axis, again height would be proportional to count. Exponential buckets rendered on a linear x axis need the multiplication math for their correct area, but if the x axis is logarithmic itself, it all cancels out and you are back to height being proportional to the count.

@beorn7
Copy link
Member

beorn7 commented Mar 17, 2024

Another thought: You could also do the multiplication thing in the exponential view "strictly correct" right now (I assume you have to change the current code just slightly to take into account the logarithmic nature of the x axis by doing an exponentiation at the right place). This will appear needlessly complicated for now, but will help once custom bucket definitions are a thing.

@Maniktherana
Copy link
Contributor Author

Sorry I've been away 😬

@beorn7 I'll get back to you on this soon (before the weekend or so)

Signed-off-by: Manik Rana <manikrana54@gmail.com>
@Maniktherana
Copy link
Contributor Author

@beorn7 Again, sorry about the delay

I have now set height proportional to count for the exponential scale and kept everything same for linear. Also removed the y axis labels for linear.

Screenshot 2024-03-28 at 4 00 15 PM Screenshot 2024-03-28 at 4 00 22 PM

@beorn7
Copy link
Member

beorn7 commented Mar 28, 2024

Height looks good now.

The only thing left for exponential view is to position the bars correctly. I try to summarize our discussion we had on Slack:

I don't know a lot about how to code this all in CSS and Typescript etc., so I make a few fundamental assumption to create some pseudocode. Please fix my assumptions where necessary.

I assume the leftmost point on the x-axis in the rendering is 0 and the rightmost point is 1 (maybe it's something like 0% to 100%, but formulating it in terms of 0 to 1 is easier below).

The "standard bucket width" bw is defined as

bw := abs(log(abs(upper_bound_of_any_bucket)) - log(abs(lower_bound_of_any_bucket)))

This should yield the same result, no matter what bucket you pick, as long as you don't pick the zero bucket. (Note: There is the special case where only the zero bucket is populated, which has to be rendered just as a single bar centered on zero.)

Every bucket will be rendered with this width bw.

What's left is to map bucket boundaries to points on the x axis. For that, let's define the following values:

start_neg := -log(abs(lower_bound_of_lowest_negative_bucket))
end_neg := -log(abs(upper_bound_of_highest_negative_bucket))
start_pos := log(lower_bound_of_lowest_positive_bucket))
end_pos := log(upper_bound_of_highest_positive_bucket))
width_neg := end_neg - start_neg
width_pos := end_pos - start_pos
width_total := width_neg + bw + width_pos

Now it's easy to calculate the position of bucket boundaries etc. on that 0 to 1 scale for the whole panel:

  • The zero mark is at ( width_neg + 0.5 * bw ) / width_total
  • The zero bucket starts at ( width_neg ) / width_total
  • The zero bucket ends at ( width_neg + bw ) / width_total
  • Any boundary of a negative bucket is it -(log(abs(boundary)) - start_neg) / width_total
  • Any boundary of a positive bucket is it (log(boundary) - start_pos + bw + width_neg) / width_total

In case negative or positive buckets don't exist, just set their respective start and end value to zero, and things should just work out.

@beorn7
Copy link
Member

beorn7 commented Mar 28, 2024

Note that the above always puts the zero bucket directly adjacent to the highest negative bucket and the lowest positive bucket, no matter if there is a gap in between. I think that's fine (as in most cases there will be a gap anyway, and it might be more confusing to show it than to gloss over it), but we can revisit that decision if needed.

@bboreham
Copy link
Member

Hello from the bug-scrub! Do you think you will come back to this @Maniktherana ?

@Maniktherana
Copy link
Contributor Author

Hello from the bug-scrub! Do you think you will come back to this @Maniktherana ?

Hey there, yes I'm working on it now 😬

Signed-off-by: Manik Rana <manikrana54@gmail.com>
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with these changes? (I haven't tried myself… 🦆)

const minPositive = buckets ? parseFloat(buckets[closestToZero.closestIdx + 1][1]) : 0;
const maxNegative = buckets ? parseFloat(buckets[closestToZero.closestIdx - 1][2]) : 0;
const minNegative = buckets ? parseFloat(buckets[0][1]) : 0;
const startNegative = buckets ? Math.log(Math.abs(minNegative)) : 0; //start_neg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do the following like in my latest update:

Suggested change
const startNegative = buckets ? Math.log(Math.abs(minNegative)) : 0; //start_neg
const startNegative = buckets ? -Math.log(Math.abs(minNegative)) : 0; //start_neg

const maxNegative = buckets ? parseFloat(buckets[closestToZero.closestIdx - 1][2]) : 0;
const minNegative = buckets ? parseFloat(buckets[0][1]) : 0;
const startNegative = buckets ? Math.log(Math.abs(minNegative)) : 0; //start_neg
const endNegative = buckets ? Math.log(Math.abs(maxNegative)) : 0; //end_neg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const endNegative = buckets ? Math.log(Math.abs(maxNegative)) : 0; //end_neg
const endNegative = buckets ? -Math.log(Math.abs(maxNegative)) : 0; //end_neg

const startPositive = buckets ? Math.log(minPositive) : 0; //start_pos
const endPositive = buckets ? Math.log(maxPositive) : 0; //end_pos

const widthNegative = startNegative - endNegative; //width_neg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the changes above, this becomes again the more intuitive:

Suggested change
const widthNegative = startNegative - endNegative; //width_neg
const widthNegative = endNegative - startNegative; //width_neg

Comment on lines 22 to 23
Math.log(Math.abs(parseFloat(buckets[buckets.length - 1][2]))) -
Math.log(Math.abs(parseFloat(buckets[buckets.length - 1][1])))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are only negative buckets, or even only a zero bucket, this will break (just a note for the final code).

scale === 'linear'
? ((left - rangeMin) / (rangeMax - rangeMin)) * 100 + '%'
: left < 0
? (Math.log(Math.abs(left)) - startNegative / widthTotal) * 100 + '%' // negative buckets boundary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With recent updates and fixed parenthesis:

Suggested change
? (Math.log(Math.abs(left)) - startNegative / widthTotal) * 100 + '%' // negative buckets boundary
? -(Math.log(Math.abs(left)) - startNegative) / widthTotal * 100 + '%' // negative buckets boundary

? ((left - rangeMin) / (rangeMax - rangeMin)) * 100 + '%'
: left < 0
? (Math.log(Math.abs(left)) - startNegative / widthTotal) * 100 + '%' // negative buckets boundary
: (Math.log(left) - startPositive + bw + widthNegative / widthTotal) * 100 + '%'; // positive buckets boundary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: (Math.log(left) - startPositive + bw + widthNegative / widthTotal) * 100 + '%'; // positive buckets boundary
: (Math.log(left) - startPositive + bw + widthNegative) / widthTotal * 100 + '%'; // positive buckets boundary

@Maniktherana
Copy link
Contributor Author

What happens with these changes? (I haven't tried myself… 🦆)

getting closer
negative buckets are now appearing on the chart
the leftmost bucket is at left: 55.5556%; width: 2.77778%;

Screenshot 2024-05-15 at 10 48 12 PM

Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <manikrana54@gmail.com>
@Maniktherana
Copy link
Contributor Author

Apologies, missed the last suggested change. changing the formula for positive buckets boundary is having this weird effect where negative buckets are overflowing to the right.

Screenshot 2024-05-15 at 10 52 27 PM
Screen.Recording.2024-05-15.at.10.55.48.PM.mov

Signed-off-by: Manik Rana <manikrana54@gmail.com>
@Maniktherana
Copy link
Contributor Author

- ? -(Math.log(Math.abs(left)) - startNegative) / widthTotal * 100 + '%' // negative buckets boundary
+ ? -(Math.log(Math.abs(left)) + startNegative) / widthTotal * 100 + '%' // negative buckets boundary

forgot to account for this lil bit. looks like it fixed the negative buckets position
Screenshot 2024-05-15 at 11 05 08 PM

@beorn7
Copy link
Member

beorn7 commented May 16, 2024

Things look quite good.

So this is a perfect example, as far as I can see:

image

The zero bucket is unpopulated here, but rendered correctly as an empty bucket centered on the 0 x-axis mark.

However, with fewer and less populated buckets, things look weirdly off:

image

The zero tick is not centered (although there is a centered line visible), and the lowest positive bucket is rendered over the highest negative bucket (the one closest to zero). Note that in this example, the buckets are not contiguous. There is a gap between the lowest and second lowest positive bucket. Maybe that's triggering some weird bug.

Here is an example with an actually populated zero bucket (and, for extra difficulty, the zero bucket does not conicide with a usual bucket boundary, so that the buckets next to the zero bucket have a different width):

image

It seem that the "closest to zero" detection has a problem. In different news: This is a case where the buckets do not all have the same width, even without custom bucket boundaries. But maybe we don't need to handle this quite special case.

@Maniktherana
Copy link
Contributor Author

Maniktherana commented May 16, 2024

In this last case, are they truly of different width or are they overlapping?

Also looks like the zero bucket goes from negative to positive ranges. Don't think that one is explicitly handled

Comment on lines 49 to 50
const minPositive = buckets ? parseFloat(buckets[closestToZero.closestIdx + 1][1]) : 0;
const maxNegative = buckets ? parseFloat(buckets[closestToZero.closestIdx - 1][2]) : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the zero bucket is not populated, it isn't included in the buckets list, so the closestIdx ± 1 will fail.

}
}

return { closest, closestIdx };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not really using the returned closest, are you?

I think this should instead return the index of the closest-to-zero negative bucket and the index of the closest-to-zero positive bucket separately, so you don't need the ±1 magic above (which fails if there is no zero bucket in buckets).

const widthPositive = endPositive - startPositive; //width_pos
const widthTotal = widthNegative + expBucketWidth + widthPositive; //width_total

function findClosestToZero(numbers: number[]): closestToZeroType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect you need to pass in the full buckets (or the upper limit for the negative buckets and the lower limit for the positive buckets), and you must not get confused by the zero bucket (which might or might not exist).

@beorn7
Copy link
Member

beorn7 commented May 16, 2024

In this last case, are they truly of different width or are they overlapping?

They are truly different width. Buckets are never really overlapping.

Also looks like the zero bucket goes from negative to positive ranges. Don't think that one is explicitly handled

Yes, that's the very definition of the zero bucket (it goes from positive to negative, or it is just zero with zero width).

@beorn7
Copy link
Member

beorn7 commented May 16, 2024

Another idea for findClosestToZero: Let it just find the zero bucket (the one that goes from negative to positive), and return its lower and upper limit. If there is no zero bucket, or its width is zero, just return {0, 0}. Then the other parts of the math should just work, and you would even render a possible gap between the zero bucket and the first positive/negative bucket correctly. (The name of the function should then be changed, maybe to findZeroBucket).

@Maniktherana
Copy link
Contributor Author

In this last case, are they truly of different width or are they overlapping?

They are truly different width. Buckets are never really overlapping.

Screenshot 2024-05-16 at 6 47 11 PM

In the case I'm getting, they're overlapping. It can be verified on inspecting the element. Not sure in which case a bucketWidth may change as we're passing a single source of truth defined from the parent component (that is, bw and width_total are defined once and supplied to all buckets). Still, I'll try to see if I get a case where the bucket widths aren't equal

@Maniktherana
Copy link
Contributor Author

Maniktherana commented May 16, 2024

Another idea for findClosestToZero: Let it just find the zero bucket (the one that goes from negative to positive), and return its lower and upper limit. If there is no zero bucket, or its width is zero, just return {0, 0}. Then the other parts of the math should just work, and you would even render a possible gap between the zero bucket and the first positive/negative bucket correctly. (The name of the function should then be changed, maybe to findZeroBucket).

agreed, closestToZero was made with a flawed premise and needs to be updated

@Maniktherana
Copy link
Contributor Author

Another interesting observation: if there are a certain amount of buckets, for example 20, then we get bw = 100/20 = 5%. And this is expected

however, for cases like where number of buckets = 7, we should be getting bw = 100/7 = 14.28% or so but I'm getting 16.6667%. And I believe this also affects the left positioning for buckets around and including the zero bucket.

Screenshot 2024-05-16 at 7 32 05 PM

@beorn7
Copy link
Member

beorn7 commented May 16, 2024

WRT overlapping zero bucket: If you look at the numbers in your screenshot above, you'll see that no buckets are overlapping.

This is a "feature" of the zero bucket. If you set it such that it doesn't coincide with the "natural" bucket boundary of the neighboring buckets, those buckets are cut short, and that's the one case where you get smaller buckets with an otherwise standard exponential schema.

There are custom buckets in the making, and those will have variable bucket length anyway. So eventually, we have to handle them. Right now, we can ignore it for now, but we need to get that special case around the zero bucket solved. I think with the formulas developed so far, it should "just work", and the artifacts we see are caused by some implementation problems.

@Maniktherana
Copy link
Contributor Author

Maniktherana commented May 16, 2024

WRT overlapping zero bucket: If you look at the numbers in your screenshot above, you'll see that no buckets are overlapping.

I'm not saying that the ranges of buckets are overlapping. I mean on the chart itself they are physically overlapping. the middle one has 2 buckets. and on the console to the right you'll see 2 elements have the same left position. in this chart there should be 7 buckets all adjascent to eachother.

Screenshot 2024-05-16 at 7 37 12 PM

@beorn7
Copy link
Member

beorn7 commented May 16, 2024

Yes, and I think that needs to be fixed (and it's unexpected from the math).

@Maniktherana
Copy link
Contributor Author

Yes, and I think that needs to be fixed (and it's unexpected from the math).

I played around a bit and found that updating this nudged it in the right direction:

-const widthTotal = widthNegative + expBucketWidth + widthPositive; //width_total
+const widthTotal = widthNegative + 3 * expBucketWidth + widthPositive; //width_total

@beorn7
Copy link
Member

beorn7 commented May 16, 2024

But isn't that just ad-hoc compensation for a mistake that we have done elsewhere?

I think with a fixed closestToZero, based on the bounds of the zero bucket, things should all fall into place nicely.

@Maniktherana
Copy link
Contributor Author

But isn't that just ad-hoc compensation for a mistake that we have done elsewhere?

I think with a fixed closestToZero, based on the bounds of the zero bucket, things should all fall into place nicely.

You are correct. I also believe the same when it comes to positioning the buckets. Zero bucket calculation might also be affecting the widths

Signed-off-by: Manik Rana <manikrana54@gmail.com>
@Maniktherana
Copy link
Contributor Author

Maniktherana commented May 29, 2024

@beorn7 updated some calculations:

  • minPositive and maxNegative now use zero bucket's index if available
  • Updated zero marker position
  • if zero bucket exists, it is positioned at (width_neg) / width_total

It fixes some things but but theres still the issue of these overlapped buckets:

Screenshot 2024-05-29 at 2 43 20 PM

but in some cases, theres no such issue:
Screenshot 2024-05-29 at 2 50 54 PM

finally, sometimes a bucket may have ranges with very small boundaries (order of 10^-8 or so) and such buckets seem to occupy random positions. in this case, the highlighted bucket should be the right side of the zero marker:

Screenshot 2024-05-29 at 2 51 28 PM

@beorn7
Copy link
Member

beorn7 commented May 30, 2024

At first glance, I would say your first screenshot is already the correct behavior. Maybe we should do the "frequency density thing" in that case, but maybe not. A zero bucket overlapping with the neighboring buckets is a scenario that should be avoided generally, and I'm not sure there is a good way of treating it graphically. I will look at the details ASAP to come to a conclusion (which will be early next week, hopefully).

I will also look into the issue with the last screenshot then. Floating point precision issues are probably hard to avoid in certain situations, and maybe they even play out a bit differently in the Javascript VM than in a native program… However, 1e-8 isn't that small in terms of IEEE 754. As said, I'll try to get to the bottom of it.

And if anyone beats me to it before the start of next week, even better.

Signed-off-by: Manik Rana <manikrana54@gmail.com>
@Maniktherana
Copy link
Contributor Author

At first glance, I would say your first screenshot is already the correct behavior. Maybe we should do the "frequency density thing" in that case, but maybe not. A zero bucket overlapping with the neighboring buckets is a scenario that should be avoided generally, and I'm not sure there is a good way of treating it graphically. I will look at the details ASAP to come to a conclusion (which will be early next week, hopefully).

We made the assumption that once we calculate bw for one bucket, it would be the same width for each of the bars. Now excluding the zero bucket (as thats a sepcial case) I found out that the buckets adjascent to the zero bucket in the first screenshot actually don't have the same bucket width as the rest of the bars. And you are correct, that screenshot is graphically correct if we calculate bw for each of the buckets (except zero bucket).

So I went ahead and calculated the true bw for all buckets (except zero bucket) and this is what I got:
Screenshot 2024-05-31 at 12 01 09 PM

Its the same as the first screenshot, only now the buckets dont overlap in the HTML

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants