Is the mechanism for controlling whether `hist` replaces dimensions too indirect? #3435

SimonHeybrock · 2024-04-30T03:59:10Z

According to the docstring:

When histogramming a dimension with an existing dimension-coord, the binning for
the dimension is modified, i.e., the input and the output will have the same
dimension labels.

When histogramming by non-dimension-coords, the output will have new dimensions
given by the names of these coordinates. These new dimensions replace the
dimensions the input coordinates depend on.

In practice this means that:

- A prior use of `transform_coords`, with or without the `rename_dims` option, affects the outcome of a subsequent `hist`.
- Which dimensions are removed can be controlled indirectly by renaming and/or flattening dimensions, as shown in the example below:

import scipp as sc

table = sc.data.table_xyz(1000)
binned = table.bin(x=3, y=4)  # sizes {'x': 3, 'y': 4}
binned.rename_dims(y='z').hist(z=5)  # sizes {'x': 3, 'z': 5}
binned.flatten(to='z').hist(z=5)  # sizes {'z': 5}

The mechanism was introduced since it allows the algorithm to either add a new dimension, or replace an existing dimension. But is it too confusing when working with multi-dimensional data?

Would it suffice to improve the docstring (I am thinking of adding concrete examples on how to control the behavior), or do we need to think of something else?

Note that related functions such as `bin` are also affected.


nvaytet · 2024-05-07T12:18:59Z

I do find it a little unpredictable sometimes.

Would it be less confusing if the dims you specify inside `hist(...)` were always what you get in the output?
E.g., the output of `table.bin(x=3, z=4).hist(z=5)` would have sizes `{'z': 5}`.
If you want to keep the `x` dim, you'd have to do `table.bin(x=3, z=4).hist(x=3, z=5)` -> sizes `{'x': 3, 'z': 5}`.

It could check whether the binning requested for `x` is the same as the existing one, and then skip re-doing the binning in `x`?

However, that wouldn't really work if you've manually specified bins in x, because you'd have to specify them again, which would be annoying...

SimonHeybrock · 2024-05-07T12:25:22Z

That would not work: you would always have to look up the existing binning if you want to keep it, and we would need to add code that detects whether the user-specified binning is the same as the existing one (to avoid re-doing the work), which is likely to break all the time.
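
One way such a check could be fragile is sketched below with plain NumPy. This assumes, purely for illustration, that the comparison would be made on floating-point bin edges; the thread does not say how it would be implemented, and the arrays and variable names here are hypothetical.

```python
import numpy as np

# Hypothetical "existing" bin edges stored on the data.
existing_edges = np.array([0.0, 0.1, 0.2, 0.3])

# The "same" edges as a user might rebuild them, by accumulating a bin width.
requested_edges = np.cumsum([0.0, 0.1, 0.1, 0.1])

# Exact comparison fails because of floating-point rounding in the last edge,
# while a tolerance-based comparison treats the two sets of edges as equal.
exact_match = np.array_equal(existing_edges, requested_edges)
approx_match = np.allclose(existing_edges, requested_edges)
print(exact_match, approx_match)
```

An exact comparison rejects edges that differ only by rounding, while a tolerance-based comparison requires choosing a tolerance. Either way, the check adds behavior that users would have to reason about.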
