Move Scippneutron's elem_unit and elem_dtype to Scipp? #3391

Open
nvaytet opened this issue Feb 13, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@nvaytet
Member

nvaytet commented Feb 13, 2024

Can we move Scippneutron's elem_unit and elem_dtype to Scipp?
They are quite useful.

See also #3029
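For readers not familiar with the Scippneutron helpers, their intended semantics can be sketched with a toy model: for dense data the element dtype and unit are simply the variable's own, while for binned data they are those of the bin content. ToyVariable, its fields and the "binned" placeholder dtype are hypothetical and not Scipp's API; the two functions only mirror the names of Scippneutron's helpers and are not their actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a Scipp variable, NOT the real scipp API.
@dataclass
class ToyVariable:
    dtype: str
    unit: Optional[str]
    # Bin content for binned data, None for dense data.
    bins: Optional["ToyVariable"] = None


def elem_dtype(var: ToyVariable) -> str:
    """dtype of the individual elements: the bin content's dtype if binned."""
    return var.bins.dtype if var.bins is not None else var.dtype


def elem_unit(var: ToyVariable) -> Optional[str]:
    """Unit of the individual elements: the bin content's unit if binned."""
    return var.bins.unit if var.bins is not None else var.unit


dense = ToyVariable(dtype="float64", unit="m")
# Outer dtype/unit of the binned variable are placeholders for illustration.
binned = ToyVariable(dtype="binned", unit=None,
                     bins=ToyVariable(dtype="float32", unit="counts"))
print(elem_dtype(dense), elem_unit(dense))    # float64 m
print(elem_dtype(binned), elem_unit(binned))  # float32 counts
```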

nvaytet added the enhancement (New feature or request) label Feb 13, 2024
@SimonHeybrock
Member

I was actually wondering: should unit and dtype simply resolve to the element unit and element dtype (elem_unit and elem_dtype) for binned data?

@jl-wynen
Member

Pretty sure the dtype shouldn't. Otherwise, code that checks dtypes also has to check for data.bins is None in many places.

@SimonHeybrock
Member

> Pretty sure the dtype shouldn't. Otherwise, code that checks dtypes also has to check for data.bins is None in many places.

I can't think of many examples where we check for the exotic dtypes you get from binned data, can you? And for those that do exist, maybe a more dedicated method or property would be a viable alternative?

@jl-wynen
Member

My argument is the other way around. Using dtype == float would be true for binned float data.
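The concern can be made concrete with a toy model. Here ToyVariable.dtype hypothetically resolves to the element dtype, as proposed above; is_float is a check written with dense data in mind, and is_dense_float shows the extra bins test such code would then need. All names are illustrative and not Scipp's API.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a Scipp variable, NOT the real scipp API.
@dataclass
class ToyVariable:
    outer_dtype: str  # placeholder for the dtype binned data reports today
    bins: Optional["ToyVariable"] = None

    @property
    def dtype(self) -> str:
        # Proposal under discussion: resolve to the element dtype if binned.
        return self.bins.dtype if self.bins is not None else self.outer_dtype


def is_float(var: ToyVariable) -> bool:
    # A typical check written with dense data in mind.
    return var.dtype == "float64"


def is_dense_float(var: ToyVariable) -> bool:
    # Code that really relies on dense data now also has to test bins.
    return var.bins is None and var.dtype == "float64"


dense = ToyVariable("float64")
binned = ToyVariable("binned", bins=ToyVariable("float64"))
print(is_float(dense), is_float(binned))              # True True
print(is_dense_float(dense), is_dense_float(binned))  # True False
```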

@SimonHeybrock
Member

SimonHeybrock commented Feb 14, 2024

> My argument is the other way around. Using dtype == float would be true for binned float data.

Isn't that what we want, in most situations?

But I suppose the question is: if binned data is described neither by the dtype nor by the shape, which property describes it? Is it a third property (the bins property?), or must it necessarily be part of either the dtype or the shape? A good summary was recently written by @jpivarski in scikit-hep/ragged#6 (see the bullet points in the middle of the original post).

I think it is worth thinking about this again. While it is true that we make binned data part of the dtype, this is not done consistently, since we have found it inconvenient in several places (e.g., astype, and user code adding elem_dtype, which triggered this discussion). The bins property avoids making it part of the shape, but one could argue that it mostly replaces a "dim of unknown length". Scipp currently uses the bins property to perform actions along the internal bin dimensions:

```python
# da.sizes == {'x': 2, 'y': 3}
da = da.bins.sum()
```

but one could just as well write

```python
# da.sizes == {'x': 2, 'y': 3, 'event': None}
da = da.sum('event')
```

Now, it is unfortunately not as easy as that: we almost always use a DataArray as the bin content, i.e., we also have da.bins.coords['x'] in addition to da.coords['x'], so it is not clear whether thinking about this purely in terms of the shape is adequate.
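The two spellings above can be compared with a toy model in which every (x, y) cell holds a variable-length list of event values and the ragged dimension is reported with length None. ToyBinned, bins_sum and the 'event' dim name are hypothetical, invented for this sketch; this is not how Scipp stores binned data.

```python
from typing import Dict, List, Optional


# Hypothetical toy: a 2-D grid of bins, each bin a variable-length list of events.
class ToyBinned:
    def __init__(self, bins: List[List[List[float]]]):
        self._bins = bins

    @property
    def sizes(self) -> Dict[str, Optional[int]]:
        # The ragged inner dim is listed like any other dim, with length None.
        return {"x": len(self._bins), "y": len(self._bins[0]), "event": None}

    def bins_sum(self) -> List[List[float]]:
        # Current spelling, da.bins.sum(): reduce inside every bin.
        return [[sum(b, 0.0) for b in row] for row in self._bins]

    def sum(self, dim: str) -> List[List[float]]:
        # Alternative spelling, da.sum('event'): reduce the ragged dim by name.
        if dim != "event":
            raise NotImplementedError("only the ragged 'event' dim is modelled")
        return self.bins_sum()


da = ToyBinned([[[1.0, 2.0], [], [3.0]], [[4.0], [5.0, 6.0, 7.0], []]])
print(da.sizes)       # {'x': 2, 'y': 3, 'event': None}
print(da.bins_sum())  # [[3.0, 0.0, 3.0], [4.0, 18.0, 0.0]]
```

The toy only carries event values, not per-event coordinates, so it sidesteps exactly the complication raised above: with a DataArray as bin content, both da.bins.coords['x'] and da.coords['x'] would need a place in a purely shape-based model.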

Projects
None yet
Development

No branches or pull requests

3 participants