IndexHierarchy.astype does strange things with boolean masks.
#496
Replies: 8 comments 1 reply
-
Many thanks for posting this question. This behavior is correct and consistent with other selection interfaces, though it is certainly surprising. Boolean selection in StaticFrame is only used when given a Boolean array. Here, we are using lists of bools, which are interpreted like any other selection list: as the "labels" to select. We can see below that if we use a Boolean array instead of a list, we get the expected results.
>>> idx = sf.IndexHierarchy.from_labels([("1", 2), (3, "4"), ('5', '6')], name=(5, 6))
>>> idx.astype[np.array((True, False))](str)
<IndexHierarchy>
1 2
3 4
5 6
<<U1> <object>
>>> idx.astype[np.array((True, True))](str)
<IndexHierarchy>
1 2
3 4
5 6
<<U1> <<U1>
>>> idx.astype[np.array((False, True))](str)
<IndexHierarchy>
1 2
3 4
5 6
<object> <<U1>
-
@flexatone
-
@Acexxxxxxxxx: yes, that is expected. In that case we are selecting "columns" 0 and 1 (which are interpreted the same as False and True when given in a list).
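To illustrate why a list of bools can end up acting like the labels 0 and 1, here is a small plain-Python sketch. It does not call StaticFrame at all; the `levels` list is a hypothetical stand-in for the two levels of a depth-2 index, and the point is only that Python itself treats `True` and `False` as the integers 1 and 0.

```python
# Plain-Python illustration (no StaticFrame involved): bool is a subclass of
# int, so True and False behave like the integers 1 and 0 when used as keys.
assert issubclass(bool, int)
assert True == 1 and False == 0

# Hypothetical stand-in for the two levels of a depth-2 index.
levels = ["level 0", "level 1"]

# A list of bools used as positions picks by integer value, not as a mask.
picked_as_positions = [levels[key] for key in [True, False]]

# A mask, by contrast, keeps only the entries whose flag is True.
picked_as_mask = [lvl for lvl, keep in zip(levels, [True, False]) if keep]

print(picked_as_positions)  # ['level 1', 'level 0']
print(picked_as_mask)       # ['level 0']
```

The two results differ in both length and order, which is the same kind of mismatch reported in the original post.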
-
Interesting! I wonder if it is worth having a special case for bools here, as it seems undesirable to treat them as integers in this case. I think, then, that to change the dtypes of some levels in an index (e.g., convert objects to strings), the best approach is this:
# 1. select types to convert
needs_conversion = idx.dtypes == np.dtype("O")
# 2. pass .values to astype:
converted = idx.astype[needs_conversion.values](str)
-
@ForeverWintr, thanks for your comments. Can you elaborate on what you suggest regarding bools? I think I misspoke when I said that (on a depth-2 index) Regarding the "best approach", the following works exactly as you suggest:
>>> idx = sf.IndexHierarchy.from_product(('a', 'b'), (1, 2), (True, False))
>>> idx
<IndexHierarchy>
a 1 True
a 1 False
a 2 True
a 2 False
b 1 True
b 1 False
b 2 True
b 2 False
<<U1> <int64> <bool>
>>> idx.astype[(idx.dtypes == bool).values](str)
<IndexHierarchy>
a 1 True
a 1 False
a 2 True
a 2 False
b 1 True
b 1 False
b 2 True
b 2 False
<<U1> <int64> <<U5>
-
Your example shows the same thing as mine, right? If I understand what's happening, Re-reading this, I'm not sure if I understood your original comment correctly.
I don't think this is true, as there are cases where a boolean list works as (I) expected. For example:
>>> f = sf.Frame.from_records([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> f.loc[[True, False, True]]
<Frame>
<Index> 0 1 2 <int64>
<Index>
0 1 2 3
2 7 8 9
<int64> <int64> <int64> <int64>
I see this is not the case in all selections, though. I guess then my suggestion was that StaticFrame treat boolean lists the same as boolean arrays, but at the time I thought However, selection with a boolean list works in NumPy; should it work in more cases in StaticFrame too?
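For reference, a minimal NumPy-only sketch of the behavior mentioned here: NumPy treats a list of bools as a Boolean mask, the same as a Boolean array, while a list of integers is treated as positions. The array `a` and the variable names are made up for illustration, and nothing here says how StaticFrame should behave.

```python
import numpy as np

a = np.array([10, 20, 30])
flags = [True, False, True]

# NumPy interprets a list of bools as a Boolean mask...
by_bool_list = a[flags]

# ...which matches selection with an explicit Boolean array.
by_bool_array = a[np.array(flags)]

# A list of integers is positional selection, so the result differs.
by_int_list = a[[1, 0, 1]]

print(by_bool_list.tolist())   # [10, 30]
print(by_bool_array.tolist())  # [10, 30]
print(by_int_list.tolist())    # [20, 10, 20]
```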
-
Thanks for the example of a list of With NumPy, selection with a list of
>>> f = sf.Frame.from_fields((list('abc'), list('def')), columns=(True, False))
>>> f
<Frame>
<Index> True False <bool>
<Index>
0 a d
1 b e
2 c f
<int64> <<U1> <<U1>
>>> f[[False, True]] # a selection list
<Frame>
<Index> False True <bool>
<Index>
0 d a
1 e b
2 f c
<int64> <<U1> <<U1>
>>> f[np.array([False, True])] # a Boolean selection
<Frame>
<Index> False <bool>
<Index>
0 d
1 e
2 f
<int64> <<U1>
Putting the responsibility on the caller to make explicit what type of selection they are doing, by requiring Boolean arrays, is well within the spirit of StaticFrame's stricter interfaces, I believe.
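One way a caller might make that intent explicit is a small wrapper that turns flags into a Boolean array before selecting. This is only a sketch: `as_bool_mask` is a hypothetical helper, not part of StaticFrame or NumPy, and the StaticFrame usage appears only in a comment, following the `idx.astype[...]` calls shown earlier in this thread.

```python
import numpy as np

def as_bool_mask(flags, size):
    """Hypothetical helper: return `flags` as a 1-D Boolean array of length `size`.

    Raises instead of guessing when the input is not already Boolean, so
    integer positions such as [0, 1] are never silently treated as a mask.
    """
    mask = np.asarray(flags)
    if mask.dtype.kind != "b":
        raise TypeError(f"expected Boolean flags, got dtype {mask.dtype}")
    if mask.shape != (size,):
        raise ValueError(f"expected shape ({size},), got {mask.shape}")
    return mask

mask = as_bool_mask([True, False], size=2)
print(mask.dtype, mask.tolist())  # bool [True, False]

# Assumed usage with the depth-2 index from earlier in the thread:
# idx.astype[as_bool_mask([True, False], size=idx.depth)](str)
```

Rejecting non-Boolean input keeps the distinction between a label selection and a mask visible at the call site.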
-
Good point about the possibility of Boolean labels! Is this the only case where selecting with an
-
Description
Trying to astype levels of an IndexHierarchy doesn't seem to work correctly.
Example
Given this IH:
This looks like it should astype only the first level, but it astypes both.
This looks like it should astype both levels, but it astypes only the second.
Platform
Run the following function (static-frame >= 0.8.1) and provide the results to define your platform and environment: