DOC: Return type for quantiles seems to depend on quantile method. #22323

aschaffer · 2022-09-21T18:21:17Z

Describe the issue:

Per the quantile docs:

"If the input contains integers or floats smaller than float64, the output data-type is float64. Otherwise, the output data-type is the same as that of the input."

For example, for an integer source array, the result should be converted to float64, regardless of the selected method.

However,

arr1 = np.array([1,2,2,40,1,1,2,1,0,10,3,3,40,15,3,7,5,4,7,3,5,1,0,9], dtype = int)
qs_arr = np.array([0.001, 0.37, 0.42, 0.67, 0.83, 0.99, 0.39, 0.49, 0.5])

r1 = np.quantile(arr1, qs_arr, method = 'inverted_cdf')
r1.dtype
dtype('int64')

r2 = np.quantile(arr1, qs_arr, method = 'interpolated_inverted_cdf')
r2.dtype
dtype('float64')

There's no mention in the docs that the output type depends on the selected method.

Reproduce the code example:

import numpy as np
arr1 = np.array([1,2,2,40,1,1,2,1,0,10,3,3,40,15,3,7,5,4,7,3,5,1,0,9], dtype = int)
qs_arr = np.array([0.001, 0.37, 0.42, 0.67, 0.83, 0.99, 0.39, 0.49, 0.5])

r1 = np.quantile(arr1, qs_arr, method = 'inverted_cdf')
r1.dtype

r2 = np.quantile(arr1, qs_arr, method = 'interpolated_inverted_cdf')
r2.dtype

Error message:

r1.dtype != r2.dtype
True

NumPy/Python version information:

1.23.0 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10)
[GCC 10.3.0]

Context for the issue:

No response

seberg · 2022-09-21T19:38:10Z

Thanks for the note @aschaffer, that is indeed incorrect. We fixed things up in gh-19857 (and follow-ups). IIRC there was also a tiny bit of back and forth here over time (at least for boolean inputs).

In any case, the comment is correct for all interpolating/continuous methods, I believe (I should maybe double check the code). What is important is that all methods that give "discontinuous" results have no interpolation, and we (now?) retain the input dtype faithfully.
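
A minimal sketch for checking this split, assuming NumPy 1.22 or later (where the `method` keyword exists). The grouping into `discrete` and `interpolating` lists is an assumption made for illustration, based on the behaviour described in this thread, not something taken from the NumPy documentation:

```python
import numpy as np

arr = np.array([1, 2, 2, 40, 1, 1, 2, 1, 0, 10, 3, 3], dtype=np.int64)
qs = np.array([0.001, 0.37, 0.5, 0.99])

# Grouping is an assumption made for this sketch, following the discussion:
# methods that only pick an existing sample vs. methods that interpolate.
discrete = ["inverted_cdf", "closest_observation", "lower", "higher", "nearest"]
interpolating = [
    "averaged_inverted_cdf", "interpolated_inverted_cdf", "hazen", "weibull",
    "linear", "median_unbiased", "normal_unbiased", "midpoint",
]

# Collect the result dtype for every method.
result_dtypes = {
    method: np.quantile(arr, qs, method=method).dtype
    for method in discrete + interpolating
}

for method, dtype in result_dtypes.items():
    print(f"{method:28s} {dtype}")
```

With an int64 input, the methods in `discrete` should report `int64` and the ones in `interpolating` should report `float64`.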

seberg · 2022-09-21T20:44:09Z

OK, double checking the rule for interpolated values: the actual "rule" is more complicated and drops implicitly out of the interpolation calculation.

However, unless you care about object or longdouble dtype (for q), the rule seems correct as stated.
I think the only niche difference is np.quantile([1, 2, 3], 1) (or 0), with q being integral. However, that seems very niche, and I do not think the exact behavior in that case can be considered "specified" or fixed.
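
For anyone who wants to inspect that corner case on their own NumPy version, a small exploratory sketch follows. It makes no claim about which dtype comes back, since that is exactly the part described above as not specified; it only shows the integral and the float spelling of the same q side by side:

```python
import numpy as np

data = [1, 2, 3]

# Same quantile, requested once with an integer q and once with a float q.
as_int_q = np.quantile(data, 1)
as_float_q = np.quantile(data, 1.0)

# The values agree (the maximum of the data); the dtypes may or may not.
print(as_int_q, as_int_q.dtype)
print(as_float_q, as_float_q.dtype)
```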

aschaffer · 2022-09-21T21:40:08Z

It's probably worth noting that for discontinuous methods that return one or the other end of the interval within which the quantile input falls, it might make sense to return results of the same type as the source array. But for continuous methods, or discontinuous methods that could return mid-interval values (e.g., averaged_inverted_cdf), some conversion to floating point is necessary if the source array is integer(-like).
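
A two-element array is enough to illustrate the point. The sketch below assumes NumPy 1.22 or later; the variable names are only illustrative:

```python
import numpy as np

arr = np.array([1, 2], dtype=np.int64)

# inverted_cdf picks an existing sample, so an integer result is representable.
picked = np.quantile(arr, 0.5, method="inverted_cdf")

# averaged_inverted_cdf averages the two neighbours at a step of the CDF,
# so the result can fall between samples and needs a floating-point type.
averaged = np.quantile(arr, 0.5, method="averaged_inverted_cdf")

print(picked, picked.dtype)
print(averaged, averaged.dtype)
```

Here the first call should give `1` as `int64`, while the second should give `1.5` as `float64`, a value that an integer dtype could not hold.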

seberg · 2022-09-22T07:16:05Z

For the discontinuous methods we "always" return the same dtype as the input. But at least the averaged_inverted_cdf method is actually both discontinuous and interpolated (sorry, I had forgotten about that). So when I say "discontinuous" above, that one is not included, because the important thing is that it is also "interpolated" (to a degree).

For the interpolated ones, we take into account the dtype of q in a bit of a round-about way. That somewhat makes sense (NumPy rarely ignores a dtype), but could be disputed and even changed.

Luckily, for numerical types it mainly leads to that upcast to float64, with the only odd cases being q=0 and q=1, and I can live with that being considered "undefined"...

aschaffer added the 00 - Bug label Sep 21, 2022

seberg added 04 - Documentation and removed 00 - Bug labels Sep 21, 2022

seberg changed the title ~~BUG: Return type for quantiles seems to depend on quantile method.~~ DOC: Return type for quantiles seems to depend on quantile method. Sep 21, 2022
