
PERF: Rely on C-level str conversions in loadtxt for up to 2x speedup #19687

Closed
wants to merge 4 commits

Conversation

anntzer
Contributor

@anntzer anntzer commented Aug 17, 2021

This PR builds on top of #19618 (to avoid a tricky rebase) and #19042 (which this PR would have fixed more or less naturally anyway, so I may as well credit @DFEvans for the test he wrote in that PR). (#19618 has now been merged, so this is now ready for review.)

The general idea is as follows:

  • First, we treat each row of loadtxt's input as a single item of a structured dtype with no nested structured dtypes, with as many fields as needed. If loadtxt was given a scalar dtype, the structured dtype is constructed by creating as many fields (each with that scalar dtype) as there are columns; if a structured dtype was requested, we first flatten the dtype (as explained in #19623, "genfromtxt fails when a non-contiguous dtype is requested", but correctly taking offsets into account) and repeat the fields as needed. (Note that this also fixes a previous bug, whereby loading e.g. "0 1 2 3 4\n5 6 7 8 9" with dtype=[("a", int), ("b", int)] would return [[(0, 1), (2, 3)], [(5, 6), (7, 8)]] and silently drop the last column -- the old behavior seems clearly buggy.)
    Once the whole array is read, we then .view() back to the actually requested dtype. This implies an extraneous copy if the requested dtype has .hasobject = True (which would be fixed by #8514, "ENH: Make it possible to call .view on object arrays"); I believe that that case is rare enough to be ignored for now (and a fix is possible anyways).
    In itself, this is much faster (~30%) for loading actual structured dtypes (by skipping the recursive packer), somewhat faster (~5-10%) for large loads (>10_000 rows, perhaps because shape inference of the final array is faster?), and much slower (nearly 2x) for very small loads (10 rows) or for reads using dtype=object; however, the main point is to allow the next points.

  • Then, we take advantage of the possibility of assigning a tuple of strs to a structured dtype with e.g. float fields, and have the strs be implicitly converted to floats by numpy at the C-level. (A Python-level fallback is kept to support e.g. hex floats.) Together with the previous commit, this provides a massive speedup (~2x on the loadtxt_dtypes_csv benchmark for 10_000+ ints or floats), but is beneficial with as little as 100 rows. Very small reads (10 rows) are still slower (nearly 2x for object), as well as reads using object dtypes (still due to the extra copy), but the tradeoff seems, again, worthwhile.

  • Finally, using structured dtypes provides a small extra advantage, in that they implicitly check the number of fields in the input, and thus allow skipping the len(words) == ncols check; even that is a ~5% speedup for the largest loads (100_000 rows) of numeric scalar types.
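The mechanism described in the bullets above can be sketched roughly as follows (a standalone illustration under stated assumptions, not the PR's actual code; the variable names are hypothetical): each parsed row, as a tuple of strs, is assigned into a structured array so that NumPy converts the strs to the field dtypes at the C level, a mismatched field count raises on its own, and the result is viewed back as the requested scalar dtype.

```python
import numpy as np

# Hedged sketch of the strategy, not the actual loadtxt internals:
# build a one-field-per-column structured dtype, assign rows as
# tuples of strs (NumPy casts str -> float at the C level), then
# view the result back as the requested scalar dtype.
lines = ["1.5 2.5", "3.0 4.0"]
ncols = len(lines[0].split())
row_dtype = np.dtype([(f"f{i}", np.float64) for i in range(ncols)])

out = np.empty(len(lines), dtype=row_dtype)
for i, line in enumerate(lines):
    out[i] = tuple(line.split())  # strs cast to floats by the dtype machinery

# View back as a plain float array of shape (nrows, ncols).
result = out.view(np.float64).reshape(len(lines), ncols)

# The structured assignment also checks the field count implicitly:
# a row with the wrong number of columns raises, so no explicit
# len(words) == ncols check is needed.
try:
    out[0] = ("1.0",)
except ValueError:
    pass  # rejected: tuple length does not match the field count

# The C-level cast does not understand every format Python can, so a
# Python-level fallback stays around for cases such as hex floats:
assert float.fromhex("0x1.8p1") == 3.0
```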

Overall, the benchmarks (compared to #19618) shown below indicate a >2x speedup for large reads of simple numeric types, and a slowdown of very small reads (~1.5x for 10 rows) or reads of object arrays (~10% for large reads, ~2x for small ones). But small reads are fast anyway and reading into object arrays is, again, likely rare.

       before           after         ratio
     [45f9118f]       [4b7b0f46]
     <loadtxtusecols>       <_wip/loadtxtflatdtype>
+      26.8±0.2μs       51.8±0.2μs     1.93  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10)
+      29.7±0.2μs       50.6±0.3μs     1.71  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10)
+      29.2±0.2μs       46.2±0.2μs     1.58  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10)
+      30.3±0.2μs       47.3±0.3μs     1.56  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10)
+      29.3±0.2μs       45.6±0.2μs     1.56  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10)
+      29.2±0.1μs       45.5±0.1μs     1.56  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10)
+      30.2±0.2μs       44.6±0.4μs     1.47  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10)
+      32.2±0.6μs      44.7±0.09μs     1.39  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10)
+       134±0.3μs          169±1μs     1.26  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100)
+      63.9±0.6μs       77.4±0.3μs     1.21  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20)
+       159±0.6μs          177±1μs     1.12  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100)
+       134±0.5ms        149±0.4ms     1.11  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100000)
+     12.6±0.04ms       13.5±0.2ms     1.07  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10000)
-         435±2μs          374±2μs     0.86  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(200)
-         592±2μs        495±0.9μs     0.84  bench_io.LoadtxtReadUint64Integers.time_read_uint64(550)
-         591±3μs          493±3μs     0.84  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(550)
-     5.48±0.04ms      4.51±0.02ms     0.82  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv(2)
-     1.06±0.01ms          870±6μs     0.82  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(1000)
-     1.06±0.01ms          868±7μs     0.82  bench_io.LoadtxtReadUint64Integers.time_read_uint64(1000)
-     10.4±0.04ms      8.38±0.09ms     0.80  bench_io.LoadtxtReadUint64Integers.time_read_uint64(10000)
-     4.16±0.02ms      3.33±0.01ms     0.80  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(2000)
-      42.1±0.2ms       33.6±0.1ms     0.80  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20000)
-      10.6±0.1ms      8.31±0.01ms     0.78  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(10000)
-         168±1μs          130±2μs     0.78  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100)
-       158±0.9μs          120±1μs     0.76  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100)
-       159±0.6μs          119±1μs     0.75  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100)
-       160±0.7μs          119±1μs     0.74  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100)
-         198±2ms          139±3ms     0.70  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(10000)
-         220±3ms          153±3ms     0.70  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(0)
-       154±0.3ms        107±0.6ms     0.69  bench_io.LoadtxtCSVStructured.time_loadtxt_csv_struct_dtype
-         219±3ms          152±3ms     0.69  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(500)
-       168±0.6μs          114±1μs     0.68  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100)
-     7.30±0.06ms      4.66±0.02ms     0.64  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3])
-     15.3±0.09ms       9.68±0.2ms     0.63  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10000)
-     8.83±0.05ms      5.58±0.03ms     0.63  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3, 5, 7])
-       161±0.6ms         94.9±2ms     0.59  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100000)
-         193±5μs        113±0.8μs     0.59  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100)
-      14.7±0.1ms      8.43±0.06ms     0.57  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10000)
-      14.6±0.1ms      8.36±0.02ms     0.57  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10000)
-     14.6±0.07ms      8.27±0.06ms     0.57  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10000)
-       155±0.8ms       82.8±0.3ms     0.53  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100000)
-       156±0.9ms         82.9±1ms     0.53  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100000)
-       156±0.4ms       82.7±0.6ms     0.53  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100000)
-     15.4±0.05ms      7.85±0.07ms     0.51  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10000)
-       161±0.5ms       79.2±0.5ms     0.49  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100000)
-      18.1±0.2ms      7.88±0.07ms     0.44  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10000)
-       188±0.5ms       78.4±0.9ms     0.42  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100000)

Still, if one includes the earlier speedups to loadtxt that I posted recently, even these cases are faster than previously (see below), so I'll apply the credit of these earlier PRs towards this one :)

       before           after         ratio
     [a1ee7968]       [4b7b0f46]
     <_pushme/loadtxtlencheck~9>       <_wip/loadtxtflatdtype>
-      55.2±0.4μs       50.6±0.4μs     0.92  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10)
-      51.6±0.1μs       45.1±0.2μs     0.87  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10)
-      53.3±0.6μs       46.5±0.4μs     0.87  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10)
-      52.7±0.3μs       45.6±0.3μs     0.87  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10)
-      53.5±0.2μs       45.9±0.5μs     0.86  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10)
-      57.5±0.3μs       47.5±0.4μs     0.83  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10)
-      93.4±0.6μs         77.0±1μs     0.82  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20)
-      54.9±0.2μs       44.8±0.2μs     0.82  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10)
-         686±2μs          379±2μs     0.55  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(200)
-         321±1μs          173±2μs     0.54  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100)
-         952±4μs          495±4μs     0.52  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(550)
-         347±2μs          180±2μs     0.52  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100)
-     6.51±0.01ms      3.33±0.01ms     0.51  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(2000)
-         960±5μs          491±3μs     0.51  bench_io.LoadtxtReadUint64Integers.time_read_uint64(550)
-      66.1±0.3ms       33.6±0.2ms     0.51  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20000)
-     1.72±0.01ms          857±6μs     0.50  bench_io.LoadtxtReadUint64Integers.time_read_uint64(1000)
-     9.23±0.04ms      4.58±0.04ms     0.50  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv(2)
-     1.74±0.01ms          858±9μs     0.49  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(1000)
-     17.0±0.03ms      8.22±0.05ms     0.48  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(10000)
-     17.0±0.09ms      8.23±0.07ms     0.48  bench_io.LoadtxtReadUint64Integers.time_read_uint64(10000)
-       313±0.8ms          150±3ms     0.48  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100000)
-      30.1±0.2ms       13.7±0.3ms     0.46  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10000)
-         330±1ms          150±3ms     0.46  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100000)
-      32.7±0.3ms       14.4±0.2ms     0.44  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10000)
-         259±1ms        107±0.4ms     0.41  bench_io.LoadtxtCSVStructured.time_loadtxt_csv_struct_dtype
-         363±1ms        136±0.8ms     0.37  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(10000)
-         399±2ms        148±0.2ms     0.37  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(500)
-         404±3ms        149±0.6ms     0.37  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(0)
-       335±0.5μs          119±2μs     0.36  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100)
-         334±3μs        118±0.6μs     0.35  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100)
-         321±2μs          114±2μs     0.35  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100)
-         336±3μs          118±1μs     0.35  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100)
-         374±5μs        127±0.5μs     0.34  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100)
-       352±0.5μs        113±0.9μs     0.32  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100)
-     18.4±0.05ms      5.58±0.08ms     0.30  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3, 5, 7])
-     16.0±0.08ms      4.67±0.06ms     0.29  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3])
-      31.6±0.2ms       8.48±0.1ms     0.27  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10000)
-      31.2±0.2ms      8.33±0.09ms     0.27  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10000)
-      31.4±0.2ms      8.29±0.09ms     0.26  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10000)
-     35.8±0.04ms       9.36±0.2ms     0.26  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10000)
-      30.5±0.1ms       7.96±0.1ms     0.26  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10000)
-         325±1ms         83.3±2ms     0.26  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100000)
-       363±0.9ms         92.8±1ms     0.26  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100000)
-         311±2ms         79.3±2ms     0.25  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100000)
-         324±1ms       82.4±0.7ms     0.25  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100000)
-       325±0.4ms       82.5±0.5ms     0.25  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100000)
-      33.2±0.2ms      7.97±0.08ms     0.24  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10000)
-       343±0.7ms         78.8±2ms     0.23  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100000)

@anntzer anntzer force-pushed the loadtxtflatdtype branch 3 times, most recently from af33ee7 to c98e34d Compare August 17, 2021 20:46
@charris charris changed the title PERF: Rely on C-level str conversions in loadtxt, for an up to 2x speedup MAINT: Rely on C-level str conversions in loadtxt for an up to 2x speedup Aug 18, 2021
@charris charris changed the title MAINT: Rely on C-level str conversions in loadtxt for an up to 2x speedup MAINT: Rely on C-level str conversions in loadtxt for up to 2x speedup Aug 18, 2021
@anntzer anntzer changed the title MAINT: Rely on C-level str conversions in loadtxt for up to 2x speedup PERF: Rely on C-level str conversions in loadtxt for up to 2x speedup Aug 18, 2021
@anntzer anntzer force-pushed the loadtxtflatdtype branch 3 times, most recently from bc2e615 to df5ee9f Compare August 23, 2021 04:09
]
# These converters only ever get str (not bytes) as input.
_CONVERTER_DICT = {
np.bool_: int, # Implicitly converted to bool.
Member

Is this correct? Booleans are only allowed values of 0 or 1.

Contributor Author

The point is that we only need to cast the str to an int, and then we can let the dtype machinery do the int->np.bool_ cast (and np.bool_(42) == np.bool_(True)).
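As a quick standalone illustration of that point (a sketch, not the PR's code): the converter only needs to parse the str to an int, and assigning that int into a bool array lets NumPy's dtype machinery perform the int -> np.bool_ cast, with any nonzero value mapping to True.

```python
import numpy as np

# Sketch of the converter strategy described above: parse each str
# token as an int and let NumPy's int -> np.bool_ cast do the rest
# (any nonzero value becomes True).
tokens = ["0", "1", "42"]
out = np.empty(len(tokens), dtype=np.bool_)
for i, tok in enumerate(tokens):
    out[i] = int(tok)
# out now holds [False, True, True]
```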

DFEvans and others added 4 commits August 26, 2021 16:20
Closes numpy#17277. If loadtxt is passed an unsized string or byte dtype,
the size is set automatically from the longest entry in the first
50000 lines. If longer entries appeared later, they were silently
truncated.
This is much faster (~30%) for loading actual structured dtypes (by
skipping the recursive packer), somewhat faster (~5-10%) for large loads
(>10000 rows, perhaps because shape inference of the final array is
faster?), and much slower (nearly 2x) for very small loads (10 rows) or
for reads using `dtype=object` (due to the extraneous limitation on
object views, which could be fixed separately); however, the main point
is to allow further optimizations.
This patch takes advantage of the possibility of assigning a tuple of
*strs* to a structured dtype with e.g. float fields, and have the strs
be implicitly converted to floats by numpy at the C-level.  (A
Python-level fallback is kept to support e.g. hex floats.)  Together
with the previous commit, this provides a massive speedup (~2x on the
loadtxt_dtypes_csv benchmark for 10_000+ ints or floats), but is
beneficial with as little as 100 rows.  Very small reads (10 rows) are
still slower (nearly 2x for object), as well as reads using object
dtypes (due to the extra copy), but the tradeoff seems worthwhile.
In the fast-path of loadtxt, the conversion to np.void implicitly checks
the number of fields.  Removing the explicit length check saves ~5% for
the largest loads (100_000 rows) of numeric scalar types.
@anntzer
Contributor Author

anntzer commented Sep 6, 2021

Kindly bumping.

@anntzer
Contributor Author

anntzer commented Sep 16, 2021

@charris Anything I can do to move this forward? Thanks! (Sorry, I'm picking on you as you already left a review comment :-))

@seberg
Member

seberg commented Sep 22, 2021

Mainly bringing this up in case it interests you, @anntzer. But part of the reason the momentum has stalled a bit here is that @rossbar and I have been looking at pushing forward npreadtext, with the goal of replacing np.loadtxt by moving it to C: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/X4AU2DUDDNA44HTEFDQXJLC24E6MDEW3/

That does not have to stand in the way here, though. But it would give us a good speed-up and would additionally allow supporting new features, such as quote='"' (and other csv.Dialect features in the future), or even user-provided C parsers.

@anntzer
Contributor Author

anntzer commented Sep 22, 2021

That sounds great; I guess it depends on the timescale over which you think npreadtext will make it into numpy. From a quick test, npreadtext is faster than loadtxt even with the improvements here, but I am slightly worried that including a large chunk of C may take a while to review (wearing my matplotlib dev hat here), whereas the PR here may be faster to review (although it also involves some tricks).

So if you think npreadtext can be merged relatively quickly (wrt. numpy's release schedule), I am fine with closing this PR and its followup; otherwise, perhaps it can still go in as a temporary stopgap improvement.

@seberg
Member

seberg commented Jan 16, 2022

Closing, as superseded by gh-20580. I don't think it is helpful to keep this open unless the other PR gets rejected, which at this point seems a very long shot; I think it is finished and good.

@seberg seberg closed this Jan 16, 2022
@anntzer anntzer deleted the loadtxtflatdtype branch January 16, 2022 22:48