simd int and string parsing on aarch64 #65

Merged
merged 36 commits into from Apr 2, 2024
Commits
5c2d9cb
experiment with int parsing speedups
samuelcolvin Jan 20, 2024
a605ea0
fix tess
samuelcolvin Jan 20, 2024
486097e
tweak big int parsing
samuelcolvin Jan 20, 2024
d5679fa
simd int parsing on aarch64
samuelcolvin Jan 21, 2024
b09d73a
linting
samuelcolvin Jan 21, 2024
bf5cefa
test on macos-latest-xlarge
samuelcolvin Jan 22, 2024
e02f4fd
tweaks
samuelcolvin Jan 22, 2024
f4c00aa
fix ci
samuelcolvin Jan 22, 2024
ebbbc3d
separate simd_aarch64
samuelcolvin Jan 23, 2024
72b421a
simd string parsing for aarch64
samuelcolvin Jan 23, 2024
7e6d5b3
linting
samuelcolvin Jan 24, 2024
2272c41
fix ci
samuelcolvin Jan 26, 2024
58180c0
inlining
samuelcolvin Jan 26, 2024
79de565
simplify somewhat
samuelcolvin Jan 26, 2024
320bb78
more tests
samuelcolvin Jan 26, 2024
cc9a6b3
simplify logic after end of string
samuelcolvin Feb 4, 2024
8379b9d
tweaks
samuelcolvin Feb 4, 2024
dc09f12
bump
samuelcolvin Feb 4, 2024
fb5952c
improve non-ascii checks
samuelcolvin Feb 6, 2024
764c857
fuzz on aarch64
samuelcolvin Feb 6, 2024
a896baf
simplify short int parsing
samuelcolvin Feb 6, 2024
9f976d6
fix tests, address one comment
samuelcolvin Mar 28, 2024
c09ef8a
static ONGOING_CHUNK_SIZE
samuelcolvin Mar 28, 2024
6f386fc
fix benchmarks
samuelcolvin Mar 28, 2024
3aeb18a
fix comments
samuelcolvin Mar 28, 2024
47aba14
remove on_backslash macro
samuelcolvin Mar 28, 2024
37eb734
add cargo-careful
samuelcolvin Mar 28, 2024
5bda07f
without cargo cache
samuelcolvin Mar 28, 2024
c442f2f
fixes required by careful
samuelcolvin Mar 28, 2024
74ef054
bump rust cache
samuelcolvin Mar 28, 2024
4c6e06c
clarify ONGOING_CHUNK_MULTIPLIER
samuelcolvin Apr 1, 2024
195f97d
move comment
samuelcolvin Apr 1, 2024
0f8e3c0
NumberInt::try_from(&[u8])
samuelcolvin Apr 1, 2024
2ba4502
one more test
samuelcolvin Apr 1, 2024
328f357
update README
samuelcolvin Apr 1, 2024
fc7570a
Merge branch 'main' into int-simd
samuelcolvin Apr 2, 2024
76 changes: 69 additions & 7 deletions .github/workflows/ci.yml
@@ -9,15 +9,19 @@ on:
 pull_request: {}

 jobs:
-test:
-name: test rust-${{ matrix.rust-version }}
+test-linux:
+name: test rust-${{ matrix.rust-version }} on linux
 strategy:
 fail-fast: false
 matrix:
 rust-version: [stable, nightly]

 runs-on: ubuntu-latest

+env:
+RUNS_ON: ubuntu-latest
+RUST_VERSION: ${{ matrix.rust-version }}
+
 steps:
 - uses: actions/checkout@v3
@@ -30,6 +34,55 @@ jobs:
 with:
 toolchain: ${{ matrix.rust-version }}

+- id: cache-rust
+uses: Swatinem/rust-cache@v2
+with:
+prefix-key: "v1-rust"
+
+- run: cargo install rustfilt coverage-prepare cargo-careful
+if: steps.cache-rust.outputs.cache-hit != 'true'
+
+- run: rustup component add llvm-tools-preview
+
+- run: cargo test -F python
+env:
+RUST_BACKTRACE: 1
+RUSTFLAGS: '-C instrument-coverage'
+
+- run: coverage-prepare --ignore-filename-regex '/tests/' lcov $(find target/debug/deps -regex '.*/main[^.]*')
+
+- run: cargo test --doc
+
+- run: cargo careful t -F python
+if: matrix.rust-version == 'nightly'
+
+- uses: codecov/codecov-action@v3
+with:
+env_vars: RUNS_ON,RUST_VERSION
+
+test-macos:
+name: test on ${{ matrix.runs-on }}
+strategy:
+fail-fast: false
+matrix:
+runs-on: [macos-latest, macos-latest-xlarge]
+
+runs-on: ${{ matrix.runs-on }}
+
+env:
+RUNS_ON: ${{ matrix.runs-on }}
+RUST_VERSION: stable
+
+steps:
+- uses: actions/checkout@v3
+
+- name: set up python
+uses: actions/setup-python@v4
+with:
+python-version: '3.11'
+
+- uses: dtolnay/rust-toolchain@stable
+
 - id: cache-rust
 uses: Swatinem/rust-cache@v2
@@ -48,13 +101,15 @@ jobs:
 - run: cargo test --doc

 - uses: codecov/codecov-action@v3
+with:
+env_vars: RUNS_ON,RUST_VERSION

 bench:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3

-- uses: moonrepo/setup-rust@v0
+- uses: moonrepo/setup-rust@v1
 with:
 channel: stable
 cache-target: release
@@ -70,11 +125,18 @@ jobs:
 token: ${{ secrets.CODSPEED_TOKEN }}

 fuzz:
-runs-on: ubuntu-latest
+name: fuzz on ${{ matrix.runs-on }}
+strategy:
+fail-fast: false
+matrix:
+runs-on: [ubuntu-latest, macos-latest-xlarge]
+
+runs-on: ${{ matrix.runs-on }}
+
 steps:
 - uses: actions/checkout@v3

-- uses: moonrepo/setup-rust@v0
+- uses: moonrepo/setup-rust@v1
 with:
 channel: nightly
 cache-target: release
@@ -88,7 +150,7 @@ jobs:
 steps:
 - uses: actions/checkout@v3

-- uses: moonrepo/setup-rust@v0
+- uses: moonrepo/setup-rust@v1
 with:
 channel: stable
 components: rustfmt, clippy
@@ -105,7 +167,7 @@ jobs:
 # https://github.com/marketplace/actions/alls-green#why used for branch protection checks
 check:
 if: always()
-needs: [test, bench, fuzz, lint]
+needs: [test-linux, test-macos, bench, fuzz, lint]
 runs-on: ubuntu-latest
 steps:
 - name: Decide whether the needed jobs succeeded or failed
90 changes: 49 additions & 41 deletions README.md
@@ -100,45 +100,53 @@ to a string.
 For more details, see [the benchmarks](https://github.com/pydantic/jiter/tree/main/benches).

 ```text
-running 30 tests
-test big_jiter_iter ... bench: 7,056,970 ns/iter (+/- 93,517)
-test big_jiter_value ... bench: 7,928,879 ns/iter (+/- 150,790)
-test big_serde_value ... bench: 32,281,154 ns/iter (+/- 1,152,593)
-test bigints_array_jiter_iter ... bench: 26,579 ns/iter (+/- 833)
-test bigints_array_jiter_value ... bench: 32,602 ns/iter (+/- 1,901)
-test bigints_array_serde_value ... bench: 148,677 ns/iter (+/- 4,517)
-test floats_array_jiter_iter ... bench: 36,071 ns/iter (+/- 2,448)
-test floats_array_jiter_value ... bench: 33,926 ns/iter (+/- 25,554)
-test floats_array_serde_value ... bench: 231,632 ns/iter (+/- 15,617)
-test massive_ints_array_jiter_iter ... bench: 102,095 ns/iter (+/- 1,645)
-test massive_ints_array_jiter_value ... bench: 108,109 ns/iter (+/- 8,396)
-test massive_ints_array_serde_value ... bench: 517,150 ns/iter (+/- 53,110)
-test medium_response_jiter_iter ... bench: 0 ns/iter (+/- 0)
-test medium_response_jiter_value ... bench: 8,933 ns/iter (+/- 37)
-test medium_response_serde_value ... bench: 10,074 ns/iter (+/- 454)
-test pass1_jiter_iter ... bench: 0 ns/iter (+/- 0)
-test pass1_jiter_value ... bench: 5,704 ns/iter (+/- 161)
-test pass1_serde_value ... bench: 7,153 ns/iter (+/- 33)
-test pass2_jiter_iter ... bench: 462 ns/iter (+/- 2)
-test pass2_jiter_value ... bench: 1,448 ns/iter (+/- 14)
-test pass2_serde_value ... bench: 1,385 ns/iter (+/- 13)
-test string_array_jiter_iter ... bench: 1,112 ns/iter (+/- 26)
-test string_array_jiter_value ... bench: 4,229 ns/iter (+/- 89)
-test string_array_serde_value ... bench: 3,650 ns/iter (+/- 23)
-test true_array_jiter_iter ... bench: 663 ns/iter (+/- 23)
-test true_array_jiter_value ... bench: 1,239 ns/iter (+/- 80)
-test true_array_serde_value ... bench: 1,307 ns/iter (+/- 75)
-test true_object_jiter_iter ... bench: 3,205 ns/iter (+/- 177)
-test true_object_jiter_value ... bench: 5,963 ns/iter (+/- 375)
-test true_object_serde_value ... bench: 7,686 ns/iter (+/- 507)
-
-test result: ok. 0 passed; 0 failed; 0 ignored; 30 measured
-
-Running benches/python.rs (target/release/deps/python-11d488ef3a08ee17)
-
-running 4 tests
-test python_parse_medium_response ... bench: 8,397 ns/iter (+/- 183)
-test python_parse_numeric ... bench: 427 ns/iter (+/- 8)
-test python_parse_other ... bench: 160 ns/iter (+/- 8)
-test python_parse_true_object ... bench: 8,817 ns/iter (+/- 102)
+running 48 tests
+test big_jiter_iter ... bench: 3,662,616 ns/iter (+/- 88,878)
+test big_jiter_value ... bench: 6,998,605 ns/iter (+/- 292,383)
+test big_serde_value ... bench: 29,793,191 ns/iter (+/- 576,173)
+test bigints_array_jiter_iter ... bench: 11,836 ns/iter (+/- 414)
+test bigints_array_jiter_value ... bench: 28,979 ns/iter (+/- 938)
+test bigints_array_serde_value ... bench: 129,797 ns/iter (+/- 5,096)
+test floats_array_jiter_iter ... bench: 19,302 ns/iter (+/- 631)
+test floats_array_jiter_value ... bench: 31,083 ns/iter (+/- 921)
+test floats_array_serde_value ... bench: 208,932 ns/iter (+/- 6,167)
+test lazy_map_lookup_1_10 ... bench: 615 ns/iter (+/- 15)
+test lazy_map_lookup_2_20 ... bench: 1,776 ns/iter (+/- 36)
+test lazy_map_lookup_3_50 ... bench: 4,291 ns/iter (+/- 77)
+test massive_ints_array_jiter_iter ... bench: 62,244 ns/iter (+/- 1,616)
+test massive_ints_array_jiter_value ... bench: 82,889 ns/iter (+/- 1,916)
+test massive_ints_array_serde_value ... bench: 498,650 ns/iter (+/- 47,759)
+test medium_response_jiter_iter ... bench: 0 ns/iter (+/- 0)
+test medium_response_jiter_value ... bench: 3,521 ns/iter (+/- 101)
+test medium_response_jiter_value_owned ... bench: 6,088 ns/iter (+/- 180)
+test medium_response_serde_value ... bench: 9,383 ns/iter (+/- 342)
+test pass1_jiter_iter ... bench: 0 ns/iter (+/- 0)
+test pass1_jiter_value ... bench: 3,048 ns/iter (+/- 79)
+test pass1_serde_value ... bench: 6,588 ns/iter (+/- 232)
+test pass2_jiter_iter ... bench: 384 ns/iter (+/- 9)
+test pass2_jiter_value ... bench: 1,259 ns/iter (+/- 44)
+test pass2_serde_value ... bench: 1,237 ns/iter (+/- 38)
+test sentence_jiter_iter ... bench: 283 ns/iter (+/- 10)
+test sentence_jiter_value ... bench: 357 ns/iter (+/- 15)
+test sentence_serde_value ... bench: 428 ns/iter (+/- 9)
+test short_numbers_jiter_iter ... bench: 0 ns/iter (+/- 0)
+test short_numbers_jiter_value ... bench: 18,085 ns/iter (+/- 613)
+test short_numbers_serde_value ... bench: 87,253 ns/iter (+/- 1,506)
+test string_array_jiter_iter ... bench: 615 ns/iter (+/- 18)
+test string_array_jiter_value ... bench: 1,410 ns/iter (+/- 44)
+test string_array_jiter_value_owned ... bench: 2,863 ns/iter (+/- 151)
+test string_array_serde_value ... bench: 3,467 ns/iter (+/- 60)
+test true_array_jiter_iter ... bench: 299 ns/iter (+/- 8)
+test true_array_jiter_value ... bench: 995 ns/iter (+/- 29)
+test true_array_serde_value ... bench: 1,207 ns/iter (+/- 36)
+test true_object_jiter_iter ... bench: 2,482 ns/iter (+/- 84)
+test true_object_jiter_value ... bench: 2,058 ns/iter (+/- 45)
+test true_object_serde_value ... bench: 7,991 ns/iter (+/- 370)
+test unicode_jiter_iter ... bench: 315 ns/iter (+/- 7)
+test unicode_jiter_value ... bench: 389 ns/iter (+/- 6)
+test unicode_serde_value ... bench: 445 ns/iter (+/- 6)
+test x100_jiter_iter ... bench: 12 ns/iter (+/- 0)
+test x100_jiter_value ... bench: 20 ns/iter (+/- 1)
+test x100_serde_iter ... bench: 72 ns/iter (+/- 3)
+test x100_serde_value ... bench: 83 ns/iter (+/- 3)
 ```
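The `ns/iter` figures in the 48-test results can be read as ratios rather than raw timings. The sketch below is illustrative only: `speedup` and `describe` are hypothetical helpers that are not part of jiter, and the inputs in the note afterwards are the `short_numbers_serde_value` and `short_numbers_jiter_value` rows copied from those results. Timings will differ on other machines.

```rust
/// Hypothetical helper (not part of jiter): how many times faster the
/// candidate is than the baseline, given two `ns/iter` figures.
/// A candidate of 0 ns/iter (as some `_iter` rows report) yields infinity.
fn speedup(baseline_ns: u64, candidate_ns: u64) -> f64 {
    baseline_ns as f64 / candidate_ns as f64
}

/// Formats the ratio with two decimal places for a benchmark summary.
fn describe(name: &str, baseline_ns: u64, candidate_ns: u64) -> String {
    format!("{name}: {:.2}x", speedup(baseline_ns, candidate_ns))
}
```

For example, `describe("short_numbers", 87_253, 18_085)` divides the serde figure by the jiter figure for the new `short_numbers` benchmark.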
4 changes: 4 additions & 0 deletions benches/main.rs
@@ -239,6 +239,7 @@ test_cases!(medium_response);
 test_cases!(x100);
 test_cases!(sentence);
 test_cases!(unicode);
+test_cases!(short_numbers);

 fn string_array_jiter_value_owned(bench: &mut Bencher) {
 let json = read_file("./benches/string_array.json");
@@ -336,5 +337,8 @@ benchmark_group!(
 lazy_map_lookup_1_10,
 lazy_map_lookup_2_20,
 lazy_map_lookup_3_50,
+short_numbers_jiter_iter,
+short_numbers_jiter_value,
+short_numbers_serde_value,
 );
 benchmark_main!(benches);
5 changes: 5 additions & 0 deletions benches/python.rs
@@ -51,6 +51,10 @@ fn _python_parse_file(path: &str, bench: &mut Bencher, cache_mode: StringCacheMo
 })
 }

+fn python_parse_massive_ints_array(bench: &mut Bencher) {
+_python_parse_file("./benches/massive_ints_array.json", bench, StringCacheMode::All);
+}
+
 fn python_parse_medium_response_not_cached(bench: &mut Bencher) {
 _python_parse_file("./benches/medium_response.json", bench, StringCacheMode::None);
 }
@@ -111,5 +115,6 @@ benchmark_group!(
 python_parse_x100,
 python_parse_true_object,
 python_parse_true_array,
+python_parse_massive_ints_array,
 );
 benchmark_main!(benches);
1 change: 1 addition & 0 deletions benches/short_numbers.json
@@ -0,0 +1 @@
[0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 
642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 
242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942, 0, 142, 242, 342, 442, 542, 642, 742, 842, 942]
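The fixture is a single JSON array that repeats the ten values 0, 142, ..., 942. A minimal sketch for producing a file of that shape follows; `short_numbers_json` and its `repeats` parameter are hypothetical, and the repeat count used for the committed file is not stated in this diff.

```rust
/// Hypothetical generator (not part of the PR): builds a JSON array that
/// repeats the ten short integers `repeats` times, separated by ", ".
fn short_numbers_json(repeats: usize) -> String {
    const CYCLE: [u16; 10] = [0, 142, 242, 342, 442, 542, 642, 742, 842, 942];
    let items: Vec<String> = CYCLE
        .iter()
        .cycle() // loop over the ten values indefinitely
        .take(CYCLE.len() * repeats) // stop after `repeats` full passes
        .map(|n| n.to_string())
        .collect();
    format!("[{}]", items.join(", "))
}
```

Writing the result with `std::fs::write` would give a one-line file, matching the `@@ -0,0 +1 @@` hunk header.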
1 change: 0 additions & 1 deletion src/lazy_index_map.rs
@@ -1,5 +1,4 @@
 use std::borrow::{Borrow, Cow};
-use std::cmp::{Eq, PartialEq};
 use std::fmt;
 use std::hash::Hash;
 use std::slice::Iter as SliceIter;
2 changes: 2 additions & 0 deletions src/lib.rs
@@ -9,6 +9,8 @@ mod parse;
 mod py_string_cache;
 #[cfg(feature = "python")]
 mod python;
+#[cfg(target_arch = "aarch64")]
+mod simd_aarch64;
 mod string_decoder;
 mod value;
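The `#[cfg(target_arch = "aarch64")]` gate on `mod simd_aarch64` means the module is only compiled for that architecture, so callers need a fallback elsewhere. A minimal sketch of that pattern, under stated assumptions: `all_digits` and the `scalar` module are hypothetical names and are not taken from this PR, and the inline module here only borrows the name `simd_aarch64` for illustration. The NEON intrinsics are the ones exposed by `std::arch::aarch64`.

```rust
// Illustrative only: not the PR's implementation.

#[cfg(target_arch = "aarch64")]
mod simd_aarch64 {
    use std::arch::aarch64::{vdupq_n_u8, vld1q_u8, vmaxvq_u8, vsubq_u8};

    /// Returns true if all 16 bytes are ASCII digits, checked with NEON.
    pub fn all_digits(chunk: &[u8; 16]) -> bool {
        // SAFETY: `chunk` points to exactly 16 readable bytes, and NEON is
        // a baseline feature of Rust's standard aarch64 targets.
        unsafe {
            let bytes = vld1q_u8(chunk.as_ptr());
            // Subtracting b'0' wraps around, so every non-digit byte ends up > 9.
            let shifted = vsubq_u8(bytes, vdupq_n_u8(b'0'));
            // Horizontal maximum across the 16 lanes.
            vmaxvq_u8(shifted) <= 9
        }
    }
}

#[cfg(not(target_arch = "aarch64"))]
mod scalar {
    /// Portable fallback with the same contract as the NEON version.
    pub fn all_digits(chunk: &[u8; 16]) -> bool {
        chunk.iter().all(u8::is_ascii_digit)
    }
}

/// Dispatches at compile time; exactly one of these definitions exists.
#[cfg(target_arch = "aarch64")]
pub fn all_digits(chunk: &[u8; 16]) -> bool {
    simd_aarch64::all_digits(chunk)
}

#[cfg(not(target_arch = "aarch64"))]
pub fn all_digits(chunk: &[u8; 16]) -> bool {
    scalar::all_digits(chunk)
}
```

Because the choice is made by `cfg`, there is no runtime feature detection and no cost on other architectures; the trade-off is that the SIMD path is only exercised when tests run on aarch64 hardware, which is presumably why the workflow adds `macos-latest-xlarge` runners.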