Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re: TOML performance #19

Open
arp242 opened this issue Oct 1, 2023 · 2 comments
Open

Re: TOML performance #19

arp242 opened this issue Oct 1, 2023 · 2 comments

Comments

@arp242
Copy link

arp242 commented Oct 1, 2023

I'm not sure where else to report this, but re: https://github.com/madmurphy/libconfini/wiki/An-INI-critique-of-TOML#18-performance

This is really a tomlc99 issue, not a TOML issue. The problem is that it allocates memory in 1,000 byte chucks, which takes quite a while (see start of toml_parse_file()), and after this it spends most of its time in check_key(), which loops over an array of keys, and is called every time it parses a new key, so ... yeah,

But these are not issues with TOML the file format. Other parsers are much faster; for example:

% ls -lh ~/50M.toml
-rw-r--r-- 1 martin martin 49M Sep 30 20:23 /home/martin/50M.toml

With C++:

% time ./tomlv++ ~/50M.toml
./tomlv++ ~/50M.toml  2.19s user 0.15s system 99% cpu 2.348 total

Go:

% time tomlv-go ~/50M.toml
tomlv-go ~/50M.toml  6.31s user 0.51s system 154% cpu 4.421 total

And Python:

% time ./tomlv.py ~/50M.toml
./tomlv.py ~/50M.toml  11.95s user 0.42s system 99% cpu 12.413 total

None are spectacularly fast as such, but orders of magnitudes faster than tomlc99, which is the outlier here. I also don't know how this compares to the 50M test-file you used (the TOML file lacks comments and is more "dense" in syntax than the big_file.ini that your script generates, so it's probably not a fair comparison).

Most of the other points in that article are more subjective, but I felt this one was clearly wrong.

@madmurphy
Copy link
Owner

@arp242

Thank you for pointing that out. I can write a paragraph about it in the next days (but I am not sure I will have time before mid October).

I also don't know how this compares to the 50M test-file you used (the TOML file lacks comments and is more "dense" in syntax than the big_file.ini that your script generates, so it's probably not a fair comparison).

If I don't remember wrong, I used the INI file generated by dev/tests/performance/prepare.sh (you will need to clone the repository before launching that script). Maybe you could re-post the timings you get using this file?

--madmurphy

@arp242
Copy link
Author

arp242 commented Oct 1, 2023

I can write a paragraph about it in the next days (but I am not sure I will have time before mid October).

No hurries; I came across this almost 2 weeks ago. I also hope to fix the worst of the issue in tomlc99 when I have the time (not my code, so not entirely sure how hard it is).

Maybe you could re-post the timings you get using this file?

That file doesn't produce valid TOML. A few weeks ago I looked in to it and manually fixed some obvious things, but then you run in to:

% tomlv ./big_file.ini
Error in './big_file.ini': toml: line 35: Key 'section' has already been defined.

So this needs more work.

I wrote a little script to frob with TOML table and key names and generated my 50M TOML file with that, but you need a "real" parser for this (can't just s/../../ it), and this won't output comments, so hard to really reproduce that big_file.ini in a way that's valid TOML at the moment.

tomlc99 also can't parse it, which is why you got the error. I wouldn't be surprised if all of those 13 minutes you spent waiting was just for the allocation, and it didn't actually parse anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants