
High memory usage #371

Open
kozikowskik opened this issue Sep 12, 2022 · 7 comments

Comments

@kozikowskik
Contributor

I ran a simple query: select * from <table_name> where date_column >= '<date>';. The result of this query is 1854211 rows; after I got the first page of results, the usql process had consumed 6GB of memory.

[screenshot: usql process memory usage]

@nineinchnick
Member

I can confirm: running this query on PostgreSQL 13 uses about 1GB of memory, regardless of whether a pager is used or it's run in batch mode:

./usql 'postgres://postgres:pw@localhost:49153?sslmode=disable' -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;"

@nineinchnick
Copy link
Member

[pprof memory profile screenshot]

@nineinchnick
Member

The allocation itself should not be that big of an issue, but I'd expect the memory usage to drop once the query is complete and the results have been processed.

@nineinchnick
Member

Huh, I somehow thought TableEncoder.count had a non-zero default, but it's actually not used at all. Maybe we'd want to set it to something like 100k?
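
A minimal sketch of what that could look like, assuming tblfmt's NewTableEncoder accepts a row-count option (the WithCount name here is an assumption about the xo/tblfmt API, not confirmed):

package main

import (
	"database/sql"
	"os"

	_ "github.com/lib/pq"
	"github.com/xo/tblfmt"
)

// printBatched renders a query's results, asking the encoder to buffer at
// most 100k rows at a time instead of the whole result set. *sql.Rows
// satisfies tblfmt's ResultSet interface.
func printBatched(db *sql.DB, query string) error {
	rows, err := db.Query(query)
	if err != nil {
		return err
	}
	defer rows.Close()
	// WithCount is assumed to be the option backing TableEncoder.count;
	// verify against the actual tblfmt API before relying on it.
	enc, err := tblfmt.NewTableEncoder(rows, tblfmt.WithCount(100000))
	if err != nil {
		return err
	}
	return enc.EncodeAll(os.Stdout)
}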

@nineinchnick
Member

Also, it looks like I have no idea how Go's GC is supposed to work. Calling runtime.GC() explicitly after executing the query drops the memory usage from under 1GB to 3MB.
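
For anyone reproducing the effect, a self-contained sketch: drop the last reference to the buffered rows and force a collection (debug.FreeOSMemory additionally returns freed pages to the OS; the allocation loop below just stands in for a large result set):

package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func memStats(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapAlloc=%d MiB, HeapIdle=%d MiB\n",
		label, m.HeapAlloc>>20, m.HeapIdle>>20)
}

func main() {
	// Simulate a large, fully buffered result set (~64 MiB of row data).
	rows := make([][]byte, 0, 1<<20)
	for i := 0; i < 1<<20; i++ {
		rows = append(rows, make([]byte, 64))
	}
	memStats("after buffering")
	rows = nil           // drop the only reference so the memory is collectable
	runtime.GC()         // force a collection instead of waiting for the pacer
	debug.FreeOSMemory() // also return freed pages to the OS eagerly
	memStats("after GC")
}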

@kenshaw
Member

kenshaw commented Sep 12, 2022

Using GNU time (not the shell built-in time):

$ alias time="$(which time) -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem'"
$ time psql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" postgres://postgres:P4ssw0rd@localhost/ > /dev/null
	0:02.62 real,	2.07 user,	0.10 sys,	0 amem,	231272 mmem
$ time usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" postgres://postgres:P4ssw0rd@localhost/ > /dev/null
	0:03.37 real,	6.34 user,	0.25 sys,	0 amem,	1068812 mmem

It would appear that usql uses roughly 5x the memory and takes 3x as long to run. I'll try to see if there are ways to reduce this memory usage, but I find that unlikely.

@nineinchnick
Member

See this proof-of-concept, where I enable batching (100k): master...nineinchnick:usql:print-batch

% alias time="gtime -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem'"
% time ./usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null
	0:04.83 real,	4.76 user,	0.23 sys,	0 amem,	121280 mmem
% time psql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null  
	0:04.48 real,	2.75 user,	0.12 sys,	0 amem,	233536 mmem
% time usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null  
	0:04.74 real,	5.07 user,	0.30 sys,	0 amem,	1219200 mmem
% usql --version
usql 0.12.13
% psql --version
psql (PostgreSQL) 14.5 (Homebrew)

The drawback is that the width of the table (and its columns) can change every 100k rows. I wouldn't enable this by default, but maybe allow it via an option? We could also call runtime.GC() after the query finishes executing, to reduce the memory usage when a user just forgot to add a LIMIT to the query and doesn't actually need that many results :-)
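
For illustration, a rough sketch of the batching idea (this is not the PoC branch's code; batchRows is an illustrative wrapper): wrap the result set so the encoder sees EOF every N rows, computing column widths over one window at a time:

package main

import (
	"database/sql"
	"io"

	"github.com/xo/tblfmt"
)

// batchRows wraps *sql.Rows so each encoding pass sees at most `limit`
// rows, keeping the width-calculation buffer bounded at the cost of
// per-window column widths.
type batchRows struct {
	*sql.Rows
	limit, seen int
	done        bool // set when the underlying rows are exhausted
}

func (b *batchRows) Next() bool {
	if b.seen >= b.limit {
		return false // end of this window; more rows may remain
	}
	if !b.Rows.Next() {
		b.done = true
		return false
	}
	b.seen++
	return true
}

func encodeBatched(w io.Writer, rows *sql.Rows, limit int) error {
	for {
		b := &batchRows{Rows: rows, limit: limit}
		enc, err := tblfmt.NewTableEncoder(b)
		if err != nil {
			return err
		}
		if err := enc.Encode(w); err != nil {
			return err
		}
		if b.done {
			return rows.Err()
		}
	}
}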

Also I could rebase xo/tblfmt#18
