Trying to understand long report generation #711

Open

hevisko opened this issue Jan 26, 2022 · 6 comments
@hevisko

hevisko commented Jan 26, 2022

Currently I have multiple "small" 100 MB files sent to my pgBadger "processor", where the files are processed with -I -J <cores> every hour as they arrive. This generates a bunch of *.bin files, and then it seems to take quite a while to generate the HTML reports.
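
For illustration, a minimal sketch of the kind of hourly invocation described above; the paths, core count, and output directory are placeholders, not the actual setup:

# Hypothetical hourly run: incremental mode (-I), 8 log files parsed in parallel (-J),
# with the intermediate *.bin files kept under the --outdir (-O) directory.
pgbadger -I -J 8 -O /srv/pgbadger/outdir /srv/incoming/postgresql-*.log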

So my questions:

  1. Would the HTML report generation be faster if the *.bin files are merged into a single file for the day/week/month? If so, is that something that is possible to achieve (even if in phased processing)?

  2. Would it be possible not just to do a report per day/week/month (thanks for those), but also for a specific time window in which we had a problem to investigate?

@darold
Owner

darold commented Feb 2, 2022

If you want to generate a report for a specific day, you just have to give that day's binary files as input, for example:

pgbadger -o myreportdir/ data/2022/02/02/*.bin

You will then have a report for this specific day.
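
For a specific time window (question 2), pgbadger also has -b/--begin and -e/--end datetime filters; a hedged sketch, assuming these filters are honored when re-reading binary files (the paths and times below are placeholders):

# Hypothetical: restrict the report to a two-hour window on that day.
pgbadger -b "2022-02-02 14:00:00" -e "2022-02-02 16:00:00" -o myreportdir/ data/2022/02/02/*.bin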

Normally multiprocessing should also be used to build the report, but let me check if -J is taken into account; it is possible that I have only taken care of -j.

@darold
Owner

darold commented Feb 2, 2022

Despite what I thought, there is no multiprocessing used for report generation; this could be an improvement.

@hevisko
Author

hevisko commented Feb 13, 2022

There is another issue that might be part of this: the memory grows and grows during this process (in my one very busy DB's case >32 GB, and currently ~50 GB while busy catching up), which begs the question: how can RAM consumption be limited during report generation?

@darold
Owner

darold commented Feb 14, 2022

You can limit the memory by reducing the number of top elements stored and the maximum query length, for example: -t 15 -m 2048
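
Put into a full invocation (a sketch only; the paths mirror the earlier example rather than this setup):

# Hypothetical: keep only the top 15 entries per report section (-t) and truncate
# stored queries at 2048 characters (-m) to reduce memory use.
pgbadger -t 15 -m 2048 -o myreportdir/ data/2022/02/02/*.bin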

@hvisage

hvisage commented Mar 29, 2022

My trouble when I cut the statement size is that, for this specific instance, the devilish details are in the query data that gets "cut", and the problem statements are the ones in the top 10-20 elements :(

I will have to handle the RAM spikes in this processing; I was hoping for a "swap to disk" option, but I'm an outlier, yet again :D

@darold
Owner

darold commented Mar 29, 2022

If you can send the *.bin files required to reproduce your issue to my private email, I will try to see what could be improved in pgbadger; otherwise I'm afraid I can't do much more for this issue.
