The analysis portion of a facade-worker.py run is very database intensive, for a number of reasons. When designing the analysis functions, I wanted to be able to log neurotically and to stash data as soon as it was computed, so that if/when facade-worker.py failed, very little data would need to be recalculated. (FWIW, recovery from an unplanned exit isn't an issue, as facade-worker.py just sees where it needs to pick up and resumes from there.)
However, when a commit has a lot of files or there are a lot of commits in an analysis, the repeated database access can really slow things down. The mysqldb module is supposed to be the fastest connector, but at a certain point the sheer volume of transactions becomes the issue.
One potential solution would be to accumulate analyzed data in a temporary in-memory database, and then write everything out to storage in one big transaction at the end of a repo analysis. This should have minimal impact on short runs, but potentially a much larger impact on long runs.
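For illustration, here's a minimal sketch of that idea using the simplest possible accumulator (a plain Python list) and a single `executemany()` at the end of the repo analysis. The table and column names below are hypothetical, not Facade's actual schema:

```python
import MySQLdb  # the mysqldb connector (mysqlclient)

class AnalysisBuffer:
    """Accumulates analyzed rows in memory, then flushes them in one transaction."""

    def __init__(self, db):
        self.db = db
        self.rows = []

    def add(self, commit_hash, filename, added, removed):
        # Stash the computed result instead of INSERTing it immediately.
        self.rows.append((commit_hash, filename, added, removed))

    def flush(self):
        # One big write at the end of the repo analysis.
        cursor = self.db.cursor()
        cursor.executemany(
            "INSERT INTO analysis_data "
            "(commit_hash, filename, added, removed) "
            "VALUES (%s, %s, %s, %s)",
            self.rows)
        self.db.commit()
        cursor.close()
        self.rows = []

# Hypothetical usage during a repo analysis (connection parameters are placeholders):
db = MySQLdb.connect(host="localhost", user="facade",
                     passwd="facade_password", db="facade")
buffer = AnalysisBuffer(db)
buffer.add("abc123", "README.md", 10, 2)
buffer.flush()
```

The same accumulate-then-flush pattern would work with an in-memory SQLite database as the staging area if we want the intermediate data to be queryable before it's written out.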
One other massive advantage of reducing database transactions is that it could give us the option to use the pure-Python MySQL library, pymysql. In my tests pymysql is considerably slower than mysqldb for an individual transaction, but pymysql is necessary if we want to use PyPy. In past testing PyPy runs were slower, which is counterintuitive. The best explanation I can think of is that PyPy was hamstrung by the number of database transactions. There will be a push/pull performance tension here, but there's a pretty good chance that if we optimize database transactions, the gains from PyPy will make up for pymysql's latency.
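If we do want to keep both connectors on the table, pymysql can stand in for mysqldb without changing the calling code. A sketch of that fallback (connection parameters are placeholders):

```python
# Prefer the C-based MySQLdb where available; fall back to the pure-Python
# pymysql (e.g. when running under PyPy, where MySQLdb won't build).
try:
    import MySQLdb
except ImportError:
    import pymysql
    pymysql.install_as_MySQLdb()  # makes "import MySQLdb" resolve to pymysql
    import MySQLdb

db = MySQLdb.connect(host="localhost", user="facade",
                     passwd="facade_password", db="facade")
```

This keeps the rest of facade-worker.py agnostic about which connector is actually installed.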