
Performance issue with a lot of rows (20k+) #1125

Closed
sneko opened this issue Jun 9, 2020 · 5 comments

sneko commented Jun 9, 2020

Issue description

Hi,

I know the issue tracker is not for free support; I'm presenting my problem because I think I may be missing a concept when using the driver, and I'm interested in your thoughts (I tried a few Slack workspaces without much success).

Context: I need to query 20k+ rows from my database to render multiple charts, but with this library it takes around 6 seconds to get all the results (whereas a raw MySQL client takes only 80ms).

I ran the tests listed below; if you have an idea of what could improve performance, I would appreciate it.

Note that where I mention the "cluster" below, I mean where the client is hosted; the DB server is hosted on Google Cloud SQL. Both are in the same region/zone, so network latency "should not be" part of the problem, and I made sure the client program has enough CPU/memory (same for the database).

Test 1

As mentioned above:

  • Server: Google Cloud SQL
  • Client: Golang program in the cluster
rows, err := db.Query("SELECT * FROM analytics WHERE scenario_id='XXXXX'")

i := 0
for rows.Next() {
	i++
}

Duration: 6 seconds

Test 2

  • Server: Google Cloud SQL
  • Client: MySQL docker image deployed in the cluster
SELECT * FROM analytics WHERE scenario_id='XXXXX';

Duration: 0.08 seconds

Test 3

  • Server: local database (after exporting the Cloud database and importing it locally)
  • Client: Golang program locally
rows, err := db.Query("SELECT * FROM analytics WHERE scenario_id='XXXXX'")

i := 0
for rows.Next() {
	i++
}

Duration: 0.4 seconds

Test 4

The weird thing compared to the Go tests above is that if I do:

  • Server: Google Cloud SQL
  • Client: Golang program in the cluster
rows, err := db.Query("SELECT * FROM analytics WHERE scenario_id='XXXXX'")

Duration: 0.08 seconds

In this case, without calling .Next(), it matches the raw MySQL client query duration from Test 2.

Test 5

  • Server: Google Cloud SQL
  • Client: Golang program in the cluster
result, err := db.Exec("SELECT * FROM analytics WHERE scenario_id='XXXXX'")

Duration: 6 seconds

In this case, .Exec() is not expected to return any result rows, yet it takes the same time as iterating through all the .Next() calls.

"Conclusion"

I'm a bit lost; I can't figure out what could cause such a difference between the raw MySQL client and Go, nor the difference between .Query() and .Exec().

(I have only 9 columns; if I reduce the SELECT to just 1 column, the duration indeed drops.)

If you have any advice, I would really appreciate it 👍

Thank you,

Configuration

Driver version (or git SHA): v1.5.0
Go version: go version go1.14.3 darwin/amd64
Server version: MySQL 5.7
Server OS: (hosted by Google Cloud SQL)

julienschmidt (Member) commented Jun 9, 2020

This might be related to #1053.
Right now the driver transmits all results, regardless of whether you close the rows early.

Query and Exec are just conveniences of database/sql. Under the hood it still executes the exact same query. I guess at least in the Exec case we could handle that smarter in a rather easy and safe way. However, this is a corner case. You should never perform a SELECT on 20k rows if you don't want to actually read them.
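
To illustrate where the time goes under that model, here is a rough sketch (not code from the thread; the DSN is a placeholder and the query is the one from the issue) that times Exec against a Query that is closed immediately. With v1.5.0 both are expected to cost roughly as much as fully iterating, since the full result set is transmitted either way:

package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN; adapt to your own setup.
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/dbname")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const q = "SELECT * FROM analytics WHERE scenario_id='XXXXX'"

	start := time.Now()
	if _, err := db.Exec(q); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Exec:", time.Since(start))

	start = time.Now()
	rows, err := db.Query(q)
	if err != nil {
		log.Fatal(err)
	}
	rows.Close() // closing early still drains the remaining rows from the wire
	fmt.Println("Query + immediate Close:", time.Since(start))
}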

methane (Member) commented Jun 9, 2020

Please measure CPU/network usage too.

My expectation is that you have the compression option enabled in the MySQL CLI.
This driver doesn't support compression yet. (#24)

sneko (Author) commented Jun 9, 2020

Thanks both for your answers 😃

@julienschmidt you say all results are transmitted and that Query and Exec are the same, so I don't understand the duration differences between the following:

  1. Query + for Next() = 6 seconds
  2. Query = 0.08 second
  3. Exec = 6 seconds

If all results are "downloaded" as soon as Query is called, does it mean that "browsing" the downloaded rows with for Next() is what makes things longer?

(Note that my example code is just for testing; in practice I definitely want to use the 20k rows.)

@methane since the query from the "raw" MySQL docker image also takes around 80ms while displaying all the results in the terminal, there really does seem to be a difference, given that the environment and resource capacity are the same as for the Go program in the cluster.

I didn't specify --compress with the MySQL CLI, and from the MySQL docs it seems it's disabled by default.

I didn't see abnormal CPU/memory consumption in the metrics charts, but I will dig into it with pprof & co. within the cluster.

methane (Member) commented Jun 9, 2020

If all results are "downloaded" as soon as Query is called, does it mean that "browsing" the downloaded rows with for Next() is what makes things longer?

Query() doesn't wait for the result data. It only receives the result header (e.g. column names, etc.).
If you don't call rows.Next() or rows.Close(), your program doesn't wait for the result data at all.
That is why Query() without Next() is so fast.
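
To make that concrete, here is a rough sketch (again with a placeholder DSN and the query from the issue) that separates the time until Query() returns from the time spent draining rows with Next():

package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN; adapt to your own setup.
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/dbname")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	start := time.Now()
	rows, err := db.Query("SELECT * FROM analytics WHERE scenario_id='XXXXX'")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("after Query():", time.Since(start)) // only the result header has arrived

	n := 0
	for rows.Next() { // each Next() reads row data from the connection
		n++
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("after draining %d rows: %v\n", n, time.Since(start))
}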

I didn't specify --compress with the MySQL CLI, and from the MySQL docs it seems it's disabled by default.

The --compress option is not the only way to enable compression. A configuration file you didn't notice might enable it silently.
Try mysql --help | grep compress, and mysql --skip-compress -e 'SELECT * FROM analytics WHERE scenario_id="XXXXX"'.

sneko (Author) commented Jun 25, 2020

Just to follow up, and sorry for the late response; I was working on other things.

In the meantime I upgraded the CPU/memory of the database and also of the "client" service, which made it more suitable for production.

I can't say for sure it's solved on my side; it would need more testing. But since this is about feeding multiple frontend charts, we do have a solution: we're going to apply the Douglas-Peucker algorithm to remove meaningless chart points from the database (it keeps the extremes), while keeping the full data points in another dedicated database better suited for that.

Thank you for your answers; I'm marking this as closed since the problem probably doesn't come from your library.
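
For reference, a minimal hypothetical sketch of the Ramer-Douglas-Peucker simplification mentioned above (the actual implementation used is not shown in this thread); points are assumed to be (x, y) float64 pairs and epsilon is the maximum allowed perpendicular deviation:

package main

import (
	"fmt"
	"math"
)

type Point struct{ X, Y float64 }

// perpDist returns the perpendicular distance from p to the line through a and b.
func perpDist(p, a, b Point) float64 {
	dx, dy := b.X-a.X, b.Y-a.Y
	length := math.Hypot(dx, dy)
	if length == 0 {
		return math.Hypot(p.X-a.X, p.Y-a.Y)
	}
	return math.Abs(dy*p.X-dx*p.Y+b.X*a.Y-b.Y*a.X) / length
}

// simplify keeps both endpoints and recursively keeps every point whose
// distance from the current segment exceeds epsilon (so extremes survive).
func simplify(pts []Point, epsilon float64) []Point {
	if len(pts) < 3 {
		return pts
	}
	maxDist, maxIdx := 0.0, 0
	for i := 1; i < len(pts)-1; i++ {
		if d := perpDist(pts[i], pts[0], pts[len(pts)-1]); d > maxDist {
			maxDist, maxIdx = d, i
		}
	}
	if maxDist <= epsilon {
		return []Point{pts[0], pts[len(pts)-1]}
	}
	left := simplify(pts[:maxIdx+1], epsilon)
	right := simplify(pts[maxIdx:], epsilon)
	out := append([]Point{}, left[:len(left)-1]...) // drop the shared middle point once
	return append(out, right...)
}

func main() {
	pts := []Point{{0, 0}, {1, 0.1}, {2, -0.1}, {3, 5}, {4, 6.2}, {5, 7}, {6, 7.9}, {7, 9}}
	fmt.Println(simplify(pts, 1.0)) // keeps endpoints and significant turning points
}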

sneko closed this as completed Jun 25, 2020