Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for prepared statement #322

Open
lbe opened this issue Feb 14, 2024 · 11 comments
Open

Add support for prepared statement #322

lbe opened this issue Feb 14, 2024 · 11 comments

Comments

@lbe
Copy link

lbe commented Feb 14, 2024

Is your feature request related to a problem? Please describe.
I would like to use Prepare and PrepareContext with Jet queries to realize the performance benefit.

I have benchmarked hand written SQL using the native features of database/sql using mattn/go-sqlite3. When using a combination of: WAL, Sync=Normal, Transactions and Prepared Statements I see a performance increase of 2.5 - 3 times that of Jet with WAL, Sync=Normal and Transactions.

On one off queries, the performance penalty is insignificant. It is significant when performing batch inserts or updates even when using upsert patterns. I have used pprof to investigate my code and find that almost all time is spent within sqlite CGo regardless of which implementation I use.

Describe the solution you'd like
I would like to have a way to instruct a query built with Jet to create a prepared statement on the first call of a query and then reuse the prepared statement for any subsequent execution. The syntax could be created by extending the existing verbs, for example, JetStmt.Exec to JetStmt.ExecPrepared and JetStmt.ExecPreparedContext.

I wish my Go skills were such that I could prepare a pull request or at least mock up a potential implementation. Unfortunately they are not.

Thank you for Jet as is and thank you for considering this request.

lbe

@go-jet
Copy link
Owner

go-jet commented Feb 14, 2024

Hi @lbe ,

I see a performance increase of 2.5 - 3 times

Interesting. I've never managed to get much performance improvement from prepared statements. Although I mainly tested with postgres. How complex are your queries? And how many times is insert or update called?

Regardless, I think we should add a prepared statements to jet.

@lbe
Copy link
Author

lbe commented Feb 15, 2024

The queries are not very complex. The insert insert/update has about 12 columns, none of them very long. My test case inserts 11K records and updates one column in the same 11K records via an upsert. The select calls each record by the key.

Since you are interested in adding prepared statements. I will create a project on GitHub that is similar to my application to demonstrate the performance differences. I will need to work on this in my spare time. I should have something available within a couple of weeks.

@go-jet
Copy link
Owner

go-jet commented Feb 15, 2024

I will create a project on GitHub that is similar to my application to demonstrate the performance differences.

That would be great. Thanks.

@lbe
Copy link
Author

lbe commented Feb 17, 2024

I was able to strip the needed parts out of my existing code and created a first cut of code to demonstrate the difference of prepared statements. Please review - https://github.com/lbe/go-sql-test

As I state in the readme, something is going bad wrong with update - horrendous performance and it gets worse as the number of records in the database increases. I'm sure that I am doing something stupid here. I will look at and correct.,

The numbers for insert scenario are now closer to 10:1 better. I found some sub-optimal code in my previous 2 - 3:1 improvement.

Please take a look and let me know if this helps. I'm happy to accept any feedback and/or pull requests to make it a better test bed.

lbe

@go-jet
Copy link
Owner

go-jet commented Feb 21, 2024

As I state in the readme, something is going bad wrong with update - horrendous performance and it gets worse as the number of records in the database increases. I'm sure that I am doing something stupid here. I will look at and correct.,

I seems it is because of a trigger.

@lbe
Copy link
Author

lbe commented Feb 25, 2024

Sorry for not responding sooner. Too much work and too little life :(

I will check out the prep-stmt branch. It will probably be next weekend before I can.

Please share how you determined the update performance problem is due to the trigger. When I ran the profile, I could not see what was going on in cGo. I would love to know how to gain that visibility.

Thanks! lbe

@go-jet
Copy link
Owner

go-jet commented Feb 26, 2024

I will check out the prep-stmt branch. It will probably be next weekend before I can.

Final version of prepared statement would be different, but you still can give this version a test.

Please share how you determined the update performance problem is due to the trigger. When I ran the profile, I could not see what was going on in cGo. I would love to know how to gain that visibility.

It was a guess. When I removed the trigger, performance went up.

@lbe
Copy link
Author

lbe commented Mar 3, 2024

I took a look at the prep-stmt branch and saw a 4X improvement on inserts, but it is still only about 40% of the speed of the raw insert. See below;

Run without prepared statement

go run main.go -updateCount=0 -useBoth -useTransaction
2024/03/03 09:06:00 Execution Starting
2024/03/03 09:06:00 dbFilename = ./data/go-sql-test.sqlite
2024/03/03 09:06:00 dsn = ./data/go-sql-test.sqlite?cache=shared&_journal_mode=WAL&_synchronous=NORMAL
2024/03/03 09:06:01 Sort data Starting
2024/03/03 09:06:01 Sort data Ended
2024/03/03 09:06:01 Executing insertWithRawSQLUpsert
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 94311 it/s)
2024/03/03 09:06:01 Commit Start
2024/03/03 09:06:01 Commit Finished
2024/03/03 09:06:01 Executing selectWithRawSQLUpsert
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 49383 it/s)
2024/03/03 09:06:01 Commit Start
2024/03/03 09:06:01 Commit Finished
2024/03/03 09:06:01 Reset database for Jet
2024/03/03 09:06:01 Executing insertWithJet
 100% |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 9673 it/s)
2024/03/03 09:06:02 Commit Start
2024/03/03 09:06:02 Commit Finished
2024/03/03 09:06:02 Executing selectWithJet
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 38788 it/s)
2024/03/03 09:06:02 Commit Start
2024/03/03 09:06:02 Commit Finished
2024/03/03 09:06:02 Execution Completed

Run with Prepared Statement

go run main.go -updateCount=0 -useBoth -useTransaction
2024/03/03 09:02:22 Execution Starting
2024/03/03 09:02:22 dbFilename = ./data/go-sql-test.sqlite
2024/03/03 09:02:22 dsn = ./data/go-sql-test.sqlite?cache=shared&_journal_mode=WAL&_synchronous=NORMAL
2024/03/03 09:02:23 Sort data Starting
2024/03/03 09:02:23 Sort data Ended
2024/03/03 09:02:23 Executing insertWithRawSQLUpsert
 100% |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 103841 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Executing selectWithRawSQLUpsert
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 48054 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Reset database for Jet
2024/03/03 09:02:23 Executing insertWithJet
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 39444 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Executing selectWithJet
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 37583 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Execution Completed

I don't have enough time right now to fully implement the switches to select which test to run in my test app. I quickly hacked in the code for the test above in the prep-stmt branch of my test app. The command line to run it is shown above.

One question, are you only planning on supporting the calls with context? Since database/sql supports both with and without context, I think it would be great if you supported both in a way similar to database/sql.

Thanks for improvement. I look forward to seeing more.

lbe

@go-jet
Copy link
Owner

go-jet commented Mar 7, 2024

Regarding the performance difference. I don't think your code is testing only differences without and with a prepared statement.
A side note, but still important, is that both test cases will use a prepared statement. Standard go SQL library will prepare a statement for each query before execution. So for each Query/Exec, there would be two calls to db, one to prepare and one to execute the query. With an explicitly prepared statement there would be two times fewer calls to db(one to prepare and then one execute for each query). Also, some databases like Postgres will cache the most used prepared statements.

Ordinary requests to some web server with typical time durations between each step look something like this:

User --------------> Server -----------------> Database
          ~10ms     ~200ns         ~1ms         ~1ms

These times are rough estimates and can differentiate significantly depending of the closeness of the user and database server. What is important to note is that time spent on the server is on average 10,000 less than any other step. This means server computation time is negligible and not of huge importance. If this were not the case, everybody would be forced to use C/C++ or Rust for web development.

In your tests, there is no latency between the user and the server. Also, there is no latency between the server and database, because the SQLite database is accessed from the same process as the test(server). The database is empty, meaning the computation on the database would be on par with the test code. This leads to a conclusion what your tests are additionally measuring, is how much faster querying a database using raw string vs constructing the query dynamically with each request.

So, if you want to test the difference with and without explicitly prepared statements, you can compare the time between raw query with and without explicitly prepared statements, or jet query with and without explicitly prepared statements. That would probably give you around 5% speed up. Which shouldn't make any difference for common web server requests. I don't know what type of application you are building, but if those 5% or 40% percent are making a difference, maybe go is not the right language for your project.

In summary, usually, there is no need to use a prepared statement explicitly. There are only a couple of use cases in which I would consider using a prepared statement. One is if the server and database are not close, for instance not in the same AWS region. Using a prepared cached statement would save you one round trip to the database. The second use case is if there is a huge query and I want to make sure the database is using prepared statements every time, to save time on parsing the query each time on db. The third is if you are querying a database in a for loop(for instance to migrate some data).

@go-jet
Copy link
Owner

go-jet commented Mar 7, 2024

I've deleted previous message about prepared statement branch, to avoid confusion.

The prepared statement implementation can be found on stmt-cache branch. To use it, you'll just need to update database opening code.

For instance, currenlty you would have something like this for sqlite:

var db *sql.DB


db, err := sql.Open(driverName, connectionString)

and you need to replace it with:

var db *sqlite.DB

sqlDB, err := sql.Open(driverName, connectionString)

db = sqlite.NewDB(sqlDB).WithStatementsCaching(true)

sqlite.DB is a wrapper around sql.DB and can be used in all the places where you would use sql.DB. For instance:

err := stmt.Query(db, &dest)

This wrapper will intercept all the database queries. For new queries this wrapper will first create prepared statement and cache it, and then it will execute query. Any subsequent query will use cached prepared statement.

If you don't want to use prepared statement caching, you can save *sql.DB also, or you can access underline *sql.DB from *sqlite.DB. For instance:

err := stmt.Query(sqlDB, &dest)

// or

err := stmt.Query(db.DB, &dest)

Prepared statement caching is enabled on transaction created from *sqlite.DB as well.

@lbe
Copy link
Author

lbe commented Apr 13, 2024

Regarding the performance difference. I don't think your code is testing only differences without and with a prepared
...
data).

Sorry for not responding sooner. For some reason, I did not get a notification on GitHub.

I suspect you may be correct that the remaining 60% gap can largely be attributed to the dynamic query generation. I had not originally anticipated much overhead regarding this as I thought the pre-generated code handled this with runtime similar to code written by hand. My original thought was that once the statement was prepared, there would be very little overhead in jet creating the query for each insert, select or update. After reading your response, I realize that I am probably way off here.

Will you be adding the stmt-cache into a release in the near future?

With respect to my application, the cases where I am most concerned about speed are in loops for large initial data loads which include coordination across tables. For those cases, I have concluded that I will need to write the SQL by hand to make sure that it is as efficient at runtime as possible. After running a lot of benchmarks and analyzing in pprof, I have a code base that will reliably load 100MM records across 4 tables in 4 hours vs my original run that took almost 4 days. I will use jet for the rest of the app where where execution speeds are less critical

Lastly, from an earlier post - I was able to get the expected performance on my upserts when I dropped the trigger on the changed timestamp and modified the on conflict section of the insert to handle it. I still don't know why the trigger was so slow - a quirk of sqlite having to do with non-integer primary keys is my guess. Given that this is a bespoke database and will not be used directly outside of my application, I am not sufficiently concerned over this issue to jump into C to investigate.

Cheers, lbe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants