Isn't synchronous I/O bad? #181

qiulang · 2018-10-10T03:37:08Z

Hi,
Sorry if this question is asked before, I have checked the closed issues but I can only find this one #150
My question is you said

node-sqlite3 uses asynchronous APIs for tasks that are either CPU-bound or serialized. That's not only bad design, but it wastes tons of resources.

But thru my limited experience with sqlite I found most of time the execution is I/O-bound instead of CPU-bound, correct me if I was wrong. If it is I/O-bound isn't that what asynchronous APIs is used for?

JoshuaWise · 2018-10-11T21:13:33Z

There are many legitimate concerns with synchronous filesystem I/O. However, in the case of SQLite3, there are also reasons to not be concerned.

What if I'm using a network drive, or a slow spinning disk drive?

For a Node.js program, this is very rare. Most Node.js programs are web servers, and most server providers use SSDs by default, including AWS, Google Cloud, Digital Ocean, etc. SSDs are the default choice for web servers and relational databases in 2018. If you're not using an SSD, then you're already making a bad decision, regardless of which relational database you choose to use, and regardless of which SQLite3 library you choose. The primary reason to use a spinning disk drive is to spend less money, which is typically only relevant for very large drives (in the terabyte range), and if your database size is in the terabyte range, then you probably shouldn't be using SQLite at all.

What if I'm using an operating system with a poor filesystem cache?

In 2018, nearly all modern filesystems are very efficient. When it comes to reading data, modern filesystem caches have an extremely high cache hit rate. If your database is extremely large (in the terabyte range) then random access patterns will be difficult for the filesystem to deal with, but, as noted previously, if your database size is in the terabyte range then you shouldn't be using SQLite3 at all. On the other hand, with a 60 GB database (for example), most modern filesystems are able to use cache very effectively, even with semi-random access patterns.

Isn't it slow to write data, even with an SDD?

When it comes to writing data, the typical bottleneck is fsync() operations. Fortunately, SQLite3 is very smart (especially in WAL mode), and will only perform fsyncs very rarely, when absolutely necessary. The summary is that SQLite3 does not simply do fs.writeFileSync() every time you insert or update a row—it's much smarter than that. You can read about WAL mode here and here.

What if I'm running queries that return large amounts of data?

This is usually the primary reason to discourage synchronous filesystem I/O. Even if you're using an SSD and a modern operating system, reading very large chunks of data (> 1 MB) will inevitably be slow, which is bad when that blocks your main thread. However, most system architects keep large chunks of data (such as images or videos) in separate files—not part of the relational database itself (only keeping filenames in the database). This is considered good practice even in full-fledged RDBMSs such as PostgreSQL or MySQL. Therefore, most common database reads are in the kilobyte range, or smaller. This type of access pattern is extremely fast on modern operating systems. In fact, it's typically so fast that the overhead of an asynchronous operation (thread management) would actually cause more latency than simply reading synchronously. This is not always the case, but it commonly is.

The nature of SQLite3

Even if you're using an asynchronous SQLite3 library, SQLite3 is only capable of executing one query at a time. You cannot execute queries in parallel within a single process. Typical asynchronous SQLite3 libraries (such as node-sqlite3) actually queue up your queries in the background, and execute them one-by-one, even if you attempt to execute them in parallel. So you have a choice with SQLite3: either execute your queries one-by-one in a background thread, or execute your queries one-by-one in the main thread. The only advantage to using a background thread is to avoid blocking other operations while a query is running. However, with a proper database design (as stated in the README), individual queries should generally complete within a matter of microseconds, even with a large database and complex joins. At that speed, you're not adding any practical latency to network requests or other operations performed by your main thread.

If your queries are unavoidably slow, then an asynchronous SQLite3 library would make things easier for your main thread, but it won't save you from the real bottleneck in that case: your database, which is only able to execute queries one-by-one. If you fall into this category, you should be using a full-fledged RDBMS such as PostgreSQL or MySQL. In summary, if your queries are slow, SQLite3 is not for you, regardless of which library you choose.

A synchronous SQLite3 library avoids the overhead of thread management, which is great. But in addition to that, it also enables:

transactions (you don't risk other code injecting queries into your asynchronous transaction)
user-defined functions, including aggregate and window functions
prepared statements that are re-usable (which tremendously improves performance)

I hope this helps to understand why one might choose to use a synchronous SQLite3 library in Node.js.

JoshuaWise · 2018-10-11T21:26:13Z

node-sqlite3 uses asynchronous APIs for tasks that are either CPU-bound or serialized. That's not only bad design, but it wastes tons of resources.

thru my limited experience with sqlite I found most of time the execution is I/O-bound instead of CPU-bound, correct me if I was wrong

node-sqlite3 uses an asynchronous API for preparing statements, binding parameters, reseting prepared statements, and destroying prepared statements, none of which perform any filesystem I/O. As for the operations that do perform filesystem I/O (i.e., executing queries), these operations are always serialized (executed one-by-one).

qiulang · 2018-10-12T03:30:21Z

wow I didn't expect such a detailed respond! Thanks! I am pushing hard to let my team switch from node-sqlite3 to better-sqlite 👍

takase1121 · 2019-03-27T15:14:16Z

This issue should be pinned. Its very helpful and explains a lot of things about the design decisions

JoshuaWise · 2019-04-04T13:25:53Z

Leaving this issue open as it's a common question people have.

jasonsparc · 2019-05-01T16:19:50Z

So does that mean I shouldn't use better-sqlite3 for my Electron App?

I'm building a desktop application to be consumed by the average user, and very likely, not everyone who will install my software would be owning a decent hard disk drive, let alone, an SSD.

JoshuaWise · 2019-05-01T16:29:56Z

@jasonsparc Although desktop apps can't control the environment that they run in, they only need to serve one user, instead of thousands or millions. So you'll be fine. Many people/organizations use better-sqlite3 for Electron apps, and are happy with the results.

jasonsparc · 2019-05-01T18:49:40Z

@JoshuaWise I really want to use better-sqlite3 (especially with its memory management features, as being a huge plus!). But, what prevents me from using it is that, I have just found out that it is compiled with SQLITE_THREADSAFE=0 by default.

To describe what I'm trying to accomplish:

I was considering a synchronous alternative for sqlite3 npm package so that I have more control on what background thread my queries would run, and thereby having more control on performance tunings. Also, not only that I would perform multiple queries at the same time, I would also be running an automatic peer-to-peer syncing between devices using my electron app, and also with a flutter app (for mobile).

If it were compiled instead with SQLITE_THREADSAFE=1, then I wouldn't have to worry about making my own serialized queue in the JS side on my worker thread (nor should I worry about the safety of my system's design), saving me some precious development time. I want my app to be fast, but I also want it to be maintainable enough.

I also think that SQLITE_THREADSAFE=1 should've been the default for better-sqlite3, since people should be able to toggle it off at runtime anyway. I also think that it could've have been too error-prone for users of better-sqlite3 if they are using them in an electron app with worker threads, that is, at least for people who skipped reading the docs, not realizing the dangerous implications that SQLITE_THREADSAFE=0 is set by default. Maybe an updated README file would help.

But, I'm also looking forward to the resolution for #237. Hoping that you would eventually decide to use SQLITE_THREADSAFE=1 instead—also because node.js is no longer just for single-threaded applications, and as stated by SQLite docs:

2. Recommended Compile-time Options

… Not all of these compile-time options are usable by every application. For example, the SQLITE_THREADSAFE=0 option is only usable by applications that never access SQLite from more than one thread at a time. …

JoshuaWise · 2019-05-01T18:58:05Z

@jasonsparc See my comment in #237 (comment).

Worker Threads in Node.js are a very new feature. Since better-sqlite3 is a native addon, supporting them would be more complicated than simply enabling SQLITE_THREADSAFE=1. It's something I'm interested in supporting in the future.

jasonsparc · 2019-05-01T19:01:29Z

@JoshuaWise Cool! Thanks for the fast reply.

I didn't know about that. I'll be looking forward to that then! :)

JoshuaWise added the question label Oct 11, 2018

JoshuaWise added sticky and removed question labels Oct 11, 2018

JoshuaWise changed the title ~~Question: Isn't sqlite execution I/O-bound instead of CPU-bound most of time ?~~ Isn't synchronous I/O bad? Oct 11, 2018

JoshuaWise mentioned this issue Mar 11, 2019

Better-SQLite-Pool #234

Closed

qiulang closed this as completed Apr 4, 2019

JoshuaWise reopened this Apr 4, 2019

lolopinto mentioned this issue Jun 14, 2021

TypeScript SQLite support lolopinto/ent#366

Merged

JoshuaWise mentioned this issue Dec 19, 2022

Convince me to use better-sqlite3 #262

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Isn't synchronous I/O bad? #181

Isn't synchronous I/O bad? #181

qiulang commented Oct 10, 2018

JoshuaWise commented Oct 11, 2018 •

edited

JoshuaWise commented Oct 11, 2018 •

edited

qiulang commented Oct 12, 2018 •

edited

takase1121 commented Mar 27, 2019

JoshuaWise commented Apr 4, 2019

jasonsparc commented May 1, 2019 •

edited

JoshuaWise commented May 1, 2019 •

edited

jasonsparc commented May 1, 2019 •

edited by JoshuaWise

2. Recommended Compile-time Options

JoshuaWise commented May 1, 2019

jasonsparc commented May 1, 2019 •

edited

Isn't synchronous I/O bad? #181

Isn't synchronous I/O bad? #181

Comments

qiulang commented Oct 10, 2018

JoshuaWise commented Oct 11, 2018 • edited

What if I'm using a network drive, or a slow spinning disk drive?

What if I'm using an operating system with a poor filesystem cache?

Isn't it slow to write data, even with an SDD?

What if I'm running queries that return large amounts of data?

The nature of SQLite3

JoshuaWise commented Oct 11, 2018 • edited

qiulang commented Oct 12, 2018 • edited

takase1121 commented Mar 27, 2019

JoshuaWise commented Apr 4, 2019

jasonsparc commented May 1, 2019 • edited

JoshuaWise commented May 1, 2019 • edited

jasonsparc commented May 1, 2019 • edited by JoshuaWise

2. Recommended Compile-time Options

JoshuaWise commented May 1, 2019

jasonsparc commented May 1, 2019 • edited

JoshuaWise commented Oct 11, 2018 •

edited

JoshuaWise commented Oct 11, 2018 •

edited

qiulang commented Oct 12, 2018 •

edited

jasonsparc commented May 1, 2019 •

edited

JoshuaWise commented May 1, 2019 •

edited

jasonsparc commented May 1, 2019 •

edited by JoshuaWise

jasonsparc commented May 1, 2019 •

edited