Performance Boost

Does your app have to frequently insert/update more than one record of the same type? Or do you execute frequent sequences of independent queries? If so, this article is definitely for you.

I used the word boost in the title because the approach suggested here can improve your app's interaction with the database by more than 10 times, depending on the situation.


NOTE: Version 4.1.0 of the library introduced the helpers namespace, which offers a more generic and flexible way of generating multi-row INSERT and UPDATE queries (as well as single-row ones) than the implementations shown here.


The Problem

At least in 99% of all cases you will find the following to be true:

  • Your query formatting is very fast
  • Your database server is reasonably capable
  • Delivering many queries into the database server sucks

For example, run an insert batch of anywhere from 10 to 100,000 inserts, and you will find that your app spends more than 90% of the time just trying to pump all those queries into the server through the Node.js IO.

And since you typically have to do this inside a transaction, everything has to go through a single IO channel, and that is where you lose most of the time. Add to that the fact that the database server has to handle each query separately.

Here's what this kind of transaction usually looks like:

// You can easily have some 10,000 objects like these:
var users = [{name: 'John', age:23}, {name: 'Mike', age: 30}, {name: 'David', age: 18}];

db.tx(t => {
        var queries = users.map(u => {
            return t.none('INSERT INTO Users(name, age) VALUES(${name}, ${age})', u);
        });
        return t.batch(queries);
    })
    .then(data => {
        // OK
    })
    .catch(error => {
        // Error
    });

Depending on how large your records are and the performance of your system, you can expect 10,000 records to be inserted in 1-3 seconds. But those figures are beside the point here. The important thing is that more than 90% of that time is spent simply trying to push the inserts through the Node.js IO and into your database server.

Most importantly, you don't have to insert 10,000 records to see the problem. Frequently inserting even 10-100 records will create 10 or more times the load it should. And if you are building a scalable system, this becomes a point of concern.

The Solution

What we really need is to minimize the number of interactions between your app and the database server, by concatenating insert values:

INSERT INTO Users(name, age) VALUES ('John', 23), ('Mike', 30), ('David', 18)

How many records you can concatenate like this depends on the size of the records, but I would never go over 10,000 records with this approach. So if you have to insert many more records, you should split them into such concatenated batches and then execute those one by one, as in the sketch below.
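
For example, here is a minimal sketch of splitting a larger data set into concatenated batches and executing them in sequence. Everything here is illustrative: insertInChunks and chunkSize are not part of the library, and pgp / db are assumed to have been initialized in the usual way:

// Inserts records in concatenated batches of up to chunkSize rows each;
// an illustrative sketch, not part of the library:
function insertInChunks(records, chunkSize) {
    var chunks = [];
    for (var i = 0; i < records.length; i += chunkSize) {
        chunks.push(records.slice(i, i + chunkSize));
    }
    // execute one concatenated INSERT per chunk, sequentially:
    return chunks.reduce((promise, c) => promise.then(() => {
        var values = c.map(r => '(' + pgp.as.format('${name}, ${age}', r) + ')').join();
        return db.none('INSERT INTO Users(name, age) VALUES ' + values);
    }), Promise.resolve());
}

insertInChunks(users, 10000)
    .then(() => {
        // all batches have been inserted
    });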

This library has a very flexible query formatting, and with the help of Custom Type Formatting we can easily automate the process of concatenating inserts:

// Concatenates an array of objects or arrays of values, according to the template,
// to use with insert queries. Can be used either as a class type or as a function.
//
// template = formatting template string
// data = array of either objects or arrays of values
function Inserts(template, data) {
    if (!(this instanceof Inserts)) {
        return new Inserts(template, data);
    }
    this.rawType = true; // inject the output of toPostgres() as raw text
    this.toPostgres = () => data.map(d => '(' + pgp.as.format(template, d) + ')').join();
}

NOTE: This approach is obsolete now, replaced by helpers.insert.

With the help of this custom type we can automatically generate inserts using either Named Parameters or regular formatting variables $1, $2, etc. See the examples below.

Using a template with Named Parameters

var users = [{name: 'John', age: 23}, {name: 'Mike', age: 30}, {name: 'David', age: 18}];

var values = new Inserts('${name}, ${age}', users); // using Inserts as a type;

db.none('INSERT INTO Users(name, age) VALUES $1', values)
    .then(data => {
        // OK, all records have been inserted
    })
    .catch(error => {
        // Error, no records inserted
    });

Using a template with regular variables

var users = [['John', 23], ['Mike', 30], ['David', 18]];

// We can use Inserts as an inline function also:

db.none('INSERT INTO Users(name, age) VALUES $1', Inserts('$1, $2', users))
    .then(data => {
        // OK, all records have been inserted
    })
    .catch(error => {
        // Error, no records inserted
    });

It is important to note that when executing an insert like this, you do not need a transaction: a single INSERT command is atomic, so if any one set of insert values fails, no records will be inserted.

NOTE: Version 4.1.0 of the library introduced the helpers namespace, which offers a better (more flexible and reliable) way to generate such queries. See also: Multi-row insert with pg-promise.
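
For reference, here is a minimal example of generating the same multi-row insert via the helpers namespace. The variable names are illustrative, and the table is assumed to be named users:

var pgp = require('pg-promise')(/* initialization options */);
var db = pgp(/* connection details */);

var users = [{name: 'John', age: 23}, {name: 'Mike', age: 30}, {name: 'David', age: 18}];

// reusable set of columns, bound to the target table:
var cs = new pgp.helpers.ColumnSet(['name', 'age'], {table: 'users'});

// generates: INSERT INTO "users"("name","age") VALUES('John',23),('Mike',30),('David',18)
var insert = pgp.helpers.insert(users, cs);

db.none(insert)
    .then(() => {
        // OK, all records have been inserted
    })
    .catch(error => {
        // Error, no records inserted
    });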

Results

Below are some performance results that I got on my local PC. All numbers are given in milliseconds.

Number of inserts   Regular Task   Regular TX   Optimized   Improvement
10,000              1600           850          50          17x - 32x
1,000               185            115          15          8x - 12x
100                 30             20           3           7x - 10x

NOTE: When using the optimized approach (via a single insert), the result didn't change whether it was executed inside a task or transaction.

Even with as few as 10 inserts you can still get a significant performance difference, which matters a lot in a large system. I didn't include numbers for 10 inserts here because they are much harder to measure accurately.

Other Queries

The lesson learned here doesn't end with insert queries. This optimization approach is just as valid for any set of queries that needs to be executed as a single step: the fewer requests you make to the server, the faster it performs, and in a busy system, much faster.

Let's consider a simple transaction with three queries:

db.tx(t => {
        return t.batch([
            t.none('INSERT INTO Users(name, age) VALUES($1, $2)', ['John', 23]),
            t.none('DELETE FROM Log WHERE userName = $1', 'John'),
            t.one('SELECT count(*) FROM Users')
        ]);
    })
    .then(data => {
        // data[0] = null (from method 'none')
        // data[1] = null (from method 'none')
        // data[2].count = number of users;
    })
    .catch(error => {
        // Error
    });

Note that normally you would execute the final SELECT count(*) query chained to the batch, rather than inside it; we did it this way here just for simplicity.
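
For completeness, here is a sketch of that chained variant, with the SELECT executed only after the batch resolves:

db.tx(t => {
        return t.batch([
            t.none('INSERT INTO Users(name, age) VALUES($1, $2)', ['John', 23]),
            t.none('DELETE FROM Log WHERE userName = $1', 'John')
        ])
            .then(() => t.one('SELECT count(*) FROM Users'));
    })
    .then(data => {
        // data.count = number of users;
    })
    .catch(error => {
        // Error
    });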

Below is a reusable helper for concatenating query strings. New applications should use helpers.concat instead.

// Formats and joins queries into a single SQL command;
//
// queries - a mixed array, with two types of elements:
// - a text string for a query without formatting parameters;
// - object {query, values}, where 'values' is optional.
function joinQueries(queries) {
    return queries.map(m => {
        if (typeof m === 'string') {
            // a query string without parameters:
            return m;
        }
        if (m && typeof m === 'object' && 'query' in m) {
            return pgp.as.format(m.query, m.values);
        }
        throw new TypeError('Invalid set of joined queries.');
    }).join(';');
}

Now we can re-implement our transaction like this:

db.tx(t => {
        return t.any(joinQueries([
            {query: 'INSERT INTO Users(name, age) VALUES($1, $2)', values: ['John', 23]},
            {query: 'DELETE FROM Log WHERE userName = $1', values: 'John'},
            'SELECT count(*) FROM Users'
        ]));
    })
    .then(data => {
        // only the last query returns data, that's why data.length = 1
        // data[0].count = number of users;
    })
    .catch(error => {
        // error
    });

Note that here we cannot drop the transaction logic, as we did in the inserts example, because even though we execute just one command, it contains three independent queries.

See helpers.concat that offers a newer and better solution for concatenating queries.
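
For reference, a minimal sketch of the same transaction rewritten with helpers.concat, which accepts the same mix of plain strings and {query, values} objects:

db.tx(t => {
        return t.any(pgp.helpers.concat([
            {query: 'INSERT INTO Users(name, age) VALUES($1, $2)', values: ['John', 23]},
            {query: 'DELETE FROM Log WHERE userName = $1', values: 'John'},
            'SELECT count(*) FROM Users'
        ]));
    })
    .then(data => {
        // data[0].count = number of users;
    })
    .catch(error => {
        // error
    });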

Conclusion

When it comes to building large scalable systems, if you start with one of the ORMs out there, not only are they naturally slower than a query-based framework, but an optimization as highly effective as this one will not even be available to you.

As for the numbers, you might wonder how many queries it takes for this kind of optimization to bring real efficiency. My tests show that as few as just 2 queries can yield an immediate positive result. But you would need to be working on a very demanding application to consider such a tight optimization.

When all is said and done, it is nice to know that when you suddenly need to improve the performance of your database service, you now know one kind of change that will make a real impact.

See also: helpers namespace - a newer and better way of generating insert and update queries.