Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrity Check performance #252

Open
ErisDS opened this issue Jul 12, 2021 · 2 comments
Open

Integrity Check performance #252

ErisDS opened this issue Jul 12, 2021 · 2 comments

Comments

@ErisDS
Copy link
Member

ErisDS commented Jul 12, 2021

When first called, knex-migrator runs an "integrity check" to make sure migrations are in a state it can understand.

Problem

This has been implemented in a straightforward way, first reading the versions folder to get all the versions, and then looping over those twice, first to lookup what is in the DB and second to lookup what is in the folder - i.e. a DB read and a file read per version.

knex-migrator/lib/index.js

Lines 1151 to 1238 in cb97928

_.each(folders, function (folder) {
// CASE: versions/1.1-members or versions/2.0-payments
if (folder !== 'init') {
try {
folder = folder.match(/([\d._]+)/)[0];
} catch (err) {
logging.warn('Cannot parse folder name.');
logging.warn('Ignore Folder: ' + folder);
return;
}
}
// CASE:
// if your current version is 1.0 and you add migration scripts for the next version 1.1
// we won't execute them until your current version changes to 1.1 or until you force KM to migrate to it
if (self.currentVersion && !force) {
if (utils.isGreaterThanVersion({smallerVersion: self.currentVersion, greaterVersion: folder})) {
futureVersions.push(folder);
}
}
operations[folder] = self.connection('migrations').where({
version: folder
}).catch(function onMigrationsLookupError(err) {
// CASE: no database selected (database.connection.database="")
if (err.errno === 1046) {
throw new errors.DatabaseIsNotOkError({
message: 'Please define a target database in your configuration.',
help: 'database: {\n\tconnection:\n\t\tdatabase:"database_name"\n\t}\n}\n',
code: 'DB_NOT_INITIALISED'
});
}
// CASE: database does not exist
if (err.errno === 1049) {
throw new errors.DatabaseIsNotOkError({
message: 'Please run knex-migrator init',
code: 'DB_NOT_INITIALISED'
});
}
// CASE: migration table does not exist
if (err.errno === 1 || err.errno === 1146) {
throw new errors.DatabaseIsNotOkError({
message: 'Please run knex-migrator init',
code: 'MIGRATION_TABLE_IS_MISSING'
});
}
throw err;
});
});
return Promise.props(operations)
.then(function (result) {
_.each(result, function (value, version) {
let actual = value.length,
expected = actual;
if (version !== 'init') {
expected = utils.listFiles(path.join(self.migrationPath, subfolder, version)).length;
} else {
expected = utils.listFiles(path.join(self.migrationPath, version)).length;
}
debug('Version ' + version + ' expected: ' + expected);
debug('Version ' + version + ' actual: ' + actual);
toReturn[version] = {
expected: expected,
actual: actual
};
});
// CASE: ensure that either you have to run `migrate --force` or they ran already
if (futureVersions.length) {
_.each(futureVersions, function (futureVersion) {
if (toReturn[futureVersion].actual !== toReturn[futureVersion].expected) {
logging.warn('knex-migrator is skipping ' + futureVersion);
logging.warn('Current version in MigratorConfig.js is smaller then requested version, use --force to proceed!');
logging.warn('Please run `knex-migrator migrate --v ' + futureVersion + ' --force` to proceed!');
delete toReturn[futureVersion];
}
});
}
return toReturn;
});

This code suffers from a performance issue, where the time it takes to run is in the order 0(2n) , where n is the number of versions. 2n rather than just n, as there are 2 loops both doing expensive operations and solving either would be an improvement.

Additionally, the file reads have been implemented using sync methods.

Solution

It should be possible to reduce the number of expensive and synchronous operations, and also get it down to a single loop, by:

  1. Using something like glob to handle reading all the files (it would be ideal to get a nested JSON structure back) asynchronously
  2. doing one request to the db for all migrations grouped by version
  3. Processing the results in a single loop

Fixing this should reduce the time this function takes with large numbers of migrations (i.e. Ghost) significantly

@tpatel
Copy link
Contributor

tpatel commented Jul 22, 2021

Daniel shared some work he did a while ago: ed0c516

daniellockyer added a commit that referenced this issue Oct 8, 2021
refs #252

- the referenced issue describes performance problems with the integrity check code in knex-migrator
- this commit solves half of the problem - 1 DB call per migration minor version folder
- this results in 70+ (and growing) DB calls at the time of writing, with each one going back and forth to the DB... not efficient
- what the code actually needed was a count of how many migrations are currently in the DB, group by minor version - this is very easy to do in SQL and therefore knex
- this commit switches the code around to group all the DB migrations first, and then compares against the folders locally
- this results in a drop of SQL queries from 70+ to 1 🚀
daniellockyer added a commit that referenced this issue Oct 8, 2021
refs #252

- the referenced issue describes performance problems with the integrity check code in knex-migrator
- this commit solves half of the problem - 1 DB call per migration minor version folder
- this results in 70+ (and growing!) DB calls at boot, with each one going back and forth to the DB... not efficient
- what the code actually needed was a count of how many migrations are currently in the DB, group by minor version - this is very easy to do in SQL and therefore knex
- this commit switches the code around to group all the DB migrations first, and then compares against the folders locally
- this results in a drop of SQL queries from 70+ to 1 🚀
daniellockyer added a commit that referenced this issue Oct 8, 2021
refs #252

- the referenced issue describes performance problems with the integrity check code in knex-migrator
- this commit solves half of the problem - 1 DB call per migration minor version folder
- this results in 70+ (and growing!) DB calls at boot, with each one going back and forth to the DB... not efficient
- what the code actually needed was a count of how many migrations are currently in the DB, grouped by version - this is very easy to do in SQL and therefore knex
- this commit switches the code around to group all the DB migrations first, and then compares against the folders locally
- this results in a drop of SQL queries from 70+ to 1 🚀
daniellockyer added a commit that referenced this issue Oct 8, 2021
refs #252

- the referenced issue describes performance problems with the integrity check code in knex-migrator
- this commit solves half of the problem - 1 DB call per migration minor version folder
- this results in 70+ (and growing!) DB calls at boot, with each one going back and forth to the DB... not efficient
- what the code actually needed was a count of how many migrations are currently in the DB, grouped by version - this is very easy to do in SQL and therefore knex
- this commit switches the code around to group all the DB migrations first, and then compares against the folders locally
- this results in a drop of SQL queries from 70+ to 1 🚀
@daniellockyer
Copy link
Member

838fe54 solves half of the problem here - the DB queries. I had the code ready from before and felt they were more important because they go across network.

The file/folder lookup still needs some work but there's some refactoring needed in this area of code first because we're heavily using promise chaining and it's making it painful to work with 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants