getCount(). Perfs issues for multi columns primary key #5314

jeromeSH26 · 2020-01-10T14:42:25Z

Issue type:

[X ] question

Database system/driver:
[X ] postgres

Version 0.2.18

I'm facing a performance issue when running getCount().
Why, when counting all records of a query, does queryBuilder create a SQL query that concatenates the columns that are part of the primary key?

TypeScript:

const query: SelectQueryBuilder<TEntity> = dbConnection
	.getRepository<TEntity>(entity)
	.createQueryBuilder(entity.name)
	.where(`"${entity.name}"."${id}" = :val`, {
		val: value.toLowerCase(),
	});

const totalRecords: number = await query.getCount();

Generated SQL:

SELECT COUNT(DISTINCT(
	CONCAT(
		"u"."a",
		"u"."b",
		"u"."c"
	)
)) AS "cnt" FROM "sc"."tbl" "u"

That is destroying performance for a table of 140K records.
Creating an index on the CONCAT expression is not efficient and is hard to maintain with around a hundred tables, especially since for some tables some primary key columns are timestamps.

Is there a way, or a parameter to set, to avoid the concat and just run something like (select count(1) FROM "sc"."tbl" "u")?

Rgds


Destreyf · 2020-01-10T21:33:06Z

@jeromeSH26

I haven't used any multi-column keys, can you try this?

const query: SelectQueryBuilder<TEntity> = dbConnection
	.getRepository<TEntity>(entity)
	.createQueryBuilder(entity.name)
	.select("COUNT(*)", "count")
	.where(`"${entity.name}"."${id}" = :val`, {
		val: value.toLowerCase(),
	});

// Postgres returns COUNT as bigint, which arrives as a string, so convert it.
const totalRecords: number = await query.getRawOne().then(r => Number(r.count));

This is how I perform my queries, but I'm also using it in a mixed scenario of count/sum/avg, so I use getRawOne quite a bit.

jeromeSH26 · 2020-01-11T09:37:30Z

Hi Chris,
thanks for your feedback
I tried your solution and it runs about 4.5 times faster than the native getCount() (147 ms vs. 680 ms). I will go with that solution: I'm using generic resolvers for TypeGraphQL, so the change to the DB access is easy to maintain, as it is centralized in a single function.
However, it seems to me more a workaround than a solution. I would expect the native getCount() to be optimizable, but a COUNT(DISTINCT(CONCAT(....))) query is difficult to optimize, as it impacts the design of the DB (specific BTREE indexes need to be added) just for a count.

I will close the issue later on, in case someone from the team designing TypeORM (which is a fantastic tool, btw) wants to give us some clues.

Rgds

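This is how I have refactored the query:

const rootQuery = dbConnection
	.getRepository<TEntity>(entity)
	.createQueryBuilder(entity.name);

// select() modifies the builder in place, so count on a clone
// to keep rootQuery's entity selection intact for getMany().
const countQuery = rootQuery.clone().select("COUNT(1)", "cnt");
// Postgres returns COUNT as bigint, which arrives as a string, so convert it.
const cnt: number = Number((await countQuery.getRawOne()).cnt);

const query = rootQuery
	.cache(withCache)
	.skip(start)
	.take(nbRecords);

const records: TEntity[] = await query.getMany();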

Destreyf · 2020-01-11T11:38:44Z

@jeromeSH26 I 100% agree that the getCount function should be flexible/adjustable; unfortunately, I'm not a maintainer on this project.

jeromeSH26 · 2020-01-11T14:18:47Z

Yep, that's why I'm leaving this question open for a while.

dsbert · 2020-05-01T16:05:31Z

Note this also causes incorrect counts to be returned.

Here is an example.

You have two number columns set as primary keys - field1 and field2.

The following two rows will return the incorrect count.

{
	field1: 11,
	field2: 101
	// concat(field1, field2) = '11101'
},
{
	field1: 1,
	field2: 1101
	// concat(field1, field2) = '11101'
},

Count should be 2, but returns 1.
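
The collision can be reproduced without a database. The sketch below is only an illustration in plain TypeScript, not TypeORM code: `concatKey`, `tupleKey` and `countDistinct` are hypothetical helpers that mimic `CONCAT(field1, field2)`, a per-column comparison, and `COUNT(DISTINCT ...)` respectively.

```typescript
// Hypothetical row shape matching the example above.
interface Row {
	field1: number;
	field2: number;
}

const rows: Row[] = [
	{ field1: 11, field2: 101 },
	{ field1: 1, field2: 1101 },
];

// Mimics CONCAT(field1, field2): the boundary between columns is lost.
function concatKey(row: Row): string {
	return `${row.field1}${row.field2}`;
}

// Keeps the column boundary by encoding the values as a JSON array.
function tupleKey(row: Row): string {
	return JSON.stringify([row.field1, row.field2]);
}

// In-memory equivalent of COUNT(DISTINCT key).
function countDistinct(items: Row[], key: (row: Row) => string): number {
	return new Set(items.map(key)).size;
}

const concatCount = countDistinct(rows, concatKey);
const tupleCount = countDistinct(rows, tupleKey);
console.log({ concatCount, tupleCount });
```

Both rows produce the same concatenated string, so the concatenated count collapses them into one, while the tuple-style key keeps them apart.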

imnotjames · 2020-10-08T08:23:30Z

> Is there a way or a parameter to set to avoid the concat and just running a kind of (select count(1) FROM "sc"."tbl" "u") ?

In cases with joins this won't work. If there are no joins, it is a possible performance improvement we could apply, but we'd need a number of tests to validate the behavior.
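
To illustrate the join caveat: a join can repeat the same entity once per related row, so a plain row count overstates the number of entities. The following is a minimal in-memory sketch with made-up `User` and `Post` data, not TypeORM code.

```typescript
// Hypothetical data: two users, one of whom has two posts.
interface User {
	id: number;
	name: string;
}
interface Post {
	id: number;
	userId: number;
}

const users: User[] = [
	{ id: 1, name: "a" },
	{ id: 2, name: "b" },
];
const posts: Post[] = [
	{ id: 10, userId: 1 },
	{ id: 11, userId: 1 },
	{ id: 12, userId: 2 },
];

// In-memory equivalent of an INNER JOIN of users and posts on the user id.
const joined: { user: User; post: Post }[] = [];
for (const user of users) {
	for (const post of posts) {
		if (post.userId === user.id) {
			joined.push({ user, post });
		}
	}
}

// COUNT(1) counts joined rows, so user 1 is counted once per post.
const rowCount = joined.length;

// COUNT(DISTINCT user id) counts each matched user once.
const entityCount = new Set(joined.map((r) => r.user.id)).size;

console.log({ rowCount, entityCount });
```

Without the join, every row is exactly one entity, which is why the DISTINCT step is only needed when joins are present.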

Currently we use concatenation of multiple primary keys and a COUNT DISTINCT of that to figure out how many records we have matched in a query.

However, that fails when the keys are ambiguous when concatenated (`"A", "AA"` & `"AA", "A"`). The fact that we do a DISTINCT can also be a performance impact that isn't needed when we aren't doing joins.

As such, in MySQL & Postgres we can use the built-in counting of multiple distinct values to resolve some of the issues, and in other environments we can make it SLIGHTLY better by adding delimiters between the concatenated values. It is not perfect because it technically could run into the same issue if the delimiters are in the primary keys, but it's BETTER in most cases.

Also, in cases where we do not perform any joins, we can short-circuit all of this and do a much more performant `COUNT(1)` operation.

fixes typeorm#5989
fixes typeorm#5314
fixes typeorm#4550
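
The strategy described in that commit message can be sketched as a small function. This is an assumption-laden illustration, not the code from #6870: the `Driver` type, the `CountOptions` shape, the `buildCountExpression` name and the delimiter value are all hypothetical, and the SQL shapes are one reading of the description above.

```typescript
// Hypothetical driver names; TypeORM's real internals differ.
type Driver = "postgres" | "mysql" | "other";

interface CountOptions {
	driver: Driver;
	hasJoins: boolean;
	// Already-escaped primary key column expressions, e.g. '"u"."a"'.
	primaryColumns: string[];
}

// Delimiter chosen for illustration only.
const DELIMITER = "'|;|'";

function buildCountExpression(opts: CountOptions): string {
	const { driver, hasJoins, primaryColumns } = opts;

	// No joins: each row is one entity, so no DISTINCT is needed.
	if (!hasJoins) {
		return "COUNT(1)";
	}
	// Single-column key: nothing to combine.
	if (primaryColumns.length === 1) {
		return `COUNT(DISTINCT(${primaryColumns[0]}))`;
	}
	// Postgres can count distinct row values.
	if (driver === "postgres") {
		return `COUNT(DISTINCT(${primaryColumns.join(", ")}))`;
	}
	// MySQL accepts several expressions after DISTINCT.
	if (driver === "mysql") {
		return `COUNT(DISTINCT ${primaryColumns.join(", ")})`;
	}
	// Fallback: concatenate with a delimiter between the values.
	const separator = `, ${DELIMITER}, `;
	return `COUNT(DISTINCT(CONCAT(${primaryColumns.join(separator)})))`;
}

const cols = ['"u"."a"', '"u"."b"'];
console.log(buildCountExpression({ driver: "postgres", hasJoins: false, primaryColumns: cols }));
console.log(buildCountExpression({ driver: "postgres", hasJoins: true, primaryColumns: cols }));
console.log(buildCountExpression({ driver: "other", hasJoins: true, primaryColumns: cols }));
```

As the message notes, the delimiter fallback can still collide if a key value contains the delimiter itself; it only narrows the problem.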

imnotjames · 2020-10-08T19:35:58Z

Can you confirm that #6870 fixes the issues you're seeing?


dsbert mentioned this issue May 1, 2020

Count returns incorrect results for multiple primary keys #5989

Closed

imnotjames added driver: postgres question labels Oct 6, 2020

imnotjames mentioned this issue Oct 8, 2020

fix: handle count multiple PK & edge cases more gracefully #6870

Merged

pleerock closed this as completed in #6870 Oct 9, 2020
