Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getCount(). Perfs issues for multi columns primary key #5314

Closed
jeromeSH26 opened this issue Jan 10, 2020 · 7 comments · Fixed by #6870
Closed

getCount(). Perfs issues for multi columns primary key #5314

jeromeSH26 opened this issue Jan 10, 2020 · 7 comments · Fixed by #6870

Comments

@jeromeSH26
Copy link

jeromeSH26 commented Jan 10, 2020

Issue type:

[X ] question

Database system/driver:
[X ] postgres

Version 0.2.18

I'm facing a perf issue when running getCount()
Why when counting all records of a query, queryBuilder creates a SQL query that concatenates the columns that are part of the primary key ?

Typescript :

const query: SelectQueryBuilder<TEntity> = dbConnection
					.getRepository<TEntity>(entity)
					.createQueryBuilder(entity.name)
					.where(`"${entity.name}"."${id}"= :val`, {
						val: value.toLowerCase(),
					});

				const totalRecords: number = await query.getCount();

generated sql

SELECT COUNT(DISTINCT(
	CONCAT(
		"u"."a", 
		"u"."b",
		"u"."c")
	)
) as "cnt" FROM "sc"."tbl" "u"

That is destroying the performances for a table of 140K records.
Creating an index on CONCAT using is not efficient, hard to maintain with 100 of tables, knowing that for some tables, some primay comumns are timestamp...

Is there a way or a parameter to set to avoid the concat and just running a kind of (select count(1) FROM "sc"."tbl" "u") ?

Rgds

@Destreyf
Copy link

@jeromeSH26

I haven't used any multi-column keys, can you try this?

const query: SelectQueryBuilder<TEntity> = dbConnection
					.getRepository<TEntity>(entity)
					.createQueryBuilder(entity.name)
					.select("COUNT(*)", "count")
					.where(`"${entity.name}"."${id}"= :val`, {
						val: value.toLowerCase(),
					});

const totalRecords: number = await query.getRawOne().then(r => r.count);

This is how i perform my queries, but i'm also using it in a mixed scenario of count/sum/avg so i use the getOneRaw quite a bit.

@jeromeSH26
Copy link
Author

jeromeSH26 commented Jan 11, 2020

Hi Chris,
thanks for your feedback
I tried your solution and that runs 4.5 faster than the native getCount() (147ms Vs 680ms). Il will go with that solution, since I'm using generic resolvers for typegraphql, so the modification to access DB is easy to maintain, as this is centralized in a single function.
However it seems to me more a workaround as a solution. I would expect the native getCount() to be optimizable. But with a COUNT(DISTINCT(CONCAT(....))) query, difficult to optimize as it impacts the design of the DB (need to add specific BTREE indexes) just for a count..

Will close the issue later on, in case some team guys designing typeorm (which is a fantastic tool btw) want to give us some clues

Rgds

This is how I have refactored the query :

const rootQuery = dbConnection
	.getRepository<TEntity>(entity)
	.createQueryBuilder(entity.name);

const countQuery = rootQuery.select("COUNT(1)", "cnt");
const { cnt }: { cnt: number } = await countQuery.getRawOne();

let query = rootQuery
	.cache(withCache)
	.skip(start)
	.take(nbRecords);

	const records: TEntity[] = await query.getMany();

@Destreyf
Copy link

@jeromeSH26 i 100% agree that the getCount function should be flexible/adjustable, unfortunately i'm not a maintainer on this project.

@jeromeSH26
Copy link
Author

Yep, that's why I leave this question opened for a while

@dsbert
Copy link
Contributor

dsbert commented May 1, 2020

Note this also causes incorrect counts to be returned.

Here is an example.

You have two number columns set as primary keys - field1 and field2.

The following two rows will return the incorrect count.

{
	field1: 11,
	field2: 101
	// concat(field1, field2) = '11101'
},
{
	field1: 1,
	field2: 1101
	// concat(field1, field2) = '11101'
},

Count should be 2, but returns 1.

@imnotjames
Copy link
Contributor

Is there a way or a parameter to set to avoid the concat and just running a kind of (select count(1) FROM "sc"."tbl" "u") ?

In cases with joins this won't work. If there are no joins it's possible to be a performance improvement we could apply but we'd need a number of tests to validate behavior.

imnotjames added a commit to imnotjames/typeorm that referenced this issue Oct 8, 2020
currently we use concatenation of multiple primary keys and a
COUNT DISTINCT of that to figure out how many records we have
matched in a query.

however, that fails if the records have keys when the keys
are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`)

the fact that we do a distinct can also be a performance impact
that isn't needed when we aren't doing joins

as such, in MySQL & Postgres we can use the built in counting
of multiple distinct values to resolve some of the issues,
and in other environments we can make it SLIGHTLY better by
adding delimiters between the concatenated values.  It is not
perfect because it technically could run into the same issue
if the delimiters are in the primary keys but it's BETTER
in most cases.

also, in cases where we do not perform any joins we can
short circuit all of this and do a much more performant
`COUNT(1)` operation

fixes typeorm#5989
fixes typeorm#5314
fixes typeorm#4550
imnotjames added a commit to imnotjames/typeorm that referenced this issue Oct 8, 2020
currently we use concatenation of multiple primary keys and a
COUNT DISTINCT of that to figure out how many records we have
matched in a query.

however, that fails if the records have keys when the keys
are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`)

the fact that we do a distinct can also be a performance impact
that isn't needed when we aren't doing joins

as such, in MySQL & Postgres we can use the built in counting
of multiple distinct values to resolve some of the issues,
and in other environments we can make it SLIGHTLY better by
adding delimiters between the concatenated values.  It is not
perfect because it technically could run into the same issue
if the delimiters are in the primary keys but it's BETTER
in most cases.

also, in cases where we do not perform any joins we can
short circuit all of this and do a much more performant
`COUNT(1)` operation

fixes typeorm#5989
fixes typeorm#5314
fixes typeorm#4550
@imnotjames
Copy link
Contributor

Can you confirm that #6870 fixes the issues you're seeing?

pleerock pushed a commit that referenced this issue Oct 9, 2020
currently we use concatenation of multiple primary keys and a
COUNT DISTINCT of that to figure out how many records we have
matched in a query.

however, that fails if the records have keys when the keys
are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`)

the fact that we do a distinct can also be a performance impact
that isn't needed when we aren't doing joins

as such, in MySQL & Postgres we can use the built in counting
of multiple distinct values to resolve some of the issues,
and in other environments we can make it SLIGHTLY better by
adding delimiters between the concatenated values.  It is not
perfect because it technically could run into the same issue
if the delimiters are in the primary keys but it's BETTER
in most cases.

also, in cases where we do not perform any joins we can
short circuit all of this and do a much more performant
`COUNT(1)` operation

fixes #5989
fixes #5314
fixes #4550
zaro pushed a commit to zaro/typeorm that referenced this issue Jan 12, 2021
)

currently we use concatenation of multiple primary keys and a
COUNT DISTINCT of that to figure out how many records we have
matched in a query.

however, that fails if the records have keys when the keys
are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`)

the fact that we do a distinct can also be a performance impact
that isn't needed when we aren't doing joins

as such, in MySQL & Postgres we can use the built in counting
of multiple distinct values to resolve some of the issues,
and in other environments we can make it SLIGHTLY better by
adding delimiters between the concatenated values.  It is not
perfect because it technically could run into the same issue
if the delimiters are in the primary keys but it's BETTER
in most cases.

also, in cases where we do not perform any joins we can
short circuit all of this and do a much more performant
`COUNT(1)` operation

fixes typeorm#5989
fixes typeorm#5314
fixes typeorm#4550
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants