Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getCount() not DISTINCT()? #4550

Closed
evan-lin9 opened this issue Aug 7, 2019 · 8 comments · Fixed by #6870
Closed

getCount() not DISTINCT()? #4550

evan-lin9 opened this issue Aug 7, 2019 · 8 comments · Fixed by #6870

Comments

@evan-lin9
Copy link

evan-lin9 commented Aug 7, 2019

Issue type:

[ ] question
[ ] bug report
[x ] feature request
[ ] documentation issue

Database system/driver:

[ ] cordova
[ ] mongodb
[x ] mssql
[ ] mysql / mariadb
[ ] oracle
[ ] postgres
[ ] cockroachdb
[ ] sqlite
[ ] sqljs
[ ] react-native
[ ] expo

TypeORM version:

[ x] latest
[ ] @next
[ ] 0.x.x (or put your version here)

Steps to reproduce or a small repository showing the problem:

now getCount() => select count(DISTINCT(id)), i want select count(id)orselect count(*)

.getCount({ DISTINCT: false })
@Kononnable
Copy link
Contributor

Could you elaborate on your problem? If Id is unique(primary key) what seems to be the problem?

@evan-lin9
Copy link
Author

Id is primary key. My problem is when the amount of data is too large, the query speed is not as good

@dboune
Copy link

dboune commented Oct 17, 2019

I just ran into this as well in a slightly different way with mysql/mariadb on a table with 850k rows, and two primary keys, just wanting to get a total count of rows. This is on my dev machine, so we can expect unusually high query times.

In this case it is running a COUNT(DISTINCT(CONCAT...

Using another SQL tool...
Simple Count on First Primary Key: 1.867 seconds
Count+Distinct: 2.683 seconds (Obviously this would render an undesirable result for this case)
Count+Concat: 2.484 seconds
Count+Distinct+Concat: 22.873 seconds

I realize my case here adds the complexity of two primary keys, and that I could handle this a bit differently. Still, it does highlight how the construction of the count query is not always straightforward, and can currently lead to unnecessary complexity for what looks like a simple case. For a table with far more rows, and I'll have a few of those, the impact would be even greater.

For some cases, it may be possible to simply add an additional index to the primary key.

For my case, I'll go back and build out this count manually, which should resolve my own issue.

@Zippersk
Copy link

Zippersk commented Sep 7, 2020

This also happens on postgres. Is there an option to turn off DISTINCT on count()?

1.8 sec with DISTINCT:

image


0.235 sec without DISTINCT:

image

@palashCItobuz
Copy link

palashCItobuz commented Sep 22, 2020

I have a scenario where the querybuilder query is returning two rows of data with same id, but getCount() is applying a distinct on the query and thus returning 1 as count which is wrong in this context.

"data": {
      "items": [
            {
                "campaign_id": "1",
                "campaign_name": "September Bonanza 2",
                "promoter_join_date": "2020-09-15T09:05:21.425Z",
                "promoter_id": "3",
                "promoter_full_name": "Charles Trenet",
                "promoter_email": "charles.trenet@gmail.com"
            },
            {
                "campaign_id": "1",
                "campaign_name": "September Bonanza 2",
                "promoter_join_date": "2020-09-15T09:05:47.889Z",
                "promoter_id": "4",
                "promoter_full_name": "Rupert Bear",
                "promoter_email": "rupert.bear@gmail.com"
            }
     ],
     "itemsCount": 1,
     "totalPages": 1,
     "currentPage": 1
}

@imnotjames
Copy link
Contributor

Pretty sure distinct on count is because the joins that happen as part of an entity loading. Without it the count would not work as expected.

@palashCItobuz in your case you'd want to update the query builder to be on the promoter - not the campaign.

@imnotjames
Copy link
Contributor

While there can be performance improvements in some cases, this is working as intended as far as I can see.

imnotjames added a commit to imnotjames/typeorm that referenced this issue Oct 8, 2020
currently we use concatenation of multiple primary keys and a
COUNT DISTINCT of that to figure out how many records we have
matched in a query.

however, that fails if the records have keys when the keys
are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`)

the fact that we do a distinct can also be a performance impact
that isn't needed when we aren't doing joins

as such, in MySQL & Postgres we can use the built in counting
of multiple distinct values to resolve some of the issues,
and in other environments we can make it SLIGHTLY better by
adding delimiters between the concatenated values.  It is not
perfect because it technically could run into the same issue
if the delimiters are in the primary keys but it's BETTER
in most cases.

also, in cases where we do not perform any joins we can
short circuit all of this and do a much more performant
`COUNT(1)` operation

fixes typeorm#5989
fixes typeorm#5314
fixes typeorm#4550
imnotjames added a commit to imnotjames/typeorm that referenced this issue Oct 8, 2020
currently we use concatenation of multiple primary keys and a
COUNT DISTINCT of that to figure out how many records we have
matched in a query.

however, that fails if the records have keys when the keys
are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`)

the fact that we do a distinct can also be a performance impact
that isn't needed when we aren't doing joins

as such, in MySQL & Postgres we can use the built in counting
of multiple distinct values to resolve some of the issues,
and in other environments we can make it SLIGHTLY better by
adding delimiters between the concatenated values.  It is not
perfect because it technically could run into the same issue
if the delimiters are in the primary keys but it's BETTER
in most cases.

also, in cases where we do not perform any joins we can
short circuit all of this and do a much more performant
`COUNT(1)` operation

fixes typeorm#5989
fixes typeorm#5314
fixes typeorm#4550
@imnotjames imnotjames reopened this Oct 8, 2020
pleerock pushed a commit that referenced this issue Oct 9, 2020
currently we use concatenation of multiple primary keys and a
COUNT DISTINCT of that to figure out how many records we have
matched in a query.

however, that fails if the records have keys when the keys
are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`)

the fact that we do a distinct can also be a performance impact
that isn't needed when we aren't doing joins

as such, in MySQL & Postgres we can use the built in counting
of multiple distinct values to resolve some of the issues,
and in other environments we can make it SLIGHTLY better by
adding delimiters between the concatenated values.  It is not
perfect because it technically could run into the same issue
if the delimiters are in the primary keys but it's BETTER
in most cases.

also, in cases where we do not perform any joins we can
short circuit all of this and do a much more performant
`COUNT(1)` operation

fixes #5989
fixes #5314
fixes #4550
@Leulz
Copy link

Leulz commented Nov 29, 2020

@imnotjames I was thinking of creating a new issue here for a Feature Request, since I'd like the repository.count() to have the possibility to be not distinct. But then I found this issue and read your comments, and now I'm not so sure.

This feature is wanted by me due to the fact that in Postgres, a COUNT(DISTINCT...)) query can be pretty heavy. There are quite a few references online talking about that, this article being one example. So, for Postgres, it'd be great if we could tell typeorm that we know that the distinct is not needed. The default behaviour would keep being the current one, of course.

Do you think this feature would be useful? I'm currently getting around it by using a QueryBuilder, but it'd be far less verbose if I could just pass an option to count.

zaro pushed a commit to zaro/typeorm that referenced this issue Jan 12, 2021
)

currently we use concatenation of multiple primary keys and a
COUNT DISTINCT of that to figure out how many records we have
matched in a query.

however, that fails if the records have keys when the keys
are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`)

the fact that we do a distinct can also be a performance impact
that isn't needed when we aren't doing joins

as such, in MySQL & Postgres we can use the built in counting
of multiple distinct values to resolve some of the issues,
and in other environments we can make it SLIGHTLY better by
adding delimiters between the concatenated values.  It is not
perfect because it technically could run into the same issue
if the delimiters are in the primary keys but it's BETTER
in most cases.

also, in cases where we do not perform any joins we can
short circuit all of this and do a much more performant
`COUNT(1)` operation

fixes typeorm#5989
fixes typeorm#5314
fixes typeorm#4550
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants