getCount() not DISTINCT()? #4550

evan-lin9 · 2019-08-07T10:18:16Z

Issue type:

[ ] question
[ ] bug report
[x ] feature request
[ ] documentation issue

Database system/driver:

[ ] cordova
[ ] mongodb
[x ] mssql
[ ] mysql / mariadb
[ ] oracle
[ ] postgres
[ ] cockroachdb
[ ] sqlite
[ ] sqljs
[ ] react-native
[ ] expo

TypeORM version:

[ x] latest
[ ] @next
[ ] 0.x.x (or put your version here)

Steps to reproduce or a small repository showing the problem:

now getCount() => select count(DISTINCT(id)), i want select count(id)orselect count(*)

.getCount({ DISTINCT: false })

The text was updated successfully, but these errors were encountered:

Kononnable · 2019-08-31T19:27:16Z

Could you elaborate on your problem? If Id is unique(primary key) what seems to be the problem?

evan-lin9 · 2019-09-02T03:05:23Z

Id is primary key. My problem is when the amount of data is too large, the query speed is not as good

dboune · 2019-10-17T04:01:28Z

I just ran into this as well in a slightly different way with mysql/mariadb on a table with 850k rows, and two primary keys, just wanting to get a total count of rows. This is on my dev machine, so we can expect unusually high query times.

In this case it is running a COUNT(DISTINCT(CONCAT...

Using another SQL tool...
Simple Count on First Primary Key: 1.867 seconds
Count+Distinct: 2.683 seconds (Obviously this would render an undesirable result for this case)
Count+Concat: 2.484 seconds
Count+Distinct+Concat: 22.873 seconds

I realize my case here adds the complexity of two primary keys, and that I could handle this a bit differently. Still, it does highlight how the construction of the count query is not always straightforward, and can currently lead to unnecessary complexity for what looks like a simple case. For a table with far more rows, and I'll have a few of those, the impact would be even greater.

For some cases, it may be possible to simply add an additional index to the primary key.

For my case, I'll go back and build out this count manually, which should resolve my own issue.

Zippersk · 2020-09-07T22:01:42Z

This also happens on postgres. Is there an option to turn off DISTINCT on count()?

1.8 sec with DISTINCT:

0.235 sec without DISTINCT:

palashCItobuz · 2020-09-22T04:10:20Z

I have a scenario where the querybuilder query is returning two rows of data with same id, but getCount() is applying a distinct on the query and thus returning 1 as count which is wrong in this context.

"data": {
      "items": [
            {
                "campaign_id": "1",
                "campaign_name": "September Bonanza 2",
                "promoter_join_date": "2020-09-15T09:05:21.425Z",
                "promoter_id": "3",
                "promoter_full_name": "Charles Trenet",
                "promoter_email": "charles.trenet@gmail.com"
            },
            {
                "campaign_id": "1",
                "campaign_name": "September Bonanza 2",
                "promoter_join_date": "2020-09-15T09:05:47.889Z",
                "promoter_id": "4",
                "promoter_full_name": "Rupert Bear",
                "promoter_email": "rupert.bear@gmail.com"
            }
     ],
     "itemsCount": 1,
     "totalPages": 1,
     "currentPage": 1
}

imnotjames · 2020-10-04T03:53:41Z

Pretty sure distinct on count is because the joins that happen as part of an entity loading. Without it the count would not work as expected.

@palashCItobuz in your case you'd want to update the query builder to be on the promoter - not the campaign.

imnotjames · 2020-10-04T03:54:45Z

While there can be performance improvements in some cases, this is working as intended as far as I can see.

currently we use concatenation of multiple primary keys and a COUNT DISTINCT of that to figure out how many records we have matched in a query. however, that fails if the records have keys when the keys are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`) the fact that we do a distinct can also be a performance impact that isn't needed when we aren't doing joins as such, in MySQL & Postgres we can use the built in counting of multiple distinct values to resolve some of the issues, and in other environments we can make it SLIGHTLY better by adding delimiters between the concatenated values. It is not perfect because it technically could run into the same issue if the delimiters are in the primary keys but it's BETTER in most cases. also, in cases where we do not perform any joins we can short circuit all of this and do a much more performant `COUNT(1)` operation fixes typeorm#5989 fixes typeorm#5314 fixes typeorm#4550

currently we use concatenation of multiple primary keys and a COUNT DISTINCT of that to figure out how many records we have matched in a query. however, that fails if the records have keys when the keys are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`) the fact that we do a distinct can also be a performance impact that isn't needed when we aren't doing joins as such, in MySQL & Postgres we can use the built in counting of multiple distinct values to resolve some of the issues, and in other environments we can make it SLIGHTLY better by adding delimiters between the concatenated values. It is not perfect because it technically could run into the same issue if the delimiters are in the primary keys but it's BETTER in most cases. also, in cases where we do not perform any joins we can short circuit all of this and do a much more performant `COUNT(1)` operation fixes #5989 fixes #5314 fixes #4550

Leulz · 2020-11-29T03:11:54Z

@imnotjames I was thinking of creating a new issue here for a Feature Request, since I'd like the repository.count() to have the possibility to be not distinct. But then I found this issue and read your comments, and now I'm not so sure.

This feature is wanted by me due to the fact that in Postgres, a COUNT(DISTINCT...)) query can be pretty heavy. There are quite a few references online talking about that, this article being one example. So, for Postgres, it'd be great if we could tell typeorm that we know that the distinct is not needed. The default behaviour would keep being the current one, of course.

Do you think this feature would be useful? I'm currently getting around it by using a QueryBuilder, but it'd be far less verbose if I could just pass an option to count.

) currently we use concatenation of multiple primary keys and a COUNT DISTINCT of that to figure out how many records we have matched in a query. however, that fails if the records have keys when the keys are ambigious when concatenated (`"A", "AA"` & `"AA", "A"`) the fact that we do a distinct can also be a performance impact that isn't needed when we aren't doing joins as such, in MySQL & Postgres we can use the built in counting of multiple distinct values to resolve some of the issues, and in other environments we can make it SLIGHTLY better by adding delimiters between the concatenated values. It is not perfect because it technically could run into the same issue if the delimiters are in the primary keys but it's BETTER in most cases. also, in cases where we do not perform any joins we can short circuit all of this and do a much more performant `COUNT(1)` operation fixes typeorm#5989 fixes typeorm#5314 fixes typeorm#4550

Kononnable added the needs more info label Aug 31, 2019

dsbert mentioned this issue May 1, 2020

Count returns incorrect results for multiple primary keys #5989

Closed

imnotjames closed this as completed Oct 4, 2020

imnotjames mentioned this issue Oct 8, 2020

fix: handle count multiple PK & edge cases more gracefully #6870

Merged

imnotjames reopened this Oct 8, 2020

imnotjames closed this as completed Oct 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

getCount() not DISTINCT()? #4550

getCount() not DISTINCT()? #4550

evan-lin9 commented Aug 7, 2019 •

edited

Kononnable commented Aug 31, 2019

evan-lin9 commented Sep 2, 2019

dboune commented Oct 17, 2019

Zippersk commented Sep 7, 2020 •

edited

palashCItobuz commented Sep 22, 2020 •

edited

imnotjames commented Oct 4, 2020

imnotjames commented Oct 4, 2020

Leulz commented Nov 29, 2020

getCount() not DISTINCT()? #4550

getCount() not DISTINCT()? #4550

Comments

evan-lin9 commented Aug 7, 2019 • edited

Kononnable commented Aug 31, 2019

evan-lin9 commented Sep 2, 2019

dboune commented Oct 17, 2019

Zippersk commented Sep 7, 2020 • edited

1.8 sec with DISTINCT:

0.235 sec without DISTINCT:

palashCItobuz commented Sep 22, 2020 • edited

imnotjames commented Oct 4, 2020

imnotjames commented Oct 4, 2020

Leulz commented Nov 29, 2020

evan-lin9 commented Aug 7, 2019 •

edited

Zippersk commented Sep 7, 2020 •

edited

palashCItobuz commented Sep 22, 2020 •

edited