Query with multiple COUNT clauses returns incorrect result depending on order in projection #1312

Open
pmaria opened this issue Mar 12, 2024 · 4 comments

Comments

@pmaria

pmaria commented Mar 12, 2024

Issue type:

  • 🐛 Bug

Description:

Given the following input

@prefix ex: <http://example.org/> .

<http://foo.org/id/graph/foo> {
  <http://foo.org/id/object/foo>
    ex:foo ex:Foo ;
    ex:bar [] ;
    ex:baz "baz1", "baz2", "baz3" .
}

running this query:

SELECT (COUNT(DISTINCT ?s) AS ?subjects) (COUNT(DISTINCT ?o) as ?objects) ?p
FROM <http://foo.org/id/graph/foo>
{
  ?s ?p ?o .
}
GROUP BY ?p

results in

[screenshot of the query results; image not included]

while running this query (object count switched with subject count in projection):

SELECT (COUNT(DISTINCT ?o) as ?objects) (COUNT(DISTINCT ?s) AS ?subjects) ?p
FROM <http://foo.org/id/graph/foo>
{
  ?s ?p ?o .
}
GROUP BY ?p

results in the expected

[screenshot of the query results; image not included]
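
Since the screenshots are not reproduced here, the per-predicate counts can be worked out by hand from the five triples in the input. The following is a minimal sketch, not Comunica code: it groups the triples by predicate and counts distinct subjects and objects per group, which is how I read the semantics of GROUP BY ?p with COUNT(DISTINCT ...). The blank node label `_:b0` and the function name `countDistinctPerPredicate` are assumptions made for illustration.

```typescript
// Triples from the named graph in the report. The blank node "[]" is given
// the hypothetical label "_:b0"; prefixed names are kept as plain strings.
type Triple = { s: string; p: string; o: string };

const FOO = "http://foo.org/id/object/foo";
const triples: Triple[] = [
  { s: FOO, p: "ex:foo", o: "ex:Foo" },
  { s: FOO, p: "ex:bar", o: "_:b0" },
  { s: FOO, p: "ex:baz", o: '"baz1"' },
  { s: FOO, p: "ex:baz", o: '"baz2"' },
  { s: FOO, p: "ex:baz", o: '"baz3"' },
];

interface GroupCounts {
  subjects: number;
  objects: number;
}

// GROUP BY ?p, then COUNT(DISTINCT ?s) and COUNT(DISTINCT ?o) per group.
// Each aggregate keeps its own set, so the order in which the two counts
// are projected cannot influence either value.
function countDistinctPerPredicate(data: Triple[]): Map<string, GroupCounts> {
  const groups = new Map<string, { s: Set<string>; o: Set<string> }>();
  for (const t of data) {
    let group = groups.get(t.p);
    if (group === undefined) {
      group = { s: new Set<string>(), o: new Set<string>() };
      groups.set(t.p, group);
    }
    group.s.add(t.s);
    group.o.add(t.o);
  }
  const result = new Map<string, GroupCounts>();
  groups.forEach((group, p) => {
    result.set(p, { subjects: group.s.size, objects: group.o.size });
  });
  return result;
}

const expectedCounts = countDistinctPerPredicate(triples);
expectedCounts.forEach((counts, p) => {
  console.log(p, counts.subjects, counts.objects);
});
```

Under these assumptions every predicate has exactly one distinct subject; ex:foo and ex:bar have one distinct object each, and ex:baz has three. Both queries should report these numbers, whichever COUNT comes first in the projection.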


Environment:

software          version
Comunica Engine   2.10.2
node              v21.6.1
npm               10.2.4
yarn              Yarn is unavailable
Operating System  linux (Linux 5.15.133.1-microsoft-standard-WSL2)

NOTE: I'm running npx comunica-sparql-file-http on the input file above.

I've tried this on several other SPARQL implementations and these do not show this behavior.

Crash log:

none


Thanks for reporting!

@rubensworks rubensworks added this to Triage in Maintenance Mar 12, 2024
@rubensworks rubensworks moved this from Triage to To Do (prio:high) in Maintenance Apr 19, 2024
@simonvbrae
Contributor

As discussed with @rubensworks I will work on this issue.

@rubensworks rubensworks moved this from To Do (prio:high) to In Progress in Maintenance Apr 22, 2024
@simonvbrae
Contributor

@jitsedesmet I was told to ping you for my questions related to the expressions code.

One thing that stood out to me in the case where the counts are incorrect is that startTerm equals the subject of the current binding, rather than the predicate which we are supposed to be counting.

My question is: what does startTerm mean?

@jitsedesmet
Member

Off the top of my head: the start term is basically the value you'd return when put was never called.
So for the count aggregator this would be 0, but the sample aggregator does not have a start term (a sample of nothing is simply an error, iirc).

I hope that helps. (I did not really read the issue, so if it didn't, just ping me again :) )
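
To make the description above concrete, here is a minimal sketch of the idea. It is not Comunica's actual aggregator code: the `Aggregator` interface and both class names are hypothetical, and the behavior of the sample case follows the "iirc" in the comment above, so treat it as an assumption.

```typescript
// Hypothetical aggregator shape, for illustration only.
interface Aggregator<T> {
  put(value: string): void; // called once per binding in the group
  result(): T; // called once the group is complete
}

// COUNT has a start value: if put() is never called, result() returns 0.
class CountAggregator implements Aggregator<number> {
  private count = 0; // the start value

  put(_value: string): void {
    this.count += 1;
  }

  result(): number {
    return this.count;
  }
}

// SAMPLE has no start value: with no input there is nothing to return,
// so result() throws instead.
class SampleAggregator implements Aggregator<string> {
  private sample: string | undefined = undefined;

  put(value: string): void {
    if (this.sample === undefined) {
      this.sample = value;
    }
  }

  result(): string {
    if (this.sample === undefined) {
      throw new Error("SAMPLE over an empty group has no value");
    }
    return this.sample;
  }
}

const emptyCount = new CountAggregator();
console.log(emptyCount.result()); // start value, since put() was never called
```

In this reading the start value belongs to the aggregate function and does not depend on any binding, which is why a startTerm equal to the subject of the current binding looks suspicious.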
