Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Total returns more values than expected (not applying distinct on count) #674

Open
Riya-900 opened this issue May 17, 2023 · 16 comments · May be fixed by #791
Open

Total returns more values than expected (not applying distinct on count) #674

Riya-900 opened this issue May 17, 2023 · 16 comments · May be fixed by #791
Assignees
Labels
question Further information is requested

Comments

@Riya-900
Copy link

Hi,

I have the tables Author and Books.

Author Book
Alice A
Alice B
Bob C

In my FastAPI endpoint, when paginating the select(Author).join(Books) I want to respond with a list that has 2 dicts:

  • Alice (with her two books)
  • Bob (with his one book)

It works great but the total is off. Instead of counting the unique Authors, it counts the total rows of the query.

Is there an option to fix that?

Perhaps relevant:

  • using FastAPI (+pydantic)
  • async PostgreSQL
@Riya-900 Riya-900 changed the title Total returns more values than expected (not applying distinct on count) Total returns more values than expected (not applying distinct on count) [question] May 17, 2023
@Riya-900 Riya-900 changed the title Total returns more values than expected (not applying distinct on count) [question] Total returns more values than expected (not applying distinct on count) May 17, 2023
@uriyyo uriyyo self-assigned this May 17, 2023
@uriyyo uriyyo added the question Further information is requested label May 17, 2023
@uriyyo
Copy link
Owner

uriyyo commented May 17, 2023

Hi @Riya-900,

Could you please change your query to smth like this:

select(Author).options(selectinload(Author.books))

@bnku
Copy link

bnku commented Jul 6, 2023

@Riya-900 Hello! Have you resolved this issue? If yes, then how did you do it?

@bnku
Copy link

bnku commented Jul 6, 2023

I passed the total_group_by parameter to the count_query function and called query.group_by(total_group_by) to ensure the correct calculation of total:

# file: fastapi_pagination/ext/sqlalchemy.py

def count_query(query: Select, *, use_subquery: bool = True, total_group_by=None) -> Select:
    query = query.order_by(None).options(noload("*"))

    if total_group_by:
        query = query.group_by(total_group_by)

    if use_subquery:
        return select(func.count()).select_from(query.subquery())

    return query.with_only_columns(  # noqa: PIE804
        func.count(),
        **{"maintain_column_froms": True},
    )

@uriyyo, perhaps this can help you improve your product.

@uriyyo
Copy link
Owner

uriyyo commented Jul 6, 2023

@bnku I guess the issue is with how you load relationships. I am not sure that we need to use group_by to get count.

@uriyyo
Copy link
Owner

uriyyo commented Jul 14, 2023

@Riya-900 Any updates, does this comment help you?

#674 (comment)

@lqmanh
Copy link

lqmanh commented Aug 4, 2023

@uriyyo In my use case, I really need to use .join(). How can I fix this issue?

@uriyyo
Copy link
Owner

uriyyo commented Aug 4, 2023

Hi @lqmanh,

Could you please show an example?

@lqmanh
Copy link

lqmanh commented Aug 15, 2023

@uriyyo Sorry for my late reply, but I'm doing some kind of 3-way joining like this:

class Status(DeclarativeBase):
    group_id: Mapped[UUID]

class Item(DeclarativeBase):
    status: Mapped[Status] = relationship(...)
    documents: Mapped[list[Document]] = relationship(...)

class Document(DeclarativeBase):
    item_id: Mapped[UUID]
    status_group_id: Mapped[UUID]


stmt = (
    select(Item)
    .join(Item.status)
    .join(Item.documents)
    .where(Status.group_id == Document.status_group_id)  # only include documents of current status group
    .options(
        contains_eager(Item.status),
        contains_eager(Item.documents),
    )
)

UPDATED: Can fastapi-pagination support distinct count, by passing column(s) to paginate()? paginate(..., count_distinct=Item.id) for example.

@uriyyo
Copy link
Owner

uriyyo commented Aug 16, 2023

@lqmanh Could you try to change your query to:

stmt = (
    select(Item)
    .join(Item.status)
    .join(Item.documents)
    .options(
        contains_eager(Item.status),
        selectinload(Item.documents),  # use selectinload() to load all documents
    )
)

@lqmanh
Copy link

lqmanh commented Aug 16, 2023

@uriyyo Then how do I filter only documents associated with current status group?
Ah sorry, misread your reply. Using selectinload or joinedload independently won't work anyway

@uriyyo
Copy link
Owner

uriyyo commented Aug 16, 2023

@lqmanh If you have your relationship configured correctly, it will be done automatically

@uriyyo
Copy link
Owner

uriyyo commented Aug 16, 2023

@lqmanh Could you please config like this:

from __future__ import annotations

from operator import and_
from typing import Any
from uuid import UUID, uuid4

from sqlalchemy import ForeignKey, create_engine, select
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    joinedload,
    mapped_column,
    relationship,
    sessionmaker,
)

from fastapi_pagination import Page, Params, set_page
from fastapi_pagination.ext.sqlalchemy import paginate

engine = create_engine("sqlite:///:memory:", echo=True)


class Base(DeclarativeBase):
    pass


class Status(Base):
    __tablename__ = "statuses"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    item_id: Mapped[UUID] = mapped_column(ForeignKey("items.id"))

    group_id: Mapped[UUID] = mapped_column()

    item_documents: Mapped[list[Document]] = relationship(
        primaryjoin=lambda: and_(
            Document.item_id == Status.item_id,
            Document.status_group_id == Status.group_id,
        ),
        foreign_keys=[item_id, group_id],
        viewonly=True,
        uselist=True,
    )


class Item(Base):
    __tablename__ = "items"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    status: Mapped[Status] = relationship()
    documents: Mapped[list[Document]] = relationship()


class Document(Base):
    __tablename__ = "documents"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    item_id: Mapped[UUID] = mapped_column(ForeignKey("items.id"))
    status_group_id: Mapped[UUID]


Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)

with Session() as session:
    group_id_1 = uuid4()
    group_id_2 = uuid4()

    session.add_all([
        Item(
            status=Status(
                group_id=group_id_1,
            ),
            documents=[
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_2),
            ],
        ),
        Item(
            status=Status(
                group_id=group_id_2,
            ),
            documents=[
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_2),
                Document(status_group_id=group_id_2),
            ],
        ),
    ])
    session.commit()

with set_page(Page[Any]), Session() as session:
    stmt = (
        select(Item)
        .options(joinedload(Item.status).joinedload(Status.item_documents))
    )

    page = paginate(session, stmt, Params())

    print(page)

@uriyyo
Copy link
Owner

uriyyo commented Aug 16, 2023

@lqmanh Or you can try this config:

from __future__ import annotations

from operator import and_
from typing import Any
from uuid import UUID, uuid4

from sqlalchemy import ForeignKey, create_engine, select
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    joinedload,
    mapped_column,
    relationship,
    sessionmaker,
)

from fastapi_pagination import Page, Params, set_page
from fastapi_pagination.ext.sqlalchemy import paginate

engine = create_engine("sqlite:///:memory:", echo=True)


class Base(DeclarativeBase):
    pass


class Status(Base):
    __tablename__ = "statuses"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    item_id: Mapped[UUID] = mapped_column(ForeignKey("items.id"))

    group_id: Mapped[UUID] = mapped_column()


class Item(Base):
    __tablename__ = "items"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    status: Mapped[Status] = relationship()
    documents: Mapped[list[Document]] = relationship()

    status_documents: Mapped[list[Document]] = relationship(
        secondary=Status.__table__,
        primaryjoin=lambda: Status.item_id == Item.id,
        secondaryjoin=lambda: and_(
            Document.item_id == Status.item_id,
            Document.status_group_id == Status.group_id,
        ),
        viewonly=True,
    )


class Document(Base):
    __tablename__ = "documents"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    item_id: Mapped[UUID] = mapped_column(ForeignKey("items.id"))
    status_group_id: Mapped[UUID]


Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)

with Session() as session:
    group_id_1 = uuid4()
    group_id_2 = uuid4()

    session.add_all([
        Item(
            status=Status(
                group_id=group_id_1,
            ),
            documents=[
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_2),
            ],
        ),
        Item(
            status=Status(
                group_id=group_id_2,
            ),
            documents=[
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_2),
                Document(status_group_id=group_id_2),
            ],
        ),
    ])
    session.commit()

with set_page(Page[Any]), Session() as session:
    stmt = (
        select(Item)
        .options(
            joinedload(Item.status),
            joinedload(Item.status_documents),
        )
    )

    page = paginate(session, stmt, Params())

    print(page)

@lqmanh
Copy link

lqmanh commented Aug 17, 2023

@uriyyo Looks nice, but it won't work without setting Status.item_id, which isn't true because Status-Item is one-to-many

@lqmanh
Copy link

lqmanh commented Aug 17, 2023

Actually, I managed to solve this issue by applying distinct() to count_query():

def count_query(query: Select, *, use_subquery: bool = True) -> Select:
    query = query.order_by(None).options(noload("*"))

    if use_subquery:
        return select(func.count()).select_from(query.distinct().subquery())

    return query.distinct().with_only_columns(  # noqa: PIE804
        func.count(),
        **{"maintain_column_froms": True},
    )

Do you think it works for all cases?

@uriyyo
Copy link
Owner

uriyyo commented Nov 25, 2023

Hi all,

Sorry for the long response.

Unfortunately, we can't fix this issue using distinct count cause it will break the way pagination works. You need to play around with the way you fetch relationships, it should help to solve this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants