WIP - Add cursor-based stable ID pagination #357

richmolj · 2021-05-03T14:03:58Z

This code is more for reference than review. The intent of this PR is to increase transparency and collaboration so we can get input from the community on the best direction to head - as with everything, there are tradeoffs.

The Why

This JSON:API "Cursor Pagination" Profile explains one use case. The other is that we're in the process of adding GraphQL support to Graphiti, and cursors are more common in the GraphQL world.

The What

There are really two ways to do cursor-based pagination, and the JSON:API proposal only considers one. We can do this will offset-based cursors, or "stable IDS". This is best explained by this post on Stable Relation Connections in GraphQL-Pro. Vanilla graphql-ruby does offset-based, Pro adds support for stable IDs.

The third thing to consider is omitting cursors and supporting the page[offset] parameter. This is the simplest thing that supports the most use cases, though downsides are noted above.

The How

This PR implements stable IDs. That means we need to tell Graphiti which attributes are unique, incrementing keys. I've defaulted this to id, though it's notable that will need to be overridden if using non-incrementing UUIDs. So, the code:

class EmployeeResource < ApplicationResource
  # Tell Graphiti we are opting-in to cursor pagination
  self.cursor_paginatable = true

  # Tell Graphiti this is a stable ID that can be used as cursor
  sort :created_at, :datetime, cursor: true

  # Override the default cursor from 'id' to 'created_at'
  self.default_cursor = :created_at
end

(NB: One reason to have cursor_paginatable a separate flag from default_cursor is so ApplicationResource can say "all resources should support cursor pagination" but then throw helpful errors if id is a UUID or we elsewise don't have a default stable cursor defined)

This will cause us to render base64 encoded cursors:

{

  data: [
    {
      id: "10",
      type: "employees",
      attributes: { ... },
      meta: { cursor: "abc123" }
    }
  ]
}

Which can be used as offsets with ?page[after]=abc123. This would by default cause the SQL query SELECT * FROM employees WHERE id > 10. So far so good.

The client might also pass a sort. So sort=-id (ID descending) would cause the reverse query ...where id < 10.

A little trickier: the client might pass a sort on an attribute that is not the cursor. This is one reason we want to flag the sorts with cursor: true - so the user can

sort :created_at, cursor: true

Then call

?sort=created_at

Which will then use created_at as the cursor which means

?sort=created_at&page[after]=abc123

Will fire

SELECT * from employees where created_at > ? order by created_at asc

OK but now let's say the user tries to sort on something that ISN'T a
stable ID:

?sort=age

Under-the-hood we will turn this into a multi-sort like age,id. Then when paging after the SQL would be something like:

SELECT * FROM employees
WHERE age > 40 OR (age = 40 AND id > 10)
ORDER BY age ASC, id ASC

Finally, we need to consider datetime. By default we render to the second (e.g. created_at.iso8601) but to be a stable ID we need to render to nanosecond precision (created_at.iso8601(6)). So the serializer block will be honored, even if the attribute is unreadable:

attribute :timestamp, :datetime, readable: false do
  @object.created_at
end

But we override the typecasting logic that would normally call .iso8601 and instead call .iso8601(6). For everything else we omit typecasting entirely since cursors should be referencing "raw" values and there is no need to case to and fro.

There are three downsides to stable ID cursor pagination:

The developer needs to know all this, and has a little more work to do (like specifying cursor: true).
The OR logic above would be specific to the ActiveRecord adapter. Adapters in general do not have an OR concept, and I'm not sure there is a good general-purpose one. This means stable IDs only work when
limiting sort capabilities and/or only using ActiveRecord.
The before cursor is complicated. We need to reverse the direction of the clauses, then re-reverse the records in memory. See SQL Option 2: Double Reverse. This feels like it might be buggy down the line.

For these reasons I propose:

Default to offset-based cursors
Opt-in to stable IDs by specifying .default_cursor
This all goes through a cursor_paginate adapter method (alternative is we can call the relevant adapter methods like filter_integer_gt and introduce an "OR" concept, but this feels shaky). Implementing this will be non-trivial for non-AR datastores.
This means adapters need an "offset" concept, likely in the current paginate method as another yielded arg.

This seems to give us the best balance of ease-of-use (offset-based) and opt-in power (stable-id-based). It also allows us to more easily say "all endpoints implement cursor-based pagination" (so we avoid having to remember to specify cursors in all resources).

But, there are enough moving pieces here this is usually where I stop and get advice from people smarter than me. This is often @wadetandy but also includes basically anyone reading this PR. So, what are your thoughts?

timbogit · 2021-05-03T15:44:41Z

lib/graphiti/adapters/abstract.rb

@@ -255,6 +255,10 @@ def paginate(scope, current_page, per_page)
        raise "you must override #paginate in an adapter subclass"
      end

+      def cursor_paginate(scope, current_page, per_page)


Nitpick: arguments should probably be called after and size?

Whoops, yep good catch - force-pushed 👍

timbogit · 2021-05-03T15:55:55Z

lib/graphiti/scoping/paginate.rb

@@ -1,13 +1,16 @@
 module Graphiti
  class Scoping::Paginate < Scoping::Base
    DEFAULT_PAGE_SIZE = 20
+    VALID_QUERY_PARAMS = [:number, :size, :before, :after]


QQ: I don't see :before implemented here thus far, correct? Is adding that query param for a later stage of this PR?

Yeah, I have some before code locally but removed it before pushing up (and forgot this).

If you have IDs 1-100 and ordering ID asc, you end up saying select * from employees where id < 100 limit 1 which gives you 1 instead of 99. The linked solution is in the PR (SQL query the opposite order then reverse in-memory) but that was a little more than I had time for at the moment - and I think it's an open question if this is "worth it" with the extra complexity, or if we should only implement offset-based for now - so pulled it out.

@wadetandy

This code is more for reference than review. The intent of this PR is to increase transparency and collaboration so we can get input from the community on the best direction to head - as with everything, there are tradeoffs. [This JSON:API "Cursor Pagination" Profile](https://jsonapi.org/profiles/ethanresnick/cursor-pagination/) explains one use case. The other is that we're in the process of adding GraphQL support to Graphiti, and cursors are more common in the GraphQL world. There are really two ways to do cursor-based pagination, and the JSON:API proposal only considers one. We can do this will offset-based cursors, or "stable IDS". This is best explained by [this post on Stable Relation Connections](https://graphql-ruby.org/pagination/stable_relation_connections) in GraphQL-Pro. Vanilla `graphql-ruby` does offset-based, Pro adds support for stable IDs. The third thing to consider is omitting cursors and supporting the `page[offset]` parameter. This is the simplest thing that supports the most use cases, though downsides are noted above. This PR implements stable IDs. That means we need to tell Graphiti which attributes are unique, incrementing keys. I've defaulted this to `id`, though it's notable that will need to be overridden if using non-incrementing UUIDs. So, the code: ```ruby class EmployeeResource < ApplicationResource # Tell Graphiti we are opting-in to cursor pagination self.cursor_paginatable = true # Tell Graphiti this is a stable ID that can be used as cursor sort :created_at, :datetime, cursor: true # Override the default cursor from 'id' to 'created_at' self.default_cursor = :created_at end ``` (*NB: One reason to have `cursor_paginatable` a separate flag from `default_cursor` is so `ApplicationResource` can say "all resources should support cursor pagination" but then throw helpful errors if `id` is a UUID or we elsewise don't have a default stable cursor defined*) This will cause us to render base64 encoded cursors: ```ruby { data: [ { id: "10", type: "employees", attributes: { ... }, meta: { cursor: "abc123" } } ] } ``` Which can be used as offsets with `?page[after]=abc123`. This would by default cause the SQL query `SELECT * FROM employees WHERE id > 10`. So far so good. The client might also pass a sort. So `sort=-id` (ID descending) would cause the reverse query `...where id < 10`. A little trickier: the client might pass a sort on an attribute that is not the cursor. This is one reason we want to flag the sorts with `cursor: true` - so the user can ```ruby sort :created_at, cursor: true ``` Then call ``` ?sort=created_at ``` Which will then use `created_at` as the cursor which means ``` ?sort=created_at&page[after]=abc123 ``` Will fire ``` SELECT * from employees where created_at > ? order by created_at asc ``` OK but now let's say the user tries to sort on something that ISN'T a stable ID: ``` ?sort=age ``` Under-the-hood we will turn this into a multi-sort like `age,id`. Then when paging `after` the SQL would be something like: ``` SELECT * FROM employees WHERE age > 40 OR (age = 40 AND id > 10) ORDER BY age ASC, id ASC ``` Finally, we need to consider `datetime`. By default we render to the second (e.g. `created_at.iso8601`) but to be a stable ID we need to render to nanosecond precision (`created_at.iso8601(6)`). So the serializer block will be honored, even if the attribute is unreadable: ```ruby attribute :timestamp, :datetime, readable: false do @object.created_at end ``` But we override the typecasting logic that would normally call `.iso8601` and instead call `.iso8601(6)`. For everything else *we omit typecasting entirely* since cursors should be referencing "raw" values and there is no need to case to and fro. There are three downsides to stable ID cursor pagination: * The developer needs to know all this, and has a little more work to do (like specifying `cursor: true`). * The `OR` logic above would be specific to the `ActiveRecord` adapter. Adapters in general do not have an `OR` concept, and I'm not sure there is a good general-purpose one. This means stable IDs only work when limiting sort capabilities and/or only using `ActiveRecord`. * The `before` cursor is complicated. We need to reverse the direction of the clauses, then re-reverse the records in memory. See [SQL Option 2: Double Reverse](https://blog.reactioncommerce.com/how-to-implement-graphql-pagination-and-sorting/). This feels like it might be buggy down the line. For these reasons I propose: * Default to offset-based cursors * Opt-in to stable IDs by specifying `.default_cursor` * This all goes through a `cursor_paginate` adapter method (alternative is we can call the relevant adapter methods like `filter_integer_gt` and introduce an "OR" concept, but this feels shaky). Implementing this will be non-trivial for non-AR datastores. * This means adapters need an "offset" concept This seems to give us the best balance of ease-of-use (offset-based) and opt-in power (stable-id-based). It also allows us to more easily say "all endpoints implement cursor-based pagination" (so we avoid having to remember to specify cursors in all resources). But, there are enough moving pieces here this is usually where I stop and get advice from people smarter than me. This is often @wadetandy but also includes basically anyone reading this PR. So, what are your thoughts?

gaustin-salesforce · 2021-05-04T14:46:29Z

From my perspective (i.e., implementing a graphql API that consumes JSON-APIs), and given my team's particular requirements, as long as we have a cursor-like implementation it doesn't matter to us whether it's offset-oriented or stable ids. This PR gives us what we need to remove some expensive/complicated cursor-emulation code.

wadetandy · 2021-05-04T15:43:02Z

as long as we have a cursor-like implementation it doesn't matter to us whether it's offset-oriented or stable ids

This is a really good point from a usability perspective. Maybe it makes sense to build a default implementation which is encoding the existing offset behavior into a cursor, and then each adapter can customize the behavior to provide an actual stable cursor as needed?

richmolj · 2021-05-04T15:53:13Z

Maybe it makes sense to build a default implementation which is encoding the existing offset behavior into a cursor

Yep, this is my plan right now. I think we should do both, but should default to offset-based. If the developer specifies self.default_cursor = :id then they opt-in to stable IDs.

timbogit reviewed May 3, 2021

View reviewed changes

richmolj force-pushed the cursor_pagination branch from e935a2c to caaf203 Compare May 3, 2021 16:07

richmolj mentioned this pull request Apr 14, 2022

Support for arbitrary pagination strategy #112

Closed

jkeen marked this pull request as draft February 27, 2024 20:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP - Add cursor-based stable ID pagination #357

WIP - Add cursor-based stable ID pagination #357

richmolj commented May 3, 2021

timbogit May 3, 2021

richmolj May 3, 2021

timbogit May 3, 2021

richmolj May 3, 2021

gaustin-salesforce commented May 4, 2021

wadetandy commented May 4, 2021

richmolj commented May 4, 2021

WIP - Add cursor-based stable ID pagination #357

Are you sure you want to change the base?

WIP - Add cursor-based stable ID pagination #357

Conversation

richmolj commented May 3, 2021

The Why

The What

The How

timbogit May 3, 2021

Choose a reason for hiding this comment

richmolj May 3, 2021

Choose a reason for hiding this comment

timbogit May 3, 2021

Choose a reason for hiding this comment

richmolj May 3, 2021

Choose a reason for hiding this comment

gaustin-salesforce commented May 4, 2021

wadetandy commented May 4, 2021

richmolj commented May 4, 2021