Using @reference for 1:n relationships #295

Open
phillipcurl opened this issue Jun 13, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@phillipcurl

phillipcurl commented Jun 13, 2023

Hi cruddl team! Thank you for all of your great work on this project. I apologize in advance for the length of this issue, but want to make sure I'm providing as much context as possible.

We currently use the @relation directive for a number of 1:n relationships. This has given us a lot of flexibility in our API, since we can fetch entities in both directions of the relationship. As our collections have grown, though, we're seeing slow performance when those edges are used, particularly in filtering.

Example of current schema:

type House @rootEntity {
  description: String
  people: [Person] @relation(inverseOf: "house")
}

type Person @rootEntity {
  name: String
  house: House @relation
}

We really like that this allows us to do both of the following queries:

query {
  allHouses {
    id
    description
    people {
      id
      name
    }
  }
  allPeople {
    id
    house {
      id
      description
    }
  }
}

Where this has led to slow performance on larger collections is that querying for allPeople associated with a particular House.id generates the following AQL:

WITH houses
RETURN {
  "allPeople": (
    FOR v_person1
    IN people
    FILTER (FIRST((
      FOR v_node1 // this nested traversal is costly 
      IN OUTBOUND v_person1 people_house
      FILTER v_node1 != null
      RETURN v_node1
    ))._key IN ["some_id"])
    RETURN {
      "id": v_person1._key,
      "name": v_person1.`name`
    }
  )
}

The filtering in that query has been inefficient on large collections because of the graph traversal, regardless of how optimized the indices are. We know that we can solve this by leveraging @reference and bypassing the edge traversal when filtering in those scenarios. Updating the schema to:

type HouseWithRef @rootEntity {
  description: String
  uuid: String @key
}

type PersonWithRef @rootEntity {
  name: String
  houseUuid: String
  house: HouseWithRef @reference(keyField: "houseUuid")
}

and running a query like:

query AllPeople($filter: PersonWithRefFilter) {
  allPersonWithRefs(filter: $filter) {
    id
    name
    house {
      id
      description
    }
  }
}

{
  "filter": {
    "houseUuid_in": ["some_id", "another_id"]
  }
}

generates the following AQL, which is much more efficient:

RETURN {
  "allPersonWithRefs": (
    FOR v_personWithRef1
    IN personWithRefs
    FILTER (v_personWithRef1.`houseUuid` IN ["some_id","another_id"])
    LET v_houseWithRef1 = (IS_NULL(v_personWithRef1.`houseUuid`) ? null : FIRST((
      FOR v_house1
      IN houseWithRefs
      FILTER ((v_house1.`uuid` > NULL) && (v_house1.`uuid` == v_personWithRef1.`houseUuid`))
      LIMIT 1
      RETURN v_house1
    )))
    RETURN {
      "id": v_personWithRef1._key,
      "name": v_personWithRef1.`name`,
      "house": (IS_NULL(v_houseWithRef1) ? null : {
        "id": v_houseWithRef1._key,
        "description": v_houseWithRef1.`description`
      })
    }
  )
}

The downside to doing this is that you lose the queryability in both directions that @relation provides (e.g. querying for people in a house). So finally, my question: is there a way in cruddl to use @reference and still generate the resolvers for the opposite direction of the reference, essentially a @reference-based version of @relation(inverseOf: "")? I know the modeling docs mention "In contrast to relations, it is however not possible to navigate from the referenced object to the referencing object", but I wanted to check with you all.

If there's not a way to do this in cruddl, have there been any discussions on adding support for it? This can be done in pure AQL and would really help improve the performance when using cruddl for things that are a 1:n relationship. From the Arango docs:

ArangoDB does not require you to store your data in graph structures with edges and vertices, you can also decide to embed attributes such as which groups a user is part of, or _ids of documents in another document instead of connecting the documents with edges. It can be a meaningful performance optimization for 1:n relationships, if your data is not focused on relations and you don’t need graph traversal with varying depth.
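As a rough illustration of the "pure AQL" approach mentioned above, here is a minimal sketch of a query that resolves the people of each house with a plain subquery on the key field instead of an edge traversal. The helper `buildInverseReferenceAql`, its parameter names, and the variable names `v_house` / `v_person` are hypothetical and are not part of cruddl; this is also not the AQL cruddl generates, just one way such a lookup could be written.

```typescript
// Hypothetical helper (not a cruddl API): builds an AQL query that resolves the
// "inverse" side of a key-based reference with a subquery, without using edges.
interface InverseReferenceSpec {
  parentCollection: string; // e.g. "houseWithRefs"
  parentKeyField: string; // e.g. "uuid"
  childCollection: string; // e.g. "personWithRefs"
  childForeignKeyField: string; // e.g. "houseUuid"
}

// Assumption: all names come from a trusted schema, so they are interpolated as-is.
function buildInverseReferenceAql(spec: InverseReferenceSpec): string {
  const fk = `v_person.${spec.childForeignKeyField}`;
  const key = `v_house.${spec.parentKeyField}`;
  return [
    `FOR v_house IN ${spec.parentCollection}`,
    `  LET v_people = (`,
    `    FOR v_person IN ${spec.childCollection}`,
    // The null guard keeps people without a house from matching houses without a key.
    `      FILTER ${fk} != null && ${fk} == ${key}`,
    `      RETURN { id: v_person._key, name: v_person.name }`,
    `  )`,
    `  RETURN { id: v_house._key, description: v_house.description, people: v_people }`,
  ].join("\n");
}

const exampleAql = buildInverseReferenceAql({
  parentCollection: "houseWithRefs",
  parentKeyField: "uuid",
  childCollection: "personWithRefs",
  childForeignKeyField: "houseUuid",
});
console.log(exampleAql);
```

Without an index on the foreign key field, the inner loop would likely scan the whole child collection once per house, so this approach only pays off if that field is indexed.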

Finally, is there anything else that's lost when switching a relationship from @relation to @reference aside from being able to query in both directions?

Please let me know if there's any other context I can provide, and thank you again for all of your great work on this project!

@mfusser
Contributor

mfusser commented Jun 19, 2023

Hi @phillipcurl,
A reference is supposed to be a more loosely coupled connection between two objects.
In that way it is more of a shortcut for "also find me an object with this key".
Adding a reference does not really change anything about the data in the database, it is just a feature of the API.
Adding a backlink would mean that an "inverseOf" field would need to be added and updated whenever necessary, which is why it would not really fit a reference (at least in my opinion).
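To make the "also find me an object with this key" idea concrete, here is a minimal in-memory sketch. The `House` and `Person` interfaces and the `resolveHouse` function are illustrative assumptions, not cruddl code. It mirrors the null-if-not-found behaviour visible in the generated AQL from the first post (IS_NULL check plus FIRST of a lookup by key).

```typescript
// Illustrative types only; field names follow the schema in the first post.
interface House {
  uuid: string;
  description: string;
}

interface Person {
  name: string;
  houseUuid: string | null;
}

// A reference is just a lookup by key at query time. Nothing enforces that the
// key points to an existing object, so a dangling key simply resolves to null.
function resolveHouse(person: Person, houses: House[]): House | null {
  if (person.houseUuid === null) {
    return null;
  }
  return houses.find((h) => h.uuid === person.houseUuid) ?? null;
}

const houses: House[] = [{ uuid: "h1", description: "Red house" }];
const alice: Person = { name: "Alice", houseUuid: "h1" };
const bob: Person = { name: "Bob", houseUuid: "deleted-house" };

console.log(resolveHouse(alice, houses)?.description); // "Red house"
console.log(resolveHouse(bob, houses)); // null, the reference is dangling
```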

The other thing you will lose by switching to a reference is the ability to add onDelete actions.

@Yogu Yogu added the enhancement New feature or request label Jun 27, 2023
@Yogu Yogu self-assigned this Jun 27, 2023
@Yogu
Member

Yogu commented Jun 27, 2023

Hi @phillipcurl, thank you for explaining this issue in detail on a call. I think inverse references are a valuable addition to cruddl.

I would propose this modelling syntax:

type House @rootEntity {
  description: String
  uuid: String @key
  people: [Person] @reference(foreignKeyField: "houseUuid")
}

type Person @rootEntity {
  name: String
  houseUuid: String
  house: House @reference(keyField: "houseUuid")
}

foreignKeyField names the key field in Person, not the reference field. In fact, I would allow the people field to exist without the house field.

Not quite sure about the name foreignKeyField. Is it understandable?

  • We could call it inverseOf to mirror @relation. However, you might expect to specify a @relation field there, and not the key field.
  • Could also call it inverseKeyField. However, you're not really inverting anything if there is no reference on the other side.

References are intended to be weak links, i.e. they don't add any restrictions between the two objects. I still want to keep this property in place. Therefore, there will be a few restrictions compared to relations:

  • There's no equivalent to 1:1 relations. 1:1 relations work by enforcing uniqueness on both sides, which defeats the idea of loose coupling. As a result, any field that uses foreignKeyField needs to be a list.
  • onDelete: CASCADE and onDelete: RESTRICT also won't be supported.
  • The time-to-live feature won't be able to delete referenced objects (in either direction).

Do you think the feature would work for you with these restrictions?

@phillipcurl
Author

Hi @Yogu! Thank you again for taking the time to chat and for the detailed response!

I think foreignKeyField makes sense. inverseKeyField might be a little more self-explanatory, given the existing inverseOf option, but I can understand how that might cause confusion since nothing is truly being inverted like you mentioned.

I also think all of those restrictions make sense and align with what I was expecting. I just have a couple of questions about the expected functionality:

  • Will the inverse reference have the same robust filtering options available to the inverse relation?
    •   allHouses {
          id
          people(filter: {
            name_like: "%some name%"
          }) {
            id
          }
        }
  • Will the inverse reference have a _meta query associated with it to get the total count like inverse relations do?
    •   allHouses {
          id
          _peopleMeta(filter: {
            name_like: "%some name%"
          }) {
            count
          }
        }

Thank you again for your thoughts and work around this! Please let me know if there are any other details I can provide.

@Yogu
Member

Yogu commented Jun 28, 2023

Thanks for your input on the name. Not quite sure about it yet.

Will the inverse reference have the same robust filtering options available to the inverse relation?

All filter options from allPeople will also be available on the inverse reference people. It will basically be the same as allPeople, just with a hidden AND person.houseUuid == house.uuid added.
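The "hidden AND" can be pictured with a small in-memory sketch. The types and the `resolvePeopleField` function are hypothetical and only model the semantics described here; the actual implementation would produce AQL rather than filter arrays.

```typescript
// Illustrative types only; field names follow the proposed schema.
interface House {
  uuid: string;
  description: string;
}

interface Person {
  name: string;
  houseUuid: string | null;
}

type PersonFilter = (person: Person) => boolean;

// Same as filtering all people, plus an implicit condition tying each person to the house.
function resolvePeopleField(
  house: House,
  allPeople: Person[],
  userFilter: PersonFilter = () => true,
): Person[] {
  return allPeople.filter(
    (p) => p.houseUuid !== null && p.houseUuid === house.uuid && userFilter(p),
  );
}

const redHouse: House = { uuid: "h1", description: "Red house" };
const everyone: Person[] = [
  { name: "Alice", houseUuid: "h1" },
  { name: "Albert", houseUuid: "h2" },
  { name: "Bob", houseUuid: "h1" },
  { name: "Carol", houseUuid: null },
];

// Roughly what people(filter: { name_like: "%Al%" }) would select for this house.
const matches = resolvePeopleField(redHouse, everyone, (p) => p.name.includes("Al"));
console.log(matches.map((p) => p.name)); // ["Alice"]
```

A count for the same field would then be the length of that filtered list.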

You're using the term "robust" which leads me to another possible issue: indices. Currently, cruddl automatically creates indices on key fields. ArangoDB already has indices on _key and on _from / _to of edge collections. These indices combined ensure that all "simple" queries that don't use any sorting or filtering are efficient, including relation traversal and reference lookups. The inverse references need a new index, namely on the field that holds the reference value. Not sure if it's a good idea to add an index to the persons collection just because House adds a reference though (would introduce coupling).
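To illustrate why the inverse lookup wants an index on the field that holds the reference value, here is an in-memory analogy. The `Map` below is only a stand-in for a database index (in ArangoDB this would presumably be a persistent index on houseUuid in the people collection, which is an assumption and not something cruddl creates today). The function names are hypothetical.

```typescript
// Illustrative type only; field names follow the proposed schema.
interface Person {
  name: string;
  houseUuid: string | null;
}

// Without an index: every lookup scans the whole collection.
function findPeopleByScan(people: Person[], houseUuid: string): Person[] {
  return people.filter((p) => p.houseUuid === houseUuid);
}

// With an index: group people by houseUuid once, then look up by key.
function buildHouseUuidIndex(people: Person[]): Map<string, Person[]> {
  const index = new Map<string, Person[]>();
  for (const person of people) {
    if (person.houseUuid === null) {
      continue; // people without a house are not reachable from any house
    }
    const bucket = index.get(person.houseUuid) ?? [];
    bucket.push(person);
    index.set(person.houseUuid, bucket);
  }
  return index;
}

function findPeopleByIndex(index: Map<string, Person[]>, houseUuid: string): Person[] {
  return index.get(houseUuid) ?? [];
}

const samplePeople: Person[] = [
  { name: "Alice", houseUuid: "h1" },
  { name: "Bob", houseUuid: "h2" },
  { name: "Carol", houseUuid: "h1" },
  { name: "Dave", houseUuid: null },
];

const houseUuidIndex = buildHouseUuidIndex(samplePeople);
console.log(findPeopleByIndex(houseUuidIndex, "h1").map((p) => p.name)); // ["Alice", "Carol"]
```

Both functions return the same people; the difference is that the scan touches every person for every house, which is the cost the index is meant to avoid.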

Will the inverse reference have a _meta query associated with it to get the total count like inverse relations do?

Yes, will add a meta field, just like with collect and relation fields.

@phillipcurl
Author

Sorry for the delay, and thank you for those details!

All filter options from allPeople will also be available on the inverse reference people.

That's great news - thank you! Indices are a really interesting point. As I've been denormalizing/duplicating/flattening some data, I've found myself adding composite indices for commonly used filtering patterns on an ad hoc basis. I would be in support of automatically adding the indices for the inverse reference. I can totally see your point about coupling, but I think that by adding the inverse reference, I'm intentionally introducing coupling. I'd be very interested to hear how your use cases or thoughts might differ on that!

3 participants