[ADR] Object Id #68

Conversation

yordis
Member

@yordis yordis commented Mar 25, 2024

No description provided.

Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
Comment on lines +16 to +22
More often than not, working with UUIDs, integers or any random characters can
be cumbersome and error-prone. It can be hard to remember, debug, and integrate
with other systems; since those IDs, most likely, do not carry any context about
the object type.
It is even harder for Business Analysts that deal with massive amounts of data
from different sources, and they need to understand the data structure and
relationships between objects.

I have a few thoughts here, and I think I'll start with my general opinion on IDs, and then give a little context.

My preferred way to ID most things (with the exception of events themselves) is to use two fields: type: string; and id: ULID; (ULID spec, if you are interested, but TL;DR: it's an all-caps, human-readable string of random characters that sorts by creation time). I haven't been doing much relational data lately, so even when I'm using something like PostgreSQL, I'm mainly storing a table of events and a table of views, and my queries to look up views look something like this if I know the ID:

SELECT data, id, type, version FROM views
  WHERE type = 'DETAILED_USER_POST'
  AND id = '01HV16H8QQHFTCKEM812V5DVD2';

And for my events table, where I don't always know the ID, it looks like:

SELECT data, id, type, version FROM events
  WHERE type = ANY($1)  -- $1: the list of types
  AND time > $2;        -- $2: the time I care about

BUT, when I'm working with a shared process like Redis or some kind of messaging/event queue, I revert to a joined ID like the one you referenced from the Stripe API, except my IDs start looking like ACCOUNT:01HV16H8QQHFTCKEM812V5DVD2 (following the Redis convention) or ACCOUNT#01HV16H8QQHFTCKEM812V5DVD2 (DDB convention).
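A minimal sketch of how the joined keys above could be built and split back into the two fields. The helper names (`joinKey`, `splitKey`) and the `TypedId` shape are hypothetical, not something from the ADR or from my codebase; the regex assumes the standard 26-character Crockford base32 ULID encoding.

```typescript
// Hypothetical helpers for illustration only.
type Separator = ":" | "#"; // ":" for Redis-style keys, "#" for DDB-style keys

interface TypedId {
  type: string;
  id: string;
}

// ULIDs are 26 Crockford base32 characters (no I, L, O, U); the first is 0-7.
const ULID_PATTERN = /^[0-7][0-9A-HJKMNP-TV-Z]{25}$/;

function joinKey({ type, id }: TypedId, sep: Separator = ":"): string {
  if (!ULID_PATTERN.test(id)) {
    throw new Error(`not a ULID: ${id}`);
  }
  return `${type}${sep}${id}`;
}

function splitKey(key: string, sep: Separator = ":"): TypedId {
  // The ULID never contains the separator, so split on its last occurrence.
  const at = key.lastIndexOf(sep);
  if (at <= 0) {
    throw new Error(`missing type prefix: ${key}`);
  }
  const id = key.slice(at + 1);
  if (!ULID_PATTERN.test(id)) {
    throw new Error(`not a ULID: ${id}`);
  }
  return { type: key.slice(0, at), id };
}
```

Storing `type` and `id` separately in SQL and only joining them at the Redis/DDB boundary keeps the join format a presentation detail rather than part of the stored data.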

Of course, following a single-table design in DDB can really throw a wrench in the readability of the keys, since the row is not necessarily the object you want, but rather a piece of that object, and in the case of events there often ends up being a mess of different secondary keys that look something like:
EVENTS#SOMETHING_HAPPENED#01HV16H8QQHFTCKEM812V5DVD2 and EVENTS#01HV16H8QQHFTCKEM812V5DVD2#SOMETHING_HAPPENED

These days I'm mostly back in the SQL world, because it performs great, querying is dead-simple, and I can write as little or as much validation at the DB-level as I want. DDB was fun when I was dealing with many millions of events, but almost everywhere I've worked has ended up using a secondary database (Elastic/OpenSearch) as an escape hatch because they came up with a new query that could be answered with the existing tables, but the queries sucked to write in DDB form, and no one wanted to generate yet another set of composite keys and run a (don't call it a migration or you'll upset the AWS Gods) script to get the data in a better shape (time + $$$).

All that to say, the main reason I'm using a separate type and id field is because of the way I structure my SQL DB into two tables (with very similar schemas):

// I also normally have `actor_type` and `actor_id` stored at the top-level, but sometimes in the `data` column only
type EventsTable = {
  /** ULID, lexicographically sortable by insertion time */
  id: string;
  /** The effective time of the event; does not always match the insertion time recorded by the ULID */
  time: DateTime;
  /** The event schema version. In a perfect world, it stays at 1 and never changes */
  version: number;
  /** The type of the event. Past tense. */
  type: string;
  /** The data contained by the event, structure determined by `type` + `version` */
  data: jsonb;
}

type ViewsTable = {
  /** ULID, lexicographically sortable by insertion time */
  id: string;
  /** The latest time this view was regenerated, regeneration does not mean that anything in the `data` column actually changed, just that the view is up to date at that point in time */
  updated: DateTime;
  /** The latest of the events that was related to the regeneration of this view */
  latest_event_id: string;
  /** The time of the latest event that was related to the regeneration of this view */
  latest_event_time: DateTime;
  /** The version of the data schema */
  version: number;
  /** The type of the view (could be anything, depending on the business needs) */
  type: string;
  /** The data of the view. Schema determined by `type` + `version` */
  data: jsonb;
}

A structure like this makes it easy to have type-level validations in my server code (whether TypeScript types or protobufs or whatever). So I can create a simple TypeScript type for the ID field (or just use string), and I can also use an enum or an `as const` object to define the allowed types in the system. Going even further (but not really related to this ADR), I can use the version and union types to really make the type system comfortable for daily coding.
In some of my databases, I define the enums in the Postgres schema as well, but (again, I'm lazy) I don't like running migrations, so 90% of the time I just use a TEXT type and call it a day.
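Roughly what I mean by the `as const` plus union-type approach, as a sketch. The event names, payload fields, and the `Ulid` brand below are made up for illustration; they are not from the ADR.

```typescript
// Hypothetical event types; a real system would list its own.
const EVENT_TYPES = ["ACCOUNT_OPENED", "ACCOUNT_CLOSED"] as const;
type EventType = (typeof EVENT_TYPES)[number];

// A branded string so a plain string cannot be passed where an ID is expected.
type Ulid = string & { readonly __brand: "Ulid" };

// `type` + `version` together determine the shape of `data`.
type AccountOpenedV1 = { type: "ACCOUNT_OPENED"; version: 1; id: Ulid; data: { owner: string } };
type AccountOpenedV2 = { type: "ACCOUNT_OPENED"; version: 2; id: Ulid; data: { ownerId: string } };
type AccountClosedV1 = { type: "ACCOUNT_CLOSED"; version: 1; id: Ulid; data: { reason: string } };
type AppEvent = AccountOpenedV1 | AccountOpenedV2 | AccountClosedV1;

// Runtime guard for values read from a TEXT column.
function isEventType(value: string): value is EventType {
  return (EVENT_TYPES as readonly string[]).includes(value);
}

// The compiler narrows `data` once `type` and `version` are checked.
function summarize(event: AppEvent): string {
  switch (event.type) {
    case "ACCOUNT_OPENED":
      return event.version === 1
        ? `opened by ${event.data.owner}`
        : `opened by ${event.data.ownerId}`;
    case "ACCOUNT_CLOSED":
      return `closed: ${event.data.reason}`;
  }
}
```

The runtime guard is what lets the column stay TEXT: the allowed values live in code, so adding a new type is a deploy rather than a migration.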

More thoughts:

What I described above is probably more along the lines of implementation patterns for the systems themselves. I don't have anything against using something like account_01HV16H8QQHFTCKEM812V5DVD2 (or some kind of similar structure) for APIs for internal/external clients.
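For the API boundary, a small sketch of how the two internal fields could map to and from a prefixed public ID. `toPublicId` and `fromPublicId` are hypothetical names, and the lowercase-prefix-plus-underscore format is only an assumption based on the example above, not something the ADR has settled on.

```typescript
// Hypothetical mapping between internal (type, id) pairs and public IDs.
function toPublicId(type: string, id: string): string {
  return `${type.toLowerCase()}_${id}`;
}

function fromPublicId(publicId: string): { type: string; id: string } {
  // ULIDs contain no underscore, so the last one separates prefix from ID,
  // even when the type itself has underscores (e.g. DETAILED_USER_POST).
  const at = publicId.lastIndexOf("_");
  if (at <= 0 || at === publicId.length - 1) {
    throw new Error(`malformed public ID: ${publicId}`);
  }
  return {
    type: publicId.slice(0, at).toUpperCase(),
    id: publicId.slice(at + 1),
  };
}
```

With this, the database keeps its separate type and id columns, and the prefixed form only exists in requests and responses.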
