[ADR] Object Id #68

yordis · 2024-03-25T17:41:08Z

No description provided.

Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>

dallashall · 2024-04-09T12:06:32Z

src/adrs/4860595695/README.md

+More often than not, working with UUIDs, integers or any random characters can
+be cumbersome and error-prone. It can be hard to remember, debug, and integrate
+with other systems; since those IDs, most likely, do not carry any context about
+the object type.
+It is even harder for Business Analysts that deal with massive amounts of data
+from different sources, and they need to understand the data structure and
+relationships between objects.


I have a few thoughts here, and I think I'll start with my general opinion on IDs, and then give a little context.

My preferred way to ID most things (with the exception of events themselves) is to use two fields: type: string; and id: ULID; (see the ULID spec if you are interested, but TL;DR: it's all caps, human-readable random characters that sort lexicographically by creation time). I haven't been doing much relational data lately, so even when I'm using something like PostgreSQL, I'm mainly storing a table of events and a table of views, and my queries to look up views look something like this if I know the ID:

SELECT data, id, type, version FROM views WHERE type = 'DETAILED_USER_POST' AND id = '01HV16H8QQHFTCKEM812V5DVD2';
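
A lookup like the one above can be wrapped so the `type`/`id` pair is checked before it reaches the database. This is only a minimal sketch under stated assumptions: `ViewRef`, `isUlid`, and `viewLookupQuery` are hypothetical names, and the `{ text, values }` return shape is an assumed convention for handing a parameterized query to whatever driver is in use.

```typescript
// Canonical uppercase Crockford base32, 26 characters. The first character
// is at most 7 because a ULID encodes 128 bits (26 * 5 = 130 bits of space).
const ULID_PATTERN = /^[0-7][0-9A-HJKMNP-TV-Z]{25}$/;

// Hypothetical reference to a row in the `views` table.
interface ViewRef {
  type: string;
  id: string;
}

function isUlid(value: string): boolean {
  return ULID_PATTERN.test(value);
}

// Builds a parameterized query instead of interpolating values into the SQL.
function viewLookupQuery(ref: ViewRef): { text: string; values: string[] } {
  if (!isUlid(ref.id)) {
    throw new Error("not a valid ULID: " + ref.id);
  }
  return {
    text: "SELECT data, id, type, version FROM views WHERE type = $1 AND id = $2",
    values: [ref.type, ref.id],
  };
}

const query = viewLookupQuery({
  type: "DETAILED_USER_POST",
  id: "01HV16H8QQHFTCKEM812V5DVD2",
});
```

Keeping the two fields separate until the last moment means the validation lives in one place and the SQL never needs string concatenation.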

And for my events table, where I don't always know the ID, it looks like:

SELECT data, id, type, version FROM events WHERE type IN (...list_of_types) AND time > [time I care about];

BUT, when I'm working with a shared process like Redis or some kind of messaging/event queue, I revert to a joined ID like you referenced from the Stripe API, but my IDs start looking like ACCOUNT:01HV16H8QQHFTCKEM812V5DVD2 (following the Redis convention) or ACCOUNT#01HV16H8QQHFTCKEM812V5DVD2 (DDB convention)
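
As a rough sketch of moving between the two-field form and the joined form, assuming neither the type nor the ULID ever contains the separator: `ObjectRef`, `toKey`, and `fromKey` are hypothetical helpers made up for this example, not part of any library.

```typescript
// Hypothetical structured form of an ID: a type plus a ULID.
interface ObjectRef {
  type: string;
  id: string;
}

// ":" follows the Redis convention, "#" the DDB convention.
type Separator = ":" | "#";

// Joins the two fields, e.g. ACCOUNT:01HV... or ACCOUNT#01HV...
function toKey(ref: ObjectRef, separator: Separator): string {
  if (ref.type.indexOf(separator) !== -1 || ref.id.indexOf(separator) !== -1) {
    throw new Error("type and id must not contain the separator");
  }
  return ref.type + separator + ref.id;
}

// Splits a two-part key; returns null when the key is not in joined form
// (missing separator, empty part, or more than two parts).
function fromKey(key: string, separator: Separator): ObjectRef | null {
  const index = key.indexOf(separator);
  if (index <= 0 || index === key.length - 1) {
    return null;
  }
  if (key.indexOf(separator, index + 1) !== -1) {
    return null;
  }
  return { type: key.slice(0, index), id: key.slice(index + 1) };
}

const redisKey = toKey({ type: "ACCOUNT", id: "01HV16H8QQHFTCKEM812V5DVD2" }, ":");
const parsed = fromKey("ACCOUNT#01HV16H8QQHFTCKEM812V5DVD2", "#");
```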

Of course, following a single-table design in DDB can really throw a wrench in the readability of the keys, since the row is not necessarily the object you want, but rather a piece of that object, and in the case of events there often ends up being a mess of different secondary keys that look something like:
EVENTS#SOMETHING_HAPPENED#01HV16H8QQHFTCKEM812V5DVD2 and EVENTS#01HV16H8QQHFTCKEM812V5DVD2#SOMETHING_HAPPENED

These days I'm mostly back in the SQL world, because it performs great, querying is dead-simple, and I can write as little or as much validation at the DB-level as I want. DDB was fun when I was dealing with many millions of events, but almost everywhere I've worked has ended up using a secondary database (Elastic/OpenSearch) as an escape hatch because they came up with a new query that could be answered with the existing tables, but the queries sucked to write in DDB form, and no one wanted to generate yet another set of composite keys and run a (don't call it a migration or you'll upset the AWS Gods) script to get the data in a better shape (time + $$$).

All that to say, the main reason I'm using a separate type and id field is because of the way I structure my SQL DB into two tables (with very similar schemas):

// I also normally have `actor_type` and `actor_id` stored at the top level, but sometimes in the `data` column only
type EventsTable = {
  /** ULID, lexicographically sortable by insertion time */
  id: string;
  /** The effective time of the event; does not always match up with the insertion time recorded by the ULID */
  time: DateTime;
  /** The event schema version. In a perfect world, it stays at 1 and never changes */
  version: number;
  /** The type of the event. Past tense. */
  type: string;
  /** The data contained by the event, structure determined by `type` + `version` */
  data: jsonb;
}

type ViewsTable = {
  /** ULID, lexicographically sortable by insertion time */
  id: string;
  /** The latest time this view was regenerated; regeneration does not mean that anything in the `data` column actually changed, just that the view is up to date at that point in time */
  updated: DateTime;
  /** The latest of the events that was related to the regeneration of this view */
  latest_event_id: string;
  /** The time of the latest event that was related to the regeneration of this view */
  latest_event_time: DateTime;
  /** The version of the data schema */
  version: number;
  /** The type of the view (could be anything, depending on the business needs) */
  type: string;
  /** The data of the view. Schema determined by `type` + `version` */
  data: jsonb;
}
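
To illustrate why the `id` column sorts by insertion time, and how that time can differ from the `time` column, here is a small sketch that decodes the timestamp embedded in a ULID (the first 10 characters are a 48-bit millisecond timestamp in Crockford base32). `ulidTimestamp`, `insertionLagMs`, and `EventRow` are hypothetical names used only for this example.

```typescript
const CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Decodes the millisecond timestamp stored in the first 10 characters.
function ulidTimestamp(ulid: string): Date {
  if (ulid.length !== 26) {
    throw new Error("a ULID has 26 characters");
  }
  let millis = 0;
  for (let i = 0; i < 10; i++) {
    const digit = CROCKFORD_ALPHABET.indexOf(ulid.charAt(i));
    if (digit === -1) {
      throw new Error("invalid ULID character: " + ulid.charAt(i));
    }
    // Ten base32 digits stay below 2^50, so the number remains exact.
    millis = millis * 32 + digit;
  }
  return new Date(millis);
}

// Hypothetical subset of an events row, just enough for the comparison.
interface EventRow {
  id: string;
  time: Date;
}

// Milliseconds between the event's effective time and its insertion time.
function insertionLagMs(row: EventRow): number {
  return ulidTimestamp(row.id).getTime() - row.time.getTime();
}

const insertedAt = ulidTimestamp("01HV16H8QQHFTCKEM812V5DVD2");
```

Because the timestamp occupies the most significant characters, comparing two ULIDs as plain strings orders them by insertion time first.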

A structure like this makes it easy to have type-level validations in my server code (whether TypeScript types or protobufs or whatever). So I can create a simple TypeScript type for the ID field (or just use string), and I can also use an enum or an `as const` object to define the allowed types in the system, and even further (but not really related to this ADR) I can use the version and union types to really make the type system comfortable for daily coding.
In some of my databases, I define the enums in the Postgres schema as well, but (again, I'm lazy) I don't like running migrations, so 90% of the time I just use a TEXT type and call it a day.
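
A minimal sketch of what that can look like, assuming a made-up set of event types and payloads (none of these names come from the ADR): an `as const` list defines the allowed types, and a union discriminated on `type` and `version` lets the compiler pick the right `data` schema.

```typescript
// Hypothetical event types; a real system would list its own.
const EVENT_TYPES = ["USER_REGISTERED", "POST_PUBLISHED"] as const;
type EventType = (typeof EVENT_TYPES)[number];

type UserRegisteredV1 = { type: "USER_REGISTERED"; version: 1; data: { email: string } };
type UserRegisteredV2 = { type: "USER_REGISTERED"; version: 2; data: { email: string; locale: string } };
type PostPublishedV1 = { type: "POST_PUBLISHED"; version: 1; data: { postId: string } };

type AppEvent = UserRegisteredV1 | UserRegisteredV2 | PostPublishedV1;

// Runtime check for values read back from a TEXT column.
function isEventType(value: string): value is EventType {
  return (EVENT_TYPES as readonly string[]).indexOf(value) !== -1;
}

function summarize(event: AppEvent): string {
  switch (event.type) {
    case "USER_REGISTERED":
      // Narrowed by `type`; `version` then selects the data schema.
      return event.version === 1
        ? "registered " + event.data.email
        : "registered " + event.data.email + " (" + event.data.locale + ")";
    case "POST_PUBLISHED":
      return "published " + event.data.postId;
  }
}

const summary = summarize({
  type: "USER_REGISTERED",
  version: 2,
  data: { email: "a@example.com", locale: "en" },
});
```

With the column stored as TEXT, the `isEventType` guard is the only place where an unknown type has to be handled; everything after it is checked by the compiler.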

More thoughts:

What I described above is probably more along the lines of implementation patterns for the systems themselves. I don't have anything against using something like account_01HV16H8QQHFTCKEM812V5DVD2 (or some kind of similar structure) for APIs for internal/external clients.
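
The two approaches can coexist: the separate `type` and `id` fields stay internal, and the prefixed form is produced only at the API boundary. A sketch of that mapping, assuming internal types are upper case and ULIDs never contain an underscore (`toPublicId` and `fromPublicId` are hypothetical names):

```typescript
// Hypothetical internal form: a type plus a ULID.
interface ObjectRef {
  type: string;
  id: string;
}

// Produces e.g. account_01HV...: lowercase type prefix, underscore, ULID.
function toPublicId(ref: ObjectRef): string {
  return ref.type.toLowerCase() + "_" + ref.id;
}

// Splits on the LAST underscore, because a type such as DETAILED_USER_POST
// may itself contain underscores while the ULID part does not.
function fromPublicId(publicId: string): ObjectRef | null {
  const index = publicId.lastIndexOf("_");
  if (index <= 0 || index === publicId.length - 1) {
    return null;
  }
  return {
    type: publicId.slice(0, index).toUpperCase(),
    id: publicId.slice(index + 1),
  };
}

const publicId = toPublicId({ type: "ACCOUNT", id: "01HV16H8QQHFTCKEM812V5DVD2" });
const internal = fromPublicId("detailed_user_post_01HV16H8QQHFTCKEM812V5DVD2");
```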

[ADR] Object Id

776c64a

Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
