[ADR] Object Id #68
Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
More often than not, working with UUIDs, integers or any random characters can
be cumbersome and error-prone. They can be hard to remember, debug, and integrate
with other systems, since those IDs most likely do not carry any context about
the object type.
It is even harder for Business Analysts who deal with massive amounts of data
from different sources and need to understand the data structure and
relationships between objects.
I have a few thoughts here, and I think I'll start with my general opinion on IDs, and then give a little context.
My preferred way to ID most things (with the exception of events themselves) is to use two fields: `type: string;` and `id: ULID;` (ULID spec, if you are interested, but TL;DR: it's all caps, human-readable random characters that are ordered). I haven't been doing much relational data lately, so even when I'm using something like PostgreSQL, I'm mainly storing a table of `events` and a table of `views`, and my queries to look up views look something like this if I know the ID:
SELECT data, id, type, version FROM views
WHERE type = 'DETAILED_USER_POST'
AND id = '01HV16H8QQHFTCKEM812V5DVD2';
And for my events table, where I don't always know the ID, it looks like:
SELECT data, id, type, version FROM events
WHERE type = ANY($1) -- $1: array of the event types I care about
AND time > $2; -- $2: the time I care about
BUT, when I'm working with a shared process like Redis or some kind of messaging/event queue, I revert to a joined ID like you referenced from the Stripe API, but my IDs start looking like `ACCOUNT:01HV16H8QQHFTCKEM812V5DVD2` (following the Redis convention) or `ACCOUNT#01HV16H8QQHFTCKEM812V5DVD2` (DDB convention).
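A rough sketch of how the two-field form could map to those joined keys and back. The names `ObjectRef`, `joinRef`, and `splitRef` are hypothetical, and the sketch assumes the type never contains the separator character (a ULID never does).

```typescript
// Hypothetical helpers; names are illustrative, not part of the ADR.
type ObjectRef = { type: string; id: string };

// ":" follows the Redis key convention, "#" the DynamoDB one.
type Separator = ":" | "#";

function joinRef(ref: ObjectRef, sep: Separator): string {
  return `${ref.type}${sep}${ref.id}`;
}

function splitRef(key: string, sep: Separator): ObjectRef {
  // Split at the first separator; assumes `type` itself never contains it.
  const at = key.indexOf(sep);
  if (at <= 0 || at === key.length - 1) {
    throw new Error(`not a joined id: ${key}`);
  }
  return { type: key.slice(0, at), id: key.slice(at + 1) };
}
```

Keeping the separator as a parameter means the same `type` + `id` pair can be rendered for either store without changing how it is kept in SQL.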
Of course, following a single-table design in DDB can really throw a wrench in the readability of the keys, since the row is not necessarily the object you want, but rather a piece of that object, and in the case of events there often ends up being a mess of different secondary keys that look something like `EVENTS#SOMETHING_HAPPENED#01HV16H8QQHFTCKEM812V5DVD2` and `EVENTS#01HV16H8QQHFTCKEM812V5DVD2#SOMETHING_HAPPENED`.
These days I'm mostly back in the SQL world, because it performs great, querying is dead simple, and I can write as little or as much validation at the DB level as I want. DDB was fun when I was dealing with many millions of events, but almost everywhere I've worked has ended up using a secondary database (Elastic/OpenSearch) as an escape hatch. Someone would come up with a new query that could be answered with the existing tables, but the query sucked to write in DDB form, and no one wanted to generate yet another set of composite keys and run a script (don't call it a migration or you'll upset the AWS Gods) to get the data into a better shape (time + $$$).
All that to say, the main reason I'm using a separate `type` and `id` field is because of the way I structure my SQL DB into two tables (with very similar schemas):
// I also normally have `actor_type` and `actor_id` stored at the top-level, but sometimes in the `data` column only
// Placeholder aliases (assumptions) so the sketch type-checks; they stand in for the DB column types
type DateTime = string;
type jsonb = unknown;

type EventsTable = {
  /** ULID, lexicographically sortable by insertion time */
  id: string;
  /** The effective time of the event; does not always match up with the insertion time recorded by the ULID */
  time: DateTime;
  /** The event schema version. In a perfect world, it stays at 1 and never changes */
  version: number;
  /** The type of the event. Past tense. */
  type: string;
  /** The data contained by the event, structure determined by `type` + `version` */
  data: jsonb;
}

type ViewsTable = {
  /** ULID, lexicographically sortable by insertion time */
  id: string;
  /** The latest time this view was regenerated; regeneration does not mean that anything in the `data` column actually changed, just that the view is up to date at that point in time */
  updated: DateTime;
  /** The latest of the events that was related to the regeneration of this view */
  latest_event_id: string;
  /** The time of the latest event that was related to the regeneration of this view */
  latest_event_time: DateTime;
  /** The version of the data schema */
  version: number;
  /** The type of the view (could be anything, depending on the business needs) */
  type: string;
  /** The data of the view. Schema determined by `type` + `version` */
  data: jsonb;
}
A structure like this makes it easy to have type-level validations in my server code (whether TypeScript types or protobufs or whatever). So I can create a simple TypeScript type for the `id` field (or just use `string`), and I can also use an `enum` or an object `as const` to define the allowed types in the system, and even further (but not really related to this ADR) I can use the version and union types to really make the type system comfortable for daily coding.
In some of my databases, I define the enums in the Postgres schema as well, but (again, I'm lazy) I don't like running migrations, so 90% of the time I just use a `TEXT` type and call it a day.
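To make the `as const` and union-type idea concrete, here is a minimal sketch. `ACCOUNT_SUMMARY` and every `data` shape below are made-up assumptions for illustration; only `DETAILED_USER_POST` appears in the query above.

```typescript
// Illustrative only: type names and payload shapes are assumptions.
const VIEW_TYPES = ["DETAILED_USER_POST", "ACCOUNT_SUMMARY"] as const;
type ViewType = (typeof VIEW_TYPES)[number];

// Each (type, version) pair determines the shape of `data`.
type View =
  | { type: "DETAILED_USER_POST"; version: 1; id: string; data: { title: string } }
  | { type: "DETAILED_USER_POST"; version: 2; id: string; data: { title: string; body: string } }
  | { type: "ACCOUNT_SUMMARY"; version: 1; id: string; data: { name: string } };

// Runtime guard for values read from a plain TEXT column.
function isViewType(value: string): value is ViewType {
  return (VIEW_TYPES as readonly string[]).indexOf(value) !== -1;
}

function summarize(view: View): string {
  switch (view.type) {
    case "DETAILED_USER_POST":
      // Narrowing on `version` selects the matching `data` shape.
      return view.version === 1
        ? view.data.title
        : `${view.data.title}: ${view.data.body}`;
    case "ACCOUNT_SUMMARY":
      return view.data.name;
  }
}
```

With the column stored as `TEXT`, a guard like `isViewType` is where unknown values get rejected, so the database schema does not need a migration when a new type is added.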
More thoughts:
What I described above is probably more along the lines of implementation patterns for the systems themselves. I don't have anything against using something like `account_01HV16H8QQHFTCKEM812V5DVD2` (or some kind of similar structure) for APIs for internal/external clients.
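A sketch of how the internal `type` + `id` pair could be rendered as that kind of API-facing ID at the boundary. The helper names are hypothetical, and the lowercase prefix with a `_` separator is an assumption taken from the example above. The pattern encodes the ULID format: 26 Crockford base32 characters, with the first one limited to 0-7.

```typescript
// Hypothetical helpers; names and casing rules are illustrative assumptions.
type TypedId = { type: string; id: string };

// ULID: 26 Crockford base32 characters (no I, L, O, U), first character 0-7.
const ULID_PATTERN = /^[0-7][0-9A-HJKMNP-TV-Z]{25}$/;

function toPublicId(ref: TypedId): string {
  if (!ULID_PATTERN.test(ref.id)) {
    throw new Error(`not a ULID: ${ref.id}`);
  }
  return `${ref.type.toLowerCase()}_${ref.id}`;
}

function fromPublicId(publicId: string): TypedId {
  // Split on the last "_" so multi-word prefixes such as `user_post` survive.
  const at = publicId.lastIndexOf("_");
  if (at <= 0) {
    throw new Error(`missing type prefix: ${publicId}`);
  }
  const id = publicId.slice(at + 1);
  if (!ULID_PATTERN.test(id)) {
    throw new Error(`not a ULID: ${id}`);
  }
  return { type: publicId.slice(0, at).toUpperCase(), id };
}
```

Doing the conversion only at the API edge keeps the two-column layout in storage while clients still get a single self-describing string.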