Is this repository still active? #55

Open
ultimaweapon opened this issue Apr 1, 2021 · 4 comments

Comments

@ultimaweapon

Just wondering, since there are a lot of open PRs.

@amcgregor

I also question this, and am sad I missed the initial development of this idea. Most of the PRs appear to be language-specific implementations, or references to such. The developer has been otherwise active, which relieves me a bit.

In my toolbox, this has been an already-solved problem for years using coordination-free ObjectIDs (link to my slightly more functional clean-room reimplementation), which also offer additional room for taint/tracing/origin information. Plus all of the other goodness of replacing a creation time field, being range filterable and sortable, and so on. For a representation more compact than hexadecimal (which would be 24 characters), I use HHC, treating the ObjectID as a 96-bit integer.

The timestamp not being millisecond accurate is resolved by the inclusion of a per-process counter with random IV, but does harm replacement of a creation time field if milliseconds are required. Two generations within the same second from the same process on the same machine will have unique counters.
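As a rough illustration of the counter behaviour described above, here is a minimal sketch of a 12-byte, ObjectID-style generator. It is not the linked implementation: the class name `IdGenerator`, the `now` parameter, and the 4-byte timestamp / 5-byte identifier / 3-byte counter layout are assumptions made for the example, and locking for multi-threaded use is left out.

```python
import os
import struct
import time


class IdGenerator:
    """Hypothetical sketch of a 12-byte, ObjectID-style generator."""

    def __init__(self, hwid=None):
        # 5-byte identifier; random per process when not given explicitly.
        self.hwid = os.urandom(5) if hwid is None else hwid
        if len(self.hwid) != 5:
            raise ValueError("hwid must be exactly 5 bytes")
        # Random initial value (IV) for the 3-byte per-process counter.
        self.counter = int.from_bytes(os.urandom(3), "big")

    def generate(self, now=None):
        # Whole seconds only: sub-second precision is not stored.
        seconds = int(time.time() if now is None else now)
        # Increment on every call, wrapping at 24 bits.
        self.counter = (self.counter + 1) % 0x1000000
        return (
            struct.pack(">I", seconds)
            + self.hwid
            + self.counter.to_bytes(3, "big")
        )


gen = IdGenerator()
first = gen.generate(now=1577081801)
second = gen.generate(now=1577081801)  # same second, same process
print(first.hex(), second.hex())
```

Both values share the same timestamp and identifier bytes and differ only in the trailing counter, which is what keeps them unique within one second.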

  • Binary: 12 bytes. (Might not look smaller than 26 bytes, but it is.)
    b'^\x00[\xc9\x9dy\xebk\xd5\xb7P\x96'

  • Raw BSON: 15 bytes minimum. 1 identification byte + null-terminated key name + 12 byte compact identifier struct.

  • HHC: 16 bytes.
    'CoaE_HfG_uPXUI_d'

  • Base64: 16 bytes. This example is lucky: no URL-unsafe characters such as + are generated.
    'XgBbyZ1562vVt1CW'

  • Hexadecimal: 24 bytes.
    '5e005bc99d79eb6bd5b75096'

  • JSON: 36 bytes as a hex-encoded string within a compound object with type-identifying key name.
    '{"$oid": "5e005bc99d79eb6bd5b75096"}'
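The sizes listed above can be reproduced with the Python standard library alone. This is a small sketch using the example value from this comment; HHC is left out because its alphabet is not given here, and reading the first four bytes as big-endian seconds since the Unix epoch is an assumption based on the standard ObjectId layout.

```python
import base64
import json
from datetime import datetime, timezone

HEX = "5e005bc99d79eb6bd5b75096"  # example value from the list above

raw = bytes.fromhex(HEX)                              # binary form
as_int = int.from_bytes(raw, "big")                   # the ID as one integer
b64 = base64.urlsafe_b64encode(raw).decode("ascii")   # no padding: 12 % 3 == 0
doc = json.dumps({"$oid": HEX})                       # extended-JSON style wrapper

# Assumption: the leading 4 bytes are big-endian seconds since the epoch.
created = datetime.fromtimestamp(int.from_bytes(raw[:4], "big"), tz=timezone.utc)

for label, value in [("binary", raw), ("base64", b64), ("hex", HEX), ("json", doc)]:
    print(f"{label:>6}: {len(value):2d} bytes  {value!r}")
print("fits in 96 bits:", as_int.bit_length() <= 96)
print("created:", created.isoformat())
```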

I'm… too often "that asshole", but I have to ask: why is? Kudos for formalizing a specification, though, independent of a specific use case.

@ultimaweapon
Author

Thanks for sharing. For me, it can be anything that meets my requirements. At first I considered using Twitter Snowflake, but it requires a machine identifier, which is not container-friendly. I don't remember how I found ULID, but it meets all my requirements.

@amcgregor

@ultimaweapon The clean-room ObjectID implementation I linked (already-solved problem) supports all official forms of "machine identifier" generation from MongoDB, as well as their modern "random identifier on startup" approach, and also implements hardware MAC hashing (last byte used to XOR all prior bytes of the MAC) and fully custom identifiers.

In virtual machine cases (i.e. containers) you have complete control over the MAC, hostname, and absolutely can specify the internal identifier explicitly. From the module docstring:

To determine which approach is used for generation, specify the hwid keyword argument to the ObjectID() constructor. Possibilities include:

  • The string legacy: use the host name MD5 substring value and process ID. Note if FIPS compliance is enabled, the md5 hash will literally be unavailable for use, resulting in the inability to utilize this choice.
  • The string fips: use the FIPS-compliant FNV hash of the host name, in combination with the current process ID. Requires the fnv package be installed.
  • The string mac: use the hardware MAC address of the default interface as the identifier. Because a MAC address is one byte too large for the field, the final byte is used to XOR the prior ones.
  • The string random: pure random bytes, the default, aliased as modern.
  • Any 5-byte bytes value: use the given HWID explicitly.

You are permitted to add additional entries to this mapping within your own application, if desired.
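The MAC variant quoted above can be sketched as follows. The function names `fold_mac` and `local_mac` are hypothetical and not taken from the linked module; `uuid.getnode()` is used only as a convenient way to obtain a 48-bit address, and it may return a random value when no hardware address can be found, so it is not guaranteed to be the default interface.

```python
import uuid


def fold_mac(mac: bytes) -> bytes:
    """Fold a 6-byte MAC into 5 bytes by XORing the final byte into the rest."""
    if len(mac) != 6:
        raise ValueError("a MAC address must be exactly 6 bytes")
    last = mac[5]
    return bytes(b ^ last for b in mac[:5])


def local_mac() -> bytes:
    # Assumption: uuid.getnode() stands in for "the default interface".
    return uuid.getnode().to_bytes(6, "big")


example = bytes.fromhex("001a2b3c4d5e")  # made-up address for illustration
print(fold_mac(example).hex())
print(fold_mac(local_mac()).hex())
```

In a container, the same 5-byte result could equally be passed in directly as an explicit value, which avoids depending on whatever MAC the runtime assigns.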

One potential use is client-side identifier generation: each user may be given a HWID, permitting auditing of which records were populated by which users. The variants using an actual machine identifier of some kind are useful for auditing server-side behavior.
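A minimal sketch of that auditing idea, assuming the 5-byte HWID sits in bytes 4 through 8 of the 12-byte identifier (after the 4-byte timestamp, before the 3-byte counter). The `USERS` registry, the user name, and both function names are hypothetical.

```python
def hwid_of(oid_hex: str) -> bytes:
    """Return the assumed 5-byte HWID field of a hex-encoded identifier."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("expected a 12-byte identifier")
    return raw[4:9]


# Hypothetical registry mapping each issued HWID to a user.
USERS = {bytes.fromhex("9d79eb6bd5"): "alice"}


def author_of(oid_hex: str, registry=USERS):
    # None means the HWID was not issued to any known user.
    return registry.get(hwid_of(oid_hex))


print(author_of("5e005bc99d79eb6bd5b75096"))
print(author_of("5e005bc90000000000b75096"))
```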

@ultimaweapon
Author

Thanks for the information. My application doesn't need extra information in the identifier, so ULID is sufficient.


2 participants