
What is the expected behavior of serializing BigInt to JSON? #515

Open
gajus opened this issue Oct 5, 2023 · 6 comments

@gajus (Owner) commented Oct 5, 2023

Given this test:

test.only('serializes BigInt to JSON', async (t) => {
  const pool = await createPool(t.context.dsn, {
    PgPool,
  });

  const result = await pool.oneFirst(sql.unsafe`
    SELECT json_object_agg('foo', ${BigInt(9_007_199_254_740_999n)}::bigint)
  `);

  t.is(result, /* ??? */);

  await pool.end();
});

What is the expected value of the result?

I would think it should be {foo: BigInt(9_007_199_254_740_999n)}, but I am not certain if that is even possible, nor would that be valid JSON.

I am leaning towards throwing an error.

Probably related to #513

@gajus gajus added the question label Oct 5, 2023
@AndrewJo commented Oct 6, 2023

I would think it should be {foo: BigInt(9_007_199_254_740_999n)}, but I am not certain if that is even possible, nor would that be valid JSON.

The above representation is a spec violation according to IETF RFC 8259: The JavaScript Object Notation (JSON) Data Interchange Format.


I think you should handle it similarly to how Date or RegExp is commonly serialized in JSON. The most common cases I've come across are for BigInt to be serialized into a string or into a plain number.

Serialization into strings

// POJO
{
  // Serialize into integer with n at the end
  commentId: 9_007_199_254_740_999n,

  // Serialize into ISO8601 format
  createdAt: new Date("2023-10-05T12:34:56"),

  // Serialize into regular expression that can be passed back in to RegExp constructor
  regEx: /abc[dD]/
}

// JSON
{
  "commentId": "9007199254740999n",
  "createdAt": "2023-10-05T12:34:56",
  "regEx": "abc[dD]"
}

Pros:

  • You don't violate RFC 8259 JSON spec.
  • By default, you don't get serialization/deserialization errors out of the box with all known JSON serialization/deserialization libraries across multiple languages.
  • If you're in control of the data contract, you can enforce a schema rule such as "BigInt types are always serialized into a string representation of an integer with n at the end".

Cons:

  • Further type conversion is needed; deserialization libraries will treat your fields as plain strings by default (and the trailing n has to be stripped before calling BigInt):
    BigInt(json.commentId.slice(0, -1))
    new Date(json.createdAt)
    new RegExp(json.regEx)
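
For reference, a minimal sketch of what this could look like with a plain JSON.stringify replacer and JSON.parse reviver (nothing Slonik-specific; the trailing-n convention is the one from the example above):

const stringifyWithBigInt = (value) =>
  JSON.stringify(value, (key, val) => (typeof val === 'bigint' ? `${val}n` : val));

const parseWithBigInt = (text) =>
  JSON.parse(text, (key, val) =>
    // Caveat: any ordinary string that happens to match /^-?\d+n$/ is converted too.
    typeof val === 'string' && /^-?\d+n$/.test(val) ? BigInt(val.slice(0, -1)) : val,
  );

stringifyWithBigInt({ commentId: 9_007_199_254_740_999n });
// => '{"commentId":"9007199254740999n"}'
parseWithBigInt('{"commentId":"9007199254740999n"}').commentId;
// => 9007199254740999n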

Serialization into plain number

// POJO
{
  // Serialize into an integer
  commentId: 9_007_199_254_740_999n,

  // Serialize into UNIX epoch in seconds
  createdAt: new Date("2023-10-05T12:34:56")
}

// JSON
{
  "commentId": 9007199254740999,
  "createdAt": 1696509296,
}

Pros:

  • Technically not a violation of the JSON spec (IETF RFC 8259 - Section 6). See quote:

    This specification allows implementations to set limits on the range and precision of numbers accepted.

  • Clean representation without the need to add string quotes
  • Some JSON libraries that support larger integers have no problem parsing values larger than Number.MAX_SAFE_INTEGER:
    • Languages with native support for larger integers generally have no problem, although this is not universal:
      • Python's built-in JSON parser will parse it into a native arbitrary-precision integer without errors
      • C++ libraries such as libjson are generally implemented more for interoperability and will not parse integers beyond 53 bits of precision (see: Stack Overflow – Does libjson support 64 bit int types?)
    • Some JavaScript/TypeScript libraries have support for serializing/deserializing numbers larger than Number.MAX_SAFE_INTEGER into BigInt automatically: (e.g. json-with-bigint)
    • You can also do it without any libraries if you override the reviver/replacer arguments of the JSON.parse/JSON.stringify functions (see the sketch after this list).

Cons:

  • You may break compatibility with a lot of existing libraries that depend on default JSON (de)serializer functionality. Specifically, RFC 8259 warns:

    A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

    Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.
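
As for the reviver/replacer route mentioned under Pros, here's a rough sketch. It assumes a runtime with JSON.rawJSON and reviver source-text access (recent V8, e.g. Node.js 21+) — treat that availability as an assumption to verify; older runtimes need a library like json-with-bigint instead:

// Serialize BigInt as a bare JSON number (no quotes) via JSON.rawJSON:
JSON.stringify({ commentId: 9_007_199_254_740_999n }, (key, val) =>
  typeof val === 'bigint' ? JSON.rawJSON(val.toString()) : val,
);
// => '{"commentId":9007199254740999}'

// Parse oversized integers back into BigInt using the reviver's source-text access:
JSON.parse('{"commentId":9223372036854775809}', (key, val, context) =>
  typeof val === 'number' && !Number.isSafeInteger(val) && /^-?\d+$/.test(context.source)
    ? BigInt(context.source)
    : val,
);
// => { commentId: 9223372036854775809n }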

@gajus (Owner, Author) commented Oct 6, 2023

Thank you.

What do you think about just throwing an error if an attempt is made to serialize BigInt into JSON, i.e. requiring that the user cast the value to an explicit type before serialization? e.g.

await pool.oneFirst(sql.unsafe`
  SELECT json_object_agg('foo', ${BigInt(9_007_199_254_740_999n)}::bigint::text)
`);

The above is perfectly fine.

My thinking is that this way the handling of the value does not depend on Slonik and it is less likely to produce unexpected results.

The error message could include suggestions of how to cast the value.
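
Something along these lines, perhaps — purely a hypothetical sketch of the proposed behavior, none of this exists in Slonik today:

const assertJsonSerializable = (key, value) => {
  if (typeof value === 'bigint') {
    throw new TypeError(
      'Cannot serialize BigInt to JSON. Cast the value explicitly before serialization, e.g. ::bigint::text or ::numeric.',
    );
  }
  return value;
};

JSON.stringify({ foo: 9_007_199_254_740_999n }, assertJsonSerializable);
// => TypeError: Cannot serialize BigInt to JSON. Cast the value explicitly before serialization, ...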

@AndrewJo commented Oct 11, 2023

I think throwing an error may be a valid behavior, because BigInt introduces a footgun that will take many devs by surprise if they're not keenly aware of the implementation details and differences of JavaScript's BigInt vs PostgreSQL's bigint. That being said, I think it should be well documented, and Slonik should provide an escape hatch for advanced developers who know what they're getting themselves into (i.e. it should be opt-in functionality) instead of blocking them from doing their work.

Just to highlight how error-prone handling BigInt can be, I need to remind myself that the BigInt type in JavaScript is very different from the bigint data type in PostgreSQL, even though they have pretty much identical type names.

In PostgreSQL, bigint is just a type alias for int8, i.e. a signed 8-byte/64-bit integer, which has an upper bound and isn't arbitrarily large. In this case, it's 2^63 - 1 = 9,223,372,036,854,775,807.

However, BigInt in JavaScript theoretically has no upper bound (practically, Chrome defines a limit of a billion bits, so we can infer that Node.js' V8 engine has the same limitation). You can make a number arbitrarily large, well past the upper bound of a 64-bit signed integer. BigInt does not prevent you from creating a number that's many orders of magnitude higher, such as 170,141,183,460,469,231,731,687,303,715,884,105,727 (2^127 - 1), which is an integer with 39 digits.

Reading a 64-bit integer out of PostgreSQL as BigInt is not a big deal, but you can very easily overflow PostgreSQL's bigint data type by binding a BigInt value that's larger than 2^63 - 1.

Therefore, the most analogous data type in PostgreSQL is not actually bigint, but rather the numeric type. numeric is one of the "user-defined precision" types that lets you specify up to 131,072 digits before the decimal point and up to 16,383 digits after the decimal point. I'm pretty sure for most intents and purposes, this is an "unlimited upper bound". (Given that the number of atoms in the observable universe is about 10^82, or 83 digits before the decimal point.)
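
To put the range mismatch in concrete terms:

const pgBigintMax = 2n ** 63n - 1n; // 9_223_372_036_854_775_807n, PostgreSQL bigint/int8 upper bound
const int128Max = 2n ** 127n - 1n;  // the 39-digit value above; perfectly fine as a JavaScript BigInt
int128Max > pgBigintMax;            // => true; binding it to a ::bigint column overflows at runtime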


Basic rule of thumb should be:

  1. Use BigInt if selecting existing data out of PostgreSQL table with bigint column and there's no value binding anywhere:

    const schema = z.object({ id: z.bigint(), name: z.string() });
    const foo = await pool.many(sql.type(schema)`SELECT id, name FROM foo`);
  2. Binding BigInt values should be done with caution:

    1. If binding into a query that contains a bigint/int8 data type, some input validation is needed to make sure the value will always be less than or equal to 2^63 - 1 (e.g. z.bigint().lte(9_223_372_036_854_775_807n); see the sketch after this list).

    2. Otherwise, type cast to ::numeric so that you don't run into integer overflows that are pretty much only detectable at runtime AFTER you have deployed your code to production.

      For example, given this query:

      const biggerThan64bit = 9223372036854775808n;
      const schema = z.object({ biggerThan64bit: z.bigint() });
      const num = await pool.oneFirst(sql.type(schema)`SELECT ${biggerThan64bit}::bigint AS "biggerThan64bit"`);

      You'll get a query error like this:

      Query 1 ERROR: ERROR:  bigint out of range
      

      This query is perfectly valid:

      const biggerThan64bit = 9223372036854775808n;
      const schema = z.object({ biggerThan64bit: z.bigint() });
      const num = await pool.oneFirst(sql.type(schema)`SELECT ${biggerThan64bit}::numeric AS "biggerThan64bit"`);
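
To expand on the validation mentioned in 2.1, a small sketch (this assumes a zod version whose z.bigint() supports gte/lte range checks):

import { z } from 'zod';

const int8 = z.bigint().gte(-(2n ** 63n)).lte(2n ** 63n - 1n);

int8.parse(9_223_372_036_854_775_807n); // ok: fits PostgreSQL bigint
int8.parse(9_223_372_036_854_775_808n); // throws ZodError before the query ever reaches PostgreSQL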

@AndrewJo commented Oct 11, 2023

In the raw psql client, PostgreSQL seems to take option 2 and serialize into a plain number for JSON:

SELECT json_build_object('biggerThan64bit', '9223372036854775809'::numeric) AS big_json;
-- Returns {"biggerThan64bit": 9223372036854775809}

So at least PostgreSQL itself seems to treat that as valid JSON, FWIW, regardless of whether this causes issues for the node-postgres driver.
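
On the JavaScript side, though, a naive JSON.parse of that payload silently loses precision, which is exactly where the footgun is:

JSON.parse('{"biggerThan64bit": 9223372036854775809}').biggerThan64bit;
// => 9223372036854775808 — rounded to the nearest double, because the value
//    exceeds Number.MAX_SAFE_INTEGER (2 ** 53 - 1)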

Executing in TablePlus: (screenshot)

@gajus (Owner, Author) commented Oct 11, 2023

Very interesting. Learned a lot! Thank you

@AndrewJo commented

You're welcome!

PS: I also forgot to mention that numeric/decimal trades off performance for precision quite drastically, so it should be used sparingly, for business use cases that require absolute precision (e.g. banking, financial transactions, etc.)

For most use cases, 53 bits of integer precision should be enough. Even if you use them for primary key id fields, it gives you over 9 quadrillion unique rows before you have to migrate to something else like GUIDs (in which case, either your business is a unicorn or your engineering team is doing something very wrong). For storing datetime as UNIX epoch, even at millisecond granularity, a 53-bit integer can describe dates roughly 285,000 years into the future (I highly doubt humanity, let alone software written in JavaScript, will survive that long).
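
The back-of-the-envelope arithmetic behind those figures:

Number.MAX_SAFE_INTEGER;                                    // 9_007_199_254_740_991 (2 ** 53 - 1)
Number.MAX_SAFE_INTEGER / (1000 * 60 * 60 * 24 * 365.25);   // ≈ 285_000 years of millisecond-precision epochs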

All this is to say, engineers should think before they start introducing BigInt everywhere in the codebase; it should be a deliberate, educated decision made after learning about the trade-offs.
