Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape \0 in identifiers and literals #2900

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

benjie
Copy link
Sponsor Contributor

@benjie benjie commented Jan 19, 2023

Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.)
-- https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS (emphasis mine)

Current escaping was ported from the PostgreSQL source code, wherein \0 cannot occur in a string (because C). However, it can in JS.

Currently if you escapeIdentifier with a \0 in the string, no error is thrown, but when you run the query an 'invalid message format' error is thrown:

const pg = require('pg');
const p = new pg.Pool({connectionString: 'test'});
p.query('select 1 as ' + p.Client.prototype.escapeIdentifier('"one\0"')).then(a=>a, e=>e).then(console.log.bind(console))
> error: invalid message format
    at Parser.parseErrorMessage (.../node_modules/pg-protocol/dist/parser.js:287:98)
    at Parser.handlePacket (.../node_modules/pg-protocol/dist/parser.js:126:29)
    at Parser.parse (.../node_modules/pg-protocol/dist/parser.js:39:38)
    at Socket.<anonymous> (.../node_modules/pg-protocol/dist/index.js:11:42)
    at Socket.emit (node:events:520:28)
    at Socket.emit (node:domain:537:15)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)
    at Socket.Readable.push (node:internal/streams/readable:228:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23) {
  length: 81,
  severity: 'ERROR',
  code: '08P01',
  detail: undefined,
  hint: undefined,
  position: undefined,
  internalPosition: undefined,
  internalQuery: undefined,
  where: undefined,
  schema: undefined,
  table: undefined,
  column: undefined,
  dataType: undefined,
  constraint: undefined,
  file: 'pqformat.c',
  line: '640',
  routine: 'pq_getmsgend'
}

I don't think any legitimate code would ever trigger this, but it does feel like escapeIdentifier should ensure that independent of the input, no PostgreSQL protocol errors should be raised. To address this with minimal impact on performance, I've simply had \0 be replaced with ". There are definitely other options for handling it, but I don't think any of them are worth the trade-off in performance. Since we already have a loop in escapeLiteral I just had \0 stripped from there as if it didn't exist.

Thanks for pg ❤️

@charmander
Copy link
Collaborator

This should probably throw an (early) error instead of silently changing the data, similar to what text parameters do (right?).

@benjie
Copy link
Sponsor Contributor Author

benjie commented Jan 23, 2023

We can certainly do that, but it would be more expensive and given I doubt anyone has hit this issue so far (ever!) it feels undesirable to make everyone incur this performance cost just in case someone at some time ever tries to put \0 into an identifier or literal.

@benjie
Copy link
Sponsor Contributor Author

benjie commented Mar 11, 2023

CI failed for a reason seemingly unrelated to this PR, so I've pushed a bump commit.

@charmander
Copy link
Collaborator

Can you quantify the performance cost?

As an alternative, E'\x00' could work for the escapeLiteral case, reusing the way backslashes are escaped.

@benjie
Copy link
Sponsor Contributor Author

benjie commented Mar 13, 2023

Can you quantify the performance cost?

I was thinking about escapeIdentifier and adding a check before running the regex, which would mean looping over the string twice (both via regexp). But of course we can throw an error in the escapeLiteral case with no overhead 🤦‍♂️


Almost 5 years ago, the escapeIdentifier logic was changed from a loop to a regexp:

00d749c

On that commit you'll see there's a comment from me in 2019 where I benchmarked to find that this was a 11x performance gain. Sadly jsperf has been broken for years, and of course JS performance has changed since then, so I've built a new benchmark for three implementations:

https://jsbench.me/bblf6kjk4z/2

function escapeIdentifierLoop(str) {
  var escaped = '"'

  for (var i = 0; i < str.length; i++) {
    var c = str[i]
    if (c === '"') {
      escaped += c + c
    } else if (c === '\0') {
      throw new Error("\\0 forbidden");
    } else {
      escaped += c
    }
  }

  escaped += '"'

  return escaped
}

function escapeIdentifierRegexpAndError(str) {
  if (/\0/.test(str)) throw new Error("\\0 forbidden");
  return '"' + str.replace(/"/g, '""') + '"'
}

function escapeIdentifierRegexpAndInvisibleSub(str) {
  return '"' + str.replace(/["\0]/g, '""') + '"'
}

function escapeIdentifierUnsafe(str) {
  return '"' + str.replace(/"/g, '""') + '"'
}

const strings = [];
const escaped = [];

for (let i = 0; i < 100_000; i++) {
	strings.push(String(i) + "some_identifier" + String(i));
}

(Note: none of the benchmarked strings contain \0 because no reasonable strings in production will include that. We only care about the performance of reasonable strings.)

The result in Chrome is that my original proposal is indeed fastest, but only by ~25% (6.9 Mops/s vs 5.5 Mops/s) - I was really expecting a prior check to double the time but of course it's ready only so I guess it has less overhead. Even two regexps is still significantly faster than replacing the regexps with a loop (~1.8 Mops/s).

So I think the performance overhead of checking and throwing an error is probably reasonable, so I've switched over to that. Every call to escapeIdentifier will now be ~29% slower (depending on the length of the input). Or I can go back to the bad substitution fix which would make escapeIdentifier just 9% slower but is less correct. Either way, we should do something about \0 :)

Copy link
Collaborator

@charmander charmander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor style suggestions, mostly for consistency

packages/pg/lib/client.js Outdated Show resolved Hide resolved
packages/pg/lib/client.js Outdated Show resolved Hide resolved
packages/pg/lib/client.js Outdated Show resolved Hide resolved
Co-authored-by: Charmander <~@charmander.me>
Copy link
Collaborator

@charmander charmander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing this!

@charmander charmander requested a review from brianc March 13, 2023 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants