Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allow char and varchar cases to fall through when creating parameters #7933

Conversation

Ceshion
Copy link
Contributor

@Ceshion Ceshion commented Jul 19, 2021

Description of change

When we are converting MS-SQL parameters from typeorm's to native, we allow the varchar case to fall through to nvarchar, and char to nchar, in order to correctly enable saving strings to columns of those types that contain characters longer than one byte.

Fixes #7932

Pull-Request Checklist

  • Code is up-to-date with the master branch
  • npm run lint passes with this change
  • npm run test passes with this change
  • This pull request links relevant issues as Fixes #0000
  • There are new or updated unit tests validating the change
  • Documentation has been updated to reflect this change
  • The new commits follow conventions explained in CONTRIBUTING.md

@Ceshion Ceshion force-pushed the prevent-char-and-varchar-characters-from-truncating branch from 8e38fe7 to 73816ff Compare July 19, 2021 17:37
Ceshion added a commit to Ceshion/typeorm that referenced this pull request Jul 19, 2021
…ns retains data

add tests to ensure that using nchar and nvarchar parameters will correctly store data,
and that the actual underlying data type has not changed.

Tests typeorm#7933
Ceshion added a commit to Ceshion/typeorm that referenced this pull request Jul 19, 2021
…or var/char throws

ensure that we will fail loudly when an attempt is made to store a string that cannot be
converted to the underlying type because it has too-long characters

Tests typeorm#7933
@imnotjames
Copy link
Contributor

This seems like a somewhat "nuclear" option, don't you think?

This ends up doubling the storage space of every text field & halving the maximum size. Why not use the nvarchar type instead in your entity?

Also, this saves us on the two-byte unicode case. What about 3 or 4 byte emoji? (I don't know the width of the example emoji you used)

@Ceshion
Copy link
Contributor Author

Ceshion commented Jul 20, 2021

This seems like a somewhat "nuclear" option, don't you think?

This ends up doubling the storage space of every text field & halving the maximum size. Why not use the nvarchar type instead in your entity?

Also, this saves us on the two-byte unicode case. What about 3 or 4 byte emoji? (I don't know the width of the example emoji you used)

I see the concern, but given this only changes the data type for parameters it would only even potentially have an impact on data length in transit, would it not? I believe it would not even change that, while there may be a performance overhead to using them when not needed, and a solution with more nuance (i.e. pre-checking the string for non-ascii characters and discriminating on that fact) may be more performant, but I have not checked. That kind of check would itself cause overhead of course, and I am trusting that the developers of ADO and JDBC have already taken that into account when choosing this approach for their own work (referencing the tedious issue linked in this PR's linked issue). Do you think we should re-examine it?

The emoji that I included in the test should both be 3-byte characters, and appear to throw an error on storage like they should- although SQL Server can (and will soon) begin storing those in char and varchar columns in some specific situations anyway, so shouldn't we leave determining what could or couldn't be stored in general to the database itself?

@Ceshion Ceshion force-pushed the prevent-char-and-varchar-characters-from-truncating branch 2 times, most recently from 1b22bbe to a7ab0c9 Compare July 22, 2021 18:16
…ns retains data

add tests to ensure that using nchar and nvarchar parameters will correctly store data,
and that the actual underlying data type has not changed.

test: verify that storing strings with characters that are too wide for var/char throws

ensure that we will fail loudly when an attempt is made to store a string that cannot be
converted to the underlying type because it has characters that are not in the codepage
used by the database/table/column

Validates typeorm#7932
@Ceshion Ceshion force-pushed the prevent-char-and-varchar-characters-from-truncating branch from a7ab0c9 to 870ac67 Compare July 22, 2021 18:17
…de equivalent

this allows the server to handle mapping extended ASCII characters into the correct
codepage for a given column without needing to pre-check collation settings

Fixes typeorm#7932
@Ceshion Ceshion force-pushed the prevent-char-and-varchar-characters-from-truncating branch from ca57c5b to 22048d3 Compare July 23, 2021 13:07
Copy link
Contributor

@imnotjames imnotjames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this isn't changing the way we way we're persisting on disk & only in-transit this seems reasonable.

I also happy to defer to the JDBC folks, too.

@imnotjames imnotjames merged commit 62d7976 into typeorm:master Jul 31, 2021
@Ceshion Ceshion deleted the prevent-char-and-varchar-characters-from-truncating branch July 31, 2021 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

non-ascii characters assigned to var/char columns in SQL are truncated to one byte
2 participants