Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: preserve unquoted unicode whitespace in array literals #1266

Merged
merged 25 commits into from Aug 10, 2020

Conversation

bokken
Copy link
Member

@bokken bokken commented Jul 21, 2018

The postgresql backend only quotes ascii whitespace.
#1257

@codecov-io
Copy link

codecov-io commented Jul 21, 2018

Codecov Report

Merging #1266 into master will increase coverage by <.01%.
The diff coverage is 50%.

@@             Coverage Diff              @@
##             master    #1266      +/-   ##
============================================
+ Coverage     68.78%   68.78%   +<.01%     
- Complexity     3908     3914       +6     
============================================
  Files           179      179              
  Lines         16420    16421       +1     
  Branches       2674     2674              
============================================
+ Hits          11295    11296       +1     
+ Misses         3880     3879       -1     
- Partials       1245     1246       +1

@vlsi
Copy link
Member

vlsi commented Jul 21, 2018

Please create a method near

public static boolean isSpace(char c) {
return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';

@vlsi
Copy link
Member

vlsi commented Jul 21, 2018

Tests for leading/trailing/multiple whitespaces are needed as well

@@ -529,6 +529,32 @@ private synchronized void buildArrayList() throws SQLException {
}
}

/**
* Is whitespace which postgresql will force quotes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we elaborate a bit here. I my not have enough coffee yet but I have difficulty reading this line.

@bokken
Copy link
Member Author

bokken commented Jul 21, 2018

Should the Parser.isSpace method also consider vertical tab (0x0B)? That appears to be the only difference.

If we do need a different method, do you prefer the style of the switch statement or the chained OR equals? We should probably have both methods use same style.
I slightly prefer the switch as I find it easier to see all the chars vertically aligned. It is also very slightly faster, but so close that readability is what matters.

@bokken
Copy link
Member Author

bokken commented Jul 21, 2018

@davecramer

Is this better:

Identifies white space characters which the backend uses to determine if a {@code String} value needs to be quoted in array representation.

@davecramer
Copy link
Member

@bokken yes, thanks!

more unit tests
relocate utility method to Parser
@vlsi
Copy link
Member

vlsi commented Jul 21, 2018

Should the Parser.isSpace method also consider vertical tab (0x0B)?

I think it should be left as is and a comment pointing to https://github.com/postgres/postgres/blob/17bb62501787c56e0518e61db13a523d47afd724/src/backend/parser/scan.l#L194-L198 should be added to existing isSpace

@bokken
Copy link
Member Author

bokken commented Jul 21, 2018

I will add the comment.
Do you have an opinion on switch vs chained || ==?

@vlsi
Copy link
Member

vlsi commented Jul 21, 2018

I would follow the style of isSpace

add comment to existing isSpace method
change from switch to chained ORs
@vlsi
Copy link
Member

vlsi commented Jul 22, 2018

PostgreSQL does drop whitespace at the start and at the end of the value:

postgres=# select version();
                                                    version
----------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.9 on x86_64-apple-darwin17.5.0, compiled by Apple LLVM version 9.1.0 (clang-902.0.39.1), 64-bit

postgres=# select '{ test test ,
postgres'# abc
postgres'#
postgres'# }'::varchar[];
      varchar
-------------------
 {"test test",abc}
(1 row)

@vlsi vlsi changed the title fix: don't ignore unquoted non-ascii whitespace fix: preserve unquoted non-ascii whitespace in array literals Jul 22, 2018
@vlsi vlsi changed the title fix: preserve unquoted non-ascii whitespace in array literals fix: preserve unquoted unicode whitespace in array literals Jul 22, 2018
@vlsi
Copy link
Member

vlsi commented Jul 22, 2018

Just to clarify: sending data to the backend is not affected.

@bokken
Copy link
Member Author

bokken commented Jul 22, 2018

Correct, this only affects the parsing of strings, presumably coming from the backend.

Nothing is changed in assembling strings to send to backend.

@@ -416,8 +416,7 @@ public void testDirectFieldString() throws SQLException {
assertEquals(" \fnew \n ", array[2]);
assertEquals(" ", array[3]);

// this is likely undesired behavior of dropping white spaces mid-value (not
// leading/trailing)
// PostgreSQL drops leading and trailing whitespace, so does the driver
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test asserts that the driver is also dropping ascii white space in the middle of an unquoted value.
This is not consistent with the backend. However, the backend should never produce this type of value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vlsi thoughts?
This is asserting dropping of mid value white spaces, which is probably undesired, but practically a non issue as the backend should never produce such a value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not consistent with the backend

Why do you think so?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test when the same values get passed to the backend.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vlsi I have added test (aae0d48)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still different from testDirectFieldString

add test showing inner white spaces not dropped on unquoted strings by
backend.
@bokken
Copy link
Member Author

bokken commented Jul 24, 2018 via email

@vlsi
Copy link
Member

vlsi commented Jul 26, 2018

The tests in this PR are hard to review.
It is not obvious what spaces get trimmed by the driver, which are trimmed (or not) by the backend and it is not clear if the behaviour of the both agree.

I agree the existing tests in that area are weak, however if you think there's a mismatch between pgjdbc and backend behavior it would be great if you could create a test that highlights that diff.

@bokken
Copy link
Member Author

bokken commented Jul 26, 2018 via email

@vlsi
Copy link
Member

vlsi commented Jul 26, 2018

Ah, I see now.

bokken and others added 12 commits August 13, 2018 07:40
add test explicitly showing differences between driver and backend
handling unquoted string array values
more unit tests
relocate utility method to Parser
add comment to existing isSpace method
change from switch to chained ORs
add test showing inner white spaces not dropped on unquoted strings by
backend.
add test explicitly showing differences between driver and backend
handling unquoted string array values
@AppVeyorBot
Copy link

Build pgjdbc 1.0.228 completed (commit 636f9c9b39 by @bokken)

@bokken bokken mentioned this pull request Apr 16, 2019
@vlsi vlsi force-pushed the master branch 2 times, most recently from c25d807 to adcb194 Compare March 7, 2020 20:41
# Conflicts:
#	pgjdbc/src/main/java/org/postgresql/jdbc/PgArray.java
@bokken
Copy link
Member Author

bokken commented Aug 6, 2020

@davecramer I updated this to be able to merge.

@davecramer davecramer merged this pull request into pgjdbc:master Aug 10, 2020
davecramer pushed a commit that referenced this pull request Aug 11, 2020
* fix: don't ignore unquoted non-ascii whitespace

The postgresql backend only quotes ascii whitespace.
#1257

add comment to existing isSpace method
change from switch to chained ORs

* test: fix comment in ArrayTest.testDirectFieldString

* fix: revert spaces in array literal

* add changelog entry

add test showing inner white spaces not dropped on unquoted strings by
backend.

Co-authored-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
Co-authored-by: BO8979 <BO8979@W1971362.northamerica.cerner.net>
davecramer pushed a commit to davecramer/pgjdbc that referenced this pull request Aug 20, 2020
)

* fix: don't ignore unquoted non-ascii whitespace

The postgresql backend only quotes ascii whitespace.
pgjdbc#1257

add comment to existing isSpace method
change from switch to chained ORs

* test: fix comment in ArrayTest.testDirectFieldString

* fix: revert spaces in array literal

* add changelog entry

add test showing inner white spaces not dropped on unquoted strings by
backend.

Co-authored-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
Co-authored-by: BO8979 <BO8979@W1971362.northamerica.cerner.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants