Fix backslash handling in rows_from_chunks #249

Open: ahiijny wants to merge 2 commits into master

@ahiijny ahiijny commented Feb 12, 2021

This should fix #242.

In the rows_from_chunks function, when deciding whether a quote character ends a string, the code only checked whether the preceding character was a backslash. It did not check whether that backslash was itself escaped.

So if a row contains a string with a properly escaped backslash right before the closing quote (e.g. "\\"), the row chunker treats the closing quote as escaped. The braces it sees in the rest of the input then never balance, so it never properly processes the rest of the JSON.
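
To make the failure concrete, here is a minimal sketch of a one-character lookbehind like the one described above. The helper `naive_string_end` is hypothetical and is not pydruid's actual code; it only illustrates why checking the single preceding character is not enough.

```python
def naive_string_end(text, start):
    """Return the index of the quote that a one-character lookbehind
    treats as closing the string opened at text[start], or -1 if none.

    Hypothetical helper for illustration; not taken from pydruid.
    """
    if text[start] != '"':
        raise ValueError("start must point at an opening quote")
    for i in range(start + 1, len(text)):
        # Flawed rule: a quote closes the string unless the previous
        # character is a backslash, even when that backslash is itself
        # the second half of an escaped backslash.
        if text[i] == '"' and text[i - 1] != "\\":
            return i
    return -1


row = r'{"a": "\\"}'        # valid JSON: the value of "a" is one backslash
print(naive_string_end(row, 6))           # no closing quote is recognised
print(naive_string_end('{"a": "x"}', 6))  # ordinary strings still work
```

With the first row the scan never leaves the string, so the closing brace is never counted.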

I changed rows_from_chunks to track escape state while inside a string, which avoids this, and added a couple of unit tests that cover this scenario.
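
The general idea of tracking escape state can be sketched as follows. This is an illustrative stand-alone function, not the code in this PR: the name `iter_rows` and its structure are assumptions, and it handles only top-level JSON objects arriving as text chunks.

```python
import json


def iter_rows(chunks):
    """Yield each top-level JSON object found in an iterable of text chunks.

    Hypothetical sketch; not pydruid's rows_from_chunks.
    """
    depth = 0          # brace nesting depth, counted outside strings only
    in_string = False  # True while inside a JSON string literal
    escaped = False    # True when the previous char was an unescaped backslash
    buf = []           # characters of the object currently being collected

    for chunk in chunks:
        for ch in chunk:
            if depth > 0:
                buf.append(ch)
            if in_string:
                if escaped:
                    escaped = False    # this character is consumed by the escape
                elif ch == "\\":
                    escaped = True     # the next character is escaped
                elif ch == '"':
                    in_string = False  # an unescaped quote closes the string
            elif ch == '"':
                in_string = True
            elif ch == "{":
                if depth == 0:
                    buf.append(ch)     # start of a new top-level object
                depth += 1
            elif ch == "}" and depth > 0:
                depth -= 1
                if depth == 0:
                    yield json.loads("".join(buf))
                    buf = []


chunks = [r'[{"a": "\\"}, {"b"', r': "}{"}]']
for row in iter_rows(chunks):
    print(row)
```

Because `in_string` and `escaped` live outside the chunk loop, the state carries over when a chunk boundary falls between a backslash and the character it escapes.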

Successfully merging this pull request may close these issues.

"\\" in row breaks PyDruid - JSONDecodeError('Unterminated string...')