New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parsing of empty CSV files #9557
Conversation
Thanks for making a pull request to JupyterLab! To try out this branch on binder, follow this link: |
It seems that the disconnect here is that resetting the parser ensures that the rowcount is 1 with row offset jupyterlab/packages/csvviewer/src/model.ts Lines 599 to 601 in 22f33da
However, when we parse an empty file, we get jupyterlab/packages/csvviewer/src/model.ts Line 450 in 22f33da
Usually we wouldn't even hit this case since we check for nrows of 0: jupyterlab/packages/csvviewer/src/model.ts Lines 438 to 444 in 22f33da
However, that check is ignored when first opening the file because _startedParsing is false.
|
So I just tried: if ((this._startedParsing && nrows <= 1) || nrows == 0) { but it results in And the cleaner alternative (modifying the if condition) results in: If my feeling that the first option is better is correct, the clean fix would need something more to produce the same result. |
But maybe we can argue that empty csv has one empty row with zero columns... The issue would be if someone interprets the current visual cues as the file having one unnamed column and one row with literal |
I'm currently trying to reason through why this has a jupyterlab/packages/csvviewer/src/model.ts Line 450 in 22f33da
That is essentially what is throwing us off - it makes the new row count 0, instead of keeping it at 1. |
Also, handle the case when we get zero rows back from the parser, for the cases of empty files.
@krassowski - what do you think of the changes I just pushed here? I realized why the counting logic was weird - we're often reparsing that last row and have to account for the duplicate row in the parsed output. I also tried to comment the code to answer questions I had looking at it. Hopefully it is clearer. In an ideal world, I suppose the parser would return one more index, i.e., the index before the first unparsed row, instead of just the index of the row it just finished parsing, so we didn't have to reparse a row every update. |
It is easier to understand what is going on with the added comments indeed. The fix works for me too. |
Great! And your changes here to the tests look great too. Thanks again for working on this - you working on this helped me to prioritize it in my development time too. |
References
Fixes #9531
Code changes
constructor → parseAsync → _computeRowOffsets
contained the culprit)empty_values
as it was misleading/ambiguous (it was testing empty values, not an empty file)User-facing changes
Opening empty CSV files works.
Backwards-incompatible changes
None