Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing of empty CSV files #9557

Merged
merged 4 commits into from Jan 6, 2021
Merged

Conversation

krassowski
Copy link
Member

References

Fixes #9531

Code changes

  • Added CSV parser test for empty file (not strictly required, since it was not broken, but may help to prevent regresions)
  • Added DSVModel test for empty CSV file (constructor → parseAsync → _computeRowOffsets contained the culprit)
  • Renamed name of the test case using empty.csv from csv-spectrum to empty_values as it was misleading/ambiguous (it was testing empty values, not an empty file)

User-facing changes

Opening empty CSV files works.

Backwards-incompatible changes

None

@jupyterlab-dev-mode
Copy link

Thanks for making a pull request to JupyterLab!

To try out this branch on binder, follow this link: Binder

@jasongrout
Copy link
Contributor

It seems that the disconnect here is that resetting the parser ensures that the rowcount is 1 with row offset [0]:

// First row offset is *always* 0, so we always have the first row offset.
this._rowOffsets = new Uint32Array(1);
this._rowCount = 1;

However, when we parse an empty file, we get nrows of 0, so we end up setting rowcount to 0 here:

this._rowCount = oldRowCount + nrows - 1;

Usually we wouldn't even hit this case since we check for nrows of 0:

// Return if we didn't actually get any new rows beyond the one we've
// already parsed.
if (this._startedParsing && nrows <= 1) {
this._doneParsing = true;
this._ready.resolve(undefined);
return;
}

However, that check is ignored when first opening the file because _startedParsing is false.

@krassowski
Copy link
Member Author

So I just tried:

    if ((this._startedParsing && nrows <= 1) || nrows == 0) {

but it results in [{}] rather than []. In other words, my workaround results in:

Screenshot from 2021-01-05 20-12-04

And the cleaner alternative (modifying the if condition) results in:

Screenshot from 2021-01-05 20-10-51

If my feeling that the first option is better is correct, the clean fix would need something more to produce the same result.

@krassowski
Copy link
Member Author

krassowski commented Jan 5, 2021

But maybe we can argue that empty csv has one empty row with zero columns... The issue would be if someone interprets the current visual cues as the file having one unnamed column and one row with literal 1 inside - without the contrast to actual data rows it is currently ambiguous.

@jasongrout
Copy link
Contributor

I'm currently trying to reason through why this has a -1:

this._rowCount = oldRowCount + nrows - 1;

That is essentially what is throwing us off - it makes the new row count 0, instead of keeping it at 1.

Also, handle the case when we get zero rows back from the parser, for the cases of empty files.
@jasongrout
Copy link
Contributor

@krassowski - what do you think of the changes I just pushed here? I realized why the counting logic was weird - we're often reparsing that last row and have to account for the duplicate row in the parsed output.

I also tried to comment the code to answer questions I had looking at it. Hopefully it is clearer.

In an ideal world, I suppose the parser would return one more index, i.e., the index before the first unparsed row, instead of just the index of the row it just finished parsing, so we didn't have to reparse a row every update.

@krassowski
Copy link
Member Author

It is easier to understand what is going on with the added comments indeed. The fix works for me too.

@jasongrout
Copy link
Contributor

Great! And your changes here to the tests look great too. Thanks again for working on this - you working on this helped me to prioritize it in my development time too.

@jasongrout jasongrout merged commit d2e6a23 into jupyterlab:master Jan 6, 2021
@jtpio jtpio mentioned this pull request Jan 8, 2021
@github-actions github-actions bot added the status:resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion. label Jul 6, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pkg:csvviewer status:resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Opening empty CSV file doesn't work
2 participants