Fix parsing of empty CSV files #9557

krassowski · 2021-01-05T19:19:00Z

References

Fixes #9531

Code changes

Added a CSV parser test for an empty file (not strictly required, since the parser was not broken, but it may help to prevent regressions)
Added DSVModel test for empty CSV file (constructor → parseAsync → _computeRowOffsets contained the culprit)
Renamed the test case that uses empty.csv from csv-spectrum to empty_values, as the old name was misleading/ambiguous (it was testing empty values, not an empty file)

User-facing changes

Opening empty CSV files works.

Backwards-incompatible changes

None

jasongrout · 2021-01-05T19:54:46Z

It seems that the disconnect here is that resetting the parser ensures that the rowcount is 1 with row offset [0]:

jupyterlab/packages/csvviewer/src/model.ts

Lines 599 to 601 in 22f33da

    // First row offset is *always* 0, so we always have the first row offset.
    this._rowOffsets = new Uint32Array(1);
    this._rowCount = 1;

However, when we parse an empty file, we get nrows of 0, so we end up setting rowcount to 0 here:

jupyterlab/packages/csvviewer/src/model.ts

Line 450 in 22f33da

    this._rowCount = oldRowCount + nrows - 1;

Usually we wouldn't even hit this case since we check for nrows of 0:

jupyterlab/packages/csvviewer/src/model.ts

Lines 438 to 444 in 22f33da

    // Return if we didn't actually get any new rows beyond the one we've
    // already parsed.
    if (this._startedParsing && nrows <= 1) {
      this._doneParsing = true;
      this._ready.resolve(undefined);
      return;
    }

However, that check is ignored when first opening the file because _startedParsing is false.
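The arithmetic described above can be traced with a small standalone sketch. This is a toy model, not the actual DSVModel code: the `RowState` type and the `resetState` and `applyParseResult` functions are hypothetical names, and setting `startedParsing` to true inside the update step is an assumption made for illustration.

```typescript
// Toy model of the row bookkeeping discussed above (hypothetical names).
interface RowState {
  rowCount: number;
  startedParsing: boolean;
  doneParsing: boolean;
}

// Mirrors the reset step: the first row offset is always 0, so one row is assumed.
function resetState(): RowState {
  return { rowCount: 1, startedParsing: false, doneParsing: false };
}

// Mirrors the update step as quoted at 22f33da; `nrows` is what the parser returned.
function applyParseResult(state: RowState, nrows: number): RowState {
  // The early return only applies once parsing has already started.
  if (state.startedParsing && nrows <= 1) {
    return {
      rowCount: state.rowCount,
      startedParsing: true,
      doneParsing: true
    };
  }
  return {
    rowCount: state.rowCount + nrows - 1,
    startedParsing: true, // assumption: the flag is set during the first update
    doneParsing: state.doneParsing
  };
}

// Empty file on first open: the guard is skipped, so 1 + 0 - 1 = 0.
const emptyFile = applyParseResult(resetState(), 0);

// Three rows on first open: 1 + 3 - 1 = 3.
const threeRows = applyParseResult(resetState(), 3);
```

With an empty file the row count drops to 0 even though the row offsets array still holds one entry, which is the mismatch described in this comment.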

krassowski · 2021-01-05T20:14:24Z

So I just tried:

    if ((this._startedParsing && nrows <= 1) || nrows == 0) {

but it results in [{}] rather than []. In other words, my workaround results in:

And the cleaner alternative (modifying the if condition) results in:

If my feeling that the first option is better is correct, the clean fix would need something more to produce the same result.

krassowski · 2021-01-05T20:18:00Z

But maybe we can argue that an empty CSV has one empty row with zero columns... The issue would be if someone interprets the current visual cues as the file having one unnamed column and one row with a literal 1 inside. Without the contrast to actual data rows, it is currently ambiguous.

jasongrout · 2021-01-05T20:36:22Z

I'm currently trying to reason through why this has a -1:

jupyterlab/packages/csvviewer/src/model.ts

Line 450 in 22f33da

    this._rowCount = oldRowCount + nrows - 1;

That is essentially what is throwing us off - it makes the new row count 0, instead of keeping it at 1.

jasongrout · 2021-01-05T23:25:49Z

@krassowski - what do you think of the changes I just pushed here? I realized why the counting logic was weird - we're often reparsing that last row and have to account for the duplicate row in the parsed output.

I also tried to comment the code to answer questions I had looking at it. Hopefully it is clearer.

In an ideal world, I suppose the parser would return one more index, i.e., the index before the first unparsed row, instead of just the index of the row it just finished parsing, so we didn't have to reparse a row every update.
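The duplicate-row accounting can be illustrated with a minimal sketch. It is not the csvviewer implementation: `scanRowOffsets` and `collectRowOffsets` are hypothetical helpers, only "\n" is treated as a row delimiter, and quoting is ignored. Each pass after the first restarts at the last known row, so that row appears again in the parsed output and has to be dropped before merging.

```typescript
// Hypothetical helper: returns the start offsets of up to `maxRows` rows,
// scanning `data` from `startIndex`. Only "\n" is treated as a row delimiter.
function scanRowOffsets(
  data: string,
  startIndex: number,
  maxRows: number
): number[] {
  const offsets: number[] = [];
  let index = startIndex;
  while (index < data.length && offsets.length < maxRows) {
    offsets.push(index);
    const newline = data.indexOf('\n', index);
    index = newline === -1 ? data.length : newline + 1;
  }
  return offsets;
}

// Collects all row offsets in chunks of `chunkRows` new rows per pass.
function collectRowOffsets(data: string, chunkRows: number): number[] {
  let known: number[] = [];
  while (true) {
    // Number of rows we ask the scanner to reparse: 0 on the first pass, 1 after.
    const reparse = known.length > 0 ? 1 : 0;
    const start = reparse > 0 ? known[known.length - 1] : 0;
    const parsed = scanRowOffsets(data, start, chunkRows + reparse);
    if (parsed.length <= reparse) {
      // Nothing new beyond the reparsed row, or an empty file on the first pass.
      return known;
    }
    // Drop the duplicated row before merging the new offsets.
    known = known.concat(parsed.slice(reparse));
  }
}

const offsets = collectRowOffsets('a,b\n1,2\n3,4\n', 2); // [0, 4, 8]
const emptyOffsets = collectRowOffsets('', 2); // []
```

Treating `reparse` as a count rather than a flag keeps the comparison and the slice in the same units, and the empty-file case falls out of the same check without a special branch.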

krassowski · 2021-01-06T16:11:33Z

It is easier to understand what is going on with the added comments indeed. The fix works for me too.

jasongrout · 2021-01-06T18:09:27Z

Great! And your changes here to the tests look great too. Thanks again for working on this; your work on it helped me prioritize it in my development time too.

krassowski added 2 commits January 5, 2021 19:11

Reproduce failure of DSVModel to parse empty file

cab94f3

Fix parsing of empty CSV files

45df277

github-actions bot added the pkg:csvviewer label Jan 5, 2021

Make the initial rowCount 0 in the CSV viewer model.

bcb5d04

Also, handle the case when we get zero rows back from the parser, for the cases of empty files.

jasongrout force-pushed the empty-csv branch from 7398269 to 91346ba Compare January 6, 2021 00:29

Simplify logic by changing ‘reparse’ to the number of rows we request to reparse.

c0f4a8a

jasongrout force-pushed the empty-csv branch from 91346ba to c0f4a8a Compare January 6, 2021 00:34

jasongrout added this to the 3.0 milestone Jan 6, 2021

jasongrout merged commit d2e6a23 into jupyterlab:master Jan 6, 2021

jtpio mentioned this pull request Jan 8, 2021

Fix breadcrumb links #9572

Merged

github-actions bot added the status:resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion. label Jul 6, 2021

github-actions bot locked as resolved and limited conversation to collaborators Jul 6, 2021
