
Large files, option to only return the column mapping instead of full content #214

Open
pintaf opened this issue Mar 7, 2024 · 2 comments

Comments

@pintaf

pintaf commented Mar 7, 2024

Hi there.

We've already built an (ugly) but efficient piece of code that reads a CSV file (the first x lines only) and lets the user specify the data type of each column. We then send the untouched file, together with the mapping, to the backend, which does the import.

With huge files (millions of lines), reading the entire file in the browser and returning the full result would overload it.
I propose an option to return only the mapping instead of the full result set.
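
To make it concrete, the result of such a mapping-only run could look roughly like this (just a sketch, the names are made up and not the library's actual API):

```ts
// Hypothetical shape of a "mapping only" result: the browser only parses
// the first rows to drive the column-mapping UI; the full file is never read.
interface ColumnMapping {
  fileColumnIndex: number; // position of the column in the uploaded file
  fileColumnName: string;  // header as it appears in the file
  templateKey: string;     // template column the user mapped it to
  dataType: "string" | "number" | "date" | "boolean";
}

interface MappingOnlyResult {
  file: File;               // the untouched uploaded file
  mapping: ColumnMapping[]; // one entry per mapped column
}
```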

I can work on it and propose an implementation, as I plan to replace our little piece of code with your library. (We already use it in some parts of our apps where the uploaded files are not that big.)

@ciminelli
Member

Hi @pintaf! For large file use cases, I was thinking about implementing streaming so the backend could receive the data in chunks. We could think about adding support for just sending the mapping (and the file data) but this might be tough to support with future updates to add a review screen and validations. Let me know your thoughts.
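
For reference, chunked upload from the frontend could be as simple as something like this (the endpoint and field names are placeholders, not an existing API):

```ts
// Rough sketch of sending a large file to the backend in chunks so the
// browser never holds the parsed rows in memory. All URLs are hypothetical.
async function uploadInChunks(file: File, importId: string, chunkSize = 5 * 1024 * 1024) {
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    const chunk = file.slice(offset, offset + chunkSize); // Blob slice, no parsing
    const body = new FormData();
    body.append("importId", importId);
    body.append("offset", String(offset));
    body.append("chunk", chunk);
    await fetch("/import/chunk", { method: "POST", body });
  }
  // Tell the backend the upload is finished so it can start processing.
  await fetch(`/import/${importId}/complete`, { method: "POST" });
}
```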

@pintaf
Author

pintaf commented Mar 8, 2024

First, I know you mainly use this for your business at TableFlow, so it's important to limit the options, otherwise things can quickly turn into a spaghetti bowl. Thanks for open-sourcing this part, BTW.

Indeed, with just the mapping, you would not be able to do validation. But is it feasible to open a one-million-line Excel file in the browser and validate every line? Sure, there are probably ways to chunk the processing to avoid high RAM usage, but I wonder if that wouldn't turn the end user's machine into a heater.

Streaming is a nice idea, but given the current lib design, doesn't that seem out of scope here? Currently the lib only returns the result, and it's up to the lib user to do what they want with it.

Maybe there could be two processes:

  • The normal one, with a review step allowing the user to correct some data (maybe), the output being an object with the corrected data
  • Mapping only, where you skip the review and validation and get either the whole data as a JS object, or the mapping plus the file object

From my point of view, either our computers are not powerful enough, or we don't want to vampirize their resources with large files, especially if you have mobile support in mind.

If you want, I can come up with a proposal for the configuration object and for the result shape depending on the options.
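
As a very rough starting point, something along these lines (all names are placeholders, not a proposal for the actual API):

```ts
// Hypothetical importer configuration: "full" keeps the current behavior
// (review + validation, corrected data in the result), while "mappingOnly"
// skips review/validation and returns the raw file plus the column mapping.
interface ImporterOptions {
  mode: "full" | "mappingOnly";
  previewRows?: number; // rows parsed in the browser just for the mapping UI
  onComplete: (result: FullResult | MappingOnlyResult) => void;
}

interface FullResult {
  rows: Record<string, unknown>[]; // corrected data after review/validation
}

interface MappingOnlyResult {
  file: File; // the untouched uploaded file
  mapping: { fileColumnName: string; templateKey: string; dataType: string }[];
}
```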
