
Large files, option to only return the column mapping instead of full content #214

Open
pintaf opened this issue Mar 7, 2024 · 2 comments

Comments

@pintaf

pintaf commented Mar 7, 2024

Hi there.

We've already built an (ugly) but efficient piece of code that reads a CSV file (the first x lines only) and lets the user specify the data type of each column. We then send the untouched file, together with the mapping, to the backend, which does the import.

With huge files (millions of lines), reading the entire file in the browser and returning the full result would overload it.
I propose an option to return only the mapping instead of the full result set.
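
To make it concrete, the result of such a mapping-only run could look roughly like this (just a sketch, the names are made up and not the library's actual API):

```ts
// Hypothetical shape of a "mapping only" result: the browser only parses
// the first rows to drive the column-mapping UI; the full file is never read.
interface ColumnMapping {
  fileColumnIndex: number; // position of the column in the uploaded file
  fileColumnName: string;  // header as it appears in the file
  templateKey: string;     // template column the user mapped it to
  dataType: "string" | "number" | "date" | "boolean";
}

interface MappingOnlyResult {
  file: File;               // the untouched uploaded file
  mapping: ColumnMapping[]; // one entry per mapped column
}
```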

I can work on it and propose an implementation, as I plan to replace our little piece of code with your library. (We already use it in some parts of our apps where the uploaded files are not that big.)

@ciminelli
Member

Hi @pintaf! For large file use cases, I was thinking about implementing streaming so the backend could receive the data in chunks. We could think about adding support for just sending the mapping (and the file data) but this might be tough to support with future updates to add a review screen and validations. Let me know your thoughts.
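
For reference, chunked upload from the frontend could be as simple as something like this (the endpoint and field names are placeholders, not an existing API):

```ts
// Rough sketch of sending a large file to the backend in chunks so the
// browser never holds the parsed rows in memory. All URLs are hypothetical.
async function uploadInChunks(file: File, importId: string, chunkSize = 5 * 1024 * 1024) {
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    const chunk = file.slice(offset, offset + chunkSize); // Blob slice, no parsing
    const body = new FormData();
    body.append("importId", importId);
    body.append("offset", String(offset));
    body.append("chunk", chunk);
    await fetch("/import/chunk", { method: "POST", body });
  }
  // Tell the backend the upload is finished so it can start processing.
  await fetch(`/import/${importId}/complete`, { method: "POST" });
}
```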

@pintaf
Author

pintaf commented Mar 8, 2024

First, I know you mainly use this for your business at TableFlow, so it's important to limit the options, otherwise things can quickly turn into a spaghetti bowl. Thanks for open-sourcing this part, BTW.

Indeed, with just the mapping, you would not be able to do validation. But is it feasible to open a one-million-line Excel file in the browser and validate every line? Sure, there are probably ways to chunk the processing to avoid high RAM usage, but I wonder if that wouldn't turn the end user's machine into a heater.

Streaming is a nice idea, but given the current lib design, doesn't that seem out of scope here? Currently the lib only returns the result, and it's up to the lib user to do what they want with it.

Maybe there could be two processes:

  • The normal one, with a review step allowing the user to correct some data (maybe), the output being an object with the corrected data
  • Mapping only, where you skip the review and validation and get either the whole data as a JS object, or the mapping plus the file object

From my point of view, either our computers are not powerful enough, or we don't want to vampirize their resources with large files, especially if you have mobile support in mind.

If you want, I can come up with a proposal for the configuration object and for the result shape depending on the options.
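
As a very rough starting point, something along these lines (all names are placeholders, not a proposal for the actual API):

```ts
// Hypothetical importer configuration: "full" keeps the current behavior
// (review + validation, corrected data in the result), while "mappingOnly"
// skips review/validation and returns the raw file plus the column mapping.
interface ImporterOptions {
  mode: "full" | "mappingOnly";
  previewRows?: number; // rows parsed in the browser just for the mapping UI
  onComplete: (result: FullResult | MappingOnlyResult) => void;
}

interface FullResult {
  rows: Record<string, unknown>[]; // corrected data after review/validation
}

interface MappingOnlyResult {
  file: File; // the untouched uploaded file
  mapping: { fileColumnName: string; templateKey: string; dataType: string }[];
}
```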
