JSON and CSV serializers generate an error with large amounts of data #949

Open
apalacio9502 opened this issue Mar 29, 2024 · 1 comment

Comments

@apalacio9502

Hello,

I have found that when using serializer_json or serializer_csv, an error can occur because the result is converted to a single string during serialization, and R limits a character string to 2^31 - 1 bytes (~2 * 10^9). Is there a reason to generate a string rather than a raw vector? Would it be better to generate raw? For example, in the CSV case you could write a temporary file with readr::write_csv() and read the raw bytes back with readBin(), as in the sketch below.
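Roughly what I have in mind (csv_as_raw is a hypothetical helper, not plumber's actual serializer API; how it would plug into plumber's response handling is exactly the open question here):

```r
library(readr)

# Hypothetical helper: serialize a data frame to CSV and return the result
# as a raw vector instead of a single character string.
csv_as_raw <- function(df) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)
  readr::write_csv(df, tmp)
  # A raw vector is not subject to the 2^31 - 1 byte per-string limit.
  readBin(tmp, what = "raw", n = file.size(tmp))
}
```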

Regards,

@schloerke
Collaborator

Yes, per https://search.r-project.org/R/refmans/base/html/Memory-limits.html there is a ~2 GB limit per character string, even on 64-bit machines.
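For anyone hitting this, a minimal illustration of that limit (needs several GB of RAM to run; the exact error wording may vary by R version):

```r
# Each piece is ~1 GiB, which is fine; pasting them into one ~3 GiB string
# exceeds the 2^31 - 1 byte per-string limit and errors.
piece <- strrep("a", 2^30)
big   <- paste0(piece, piece, piece)
# Error: result would exceed 2^31-1 bytes
```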

We did not consider result sizes this large.

Could you create a PR with a working approach (and a test that fails with the current serializer)?

I'm wondering if we'd want a separate serializer for large results, since reading/writing to disk is slow when it isn't necessary.

Even if we got it working, given the large size... would it be better to use {arrow} serialization (or something other than plain text)?
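For reference, a rough sketch of what an {arrow}-based route could look like (parquet_as_raw is a hypothetical helper; it just writes Parquet to a temp file and reads the bytes back):

```r
library(arrow)

# Hypothetical helper: serialize a data frame to Parquet and return raw bytes.
# Binary Parquet is typically much smaller than CSV/JSON for large data and
# never passes through a single R character string.
parquet_as_raw <- function(df) {
  tmp <- tempfile(fileext = ".parquet")
  on.exit(unlink(tmp), add = TRUE)
  arrow::write_parquet(df, tmp)
  readBin(tmp, what = "raw", n = file.size(tmp))
}
```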
