-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow CSV.open with StringIO argument #302
base: master
Are you sure you want to change the base?
Conversation
lib/csv.rb
Outdated
pos = filename_or_io.pos | ||
f = StringIO.new(filename_or_io.read) | ||
filename_or_io.seek(pos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use StringIO#string
here?
pos = filename_or_io.pos | |
f = StringIO.new(filename_or_io.read) | |
filename_or_io.seek(pos) | |
f = StringIO.new(filename_or_io.string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd then run into an issue similar to #98, which I confirmed with another test:
def test_foreach_stringio_with_bom
s = StringIO.new
s << "\ufeff" # BOM
s << @data
s.rewind
s.read(3)
rows = CSV.foreach(s, col_sep: "\t", row_sep: "\r\n").to_a
assert_equal(@rows, rows)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the most relevant counter-argument is, CSV.new
also uses read
, so using read
here is also a matter of consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Could you also add a test for the BOM case?
How about using StringIO#set_encoding_by_bom
for it?
f = StringIO.new(filename_or_io.string)
unless f.set_encoding_by_bom
f.set_encoding(filename_or_io.external_encoding)
end
CSV.new
uses already opened IO. So it must not ignore the current position. So we must use read
.
But CSV.open
opens a new IO. So it must use all contents instead of contents after the current position.
Could you check the current behavior of CSV.open
with sought real IO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But CSV.open opens a new IO. So it must use all contents instead of contents after the current position.
You're right, CSV.open
reads the whole file content. I changed open
again to use string
instead of read
.
Could you also add a test for the BOM case?
Sure thing. I added it, and together with using the **file_opts
in the new StringIO
, it's handled correctly.
How about using StringIO#set_encoding_by_bom for it?
I couldn't see any effect of doing it. It seems like including the file_opts
is enough. Or is there a case I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't see any effect of doing it.
Ah, b706d91 is a trick for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything you still feel is missing, after I've added file_opts
with the trick from b706d91 ?
4020400
to
013bb4a
Compare
Could you rebase on main? Our CI on master broken... and I've fixed it now. |
d30d9bb
to
d820ea2
Compare
Thank you, I've rebase it now. |
Thanks. But sorry... The master CI was broken... I've fixed it. Could you rebase on master again? |
…nstead of reusing it
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
d820ea2
to
30a4dd0
Compare
No problem, I rebased it now. |
Fix #300