Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AbstractItemCountingItemStreamItemReader should store the read count as a long [BATCH-2547] #1055

Closed
spring-projects-issues opened this issue Oct 14, 2016 · 4 comments
Labels
in: core status: declined Features that we don't intend to implement or Bug reports that are invalid or missing enough details type: bug

Comments

@spring-projects-issues
Copy link
Collaborator

Alexis NICOLAS opened BATCH-2547 and commented

Right now the AbstractItemCountingItemStreamItemReader use a int to store the read count which means any implementation of this can't read more than Integer.MAX_VALUE items. Using a long would allow to read a lot more elements, this can be useful for processes dealing with huge amount of data.


Affects: 3.0.7

@fmbenhassine fmbenhassine added this to the 5.0.0-M1 milestone Nov 27, 2020
@fmbenhassine fmbenhassine removed the status: waiting-for-triage Issues that we did not analyse yet label Jan 21, 2021
@parikshitdutta
Copy link
Contributor

Hi @benas,
I am working on this one, you can assign to me if you like.

Thank you.

@fmbenhassine
Copy link
Contributor

@parikshitdutta Great! Thank you for letting me know about that. I did not evaluate the impact of this change request yet, but the changes here might overlap with those required for #3650, for which I opened a draft PR (see #3845). If this is the case, please make sure to take into account those changes to avoid duplicate efforts. Otherwise, there would be no issue, we would have two independent PRs.

@parikshitdutta
Copy link
Contributor

Apparently updating read counters to long in AbstractItemCountingItemStreamItemReader should help it's immediate derivatives such as FlatFileItemReader, JpaCursorItemReader, but the refactoring going to touch several core areas such as:

And,

and all its derivatives such as PassThroughLineMapper and PatternMatchingCompositeLineMapper etc.

On the other hand, these changes not going to help AbstractCursorItemReader and it's derived classes such as JdbcCursorItemReader, StoredProcedureItemReader etc, as those inherently got methods with int param which can not be changed such as this:

Because, RowMapper.mapRow only accepts int as argument:

I just mentioned some of the refactoring, while there are much more than what meets the eye, I believe we should handle it in different way than just refactoring int to long.

Your thought @benas?

@fmbenhassine fmbenhassine modified the milestones: 5.0.0-M1, 5.0.0 Jan 2, 2022
@fmbenhassine
Copy link
Contributor

@parikshitdutta Thank you for your feedback. I came to the same conclusions. The impact of this change is huge. Moreover, in order to implement this correctly (ie without data loss during type conversion), this would require a change in RowMapper#mapRow from SF and java.sql.ResultSet#absolute / java.sql.Statement#setMaxRows from Java.

The ResultSet API is designed with the int type for row numbers (getRow, absolute, etc). I'm curious to see the behaviour of ResultSet#Last() followed by a ResultSet#getRow for a query that returns more than Integer.MAX_VALUE records. That said, I wonder if working with more than 2.1B records in the same result set is frequent or even practical. The same applies for files as well.

So while this issue is theoretically valid, I believe it is still rare in practice (that's probably one of the reasons why ResultSet is still using int and not long). And even if it happens, partitioning the dataset should be possible in most cases, and using a partitioned step is the way to go.

For all these reasons (impediments + frequency of the case), I believe the requested change is not worth the added value. I would be grateful to anyone who faces this issue to upvote it and add a comment in order to measure how often this happens in practice.

@fmbenhassine fmbenhassine removed this from the 5.0.0 milestone Apr 11, 2022
@fmbenhassine fmbenhassine added the status: declined Features that we don't intend to implement or Bug reports that are invalid or missing enough details label Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: core status: declined Features that we don't intend to implement or Bug reports that are invalid or missing enough details type: bug
Projects
None yet
Development

No branches or pull requests

3 participants