AbstractItemCountingItemStreamItemReader should store the read count as a long [BATCH-2547] #1055

spring-projects-issues · 2016-10-14T15:30:26Z

Alexis NICOLAS opened BATCH-2547 and commented

Right now the AbstractItemCountingItemStreamItemReader use a int to store the read count which means any implementation of this can't read more than Integer.MAX_VALUE items. Using a long would allow to read a lot more elements, this can be useful for processes dealing with huge amount of data.

Affects: 3.0.7

parikshitdutta · 2021-02-04T07:06:52Z

Hi @benas,
I am working on this one, you can assign to me if you like.

Thank you.

fmbenhassine · 2021-02-04T10:30:24Z

@parikshitdutta Great! Thank you for letting me know about that. I did not evaluate the impact of this change request yet, but the changes here might overlap with those required for #3650, for which I opened a draft PR (see #3845). If this is the case, please make sure to take into account those changes to avoid duplicate efforts. Otherwise, there would be no issue, we would have two independent PRs.

parikshitdutta · 2021-03-05T20:46:14Z

Apparently updating read counters to long in AbstractItemCountingItemStreamItemReader should help it's immediate derivatives such as FlatFileItemReader, JpaCursorItemReader, but the refactoring going to touch several core areas such as:

spring-batch/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/ItemCountAware.java

Line 33 in a6f283b

void setItemCount(int count);

And,

spring-batch/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/file/LineMapper.java

Line 47 in a6f283b

T mapLine(String line, int lineNumber) throws Exception;

and all its derivatives such as PassThroughLineMapper and PatternMatchingCompositeLineMapper etc.

On the other hand, these changes not going to help AbstractCursorItemReader and it's derived classes such as JdbcCursorItemReader, StoredProcedureItemReader etc, as those inherently got methods with int param which can not be changed such as this:

spring-batch/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/database/AbstractCursorItemReader.java

Line 495 in a6f283b

T item = readCursor(rs, currentRow);

Because, RowMapper.mapRow only accepts int as argument:

spring-batch/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/database/JdbcCursorItemReader.java

Line 142 in a6f283b

return rowMapper.mapRow(rs, currentRow);

I just mentioned some of the refactoring, while there are much more than what meets the eye, I believe we should handle it in different way than just refactoring int to long.

Your thought @benas?

fmbenhassine · 2022-04-11T10:51:24Z

@parikshitdutta Thank you for your feedback. I came to the same conclusions. The impact of this change is huge. Moreover, in order to implement this correctly (ie without data loss during type conversion), this would require a change in RowMapper#mapRow from SF and java.sql.ResultSet#absolute / java.sql.Statement#setMaxRows from Java.

The ResultSet API is designed with the int type for row numbers (getRow, absolute, etc). I'm curious to see the behaviour of ResultSet#Last() followed by a ResultSet#getRow for a query that returns more than Integer.MAX_VALUE records. That said, I wonder if working with more than 2.1B records in the same result set is frequent or even practical. The same applies for files as well.

So while this issue is theoretically valid, I believe it is still rare in practice (that's probably one of the reasons why ResultSet is still using int and not long). And even if it happens, partitioning the dataset should be possible in most cases, and using a partitioned step is the way to go.

For all these reasons (impediments + frequency of the case), I believe the requested change is not worth the added value. I would be grateful to anyone who faces this issue to upvote it and add a comment in order to measure how often this happens in practice.

spring-projects-issues added type: bug status: waiting-for-triage Issues that we did not analyse yet in: core labels Dec 16, 2019

fmbenhassine mentioned this issue Feb 19, 2020

StepExecution counts integer overflow #3650

Closed

fmbenhassine added this to the 5.0.0-M1 milestone Nov 27, 2020

fmbenhassine removed the status: waiting-for-triage Issues that we did not analyse yet label Jan 21, 2021

fmbenhassine modified the milestones: 5.0.0-M1, 5.0.0 Jan 2, 2022

fmbenhassine closed this as completed Apr 11, 2022

fmbenhassine removed this from the 5.0.0 milestone Apr 11, 2022

fmbenhassine added the status: declined Features that we don't intend to implement or Bug reports that are invalid or missing enough details label Apr 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AbstractItemCountingItemStreamItemReader should store the read count as a long [BATCH-2547] #1055

AbstractItemCountingItemStreamItemReader should store the read count as a long [BATCH-2547] #1055

spring-projects-issues commented Oct 14, 2016

parikshitdutta commented Feb 4, 2021

fmbenhassine commented Feb 4, 2021

parikshitdutta commented Mar 5, 2021

fmbenhassine commented Apr 11, 2022

AbstractItemCountingItemStreamItemReader should store the read count as a long [BATCH-2547] #1055

AbstractItemCountingItemStreamItemReader should store the read count as a long [BATCH-2547] #1055

Comments

spring-projects-issues commented Oct 14, 2016

parikshitdutta commented Feb 4, 2021

fmbenhassine commented Feb 4, 2021

parikshitdutta commented Mar 5, 2021

fmbenhassine commented Apr 11, 2022