Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StepExecution counts integer overflow #3650

Closed
ArtyomGabeev opened this issue Jan 27, 2020 · 5 comments
Closed

StepExecution counts integer overflow #3650

ArtyomGabeev opened this issue Jan 27, 2020 · 5 comments

Comments

@ArtyomGabeev
Copy link
Contributor

Hi,

In StepExecution class counts like readCount/writeCount, etc has int type.
In case when step operates on a big amount of items, there may be an int overflow and may affect e.g. progress monitoring if someone use these counters.

Should we change it to long instead?

@ArtyomGabeev
Copy link
Contributor Author

Also noted, apply method which increments counts is synchronized. Should we switch from volatile to atomic long?

@fmbenhassine fmbenhassine added the status: waiting-for-triage Issues that we did not analyse yet label Feb 7, 2020
@fmbenhassine
Copy link
Contributor

Similar/Related to #1055.

@ArtyomGabeev
Copy link
Contributor Author

@benas any update on this? we are getting integer overflow :)

@fmbenhassine fmbenhassine added in: core type: bug and removed status: waiting-for-triage Issues that we did not analyse yet labels Nov 18, 2020
@fmbenhassine fmbenhassine added this to the 5.0.0-M1 milestone Nov 18, 2020
@adityausathe
Copy link
Contributor

@benas, If this is up for grabs, I would like to take a shot at it. It would be helpful if you could provide some pointers.

@fmbenhassine
Copy link
Contributor

@ArtyomGabeev Thank you for reporting this issue. The problem can also happen in a locally or remotely partitioned/chunked step: each worker could contribute a read/write count and the aggregation might silently fail with an integer overflow on the manager side, leading to incorrect counters in the aggregate StepExecution. An example can be found here.

Changing the type of metrics from int to long seems reasonable to me for a major version. Moreover, columns in the BATCH_STEP_EXECUTION table are already designed to accept long values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants