Oracle clustered environment with cached sequences can lead to Spring Batch thinking new job already exists [BATCH-1586] #2000
Comments
Dave Syer commented I'm not aware of anywhere in Spring Batch where we rely on ordering of primary keys. Can you be more specific?
Jimmi Changa commented Our incrementer, via JobParametersBuilder, I think is pulling the primary key value from one of the job or instance tables, and that is what got us into trouble. It seems to pull the largest primary key, which isn't always the most recent job. If we're doing something wrong in our incrementer below, please let us know.
public class RunIdIncrementer implements JobParametersIncrementer {
}
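(For reference, Spring Batch's stock RunIdIncrementer follows roughly the pattern below. The body of the class pasted above was not preserved, so this is a minimal sketch of the usual run.id approach against roughly the 2.x-era API, not necessarily the commenter's exact code. Note that the incrementer itself only manipulates the JobParameters handed to it; the database only comes into play when the caller decides which parameters to hand in, which is where the rest of this thread goes.)

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersIncrementer;

public class RunIdIncrementer implements JobParametersIncrementer {

    private static final String RUN_ID_KEY = "run.id";

    public JobParameters getNext(JobParameters parameters) {
        // Start from the previous run's parameters (or an empty set) and bump run.id by one.
        JobParameters params = (parameters == null) ? new JobParameters() : parameters;
        long runId = params.getLong(RUN_ID_KEY, 0L) + 1;
        return new JobParametersBuilder(params).addLong(RUN_ID_KEY, runId).toJobParameters();
    }
}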
Jimmi Changa commented I pasted the wrong three sequences in the original description... Of course, I meant these three:
CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ MAXVALUE 9223372036854775807 NOCYCLE;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ MAXVALUE 9223372036854775807 NOCYCLE;
CREATE SEQUENCE BATCH_JOB_SEQ MAXVALUE 9223372036854775807 NOCYCLE;
And adding NOCACHE at the end is how we quickly dealt with the problem.
Dave Syer commented I don't see what the incrementer has to do with the database. If you need to take a global lock while the incrementer is being called that's up to you (the framework only provides that interface as a courtesy really anyway). It certainly has nothing to do with Oracle sequences, unless I'm missing something. Maybe you need to explain how you use the incrementer?
Javier Andrade commented Dave, I'm running into a similar issue with an Oracle DB, but I think the problem where Spring Batch relies on the ordering of the primary keys is in the SimpleJobExplorer class. When we call to get the last job instance:
List<JobInstance> lastInstances = jobExplorer.getJobInstances(jobName, 0, 1);
it appears to use the JdbcJobInstanceDao to do so, which uses this query:
private static final String FIND_LAST_JOBS_BY_NAME = "SELECT JOB_INSTANCE_ID, JOB_NAME from %PREFIX%JOB_INSTANCE where JOB_NAME = ? order by JOB_INSTANCE_ID desc";
where you can see that it is ordering by JOB_INSTANCE_ID. Because of these gaps in the sequence, this query will not always return the most recent job instance based on timestamp. I hope that helps to clarify this issue.
Dave Syer commented That's not really the same issue, and I'm not sure it's serious enough to merit a change. The JobExplorer is read only so no damage is going to be done if you get some instances back out of order from it - the main thing is that the order is repeatable. We could add an order by clause on the creation time, but that might not suit everyone because it would require an extra index. How much of a problem is it really?
Javier Andrade commented Sorry for mixing up issues. I'm trying to implement a generic batch job launcher to be used to schedule jobs in Quartz. This is based on a sample class you wrote previously for this same purpose. The problem is that the job launcher needs to get the last job instance so that it can get the next value out of the incrementer defined for that job. Since I'm not always getting the last job instance, I end up reusing the run.id value of an existing job execution and get execution errors. For now I modified the Oracle sequence objects to NOCACHE, but I'm thinking a solution based on timestamp would be better. Is there a better way to do what I need?
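(A minimal sketch of the launcher pattern being described, with hypothetical class and field names and assuming a reasonably recent pre-5.x Spring Batch API, is shown below. The getJobInstances(jobName, 0, 1) call is exactly where the out-of-order JOB_INSTANCE_IDs bite: if the instance it returns is not really the most recent one, the incrementer hands back a run.id that has already been used.)

import java.util.List;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobLauncher;

public class IncrementingJobLauncher {

    private final JobLauncher jobLauncher;
    private final JobExplorer jobExplorer;

    public IncrementingJobLauncher(JobLauncher jobLauncher, JobExplorer jobExplorer) {
        this.jobLauncher = jobLauncher;
        this.jobExplorer = jobExplorer;
    }

    public JobExecution launch(Job job) throws Exception {
        // Look up what Spring Batch believes is the most recent instance of this job...
        List<JobInstance> lastInstances = jobExplorer.getJobInstances(job.getName(), 0, 1);
        JobParameters lastParameters = new JobParameters();
        if (!lastInstances.isEmpty()) {
            // ...and take the parameters of one of its executions as the starting point.
            List<JobExecution> executions = jobExplorer.getJobExecutions(lastInstances.get(0));
            if (!executions.isEmpty()) {
                lastParameters = executions.get(0).getJobParameters();
            }
        }
        // Assumes the job has an incrementer configured (e.g. a run.id incrementer).
        JobParameters nextParameters = job.getJobParametersIncrementer().getNext(lastParameters);
        return jobLauncher.run(job, nextParameters);
    }
}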
Dave Syer commented I see. I suppose the workaround with no cache is the best thing for now. If you want to change the JdbcJobInstanceDao to use a timestamp filter and check that it works for you, that would help us to decide when and how to implement it in the framework. The SimpleJobService from Spring Batch Admin has the same problem as your launcher, so we probably should find a better solution in the long term.
Dave Syer commented Actually, the SimpleJobService does not have the same problem because it relies on JobExecutionDao.getLastJobExecution() not JobInstanceDao.getJobInstances(), and the JDBC implementation uses timestamps to filter the executions. Maybe you can do the same?
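(Following that suggestion, a sketch of a timestamp-based lookup is shown below, again with hypothetical names and assuming the pre-5.x API where getCreateTime() returns a java.util.Date. Rather than trusting the ordering of JOB_INSTANCE_ID, it scans a small window of recent instances and picks the execution with the latest create time before handing its parameters to the incrementer.)

import java.util.Date;
import java.util.List;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.explore.JobExplorer;

public class LatestParametersFinder {

    private final JobExplorer jobExplorer;
    private final int window;

    public LatestParametersFinder(JobExplorer jobExplorer, int window) {
        this.jobExplorer = jobExplorer;
        this.window = window;
    }

    public JobParameters findLatestParameters(String jobName) {
        JobParameters latestParameters = new JobParameters();
        Date latestCreateTime = null;
        // Scan a window of recent instances; the window size is a trade-off between cost
        // and how far out of order the cached sequence values are expected to drift.
        for (JobInstance instance : jobExplorer.getJobInstances(jobName, 0, window)) {
            for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
                Date createTime = execution.getCreateTime();
                if (latestCreateTime == null || createTime.after(latestCreateTime)) {
                    latestCreateTime = createTime;
                    latestParameters = execution.getJobParameters();
                }
            }
        }
        return latestParameters;
    }
}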
Becca Gaspard commented I also ran into this issue and was able to work around it by adding ORDER to the sequences.
CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ START WITH 0 MINVALUE 0 MAXVALUE 9223372036854775807 NOCYCLE ORDER;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ START WITH 0 MINVALUE 0 MAXVALUE 9223372036854775807 NOCYCLE ORDER;
CREATE SEQUENCE BATCH_JOB_SEQ START WITH 0 MINVALUE 0 MAXVALUE 9223372036854775807 NOCYCLE ORDER;
From the Oracle documentation: If you don't specify ORDER, NOORDER is the default.
Roan Brasil Monteiro commented Hi, how can I help to fix issues?
Gary L Peskin commented FYI. We've found this to be a significant problem when restarting a STOPPED job. The query that finds the most recent execution for an instance in JdbcJobExecutionDao.getLastJobExecution obviously requires that the JOB_EXECUTION_ID be monotonically increasing. However, that was not the case with our Oracle RAC setup until we added the ORDER keyword to the sequences. Before that, a STOPPED job was sometimes restarted with a Job Execution Context that belonged to a job execution that was not the most recent one.
Jimmi Changa opened BATCH-1586 and commented
OBSERVED BEHAVIOR
The primary keys for the spring_batch tables are not always saved in sequence. For example, we would see rows created in the following order: 1, 2, 3, 4, 20, 21, 22, 40, 41, 23, 24, 42.
Having gaps between primary keys is common, because Oracle sequences do not roll back with a transaction. What is not common, especially in a non-clustered environment, is the sequence values appearing to decrease; in this example, the sequence went from 41 to 23. Even this is normally not an issue, provided the developer does not rely on the primary key for ordering or query on it for order. However, in this case the third-party Spring Batch library does query on the primary key for ordering, a design flaw that leads to Spring Batch thinking our job has already run ("JobInstance already exists and is not restartable").
ROOT CAUSE
In a clustered DB environment, there is no guarantee which cluster node a newly opened database connection will land on. If the sequences are set up per the SQL files included with Spring Batch, they use Oracle's default cache size of 20, as created in the resources file business-schema-oracle10g.sql:
CREATE SEQUENCE CUSTOMER_SEQ START WITH 5;
CREATE SEQUENCE BATCH_STAGING_SEQ START WITH 0 MINVALUE 0;
CREATE SEQUENCE TRADE_SEQ START WITH 0 MINVALUE 0;
This eventually leads to situations where the primary key is written out of sequence, especially when new DB connections are established. By itself that is not a problem, because most DBs only guarantee a unique primary key, not a primary key assigned in order. The real problem is that Spring Batch relies on the primary key being in order, which I've always been taught should not be done: developers shouldn't use the primary key for ordering.
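(The effect is straightforward to reproduce. Below is a minimal JDBC sketch, with a hypothetical connection URL and credentials, that draws values of one of the Spring Batch sequences from two separate connections. When those connections land on different RAC nodes, each node serves values from its own cached block, so the interleaved output can appear to jump backwards.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SequenceOrderDemo {

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details; requires the Oracle JDBC driver on the classpath.
        String url = "jdbc:oracle:thin:@//db-host:1521/service";
        try (Connection c1 = DriverManager.getConnection(url, "user", "pass");
             Connection c2 = DriverManager.getConnection(url, "user", "pass")) {
            // Interleave NEXTVAL calls from the two connections and print what each one sees.
            for (int i = 0; i < 5; i++) {
                System.out.println("c1: " + nextVal(c1) + "  c2: " + nextVal(c2));
            }
        }
    }

    private static long nextVal(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("SELECT BATCH_JOB_SEQ.NEXTVAL FROM DUAL")) {
            rs.next();
            return rs.getLong(1);
        }
    }
}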
PROPOSED SOLUTIONS
There are several potential solutions.
(1) Specify 'nocache' for each of the three sequences
(e.g., CREATE SEQUENCE CUSTOMER_SEQ START WITH 5 NOCACHE;)
(2) Specify a shared pooling for each of the three sequences
(3) Design the Spring Batch tables/logic not to rely on the primary key increasing in order. A new column could be added to the table, for example, for tracking ordering.
I lean towards solution (3) because it would solve the problem regardless of which DB is utilized, and it is good software design. This primary key sequencing issue may also be present in DBs other than Oracle, so solution (3) would be database agnostic. Solution (1) could be a cheap and easy fix until (3) is completed.
Affects: 2.0.4
2 votes, 6 watchers