Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oracle clustered environment with cached sequences can lead to Spring Batch thinking new job already exists [BATCH-1586] #2000

Closed
spring-projects-issues opened this issue Jun 23, 2010 · 12 comments
Labels
has: votes Issues that have votes type: bug
Milestone

Comments

@spring-projects-issues
Copy link
Collaborator

Jimmi Changa opened BATCH-1586 and commented

OBSERVED BEHAVIOR

The primary keys for the the spring_batch tables can are not always being saved in sequence. For example, we would see an rows created in the following order: 1, 2, 3, 4, 20, 21, 22, 40, 41, 23, 24, 42.

Having gaps between primary keys is common due to Oracle sequences not reverting during a rollback. What is not common, especially in a non-clustered environment, is the sequences having the appearance of decreasing. In this example, the sequence went from 41 to 23. This is also normally not an issue if the developer does not rely on using the primary key for ordering or querying for order. However, in this case, the third party Spring Batch library does indeed query on the primary key for ordering which is a design flaw leading to Spring Batch thinking our job has already run ("JobInstance already exists and is not restartable").

ROOT CAUSE

In a clustered DB environment, whenever someone opens a database connection, there is no guarantee which cluster node the user will connect to if the sequences are setup per the SQL files included with Spring Batch. In Oracle, the default cache size for a sequence is 20 when created as such in the resources file business-schema-oracle10g.sql:

CREATE SEQUENCE CUSTOMER_SEQ START WITH 5;
CREATE SEQUENCE BATCH_STAGING_SEQ START WITH 0 MINVALUE 0;
CREATE SEQUENCE TRADE_SEQ START WITH 0 MINVALUE 0;

This eventually leads to possibilities where the primary key can be written out of sequence specially when new DB connections are established. This is not a problem because most DBs only guarantee a unique primary key -- not a primary key in order. The true problem seems to be that Spring Batch is relying on the primary key being in order which I've always been taught should not be done. Developers shouldn't read the primary key for ordering.

PROPOSED SOLUTIONS

There are several potential solutions.

(1) Specify 'nocache' for each of the three sequences
(e.g., CREATE SEQUENCE CUSTOMER_SEQ START WITH 5 NOCACHE;)
(2) Specify a shared pooling for each of the three sequences
(3) Design the Spring Batch tables/logic not to rely on the primary key increasing in order. A new column could be added to the table, for example, for tracking ordering.

I lean towards solution (3) because it would solve the problem regardless of which DB utilized and is considered good software design. This primary key sequencing issue may also be present in other DBs besides Oracle so solution (3) would be database agnostic. Solution (1) could be a cheap and easy fix until (3) is completed.


Affects: 2.0.4

2 votes, 6 watchers

@spring-projects-issues
Copy link
Collaborator Author

Dave Syer commented

I'm not aware of anywhere in Spring Batch where we rely on ordering of primary keys. Can you be more specific?

@spring-projects-issues
Copy link
Collaborator Author

Jimmi Changa commented

Our incrementer, via JobParametersBuilder, I think is pulling the primary key value from one of the job or instance tables, and that is what is got us into trouble. It seems to pull the largest primary key which isn't always the most recent job. If we're doing something wrong in our incrementer below, please let us know.

public class RunIdIncrementer implements JobParametersIncrementer {

final String RUN_ID = "run.id";

public JobParameters getNext(JobParameters parameters) {
    if (parameters == null || parameters.isEmpty()) {
        return new JobParametersBuilder().addLong(RUN_ID, 1L).toJobParameters();
    }
    
    long id = parameters.getLong(RUN_ID, 1L) + 1;
    return new JobParametersBuilder().addLong(RUN_ID, id).toJobParameters();
}

}

@spring-projects-issues
Copy link
Collaborator Author

Jimmi Changa commented

I pasted the wrong three sequences in the original description... Of course, I meant these three:

CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ MAXVALUE 9223372036854775807 NOCYCLE;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ MAXVALUE 9223372036854775807 NOCYCLE;
CREATE SEQUENCE BATCH_JOB_SEQ MAXVALUE 9223372036854775807 NOCYCLE;

And adding NOCACHE at the end is how we quickly dealt with the problem.

@spring-projects-issues
Copy link
Collaborator Author

Dave Syer commented

I don't see what the incrementer has to do with the database. If you need to take a global lock while the incrementer is being called that's up to you (the framework only provides that interface as a courtesy really anyway). It certainly has nothing to do with Oracle sequences, unless I'm missing something. Maybe you need to explain how you use the incrementer?

@spring-projects-issues
Copy link
Collaborator Author

Javier Andrade commented

Dave, I'm running into a similar issue with an Oracle DB, but I think the problem where Spring Batch is relying on the ordering of the primary keys is in the SimpleJobExplorer class. When we call to get the last job instance:

List<JobInstance> lastInstances = jobExplorer.getJobInstances(jobName, 0, 1);

it appears to use the JdbcJobInstanceDao to do so, which uses this query:

private static final String FIND_LAST_JOBS_BY_NAME = "SELECT JOB_INSTANCE_ID, JOB_NAME from %PREFIX%JOB_INSTANCE where JOB_NAME = ? order by JOB_INSTANCE_ID desc";

where you can see that it is ordering by JOB_INSTANCE_ID. Because of these gaps in the sequence, this query will not always return the last job execution based on timestamp. I hope that helps to clarify this issue.

@spring-projects-issues
Copy link
Collaborator Author

Dave Syer commented

That's not really the same issue, and I'm not sure it's serious enough to merit a change. The JobExplorer is read only so no damage is going to be done if you get some instances back out of order from it - the main thing is that the order is repeatable. We could add an order by clause on the creation time, but that might not suit everyone because it would require an extra index. How much of a problem is it really?

@spring-projects-issues
Copy link
Collaborator Author

Javier Andrade commented

Sorry for mixing up issues. I'm trying implement a generic batch job launcher to be used to schedule jobs in Quartz. This is based on a sample class you wrote previously for this same purpose. The problem is that the job launcher needs to get the last job instance so that it can get the next value out of the incrementer defined for that job. Since I'm not always getting the last job instance, I get execution errors since I end up using the same run.id value as an existing job execution. For now I modified the oracle sequence objects to NOCACHE, but I'm thinking a solution based on timestamp would be better. Is there a better way to do what I need?

@spring-projects-issues
Copy link
Collaborator Author

Dave Syer commented

I see. I suppose the workaround with no cache is the best thing for now. If you want to change the JdbcJobInstanceDao to use a timestamp filter and check that it works for you that would help us to decide when and how to implement it in the framework.

The SimpleJobService from Spring Batch Admin has the same problem as your launcher, so we probably should find a better solution in the long term.

@spring-projects-issues
Copy link
Collaborator Author

Dave Syer commented

Actually, the SimpleJobService does not have the same problem because it relies on JobExecutionDao.getLastJobExecution() not JonExecutionDao.getJobInstances() and the JDBC implementation uses timestamps to filter the executions. Maybe you can do the same?

@spring-projects-issues
Copy link
Collaborator Author

Becca Gaspard commented

I also ran into this issue and was able to work around by adding ORDER to the sequence.

CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ START WITH 0 MINVALUE 0 MAXVALUE 9223372036854775807 NOCYCLE ORDER;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ START WITH 0 MINVALUE 0 MAXVALUE 9223372036854775807 NOCYCLE ORDER;
CREATE SEQUENCE BATCH_JOB_SEQ START WITH 0 MINVALUE 0 MAXVALUE 9223372036854775807 NOCYCLE ORDER;

From the oracle documentation:

Specify ORDER to guarantee that sequence numbers are generated in order of request. This clause is useful if you are using the sequence numbers as timestamps. Guaranteeing order is usually not important for sequences used to generate primary keys.

If you don't specify ORDER, NOORDER is the default.

@spring-projects-issues
Copy link
Collaborator Author

Roan Brasil Monteiro commented

HI,

How can I help to fix issues?

@spring-projects-issues
Copy link
Collaborator Author

Gary L Peskin commented

FYI. We've found this to be a significant problem when restarting a STOPPED job. The query to find the most recent execution for an instance in JdbcJobExecutionDao.getLastJobExecution is

private static final String GET_LAST_EXECUTION = "SELECT JOB_EXECUTION_ID, START_TIME, END_TIME, STATUS, EXIT_CODE, EXIT_MESSAGE, CREATE_TIME, LAST_UPDATED, VERSION, JOB_CONFIGURATION_LOCATION "
			+ "from %PREFIX%JOB_EXECUTION E where JOB_INSTANCE_ID = ? and JOB_EXECUTION_ID in (SELECT max(JOB_EXECUTION_ID) from %PREFIX%JOB_EXECUTION E2 where E2.JOB_INSTANCE_ID = ?)";

This obviously requires that the JOB_EXECUTION_ID be monotonically increasing. However, that was not the case with our Oracle RAC setup until we added the ORDER keyword to the sequences. Before that, a STOPPED job was sometimes getting restarted with a Job Execution Context that belonged to a job execution that was not the most recent job execution.

@spring-projects-issues spring-projects-issues added type: bug status: waiting-for-triage Issues that we did not analyse yet labels Dec 16, 2019
@fmbenhassine fmbenhassine added has: votes Issues that have votes and removed status: waiting-for-triage Issues that we did not analyse yet labels Jan 26, 2021
@fmbenhassine fmbenhassine added this to the 5.0.0 milestone Jan 26, 2021
@fmbenhassine fmbenhassine modified the milestones: 5.0.0, 5.0.0-M1 Sep 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has: votes Issues that have votes type: bug
Projects
None yet
Development

No branches or pull requests

2 participants