Revisit the configuration of infrastructure beans with @EnableBatchProcessing #3942

fmbenhassine · 2021-06-16T09:43:32Z

Compared to the XML configuration style, where infrastructure beans (JobRepository, JobLauncher, etc.) must be defined manually, @EnableBatchProcessing does a good job of configuring those beans automatically and making them available for autowiring in users' configuration classes. However, several issues have been reported regarding the default behaviour of this annotation as well as the customization of that behaviour. Here is a non-exhaustive list:

1. Customization of infrastructure beans is not straightforward

For example, as reported in #3765, in order to use a custom serializer, one needs to provide a custom JobRepository. And in order to provide a custom JobRepository, one needs to provide a custom BatchConfigurer (either by implementing the interface or by extending the default one and overriding a method), something like:

@Configuration
@EnableBatchProcessing
public class MyJobConfigWithCustomSerializer {

    @Bean
    public BatchConfigurer batchConfigurer() {
        return new DefaultBatchConfigurer() {
            @Override
            public JobRepository getJobRepository() {
                ExecutionContextSerializer serializer = new Jackson2ExecutionContextStringSerializer();
                // customize serializer
                JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
                factory.setSerializer(serializer);
                // set other properties on the factory bean
                try {
                    factory.afterPropertiesSet();
                    return factory.getObject();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        };
    }

}

Moreover, in this case, the custom serializer should also be set on the JobExplorer in order to correctly deserialize the execution context while exploring meta-data that was persisted with the JobRepository. So one needs to do the following as well:

@Configuration
@EnableBatchProcessing
public class MyJobConfigWithCustomSerializer {

    @Bean
    public BatchConfigurer batchConfigurer() {
        return new DefaultBatchConfigurer() {
            @Override
            public JobRepository getJobRepository() {
                JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
                factory.setSerializer(createCustomSerializer());
                // set other properties on the factory bean
                try {
                    factory.afterPropertiesSet();
                    return factory.getObject();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }

            @Override
            public JobExplorer getJobExplorer() {
                JobExplorerFactoryBean factoryBean = new JobExplorerFactoryBean();
                factoryBean.setSerializer(createCustomSerializer());
                // set other properties on the factory bean
                try {
                    factoryBean.afterPropertiesSet();
                    return factoryBean.getObject();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }

            private ExecutionContextSerializer createCustomSerializer() {
                Jackson2ExecutionContextStringSerializer serializer = new Jackson2ExecutionContextStringSerializer();
                // customize serializer
                return serializer;
            }
        };
    }
    
}

The process is the same for other properties of the job repository/explorer, like the table prefix, the LobHandler, etc.

2. Unconditional exposure of some beans

While batch specific beans like JobRepository, JobLauncher, etc could be exposed in the application context "safely", some beans like the transaction manager could be used in other parts of the application and exposing it unconditionally could be problematic (see #816). This is especially true when using Spring Boot, and this requires bean overriding which is not always wanted by users.

3. Extending infrastructure beans is not straightforward / possible

Since infrastructure beans are systematically defined and exposed by @EnableBatchProcessing and not looked up from the application context first, it is not easy/possible to extend those beans to add custom behaviour (like adding tracing to the JobRepository for instance, as reported in #3899) and use the extensions in place of default beans.
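The kind of extension described above (for example, tracing calls to the JobRepository) can be sketched without any framework code. The following is a minimal, hedged illustration using a JDK dynamic proxy; `ExecutionStore`, `InMemoryExecutionStore` and `TracingDecorator` are hypothetical names standing in for an infrastructure interface, its default implementation and a user-provided wrapper. They are not Spring Batch types.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical stand-in for an infrastructure interface such as JobRepository.
interface ExecutionStore {
    long save(String jobName);
}

// Plays the role of the default implementation created by the framework.
class InMemoryExecutionStore implements ExecutionStore {
    private long nextId = 1;

    @Override
    public long save(String jobName) {
        return nextId++;
    }
}

final class TracingDecorator {
    private TracingDecorator() {
    }

    // Wraps an interface-typed target and records the name of each invoked method.
    static <T> T wrap(Class<T> type, T target, List<String> trace) {
        InvocationHandler handler = (proxy, method, args) -> {
            trace.add(method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's original exception
            }
        };
        Object proxy = Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(proxy);
    }
}
```

If the configuration looked up an existing bean before creating its own, a wrapper like this could be registered in place of the default bean.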

4. `BatchConfigurer` is eventually an unnecessary level of indirection

Most people tend to declare infrastructure beans in the application context and expect them to be picked up by Spring Batch (this is not a wrong expectation). Here are some examples:

If @EnableBatchProcessing is changed to look for beans in the application context first, providing a custom BatchConfigurer would no longer be mandatory. For example, the following way of configuring a custom JobRepository:

@Configuration
@EnableBatchProcessing
public class MyJobConfigWithCustomJobRepository {

    @Bean
    public BatchConfigurer batchConfigurer() {
        return new DefaultBatchConfigurer() {
            @Override
            public JobRepository getJobRepository() {
                JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
                // set properties on the factory bean
                try {
                    factory.afterPropertiesSet();
                    return factory.getObject();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        };
    }

    // job bean definition
}

could become:

@Configuration
@EnableBatchProcessing
public class MyJobConfigWithCustomJobRepository {

    @Bean
    public JobRepository jobRepository() throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        // set properties on the factory bean
        factory.afterPropertiesSet();
        return factory.getObject();
    }

    // job bean definition
}

5. Confusing configuration when batch meta-data is not required

The configuration of the JobRepository/JobExplorer in @EnableBatchProcessing is based on the presence of a DataSource bean (if no data source is provided, a Map-based job repository/explorer is configured, which has been deprecated anyway, see #3780). If the application context contains one or more data sources that should not be used by Spring Batch for its meta-data, things become complicated and confusing for many people. Here are some examples:

It is concerning that people end up with an empty setter for the datasource:

The data source is actually an implementation detail of a particular JobRepository implementation, which is the JDBC based JobRepository. Other implementations of JobRepository might not need a data source at all (like a MongoDB based job repository for example). The point here is that the data source should not be a first order concern in terms of configuration, but rather a second order concern. In other words, @EnableBatchProcessing should first make sure the user wants to use the JDBC based job repository, and if so, only then check for a data source bean in the application context.
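The ordering argued for above (choose the repository type first, then look for a data source only if that type needs one) can be sketched as follows. This is a simplified illustration, not the annotation's actual logic; `RepositoryKind` and `RepositoryPlanner` are hypothetical names, and data sources are represented by plain bean names.

```java
import java.util.Optional;

// Hypothetical repository kinds; only JDBC needs a data source.
enum RepositoryKind { JDBC, MONGODB }

final class RepositoryPlanner {
    private RepositoryPlanner() {
    }

    // First-order concern: which kind of repository is wanted.
    // Second-order concern: the data source, consulted only for JDBC.
    static String plan(RepositoryKind kind, Optional<String> dataSourceBeanName) {
        if (kind != RepositoryKind.JDBC) {
            return kind + " repository, data source ignored";
        }
        String name = dataSourceBeanName.orElseThrow(
                () -> new IllegalStateException("A JDBC job repository requires a data source bean"));
        return "JDBC repository using data source '" + name + "'";
    }
}
```

With this ordering, an unrelated data source in the application context never influences a non-JDBC repository, and a missing data source is only an error when JDBC was requested.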

Possible solutions

I see a couple of options here, but I'm open to other suggestions as well.

1. Use annotation attributes to customize infrastructure beans

The idea here is to make @EnableBatchProcessing first look for infrastructure beans in the application context (this is similar to, and consistent with, the way other projects from the portfolio configure apps, like Spring Security for instance). If those beans are not defined, then create them and register them in the application context. The same naming conventions used with the XML configuration style should be used for consistency. For example:

@Configuration
@EnableBatchProcessing(dataSource = "myDataSource",  // could be omitted if named "dataSource"
                       transactionManager = "myTransactionManager", // could be omitted if named "transactionManager"
                       serializer = "mySerializer")
public class MyJobConfiguration {

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory) {
        return jobBuilderFactory.get("myJob")
                // define job flow
                .build();
    }

    @Bean // could be the one auto-configured by Spring Boot
    public ExecutionContextSerializer mySerializer() {
        ExecutionContextSerializer serializer = new Jackson2ExecutionContextStringSerializer();
        // customize serializer
        return serializer;
    }

}

In this example, @EnableBatchProcessing would first look for a JobRepository bean named jobRepository (same naming convention as XML) in the application context. If no such bean is defined, then it should create one by setting collaborators as defined in the annotation attributes.
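The lookup-then-create behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions: `BeanRegistry` is a hypothetical, framework-free stand-in for the application context, and `resolveOrCreate` is not a Spring API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, framework-free stand-in for an application context.
final class BeanRegistry {
    private final Map<String, Object> beans = new HashMap<>();

    void register(String name, Object bean) {
        beans.put(name, bean);
    }

    // Returns the user-defined bean if one is registered under the conventional name;
    // otherwise creates the default, registers it and returns it.
    <T> T resolveOrCreate(String name, Class<T> type, Supplier<? extends T> defaultFactory) {
        Object existing = beans.get(name);
        if (existing != null) {
            return type.cast(existing);
        }
        T created = defaultFactory.get();
        beans.put(name, created);
        return created;
    }
}
```

A user-provided bean registered as `jobRepository` would win over the default, while beans the user did not define would still be created automatically.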

2. Provide a base configuration class with infrastructure beans

Similar to the base XML application context that defines infrastructure beans with XML configuration, the idea is to provide a similar mechanism but for Java configuration. This means providing a base configuration class (which could be something like the current AbstractBatchConfiguration but with ready-to-use bean definitions in it) that users can extend to define their batch jobs:

@Configuration
public class MyBatchApplication extends BatchConfiguration {

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory) {
        return jobBuilderFactory.get("job")
                // define job
                .build();
    }

}

Any bean that needs customization can be declared in the user's class.
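One way to read this option is as a template-method design: the base class supplies defaults through overridable methods, and a subclass overrides only what it needs. The sketch below is plain Java with hypothetical names (`BaseBatchSettings`, `MyBatchSettings`); it illustrates the pattern only and is not the proposed BatchConfiguration class. The `BATCH_` default mirrors the default prefix of the Spring Batch meta-data tables.

```java
// Hypothetical base class: ready-to-use defaults exposed through overridable methods.
class BaseBatchSettings {
    protected String tablePrefix() {
        return "BATCH_";
    }

    protected String serializerName() {
        return "default-serializer";
    }

    // Assembles a description of the repository from the (possibly overridden) parts.
    final String describeJobRepository() {
        return "tables=" + tablePrefix() + "*, serializer=" + serializerName();
    }
}

// User class: overrides only the part that needs customization.
class MyBatchSettings extends BaseBatchSettings {
    @Override
    protected String tablePrefix() {
        return "MYAPP_";
    }
}
```

Here `new MyBatchSettings().describeJobRepository()` picks up the custom prefix while keeping the default serializer.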


This commit removes the deprecated Map-based job repository and job explorer implementations with their respective DAOs. Using the `EnableBatchProcessing` annotation now requires a datasource bean to be defined in the application context. This will be reviewed as part of #3942. This commit is a first pass that updates related tests to use the JDBC-based job repository/explorer with an embedded database. A second pass should be done to improve tests by caching/reusing embedded databases if possible. Issue #3836


Before this commit, the configuration of infrastructure beans was confusing and not straightforward to customize. This commit changes the way Batch infrastructure beans are configured. The most important changes are:

* EnableBatchProcessing now provides new attributes to configure properties of infrastructure beans
* Bean registration is now done programmatically with a BeanDefinitionRegistrar instead of importing a class with statically annotated bean definition methods
* Beans are now resolved from the application context directly instead of being resolved from a BatchConfigurer
* Both a data source and a transaction manager are now required to be defined in the application context
* AbstractBatchConfiguration is now intended to be extended by user code to get/customize basic infrastructure beans

Issue spring-projects#3942



jmresler · 2023-11-29T21:39:17Z

I know I'm late to the party and I also am a user, not a developer of Spring so please be patient with my ignorance.

A little background: I work for a huge financial services company that uses containers for deployment.
Because containers are dynamic and their addresses can change when one goes down and the system restarts, it is difficult to get access to static external resources (such as a database) approved: security would have to allow secure connections from a potentially large, changing set of addresses.

The issue at hand is that, because of this dynamic, we don't have batch schemas available on our databases, so we predominantly use in-memory databases for batch applications. Given that multiple data sources are a very common paradigm for batch applications (due to the need for a batch infrastructure database), would it not be expected that the default use case provides a data source for the batch infrastructure?

The DefaultBatchConfigurer in effect provided a less than robust solution to this problem by using the Map repository. I understand why it was removed but it was very time saving.

Any chance this functionality could be built in to the batch starter?

I am not volunteering my work because I don't expect it to pass code review :-) but I'm writing a basic starter to provide this for some applications I've written. Would be nicer if it was built-in to the API in a robust way.

Just sayin...

fmbenhassine · 2024-10-14T16:34:02Z

@jmresler I just wanted to mention that we introduced a resourceless job repository implementation in 5.2.0-M2: #4679. This implementation does not require a dependency on any database, and is fast as it does not use or store batch meta-data in any form.

Any feedback is welcome to improve things for the RC and then the GA in November.

jmresler · 2024-10-15T02:35:38Z

Thanks for the work Sir. I'll check it out!


fmbenhassine added the type: enhancement label Jun 16, 2021

fmbenhassine added this to the 5.0.0 milestone Jun 16, 2021

fmbenhassine self-assigned this Jun 16, 2021

fmbenhassine added the in: core label Jun 16, 2021

fmbenhassine mentioned this issue Jul 5, 2021

Make ScopeConfiguration publicly accessible #3958

Closed

fmbenhassine mentioned this issue Sep 3, 2021

Add support for MongoDB as JobRepository #877

Closed

fmbenhassine mentioned this issue May 17, 2022

Add java.util.UUID to the trusted classes list in Jackson2ExecutionContextStringSerializer #4110

Closed

fmbenhassine modified the milestones: 5.0.0, 5.0.0-M6 Aug 31, 2022

fmbenhassine mentioned this issue Sep 7, 2022

Deprecate Job/Step builder factories #4188

Closed

fmbenhassine mentioned this issue Sep 16, 2022

Revisit the configuration code of EnableBatchProcessing #4193

Closed

fmbenhassine closed this as completed in 48e437a Sep 20, 2022

This was referenced Sep 20, 2022

Regression in AbstractBatchConfiguration for dataSource Autowiring [BATCH-2819] #796

Closed

Not possible to use step scope when allowBeanDefinitionOverriding is false [BATCH-2552] #1050

Closed

Can't wrap JobRepository in a tracing representation #3899

Closed

fmbenhassine added a commit that referenced this issue Sep 20, 2022

Remove bean reference checks in BatchRegistrar

aaf22e1

These checks occur too early in the bean creation/registration process and might cause issues when initializing the context. Related to #3942

fmbenhassine added a commit that referenced this issue Sep 21, 2022

Add isolationLevelForCreate attribute in EnableBatchProcessing

1141975

Related to #3942

sambsnyd mentioned this issue Aug 9, 2023

Spring batch migration should remove extends DefaultBatchConfigurer openrewrite/rewrite-spring#412

Closed

klopfdreh mentioned this issue Jun 26, 2024

BadSqlGrammarException for Spring Boot 3 Task Applications spring-cloud/spring-cloud-dataflow#5848

Closed
