From a604ed3f162409f81465fb23b2e315b0497da5f7 Mon Sep 17 00:00:00 2001 From: Jay Bryant Date: Fri, 25 Mar 2022 09:15:46 -0500 Subject: [PATCH 1/2] Editing pass to account for recent changes. --- .../src/main/asciidoc/appendix.adoc | 108 ++-- .../src/main/asciidoc/common-patterns.adoc | 1 - .../src/main/asciidoc/domain.adoc | 225 ++++---- .../src/main/asciidoc/glossary.adoc | 21 +- .../src/main/asciidoc/index-single.adoc | 2 + .../src/main/asciidoc/index.adoc | 16 +- spring-batch-docs/src/main/asciidoc/job.adoc | 118 ++-- .../main/asciidoc/monitoring-and-metrics.adoc | 18 +- .../src/main/asciidoc/processor.adoc | 54 +- .../src/main/asciidoc/readersAndWriters.adoc | 2 +- .../src/main/asciidoc/repeat.adoc | 64 +-- .../src/main/asciidoc/retry.adoc | 4 +- .../src/main/asciidoc/scalability.adoc | 115 ++-- .../src/main/asciidoc/schema-appendix.adoc | 100 ++-- .../asciidoc/spring-batch-architecture.adoc | 427 +++++++++++++++ .../asciidoc/spring-batch-integration.adoc | 260 ++++----- .../src/main/asciidoc/spring-batch-intro.adoc | 503 ++---------------- spring-batch-docs/src/main/asciidoc/step.adoc | 468 ++++++++-------- .../src/main/asciidoc/testing.adoc | 59 +- .../main/asciidoc/transaction-appendix.adoc | 79 ++- .../src/main/asciidoc/whatsnew.adoc | 55 +- 21 files changed, 1324 insertions(+), 1375 deletions(-) create mode 100644 spring-batch-docs/src/main/asciidoc/spring-batch-architecture.adoc diff --git a/spring-batch-docs/src/main/asciidoc/appendix.adoc b/spring-batch-docs/src/main/asciidoc/appendix.adoc index 502e3d0590..0338c768a1 100644 --- a/spring-batch-docs/src/main/asciidoc/appendix.adoc +++ b/spring-batch-docs/src/main/asciidoc/appendix.adoc @@ -3,133 +3,129 @@ :toclevels: 4 [[listOfReadersAndWriters]] - [appendix] == List of ItemReaders and ItemWriters [[itemReadersAppendix]] - === Item Readers .Available Item Readers [options="header"] |=============== |Item Reader|Description -|AbstractItemCountingItemStreamItemReader|Abstract base class that provides basic +|`AbstractItemCountingItemStreamItemReader`|Abstract base class that provides basic restart capabilities by counting the number of items returned from an `ItemReader`. -|AggregateItemReader|An `ItemReader` that delivers a list as its +|`AggregateItemReader`|An `ItemReader` that delivers a list as its item, storing up objects from the injected `ItemReader` until they - are ready to be packed out as a collection. This class must be used - as a wrapper for a custom `ItemReader` that can identify the record - boundaries. The custom reader should mark the beginning and end of - records by returning an `AggregateItem` which responds `true` to its - query methods `isHeader()` and `isFooter()`. Note that this reader + are ready to be packed out as a collection. This class must be used + as a wrapper for a custom `ItemReader` that can identify the record + boundaries. The custom reader should mark the beginning and end of + records by returning an `AggregateItem` which responds `true` to its + query methods (`isHeader()` and `isFooter()`). Note that this reader is not part of the library of readers provided by Spring Batch but given as a sample in `spring-batch-samples`. -|AmqpItemReader|Given a Spring `AmqpTemplate`, it provides +|`AmqpItemReader`|Given a Spring `AmqpTemplate`, it provides synchronous receive methods. The `receiveAndConvert()` method lets you receive POJO objects. -|KafkaItemReader|An `ItemReader` that reads messages from an Apache Kafka topic. 
+|`KafkaItemReader`|An `ItemReader` that reads messages from an Apache Kafka topic. It can be configured to read messages from multiple partitions of the same topic. This reader stores message offsets in the execution context to support restart capabilities. -|FlatFileItemReader|Reads from a flat file. Includes `ItemStream` - and `Skippable` functionality. See link:readersAndWriters.html#flatFileItemReader[`FlatFileItemReader`]. -|HibernateCursorItemReader|Reads from a cursor based on an HQL query. See +|`FlatFileItemReader`|Reads from a flat file. Includes `ItemStream` + and `Skippable` functionality. See link:readersAndWriters.html#flatFileItemReader["`FlatFileItemReader`"]. +|`HibernateCursorItemReader`|Reads from a cursor based on an HQL query. See link:readersAndWriters.html#cursorBasedItemReaders[`Cursor-based ItemReaders`]. -|HibernatePagingItemReader|Reads from a paginated HQL query -|ItemReaderAdapter|Adapts any class to the +|`HibernatePagingItemReader`|Reads from a paginated HQL query. +|`ItemReaderAdapter`|Adapts any class to the `ItemReader` interface. -|JdbcCursorItemReader|Reads from a database cursor via JDBC. See - link:readersAndWriters.html#cursorBasedItemReaders[`Cursor-based ItemReaders`]. -|JdbcPagingItemReader|Given an SQL statement, pages through the rows, +|`JdbcCursorItemReader`|Reads from a database cursor over JDBC. See + link:readersAndWriters.html#cursorBasedItemReaders["`Cursor-based ItemReaders`"]. +|`JdbcPagingItemReader`|Given an SQL statement, pages through the rows, such that large datasets can be read without running out of memory. -|JmsItemReader|Given a Spring `JmsOperations` object and a JMS - Destination or destination name to which to send errors, provides items +|`JmsItemReader`|Given a Spring `JmsOperations` object and a JMS + destination or destination name to which to send errors, provides items received through the injected `JmsOperations#receive()` method. -|JpaPagingItemReader|Given a JPQL statement, pages through the +|`JpaPagingItemReader`|Given a JPQL statement, pages through the rows, such that large datasets can be read without running out of memory. -|ListItemReader|Provides the items from a list, one at a +|`ListItemReader`|Provides the items from a list, one at a time. -|MongoItemReader|Given a `MongoOperations` object and a JSON-based MongoDB +|`MongoItemReader`|Given a `MongoOperations` object and a JSON-based MongoDB query, provides items received from the `MongoOperations#find()` method. -|Neo4jItemReader|Given a `Neo4jOperations` object and the components of a +|`Neo4jItemReader`|Given a `Neo4jOperations` object and the components of a Cyhper query, items are returned as the result of the Neo4jOperations.query method. -|RepositoryItemReader|Given a Spring Data `PagingAndSortingRepository` object, +|`RepositoryItemReader`|Given a Spring Data `PagingAndSortingRepository` object, a `Sort`, and the name of method to execute, returns items provided by the Spring Data repository implementation. -|StoredProcedureItemReader|Reads from a database cursor resulting from the +|`StoredProcedureItemReader`|Reads from a database cursor resulting from the execution of a database stored procedure. See link:readersAndWriters.html#StoredProcedureItemReader[`StoredProcedureItemReader`] -|StaxEventItemReader|Reads via StAX. see link:readersAndWriters.html#StaxEventItemReader[`StaxEventItemReader`]. -|JsonItemReader|Reads items from a Json document. see link:readersAndWriters.html#JsonItemReader[`JsonItemReader`]. 
+|`StaxEventItemReader`|Reads over StAX. see link:readersAndWriters.html#StaxEventItemReader[`StaxEventItemReader`]. +|`JsonItemReader`|Reads items from a Json document. see link:readersAndWriters.html#JsonItemReader[`JsonItemReader`]. |=============== [[itemWritersAppendix]] - - === Item Writers .Available Item Writers [options="header"] |=============== |Item Writer|Description -|AbstractItemStreamItemWriter|Abstract base class that combines the +|`AbstractItemStreamItemWriter`|Abstract base class that combines the `ItemStream` and `ItemWriter` interfaces. -|AmqpItemWriter|Given a Spring `AmqpTemplate`, it provides +|`AmqpItemWriter`|Given a Spring `AmqpTemplate`, provides for a synchronous `send` method. The `convertAndSend(Object)` method lets you send POJO objects. -|CompositeItemWriter|Passes an item to the `write` method of each +|`CompositeItemWriter`|Passes an item to the `write` method of each item in an injected `List` of `ItemWriter` objects. -|FlatFileItemWriter|Writes to a flat file. Includes `ItemStream` and - Skippable functionality. See link:readersAndWriters.html#flatFileItemWriter[`FlatFileItemWriter`]. -|GemfireItemWriter|Using a `GemfireOperations` object, items are either written +|`FlatFileItemWriter`|Writes to a flat file. Includes `ItemStream` and + Skippable functionality. See link:readersAndWriters.html#flatFileItemWriter["`FlatFileItemWriter`"]. +|`GemfireItemWriter`|Using a `GemfireOperations` object, items are either written or removed from the Gemfire instance based on the configuration of the delete flag. -|HibernateItemWriter|This item writer is Hibernate-session aware - and handles some transaction-related work that a non-"hibernate-aware" +|`HibernateItemWriter`|This item writer is Hibernate-session aware + and handles some transaction-related work that a non-"`hibernate-aware`" item writer would not need to know about and then delegates to another item writer to do the actual writing. -|ItemWriterAdapter|Adapts any class to the +|`ItemWriterAdapter`|Adapts any class to the `ItemWriter` interface. -|JdbcBatchItemWriter|Uses batching features from a +|`JdbcBatchItemWriter`|Uses batching features from a `PreparedStatement`, if available, and can take rudimentary steps to locate a failure during a `flush`. -|JmsItemWriter|Using a `JmsOperations` object, items are written +|`JmsItemWriter`|Using a `JmsOperations` object, items are written to the default queue through the `JmsOperations#convertAndSend()` method. -|JpaItemWriter|This item writer is JPA EntityManager-aware - and handles some transaction-related work that a non-"JPA-aware" +|`JpaItemWriter`|This item writer is JPA `EntityManager`-aware + and handles some transaction-related work that a non-"`JPA-aware`" `ItemWriter` would not need to know about and then delegates to another writer to do the actual writing. -|KafkaItemWriter|Using a `KafkaTemplate` object, items are written to the default topic through the - `KafkaTemplate#sendDefault(Object, Object)` method using a `Converter` to map the key from the item. +|`KafkaItemWriter`|Using a `KafkaTemplate` object, items are written to the default topic through the + `KafkaTemplate#sendDefault(Object, Object)` method by using a `Converter` to map the key from the item. A delete flag can also be configured to send delete events to the topic. -|MimeMessageItemWriter|Using Spring's `JavaMailSender`, items of type `MimeMessage` +|`MimeMessageItemWriter`|Using Spring's `JavaMailSender`, items of type `MimeMessage` are sent as mail messages. 
-|MongoItemWriter|Given a `MongoOperations` object, items are written +|`MongoItemWriter`|Given a `MongoOperations` object, items are written through the `MongoOperations.save(Object)` method. The actual write is delayed until the last possible moment before the transaction commits. -|Neo4jItemWriter|Given a `Neo4jOperations` object, items are persisted through the - `save(Object)` method or deleted through the `delete(Object)` per the +|`Neo4jItemWriter`|Given a `Neo4jOperations` object, items are persisted through the + `save(Object)` method or deleted through the `delete(Object)`, as dictated by the `ItemWriter's` configuration -|PropertyExtractingDelegatingItemWriter|Extends `AbstractMethodInvokingDelegator` +|`PropertyExtractingDelegatingItemWriter`|Extends `AbstractMethodInvokingDelegator` creating arguments on the fly. Arguments are created by retrieving the values from the fields in the item to be processed (through a `SpringBeanWrapper`), based on an injected array of field names. -|RepositoryItemWriter|Given a Spring Data `CrudRepository` implementation, +|`RepositoryItemWriter`|Given a Spring Data `CrudRepository` implementation, items are saved through the method specified in the configuration. -|StaxEventItemWriter|Uses a `Marshaller` implementation to - convert each item to XML and then writes it to an XML file using +|`StaxEventItemWriter`|Uses a `Marshaller` implementation to + convert each item to XML and then writes it to an XML file by using StAX. -|JsonFileItemWriter|Uses a `JsonObjectMarshaller` implementation to - convert each item to Json and then writes it to an Json file. +|`JsonFileItemWriter`|Uses a `JsonObjectMarshaller` implementation to + convert each item to Json and then writes it to a Json file. |=============== diff --git a/spring-batch-docs/src/main/asciidoc/common-patterns.adoc b/spring-batch-docs/src/main/asciidoc/common-patterns.adoc index 6ebc84af1d..30a116a644 100644 --- a/spring-batch-docs/src/main/asciidoc/common-patterns.adoc +++ b/spring-batch-docs/src/main/asciidoc/common-patterns.adoc @@ -3,7 +3,6 @@ :toclevels: 4 [[commonPatterns]] - == Common Batch Patterns ifndef::onlyonetoggle[] diff --git a/spring-batch-docs/src/main/asciidoc/domain.adoc b/spring-batch-docs/src/main/asciidoc/domain.adoc index cd4a10fd46..c22589d6ac 100644 --- a/spring-batch-docs/src/main/asciidoc/domain.adoc +++ b/spring-batch-docs/src/main/asciidoc/domain.adoc @@ -3,7 +3,6 @@ :toclevels: 4 [[domainLanguageOfBatch]] - == The Domain Language of Batch ifndef::onlyonetoggle[] @@ -11,7 +10,7 @@ include::toggle.adoc[] endif::onlyonetoggle[] To any experienced batch architect, the overall concepts of batch processing used in -Spring Batch should be familiar and comfortable. There are "Jobs" and "Steps" and +Spring Batch should be familiar and comfortable. There are "`Jobs`" and "`Steps`" and developer-supplied processing units called `ItemReader` and `ItemWriter`. However, because of the Spring patterns, operations, templates, callbacks, and idioms, there are opportunities for the following: @@ -19,14 +18,14 @@ opportunities for the following: * Significant improvement in adherence to a clear separation of concerns. * Clearly delineated architectural layers and services provided as interfaces. * Simple and default implementations that allow for quick adoption and ease of use -out-of-the-box. +out of the box. * Significantly enhanced extensibility. The following diagram is a simplified version of the batch reference architecture that has been used for decades. 
It provides an overview of the components that make up the domain language of batch processing. This architecture framework is a blueprint that has been proven through decades of implementations on the last several generations of -platforms (COBOL/Mainframe, C++/Unix, and now Java/anywhere). JCL and COBOL developers +platforms (COBOL on mainframes, C++ on Unix, and now Java anywhere). JCL and COBOL developers are likely to be as comfortable with the concepts as C++, C#, and Java developers. Spring Batch provides a physical implementation of the layers, components, and technical services commonly found in the robust, maintainable systems that are used to address the @@ -37,7 +36,7 @@ to address very complex processing needs. image::{batch-asciidoc}images/spring-batch-reference-model.png[Figure 2.1: Batch Stereotypes, scaledwidth="60%"] The preceding diagram highlights the key concepts that make up the domain language of -Spring Batch. A Job has one to many steps, each of which has exactly one `ItemReader`, +Spring Batch. A `Job` has one to many steps, each of which has exactly one `ItemReader`, one `ItemProcessor`, and one `ItemWriter`. A job needs to be launched (with `JobLauncher`), and metadata about the currently running process needs to be stored (in `JobRepository`). @@ -47,28 +46,29 @@ one `ItemProcessor`, and one `ItemWriter`. A job needs to be launched (with This section describes stereotypes relating to the concept of a batch job. A `Job` is an entity that encapsulates an entire batch process. As is common with other Spring projects, a `Job` is wired together with either an XML configuration file or Java-based -configuration. This configuration may be referred to as the "job configuration". However, -`Job` is just the top of an overall hierarchy, as shown in the following diagram: +configuration. This configuration may be referred to as the "`job configuration`". However, +`Job` is only the top of an overall hierarchy, as shown in the following diagram: .Job Hierarchy image::{batch-asciidoc}images/job-heirarchy.png[Job Hierarchy, scaledwidth="60%"] In Spring Batch, a `Job` is simply a container for `Step` instances. It combines multiple -steps that belong logically together in a flow and allows for configuration of properties +steps that logically belong together in a flow and allows for configuration of properties global to all steps, such as restartability. The job configuration contains: -* The simple name of the job. +* The name of the job. * Definition and ordering of `Step` instances. * Whether or not the job is restartable. ifdef::backend-html5[] [role="javaContent"] For those who use Java configuration, Spring Batch provides a default implementation of -the Job interface in the form of the `SimpleJob` class, which creates some standard -functionality on top of `Job`. When using java based configuration, a collection of -builders is made available for the instantiation of a `Job`, as shown in the following -example: +the `Job` interface in the form of the `SimpleJob` class, which creates some standard +functionality on top of `Job`. 
When using Java-based configuration, a collection of +builders is made available for the instantiation of a `Job`, as the following +example shows: +==== [source, java, role="javaContent"] ---- @Bean @@ -80,14 +80,16 @@ public Job footballJob() { .build(); } ---- +==== [role="xmlContent"] For those who use XML configuration, Spring Batch provides a default implementation of the `Job` interface in the form of the `SimpleJob` class, which creates some standard functionality on top of `Job`. However, the batch namespace abstracts away the need to -instantiate it directly. Instead, the `` element can be used, as shown in the -following example: +instantiate it directly. Instead, you can use the `` element, as the +following example shows: +==== [source, xml, role="xmlContent"] ---- @@ -96,14 +98,16 @@ following example: ---- +==== endif::backend-html5[] ifdef::backend-pdf[] -Spring Batch provides a default implementation of the Job interface in the form of the +Spring Batch provides a default implementation of the `Job` interface in the form of the `SimpleJob` class, which creates some standard functionality on top of `Job`. When using Java-based configuration, a collection of builders are made available for the -instantiation of a `Job`, as shown in the following example: +instantiation of a `Job`, as the following example shows: +==== [source, java] ---- @Bean @@ -115,11 +119,13 @@ public Job footballJob() { .build(); } ---- +==== However, when using XML configuration, the batch namespace abstracts away the need to -instantiate it directly. Instead, the `` tag can be used as shown in the following -example: +instantiate it directly. Instead, you can use the `` element, as the following +example shows: +==== [source, xml] ---- @@ -128,46 +134,47 @@ example: ---- +==== endif::backend-pdf[] ==== JobInstance A `JobInstance` refers to the concept of a logical job run. Consider a batch job that -should be run once at the end of the day, such as the 'EndOfDay' `Job` from the preceding -diagram. There is one 'EndOfDay' job, but each individual run of the `Job` must be +should be run once at the end of the day, such as the `EndOfDay` `Job` from the preceding +diagram. There is one `EndOfDay` job, but each individual run of the `Job` must be tracked separately. In the case of this job, there is one logical `JobInstance` per day. For example, there is a January 1st run, a January 2nd run, and so on. If the January 1st run fails the first time and is run again the next day, it is still the January 1st run. (Usually, this corresponds with the data it is processing as well, meaning the January 1st run processes data for January 1st). Therefore, each `JobInstance` can have multiple executions (`JobExecution` is discussed in more detail later in this chapter), and only -one `JobInstance` corresponding to a particular `Job` and identifying `JobParameters` can +one `JobInstance` (which corresponds to a particular `Job` and identifying `JobParameters`) can run at a given time. The definition of a `JobInstance` has absolutely no bearing on the data to be loaded. It is entirely up to the `ItemReader` implementation to determine how data is loaded. For -example, in the EndOfDay scenario, there may be a column on the data that indicates the -'effective date' or 'schedule date' to which the data belongs. So, the January 1st run +example, in the `EndOfDay` scenario, there may be a column on the data that indicates the +`effective date` or `schedule date` to which the data belongs. 
So, the January 1st run would load only data from the 1st, and the January 2nd run would use only data from the 2nd. Because this determination is likely to be a business decision, it is left up to the `ItemReader` to decide. However, using the same `JobInstance` determines whether or not -the 'state' (that is, the `ExecutionContext`, which is discussed later in this chapter) -from previous executions is used. Using a new `JobInstance` means 'start from the -beginning', and using an existing instance generally means 'start from where you left -off'. +the "`state`" (that is, the `ExecutionContext`, which is discussed later in this chapter) +from previous executions is used. Using a new `JobInstance` means "`start from the +beginning,`" and using an existing instance generally means "`start from where you left +off`". ==== JobParameters -Having discussed `JobInstance` and how it differs from Job, the natural question to ask -is: "How is one `JobInstance` distinguished from another?" The answer is: +Having discussed `JobInstance` and how it differs from `Job`, the natural question to ask +is: "`How is one `JobInstance` distinguished from another?`" The answer is: `JobParameters`. A `JobParameters` object holds a set of parameters used to start a batch -job. They can be used for identification or even as reference data during the run, as -shown in the following image: +job. They can be used for identification or even as reference data during the run, as the +following image shows: .Job Parameters image::{batch-asciidoc}images/job-stereotypes-parameters.png[Job Parameters, scaledwidth="60%"] -In the preceding example, where there are two instances, one for January 1st, and another +In the preceding example, where there are two instances, one for January 1st and another for January 2nd, there is really only one `Job`, but it has two `JobParameter` objects: one that was started with a job parameter of 01-01-2017 and another that was started with a parameter of 01-02-2017. Thus, the contract can be defined as: `JobInstance` = `Job` @@ -183,7 +190,7 @@ of a `Job` with parameters that do not contribute to the identity of a `JobInsta A `JobExecution` refers to the technical concept of a single attempt to run a Job. An execution may end in failure or success, but the `JobInstance` corresponding to a given execution is not considered to be complete unless the execution completes successfully. -Using the EndOfDay `Job` described previously as an example, consider a `JobInstance` for +Using the `EndOfDay` `Job` described previously as an example, consider a `JobInstance` for 01-01-2017 that failed the first time it was run. If it is run again with the same identifying job parameters as the first run (01-01-2017), a new `JobExecution` is created. However, there is still only one `JobInstance`. @@ -192,52 +199,52 @@ A `Job` defines what a job is and how it is to be executed, and a `JobInstance` purely organizational object to group executions together, primarily to enable correct restart semantics. A `JobExecution`, however, is the primary storage mechanism for what actually happened during a run and contains many more properties that must be controlled -and persisted, as shown in the following table: +and persisted, as the following table shows: .JobExecution Properties |=== |Property |Definition -|Status +|`Status` |A `BatchStatus` object that indicates the status of the execution. While running, it is `BatchStatus#STARTED`. If it fails, it is `BatchStatus#FAILED`. 
If it finishes successfully, it is `BatchStatus#COMPLETED` -|startTime +|`startTime` |A `java.util.Date` representing the current system time when the execution was started. This field is empty if the job has yet to start. -|endTime +|`endTime` |A `java.util.Date` representing the current system time when the execution finished, regardless of whether or not it was successful. The field is empty if the job has yet to finish. -|exitStatus +|`exitStatus` |The `ExitStatus`, indicating the result of the run. It is most important, because it contains an exit code that is returned to the caller. See chapter 5 for more details. The field is empty if the job has yet to finish. -|createTime +|`createTime` |A `java.util.Date` representing the current system time when the `JobExecution` was first persisted. The job may not have been started yet (and thus has no start time), but -it always has a createTime, which is required by the framework for managing job level +it always has a `createTime`, which is required by the framework for managing job-level `ExecutionContexts`. -|lastUpdated +|`lastUpdated` |A `java.util.Date` representing the last time a `JobExecution` was persisted. This field is empty if the job has yet to start. -|executionContext -|The "property bag" containing any user data that needs to be persisted between +|`executionContext` +|The "`property bag`" containing any user data that needs to be persisted between executions. -|failureExceptions +|`failureExceptions` |The list of exceptions encountered during the execution of a `Job`. These can be useful if more than one exception is encountered during the failure of a `Job`. |=== These properties are important because they are persisted and can be used to completely -determine the status of an execution. For example, if the EndOfDay job for 01-01 is +determine the status of an execution. For example, if the `EndOfDay` job for 01-01 is executed at 9:00 PM and fails at 9:30, the following entries are made in the batch metadata tables: @@ -273,7 +280,7 @@ NOTE: Column names may have been abbreviated or removed for the sake of clarity formatting. Now that the job has failed, assume that it took the entire night for the problem to be -determined, so that the 'batch window' is now closed. Further assuming that the window +determined, so that the "`batch window`" is now closed. Further assuming that the window starts at 9:00 PM, the job is kicked off again for 01-01, starting where it left off and completing successfully at 9:30. Because it is now the next day, the 01-02 job must be run as well, and it is kicked off just afterwards at 9:31 and completes in its normal one @@ -347,7 +354,7 @@ formatting. === Step A `Step` is a domain object that encapsulates an independent, sequential phase of a batch -job. Therefore, every Job is composed entirely of one or more steps. A `Step` contains +job. Therefore, every `Job` is composed entirely of one or more steps. A `Step` contains all of the information necessary to define and control the actual batch processing. This is a necessarily vague description because the contents of any given `Step` are at the discretion of the developer writing a `Job`. A `Step` can be as simple or complex as the @@ -355,7 +362,7 @@ developer desires. A simple `Step` might load data from a file into the database requiring little or no code (depending upon the implementations used). A more complex `Step` may have complicated business rules that are applied as part of the processing. 
As with a `Job`, a `Step` has an individual `StepExecution` that correlates with a unique -`JobExecution`, as shown in the following image: +`JobExecution`, as the following image shows: .Job Hierarchy With Steps image::{batch-asciidoc}images/jobHeirarchyWithSteps.png[Figure 2.1: Job Hierarchy With Steps, scaledwidth="60%"] @@ -368,7 +375,7 @@ to execute because the step before it fails, no execution is persisted for it. A `StepExecution` is created only when its `Step` is actually started. `Step` executions are represented by objects of the `StepExecution` class. Each execution -contains a reference to its corresponding step and `JobExecution` and transaction related +contains a reference to its corresponding step and `JobExecution` and transaction-related data, such as commit and rollback counts and start and end times. Additionally, each step execution contains an `ExecutionContext`, which contains any data a developer needs to have persisted across batch runs, such as statistics or state information needed to @@ -377,76 +384,78 @@ restart. The following table lists the properties for `StepExecution`: .StepExecution Properties |=== |Property|Definition -|Status +|`Status` |A `BatchStatus` object that indicates the status of the execution. While running, the status is `BatchStatus.STARTED`. If it fails, the status is `BatchStatus.FAILED`. If it finishes successfully, the status is `BatchStatus.COMPLETED`. -|startTime +|`startTime` |A `java.util.Date` representing the current system time when the execution was started. This field is empty if the step has yet to start. -|endTime +|`endTime` |A `java.util.Date` representing the current system time when the execution finished, regardless of whether or not it was successful. This field is empty if the step has yet to exit. -|exitStatus +|`exitStatus` |The `ExitStatus` indicating the result of the execution. It is most important, because it contains an exit code that is returned to the caller. See chapter 5 for more details. This field is empty if the job has yet to exit. -|executionContext -|The "property bag" containing any user data that needs to be persisted between +|`executionContext` +|The "`property bag`" containing any user data that needs to be persisted between executions. -|readCount +|`readCount` |The number of items that have been successfully read. -|writeCount +|`writeCount` |The number of items that have been successfully written. -|commitCount +|`commitCount` |The number of transactions that have been committed for this execution. -|rollbackCount +|`rollbackCount` |The number of times the business transaction controlled by the `Step` has been rolled back. -|readSkipCount +|`readSkipCount` |The number of times `read` has failed, resulting in a skipped item. -|processSkipCount +|`processSkipCount` |The number of times `process` has failed, resulting in a skipped item. -|filterCount -|The number of items that have been 'filtered' by the `ItemProcessor`. +|`filterCount` +|The number of items that have been "`filtered`" by the `ItemProcessor`. -|writeSkipCount +|`writeSkipCount` |The number of times `write` has failed, resulting in a skipped item. |=== === ExecutionContext An `ExecutionContext` represents a collection of key/value pairs that are persisted and -controlled by the framework in order to allow developers a place to store persistent -state that is scoped to a `StepExecution` object or a `JobExecution` object. For those -familiar with Quartz, it is very similar to JobDataMap. 
The best usage example is to +controlled by the framework to give developers a place to store persistent +state that is scoped to a `StepExecution` object or a `JobExecution` object. (For those +familiar with Quartz, it is very similar to `JobDataMap`.) The best usage example is to facilitate restart. Using flat file input as an example, while processing individual lines, the framework periodically persists the `ExecutionContext` at commit points. Doing -so allows the `ItemReader` to store its state in case a fatal error occurs during the run +so lets the `ItemReader` store its state in case a fatal error occurs during the run or even if the power goes out. All that is needed is to put the current number of lines -read into the context, as shown in the following example, and the framework will do the +read into the context, as the following example shows, and the framework does the rest: +==== [source, java] ---- executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition()); ---- +==== -Using the EndOfDay example from the `Job` Stereotypes section as an example, assume there -is one step, 'loadData', that loads a file into the database. After the first failed run, +Using the `EndOfDay` example from the `Job` stereotypes section as an example, assume there +is one step, `loadData`, that loads a file into the database. After the first failed run, the metadata tables would look like the following example: .BATCH_JOB_INSTANCE @@ -493,7 +502,7 @@ the metadata tables would look like the following example: |{piece.count=40321} |=== -In the preceding case, the `Step` ran for 30 minutes and processed 40,321 'pieces', which +In the preceding case, the `Step` ran for 30 minutes and processed 40,321 "`pieces`", which would represent lines in a file in this scenario. This value is updated just before each commit by the framework and can contain multiple rows corresponding to entries within the `ExecutionContext`. Being notified before a commit requires one of the various @@ -502,8 +511,9 @@ later in this guide. As with the previous example, it is assumed that the `Job` restarted the next day. When it is restarted, the values from the `ExecutionContext` of the last run are reconstituted from the database. When the `ItemReader` is opened, it can check to see if it has any stored state in the context and initialize itself from there, -as shown in the following example: +as the following example shows: +==== [source, java] ---- if (executionContext.containsKey(getKey(LINES_READ_COUNT))) { @@ -519,16 +529,18 @@ if (executionContext.containsKey(getKey(LINES_READ_COUNT))) { } } ---- -In this case, after the above code runs, the current line is 40,322, allowing the `Step` -to start again from where it left off. The `ExecutionContext` can also be used for +==== + +In this case, after the preceding code runs, the current line is 40,322, letting the `Step` +start again from where it left off. You can also use the `ExecutionContext` for statistics that need to be persisted about the run itself. For example, if a flat file contains orders for processing that exist across multiple lines, it may be necessary to store how many orders have been processed (which is much different from the number of lines read), so that an email can be sent at the end of the `Step` with the total number -of orders processed in the body. The framework handles storing this for the developer, in -order to correctly scope it with an individual `JobInstance`. It can be very difficult to +of orders processed in the body. 
The framework handles storing this for the developer, +to correctly scope it with an individual `JobInstance`. It can be very difficult to know whether an existing `ExecutionContext` should be used or not. For example, using the -'EndOfDay' example from above, when the 01-01 run starts again for the second time, the +`EndOfDay` example from above, when the 01-01 run starts again for the second time, the framework recognizes that it is the same `JobInstance` and on an individual `Step` basis, pulls the `ExecutionContext` out of the database, and hands it (as part of the `StepExecution`) to the `Step` itself. Conversely, for the 01-02 run, the framework @@ -541,16 +553,18 @@ keyspace. As a result, care should be taken when putting values in to ensure no overwritten. However, the `Step` stores absolutely no data in the context, so there is no way to adversely affect the framework. -It is also important to note that there is at least one `ExecutionContext` per +Note that there is at least one `ExecutionContext` per `JobExecution` and one for every `StepExecution`. For example, consider the following code snippet: +==== [source, java] ---- ExecutionContext ecStep = stepExecution.getExecutionContext(); ExecutionContext ecJob = jobExecution.getExecutionContext(); //ecStep does not equal ecJob ---- +==== As noted in the comment, `ecStep` does not equal `ecJob`. They are two different `ExecutionContexts`. The one scoped to the `Step` is saved at every commit point in the @@ -558,30 +572,33 @@ As noted in the comment, `ecStep` does not equal `ecJob`. They are two different === JobRepository -`JobRepository` is the persistence mechanism for all of the Stereotypes mentioned above. +`JobRepository` is the persistence mechanism for all of the stereotypes mentioned earlier. It provides CRUD operations for `JobLauncher`, `Job`, and `Step` implementations. When a -`Job` is first launched, a `JobExecution` is obtained from the repository, and, during +`Job` is first launched, a `JobExecution` is obtained from the repository. Also, during the course of execution, `StepExecution` and `JobExecution` implementations are persisted by passing them to the repository. [role="xmlContent"] The Spring Batch XML namespace provides support for configuring a `JobRepository` instance -with the `` tag, as shown in the following example: +with the `` tag, as the following example shows: +==== [source, xml, role="xmlContent"] ---- ---- +==== [role="javaContent"] When using Java configuration, the `@EnableBatchProcessing` annotation provides a -`JobRepository` as one of the components automatically configured out of the box. +`JobRepository` as one of the components that is automatically configured. === JobLauncher `JobLauncher` represents a simple interface for launching a `Job` with a given set of -`JobParameters`, as shown in the following example: +`JobParameters`, as the following example shows: +==== [source, java] ---- public interface JobLauncher { @@ -591,43 +608,46 @@ public JobExecution run(Job job, JobParameters jobParameters) JobInstanceAlreadyCompleteException, JobParametersInvalidException; } ---- +==== + It is expected that implementations obtain a valid `JobExecution` from the `JobRepository` and execute the `Job`. -=== Item Reader +=== ItemReader `ItemReader` is an abstraction that represents the retrieval of input for a `Step`, one item at a time. When the `ItemReader` has exhausted the items it can provide, it -indicates this by returning `null`. 
More details about the `ItemReader` interface and its -various implementations can be found in +indicates this by returning `null`. You can find more details about the `ItemReader` interface and its +various implementations in <>. -=== Item Writer +=== ItemWriter `ItemWriter` is an abstraction that represents the output of a `Step`, one batch or chunk -of items at a time. Generally, an `ItemWriter` has no knowledge of the input it should -receive next and knows only the item that was passed in its current invocation. More -details about the `ItemWriter` interface and its various implementations can be found in +of items at a time. Generally, an `ItemWriter` has no knowledge of the input it should +receive next and knows only the item that was passed in its current invocation. You can find more +details about the `ItemWriter` interface and its various implementations in <>. -=== Item Processor +=== ItemProcessor `ItemProcessor` is an abstraction that represents the business processing of an item. -While the `ItemReader` reads one item, and the `ItemWriter` writes them, the +While the `ItemReader` reads one item, and the `ItemWriter` writes one item, the `ItemProcessor` provides an access point to transform or apply other business processing. If, while processing the item, it is determined that the item is not valid, returning -`null` indicates that the item should not be written out. More details about the -`ItemProcessor` interface can be found in +`null` indicates that the item should not be written out. You can find more details about the +`ItemProcessor` interface in <>. [role="xmlContent"] === Batch Namespace Many of the domain concepts listed previously need to be configured in a Spring -`ApplicationContext`. While there are implementations of the interfaces above that can be -used in a standard bean definition, a namespace has been provided for ease of -configuration, as shown in the following example: +`ApplicationContext`. While there are implementations of the interfaces above that you can +use in a standard bean definition, a namespace has been provided for ease of +configuration, as the following example shows: +==== [source, xml, role="xmlContent"] ---- >. More information on configuring a `Step` can be found in +As long as the batch namespace has been declared, any of its elements can be used. You can find more +information on configuring a Job in <>. You can find more information on configuring a `Step` in <>. diff --git a/spring-batch-docs/src/main/asciidoc/glossary.adoc b/spring-batch-docs/src/main/asciidoc/glossary.adoc index 04822a345d..4bedab1c43 100644 --- a/spring-batch-docs/src/main/asciidoc/glossary.adoc +++ b/spring-batch-docs/src/main/asciidoc/glossary.adoc @@ -1,6 +1,7 @@ [[glossary]] [appendix] == Glossary + [glossary] === Spring Batch Glossary @@ -27,7 +28,7 @@ Batch Window:: Step:: The main batch task or unit of work. It initializes the business logic and controls the - transaction environment, based on commit interval setting and other factors. + transaction environment, based on the commit interval setting and other factors. Tasklet:: A component created by an application developer to process the business logic for a @@ -41,7 +42,7 @@ Batch Job Type:: Driving Query:: A driving query identifies the set of work for a job to do. The job then breaks that work into individual units of work. 
For instance, a driving query might be to identify - all financial transactions that have a status of "pending transmission" and send them + all financial transactions that have a status of "`pending transmission`" and send them to a partner system. The driving query returns a set of record IDs to process. Each record ID then becomes a unit of work. A driving query may involve a join (if the criteria for selection falls across two or more tables) or it may work with a single @@ -52,7 +53,7 @@ Item:: terms, this might be a line in a file, a row in a database table, or a particular element in an XML file. -Logical Unit of Work (LUW):: +Logicial Unit of Work (LUW):: A batch job iterates through a driving query (or other input source, such as a file) to perform the set of work that the job must accomplish. Each iteration of work performed is a unit of work. @@ -70,27 +71,27 @@ Staging Table:: Restartable:: A job that can be executed again and assumes the same identity as when run initially. - In other words, it is has the same job instance ID. + In other words, it has the same job instance ID. Rerunnable:: A job that is restartable and manages its own state in terms of the previous run's - record processing. An example of a rerunnable step is one based on a driving query. If + record processing. An example of a re-runnable step is one based on a driving query. If the driving query can be formed so that it limits the processed rows when the job is restarted, then it is re-runnable. This is managed by the application logic. Often, a condition is added to the `where` statement to limit the rows returned by the driving - query with logic resembling "and processedFlag!= true". + query with logic resembling `and processedFlag!= true`. Repeat:: - One of the most basic units of batch processing, it defines by repeatability calling a + One of the most basic units of batch processing, it defines by repeatedly calling a portion of code until it is finished and while there is no error. Typically, a batch process would be repeatable as long as there is input. Retry:: Simplifies the execution of operations with retry semantics most frequently associated - with handling transactional output exceptions. Retry is slightly different from repeat, - rather than continually calling a block of code, retry is stateful and continually + with handling transactional output exceptions. Retry is slightly different from repeat. + Rather than continually calling a block of code, retry is stateful and continually calls the same block of code with the same input, until it either succeeds or some type - of retry limit has been exceeded. It is only generally useful when a subsequent + of retry limit has been exceeded. It is generally useful only when a subsequent invocation of the operation might succeed because something in the environment has improved. 
diff --git a/spring-batch-docs/src/main/asciidoc/index-single.adoc b/spring-batch-docs/src/main/asciidoc/index-single.adoc index 621463a530..5da4684a53 100644 --- a/spring-batch-docs/src/main/asciidoc/index-single.adoc +++ b/spring-batch-docs/src/main/asciidoc/index-single.adoc @@ -10,6 +10,8 @@ include::toggle.adoc[] include::spring-batch-intro.adoc[] +include::spring-batch-architecture.adoc[] + include::whatsnew.adoc[] include::domain.adoc[] diff --git a/spring-batch-docs/src/main/asciidoc/index.adoc b/spring-batch-docs/src/main/asciidoc/index.adoc index c83eef9b33..be8f414a94 100644 --- a/spring-batch-docs/src/main/asciidoc/index.adoc +++ b/spring-batch-docs/src/main/asciidoc/index.adoc @@ -10,19 +10,19 @@ The reference documentation is divided into several sections: [horizontal] <> :: Background, usage - scenarios and general guidelines. + scenarios, and general guidelines. <> :: New features introduced in version 5.0. <> :: Core concepts and abstractions of the Batch domain language. -<> :: Job configuration, execution and +<> :: Job configuration, execution, and administration. -<> :: Step configuration, different types of steps, +<> :: Step configuration, different types of steps, and controlling step flow. <> :: `ItemReader` and `ItemWriter` interfaces and how to use them. <> :: `ItemProcessor` interface and how to use it. <> :: Multi-threaded steps, -parallel steps, remote chunking and partitioning. +parallel steps, remote chunking, and partitioning. <> :: Completion policies and exception handling of repetitive actions. <> :: Retry and backoff policies of retryable operations. <> :: Job and Step testing facilities and APIs. @@ -31,18 +31,18 @@ and guidelines. <> :: Integration between Spring Batch and Spring Integration projects. <> :: Batch jobs -monitoring and metrics +monitoring and metrics. The following appendices are available: [horizontal] <> :: List of -all item readers and writers provided out-of-the box. +all provided item readers and writers. <> :: Core tables used by the Batch domain model. <> :: Transaction -boundaries, propagation and isolation levels used in Spring Batch. -<> :: Glossary of common terms, concepts and vocabulary of +boundaries, propagation, and isolation levels used in Spring Batch. +<> :: Glossary of common terms, concepts, and vocabulary of the Batch domain. include::footer/index-footer.adoc[] diff --git a/spring-batch-docs/src/main/asciidoc/job.adoc b/spring-batch-docs/src/main/asciidoc/job.adoc index ec788a7802..e799f874da 100644 --- a/spring-batch-docs/src/main/asciidoc/job.adoc +++ b/spring-batch-docs/src/main/asciidoc/job.adoc @@ -19,7 +19,7 @@ image::{batch-asciidoc}images/spring-batch-reference-model.png[Figure 2.1: Batch While the `Job` object may seem like a simple container for steps, you must be aware of many configuration options. Furthermore, you must consider many options about -how a `Job` can be run and how its meta-data can be +how a `Job` can be run and how its metadata can be stored during that run. This chapter explains the various configuration options and runtime concerns of a `Job`. @@ -52,13 +52,14 @@ configuration of the `JobRepository` is handled through the <> +There are multiple implementations of the <> interface. However, the namespace abstracts away the differences in configuration. It has only three required dependencies: a name, `JobRepository` , and a list of `Step` instances. 
+The following example creates a `footballJob`: ==== [source, xml, role="xmlContent"] @@ -282,7 +283,7 @@ public Job footballJob() { ---- ==== -Noted that the `afterJob` method is called regardless of the success or +Note that the `afterJob` method is called regardless of the success or failure of the `Job`. If you need to determine success or failure, you can get that information from the `JobExecution`: @@ -309,11 +310,16 @@ The annotations corresponding to this interface are: [role="xmlContent"] ==== Inheriting from a Parent Job +ifdef::backend-pdf[] +This section applies only to XML based configuration, as Java configuration provides better +reuse capabilities. +endif::backend-pdf[] + [role="xmlContent"] If a group of Jobs share similar but not identical configurations, it may help to define a "`parent`" `Job` from which the concrete -`Job` instances may inherit properties. Similar to class +`Job` instances can inherit properties. Similar to class inheritance in Java, a "`child`" `Job` combines its elements and attributes with the parent's. @@ -324,7 +330,7 @@ listeners. The `Job` (`job1`) is a concrete definition that inherits the list of listeners from `baseJob` and merges it with its own list of listeners to produce a `Job` with two listeners and one -`Step`, `step1`. +`Step` (`step1`). ==== [source, xml, role="xmlContent"] @@ -348,12 +354,6 @@ it with its own list of listeners to produce a See the section on <> for more detailed information. -ifdef::backend-pdf[] -This section applies only to XML based configuration, as Java configuration provides better -reuse capabilities. - -endif::backend-pdf[] - ==== JobParametersValidator A job declared in the XML namespace or using any subclass of @@ -367,7 +367,7 @@ constraints, you can implement the interface yourself. ifdef::backend-html5[] [role="xmlContent"] The configuration of a validator is supported through the XML namespace through a child -element of the job, as shown in the following example: +element of the job, as the following example shows: ==== [source, xml, role="xmlContent"] @@ -437,7 +437,7 @@ endif::backend-pdf[] [[javaConfig]] === Java Configuration -Spring 3 brought the ability to configure applications with java instead of XML. As of +Spring 3 brought the ability to configure applications with Java instead of XML. As of Spring Batch 2.2.0, you can configure batch jobs by using the same Java configuration. There are three components for the Java-based configuration: the `@EnableBatchProcessing` annotation and two builders. @@ -445,7 +445,7 @@ annotation and two builders. The `@EnableBatchProcessing` annotation works similarly to the other `@Enable*` annotations in the Spring family. In this case, `@EnableBatchProcessing` provides a base configuration for building batch jobs. Within this base configuration, an instance of `StepScope` is -created in addition to a number of beans being made available to be autowired: +created, in addition to a number of beans being made available to be autowired: * `JobRepository`: a bean named `jobRepository` * `JobLauncher`: a bean named `jobLauncher` @@ -547,7 +547,7 @@ framework features, such as the `JobLauncher`, [role="xmlContent"] The batch namespace abstracts away many of the implementation details of the `JobRepository` implementations and their collaborators. 
However, there are still a few -configuration options available, as shown in the following example: +configuration options available, as the following example shows: .XML Configuration ==== @@ -570,7 +570,7 @@ The `max-varchar-length` defaults to `2500`, which is the length of the long scripts>>. [role="javaContent"] -When you use Java configuration, a `JobRepository` is provided for you. A JDBC-based one is +When you use Java configuration, a `JobRepository` is provided for you. A JDBC-based one is provided if a `DataSource` is provided, and the `Map`-based one is provided if no `DataSource` is provided. However, you can customize the configuration of the `JobRepository` through an implementation of the `BatchConfigurer` interface, as the following example shows: @@ -613,7 +613,7 @@ The behavior of the framework is not well defined if the repository methods are transactional. The isolation level in the `create*` method attributes is specified separately to ensure that, when jobs are launched, if two processes try to launch the same job at the same time, only one succeeds. The default isolation level for that -method is `SERIALIZABLE`, which is quite aggressive. `READ_COMMITTED` usually works just as +method is `SERIALIZABLE`, which is quite aggressive. `READ_COMMITTED` usually works equally well. `READ_UNCOMMITTED` is fine if two processes are not likely to collide in this way. However, since a call to the `create*` method is quite short, it is unlikely that `SERIALIZED` causes problems, as long as the database platform supports it. However, you @@ -650,7 +650,7 @@ protected JobRepository createJobRepository() throws Exception { ==== If the namespace or factory beans are not used, you must also configure the -transactional behavior of the repository using AOP. +transactional behavior of the repository by using AOP. [role="xmlContent"] The following example shows how to configure the transactional behavior of the repository @@ -750,7 +750,7 @@ Only the table prefix is configurable. The table and column names are not. [[nonStandardDatabaseTypesInRepository]] ==== Non-standard Database Types in a Repository -If you are using a database platform that is not in the list of supported platforms, you +If you use a database platform that is not in the list of supported platforms, you may be able to use one of the supported types, if the SQL variant is close enough. To do this, you can use the raw `JobRepositoryFactoryBean` instead of the namespace shortcut and use it to set the database type to the closest match. @@ -794,7 +794,7 @@ If the database type is not specified, the `JobRepositoryFactoryBean` tries to auto-detect the database type from the `DataSource`. The major differences between platforms are mainly accounted for by the strategy for incrementing primary keys, so -it is often be necessary to override the +it is often necessary to override the `incrementerFactory` as well (by using one of the standard implementations from the Spring Framework). 
@@ -848,8 +848,8 @@ protected JobLauncher createJobLauncher() throws Exception { ==== Once a <> is obtained, it is passed to the -execute method of `Job`, ultimately returning the `JobExecution` to the caller, as shown -in the following image: +execute method of `Job`, ultimately returning the `JobExecution` to the caller, as +the following image shows: .Job Launcher Sequence image::{batch-asciidoc}images/job-launcher-sequence-sync.png[Job Launcher Sequence, scaledwidth="60%"] @@ -868,7 +868,7 @@ You can configure the `SimpleJobLauncher` to allow for this scenario by configur `TaskExecutor`. [role="xmlContent"] -The following XML example shows a `SimpleJobLauncher` configured to return immediately: +The following XML example configures a `SimpleJobLauncher` to return immediately: .XML Configuration ==== @@ -885,7 +885,7 @@ The following XML example shows a `SimpleJobLauncher` configured to return immed ==== [role="javaContent"] -The following Java example shows a `SimpleJobLauncher` configured to return immediately: +The following Java example configures a `SimpleJobLauncher` to return immediately: .Java Configuration ==== @@ -902,8 +902,8 @@ public JobLauncher jobLauncher() { ---- ==== -Any implementation of the spring `TaskExecutor` -interface can be used to control how jobs are asynchronously +You can use any implementation of the spring `TaskExecutor` +interface to control how jobs are asynchronously executed. [[runningAJob]] @@ -915,7 +915,7 @@ At a minimum, launching a batch job requires two things: the context or different contexts. For example, if you launch jobs from the command line, a new JVM is instantiated for each `Job`. Thus, every job has its own `JobLauncher`. However, if -you run from within a web container within the scope of an +you run from within a web container that is within the scope of an `HttpRequest`, there is usually one `JobLauncher` (configured for asynchronous job launching) that multiple requests invoke to launch their jobs. @@ -923,13 +923,13 @@ launching) that multiple requests invoke to launch their jobs. [[runningJobsFromCommandLine]] ==== Running Jobs from the Command Line -For users that want to run their jobs from an enterprise +If you want to run your jobs from an enterprise scheduler, the command line is the primary interface. This is because most schedulers (with the exception of Quartz, unless using `NativeJob`) work directly with operating system processes, primarily kicked off with shell scripts. There are many ways to launch a Java process besides a shell script, such as Perl, Ruby, or -even build tools, such as ant or maven. However, because most people +even build tools, such as Ant or Maven. However, because most people are familiar with shell scripts, this example focuses on them. [[commandLineJobRunner]] @@ -945,12 +945,12 @@ many ways to launch a Java process, and this class should in no way be viewed as definitive. The `CommandLineJobRunner` performs four tasks: -* Load the appropriate `ApplicationContext` -* Parse command line arguments into `JobParameters` -* Locate the appropriate job based on arguments +* Load the appropriate `ApplicationContext`. +* Parse command line arguments into `JobParameters`. +* Locate the appropriate job based on arguments. * Use the `JobLauncher` provided in the application context to launch the job. -All of these tasks are accomplished using only the arguments passed in. +All of these tasks are accomplished with only the arguments passed in. 
The following table describes the required arguments: .CommandLineJobRunner arguments @@ -962,7 +962,7 @@ should contain everything needed to run the complete |`jobName`|The name of the job to be run. |=============== -These arguments must be passed in with the path first and the name second. All arguments +These arguments must be passed in, with the path first and the name second. All arguments after these are considered to be job parameters, are turned into a `JobParameters` object, and must be in the format of `name=value`. @@ -1016,7 +1016,7 @@ You can override this behavior by using a custom `JobParametersConverter`. ifdef::backend-html5[] [role="xmlContent"] In most cases, you would want to use a manifest to declare your `main` class in a jar. However, -for simplicity, the class was used directly. This example uses the same `EndOfDay` +for simplicity, the class was used directly. This example uses the `EndOfDay` example from the <>. The first argument is `endOfDayJob.xml`, which is the Spring ApplicationContext that contains the `Job`. The second argument, `endOfDay,` represents the job name. The final argument, @@ -1040,7 +1040,7 @@ The following example shows a sample configuration for `endOfDay` in XML: [role="javaContent"] In most cases, you would want to use a manifest to declare your `main` class in a jar. However, -for simplicity, the class was used directly. This example uses the same `EndOfDay` +for simplicity, the class was used directly. This example uses the `EndOfDay` example from the <>. The first argument is `io.spring.EndOfDayJobConfiguration`, which is the fully qualified class name to the configuration class that contains the Job. The second argument, `endOfDay`, represents @@ -1083,7 +1083,7 @@ endif::backend-html5[] ifdef::backend-pdf[] In most cases, you would want to use a manifest to declare your `main` class in a jar. However, -for simplicity, the class was used directly. This example uses the same `EndOfDay` +for simplicity, the class was used directly. This example uses the `EndOfDay` example from the <>. The first argument is where your job is configured (either an XML file or a fully qualified class name). The second argument, `endOfDay`, represents the job name. The final argument, @@ -1197,15 +1197,15 @@ implementation used by the job runner is the `SimpleJvmExitCodeMapper` that returns 0 for completion, 1 for generic errors, and 2 for any job runner errors such as not being able to find a `Job` in the provided context. If anything more -complex than the 3 values above is needed, then a custom +complex than the three values above is needed, a custom implementation of the `ExitCodeMapper` interface must be supplied. Because the `CommandLineJobRunner` is the class that creates -an `ApplicationContext`, and thus cannot be +an `ApplicationContext` and, thus, cannot be 'wired together', any values that need to be overwritten must be autowired. This means that if an implementation of `ExitCodeMapper` is found within the `BeanFactory`, -it will be injected into the runner after the context is created. All +it is injected into the runner after the context is created. All that needs to be done to provide your own `ExitCodeMapper` is to declare the implementation as a root level bean and ensure that it is part of the @@ -1220,7 +1220,7 @@ launched from the command-line, as described earlier. However, there are many cases where launching from an `HttpRequest` is a better option. Many such use cases include reporting, ad-hoc job running, and web application support. 
Because a batch job (by definition) -is long running, the most important concern is ensuring to launch the +is long running, the most important concern is to launch the job asynchronously: .Asynchronous Job Launcher Sequence From Web Container @@ -1465,7 +1465,7 @@ example has been given an `id` so that it can be included in child contexts (for example, as a parent bean definition) and cause all jobs created there to also be registered automatically. -===== `AutomaticJobRegistrar` +===== AutomaticJobRegistrar This is a lifecycle component that creates child contexts and registers jobs from those contexts as they are created. One advantage of doing this is that, while the job names in @@ -1552,10 +1552,10 @@ locations. As previously discussed, the `JobRepository` provides CRUD operations on the meta-data, and the `JobExplorer` provides read-only operations on the -meta-data. However, those operations are most useful when used together +metadata. However, those operations are most useful when used together to perform common monitoring tasks such as stopping, restarting, or summarizing a Job, as is commonly done by batch operators. Spring Batch -provides these types of operations via the +provides these types of operations in the `JobOperator` interface: ==== @@ -1597,7 +1597,7 @@ public interface JobOperator { ---- ==== -The above operations represent methods from many different interfaces, such as +The preceding operations represent methods from many different interfaces, such as `JobLauncher`, `JobRepository`, `JobExplorer`, and `JobRegistry`. For this reason, the provided implementation of `JobOperator` (`SimpleJobOperator`) has many dependencies. @@ -1647,14 +1647,14 @@ The following example shows a typical bean definition for `SimpleJobOperator` in ---- ==== -NOTE: If you set the table prefix on the job repository, don't forget to set it on the job explorer as well. +NOTE: If you set the table prefix on the job repository, do not forget to set it on the job explorer as well. [[JobParametersIncrementer]] ==== JobParametersIncrementer Most of the methods on `JobOperator` are self-explanatory, and you can find more detailed explanations in the -https://docs.spring.io/spring-batch/docs/current/api/org/springframework/batch/core/launch/JobOperator.html[javadoc of the interface]. However, the +https://docs.spring.io/spring-batch/docs/current/api/org/springframework/batch/core/launch/JobOperator.html[Javadoc of the interface]. However, the `startNextInstance` method is worth noting. This method always starts a new instance of a `Job`. This can be extremely useful if there are serious issues in a @@ -1682,15 +1682,15 @@ public interface JobParametersIncrementer { The contract of `JobParametersIncrementer` is that, given a <> -object, it returns the 'next' JobParameters +object, it returns the "`next`" `JobParameters` object by incrementing any necessary values it may contain. This strategy is useful because the framework has no way of knowing what changes to the `JobParameters` make it the "`next`" instance. For example, if the only value in `JobParameters` is a date and the next instance -should be created, should that value be incremented by one day? Or one +should be created, should that value be incremented by one day or one week (if the job is weekly, for instance)? 
The same can be said for any
-numerical values that help to identify the Job,
+numerical values that help to identify the `Job`,
as the following example shows:

====
[source, java]
----
@@ -1709,11 +1709,11 @@ public class SampleIncrementer implements JobParametersIncrementer {
----
====

-In this example, the value with a key of 'run.id' is used to
+In this example, the value with a key of `run.id` is used to
discriminate between
`JobInstances`. If the `JobParameters`
passed in is null, it can be assumed that the `Job` has never
been run before
-and thus its initial state can be returned. However, if not, the old
+and, thus, its initial state can be returned. However, if not, the old
value is obtained, incremented by one, and returned.

ifdef::backend-html5[]
@@ -1749,8 +1749,8 @@ public Job footballJob() {
endif::backend-html5[]

ifdef::backend-pdf[]
-You can associate an incrementer can
-be associated with a `Job` by using the `incrementer`
+You can associate an incrementer
+with a `Job` by using the `incrementer`
attribute in the namespace:

====
@@ -1799,7 +1799,7 @@ developer code that the framework has no control over, such as a
business service. However, as soon as control is returned back to the
framework, it sets the status of the current
`StepExecution` to
-`BatchStatus.STOPPED`, saves it, and then does the same
+`BatchStatus.STOPPED`, saves it, and does the same
for the `JobExecution` before finishing.

==== Aborting a Job

@@ -1814,12 +1814,12 @@ job is running and encounters a step that has been marked
moves on to the next step (as determined by the job flow definition and the step
execution exit status).

-If the process died (`"kill -9"` or server
-failure) the job is, of course, not running, but the `JobRepository` has
+If the process died (`kill -9` or server
+failure), the job is, of course, not running, but the `JobRepository` has
no way of knowing because no one told it before the process
died. You have to tell it manually that you know that the execution
either failed or should be considered aborted (change its status to
`FAILED` or `ABANDONED`). This is a business decision, and
there is no way to automate it. Change the status to `FAILED` only if it is not restartable or if
you know the restart data is valid.

diff --git a/spring-batch-docs/src/main/asciidoc/monitoring-and-metrics.adoc b/spring-batch-docs/src/main/asciidoc/monitoring-and-metrics.adoc
index 98033188fd..13b2aff281 100644
--- a/spring-batch-docs/src/main/asciidoc/monitoring-and-metrics.adoc
+++ b/spring-batch-docs/src/main/asciidoc/monitoring-and-metrics.adoc
@@ -3,7 +3,6 @@
:toclevels: 4

[[monitoring-and-metrics]]
-
== Monitoring and metrics

Since version 4.2, Spring Batch provides support for batch monitoring and metrics
@@ -11,7 +10,6 @@ based on link:$$https://micrometer.io/$$[Micrometer]. This section describes
which metrics are provided out-of-the-box and how to contribute custom metrics.

[[built-in-metrics]]
-
=== Built-in metrics

Metrics collection does not require any specific configuration. All metrics provided
@@ -32,7 +30,6 @@ under the `spring.batch` prefix. The following table explains all the metrics in

NOTE: The `status` tag can be either `SUCCESS` or `FAILURE`.

[[custom-metrics]]
-
=== Custom metrics

If you want to use your own metrics in your custom components, we recommend using
@@ -71,20 +68,19 @@ public class MyTimedTasklet implements Tasklet {
----

[[disabling-metrics]]
-
-=== Disabling metrics
+=== Disabling Metrics

Metrics collection is a concern similar to logging. 
Disabling logs is typically -done by configuring the logging library and this is no different for metrics. -There is no feature in Spring Batch to disable micrometer's metrics, this should -be done on micrometer's side. Since Spring Batch stores metrics in the global -registry of micrometer with the `spring.batch` prefix, it is possible to configure -micrometer to ignore/deny batch metrics with the following snippet: +done by configuring the logging library, and this is no different for metrics. +There is no feature in Spring Batch to disable Micrometer's metrics. This should +be done on Micrometer's side. Since Spring Batch stores metrics in the global +registry of Micrometer with the `spring.batch` prefix, you can configure +micrometer to ignore or deny batch metrics with the following snippet: [source, java] ---- Metrics.globalRegistry.config().meterFilter(MeterFilter.denyNameStartsWith("spring.batch")) ---- -Please refer to micrometer's link:$$http://micrometer.io/docs/concepts#_meter_filters$$[reference documentation] +See Micrometer's link:$$http://micrometer.io/docs/concepts#_meter_filters$$[reference documentation] for more details. diff --git a/spring-batch-docs/src/main/asciidoc/processor.adoc b/spring-batch-docs/src/main/asciidoc/processor.adoc index 439b6b3e6a..8c33512417 100644 --- a/spring-batch-docs/src/main/asciidoc/processor.adoc +++ b/spring-batch-docs/src/main/asciidoc/processor.adoc @@ -38,12 +38,12 @@ public class CompositeItemWriter implements ItemWriter { The preceding class contains another `ItemWriter` to which it delegates after having provided some business logic. This pattern could easily be used for an `ItemReader` as -well, perhaps to obtain more reference data based upon the input that was provided by the +well, perhaps to obtain more reference data based on the input that was provided by the main `ItemReader`. It is also useful if you need to control the call to `write` yourself. -However, if you only want to 'transform' the item passed in for writing before it is +However, if you only want to "`transform`" the item passed in for writing before it is actually written, you need not `write` yourself. You can just modify the item. For this -scenario, Spring Batch provides the `ItemProcessor` interface, as shown in the following -interface definition: +scenario, Spring Batch provides the `ItemProcessor` interface, as the following +interface definition shows: [source, java] ---- @@ -83,12 +83,12 @@ public class BarWriter implements ItemWriter { } ---- -In the preceding example, there is a class `Foo`, a class `Bar`, and a class -`FooProcessor` that adheres to the `ItemProcessor` interface. The transformation is +In the preceding example, there is a class named `Foo`, a class named `Bar`, and a class +named `FooProcessor` that adheres to the `ItemProcessor` interface. The transformation is simple, but any type of transformation could be done here. The `BarWriter` writes `Bar` objects, throwing an exception if any other type is provided. Similarly, the `FooProcessor` throws an exception if anything but a `Foo` is provided. The -`FooProcessor` can then be injected into a `Step`, as shown in the following example: +`FooProcessor` can then be injected into a `Step`, as the following example shows: .XML Configuration [source, xml, role="xmlContent"] @@ -131,10 +131,10 @@ is optional for a `Step`. 
=== Chaining ItemProcessors Performing a single transformation is useful in many scenarios, but what if you want to -'chain' together multiple `ItemProcessor` implementations? This can be accomplished using +"`chain`" together multiple `ItemProcessor` implementations? You can do so by using the composite pattern mentioned previously. To update the previous, single transformation, example, `Foo` is transformed to `Bar`, which is transformed to `Foobar` -and written out, as shown in the following example: +and written out, as the following example shows: [source, java] ---- @@ -182,7 +182,7 @@ itemProcessors.add(new BarProcessor()); compositeProcessor.setDelegates(itemProcessors); ---- -Just as with the previous example, the composite processor can be configured into the +Just as with the previous example, you can configure the composite processor into the `Step`: .XML Configuration @@ -247,36 +247,36 @@ public CompositeItemProcessor compositeProcessor() { One typical use for an item processor is to filter out records before they are passed to the `ItemWriter`. Filtering is an action distinct from skipping. Skipping indicates that -a record is invalid, while filtering simply indicates that a record should not be +a record is invalid, while filtering indicates that a record should not be written. For example, consider a batch job that reads a file containing three different types of records: records to insert, records to update, and records to delete. If record deletion -is not supported by the system, then we would not want to send any "delete" records to -the `ItemWriter`. But, since these records are not actually bad records, we would want to +is not supported by the system, we would not want to send any deletable records to +the `ItemWriter`. However, since these records are not actually bad records, we would want to filter them out rather than skip them. As a result, the `ItemWriter` would receive only -"insert" and "update" records. +insertable and updatable records. To filter a record, you can return `null` from the `ItemProcessor`. The framework detects that the result is `null` and avoids adding that item to the list of records delivered to -the `ItemWriter`. As usual, an exception thrown from the `ItemProcessor` results in a +the `ItemWriter`. An exception thrown from the `ItemProcessor` results in a skip. [[validatingInput]] === Validating Input -In the <> chapter, multiple approaches to parsing input have been -discussed. Each major implementation throws an exception if it is not 'well-formed'. The +The <> chapter discusses multiple approaches to parsing input. +Each major implementation throws an exception if it is not "`well formed.`" The `FixedLengthTokenizer` throws an exception if a range of data is missing. Similarly, attempting to access an index in a `RowMapper` or `FieldSetMapper` that does not exist or is in a different format than the one expected causes an exception to be thrown. All of these types of exceptions are thrown before `read` returns. However, they do not address the issue of whether or not the returned item is valid. For example, if one of the fields -is an age, it obviously cannot be negative. It may parse correctly, because it exists and +is an age, it cannot be negative. It may parse correctly, because it exists and is a number, but it does not cause an exception. Since there are already a plethora of validation frameworks, Spring Batch does not attempt to provide yet another. 
Rather, it
-provides a simple interface, called `Validator`, that can be implemented by any number of
-frameworks, as shown in the following interface definition:
+provides a simple interface, called `Validator`, that any number of
+frameworks can implement, as the following interface definition shows:

[source, java]
----
@@ -288,8 +288,8 @@ public interface Validator {
----

The contract is that the `validate` method throws an exception if the object is invalid
-and returns normally if it is valid. Spring Batch provides an out of the box
-`ValidatingItemProcessor`, as shown in the following bean definition:
+and returns normally if it is valid. Spring Batch provides a
+`ValidatingItemProcessor`, as the following bean definition shows:

.XML Configuration
[source, xml, role="xmlContent"]
----
@@ -328,7 +328,7 @@ public SpringValidator validator() {
----

You can also use the `BeanValidatingItemProcessor` to validate items annotated with
-the Bean Validation API (JSR-303) annotations. For example, given the following type `Person`:
+the Bean Validation API (JSR-303) annotations. For example, consider the following type `Person`:

[source, java]
----
@@ -352,7 +352,7 @@ class Person {
}
----

-you can validate items by declaring a `BeanValidatingItemProcessor` bean in your
+You can validate items by declaring a `BeanValidatingItemProcessor` bean in your
application context and registering it as a processor in your chunk-oriented step:

[source, java]
----
@@ -370,8 +370,8 @@ public BeanValidatingItemProcessor beanValidatingItemProcessor() throws
=== Fault Tolerance

When a chunk is rolled back, items that have been cached during reading may be
-reprocessed. If a step is configured to be fault tolerant (typically by using skip or
+reprocessed. If a step is configured to be fault-tolerant (typically by using skip or
retry processing), any `ItemProcessor` used should be implemented in a way that is
-idempotent. Typically that would consist of performing no changes on the input item for
-the `ItemProcessor` and only updating the
+idempotent. Typically, that would consist of performing no changes on the input item for
+the `ItemProcessor` and updating only the
instance that is the result.
diff --git a/spring-batch-docs/src/main/asciidoc/readersAndWriters.adoc b/spring-batch-docs/src/main/asciidoc/readersAndWriters.adoc
index 1f3b4f4aef..1eee0f5c53 100644
--- a/spring-batch-docs/src/main/asciidoc/readersAndWriters.adoc
+++ b/spring-batch-docs/src/main/asciidoc/readersAndWriters.adoc
@@ -1694,7 +1694,7 @@ file-1.txt file-2.txt ignored.txt
----
file-1.txt and file-2.txt are formatted the same and, for business reasons, should be
processed together. The `MultiResourceItemReader` can be used to read in both files by
using wildcards.

[role="xmlContent"]
diff --git a/spring-batch-docs/src/main/asciidoc/repeat.adoc b/spring-batch-docs/src/main/asciidoc/repeat.adoc
index 0875d81ebb..747199c2a8 100644
--- a/spring-batch-docs/src/main/asciidoc/repeat.adoc
+++ b/spring-batch-docs/src/main/asciidoc/repeat.adoc
@@ -41,17 +41,15 @@ public interface RepeatCallback {
----

The callback is executed repeatedly until the implementation determines that the
-iteration should end. The return value in these interfaces is an enumeration that can
-either be `RepeatStatus.CONTINUABLE` or `RepeatStatus.FINISHED`. A `RepeatStatus`
+iteration should end. The return value in these interfaces is an enumeration value that can
+be either `RepeatStatus.CONTINUABLE` or `RepeatStatus.FINISHED`. A `RepeatStatus`
enumeration conveys information to the caller of the repeat operations about whether
-there is any more work to do. Generally speaking, implementations of `RepeatOperations`
-should inspect the `RepeatStatus` and use it as part of the decision to end the
-iteration. Any callback that wishes to signal to the caller that there is no more work to
-do can return `RepeatStatus.FINISHED`.
-
-The simplest general purpose implementation of `RepeatOperations` is `RepeatTemplate`, as
-shown in the following example:
+any work remains. Generally speaking, implementations of `RepeatOperations`
+should inspect `RepeatStatus` and use it as part of the decision to end the
+iteration. Any callback that wishes to signal to the caller that no work remains
+can return `RepeatStatus.FINISHED`.
+
+The simplest general-purpose implementation of `RepeatOperations` is `RepeatTemplate`:

[source, java]
----
@@ -71,9 +69,9 @@ template.iterate(new RepeatCallback() {

In the preceding example, we return `RepeatStatus.CONTINUABLE`, to show that there is
more work to do. The callback can also return `RepeatStatus.FINISHED`, to signal to the
-caller that there is no more work to do. Some iterations can be terminated by
+caller that no work remains. Some iterations can be terminated by
considerations intrinsic to the work being done in the callback. Others are effectively
-infinite loops as far as the callback is concerned and the completion decision is
+infinite loops (as far as the callback is concerned), and the completion decision is
delegated to an external policy, as in the case shown in the preceding example.

[[repeatContext]]

==== RepeatContext

The method parameter for the `RepeatCallback` is a `RepeatContext`. Many callbacks ignore
-the context. However, if necessary, it can be used as an attribute bag to store transient
+the context. However, if necessary, you can use it as an attribute bag to store transient
data for the duration of the iteration. After the `iterate` method returns, the context
no longer exists.

@@ -91,28 +89,26 @@ calls to `iterate`. This is the case, for instance, if you want to count the num
occurrences of an event in the iteration and remember it across subsequent calls.

[[repeatStatus]]
-
==== RepeatStatus

`RepeatStatus` is an enumeration used by Spring Batch to indicate whether processing has
-finished. It has two possible `RepeatStatus` values, described in the following table:
+finished. It has two possible `RepeatStatus` values:

.RepeatStatus Properties
|===============
|__Value__|__Description__
-|CONTINUABLE|There is more work to do.
-|FINISHED|No more repetitions should take place.
+|`CONTINUABLE`|There is more work to do.
+|`FINISHED`|No more repetitions should take place.

|===============

-`RepeatStatus` values can also be combined with a logical AND operation by using the
+You can combine `RepeatStatus` values with a logical AND operation by using the
`and()` method in `RepeatStatus`. The effect of this is to do a logical AND on the
-continuable flag. In other words, if either status is `FINISHED`, then the result is
+continuable flag. In other words, if either status is `FINISHED`, the result is
`FINISHED`.
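A short sketch of that combination (assuming the `and(boolean)` and `isContinuable()` methods documented on `RepeatStatus`) follows:

[source, java]
----
RepeatStatus first = RepeatStatus.CONTINUABLE;
RepeatStatus second = RepeatStatus.FINISHED;

// Logical AND on the continuable flag: the result is CONTINUABLE only if both sides are.
RepeatStatus combined = first.and(second.isContinuable()); // FINISHED
----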
[[completionPolicies]] - === Completion Policies Inside a `RepeatTemplate`, the termination of the loop in the `iterate` method is @@ -132,7 +128,6 @@ decisions. For example, a batch processing window that prevents batch jobs from once the online systems are in use would require a custom policy. [[repeatExceptionHandling]] - === Exception Handling If there is an exception thrown inside a `RepeatCallback`, the `RepeatTemplate` consults @@ -165,7 +160,6 @@ current `RepeatContext`. When set to `true`, the limit is kept across sibling co a nested iteration (such as a set of chunks inside a step). [[repeatListeners]] - === Listeners Often, it is useful to be able to receive additional callbacks for cross-cutting concerns @@ -195,7 +189,6 @@ order. In this case, `open` and `before` are called in the same order while `aft `onError`, and `close` are called in reverse order. [[repeatParallelProcessing]] - === Parallel Processing Implementations of `RepeatOperations` are not restricted to executing the callback @@ -207,21 +200,21 @@ of executing the whole iteration in the same thread (the same as a normal `RepeatTemplate`). [[declarativeIteration]] - === Declarative Iteration -Sometimes there is some business processing that you know you want to repeat every time -it happens. The classic example of this is the optimization of a message pipeline. It is -more efficient to process a batch of messages, if they are arriving frequently, than to +Sometimes, there is some business processing that you know you want to repeat every time +it happens. The classic example of this is the optimization of a message pipeline. +If a batch of messages arrives frequently, it is more efficient to process them than to bear the cost of a separate transaction for every message. Spring Batch provides an AOP -interceptor that wraps a method call in a `RepeatOperations` object for just this +interceptor that wraps a method call in a `RepeatOperations` object for this purpose. The `RepeatOperationsInterceptor` executes the intercepted method and repeats according to the `CompletionPolicy` in the provided `RepeatTemplate`. [role="xmlContent"] -The following example shows declarative iteration using the Spring AOP namespace to +The following example shows declarative iteration that uses the Spring AOP namespace to repeat a service call to a method called `processMessage` (for more detail on how to -configure AOP interceptors, see the Spring User Guide): +configure AOP interceptors, see the +<>): [source, xml, role="xmlContent"] ---- @@ -236,9 +229,10 @@ configure AOP interceptors, see the Spring User Guide): ---- [role="javaContent"] -The following example demonstrates using Java configuration to +The following example uses Java configuration to repeat a service call to a method called `processMessage` (for more detail on how to -configure AOP interceptors, see the Spring User Guide): +configure AOP interceptors, see the +<>): [source, java, role="javaContent"] ---- @@ -264,11 +258,11 @@ The preceding example uses a default `RepeatTemplate` inside the interceptor. To the policies, listeners, and other details, you can inject an instance of `RepeatTemplate` into the interceptor. -If the intercepted method returns `void`, then the interceptor always returns +If the intercepted method returns `void`, the interceptor always returns `RepeatStatus.CONTINUABLE` (so there is a danger of an infinite loop if the `CompletionPolicy` does not have a finite end point). 
Otherwise, it returns -`RepeatStatus.CONTINUABLE` until the return value from the intercepted method is `null`, -at which point it returns `RepeatStatus.FINISHED`. Consequently, the business logic +`RepeatStatus.CONTINUABLE` until the return value from the intercepted method is `null`. +At that point, it returns `RepeatStatus.FINISHED`. Consequently, the business logic inside the target method can signal that there is no more work to do by returning `null` -or by throwing an exception that is re-thrown by the `ExceptionHandler` in the provided +or by throwing an exception that is rethrown by the `ExceptionHandler` in the provided `RepeatTemplate`. diff --git a/spring-batch-docs/src/main/asciidoc/retry.adoc b/spring-batch-docs/src/main/asciidoc/retry.adoc index 5326ef7c0d..6ad5029082 100644 --- a/spring-batch-docs/src/main/asciidoc/retry.adoc +++ b/spring-batch-docs/src/main/asciidoc/retry.adoc @@ -14,9 +14,9 @@ Examples include remote calls to a web service that fails because of a network g [NOTE] ==== -The retry functionality was pulled out of Spring Batch as of 2.2.0. +As of version 2.2.0, the retry functionality was pulled out of Spring Batch. It is now part of a new library, https://github.com/spring-projects/spring-retry[Spring Retry]. Spring Batch still relies on Spring Retry to automate retry operations within the framework. -Please refer to the reference documentation of Spring Retry for details about +See the reference documentation of Spring Retry for details about key APIs and how to use them. ==== diff --git a/spring-batch-docs/src/main/asciidoc/scalability.adoc b/spring-batch-docs/src/main/asciidoc/scalability.adoc index b6c20b1067..14d7157a63 100644 --- a/spring-batch-docs/src/main/asciidoc/scalability.adoc +++ b/spring-batch-docs/src/main/asciidoc/scalability.adoc @@ -10,7 +10,7 @@ ifndef::onlyonetoggle[] include::toggle.adoc[] endif::onlyonetoggle[] -Many batch processing problems can be solved with single threaded, single process jobs, +Many batch processing problems can be solved with single-threaded, single-process jobs, so it is always a good idea to properly check if that meets your needs before thinking about more complex implementations. Measure the performance of a realistic job and see if the simplest implementation meets your needs first. You can read and write a file of @@ -21,27 +21,26 @@ Batch offers a range of options, which are described in this chapter, although s features are covered elsewhere. At a high level, there are two modes of parallel processing: -* Single process, multi-threaded +* Single-process, multi-threaded * Multi-process These break down into categories as well, as follows: -* Multi-threaded Step (single process) -* Parallel Steps (single process) -* Remote Chunking of Step (multi process) -* Partitioning a Step (single or multi process) +* Multi-threaded Step (single-process) +* Parallel Steps (single-process) +* Remote Chunking of Step (multi-process) +* Partitioning a Step (single or multi-process) First, we review the single-process options. Then we review the multi-process options. [[multithreadedStep]] - === Multi-threaded Step The simplest way to start parallel processing is to add a `TaskExecutor` to your Step configuration. 
[role="xmlContent"] -For example, you might add an attribute of the `tasklet`, as follows: +For example, you might add an attribute TO the `tasklet`, as follows: [source, xml, role="xmlContent"] ---- @@ -51,8 +50,8 @@ For example, you might add an attribute of the `tasklet`, as follows: ---- [role="javaContent"] -When using java configuration, a `TaskExecutor` can be added to the step, -as shown in the following example: +When using Java configuration, you can add a `TaskExecutor` to the step, +as the following example shows: .Java Configuration [source, java, role="javaContent"] @@ -80,16 +79,16 @@ is a standard Spring interface, so consult the Spring User Guide for details of implementations. The simplest multi-threaded `TaskExecutor` is a `SimpleAsyncTaskExecutor`. -The result of the above configuration is that the `Step` executes by reading, processing, +The result of the preceding configuration is that the `Step` executes by reading, processing, and writing each chunk of items (each commit interval) in a separate thread of execution. Note that this means there is no fixed order for the items to be processed, and a chunk might contain items that are non-consecutive compared to the single-threaded case. In addition to any limits placed by the task executor (such as whether it is backed by a -thread pool), there is a throttle limit in the tasklet configuration which defaults to 4. -You may need to increase this to ensure that a thread pool is fully utilized. +thread pool), the tasklet configuration has a throttle limit (default: 4). +You may need to increase this limit to ensure that a thread pool is fully used. [role="xmlContent"] -For example you might increase the throttle-limit, as shown in the following example: +For example, you might increase the throttle-limit, as follows: [source, xml, role="xmlContent"] ---- @@ -100,8 +99,8 @@ For example you might increase the throttle-limit, as shown in the following exa ---- [role="javaContent"] -When using Java configuration, the builders provide access to the throttle limit, as shown -in the following example: +When using Java configuration, the builders provide access to the throttle limit, as +follows: .Java Configuration [source, java, role="javaContent"] @@ -119,13 +118,13 @@ public Step sampleStep(TaskExecutor taskExecutor) { ---- Note also that there may be limits placed on concurrency by any pooled resources used in -your step, such as a `DataSource`. Be sure to make the pool in those resources at least +your step, such as a `DataSource`. Be sure to make the pool in those resources at least as large as the desired number of concurrent threads in the step. There are some practical limitations of using multi-threaded `Step` implementations for some common batch use cases. Many participants in a `Step` (such as readers and writers) -are stateful. If the state is not segregated by thread, then those components are not -usable in a multi-threaded `Step`. In particular, most of the off-the-shelf readers and +are stateful. If the state is not segregated by thread, those components are not +usable in a multi-threaded `Step`. In particular, most of the readers and writers from Spring Batch are not designed for multi-threaded use. It is, however, possible to work with stateless or thread safe readers and writers, and there is a sample (called `parallelJob`) in the @@ -136,25 +135,23 @@ of items that have been processed in a database input table. Spring Batch provides some implementations of `ItemWriter` and `ItemReader`. 
Usually, they say in the Javadoc if they are thread safe or not or what you have to do to avoid -problems in a concurrent environment. If there is no information in the Javadoc, you can -check the implementation to see if there is any state. If a reader is not thread safe, +problems in a concurrent environment. If there is no information in the Javadoc, you can +check the implementation to see if there is any state. If a reader is not thread safe, you can decorate it with the provided `SynchronizedItemStreamReader` or use it in your own -synchronizing delegator. You can synchronize the call to `read()` and as long as the +synchronizing delegator. You can synchronize the call to `read()`, and, as long as the processing and writing is the most expensive part of the chunk, your step may still -complete much faster than it would in a single threaded configuration. +complete much more quickly than it would in a single-threaded configuration. [[scalabilityParallelSteps]] - - === Parallel Steps As long as the application logic that needs to be parallelized can be split into distinct -responsibilities and assigned to individual steps, then it can be parallelized in a +responsibilities and assigned to individual steps, it can be parallelized in a single process. Parallel Step execution is easy to configure and use. [role="xmlContent"] For example, executing steps `(step1,step2)` in parallel with `step3` is straightforward, -as shown in the following example: +as follows: [source, xml, role="xmlContent"] ---- @@ -176,7 +173,7 @@ as shown in the following example: [role="javaContent"] When using Java configuration, executing steps `(step1,step2)` in parallel with `step3` -is straightforward, as shown in the following example: +is straightforward, as follows: .Java Configuration [source, java, role="javaContent"] @@ -220,7 +217,7 @@ public TaskExecutor taskExecutor() { ---- The configurable task executor is used to specify which `TaskExecutor` -implementation should be used to execute the individual flows. The default is +implementation should execute the individual flows. The default is `SyncTaskExecutor`, but an asynchronous `TaskExecutor` is required to run the steps in parallel. Note that the job ensures that every flow in the split completes before aggregating the exit statuses and transitioning. @@ -228,7 +225,6 @@ aggregating the exit statuses and transitioning. See the section on <> for more detail. [[remoteChunking]] - === Remote Chunking In remote chunking, the `Step` processing is split across multiple processes, @@ -245,13 +241,13 @@ expensive than the reading of items (as is often the case in practice). The manager is an implementation of a Spring Batch `Step` with the `ItemWriter` replaced by a generic version that knows how to send chunks of items to the middleware as messages. The workers are standard listeners for whatever middleware is being used (for -example, with JMS, they would be `MessageListener` implementations), and their role is -to process the chunks of items using a standard `ItemWriter` or `ItemProcessor` plus +example, with JMS, they would be `MesssageListener` implementations), and their role is +to process the chunks of items by using a standard `ItemWriter` or `ItemProcessor` plus an `ItemWriter`, through the `ChunkProcessor` interface. One of the advantages of using this pattern is that the reader, processor, and writer components are off-the-shelf (the same -as would be used for a local execution of the step). 
The items are divided up dynamically +as would be used for a local execution of the step). The items are divided up dynamically, and work is shared through the middleware, so that, if the listeners are all eager -consumers, then load balancing is automatic. +consumers, load balancing is automatic. The middleware has to be durable, with guaranteed delivery and a single consumer for each message. JMS is the obvious candidate, but other options (such as JavaSpaces) exist in @@ -262,7 +258,6 @@ See the section on for more detail. [[partitioning]] - === Partitioning Spring Batch also provides an SPI for partitioning a `Step` execution and executing it @@ -285,7 +280,7 @@ each `Job` execution. The SPI in Spring Batch consists of a special implementation of `Step` (called the `PartitionStep`) and two strategy interfaces that need to be implemented for the specific environment. The strategy interfaces are `PartitionHandler` and `StepExecutionSplitter`, -and their role is shown in the following sequence diagram: +and the following sequence diagram shows their role: .Partitioning SPI image::{batch-asciidoc}images/partitioning-spi.png[Partitioning SPI, scaledwidth="60%"] @@ -325,36 +320,41 @@ public Step step1Manager() { } ---- +[role="xmlContent"] Similar to the multi-threaded step's `throttle-limit` attribute, the `grid-size` attribute prevents the task executor from being saturated with requests from a single step. -There is a simple example that can be copied and extended in the unit test suite for +[role="javaContent"] +Similar to the multi-threaded step's `throttleLimit` method, the `gridSize` +method prevents the task executor from being saturated with requests from a single +step. + +The unit test suite for https://github.com/spring-projects/spring-batch/tree/main/spring-batch-samples/src/main/resources/jobs[Spring -Batch Samples] (see `partition*Job.xml` configuration). +Batch Samples] (see `partition*Job.xml` configuration) has a simple example that you can copy and extend. -Spring Batch creates step executions for the partitions called "step1:partition0", and so -on. Many people prefer to call the manager step "step1:manager" for consistency. You can +Spring Batch creates step executions for the partition called `step1:partition0` and so +on. Many people prefer to call the manager step `step1:manager` for consistency. You can use an alias for the step (by specifying the `name` attribute instead of the `id` attribute). [[partitionHandler]] - ==== PartitionHandler -The `PartitionHandler` is the component that knows about the fabric of the remoting or +`PartitionHandler` is the component that knows about the fabric of the remoting or grid environment. It is able to send `StepExecution` requests to the remote `Step` instances, wrapped in some fabric-specific format, like a DTO. It does not have to know how to split the input data or how to aggregate the result of multiple `Step` executions. Generally speaking, it probably also does not need to know about resilience or failover, since those are features of the fabric in many cases. In any case, Spring Batch always -provides restartability independent of the fabric. A failed `Job` can always be restarted -and only the failed `Steps` are re-executed. +provides restartability independent of the fabric. A failed `Job` can always be restarted, +and, in that case, only the failed `Steps` are re-executed. 
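For reference, the `PartitionHandler` contract is a single method that takes the `StepExecutionSplitter` and the manager `StepExecution` and returns the completed remote step executions. The following sketch shows the shape of the interface (see the Javadoc of your version for the authoritative definition):

[source, java]
----
public interface PartitionHandler {

	Collection<StepExecution> handle(StepExecutionSplitter stepSplitter,
			StepExecution stepExecution) throws Exception;

}
----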
The `PartitionHandler` interface can have specialized implementations for a variety of fabric types, including simple RMI remoting, EJB remoting, custom web service, JMS, Java -Spaces, shared memory grids (like Terracotta or Coherence), and grid execution fabrics -(like GridGain). Spring Batch does not contain implementations for any proprietary grid +Spaces, shared memory grids (such as Terracotta or Coherence), and grid execution fabrics +(such as GridGain). Spring Batch does not contain implementations for any proprietary grid or remoting fabrics. Spring Batch does, however, provide a useful implementation of `PartitionHandler` that @@ -364,8 +364,7 @@ executes `Step` instances locally in separate threads of execution, using the [role="xmlContent"] The `TaskExecutorPartitionHandler` is the default for a step configured with the XML -namespace shown previously. It can also be configured explicitly, as shown in the -following example: +namespace shown previously. You can also configure it explicitly, as follows: [source, xml, role="xmlContent"] ---- @@ -381,8 +380,8 @@ following example: ---- [role="javaContent"] -The `TaskExecutorPartitionHandler` can be configured explicitly within java configuration, -as shown in the following example: +You can explicitly configure the `TaskExecutorPartitionHandler` with Java configuration, +as follows: .Java Configuration [source, java, role="javaContent"] @@ -416,12 +415,11 @@ systems. It can also be used for remote execution by providing a `Step` implemen that is a proxy for a remote invocation (such as using Spring Remoting). [[partitioner]] - ==== Partitioner The `Partitioner` has a simpler responsibility: to generate execution contexts as input parameters for new step executions only (no need to worry about restarts). It has a -single method, as shown in the following interface definition: +single method, as the following interface definition shows: [source, java] ---- @@ -435,24 +433,23 @@ The return value from this method associates a unique name for each step executi later in the Batch metadata as the step name in the partitioned `StepExecutions`. The `ExecutionContext` is just a bag of name-value pairs, so it might contain a range of primary keys, line numbers, or the location of an input file. The remote `Step` then -normally binds to the context input using `#{...}` placeholders (late binding in step -scope), as illustrated in the next section. +normally binds to the context input by using `#{...}` placeholders (late binding in step +scope), as shown in the next section. The names of the step executions (the keys in the `Map` returned by `Partitioner`) need to be unique amongst the step executions of a `Job` but do not have any other specific requirements. The easiest way to do this (and to make the names meaningful for users) is to use a prefix+suffix naming convention, where the prefix is the name of the step that -is being executed (which itself is unique in the `Job`), and the suffix is just a +is being executed (which itself is unique in the `Job`) and the suffix is just a counter. There is a `SimplePartitioner` in the framework that uses this convention. -An optional interface called `PartitionNameProvider` can be used to provide the partition +You can use an optional interface called `PartitionNameProvider` to provide the partition names separately from the partitions themselves. If a `Partitioner` implements this -interface, then, on a restart, only the names are queried. 
If partitioning is expensive, +interface, only the names are queried on a restart. If partitioning is expensive, this can be a useful optimization. The names provided by the `PartitionNameProvider` must match those provided by the `Partitioner`. [[bindingInputDataToSteps]] - ==== Binding Input Data to Steps It is very efficient for the steps that are executed by the `PartitionHandler` to have @@ -471,7 +468,7 @@ the `Partitioner` output might resemble the content of the following table: |filecopy:partition2|fileName=/home/data/three |=============== -Then the file name can be bound to a step using late binding to the execution context. +Then the file name can be bound to a step by using late binding to the execution context. [role="xmlContent"] The following example shows how to define late binding in XML: diff --git a/spring-batch-docs/src/main/asciidoc/schema-appendix.adoc b/spring-batch-docs/src/main/asciidoc/schema-appendix.adoc index 995bb485e7..22dae32d0c 100644 --- a/spring-batch-docs/src/main/asciidoc/schema-appendix.adoc +++ b/spring-batch-docs/src/main/asciidoc/schema-appendix.adoc @@ -9,18 +9,18 @@ [[metaDataSchemaOverview]] === Overview -The Spring Batch Metadata tables closely match the Domain objects that represent them in +The Spring Batch Metadata tables closely match the domain objects that represent them in Java. For example, `JobInstance`, `JobExecution`, `JobParameters`, and `StepExecution` map to `BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, `BATCH_JOB_EXECUTION_PARAMS`, and `BATCH_STEP_EXECUTION`, respectively. `ExecutionContext` maps to both `BATCH_JOB_EXECUTION_CONTEXT` and `BATCH_STEP_EXECUTION_CONTEXT`. The `JobRepository` is responsible for saving and storing each Java object into its correct table. This appendix describes the metadata tables in detail, along with many of the design decisions that -were made when creating them. When viewing the various table creation statements below, -it is important to realize that the data types used are as generic as possible. Spring -Batch provides many schemas as examples, all of which have varying data types, due to +were made when creating them. When viewing the various table creation statements described +later in this appendix, note that the data types used are as generic as possible. Spring +Batch provides many schemas as examples. All of them have varying data types, due to variations in how individual database vendors handle data types. The following image -shows an ERD model of all 6 tables and their relationships to one another: +shows an ERD model of all six tables and their relationships to one another: .Spring Batch Meta-Data ERD image::{batch-asciidoc}images/meta-data-erd.png[Spring Batch Meta-Data ERD, scaledwidth="60%"] @@ -31,8 +31,8 @@ image::{batch-asciidoc}images/meta-data-erd.png[Spring Batch Meta-Data ERD, scal The Spring Batch Core JAR file contains example scripts to create the relational tables for a number of database platforms (which are, in turn, auto-detected by the job repository factory bean or namespace equivalent). These scripts can be used as is or -modified with additional indexes and constraints as desired. The file names are in the -form `schema-\*.sql`, where "*" is the short name of the target database platform. +modified with additional indexes and constraints, as desired. The file names are in the +form `schema-\*.sql`, where `*` is the short name of the target database platform. The scripts are in the package `org.springframework.batch.core`. 
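For example, a minimal sketch that applies the bundled H2 script at startup (assuming an H2 `DataSource` named `dataSource`; adjust the script name for your platform, or let Spring Boot initialize the schema for you) might look like the following:

[source, java]
----
ResourceDatabasePopulator populator = new ResourceDatabasePopulator();
// The example scripts ship in the core JAR under org/springframework/batch/core.
populator.addScript(new ClassPathResource("org/springframework/batch/core/schema-h2.sql"));
populator.execute(dataSource);
----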
[[migrationDDLScripts]] @@ -42,18 +42,18 @@ Spring Batch provides migration DDL scripts that you need to execute when you up These scripts can be found in the Core Jar file under `org/springframework/batch/core/migration`. Migration scripts are organized into folders corresponding to version numbers in which they were introduced: -* `2.2`: contains scripts needed if you are migrating from a version before `2.2` to version `2.2` -* `4.1`: contains scripts needed if you are migrating from a version before `4.1` to version `4.1` +* `2.2`: Contains scripts you need to migrate from a version before `2.2` to version `2.2` +* `4.1`: Contains scripts you need to migrate from a version before `4.1` to version `4.1` [[metaDataVersion]] ==== Version Many of the database tables discussed in this appendix contain a version column. This -column is important because Spring Batch employs an optimistic locking strategy when -dealing with updates to the database. This means that each time a record is 'touched' -(updated) the value in the version column is incremented by one. When the repository goes -back to save the value, if the version number has changed it throws an -`OptimisticLockingFailureException`, indicating there has been an error with concurrent +column is important, because Spring Batch employs an optimistic locking strategy when +dealing with updates to the database. This means that each time a record is "`touched`" +(updated), the value in the version column is incremented by one. When the repository goes +back to save the value, if the version number has changed, it throws an +`OptimisticLockingFailureException`, indicating that there has been an error with concurrent access. This check is necessary, since, even though different batch jobs may be running in different machines, they all use the same database tables. @@ -91,13 +91,13 @@ INSERT INTO BATCH_JOB_SEQ values(0); ---- In the preceding case, a table is used in place of each sequence. The Spring core class, -`MySQLMaxValueIncrementer`, then increments the one column in this sequence in order to +`MySQLMaxValueIncrementer`, then increments the one column in this sequence to give similar functionality. [[metaDataBatchJobInstance]] -=== `BATCH_JOB_INSTANCE` +=== The `BATCH_JOB_INSTANCE` Table -The `BATCH_JOB_INSTANCE` table holds all information relevant to a `JobInstance`, and +The `BATCH_JOB_INSTANCE` table holds all information relevant to a `JobInstance` and serves as the top of the overall hierarchy. The following generic DDL statement is used to create it: @@ -124,15 +124,15 @@ instances of the same job from one another. (`JobInstances` with the same job na have different `JobParameters` and, thus, different `JOB_KEY` values). [[metaDataBatchJobParams]] -=== `BATCH_JOB_EXECUTION_PARAMS` +=== The `BATCH_JOB_EXECUTION_PARAMS` Table The `BATCH_JOB_EXECUTION_PARAMS` table holds all information relevant to the `JobParameters` object. It contains 0 or more key/value pairs passed to a `Job` and serves as a record of the parameters with which a job was run. For each parameter that contributes to the generation of a job's identity, the `IDENTIFYING` flag is set to true. Note that the table has been denormalized. Rather than creating a separate table for each -type, there is one table with a column indicating the type, as shown in the following -listing: +type, there is one table with a column indicating the type, as the following +listing shows: [source, sql] ---- @@ -158,22 +158,22 @@ key/value pairs) may exist for each execution. 
* TYPE_CD: String representation of the type of value stored, which can be a string, a
date, a long, or a double. Because the type must be known, it cannot be null.
* KEY_NAME: The parameter key.
-* STRING_VAL: Parameter value, if the type is string.
-* DATE_VAL: Parameter value, if the type is date.
-* LONG_VAL: Parameter value, if the type is long.
-* DOUBLE_VAL: Parameter value, if the type is double.
+* STRING_VAL: Parameter value if the type is string.
+* DATE_VAL: Parameter value if the type is date.
+* LONG_VAL: Parameter value if the type is long.
+* DOUBLE_VAL: Parameter value if the type is double.
* IDENTIFYING: Flag indicating whether the parameter contributed to the identity of the
related `JobInstance`.

Note that there is no primary key for this table. This is because the framework has no
-use for one and, thus, does not require it. If need be, you can add a primary key may be
-added with a database generated key without causing any issues to the framework itself.
+use for one and, thus, does not require it. If need be, you can add a primary key
+with a database-generated key without causing any issues to the framework itself.

[[metaDataBatchJobExecution]]
-=== `BATCH_JOB_EXECUTION`
+=== The `BATCH_JOB_EXECUTION` Table

The `BATCH_JOB_EXECUTION` table holds all information relevant to the `JobExecution`
-object. Every time a `Job` is run, there is always a new `JobExecution`, and a new row in
+object. Every time a `Job` is run, there is always a new `JobExecution` and a new row in
this table. The following listing shows the definition of the `BATCH_JOB_EXECUTION`
table:

[source, sql]
----
@@ -220,9 +220,9 @@ possible.
* `LAST_UPDATED`: Timestamp representing the last time this execution was persisted.

[[metaDataBatchStepExecution]]
-=== `BATCH_STEP_EXECUTION`
+=== The `BATCH_STEP_EXECUTION` Table

-The BATCH_STEP_EXECUTION table holds all information relevant to the `StepExecution`
+The `BATCH_STEP_EXECUTION` table holds all information relevant to the `StepExecution`
object. This table is similar in many ways to the `BATCH_JOB_EXECUTION` table, and there
is always at least one entry per `Step` for each `JobExecution` created. The following
listing shows the definition of the `BATCH_STEP_EXECUTION` table:
@@ -253,7 +253,7 @@ CREATE TABLE BATCH_STEP_EXECUTION  (
) ;
----

-The following list describes for each column:
+The following list describes each column:

* `STEP_EXECUTION_ID`: Primary key that uniquely identifies this execution. The value of
this column should be obtainable by calling the `getId` method of the `StepExecution`
@@ -291,13 +291,13 @@ possible.
* `LAST_UPDATED`: Timestamp representing the last time this execution was persisted.

[[metaDataBatchJobExecutionContext]]
-=== `BATCH_JOB_EXECUTION_CONTEXT`
+=== The `BATCH_JOB_EXECUTION_CONTEXT` Table

The `BATCH_JOB_EXECUTION_CONTEXT` table holds all information relevant to the
-`ExecutionContext` of a `Job`. There is exactly one `Job` `ExecutionContext` per
+`ExecutionContext` of a `Job`. There is exactly one `Job` `ExecutionContext` for each
`JobExecution`, and it contains all of the job-level data that is needed for a particular
job execution. This data typically represents the state that must be retrieved after a
-failure, so that a `JobInstance` can "start from where it left off". The following
+failure, so that a `JobInstance` can "`start where it left off`". The following
listing shows the definition of the `BATCH_JOB_EXECUTION_CONTEXT` table:

[source, sql]
----
@@ -319,14 +319,14 @@ belongs. 
There may be more than one row associated with a given execution. * `SERIALIZED_CONTEXT`: The entire context, serialized. [[metaDataBatchStepExecutionContext]] -=== `BATCH_STEP_EXECUTION_CONTEXT` +=== The `BATCH_STEP_EXECUTION_CONTEXT` Table The `BATCH_STEP_EXECUTION_CONTEXT` table holds all information relevant to the `ExecutionContext` of a `Step`. There is exactly one `ExecutionContext` per `StepExecution`, and it contains all of the data that needs to be persisted for a particular step execution. This data typically represents the -state that must be retrieved after a failure, so that a `JobInstance` can 'start from -where it left off'. The following listing shows the definition of the +state that must be retrieved after a failure so that a `JobInstance` can "`start +where it left off`". The following listing shows the definition of the `BATCH_STEP_EXECUTION_CONTEXT` table: [source, sql] @@ -343,7 +343,7 @@ CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT ( The following list describes each column: * `STEP_EXECUTION_ID`: Foreign key representing the `StepExecution` to which the context -belongs. There may be more than one row associated to a given execution. +belongs. There may be more than one row associated with a given execution. * `SHORT_CONTEXT`: A string version of the `SERIALIZED_CONTEXT`. * `SERIALIZED_CONTEXT`: The entire context, serialized. @@ -356,47 +356,47 @@ to show a record of what happened in the past and generally do not affect the ru job, with a few notable exceptions pertaining to restart: * The framework uses the metadata tables to determine whether a particular `JobInstance` -has been run before. If it has been run and if the job is not restartable, then an +has been run before. If it has been run and if the job is not restartable, an exception is thrown. * If an entry for a `JobInstance` is removed without having completed successfully, the framework thinks that the job is new rather than a restart. * If a job is restarted, the framework uses any data that has been persisted to the `ExecutionContext` to restore the `Job's` state. Therefore, removing any entries from this table for jobs that have not completed successfully prevents them from starting at -the correct point if run again. +the correct point if they are run again. [[multiByteCharacters]] === International and Multi-byte Characters -If you are using multi-byte character sets (such as Chinese or Cyrillic) in your business -processing, then those characters might need to be persisted in the Spring Batch schema. +If you use multi-byte character sets (such as Chinese or Cyrillic) in your business +processing, those characters might need to be persisted in the Spring Batch schema. Many users find that simply changing the schema to double the length of the `VARCHAR` -columns is enough. Others prefer to configure the +columns is enough. Others prefer to configure the <> with `max-varchar-length` half the value of the `VARCHAR` column length. Some users have also reported that they use `NVARCHAR` in place of `VARCHAR` in their schema definitions. The best result depends on the database platform and the way the database server has been configured locally. [[recommendationsForIndexingMetaDataTables]] -=== Recommendations for Indexing Meta Data Tables +=== Recommendations for Indexing Metadata Tables Spring Batch provides DDL samples for the metadata tables in the core jar file for several common database platforms. 
Index declarations are not included in that DDL, because there are too many variations in how users may want to index, depending on their precise platform, local conventions, and the business requirements of how the jobs are -operated. The following below provides some indication as to which columns are going to +operated. The following table provides some indication as to which columns are going to be used in a `WHERE` clause by the DAO implementations provided by Spring Batch and how -frequently they might be used, so that individual projects can make up their own minds +frequently they might be used so that individual projects can make up their own minds about indexing: .Where clauses in SQL statements (excluding primary keys) and their approximate frequency of use. |=============== |Default Table Name|Where Clause|Frequency -|BATCH_JOB_INSTANCE|JOB_NAME = ? and JOB_KEY = ?|Every time a job is launched -|BATCH_JOB_EXECUTION|JOB_INSTANCE_ID = ?|Every time a job is restarted -|BATCH_STEP_EXECUTION|VERSION = ?|On commit interval, a.k.a. chunk (and at start and end of +|`BATCH_JOB_INSTANCE`|`JOB_NAME = ? and JOB_KEY = ?`|Every time a job is launched +|`BATCH_JOB_EXECUTION`|`JOB_INSTANCE_ID = ?`|Every time a job is restarted +|`BATCH_STEP_EXECUTION`|`VERSION = ?`|On commit interval, a.k.a. chunk (and at start and end of step) -|BATCH_STEP_EXECUTION|STEP_NAME = ? and JOB_EXECUTION_ID = ?|Before each step execution +|`BATCH_STEP_EXECUTION`|`STEP_NAME = ? and JOB_EXECUTION_ID = ?`|Before each step execution |=============== diff --git a/spring-batch-docs/src/main/asciidoc/spring-batch-architecture.adoc b/spring-batch-docs/src/main/asciidoc/spring-batch-architecture.adoc new file mode 100644 index 0000000000..95e04c1e55 --- /dev/null +++ b/spring-batch-docs/src/main/asciidoc/spring-batch-architecture.adoc @@ -0,0 +1,427 @@ +[[springBatchArchitecture]] +=== Spring Batch Architecture +// TODO Make a separate document +Spring Batch is designed with extensibility and a diverse group of end users in mind. The +following image shows the layered architecture that supports the extensibility and ease of +use for end-user developers. + +.Spring Batch Layered Architecture +image::{batch-asciidoc}images/spring-batch-layers.png[Figure 1.1: Spring Batch Layered Architecture, scaledwidth="60%"] + +This layered architecture highlights three major high-level components: Application, +Core, and Infrastructure. The application contains all batch jobs and custom code written +by developers using Spring Batch. The Batch Core contains the core runtime classes +necessary to launch and control a batch job. It includes implementations for +`JobLauncher`, `Job`, and `Step`. Both Application and Core are built on top of a common +infrastructure. This infrastructure contains common readers and writers and services +(such as the `RetryTemplate`), which are used both by application developers(readers and +writers, such as `ItemReader` and `ItemWriter`), and the core framework itself (retry, +which is its own library). + +[[batchArchitectureConsiderations]] +==== General Batch Principles and Guidelines + +The following key principles, guidelines, and general considerations should be considered +when building a batch solution. + +* Remember that a batch architecture typically affects on-line architecture and vice +versa. Design with both architectures and environments in mind by using common building +blocks when possible. 
+ +* Simplify as much as possible and avoid building complex logical structures in single +batch applications. + +* Keep the processing and storage of data physically close together (in other words, keep +your data where your processing occurs). + +* Minimize system resource use, especially I/O. Perform as many operations as possible in +internal memory. + +* Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O +is avoided. In particular, the following four common flaws need to be looked for: +** Reading data for every transaction when the data could be read once and cached or kept +in the working storage. +** Rereading data for a transaction where the data was read earlier in the same +transaction. +** Causing unnecessary table or index scans. +** Not specifying key values in the `WHERE` clause of an SQL statement. + +* Do not do things twice in a batch run. For instance, if you need data summarization for +reporting purposes, you should (if possible) increment stored totals when data is being +initially processed, so your reporting application does not have to reprocess the same +data. + +* Allocate enough memory at the beginning of a batch application to avoid time-consuming +reallocation during the process. + +* Always assume the worst with regard to data integrity. Insert adequate checks and +record validation to maintain data integrity. + +* Implement checksums for internal validation where possible. For example, flat files +should have a trailer record telling the total of records in the file and an aggregate of +the key fields. + +* Plan and execute stress tests as early as possible in a production-like environment +with realistic data volumes. + +* In large batch systems, backups can be challenging, especially if the system is running +concurrent with online applications on a 24-7 basis. Database backups are typically well taken care +of in online design, but file backups should be considered to be just as important. +If the system depends on flat files, file backup procedures should not only be in place +and documented but be regularly tested as well. + +[[batchProcessingStrategy]] +==== Batch Processing Strategies + +To help design and implement batch systems, basic batch application building blocks and +patterns should be provided to the designers and programmers in the form of sample +structure charts and code shells. When starting to design a batch job, the business logic +should be decomposed into a series of steps that can be implemented by using the following +standard building blocks: + +* __Conversion Applications:__ For each type of file supplied by or generated for an +external system, a conversion application must be created to convert the transaction +records supplied into a standard format required for processing. This type of batch +application can partly or entirely consist of translation utility modules (see Basic +Batch Services). +// TODO Add a link to "Basic Batch Services", once you discover where that content is. +* __Validation Applications:__ A validation application ensures that all input and output +records are correct and consistent. Validation is typically based on file headers and +trailers, checksums and validation algorithms, and record-level cross-checks. +* __Extract Applications:__ An extract application reads a set of records from a database or +input file, selects records based on predefined rules, and writes the records to an +output file. 
+* __Extract/Update Applications:__ An extract/update application reads records from a database or
+an input file and makes changes to a database or an output file, driven by the data found
+in each input record.
+* __Processing and Updating Applications:__ A processing and updating application performs processing on
+input transactions from an extract or a validation application. The processing usually
+involves reading a database to obtain data required for processing, potentially updating
+the database and creating records for output processing.
+* __Output/Format Applications:__ An output/format application reads an input file, restructures data
+from this record according to a standard format, and produces an output file for printing
+or transmission to another program or system.
+
+Additionally, a basic application shell should be provided for business logic that cannot
+be built by using the previously mentioned building blocks.
+// TODO What is an example of such a system?
+
+In addition to the main building blocks, each application may use one or more standard
+utility steps, such as:
+
+* Sort: A program that reads an input file and produces an output file where records
+have been re-sequenced according to a sort key field in the records. Sorts are usually
+performed by standard system utilities.
+* Split: A program that reads a single input file and writes each record to one of
+several output files based on a field value. Splits can be tailored or performed by
+parameter-driven standard system utilities.
+* Merge: A program that reads records from multiple input files and produces one output
+file with combined data from the input files. Merges can be tailored or performed by
+parameter-driven standard system utilities.
+
+Batch applications can additionally be categorized by their input source:
+
+* Database-driven applications are driven by rows or values retrieved from the database.
+* File-driven applications are driven by records or values retrieved from a file.
+* Message-driven applications are driven by messages retrieved from a message queue.
+
+The foundation of any batch system is the processing strategy. Factors affecting the
+selection of the strategy include estimated batch system volume, concurrency with
+online systems or with other batch systems, and available batch windows. (Note that, with
+more enterprises wanting to be up and running 24x7, clear batch windows are
+disappearing).
+
+Typical processing options for batch are (in increasing order of implementation
+complexity):
+
+* Normal processing during a batch window in offline mode.
+* Concurrent batch or online processing.
+* Parallel processing of many different batch runs or jobs at the same time.
+* Partitioning (processing of many instances of the same job at the same time).
+* A combination of the preceding options.
+
+Some or all of these options may be supported by a commercial scheduler.
+
+The remainder of this section discusses these processing options in more detail.
+Note that, as a rule of thumb, the commit and locking strategy adopted by batch
+processes depends on the type of processing performed and that the online locking
+strategy should also use the same principles. Therefore, the batch architecture cannot be
+simply an afterthought when designing an overall architecture.
+
+The locking strategy can be to use only normal database locks or to implement an
+additional custom locking service in the architecture.
The locking service would track +database locking (for example, by storing the necessary information in a dedicated +database table) and give or deny permissions to the application programs requesting a database +operation. Retry logic could also be implemented by this architecture to avoid aborting a +batch job in case of a lock situation. + +*1. Normal processing in a batch window* For simple batch processes running in a separate +batch window where the data being updated is not required by online users or other batch +processes, concurrency is not an issue and a single commit can be done at the end of the +batch run. + +In most cases, a more robust approach is more appropriate. Keep in mind that batch +systems have a tendency to grow as time goes by, both in terms of complexity and the data +volumes they handle. If no locking strategy is in place and the system still relies on a +single commit point, modifying the batch programs can be painful. Therefore, even with +the simplest batch systems, consider the need for commit logic for restart-recovery +options as well as the information concerning the more complex cases described later in +this section. + +*2. Concurrent batch or on-line processing* Batch applications processing data that can +be simultaneously updated by online users should not lock any data (either in the +database or in files) that could be required by on-line users for more than a few +seconds. Also, updates should be committed to the database at the end of every few +transactions. Doing so minimizes the portion of data that is unavailable to other processes +and the elapsed time the data is unavailable. + +Another option to minimize physical locking is to have logical row-level locking +implemented with either an optimistic locking pattern or a pessimistic locking pattern. + +* Optimistic locking assumes a low likelihood of record contention. It typically means +inserting a timestamp column in each database table that is used concurrently by both batch and +online processing. When an application fetches a row for processing, it also fetches the +timestamp. As the application then tries to update the processed row, the update uses the +original timestamp in the `WHERE` clause. If the timestamp matches, the data and the +timestamp are updated. If the timestamp does not match, this indicates that another +application has updated the same row between the fetch and the update attempt. Therefore, +the update cannot be performed. + +* Pessimistic locking is any locking strategy that assumes there is a high likelihood of +record contention and, therefore, either a physical or a logical lock needs to be obtained at +retrieval time. One type of pessimistic logical locking uses a dedicated lock-column in +the database table. When an application retrieves the row for update, it sets a flag in +the lock column. With the flag in place, other applications attempting to retrieve the +same row logically fail. When the application that sets the flag updates the row, it also +clears the flag, enabling the row to be retrieved by other applications. Note that +the integrity of data must be maintained also between the initial fetch and the setting +of the flag -- for example, by using database locks (such as `SELECT FOR UPDATE`). Note also that +this method suffers from the same downside as physical locking except that it is somewhat +easier to manage building a time-out mechanism that gets the lock released if the user +goes to lunch while the record is locked. 
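+
+The following listing shows one possible sketch of the optimistic locking pattern described
+above, using Spring's `JdbcTemplate`. The `ACCOUNT` table, its columns, and the surrounding
+class are illustrative assumptions and are not part of Spring Batch:
+
+[source, java]
+----
+import java.math.BigDecimal;
+import java.sql.Timestamp;
+
+import org.springframework.dao.OptimisticLockingFailureException;
+import org.springframework.jdbc.core.JdbcTemplate;
+
+public class OptimisticAccountUpdater {
+
+    private final JdbcTemplate jdbcTemplate;
+
+    public OptimisticAccountUpdater(JdbcTemplate jdbcTemplate) {
+        this.jdbcTemplate = jdbcTemplate;
+    }
+
+    public void updateBalance(long accountId, BigDecimal newBalance, Timestamp fetchedTimestamp) {
+        // The update succeeds only if the timestamp still matches the value read earlier.
+        int updated = jdbcTemplate.update(
+                "UPDATE ACCOUNT SET BALANCE = ?, VERSION_TS = CURRENT_TIMESTAMP "
+                        + "WHERE ACCOUNT_ID = ? AND VERSION_TS = ?",
+                newBalance, accountId, fetchedTimestamp);
+        if (updated == 0) {
+            // Another application updated the row between the fetch and this update.
+            throw new OptimisticLockingFailureException(
+                    "Account " + accountId + " was modified concurrently");
+        }
+    }
+}
+----
+
+If the update count is zero, the application can re-read the row and retry or route the
+record to error handling, depending on the business requirements.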
+
+These patterns are not necessarily suitable for batch processing, but they might be used
+for concurrent batch and online processing (such as in cases where the database does not
+support row-level locking). As a general rule, optimistic locking is more suitable for
+online applications, while pessimistic locking is more suitable for batch applications.
+Whenever logical locking is used, the same scheme must be used for all applications
+that access the data entities protected by logical locks.
+
+Note that both of these solutions only address locking a single record. Often, we may
+need to lock a logically related group of records. With physical locks, you have to
+manage these very carefully to avoid potential deadlocks. With logical locks, it
+is usually best to build a logical lock manager that understands the logical record
+groups you want to protect and that can ensure that locks are coherent and
+non-deadlocking. This logical lock manager usually uses its own tables for lock
+management, contention reporting, time-out mechanism, and other concerns.
+
+*3. Parallel Processing* Parallel processing lets multiple batch runs or jobs run in
+parallel to minimize the total elapsed batch processing time. This is not a problem as
+long as the jobs are not sharing the same files, database tables, or index spaces. If they do,
+this service should be implemented by using partitioned data. Another option is to build an
+architecture module for maintaining interdependencies by using a control table. A control
+table should contain a row for each shared resource and whether it is in use by an
+application or not. The batch architecture or the application in a parallel job would
+then retrieve information from that table to determine whether it can get access to the
+resource it needs.
+
+If the data access is not a problem, parallel processing can be implemented through the
+use of additional threads to process in parallel. In a mainframe environment, parallel
+job classes have traditionally been used to ensure adequate CPU time for all
+the processes. Regardless, the solution has to be robust enough to ensure time slices for
+all the running processes.
+
+Other key issues in parallel processing include load balancing and the availability of
+general system resources, such as files, database buffer pools, and so on. Also, note that
+the control table itself can easily become a critical resource.
+
+*4. Partitioning* Using partitioning lets multiple versions of large batch applications
+run concurrently. The purpose of this is to reduce the elapsed time required to
+process long batch jobs. Processes that can be successfully partitioned are those where
+the input file can be split or the main database tables partitioned to let the
+application run against different sets of data.
+
+In addition, processes that are partitioned must be designed to process only their
+assigned data set. A partitioning architecture has to be closely tied to the database
+design and the database partitioning strategy. Note that database partitioning does not
+necessarily mean physical partitioning of the database (although, in most cases, this is
+advisable). The following image illustrates the partitioning approach:
+
+.Partitioned Process
+image::{batch-asciidoc}images/partitioned.png[Figure 1.2: Partitioned Process, scaledwidth="60%"]
+
+The architecture should be flexible enough to allow dynamic configuration of the number
+of partitions. You should consider both automatic and user-controlled configuration.
+Automatic configuration may be based on such parameters as the input file size and the +number of input records. + +*4.1 Partitioning Approaches* Selecting a partitioning approach has to be done on a +case-by-case basis. The following list describes some of the possible partitioning +approaches: + +_1. Fixed and Even Break-Up of Record Set_ + +This involves breaking the input record set into an even number of portions (for example, +10, where each portion has exactly 1/10th of the entire record set). Each portion is then +processed by one instance of the batch/extract application. + +To use this approach, preprocessing is required to split the record set up. The +result of this split is a lower and upper bound placement number that you can use +as input to the batch/extract application to restrict its processing to only its +portion. + +Preprocessing could be a large overhead, as it has to calculate and determine the bounds +of each portion of the record set. + +_2. Break up by a Key Column_ + +This involves breaking up the input record set by a key column, such as a location code, +and assigning data from each key to a batch instance. To achieve this, column +values can be either: + +* Assigned to a batch instance by a partitioning table (described later in this +section). + +* Assigned to a batch instance by a portion of the value (such as 0000-0999, 1000 - 1999, +and so on). + +Under option 1, adding new values means a manual reconfiguration of the batch or extract to +ensure that the new value is added to a particular instance. + +Under option 2, this ensures that all values are covered by an instance of the batch +job. However, the number of values processed by one instance is dependent on the +distribution of column values (there may be a large number of locations in the 0000-0999 +range and few in the 1000-1999 range). Under this option, the data range should be +designed with partitioning in mind. + +Under both options, the optimal even distribution of records to batch instances cannot be +realized. There is no dynamic configuration of the number of batch instances used. + +_3. Breakup by Views_ + +This approach is basically breakup by a key column but on the database level. It involves +breaking up the record set into views. These views are used by each instance of the batch +application during its processing. The breakup is done by grouping the data. + +With this option, each instance of a batch application has to be configured to hit a +particular view (instead of the main table). Also, with the addition of new data +values, this new group of data has to be included into a view. There is no dynamic +configuration capability, as a change in the number of instances results in a change to +the views. + +_4. Addition of a Processing Indicator_ + +This involves the addition of a new column to the input table, which acts as an +indicator. As a preprocessing step, all indicators are marked as being non-processed. +During the record fetch stage of the batch application, records are read on the condition +that an individual record is marked as being non-processed, and, once it is read (with lock), +it is marked as being in processing. When that record is completed, the indicator is +updated to either complete or error. You can start many instances of a batch application +without a change, as the additional column ensures that a record is only processed once. +// TODO On completion, what is the record marked as? Same for on error. 
(I expected a +// sentence or two on the order of "On completion, indicators are marked as having +// a particular status.") + +With this option, I/O on the table increases dynamically. In the case of an updating +batch application, this impact is reduced, as a write must occur anyway. + +_5. Extract Table to a Flat File_ + +This approach involves the extraction of the table into a flat file. This file can then be split into +multiple segments and used as input to the batch instances. + +With this option, the additional overhead of extracting the table into a file and +splitting it may cancel out the effect of multi-partitioning. Dynamic configuration can +be achieved by changing the file splitting script. + +_6. Use of a Hashing Column_ + +This scheme involves the addition of a hash column (key or index) to the database tables +used to retrieve the driver record. This hash column has an indicator to determine which +instance of the batch application processes this particular row. For example, if there +are three batch instances to be started, an indicator of 'A' marks a row for +processing by instance 1, an indicator of 'B' marks a row for processing by instance 2, +and an indicator of 'C' marks a row for processing by instance 3. + +The procedure used to retrieve the records would then have an additional `WHERE` clause +to select all rows marked by a particular indicator. The inserts in this table would +involve the addition of the marker field, which would be defaulted to one of the +instances (such as 'A'). + +A simple batch application would be used to update the indicators, such as to +redistribute the load between the different instances. When a sufficiently large number +of new rows have been added, this batch can be run (anytime, except in the batch window) +to redistribute the new rows to other instances. + +Additional instances of the batch application require only the running of the batch +application (as described in the preceding paragraphs) to redistribute the indicators to +work with a new number of instances. + +*4.2 Database and Application Design Principles* + +An architecture that supports multi-partitioned applications that run against +partitioned database tables and use the key column approach should include a central +partition repository for storing partition parameters. This provides flexibility and +ensures maintainability. The repository generally consists of a single table, known as +the partition table. + +Information stored in the partition table is static and, in general, should be maintained +by the DBA. The table should consist of one row of information for each partition of a +multi-partitioned application. The table should have columns for Program ID Code, +Partition Number (the logical ID of the partition), Low Value of the database key column for this +partition, and High Value of the database key column for this partition. + +On program start-up, the program `id` and partition number should be passed to the +application from the architecture (specifically, from the control processing tasklet). If +a key column approach is used, these variables are used to read the partition table +to determine what range of data the application is to process. In addition, the +partition number must be used throughout the processing to: + +* Add to the output files or database updates, for the merge process to work +properly. +* Report normal processing to the batch log and any errors to the architecture error +handler. 
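+
+The following listing is an illustrative sketch of reading the key range for one partition
+from such a partition table. The `BATCH_PARTITION` table and its columns are assumptions
+made for this example and are not tables provided by Spring Batch:
+
+[source, java]
+----
+import org.springframework.jdbc.core.JdbcTemplate;
+
+public class PartitionTableDao {
+
+    private final JdbcTemplate jdbcTemplate;
+
+    public PartitionTableDao(JdbcTemplate jdbcTemplate) {
+        this.jdbcTemplate = jdbcTemplate;
+    }
+
+    /**
+     * Returns the low and high key values assigned to the given partition of the
+     * given program.
+     */
+    public long[] findKeyRange(String programId, int partitionNumber) {
+        return jdbcTemplate.queryForObject(
+                "SELECT LOW_VALUE, HIGH_VALUE FROM BATCH_PARTITION "
+                        + "WHERE PROGRAM_ID = ? AND PARTITION_NUMBER = ?",
+                (rs, rowNum) -> new long[] { rs.getLong("LOW_VALUE"), rs.getLong("HIGH_VALUE") },
+                programId, partitionNumber);
+    }
+}
+----
+
+In practice, such a lookup would typically run from the control processing tasklet
+mentioned earlier, before the partition starts its business processing.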
+ +*4.3 Minimizing Deadlocks* + +When applications run in parallel or are partitioned, contention for database resources +and deadlocks may occur. It is critical that the database design team eliminate +potential contention situations as much as possible, as part of the database design. + +Also, the developers must ensure that the database index tables are designed with +deadlock prevention and performance in mind. + +Deadlocks or hot spots often occur in administration or architecture tables, such as log +tables, control tables, and lock tables. The implications of these should be taken into +account as well. Realistic stress tests are crucial for identifying the possible +bottlenecks in the architecture. + +To minimize the impact of conflicts on data, the architecture should provide services +(such as wait-and-retry intervals) when attaching to a database or when encountering a +deadlock. This means a built-in mechanism to react to certain database return codes and, +instead of issuing an immediate error, waiting a predetermined amount of time and +retrying the database operation. + +*4.4 Parameter Passing and Validation* + +The partition architecture should be relatively transparent to application developers. +The architecture should perform all tasks associated with running the application in a +partitioned mode, including: + +* Retrieving partition parameters before application start-up. +* Validating partition parameters before application start-up. +* Passing parameters to the application at start-up. + +The validation should include checks to ensure that: + +* The application has sufficient partitions to cover the whole data range. +* There are no gaps between partitions. + +If the database is partitioned, some additional validation may be necessary to ensure +that a single partition does not span database partitions. + +Also, the architecture should take into consideration the consolidation of partitions. +Key questions include: + +* Must all the partitions be finished before going into the next job step? +* What happens if one of the partitions aborts? diff --git a/spring-batch-docs/src/main/asciidoc/spring-batch-integration.adoc b/spring-batch-docs/src/main/asciidoc/spring-batch-integration.adoc index 3aef9c2fdb..cd4f41816c 100644 --- a/spring-batch-docs/src/main/asciidoc/spring-batch-integration.adoc +++ b/spring-batch-docs/src/main/asciidoc/spring-batch-integration.adoc @@ -3,16 +3,12 @@ :toclevels: 4 [[springBatchIntegration]] - == Spring Batch Integration ifndef::onlyonetoggle[] include::toggle.adoc[] endif::onlyonetoggle[] -[[spring-batch-integration-introduction]] -=== Spring Batch Integration Introduction - Many users of Spring Batch may encounter requirements that are outside the scope of Spring Batch but that may be efficiently and concisely implemented by using Spring Integration. Conversely, Spring @@ -23,41 +19,36 @@ addresses those requirements. The line between Spring Batch and Spring Integration is not always clear, but two pieces of advice can -help: Think about granularity, and apply common patterns. Some -of those common patterns are described in this reference manual -section. +help: Thinking about granularity and applying common patterns. Some +of those common patterns are described in this section. Adding messaging to a batch process enables automation of operations and also separation and strategizing of key concerns. -For example, a message might trigger a job to execute, and then the -sending of the message can be exposed in a variety of ways. 
Alternatively, when +For example, a message might trigger a job to execute, and then +sending the message can be exposed in a variety of ways. Alternatively, when a job completes or fails, that event might trigger a message to be sent, and the consumers of those messages might have operational concerns that have nothing to do with the application itself. Messaging can -also be embedded in a job (for example reading or writing items for -processing via channels). Remote partitioning and remote chunking +also be embedded in a job (for example, reading or writing items for +processing through channels). Remote partitioning and remote chunking provide methods to distribute workloads over a number of workers. This section covers the following key concepts: [role="xmlContent"] -* <> -[[continue-section-list]] -* <> -* <> -* <> -* <> - - +* <> +* <> +* <> +* <> +* <> [[namespace-support]] [role="xmlContent"] ==== Namespace Support -Since Spring Batch Integration 1.3, dedicated XML Namespace -support was added, with the aim to provide an easier configuration -experience. In order to activate the namespace, add the following +Dedicated XML namespace support was added to Spring Batch Integration in version 1.3, +with the aim to provide an easier configuration +experience. To use the namespace, add the following namespace declarations to your Spring XML Application Context file: @@ -75,8 +66,8 @@ file: ---- -A fully configured Spring XML Application Context file for Spring -Batch Integration may look like the following: +The following example shows a fully configured Spring XML application context file for Spring +Batch Integration: [source, xml] ---- @@ -101,7 +92,7 @@ Batch Integration may look like the following: ---- Appending version numbers to the referenced XSD file is also -allowed, but, as a version-less declaration always uses the +allowed. However, because a version-less declaration always uses the latest schema, we generally do not recommend appending the version number to the XSD name. Adding a version number could possibly create issues when updating the Spring Batch @@ -110,59 +101,49 @@ of the XML schema. [[launching-batch-jobs-through-messages]] - ==== Launching Batch Jobs through Messages - When starting batch jobs by using the core Spring Batch API, you -basically have 2 options: +basically have two options: * From the command line, with the `CommandLineJobRunner` * Programmatically, with either `JobOperator.start()` or `JobLauncher.run()` - - For example, you may want to use the -`CommandLineJobRunner` when invoking Batch Jobs by -using a shell script. Alternatively, you may use the +`CommandLineJobRunner` when invoking batch jobs by +using a shell script. Alternatively, you can use the `JobOperator` directly (for example, when using Spring Batch as part of a web application). However, what about more complex use cases? Maybe you need to poll a remote (S)FTP server to retrieve the data for the Batch Job or your application has to support multiple different data sources simultaneously. For -example, you may receive data files not only from the web, but also from +example, you may receive data files not only from the web but also from FTP and other sources. Maybe additional transformation of the input files is needed before invoking Spring Batch. - - Therefore, it would be much more powerful to execute the batch job -using Spring Integration and its numerous adapters. 
For example, -you can use a __File Inbound Channel Adapter__ to -monitor a directory in the file-system and start the Batch Job as +by using Spring Integration and its numerous adapters. For example, +you can use a _File Inbound Channel Adapter_ to +monitor a directory in the file-system and start the batch job as soon as the input file arrives. Additionally, you can create Spring Integration flows that use multiple different adapters to easily ingest data for your batch jobs from multiple sources -simultaneously using only configuration. Implementing all these +simultaneously by using only configuration. Implementing all these scenarios with Spring Integration is easy, as it allows for decoupled, event-driven execution of the `JobLauncher`. - - Spring Batch Integration provides the `JobLaunchingMessageHandler` class that you can use to launch batch jobs. The input for the `JobLaunchingMessageHandler` is provided by a Spring Integration message, which has a payload of type `JobLaunchRequest`. This class is a wrapper around the `Job` - that needs to be launched and around the `JobParameters` +to be launched and around the `JobParameters` that are necessary to launch the Batch job. - - -The following image illustrates the typical Spring Integration -message flow in order to start a Batch job. The +The following image shows the typical Spring Integration +message flow that is needed to start a Batch job. The link:$$https://www.enterpriseintegrationpatterns.com/toc.html$$[EIP (Enterprise Integration Patterns) website] provides a full overview of messaging icons and their descriptions. @@ -171,9 +152,9 @@ image::{batch-asciidoc}images/launch-batch-job.png[Launch Batch Job, scaledwidth [[transforming-a-file-into-a-joblaunchrequest]] +===== Transforming a File into a JobLaunchRequest -===== Transforming a file into a JobLaunchRequest - +The following example transforms a file into a `JobLaunchRequest`: [source, java] ---- @@ -213,18 +194,15 @@ public class FileMessageToJobRequest { ---- [[the-jobexecution-response]] - -===== The `JobExecution` Response +===== The JobExecution Response When a batch job is being executed, a -`JobExecution` instance is returned. This -instance can be used to determine the status of an execution. If +`JobExecution` instance is returned. You can use this +instance to determine the status of an execution. If a `JobExecution` is able to be created successfully, it is always returned, regardless of whether or not the actual execution is successful. - - The exact behavior on how the `JobExecution` instance is returned depends on the provided `TaskExecutor`. If a @@ -235,29 +213,24 @@ instance is returned depends on the provided `asynchronous` `TaskExecutor`, the `JobExecution` instance is returned -immediately. Users can then take the `id` of +immediately. You can then take the `id` of `JobExecution` instance (with `JobExecution.getJobId()`) and query the `JobRepository` for the job's updated status using the `JobExplorer`. For more -information, please refer to the Spring -Batch reference documentation on +information, see <>. 
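+
+The following listing shows one possible polling sketch. It assumes that the job was
+launched asynchronously and that a `JobExplorer` is available. The class and method names
+are illustrative and are not part of the Spring Batch API:
+
+[source, java]
+----
+import org.springframework.batch.core.BatchStatus;
+import org.springframework.batch.core.JobExecution;
+import org.springframework.batch.core.explore.JobExplorer;
+
+public class JobStatusPoller {
+
+    private final JobExplorer jobExplorer;
+
+    public JobStatusPoller(JobExplorer jobExplorer) {
+        this.jobExplorer = jobExplorer;
+    }
+
+    public BatchStatus waitForCompletion(long executionId, long pollIntervalMillis)
+            throws InterruptedException {
+        while (true) {
+            // Re-read the execution from the job repository on each iteration.
+            JobExecution jobExecution = jobExplorer.getJobExecution(executionId);
+            if (jobExecution != null && !jobExecution.getStatus().isRunning()) {
+                return jobExecution.getStatus();
+            }
+            Thread.sleep(pollIntervalMillis);
+        }
+    }
+}
+----
+
+Event-driven notification, described later in this chapter under "`Providing Feedback with
+Informational Messages`", is generally preferable to this kind of polling.
+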
- - [[spring-batch-integration-configuration]] - ===== Spring Batch Integration Configuration Consider a case where someone needs to create a file `inbound-channel-adapter` to listen for CSV files in the provided directory, hand them off to a transformer -(`FileMessageToJobRequest`), launch the job through the _Job Launching Gateway_, and then +(`FileMessageToJobRequest`), launch the job through the job launching gateway, and log the output of the `JobExecution` with the `logging-channel-adapter`. [role="xmlContent"] The following example shows how that common case can be configured in XML: - .XML Configuration [source, xml, role="xmlContent"] ---- @@ -328,7 +301,7 @@ public IntegrationFlow integrationFlow(JobLaunchingGateway jobLaunchingGateway) Now that we are polling for files and launching jobs, we need to configure our Spring Batch `ItemReader` (for example) to use the files found at the location defined by the job -parameter called "input.file.name", as shown in the following bean configuration: +parameter called "input.file.name", as the following bean configuration shows: [role="xmlContent"] The following XML example shows the necessary bean configuration: @@ -363,7 +336,7 @@ public ItemReader sampleReader(@Value("#{jobParameters[input.file.name]}") Strin The main points of interest in the preceding example are injecting the value of `#{jobParameters['input.file.name']}` as the Resource property value and setting the `ItemReader` bean -to have __Step scope__. Setting the bean to have Step scope takes advantage of +to have step scope. Setting the bean to have step scope takes advantage of the late binding support, which allows access to the `jobParameters` variable. @@ -377,14 +350,14 @@ The job-launching gateway has the following attributes that you can set to contr ** `EventDrivenConsumer` ** `PollingConsumer` (The exact implementation depends on whether the component's input channel is a -`SubscribableChannel` or `PollableChannel`.) +`SubscribableChannel` or a `PollableChannel`.) * `auto-startup`: Boolean flag to indicate that the endpoint should start automatically on -startup. The default is __true__. +startup. The default is `true`. * `request-channel`: The input `MessageChannel` of this endpoint. * `reply-channel`: `MessageChannel` to which the resulting `JobExecution` payload is sent. * `reply-timeout`: Lets you specify how long (in milliseconds) this gateway waits for the reply message to be sent successfully to the reply channel before throwing -an exception. This attribute only applies when the channel +an exception. This attribute applies only when the channel might block (for example, when using a bounded queue channel that is currently full). Also, keep in mind that, when sending to a `DirectChannel`, the invocation occurs @@ -393,20 +366,21 @@ operation may be caused by other components further downstream. The `reply-timeout` attribute maps to the `sendTimeout` property of the underlying `MessagingTemplate` instance. If not specified, the attribute -defaults to-1, +defaults to -1, meaning that, by default, the `Gateway` waits indefinitely. * `job-launcher`: Optional. Accepts a custom `JobLauncher` bean reference. -If not specified the adapter +If not specified, the adapter re-uses the instance that is registered under the `id` of `jobLauncher`. If no default instance exists, an exception is thrown. * `order`: Specifies the order of invocation when this endpoint is connected as a subscriber to a `SubscribableChannel`. 
-=== Sub-Elements +=== Sub-elements + When this `Gateway` is receiving messages from a `PollableChannel`, you must either provide a global default `Poller` or provide a `Poller` sub-element to the @@ -443,7 +417,7 @@ public JobLaunchingGateway sampleJobLaunchingGateway() { ==== Providing Feedback with Informational Messages As Spring Batch jobs can run for long times, providing progress -information is often critical. For example, stake-holders may want +information is often critical. For example, stakeholders may want to be notified if some or all parts of a batch job have failed. Spring Batch provides support for this information being gathered through: @@ -451,11 +425,11 @@ through: * Active polling * Event-driven listeners -When starting a Spring Batch job asynchronously (for example, by using the `Job Launching -Gateway`), a `JobExecution` instance is returned. Thus, `JobExecution.getJobId()` can be -used to continuously poll for status updates by retrieving updated instances of the +When starting a Spring Batch job asynchronously (for example, by using the Job Launching +Gateway), a `JobExecution` instance is returned. Thus, you can use `JobExecution.getJobId()` +to continuously poll for status updates by retrieving updated instances of the `JobExecution` from the `JobRepository` by using the `JobExplorer`. However, this is -considered sub-optimal, and an event-driven approach should be preferred. +considered sub-optimal, and an event-driven approach is preferred. Therefore, Spring Batch provides listeners, including the three most commonly used listeners: @@ -466,15 +440,14 @@ listeners: In the example shown in the following image, a Spring Batch job has been configured with a `StepExecutionListener`. Thus, Spring Integration receives and processes any step before -or after events. For example, the received `StepExecution` can be inspected by using a +or after events. For example, you can inspect the received `StepExecution` by using a `Router`. Based on the results of that inspection, various things can occur (such as -routing a message to a Mail Outbound Channel Adapter), so that an Email notification can +routing a message to a mail outbound channel adapter), so that an email notification can be sent out based on some condition. .Handling Informational Messages image::{batch-asciidoc}images/handling-informational-messages.png[Handling Informational Messages, scaledwidth="60%"] - The following two-part example shows how a listener is configured to send a message to a `Gateway` for a `StepExecution` events and log its output to a `logging-channel-adapter`. @@ -553,21 +526,20 @@ public Job importPaymentsJob() { .chunk(200) .listener(notificationExecutionsListener()) ... + ) } ---- [[asynchronous-processors]] - ==== Asynchronous Processors - -Asynchronous Processors help you to scale the processing of items. In the asynchronous +Asynchronous Processors help you scale the processing of items. In the asynchronous processor use case, an `AsyncItemProcessor` serves as a dispatcher, executing the logic of the `ItemProcessor` for an item on a new thread. Once the item completes, the `Future` is passed to the `AsynchItemWriter` to be written. Therefore, you can increase performance by using asynchronous item processing, basically -letting you implement _fork-join_ scenarios. The `AsyncItemWriter` gathers the results and +letting you implement fork-join scenarios. The `AsyncItemWriter` gathers the results and writes back the chunk as soon as all the results become available. 
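+
+For example, a minimal Java wiring of these two components might look like the following
+sketch (the delegate `ItemProcessor` and `ItemWriter` beans and the task executor are
+illustrative assumptions):
+
+[source, java]
+----
+import org.springframework.batch.integration.async.AsyncItemProcessor;
+import org.springframework.batch.integration.async.AsyncItemWriter;
+import org.springframework.batch.item.ItemProcessor;
+import org.springframework.batch.item.ItemWriter;
+import org.springframework.context.annotation.Bean;
+import org.springframework.context.annotation.Configuration;
+import org.springframework.core.task.SimpleAsyncTaskExecutor;
+
+@Configuration
+public class AsyncProcessingConfiguration {
+
+    @Bean
+    public AsyncItemProcessor<String, String> asyncItemProcessor(ItemProcessor<String, String> processor) {
+        AsyncItemProcessor<String, String> asyncItemProcessor = new AsyncItemProcessor<>();
+        // The delegate runs on a separate thread and returns a Future for each item.
+        asyncItemProcessor.setDelegate(processor);
+        asyncItemProcessor.setTaskExecutor(new SimpleAsyncTaskExecutor());
+        return asyncItemProcessor;
+    }
+
+    @Bean
+    public AsyncItemWriter<String> asyncItemWriter(ItemWriter<String> writer) {
+        AsyncItemWriter<String> asyncItemWriter = new AsyncItemWriter<>();
+        // The AsyncItemWriter unwraps each Future and passes the result to the delegate writer.
+        asyncItemWriter.setDelegate(writer);
+        return asyncItemWriter;
+    }
+}
+----
+
+The chunk-oriented step then references `asyncItemProcessor` and `asyncItemWriter` in place
+of the original processor and writer.
+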
[role="xmlContent"] @@ -638,35 +610,29 @@ actually a reference to your `ItemWriter` bean. [[externalizing-batch-process-execution]] - ==== Externalizing Batch Process Execution - The integration approaches discussed so far suggest use cases -where Spring Integration wraps Spring Batch like an outer-shell. +where Spring Integration wraps Spring Batch like an outer shell. However, Spring Batch can also use Spring Integration internally. -Using this approach, Spring Batch users can delegate the +By using this approach, Spring Batch users can delegate the processing of items or even chunks to outside processes. This -allows you to offload complex processing. Spring Batch Integration +lets you offload complex processing. Spring Batch Integration provides dedicated support for: - - * Remote Chunking - - - * Remote Partitioning - [[remote-chunking]] - ===== Remote Chunking +The following image shows one way that remote chunking works when you use Spring Batch +together with Spring Integration: + .Remote Chunking image::{batch-asciidoc}images/remote-chunking-sbi.png[Remote Chunking, scaledwidth="60%"] -Taking things one step further, one can also externalize the +Taking things one step further, you can also externalize the chunk processing by using the `ChunkMessageChannelItemWriter` (provided by Spring Batch Integration), which sends items out @@ -675,13 +641,12 @@ process of reading and grouping items, without waiting for the results. Rather, it is the responsibility of the `ChunkMessageChannelItemWriter` to gather the results and integrate them back into the Spring Batch process. - With Spring Integration, you have full control over the concurrency of your processes (for instance, by using a `QueueChannel` instead of a `DirectChannel`). Furthermore, by relying on -Spring Integration's rich collection of Channel Adapters (such as -JMS and AMQP), you can distribute chunks of a Batch job to +Spring Integration's rich collection of channel adapters (such as +JMS and AMQP), you can distribute chunks of a batch job to external systems for processing. [role="xmlContent"] @@ -721,7 +686,7 @@ public Job chunkJob() { The `ItemReader` reference points to the bean you want to use for reading data on the manager. The `ItemWriter` reference points to a special `ItemWriter` (called -`ChunkMessageChannelItemWriter`), as described above. The processor (if any) is left off +`ChunkMessageChannelItemWriter`), as described earlier. The processor (if any) is left off the manager configuration, as it is configured on the worker. You should check any additional component properties, such as throttle limits and so on, when implementing your use case. @@ -822,14 +787,14 @@ public ItemWriter itemWriter() { ---- The preceding configuration provides us with a number of beans. We -configure our messaging middleware using ActiveMQ and the -inbound/outbound JMS adapters provided by Spring Integration. As +configure our messaging middleware by using ActiveMQ and the +inbound and outbound JMS adapters provided by Spring Integration. As shown, our `itemWriter` bean, which is referenced by our job step, uses the -`ChunkMessageChannelItemWriter` for writing chunks over the +`ChunkMessageChannelItemWriter` to write chunks over the configured middleware. 
-Now we can move on to the worker configuration, as shown in the following example:
+Now we can move on to the worker configuration, as the following example shows:

[role="xmlContent"]
The following example shows the worker configuration in XML:
@@ -945,17 +910,17 @@ configured `SimpleChunkProcessor`, which is where you would provide a reference
`ItemProcessor`) that will run on the worker when it receives chunks from the manager.

-For more information, see the section of the "Scalability" chapter on
+For more information, see the section of the "`Scalability`" chapter on
link:$$https://docs.spring.io/spring-batch/docs/current/reference/html/scalability.html#remoteChunking$$[Remote Chunking].

Starting from version 4.1, Spring Batch Integration introduces the
`@EnableBatchIntegration` annotation that can be used to simplify a remote chunking setup. This annotation provides
-two beans that can be autowired in the application context:
+two beans that you can autowire in your application context:

-* `RemoteChunkingManagerStepBuilderFactory`: used to configure the manager step
-* `RemoteChunkingWorkerBuilder`: used to configure the remote worker integration flow
+* `RemoteChunkingManagerStepBuilderFactory`: Configures the manager step
+* `RemoteChunkingWorkerBuilder`: Configures the remote worker integration flow

-These APIs take care of configuring a number of components as described in the following diagram:
+These APIs take care of configuring a number of components, as the following diagram shows:

.Remote Chunking Configuration
image::{batch-asciidoc}images/remote-chunking-config.png[Remote Chunking Configuration, scaledwidth="80%"]
@@ -963,22 +928,23 @@ image::{batch-asciidoc}images/remote-chunking-config.png[Remote Chunking Configu
On the manager side, the `RemoteChunkingManagerStepBuilderFactory` lets you configure a
manager step by declaring:

-* the item reader to read items and send them to workers
-* the output channel ("Outgoing requests") to send requests to workers
-* the input channel ("Incoming replies") to receive replies from workers
+* The item reader to read items and send them to workers
+* The output channel ("Outgoing requests") to send requests to workers
+* The input channel ("Incoming replies") to receive replies from workers

-A `ChunkMessageChannelItemWriter` and the `MessagingTemplate` are not needed to be explicitly configured
-(Those can still be explicitly configured if required).
+You need not explicitly configure `ChunkMessageChannelItemWriter` and the `MessagingTemplate`.
+(You can still explicitly configure them if you find a reason to do so).

-On the worker side, the `RemoteChunkingWorkerBuilder` allows you to configure a worker to:
+On the worker side, the `RemoteChunkingWorkerBuilder` lets you configure a worker to:

-* listen to requests sent by the manager on the input channel ("Incoming requests")
-* call the `handleChunk` method of `ChunkProcessorChunkHandler` for each request
+* Listen to requests sent by the manager on the input channel ("`Incoming requests`")
+* Call the `handleChunk` method of `ChunkProcessorChunkHandler` for each request
with the configured `ItemProcessor` and `ItemWriter`
-* send replies on the output channel ("Outgoing replies") to the manager
+* Send replies on the output channel ("`Outgoing replies`") to the manager

-There is no need to explicitly configure the `SimpleChunkProcessor`
-and the `ChunkProcessorChunkHandler` (Those can be explicitly configured if required).
+You need not explicitly configure the `SimpleChunkProcessor` +and the `ChunkProcessorChunkHandler`. (You can still explicitly configure them if you find + a reason to do so). The following example shows how to use these APIs: @@ -1035,32 +1001,28 @@ You can find a complete example of a remote chunking job link:$$https://github.com/spring-projects/spring-batch/tree/main/spring-batch-samples#remote-chunking-sample$$[here]. [[remote-partitioning]] - ===== Remote Partitioning +The following image shows a typical remote partitioning situation: + .Remote Partitioning image::{batch-asciidoc}images/remote-partitioning.png[Remote Partitioning, scaledwidth="60%"] - Remote Partitioning, on the other hand, is useful when it is not the processing of items but rather the associated I/O that -causes the bottleneck. Using Remote Partitioning, work can -be farmed out to workers that execute complete Spring Batch +causes the bottleneck. With remote partitioning, you can send work +to workers that execute complete Spring Batch steps. Thus, each worker has its own `ItemReader`, `ItemProcessor`, and `ItemWriter`. For this purpose, Spring Batch Integration provides the `MessageChannelPartitionHandler`. - - This implementation of the `PartitionHandler` interface uses `MessageChannel` instances to send instructions to remote workers and receive their responses. This provides a nice abstraction from the transports (such as JMS and AMQP) being used to communicate with the remote workers. - - -The section of the "Scalability" chapter that addresses +The section of the "`Scalability`" chapter that addresses <> provides an overview of the concepts and components needed to configure remote partitioning and shows an example of using the default @@ -1072,9 +1034,7 @@ to multiple JVMs, two additional components are required: * A `PartitionHandler` implementation that supports the desired remoting fabric or grid environment - - -Similar to remote chunking, JMS can be used as the "`remoting fabric`". In that case, use +Similar to remote chunking, you can use JMS as the "`remoting fabric`". In that case, use a `MessageChannelPartitionHandler` instance as the `PartitionHandler` implementation, as described earlier. @@ -1141,7 +1101,6 @@ The following example assumes an existing partitioned job and focuses on the .Java Configuration [source, java, role="javaContent"] ---- - /* * Configuration of the manager side */ @@ -1284,13 +1243,13 @@ Java: You can find a complete example of a remote partitioning job link:$$https://github.com/spring-projects/spring-batch/tree/main/spring-batch-samples#remote-partitioning-sample$$[here]. -The `@EnableBatchIntegration` annotation that can be used to simplify a remote - partitioning setup. This annotation provides two beans useful for remote partitioning: +You can use the `@EnableBatchIntegration` annotation to simplify a remote +partitioning setup. 
This annotation provides two beans that are useful for remote partitioning:

-* `RemotePartitioningManagerStepBuilderFactory`: used to configure the manager step
-* `RemotePartitioningWorkerStepBuilderFactory`: used to configure the worker step
+* `RemotePartitioningManagerStepBuilderFactory`: Configures the manager step
+* `RemotePartitioningWorkerStepBuilderFactory`: Configures the worker step

-These APIs take care of configuring a number of components as described in the following diagram:
+These APIs take care of configuring a number of components, as the following diagrams show:

.Remote Partitioning Configuration (with job repository polling)
image::{batch-asciidoc}images/remote-partitioning-polling-config.png[Remote Partitioning Configuration (with job repository polling), scaledwidth="80%"]
@@ -1298,24 +1257,25 @@ image::{batch-asciidoc}images/remote-partitioning-polling-config.png[Remote Part
.Remote Partitioning Configuration (with replies aggregation)
image::{batch-asciidoc}images/remote-partitioning-aggregation-config.png[Remote Partitioning Configuration (with replies aggregation), scaledwidth="80%"]

-On the manager side, the `RemotePartitioningManagerStepBuilderFactory` allows you to
+On the manager side, the `RemotePartitioningManagerStepBuilderFactory` lets you
configure a manager step by declaring:

-* the `Partitioner` used to partition data
-* the output channel ("Outgoing requests") to send requests to workers
-* the input channel ("Incoming replies") to receive replies from workers (when configuring replies aggregation)
-* the poll interval and timeout parameters (when configuring job repository polling)
+* The `Partitioner` used to partition data
+* The output channel ("`Outgoing requests`") on which to send requests to workers
+* The input channel ("`Incoming replies`") on which to receive replies from workers (when configuring replies aggregation)
+* The poll interval and timeout parameters (when configuring job repository polling)

-The `MessageChannelPartitionHandler` and the `MessagingTemplate` are not needed to be explicitly configured
-(Those can still be explicitly configured if required).
+You need not explicitly configure the `MessageChannelPartitionHandler` and the `MessagingTemplate`.
+(You can still explicitly configure them if you find a reason to do so).

-On the worker side, the `RemotePartitioningWorkerStepBuilderFactory` allows you to configure a worker to:
+On the worker side, the `RemotePartitioningWorkerStepBuilderFactory` lets you configure a worker to:

-* listen to requests sent by the manager on the input channel ("Incoming requests")
-* call the `handle` method of `StepExecutionRequestHandler` for each request
-* send replies on the output channel ("Outgoing replies") to the manager
+* Listen to requests sent by the manager on the input channel ("`Incoming requests`")
+* Call the `handle` method of `StepExecutionRequestHandler` for each request
+* Send replies on the output channel ("`Outgoing replies`") to the manager

-There is no need to explicitly configure the `StepExecutionRequestHandler` (which can be explicitly configured if required).
+You need not explicitly configure the `StepExecutionRequestHandler`.
+(You can explicitly configure it if you find a reason to do so).
The following example shows how to use these APIs: diff --git a/spring-batch-docs/src/main/asciidoc/spring-batch-intro.adoc b/spring-batch-docs/src/main/asciidoc/spring-batch-intro.adoc index 0755938822..1e92b13066 100644 --- a/spring-batch-docs/src/main/asciidoc/spring-batch-intro.adoc +++ b/spring-batch-docs/src/main/asciidoc/spring-batch-intro.adoc @@ -24,7 +24,7 @@ endif::[] == Spring Batch Introduction Many applications within the enterprise domain require bulk processing to perform -business operations in mission critical environments. These business operations include: +business operations in mission-critical environments. These business operations include: * Automated, complex processing of large volumes of information that is most efficiently processed without user interaction. These operations typically include time-based events @@ -37,27 +37,26 @@ the system of record. Batch processing is used to process billions of transactio day for enterprises. Spring Batch is a lightweight, comprehensive batch framework designed to enable the -development of robust batch applications vital for the daily operations of enterprise +development of robust batch applications that are vital for the daily operations of enterprise systems. Spring Batch builds upon the characteristics of the Spring Framework that people have come to expect (productivity, POJO-based development approach, and general ease of -use), while making it easy for developers to access and leverage more advance enterprise +use), while making it easy for developers to access and use more advanced enterprise services when necessary. Spring Batch is not a scheduling framework. There are many good -enterprise schedulers (such as Quartz, Tivoli, Control-M, etc.) available in both the -commercial and open source spaces. It is intended to work in conjunction with a -scheduler, not replace a scheduler. +enterprise schedulers (such as Quartz, Tivoli, Control-M, and others) available in both the +commercial and open source spaces. Spring Batch is intended to work in conjunction with a +scheduler rather than replace a scheduler. Spring Batch provides reusable functions that are essential in processing large volumes -of records, including logging/tracing, transaction management, job processing statistics, +of records, including logging and tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that enable extremely high-volume and high performance batch jobs -through optimization and partitioning techniques. Spring Batch can be used in both simple -use cases (such as reading a file into a database or running a stored procedure) as well -as complex, high volume use cases (such as moving high volumes of data between databases, -transforming it, and so on). High-volume batch jobs can leverage the framework in a +through optimization and partitioning techniques. You can use Spring Batch in both simple +use cases (such as reading a file into a database or running a stored procedure) and +complex, high volume use cases (such as moving high volumes of data between databases, +transforming it, and so on). High-volume batch jobs can use the framework in a highly scalable manner to process significant volumes of information. [[springBatchBackground]] - === Background While open source software projects and associated communities have focused greater @@ -68,13 +67,13 @@ environments. 
The lack of a standard, reusable batch architecture has resulted i proliferation of many one-off, in-house solutions developed within client enterprise IT functions. -SpringSource (now Pivotal) and Accenture collaborated to change this. Accenture's +SpringSource (now VMware) and Accenture collaborated to change this. Accenture's hands-on industry and technical experience in implementing batch architectures, SpringSource's depth of technical experience, and Spring's proven programming model together made a natural and powerful partnership to create high-quality, market-relevant software aimed at filling an important gap in enterprise Java. Both companies worked with a number of clients who were solving similar problems by developing Spring-based batch -architecture solutions. This provided some useful additional detail and real-life +architecture solutions. This input provided some useful additional detail and real-life constraints that helped to ensure the solution can be applied to the real-world problems posed by clients. @@ -82,16 +81,15 @@ Accenture contributed previously proprietary batch processing architecture frame the Spring Batch project, along with committer resources to drive support, enhancements, and the existing feature set. Accenture's contribution was based upon decades of experience in building batch architectures with the last several generations of -platforms: COBOL/Mainframe, C++/Unix, and now Java/anywhere. +platforms: COBOL on mainframes, C++ on Unix, and, now, Java anywhere. The collaborative effort between Accenture and SpringSource aimed to promote the -standardization of software processing approaches, frameworks, and tools that can be -consistently leveraged by enterprise users when creating batch applications. Companies +standardization of software processing approaches, frameworks, and tools +enterprise users can consistently use when creating batch applications. Companies and government agencies desiring to deliver standard, proven solutions to their enterprise IT environments can benefit from Spring Batch. [[springBatchUsageScenarios]] - === Usage Scenarios A typical batch program generally: @@ -105,463 +103,34 @@ similar transactions as a set, typically in an offline environment without any u interaction. Batch jobs are part of most IT projects, and Spring Batch is the only open source framework that provides a robust, enterprise-scale solution. -Business Scenarios +==== Business Scenarios -* Commit batch process periodically -* Concurrent batch processing: parallel processing of a job -* Staged, enterprise message-driven processing -* Massively parallel batch processing -* Manual or scheduled restart after failure -* Sequential processing of dependent steps (with extensions to workflow-driven batches) -* Partial processing: skip records (for example, on rollback) +Spring Batch supports the following business scenarios: + +* Commit batch process periodically. +* Concurrent batch processing: parallel processing of a job. +* Staged, enterprise message-driven processing. +* Massively parallel batch processing. +* Manual or scheduled restart after failure. +* Sequential processing of dependent steps (with extensions to workflow-driven batches). +* Partial processing: skip records (for example, on rollback). * Whole-batch transaction, for cases with a small batch size or existing stored -procedures/scripts +procedures or scripts. 
+ +==== Technical Objectives -Technical Objectives +Spring Batch has the following technical objectives: -* Batch developers use the Spring programming model: Concentrate on business logic and -let the framework take care of infrastructure. -* Clear separation of concerns between the infrastructure, the batch execution +* Let batch developers use the Spring programming model: Concentrate on business logic and +let the framework take care of the infrastructure. +* Provide clear separation of concerns between the infrastructure, the batch execution environment, and the batch application. * Provide common, core execution services as interfaces that all projects can implement. * Provide simple and default implementations of the core execution interfaces that can be -used 'out of the box'. -* Easy to configure, customize, and extend services, by leveraging the spring framework +used "`out of the box`". +* Make it easy to configure, customize, and extend services, by using the Spring framework in all layers. * All existing core services should be easy to replace or extend, without any impact to the infrastructure layer. * Provide a simple deployment model, with the architecture JARs completely separate from -the application, built using Maven. - -[[springBatchArchitecture]] -=== Spring Batch Architecture -// TODO Make a separate document -Spring Batch is designed with extensibility and a diverse group of end users in mind. The -figure below shows the layered architecture that supports the extensibility and ease of -use for end-user developers. - -.Spring Batch Layered Architecture -image::{batch-asciidoc}images/spring-batch-layers.png[Figure 1.1: Spring Batch Layered Architecture, scaledwidth="60%"] - -This layered architecture highlights three major high-level components: Application, -Core, and Infrastructure. The application contains all batch jobs and custom code written -by developers using Spring Batch. The Batch Core contains the core runtime classes -necessary to launch and control a batch job. It includes implementations for -`JobLauncher`, `Job`, and `Step`. Both Application and Core are built on top of a common -infrastructure. This infrastructure contains common readers and writers and services -(such as the `RetryTemplate`), which are used both by application developers(readers and -writers, such as `ItemReader` and `ItemWriter`) and the core framework itself (retry, -which is its own library). - -[[batchArchitectureConsiderations]] -=== General Batch Principles and Guidelines - -The following key principles, guidelines, and general considerations should be considered -when building a batch solution. - -* Remember that a batch architecture typically affects on-line architecture and vice -versa. Design with both architectures and environments in mind using common building -blocks when possible. - -* Simplify as much as possible and avoid building complex logical structures in single -batch applications. - -* Keep the processing and storage of data physically close together (in other words, keep -your data where your processing occurs). - -* Minimize system resource use, especially I/O. Perform as many operations as possible in -internal memory. - -* Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O -is avoided. In particular, the following four common flaws need to be looked for: -** Reading data for every transaction when the data could be read once and cached or kept -in the working storage. 
-** Rereading data for a transaction where the data was read earlier in the same -transaction. -** Causing unnecessary table or index scans. -** Not specifying key values in the WHERE clause of an SQL statement. - -* Do not do things twice in a batch run. For instance, if you need data summarization for -reporting purposes, you should (if possible) increment stored totals when data is being -initially processed, so your reporting application does not have to reprocess the same -data. - -* Allocate enough memory at the beginning of a batch application to avoid time-consuming -reallocation during the process. - -* Always assume the worst with regard to data integrity. Insert adequate checks and -record validation to maintain data integrity. - -* Implement checksums for internal validation where possible. For example, flat files -should have a trailer record telling the total of records in the file and an aggregate of -the key fields. - -* Plan and execute stress tests as early as possible in a production-like environment -with realistic data volumes. - -* In large batch systems, backups can be challenging, especially if the system is running -concurrent with on-line on a 24-7 basis. Database backups are typically well taken care -of in the on-line design, but file backups should be considered to be just as important. -If the system depends on flat files, file backup procedures should not only be in place -and documented but be regularly tested as well. - -[[batchProcessingStrategy]] -=== Batch Processing Strategies - -To help design and implement batch systems, basic batch application building blocks and -patterns should be provided to the designers and programmers in the form of sample -structure charts and code shells. When starting to design a batch job, the business logic -should be decomposed into a series of steps that can be implemented using the following -standard building blocks: - -* __Conversion Applications:__ For each type of file supplied by or generated to an -external system, a conversion application must be created to convert the transaction -records supplied into a standard format required for processing. This type of batch -application can partly or entirely consist of translation utility modules (see Basic -Batch Services). -// TODO Add a link to "Basic Batch Services", once you discover where that content is. -* __Validation Applications:__ Validation applications ensure that all input/output -records are correct and consistent. Validation is typically based on file headers and -trailers, checksums and validation algorithms, and record level cross-checks. -* __Extract Applications:__ An application that reads a set of records from a database or -input file, selects records based on predefined rules, and writes the records to an -output file. -* __Extract/Update Applications:__ An application that reads records from a database or -an input file and makes changes to a database or an output file driven by the data found -in each input record. -* __Processing and Updating Applications:__ An application that performs processing on -input transactions from an extract or a validation application. The processing usually -involves reading a database to obtain data required for processing, potentially updating -the database and creating records for output processing. 
-* __Output/Format Applications:__ Applications that read an input file, restructure data -from this record according to a standard format, and produce an output file for printing -or transmission to another program or system. - -Additionally, a basic application shell should be provided for business logic that cannot -be built using the previously mentioned building blocks. -// TODO What is an example of such a system? - -In addition to the main building blocks, each application may use one or more of standard -utility steps, such as: - - -* Sort: A program that reads an input file and produces an output file where records -have been re-sequenced according to a sort key field in the records. Sorts are usually -performed by standard system utilities. -* Split: A program that reads a single input file and writes each record to one of -several output files based on a field value. Splits can be tailored or performed by -parameter-driven standard system utilities. -* Merge: A program that reads records from multiple input files and produces one output -file with combined data from the input files. Merges can be tailored or performed by -parameter-driven standard system utilities. - -Batch applications can additionally be categorized by their input source: - -* Database-driven applications are driven by rows or values retrieved from the database. -* File-driven applications are driven by records or values retrieved from a file. -* Message-driven applications are driven by messages retrieved from a message queue. - -The foundation of any batch system is the processing strategy. Factors affecting the -selection of the strategy include: estimated batch system volume, concurrency with -on-line systems or with other batch systems, available batch windows. (Note that, with -more enterprises wanting to be up and running 24x7, clear batch windows are -disappearing). - -Typical processing options for batch are (in increasing order of implementation -complexity): - -* Normal processing during a batch window in off-line mode. -* Concurrent batch or on-line processing. -* Parallel processing of many different batch runs or jobs at the same time. -* Partitioning (processing of many instances of the same job at the same time). -* A combination of the preceding options. - -Some or all of these options may be supported by a commercial scheduler. - -The following section discusses these processing options in more detail. It is important -to notice that, as a rule of thumb, the commit and locking strategy adopted by batch -processes depends on the type of processing performed and that the on-line locking -strategy should also use the same principles. Therefore, the batch architecture cannot be -simply an afterthought when designing an overall architecture. - -The locking strategy can be to use only normal database locks or to implement an -additional custom locking service in the architecture. The locking service would track -database locking (for example, by storing the necessary information in a dedicated -db-table) and give or deny permissions to the application programs requesting a db -operation. Retry logic could also be implemented by this architecture to avoid aborting a -batch job in case of a lock situation. - -*1. Normal processing in a batch window* For simple batch processes running in a separate -batch window where the data being updated is not required by on-line users or other batch -processes, concurrency is not an issue and a single commit can be done at the end of the -batch run. 
- -In most cases, a more robust approach is more appropriate. Keep in mind that batch -systems have a tendency to grow as time goes by, both in terms of complexity and the data -volumes they handle. If no locking strategy is in place and the system still relies on a -single commit point, modifying the batch programs can be painful. Therefore, even with -the simplest batch systems, consider the need for commit logic for restart-recovery -options as well as the information concerning the more complex cases described later in -this section. - -*2. Concurrent batch or on-line processing* Batch applications processing data that can -be simultaneously updated by on-line users should not lock any data (either in the -database or in files) which could be required by on-line users for more than a few -seconds. Also, updates should be committed to the database at the end of every few -transactions. This minimizes the portion of data that is unavailable to other processes -and the elapsed time the data is unavailable. - -Another option to minimize physical locking is to have logical row-level locking -implemented with either an Optimistic Locking Pattern or a Pessimistic Locking Pattern. - - -* Optimistic locking assumes a low likelihood of record contention. It typically means -inserting a timestamp column in each database table used concurrently by both batch and -on-line processing. When an application fetches a row for processing, it also fetches the -timestamp. As the application then tries to update the processed row, the update uses the -original timestamp in the WHERE clause. If the timestamp matches, the data and the -timestamp are updated. If the timestamp does not match, this indicates that another -application has updated the same row between the fetch and the update attempt. Therefore, -the update cannot be performed. - - -* Pessimistic locking is any locking strategy that assumes there is a high likelihood of -record contention and therefore either a physical or logical lock needs to be obtained at -retrieval time. One type of pessimistic logical locking uses a dedicated lock-column in -the database table. When an application retrieves the row for update, it sets a flag in -the lock column. With the flag in place, other applications attempting to retrieve the -same row logically fail. When the application that sets the flag updates the row, it also -clears the flag, enabling the row to be retrieved by other applications. Please note that -the integrity of data must be maintained also between the initial fetch and the setting -of the flag, for example by using db locks (such as `SELECT FOR UPDATE`). Note also that -this method suffers from the same downside as physical locking except that it is somewhat -easier to manage building a time-out mechanism that gets the lock released if the user -goes to lunch while the record is locked. - -These patterns are not necessarily suitable for batch processing, but they might be used -for concurrent batch and on-line processing (such as in cases where the database does not -support row-level locking). As a general rule, optimistic locking is more suitable for -on-line applications, while pessimistic locking is more suitable for batch applications. -Whenever logical locking is used, the same scheme must be used for all applications -accessing data entities protected by logical locks. - -Note that both of these solutions only address locking a single record. Often, we may -need to lock a logically related group of records. 
With physical locks, you have to -manage these very carefully in order to avoid potential deadlocks. With logical locks, it -is usually best to build a logical lock manager that understands the logical record -groups you want to protect and that can ensure that locks are coherent and -non-deadlocking. This logical lock manager usually uses its own tables for lock -management, contention reporting, time-out mechanism, and other concerns. - -*3. Parallel Processing* Parallel processing allows multiple batch runs or jobs to run in -parallel to minimize the total elapsed batch processing time. This is not a problem as -long as the jobs are not sharing the same files, db-tables, or index spaces. If they do, -this service should be implemented using partitioned data. Another option is to build an -architecture module for maintaining interdependencies by using a control table. A control -table should contain a row for each shared resource and whether it is in use by an -application or not. The batch architecture or the application in a parallel job would -then retrieve information from that table to determine if it can get access to the -resource it needs or not. - -If the data access is not a problem, parallel processing can be implemented through the -use of additional threads to process in parallel. In the mainframe environment, parallel -job classes have traditionally been used, in order to ensure adequate CPU time for all -the processes. Regardless, the solution has to be robust enough to ensure time slices for -all the running processes. - -Other key issues in parallel processing include load balancing and the availability of -general system resources such as files, database buffer pools, and so on. Also note that -the control table itself can easily become a critical resource. - -*4. Partitioning* Using partitioning allows multiple versions of large batch applications -to run concurrently. The purpose of this is to reduce the elapsed time required to -process long batch jobs. Processes that can be successfully partitioned are those where -the input file can be split and/or the main database tables partitioned to allow the -application to run against different sets of data. - -In addition, processes which are partitioned must be designed to only process their -assigned data set. A partitioning architecture has to be closely tied to the database -design and the database partitioning strategy. Note that database partitioning does not -necessarily mean physical partitioning of the database, although in most cases this is -advisable. The following picture illustrates the partitioning approach: - -.Partitioned Process -image::{batch-asciidoc}images/partitioned.png[Figure 1.2: Partitioned Process, scaledwidth="60%"] - - -The architecture should be flexible enough to allow dynamic configuration of the number -of partitions. Both automatic and user controlled configuration should be considered. -Automatic configuration may be based on parameters such as the input file size and the -number of input records. - -*4.1 Partitioning Approaches* Selecting a partitioning approach has to be done on a -case-by-case basis. The following list describes some of the possible partitioning -approaches: - -_1. Fixed and Even Break-Up of Record Set_ - -This involves breaking the input record set into an even number of portions (for example, -10, where each portion has exactly 1/10th of the entire record set). Each portion is then -processed by one instance of the batch/extract application. 
- -In order to use this approach, preprocessing is required to split the record set up. The -result of this split will be a lower and upper bound placement number which can be used -as input to the batch/extract application in order to restrict its processing to only its -portion. - -Preprocessing could be a large overhead, as it has to calculate and determine the bounds -of each portion of the record set. - -_2. Break up by a Key Column_ - -This involves breaking up the input record set by a key column, such as a location code, -and assigning data from each key to a batch instance. In order to achieve this, column -values can be either: - -* Assigned to a batch instance by a partitioning table (described later in this -section). - -* Assigned to a batch instance by a portion of the value (such as 0000-0999, 1000 - 1999, -and so on). - -Under option 1, adding new values means a manual reconfiguration of the batch/extract to -ensure that the new value is added to a particular instance. - -Under option 2, this ensures that all values are covered via an instance of the batch -job. However, the number of values processed by one instance is dependent on the -distribution of column values (there may be a large number of locations in the 0000-0999 -range, and few in the 1000-1999 range). Under this option, the data range should be -designed with partitioning in mind. - -Under both options, the optimal even distribution of records to batch instances cannot be -realized. There is no dynamic configuration of the number of batch instances used. - -_3. Breakup by Views_ - -This approach is basically breakup by a key column but on the database level. It involves -breaking up the record set into views. These views are used by each instance of the batch -application during its processing. The breakup is done by grouping the data. - -With this option, each instance of a batch application has to be configured to hit a -particular view (instead of the main table). Also, with the addition of new data -values, this new group of data has to be included into a view. There is no dynamic -configuration capability, as a change in the number of instances results in a change to -the views. - -_4. Addition of a Processing Indicator_ - -This involves the addition of a new column to the input table, which acts as an -indicator. As a preprocessing step, all indicators are marked as being non-processed. -During the record fetch stage of the batch application, records are read on the condition -that that record is marked as being non-processed, and once they are read (with lock), -they are marked as being in processing. When that record is completed, the indicator is -updated to either complete or error. Many instances of a batch application can be started -without a change, as the additional column ensures that a record is only processed once. -// TODO On completion, what is the record marked as? Same for on error. (I expected a -// sentence or two on the order of "On completion, indicators are marked as being -// complete.") - -With this option, I/O on the table increases dynamically. In the case of an updating -batch application, this impact is reduced, as a write must occur anyway. - -_5. Extract Table to a Flat File_ - -This involves the extraction of the table into a file. This file can then be split into -multiple segments and used as input to the batch instances. - -With this option, the additional overhead of extracting the table into a file and -splitting it may cancel out the effect of multi-partitioning. 
Dynamic configuration can -be achieved by changing the file splitting script. - -_6. Use of a Hashing Column_ - -This scheme involves the addition of a hash column (key/index) to the database tables -used to retrieve the driver record. This hash column has an indicator to determine which -instance of the batch application processes this particular row. For example, if there -are three batch instances to be started, then an indicator of 'A' marks a row for -processing by instance 1, an indicator of 'B' marks a row for processing by instance 2, -and an indicator of 'C' marks a row for processing by instance 3. - -The procedure used to retrieve the records would then have an additional `WHERE` clause -to select all rows marked by a particular indicator. The inserts in this table would -involve the addition of the marker field, which would be defaulted to one of the -instances (such as 'A'). - -A simple batch application would be used to update the indicators, such as to -redistribute the load between the different instances. When a sufficiently large number -of new rows have been added, this batch can be run (anytime, except in the batch window) -to redistribute the new rows to other instances. -// TODO Why not in the batch window? - -Additional instances of the batch application only require the running of the batch -application as described in the preceding paragraphs to redistribute the indicators to -work with a new number of instances. - -*4.2 Database and Application Design Principles* - -An architecture that supports multi-partitioned applications which run against -partitioned database tables using the key column approach should include a central -partition repository for storing partition parameters. This provides flexibility and -ensures maintainability. The repository generally consists of a single table, known as -the partition table. - -Information stored in the partition table is static and, in general, should be maintained -by the DBA. The table should consist of one row of information for each partition of a -multi-partitioned application. The table should have columns for Program ID Code, -Partition Number (logical ID of the partition), Low Value of the db key column for this -partition, and High Value of the db key column for this partition. - -On program start-up, the program `id` and partition number should be passed to the -application from the architecture (specifically, from the Control Processing Tasklet). If -a key column approach is used, these variables are used to read the partition table in -order to determine what range of data the application is to process. In addition the -partition number must be used throughout the processing to: - -* Add to the output files/database updates in order for the merge process to work -properly. -* Report normal processing to the batch log and any errors to the architecture error -handler. - -*4.3 Minimizing Deadlocks* - -When applications run in parallel or are partitioned, contention in database resources -and deadlocks may occur. It is critical that the database design team eliminates -potential contention situations as much as possible as part of the database design. - -Also, the developers must ensure that the database index tables are designed with -deadlock prevention and performance in mind. - -Deadlocks or hot spots often occur in administration or architecture tables, such as log -tables, control tables, and lock tables. The implications of these should be taken into -account as well. 
A realistic stress test is crucial for identifying the possible -bottlenecks in the architecture. - -To minimize the impact of conflicts on data, the architecture should provide services -such as wait-and-retry intervals when attaching to a database or when encountering a -deadlock. This means a built-in mechanism to react to certain database return codes and, -instead of issuing an immediate error, waiting a predetermined amount of time and -retrying the database operation. - -*4.4 Parameter Passing and Validation* - -The partition architecture should be relatively transparent to application developers. -The architecture should perform all tasks associated with running the application in a -partitioned mode, including: - -* Retrieving partition parameters before application start-up. -* Validating partition parameters before application start-up. -* Passing parameters to the application at start-up. - -The validation should include checks to ensure that: - -* The application has sufficient partitions to cover the whole data range. -* There are no gaps between partitions. - -If the database is partitioned, some additional validation may be necessary to ensure -that a single partition does not span database partitions. - -Also, the architecture should take into consideration the consolidation of partitions. -Key questions include: - -* Must all the partitions be finished before going into the next job step? -* What happens if one of the partitions aborts? +the application, built by using Maven. diff --git a/spring-batch-docs/src/main/asciidoc/step.adoc b/spring-batch-docs/src/main/asciidoc/step.adoc index 093ef74be6..fd44627218 100644 --- a/spring-batch-docs/src/main/asciidoc/step.adoc +++ b/spring-batch-docs/src/main/asciidoc/step.adoc @@ -17,7 +17,7 @@ processing. This is a necessarily vague description because the contents of any or complex as the developer desires. A simple `Step` might load data from a file into the database, requiring little or no code (depending upon the implementations used). A more complex `Step` might have complicated business rules that are applied as part of the -processing, as shown in the following image: +processing, as the following image shows: .Step image::{batch-asciidoc}images/step.png[Step, scaledwidth="60%"] @@ -25,7 +25,7 @@ image::{batch-asciidoc}images/step.png[Step, scaledwidth="60%"] [[chunkOrientedProcessing]] === Chunk-oriented Processing -Spring Batch uses a 'Chunk-oriented' processing style within its most common +Spring Batch uses a "`chunk-oriented`" processing style in its most common implementation. Chunk oriented processing refers to reading the data one at a time and creating 'chunks' that are written out within a transaction boundary. Once the number of items read equals the commit interval, the entire chunk is written out by the @@ -49,7 +49,7 @@ for(int i = 0; i < commitInterval; i++){ itemWriter.write(items); ---- -A chunk-oriented step can also be configured with an optional `ItemProcessor` +You can also configure a chunk-oriented step with an optional `ItemProcessor` to process items before passing them to the `ItemWriter`. The following image shows the process when an `ItemProcessor` is registered in the step: @@ -79,20 +79,21 @@ for(Object item: items){ itemWriter.write(processedItems); ---- -For more details about item processors and their use cases, please refer to the +For more details about item processors and their use cases, see the <> section. 
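+
+As a rough sketch (using the same Java builder style as the configuration examples that
+follow), a chunk-oriented step with an optional processor registered might look like
+this. The reader, processor, and writer beans shown here are assumptions and would be
+defined elsewhere:
+
+[source, java]
+----
+@Bean
+public Step processingStep() {
+    // stepBuilderFactory is provided by @EnableBatchProcessing
+    return this.stepBuilderFactory.get("processingStep")
+                .<String, String>chunk(10)    // commit interval of 10
+                .reader(itemReader())         // provides the items
+                .processor(itemProcessor())   // optional: transforms each item
+                .writer(itemWriter())         // writes each completed chunk
+                .build();
+}
+----
+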
[[configuringAStep]] -==== Configuring a `Step` +==== Configuring a Step Despite the relatively short list of required dependencies for a `Step`, it is an extremely complex class that can potentially contain many collaborators. [role="xmlContent"] -In order to ease configuration, the Spring Batch XML namespace can be used, as shown in -the following example: +To ease configuration, you can use the Spring Batch XML namespace, as +the following example shows: .XML Configuration +==== [source, xml, role="xmlContent"] ---- @@ -103,12 +104,14 @@ the following example: ---- +==== [role="javaContent"] -When using Java configuration, the Spring Batch builders can be used, as shown in the -following example: +When using Java configuration, you can use the Spring Batch builders, as the +following example shows: .Java Configuration +==== [source, java, role="javaContent"] ---- /** @@ -137,9 +140,10 @@ public Step sampleStep(PlatformTransactionManager transactionManager) { .build(); } ---- +==== ifdef::backend-html5[] -The configuration above includes the only required dependencies to create a item-oriented +The preceding configuration includes the only required dependencies to create a item-oriented step: * `reader`: The `ItemReader` that provides items for processing. @@ -157,7 +161,7 @@ transactions during processing. * `job-repository`: The XML-specific name of the `JobRepository` that periodically stores the `StepExecution` and `ExecutionContext` during processing (just before committing). For an in-line `` (one defined within a ``), it is an attribute on the `` -element. For a standalone ``, it is defined as an attribute of the . +element. For a standalone ``, it is defined as an attribute of the ``. [role="javaContent"] * `repository`: The Java-specific name of the `JobRepository` that periodically stores @@ -173,35 +177,35 @@ item-based step and the number of items to be processed before the transaction i committed. [role="xmlContent"] -It should be noted that `job-repository` defaults to `jobRepository` and +Note that `job-repository` defaults to `jobRepository` and `transaction-manager` defaults to `transactionManager`. Also, the `ItemProcessor` is optional, since the item could be directly passed from the reader to the writer. [role="javaContent"] -It should be noted that `repository` defaults to `jobRepository` and `transactionManager` +Note that `repository` defaults to `jobRepository` and `transactionManager` defaults to `transactionManager` (all provided through the infrastructure from `@EnableBatchProcessing`). Also, the `ItemProcessor` is optional, since the item could be directly passed from the reader to the writer. endif::backend-html5[] ifdef::backend-pdf[] -The configuration above includes the only required dependencies to create a item-oriented +The preceding configuration above the only required dependencies to create a item-oriented step: * `reader`: The `ItemReader` that provides items for processing. * `writer`: The `ItemWriter` that processes the items provided by the `ItemReader`. -* `transaction-manager`/`transactionManager`: Spring's `PlatformTransactionManager` that +* `transaction-manager` (XML)/`transactionManager` (Java): Spring's `PlatformTransactionManager` that begins and commits transactions during processing. 
-* `job-repository`/`repository`: The `JobRepository` that periodically stores the +* `job-repository` (XML)/`repository` (Java): The `JobRepository` that periodically stores the `StepExecution` and `ExecutionContext` during processing (just before committing). In XML, for an in-line (one defined within a ``), it is an attribute on the `` element. For a standalone step, it is defined as an attribute of the ``. -* `commit-interval`/`chunk`: The number of items to be processed before the transaction +* `commit-interval` (XML)/`chunk` (Java): The number of items to be processed before the transaction is committed. -It should be noted that `job-repository`/`repository` defaults to `jobRepository` and -`transaction-manager`/`transactionManager` defaults to `transactionManager`. Also, the +Note that `job-repository` (XML)/`repository` (Java) defaults to `jobRepository` and +`transaction-manager` (XML)/`transactionManager` (Java) defaults to `transactionManager`. Also, the `ItemProcessor` is optional, since the item could be directly passed from the reader to the writer. endif::backend-pdf[] @@ -212,16 +216,17 @@ endif::backend-pdf[] [role="xmlContent"] If a group of `Steps` share similar configurations, then it may be helpful to define a -"parent" `Step` from which the concrete `Steps` may inherit properties. Similar to class -inheritance in Java, the "child" `Step` combines its elements and attributes with the +"`parent`" `Step` from which the concrete `Steps` may inherit properties. Similar to class +inheritance in Java, the "`child`" `Step` combines its elements and attributes with the parent's. The child also overrides any of the parent's `Steps`. [role="xmlContent"] -In the following example, the `Step`, "concreteStep1", inherits from "parentStep". It is -instantiated with 'itemReader', 'itemProcessor', 'itemWriter', `startLimit=5`, and -`allowStartIfComplete=true`. Additionally, the `commitInterval` is '5', since it is -overridden by the "concreteStep1" `Step`, as shown in the following example: +In the following example, the `Step`, `concreteStep1`, inherits from `parentStep`. It is +instantiated with `itemReader`, `itemProcessor`, `itemWriter`, `startLimit=5`, and +`allowStartIfComplete=true`. Additionally, the `commitInterval` is `5`, since it is +overridden by the `concreteStep1` `Step`, as the following example shows: +==== [source, xml, role="xmlContent"] ---- @@ -236,6 +241,7 @@ overridden by the "concreteStep1" `Step`, as shown in the following example: ---- +==== [role="xmlContent"] The `id` attribute is still required on the step within the job element. This is for two @@ -245,8 +251,8 @@ reasons: standalone step is referenced in more than one step in the job, an error occurs. [role="xmlContent"] -* When creating job flows, as described later in this chapter, the `next` attribute -should be referring to the step in the flow, not the standalone step. +* When creating job flows, as described <>, the `next` attribute +should refer to the step in the flow, not the standalone step. [[abstractStep]] [role="xmlContent"] @@ -256,13 +262,13 @@ should be referring to the step in the flow, not the standalone step. Sometimes, it may be necessary to define a parent `Step` that is not a complete `Step` configuration. If, for instance, the `reader`, `writer`, and `tasklet` attributes are left off of a `Step` configuration, then initialization fails. If a parent must be -defined without these properties, then the `abstract` attribute should be used. 
An
+defined without one or more of these properties, the `abstract` attribute should be used. An
 `abstract` `Step` is only extended, never instantiated.
 
 [role="xmlContent"]
-In the following example, the `Step` `abstractParentStep` would not be instantiated if it
-were not declared to be abstract. The `Step`, "concreteStep2", has 'itemReader',
-'itemWriter', and commit-interval=10.
+In the following example, the `Step` (`abstractParentStep`) would not be instantiated if it
+were not declared to be abstract. The `Step` (`concreteStep2`) has `itemReader`,
+`itemWriter`, and `commit-interval=10`.
 
 [source, xml, role="xmlContent"]
 ----
 
 
@@ -285,8 +291,8 @@ were not declared to be abstract. The `Step`, "concreteStep2", has 'itemReader',
 
 [role="xmlContent"]
 Some of the configurable elements on `Steps` are lists, such as the `` element.
-If both the parent and child `Steps` declare a `` element, then the
-child's list overrides the parent's. In order to allow a child to add additional
+If both the parent and child `Steps` declare a `` element, the
+child's list overrides the parent's. To allow a child to add additional
 listeners to the list defined by the parent, every list element has a `merge` attribute.
 If the element specifies that `merge="true"`, then the child's list is combined with the
 parent's instead of overriding it.
 
@@ -317,13 +323,12 @@ In the following example, the `Step` "concreteStep3", is created with two listen
 
 ==== The Commit Interval
 
 As mentioned previously, a step reads in and writes out items, periodically committing
-using the supplied `PlatformTransactionManager`. With a `commit-interval` of 1, it
+by using the supplied `PlatformTransactionManager`. With a `commit-interval` of 1, it
 commits after writing each individual item. This is less than ideal in many situations,
 since beginning and committing a transaction is expensive. Ideally, it is preferable to
 process as many items as possible in each transaction, which is completely dependent upon
 the type of data being processed and the resources with which the step is interacting.
-For this reason, the number of items that are processed within a commit can be
-configured.
+For this reason, you can configure the number of items that are processed within a commit.
 
 [role="xmlContent"]
 The following example shows a `step` whose `tasklet` has a `commit-interval`
@@ -373,18 +378,18 @@ is passed to the `ItemWriter`, and the transaction is committed.
 
 [[stepRestart]]
 ==== Configuring a `Step` for Restart
 
-In the "<>" section , restarting a
+In the "`<>`" section, restarting a
 `Job` was discussed. Restart has numerous impacts on steps, and, consequently, may
 require some specific configuration.
 
 [[startLimit]]
 ===== Setting a Start Limit
 
-There are many scenarios where you may want to control the number of times a `Step` may
-be started. For example, a particular `Step` might need to be configured so that it only
-runs once because it invalidates some resource that must be fixed manually before it can
+There are many scenarios where you may want to control the number of times a `Step` can
+be started. For example, you might need to configure a particular `Step` so that it
+runs only once because it invalidates some resource that must be fixed manually before it can
 be run again. This is configurable on the step level, since different steps may have
-different requirements. A `Step` that may only be executed once can exist as part of the
+different requirements. 
A `Step` that can be executed only once can exist as part of the
 same `Job` as a `Step` that can be run infinitely.
 
 [role="xmlContent"]
@@ -427,9 +432,9 @@ start-limit is `Integer.MAX_VALUE`.
 
 In the case of a restartable job, there may be one or more steps that should always be
 run, regardless of whether or not they were successful the first time. An example might
 be a validation step or a `Step` that cleans up resources before processing. During
-normal processing of a restarted job, any step with a status of 'COMPLETED', meaning it
-has already been completed successfully, is skipped. Setting `allow-start-if-complete` to
-"true" overrides this so that the step always runs.
+normal processing of a restarted job, any step with a status of `COMPLETED` (meaning it
+has already been completed successfully) is skipped. Setting `allow-start-if-complete` to
+`true` overrides this so that the step always runs.
 
 [role="xmlContent"]
 The following code fragment shows how to define a restartable job in XML:
@@ -545,42 +550,39 @@ games and summarizes them. It contains three steps: `playerLoad`, `gameLoad`, an
 while the `gameLoad` step does the same for games. The final step,
 `playerSummarization`, then summarizes the statistics for each player, based upon the
 provided games. It is assumed that the file loaded by `playerLoad` must be loaded only
-once, but that `gameLoad` can load any games found within a particular directory,
+once but that `gameLoad` can load any games found within a particular directory,
 deleting them after they have been successfully loaded into the database. As a result,
 the `playerLoad` step contains no additional configuration. It can be started any number
-of times, and, if complete, is skipped. The `gameLoad` step, however, needs to be run
+of times and is skipped if complete. The `gameLoad` step, however, needs to be run
 every time in case extra files have been added since it last ran. It has
-'allow-start-if-complete' set to 'true' in order to always be started. (It is assumed
-that the database table games are loaded into has a process indicator on it, to ensure
+`allow-start-if-complete` set to `true` to always be started. (It is assumed
+that the database table that games are loaded into has a process indicator on it, to ensure
 new games can be properly found by the summarization step). The summarization step,
 which is the most important in the job, is configured to have a start limit of 2. This
-is useful because if the step continually fails, a new exit code is returned to the
+is useful because, if the step continually fails, a new exit code is returned to the
 operators that control job execution, and it can not start again until manual
 intervention has taken place.
 
-[NOTE]
-====
-This job provides an example for this document and is not the same as the `footballJob`
+NOTE: This job provides an example for this document and is not the same as the `footballJob`
 found in the samples project.
-====
 
 The remainder of this section describes what happens for each of the three runs of the
 `footballJob` example.
 
 Run 1:
 
-. `playerLoad` runs and completes successfully, adding 400 players to the 'PLAYERS'
+. `playerLoad` runs and completes successfully, adding 400 players to the `PLAYERS`
 table.
 . `gameLoad` runs and processes 11 files worth of game data, loading their contents
-into the 'GAMES' table.
+into the `GAMES` table.
 . `playerSummarization` begins processing and fails after 5 minutes.
 
 Run 2:
 
 . 
`playerLoad` does not run, since it has already completed successfully, and -`allow-start-if-complete` is 'false' (the default). +`allow-start-if-complete` is `false` (the default). . `gameLoad` runs again and processes another 2 files, loading their contents into the -'GAMES' table as well (with a process indicator indicating they have yet to be +`GAMES` table as well (with a process indicator indicating they have yet to be processed). . `playerSummarization` begins processing of all remaining game data (filtering using the process indicator) and fails again after 30 minutes. @@ -588,9 +590,9 @@ process indicator) and fails again after 30 minutes. Run 3: . `playerLoad` does not run, since it has already completed successfully, and -`allow-start-if-complete` is 'false' (the default). +`allow-start-if-complete` is `false` (the default). . `gameLoad` runs again and processes another 2 files, loading their contents into the -'GAMES' table as well (with a process indicator indicating they have yet to be +`GAMES` table as well (with a process indicator indicating they have yet to be processed). . `playerSummarization` is not started and the job is immediately killed, since this is the third execution of `playerSummarization`, and its limit is only 2. Either the limit @@ -600,12 +602,12 @@ must be raised or the `Job` must be executed as a new `JobInstance`. ==== Configuring Skip Logic There are many scenarios where errors encountered while processing should not result in -`Step` failure, but should be skipped instead. This is usually a decision that must be +`Step` failure but should be skipped instead. This is usually a decision that must be made by someone who understands the data itself and what meaning it has. Financial data, for example, may not be skippable because it results in money being transferred, which needs to be completely accurate. Loading a list of vendors, on the other hand, might allow for skips. If a vendor is not loaded because it was formatted incorrectly or was -missing necessary information, then there probably are not issues. Usually, these bad +missing necessary information, there probably are not issues. Usually, these bad records are logged as well, which is covered later when discussing listeners. [role="xmlContent"] @@ -645,11 +647,10 @@ public Step step1() { } ---- - In the preceding example, a `FlatFileItemReader` is used. If, at any point, a `FlatFileParseException` is thrown, the item is skipped and counted against the total skip limit of 10. Exceptions (and their subclasses) that are declared might be thrown -during any phase of the chunk processing (read, process, write) but separate counts +during any phase of the chunk processing (read, process, or write). Separate counts are made of skips on read, process, and write inside the step execution, but the limit applies across all skips. Once the skip limit is reached, the next exception found causes the step to fail. In other words, the eleventh @@ -700,10 +701,10 @@ public Step step1() { ---- By identifying `java.lang.Exception` as a skippable exception class, the configuration -indicates that all `Exceptions` are skippable. However, by 'excluding' +indicates that all `Exceptions` are skippable. However, by "`excluding`" `java.io.FileNotFoundException`, the configuration refines the list of skippable exception classes to be all `Exceptions` __except__ `FileNotFoundException`. Any excluded -exception classes is fatal if encountered (that is, they are not skipped). 
+exception class is fatal if encountered (that is, they are not skipped). For any exception encountered, the skippability is determined by the nearest superclass in the class hierarchy. Any unclassified exception is treated as 'fatal'. @@ -717,7 +718,7 @@ The order of the `skip` and `noSkip` method calls does not matter. endif::backend-html5[] ifdef::backend-pdf[] -The order of specifying include versus exclude (by using either the XML tags or `skip` and +The order of specifying include versus exclude (by using either the XML tags or the `skip` and `noSkip` method calls) does not matter. endif::backend-pdf[] @@ -727,9 +728,9 @@ endif::backend-pdf[] In most cases, you want an exception to cause either a skip or a `Step` failure. However, not all exceptions are deterministic. If a `FlatFileParseException` is encountered while reading, it is always thrown for that record. Resetting the `ItemReader` does not help. -However, for other exceptions, such as a `DeadlockLoserDataAccessException`, which +However, for other exceptions (such as a `DeadlockLoserDataAccessException`, which indicates that the current process has attempted to update a record that another process -holds a lock on. Waiting and trying again might result in success. +holds a lock on), waiting and trying again might result in success. [role="xmlContent"] In XML, retry should be configured as follows: @@ -767,7 +768,7 @@ public Step step1() { ---- The `Step` allows a limit for the number of times an individual item can be retried and a -list of exceptions that are 'retryable'. More details on how retry works can be found in +list of exceptions that are "`retryable`". You can find more details on how retry works in <>. [[controllingRollback]] @@ -778,7 +779,7 @@ cause the transaction controlled by the `Step` to rollback. If skip is configure described earlier, exceptions thrown from the `ItemReader` do not cause a rollback. However, there are many scenarios in which exceptions thrown from the `ItemWriter` should not cause a rollback, because no action has taken place to invalidate the transaction. -For this reason, the `Step` can be configured with a list of exceptions that should not +For this reason, you can configure the `Step` with a list of exceptions that should not cause rollback. [role="xmlContent"] @@ -818,16 +819,16 @@ public Step step1() { [[transactionalReaders]] ===== Transactional Readers -The basic contract of the `ItemReader` is that it is forward only. The step buffers -reader input, so that in the case of a rollback, the items do not need to be re-read +The basic contract of the `ItemReader` is that it is forward-only. The step buffers +reader input so that, in case of a rollback, the items do not need to be re-read from the reader. However, there are certain scenarios in which the reader is built on top of a transactional resource, such as a JMS queue. In this case, since the queue is tied to the transaction that is rolled back, the messages that have been pulled from the -queue are put back on. For this reason, the step can be configured to not buffer the +queue are put back on. For this reason, you can configure the step to not buffer the items. 
[role="xmlContent"] -The following example shows how to create reader that does not buffer items in XML: +The following example shows how to create a reader that does not buffer items in XML: .XML Configuration [source, xml, role="xmlContent"] @@ -841,7 +842,7 @@ The following example shows how to create reader that does not buffer items in X ---- [role="javaContent"] -The following example shows how to create reader that does not buffer items in Java: +The following example shows how to create a reader that does not buffer items in Java: .Java Configuration [source, java, role="javaContent"] @@ -860,8 +861,8 @@ public Step step1() { [[transactionAttributes]] ==== Transaction Attributes -Transaction attributes can be used to control the `isolation`, `propagation`, and -`timeout` settings. More information on setting transaction attributes can be found in +You can use transaction attributes to control the `isolation`, `propagation`, and +`timeout` settings. You can find more information on setting transaction attributes in the https://docs.spring.io/spring/docs/current/spring-framework-reference/data-access.html#transaction[Spring core documentation]. @@ -910,16 +911,16 @@ public Step step1() { ==== Registering `ItemStream` with a `Step` The step has to take care of `ItemStream` callbacks at the necessary points in its -lifecycle (For more information on the `ItemStream` interface, see +lifecycle. (For more information on the `ItemStream` interface, see <>). This is vital if a step fails and might need to be restarted, because the `ItemStream` interface is where the step gets the information it needs about persistent state between executions. If the `ItemReader`, `ItemProcessor`, or `ItemWriter` itself implements the `ItemStream` -interface, then these are registered automatically. Any other streams need to be +interface, these are registered automatically. Any other streams need to be registered separately. This is often the case where indirect dependencies, such as -delegates, are injected into the reader and writer. A stream can be registered on the -`step` through the 'stream' element. +delegates, are injected into the reader and writer. You can register a stream on the +`step` through the `stream` element. [role="xmlContent"] The following example shows how to register a `stream` on a `step` in XML: @@ -984,9 +985,9 @@ public CompositeItemWriter compositeItemWriter() { } ---- -In the example above, the `CompositeItemWriter` is not an `ItemStream`, but both of its +In the preceding example, the `CompositeItemWriter` is not an `ItemStream`, but both of its delegates are. Therefore, both delegate writers must be explicitly registered as streams -in order for the framework to handle them correctly. The `ItemReader` does not need to be +for the framework to handle them correctly. The `ItemReader` does not need to be explicitly registered as a stream because it is a direct property of the `Step`. The step is now restartable, and the state of the reader and writer is correctly persisted in the event of a failure. @@ -995,17 +996,17 @@ event of a failure. ==== Intercepting `Step` Execution Just as with the `Job`, there are many events during the execution of a `Step` where a -user may need to perform some functionality. For example, in order to write out to a flat +user may need to perform some functionality. For example, to write out to a flat file that requires a footer, the `ItemWriter` needs to be notified when the `Step` has -been completed, so that the footer can be written. 
This can be accomplished with one of many
+been completed so that the footer can be written. This can be accomplished with one of many
 `Step` scoped listeners.
 
-Any class that implements one of the extensions of `StepListener` (but not that interface
-itself since it is empty) can be applied to a step through the `listeners` element.
-The `listeners` element is valid inside a step, tasklet, or chunk declaration. It is
-recommended that you declare the listeners at the level at which its function applies,
+You can apply any class that implements one of the extensions of `StepListener` (but not that interface
+itself, since it is empty) to a step through the `listeners` element.
+The `listeners` element is valid inside a step, tasklet, or chunk declaration. We
+recommend that you declare each listener at the level at which its function applies or,
 if it is multi-featured (such as `StepExecutionListener` and `ItemReadListener`),
-then declare it at the most granular level where it applies.
+declare it at the most granular level where it applies.
 
 [role="xmlContent"]
 The following example shows a listener applied at the chunk level in XML:
 
@@ -1040,17 +1041,17 @@ public Step step1() {
 }
 ----
 
-An `ItemReader`, `ItemWriter` or `ItemProcessor` that itself implements one of the
+An `ItemReader`, `ItemWriter`, or `ItemProcessor` that itself implements one of the
 `StepListener` interfaces is registered automatically with the `Step` if using the
 namespace `` element or one of the `*StepFactoryBean` factories. This only
 applies to components directly injected into the `Step`. If the listener is nested inside
-another component, it needs to be explicitly registered (as described previously under
+another component, you need to explicitly register it (as described previously under
 <>).
 
 In addition to the `StepListener` interfaces, annotations are provided to address the
 same concerns. Plain old Java objects can have methods with these annotations that are
 then converted into the corresponding `StepListener` type. It is also common to annotate
-custom implementations of chunk components such as `ItemReader` or `ItemWriter` or
+custom implementations of chunk components, such as `ItemReader`, `ItemWriter`, or
 `Tasklet`. The annotations are analyzed by the XML parser for the `` elements as
 well as registered with the `listener` methods in the builders, so all you need to do
 is use the XML namespace or builders to register the listeners with a step.
 
@@ -1060,7 +1061,7 @@ is use the XML namespace or builders to register the listeners with a step.
 
 `StepExecutionListener` represents the most generic listener for `Step` execution. It
 allows for notification before a `Step` is started and after it ends, whether it ended
-normally or failed, as shown in the following example:
+normally or failed, as the following example shows:
 
 [source, java]
 ----
@@ -1073,7 +1074,7 @@ public interface StepExecutionListener extends StepListener {
 
 }
 ----
 
-`ExitStatus` is the return type of `afterStep` in order to allow listeners the chance to
+`ExitStatus` is the return type of `afterStep`, which gives listeners the chance to
 modify the exit code that is returned upon completion of a `Step`.
 
 The annotations corresponding to this interface are:
@@ -1084,10 +1085,10 @@ The annotations corresponding to this interface are:
 
 [[chunkListener]]
 ===== `ChunkListener`
 
-A chunk is defined as the items processed within the scope of a transaction. Committing a
-transaction, at each commit interval, commits a 'chunk'. 
A `ChunkListener` can be used to +A "`chunk`" is defined as the items processed within the scope of a transaction. Committing a +transaction, at each commit interval, commits a chunk. You can use a `ChunkListener` to perform logic before a chunk begins processing or after a chunk has completed -successfully, as shown in the following interface definition: +successfully, as the following interface definition shows: [source, java] ---- @@ -1100,9 +1101,9 @@ public interface ChunkListener extends StepListener { } ---- -The beforeChunk method is called after the transaction is started but before read is -called on the `ItemReader`. Conversely, `afterChunk` is called after the chunk has been -committed (and not at all if there is a rollback). +The beforeChunk method is called after the transaction is started but before reading begins +on the `ItemReader`. Conversely, `afterChunk` is called after the chunk has been +committed (or not at all if there is a rollback). The annotations corresponding to this interface are: @@ -1110,7 +1111,7 @@ The annotations corresponding to this interface are: * `@AfterChunk` * `@AfterChunkError` -A `ChunkListener` can be applied when there is no chunk declaration. The `TaskletStep` is +You can apply a `ChunkListener` when there is no chunk declaration. The `TaskletStep` is responsible for calling the `ChunkListener`, so it applies to a non-item-oriented tasklet as well (it is called before and after the tasklet). @@ -1118,9 +1119,9 @@ as well (it is called before and after the tasklet). ===== `ItemReadListener` When discussing skip logic previously, it was mentioned that it may be beneficial to log -the skipped records, so that they can be dealt with later. In the case of read errors, -this can be done with an `ItemReaderListener`, as shown in the following interface -definition: +the skipped records so that they can be dealt with later. In the case of read errors, +this can be done with an `ItemReaderListener`, as the following interface +definition shows: [source, java] ---- @@ -1147,8 +1148,8 @@ The annotations corresponding to this interface are: [[itemProcessListener]] ===== `ItemProcessListener` -Just as with the `ItemReadListener`, the processing of an item can be 'listened' to, as -shown in the following interface definition: +As with the `ItemReadListener`, the processing of an item can be "`listened`" to, as +the following interface definition shows: [source, java] ---- @@ -1176,8 +1177,8 @@ The annotations corresponding to this interface are: [[itemWriteListener]] ===== `ItemWriteListener` -The writing of an item can be 'listened' to with the `ItemWriteListener`, as shown in the -following interface definition: +You can "`listen`" to the writing of an item with the `ItemWriteListener`, as the +following interface definition shows: [source, java] ---- @@ -1209,7 +1210,7 @@ The annotations corresponding to this interface are: for being notified of errors, but none informs you that a record has actually been skipped. `onWriteError`, for example, is called even if an item is retried and successful. For this reason, there is a separate interface for tracking skipped items, as -shown in the following interface definition: +the following interface definition shows: [source, java] ---- @@ -1239,12 +1240,12 @@ The annotations corresponding to this interface are: One of the most common use cases for a `SkipListener` is to log out a skipped item, so that another batch process or even human process can be used to evaluate and fix the -issue leading to the skip. 
Because there are many cases in which the original transaction +issue that leads to the skip. Because there are many cases in which the original transaction may be rolled back, Spring Batch makes two guarantees: -. The appropriate skip method (depending on when the error happened) is called only once +* The appropriate skip method (depending on when the error happened) is called only once per item. -. The `SkipListener` is always called just before the transaction is committed. This is +* The `SkipListener` is always called just before the transaction is committed. This is to ensure that any transactional resources call by the listener are not rolled back by a failure within the `ItemWriter`. @@ -1252,21 +1253,21 @@ failure within the `ItemWriter`. === `TaskletStep` <> is not the only way to process in a -`Step`. What if a `Step` must consist of a simple stored procedure call? You could +`Step`. What if a `Step` must consist of a stored procedure call? You could implement the call as an `ItemReader` and return null after the procedure finishes. However, doing so is a bit unnatural, since there would need to be a no-op `ItemWriter`. Spring Batch provides the `TaskletStep` for this scenario. -`Tasklet` is a simple interface that has one method, `execute`, which is called +The `Tasklet` interface has one method, `execute`, which is called repeatedly by the `TaskletStep` until it either returns `RepeatStatus.FINISHED` or throws an exception to signal a failure. Each call to a `Tasklet` is wrapped in a transaction. -`Tasklet` implementors might call a stored procedure, a script, or a simple SQL update +`Tasklet` implementors might call a stored procedure, a script, or a SQL update statement. ifdef::backend-html5[] [role="xmlContent"] -To create a `TaskletStep` in XML, the 'ref' attribute of the `` element should -reference a bean that defines a `Tasklet` object. No `` element should be used +To create a `TaskletStep` in XML, the `ref` attribute of the `` element should +reference a bean that defines a `Tasklet` object. No `` element should be used within the ``. The following example shows a simple tasklet: [source, xml, role="xmlContent"] @@ -1294,8 +1295,8 @@ endif::backend-html5[] ifdef::backend-pdf[] To create a `TaskletStep` the bean associated with the step (through the `ref` attribute -when using the namespace or passed to the `tasklet` method when using java config), -should be a bean that implements the interface `Tasklet`. The following example shows a +when using the namespace or passed to the `tasklet` method when using Java configuration) +should implement the `Tasklet` interface. The following example shows a simple `tasklet`: .XML Configuration @@ -1318,12 +1319,7 @@ public Step step1() { ---- endif::backend-pdf[] -[NOTE] -==== -`TaskletStep` automatically registers the - tasklet as a `StepListener` if it implements the `StepListener` - interface. -==== +NOTE: If it implements the `StepListener` interface, `TaskletStep` automatically registers the tasklet as a `StepListener`. [[taskletAdapter]] ==== `TaskletAdapter` @@ -1331,7 +1327,7 @@ endif::backend-pdf[] As with other adapters for the `ItemReader` and `ItemWriter` interfaces, the `Tasklet` interface contains an implementation that allows for adapting itself to any pre-existing class: `TaskletAdapter`. An example where this may be useful is an existing DAO that is -used to update a flag on a set of records. The `TaskletAdapter` can be used to call this +used to update a flag on a set of records. 
You can use the `TaskletAdapter` to call this class without having to write an adapter for the `Tasklet` interface. [role="xmlContent"] @@ -1368,8 +1364,8 @@ public MethodInvokingTaskletAdapter myTasklet() { [[exampleTaskletImplementation]] ==== Example `Tasklet` Implementation -Many batch jobs contain steps that must be done before the main processing begins in -order to set up various resources or after processing has completed to cleanup those +Many batch jobs contain steps that must be done before the main processing begins, +to set up various resources or after processing has completed to cleanup those resources. In the case of a job that works heavily with files, it is often necessary to delete certain files locally after they have been uploaded successfully to another location. The following example (taken from the @@ -1469,21 +1465,21 @@ public FileDeletingTasklet fileDeletingTasklet() { === Controlling Step Flow With the ability to group steps together within an owning job comes the need to be able -to control how the job "flows" from one step to another. The failure of a `Step` does not +to control how the job "`flows`" from one step to another. The failure of a `Step` does not necessarily mean that the `Job` should fail. Furthermore, there may be more than one type -of 'success' that determines which `Step` should be executed next. Depending upon how a +of "`success`" that determines which `Step` should be executed next. Depending upon how a group of `Steps` is configured, certain steps may not even be processed at all. [[SequentialFlow]] ==== Sequential Flow -The simplest flow scenario is a job where all of the steps execute sequentially, as shown -in the following image: +The simplest flow scenario is a job where all of the steps execute sequentially, as +the following image shows: .Sequential Flow image::{batch-asciidoc}images/sequential-flow.png[Sequential Flow, scaledwidth="60%"] -This can be achieved by using the 'next' in a `step`. +This can be achieved by using `next` in a `step`. [role="xmlContent"] The following example shows how to use the `next` attribute in XML: @@ -1514,25 +1510,22 @@ public Job job() { } ---- -In the scenario above, 'step A' runs first because it is the first `Step` listed. If -'step A' completes normally, then 'step B' runs, and so on. However, if 'step A' fails, -then the entire `Job` fails and 'step B' does not execute. +In the scenario above, `stepA` runs first because it is the first `Step` listed. If +`stepA` completes normally, `stepB` runs, and so on. However, if `step A` fails, +the entire `Job` fails and `stepB` does not execute. [role="xmlContent"] -[NOTE] -==== -With the Spring Batch XML namespace, the first step listed in the configuration is +NOTE: With the Spring Batch XML namespace, the first step listed in the configuration is _always_ the first step run by the `Job`. The order of the other step elements does not -matter, but the first step must always appear first in the xml. -==== +matter, but the first step must always appear first in the XML. [[conditionalFlow]] ==== Conditional Flow -In the example above, there are only two possibilities: +In the preceding example, there are only two possibilities: -. The `step` is successful and the next `step` should be executed. -. The `step` failed and, thus, the `job` should fail. +. The `step` is successful, and the next `step` should be executed. +. The `step` failed, and, thus, the `job` should fail. In many cases, this may be sufficient. 
However, what about a scenario in which the failure of a `step` should trigger a different `step`, rather than causing failure? The @@ -1544,18 +1537,18 @@ image::{batch-asciidoc}images/conditional-flow.png[Conditional Flow, scaledwidth [[nextElement]] [role="xmlContent"] -In order to handle more complex scenarios, the Spring Batch XML namespace allows transitions -elements to be defined within the step element. One such transition is the `next` +To handle more complex scenarios, the Spring Batch XML namespace lets you define transitions +elements within the step element. One such transition is the `next` element. Like the `next` attribute, the `next` element tells the `Job` which `Step` to execute next. However, unlike the attribute, any number of `next` elements are allowed on a given `Step`, and there is no default behavior in the case of failure. This means that, if -transition elements are used, then all of the behavior for the `Step` transitions must be +transition elements are used, all of the behavior for the `Step` transitions must be defined explicitly. Note also that a single step cannot have both a `next` attribute and a `transition` element. [role="xmlContent"] -The `next` element specifies a pattern to match and the step to execute next, as shown in -the following example: +The `next` element specifies a pattern to match and the step to execute next, as +the following example shows: .XML Configuration [source, xml, role="xmlContent"] @@ -1573,7 +1566,7 @@ the following example: [role="javaContent"] The Java API offers a fluent set of methods that let you specify the flow and what to do when a step fails. The following example shows how to specify one step (`stepA`) and then -proceed to either of two different steps (`stepB` and `stepC`), depending on whether +proceed to either of two different steps (`stepB` or `stepC`), depending on whether `stepA` succeeds: .Java Configuration @@ -1601,17 +1594,17 @@ match the `ExitStatus` that results from the execution of the `Step`. Only two special characters are allowed in the pattern: -* "*" matches zero or more characters -* "?" matches exactly one character +* `*` matches zero or more characters +* `?` matches exactly one character -For example, "c*t" matches "cat" and "count", while "c?t" matches "cat" but not "count". +For example, `c*t` matches `cat` and `count`, while `c?t` matches `cat` but not `count`. While there is no limit to the number of transition elements on a `Step`, if the `Step` -execution results in an `ExitStatus` that is not covered by an element, then the +execution results in an `ExitStatus` that is not covered by an element, the framework throws an exception and the `Job` fails. The framework automatically orders transitions from most specific to least specific. This means that, even if the ordering -were swapped for "stepA" in the example above, an `ExitStatus` of "FAILED" would still go -to "stepC". +were swapped for `stepA` in the preceding example, an `ExitStatus` of `FAILED` would still go +to `stepC`. [[batchStatusVsExitStatus]] ===== Batch Status Versus Exit Status @@ -1625,7 +1618,7 @@ record the status of a `Job` or `Step`. It can be one of the following values: or job has completed successfully, `FAILED` is set when it fails, and so on. 
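[role="javaContent"]
To make the distinction concrete, the following sketch (an illustration added for this discussion, not part of the reference configuration) reads both values from the `StepExecution` in a `StepExecutionListener`. The listener class and its wiring are assumed:

[source, java, role="javaContent"]
----
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
	// BatchStatus: the enumeration that the framework records in the job repository
	BatchStatus batchStatus = stepExecution.getStatus();

	// ExitStatus: carries the string exit code that flow transitions match against
	ExitStatus exitStatus = stepExecution.getExitStatus();

	// Returning the existing ExitStatus leaves the exit code unchanged
	return exitStatus;
}
----
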
[role="xmlContent"] -The following example contains the 'next' element when using XML configuration: +The following example contains the `next` element when using XML configuration: // TODO It might help readers to know the difference between STARTING and STARTED (same // for STOPPING and STOPPED). Specifically, when does the status go from STARTING to // STARTED? @@ -1636,7 +1629,7 @@ The following example contains the 'next' element when using XML configuration: ---- [role="javaContent"] -The following example contains the 'on' element when using Java Configuration: +The following example contains the `on` element when using Java Configuration: [source, java, role="javaContent"] ---- @@ -1645,20 +1638,20 @@ The following example contains the 'on' element when using Java Configuration: ... ---- -At first glance, it would appear that 'on' references the `BatchStatus` of the `Step` to +At first glance, it would appear that `on` references the `BatchStatus` of the `Step` to which it belongs. However, it actually references the `ExitStatus` of the `Step`. As the name implies, `ExitStatus` represents the status of a `Step` after it finishes execution. [role="xmlContent"] -More specifically, when using XML configuration, the 'next' element shown in the +More specifically, when using XML configuration, the `next` element shown in the preceding XML configuration example references the exit code of `ExitStatus`. [role="xmlContent"] -When using Java configuration, the 'on()' method shown in the preceding +When using Java configuration, the `on()` method shown in the preceding Java configuration example references the exit code of `ExitStatus`. -In English, it says: "go to stepB if the exit code is `FAILED` ". By default, the exit -code is always the same as the `BatchStatus` for the `Step`, which is why the entry above +In English, it says: "`go to stepB if the exit code is FAILED`". By default, the exit +code is always the same as the `BatchStatus` for the `Step`, which is why the preceding entry works. However, what if the exit code needs to be different? A good example comes from the skip sample job within the samples project: @@ -1694,13 +1687,13 @@ public Job job() { `step1` has three possibilities: -. The `Step` failed, in which case the job should fail. -. The `Step` completed successfully. -. The `Step` completed successfully but with an exit code of 'COMPLETED WITH SKIPS'. In +* The `Step` failed, in which case the job should fail. +* The `Step` completed successfully. +* The `Step` completed successfully but with an exit code of `COMPLETED WITH SKIPS`. In this case, a different step should be run to handle the errors. The preceding configuration works. However, something needs to change the exit code based on -the condition of the execution having skipped records, as shown in the following example: +the condition of the execution having skipped records, as the following example shows: [source, java] ---- @@ -1718,7 +1711,7 @@ public class SkipCheckingListener extends StepExecutionListenerSupport { } ---- -The above code is a `StepExecutionListener` that first checks to make sure the `Step` was +The preceding code is a `StepExecutionListener` that first checks to make sure the `Step` was successful and then checks to see if the skip count on the `StepExecution` is higher than 0. If both conditions are met, a new `ExitStatus` with an exit code of `COMPLETED WITH SKIPS` is returned. 
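[role="javaContent"]
For the new exit code to reach the job's transitions, the listener must also be registered on the step. The following Java sketch shows one way to do that. It is an illustration only; the reader, writer, and skip settings are placeholders rather than the actual skip sample configuration:

[source, java, role="javaContent"]
----
@Bean
public Step step1() {
	return this.stepBuilderFactory.get("step1")
				.<String, String>chunk(10)
				.reader(itemReader())
				.writer(itemWriter())
				.faultTolerant()
				.skip(FlatFileParseException.class)
				.skipLimit(10)
				// register the listener so that afterStep() can replace the exit code
				.listener(new SkipCheckingListener())
				.build();
}
----
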
@@ -1726,7 +1719,7 @@ successful and then checks to see if the skip count on the `StepExecution` is hi [[configuringForStop]] ==== Configuring for Stop -After the discussion of <>, +After the discussion of <>, one might wonder how the `BatchStatus` and `ExitStatus` are determined for the `Job`. While these statuses are determined for the `Step` by the code that is executed, the statuses for the `Job` are determined based on the configuration. @@ -1755,10 +1748,10 @@ public Job job() { } ---- -If no transitions are defined for a `Step`, then the status of the `Job` is defined as +If no transitions are defined for a `Step`, the status of the `Job` is defined as follows: -* If the `Step` ends with `ExitStatus` FAILED, then the `BatchStatus` and `ExitStatus` of +* If the `Step` ends with `ExitStatus` of `FAILED`, the `BatchStatus` and `ExitStatus` of the `Job` are both `FAILED`. * Otherwise, the `BatchStatus` and `ExitStatus` of the `Job` are both `COMPLETED`. @@ -1777,23 +1770,23 @@ a status of `FAILED` but for the job to have a status of `COMPLETED`. ===== Ending at a Step Configuring a step end instructs a `Job` to stop with a `BatchStatus` of `COMPLETED`. A -`Job` that has finished with status `COMPLETED` cannot be restarted (the framework throws +`Job` that has finished with a status of `COMPLETED` cannot be restarted (the framework throws a `JobInstanceAlreadyCompleteException`). [role="xmlContent"] -When using XML configuration, the 'end' element is used for this task. The `end` element -also allows for an optional 'exit-code' attribute that can be used to customize the -`ExitStatus` of the `Job`. If no 'exit-code' attribute is given, then the `ExitStatus` is +When using XML configuration, you can use the `end` element for this task. The `end` element +also allows for an optional `exit-code` attribute that you can use to customize the +`ExitStatus` of the `Job`. If no `exit-code` attribute is given, the `ExitStatus` is `COMPLETED` by default, to match the `BatchStatus`. [role="javaContent"] -When using Java configuration, the 'end' method is used for this task. The `end` method -also allows for an optional 'exitStatus' parameter that can be used to customize the -`ExitStatus` of the `Job`. If no 'exitStatus' value is provided, then the `ExitStatus` is +When using Java configuration, the `end` method is used for this task. The `end` method +also allows for an optional `exitStatus` parameter that you can use to customize the +`ExitStatus` of the `Job`. If no `exitStatus` value is provided, the `ExitStatus` is `COMPLETED` by default, to match the `BatchStatus`. -Consider the following scenario: if `step2` fails, then the `Job` stops with a -`BatchStatus` of `COMPLETED` and an `ExitStatus` of `COMPLETED` and `step3` does not run. +Consider the following scenario: If `step2` fails, the `Job` stops with a +`BatchStatus` of `COMPLETED` and an `ExitStatus` of `COMPLETED`, and `step3` does not run. Otherwise, execution moves to `step3`. Note that if `step2` fails, the `Job` is not restartable (because the status is `COMPLETED`). @@ -1837,15 +1830,15 @@ Configuring a step to fail at a given point instructs a `Job` to stop with a from being restarted. [role="xmlContent"] -When using XML configuration, the 'fail' element also allows for an optional 'exit-code' -attribute that can be used to customize the `ExitStatus` of the `Job`. 
If no 'exit-code' -attribute is given, then the `ExitStatus` is `FAILED` by default, to match the +When using XML configuration, the `fail` element also allows for an optional `exit-code` +attribute that can be used to customize the `ExitStatus` of the `Job`. If no `exit-code` +attribute is given, the `ExitStatus` is `FAILED` by default, to match the `BatchStatus`. -Consider the following scenario if `step2` fails, then the `Job` stops with a +Consider the following scenario: If `step2` fails, the `Job` stops with a `BatchStatus` of `FAILED` and an `ExitStatus` of `EARLY TERMINATION` and `step3` does not execute. Otherwise, execution moves to `step3`. Additionally, if `step2` fails and the -`Job` is restarted, then execution begins again on `step2`. +`Job` is restarted, execution begins again on `step2`. [role="xmlContent"] The following example shows the scenario in XML: @@ -1888,14 +1881,14 @@ Configuring a job to stop at a particular step instructs a `Job` to stop with a so that the operator can take some action before restarting the `Job`. [role="xmlContent"] -When using XML configuration, a 'stop' element requires a 'restart' attribute that specifies -the step where execution should pick up when the Job is restarted. +When using XML configuration, a `stop` element requires a `restart` attribute that specifies +the step where execution should pick up when the `Job` is restarted. [role="javaContent"] -When using Java configuration, the `stopAndRestart` method requires a 'restart' attribute +When using Java configuration, the `stopAndRestart` method requires a `restart` attribute that specifies the step where execution should pick up when the Job is restarted. -Consider the following scenario: if `step1` finishes with `COMPLETE`, then the job then +Consider the following scenario: If `step1` finishes with `COMPLETE`, the job then stops. Once it is restarted, execution begins on `step2`. [role="xmlContent"] @@ -1929,7 +1922,7 @@ public Job job() { In some situations, more information than the `ExitStatus` may be required to decide which step to execute next. In this case, a `JobExecutionDecider` can be used to assist -in the decision, as shown in the following example: +in the decision, as the following example shows: [source, java] ---- @@ -1971,7 +1964,7 @@ well as all of the transitions: [role="javaContent"] In the following example, a bean implementing the `JobExecutionDecider` is passed -directly to the `next` call when using Java configuration. +directly to the `next` call when using Java configuration: .Java Configuration [source, java, role="javaContent"] @@ -1995,10 +1988,10 @@ time in a linear fashion. In addition to this typical style, Spring Batch also a for a job to be configured with parallel flows. [role="xmlContent"] -The XML namespace allows you to use the 'split' element. As the following example shows, -the 'split' element contains one or more 'flow' elements, where entire separate flows can -be defined. A 'split' element may also contain any of the previously discussed transition -elements, such as the 'next' attribute or the 'next', 'end' or 'fail' elements. +The XML namespace lets you use the `split` element. As the following example shows, +the `split` element contains one or more `flow` elements, where entire separate flows can +be defined. A `split` element can also contain any of the previously discussed transition +elements, such as the `next` attribute or the `next`, `end`, or `fail` elements. 
[source, xml, role="xmlContent"] ---- @@ -2015,11 +2008,11 @@ elements, such as the 'next' attribute or the 'next', 'end' or 'fail' elements. ---- [role="javaContent"] -Java based configuration lets you configure splits through the provided builders. As the -following example shows, the 'split' element contains one or more 'flow' elements, where -entire separate flows can be defined. A 'split' element may also contain any of the -previously discussed transition elements, such as the 'next' attribute or the 'next', -'end' or 'fail' elements. +Java-based configuration lets you configure splits through the provided builders. As the +following example shows, the `split` element contains one or more `flow` elements, where +entire separate flows can be defined. A `split` element can also contain any of the +previously discussed transition elements, such as the `next` attribute or the `next`, +`end`, or `fail` elements. [source, java, role="javaContent"] ---- @@ -2054,12 +2047,12 @@ public Job job(Flow flow1, Flow flow2) { ==== Externalizing Flow Definitions and Dependencies Between Jobs Part of the flow in a job can be externalized as a separate bean definition and then -re-used. There are two ways to do so. The first is to simply declare the flow as a +re-used. There are two ways to do so. The first is to declare the flow as a reference to one defined elsewhere. [role="xmlContent"] -The following example shows how to declare a flow as a reference to a flow defined -elsewhere in XML: +The following XML example shows how to declare a flow as a reference to a flow defined +elsewhere: .XML Configuration [source, xml, role="xmlContent"] @@ -2076,10 +2069,10 @@ elsewhere in XML: ---- [role="javaContent"] -The following example shows how to declare a flow as a reference to a flow defined -elsewhere in Java: +The following Java example shows how to declare a flow as a reference to a flow defined +elsewhere: -.Java Configuration +.Java Confguration [source, java, role="javaContent"] ---- @Bean @@ -2100,7 +2093,7 @@ public Flow flow1() { } ---- -The effect of defining an external flow as shown in the preceding example is to insert +The effect of defining an external flow, as shown in the preceding example, is to insert the steps from the external flow into the job as if they had been declared inline. In this way, many jobs can refer to the same template flow and compose such templates into different logical flows. This is also a good way to separate the integration testing of @@ -2172,16 +2165,16 @@ public DefaultJobParametersExtractor jobParametersExtractor() { The job parameters extractor is a strategy that determines how the `ExecutionContext` for the `Step` is converted into `JobParameters` for the `Job` that is run. The `JobStep` is useful when you want to have some more granular options for monitoring and reporting on -jobs and steps. Using `JobStep` is also often a good answer to the question: "How do I -create dependencies between jobs?" It is a good way to break up a large system into +jobs and steps. Using `JobStep` is also often a good answer to the question: "`How do I +create dependencies between jobs?`" It is a good way to break up a large system into smaller modules and control the flow of jobs. [[late-binding]] === Late Binding of `Job` and `Step` Attributes Both the XML and flat file examples shown earlier use the Spring `Resource` abstraction -to obtain a file. This works because `Resource` has a `getFile` method, which returns a -`java.io.File`. 
Both XML and flat file resources can be configured using standard Spring +to obtain a file. This works because `Resource` has a `getFile` method that returns a +`java.io.File`. You can configure both XML and flat file resources by using standard Spring constructs: [role="xmlContent"] @@ -2216,7 +2209,7 @@ The preceding `Resource` loads the file from the specified file system location. that absolute locations have to start with a double slash (`//`). In most Spring applications, this solution is good enough, because the names of these resources are known at compile time. However, in batch scenarios, the file name may need to be -determined at runtime as a parameter to the job. This can be solved using '-D' parameters +determined at runtime as a parameter to the job. This can be solved using `-D` parameters to read a system property. [role="xmlContent"] @@ -2249,12 +2242,12 @@ public FlatFileItemReader flatFileItemReader(@Value("${input.file.name}") String All that would be required for this solution to work would be a system argument (such as `-Dinput.file.name="file://outputs/file.txt"`). -NOTE: Although a `PropertyPlaceholderConfigurer` can be used here, it is not +NOTE: Although you can use a `PropertyPlaceholderConfigurer` here, it is not necessary if the system property is always set because the `ResourceEditor` in Spring already filters and does placeholder replacement on system properties. -Often, in a batch setting, it is preferable to parametrize the file name in the -`JobParameters` of the job, instead of through system properties, and access them that +Often, in a batch setting, it is preferable to parameterize the file name in the +`JobParameters` of the job (instead of through system properties) and access them that way. To accomplish this, Spring Batch allows for the late binding of various `Job` and `Step` attributes. @@ -2286,7 +2279,7 @@ public FlatFileItemReader flatFileItemReader(@Value("#{jobParameters['input.file } ---- -Both the `JobExecution` and `StepExecution` level `ExecutionContext` can be accessed in +You can access both the `JobExecution` and `StepExecution` level `ExecutionContext` in the same way. [role="xmlContent"] @@ -2339,34 +2332,28 @@ public FlatFileItemReader flatFileItemReader(@Value("#{stepExecutionContext['inp } ---- -[NOTE] -==== -Any bean that uses late-binding must be declared with scope="step". See -<> for more information. It should be noted -that a `Step` bean should not be step-scoped. If late-binding is needed in a step -definition, the components of that step (ie tasklet, item reader/writer, etc) +NOTE: Any bean that uses late binding must be declared with `scope="step"`. See +<> for more information. +A `Step` bean should not be step-scoped. If late binding is needed in a step +definition, the components of that step (tasklet, item reader or writer, and so on) are the ones that should be scoped instead. -==== -[NOTE] -==== -If you are using Spring 3.0 (or above), the expressions in step-scoped beans are in the +NOTE: If you use Spring 3.0 (or above), the expressions in step-scoped beans are in the Spring Expression Language, a powerful general purpose language with many interesting features. To provide backward compatibility, if Spring Batch detects the presence of older versions of Spring, it uses a native expression language that is less powerful and that has slightly different parsing rules. The main difference is that the map keys in the example above do not need to be quoted with Spring 2.5, but the quotes are mandatory in Spring 3.0. 
-==== // TODO Where is that older language described? It'd be good to have a link to it here. -// Also, given that we're up to version 5 of Spring, should we still be talking about +// Also, given that we are up to version 5 of Spring, should we still be talking about // things from before version 3? (In other words, we should provide a link or drop the // whole thing.) [[step-scope]] ==== Step Scope -All of the late binding examples shown earlier have a scope of "`step`" declared on the +All of the late binding examples shown earlier have a scope of `step` declared on the bean definition. [role="xmlContent"] @@ -2397,10 +2384,10 @@ public FlatFileItemReader flatFileItemReader(@Value("#{jobParameters[input.file. } ---- -Using a scope of `Step` is required in order to use late binding, because the bean cannot -actually be instantiated until the `Step` starts, to allow the attributes to be found. +Using a scope of `Step` is required to use late binding, because the bean cannot +actually be instantiated until the `Step` starts, to let the attributes be found. Because it is not part of the Spring container by default, the scope must be added -explicitly, by using the `batch` namespace or by including a bean definition explicitly +explicitly, by using the `batch` namespace, by including a bean definition explicitly for the `StepScope`, or by using the `@EnableBatchProcessing` annotation. Use only one of those methods. The following example uses the `batch` namespace: @@ -2426,10 +2413,10 @@ The following example includes the bean definition explicitly: ==== Job Scope `Job` scope, introduced in Spring Batch 3.0, is similar to `Step` scope in configuration -but is a Scope for the `Job` context, so that there is only one instance of such a bean +but is a scope for the `Job` context, so that there is only one instance of such a bean per running job. Additionally, support is provided for late binding of references -accessible from the `JobContext` using `#{..}` placeholders. Using this feature, bean -properties can be pulled from the job or job execution context and the job parameters. +accessible from the `JobContext` by using `#{..}` placeholders. Using this feature, you can pull bean +properties from the job or job execution context and the job parameters. [role="xmlContent"] The following example shows an example of binding to job scope in XML: @@ -2481,7 +2468,7 @@ public FlatFileItemReader flatFileItemReader(@Value("#{jobExecutionContext['inpu Because it is not part of the Spring container by default, the scope must be added explicitly, by using the `batch` namespace, by including a bean definition explicitly for -the JobScope, or using the `@EnableBatchProcessing` annotation (but not all of them). +the JobScope, or by using the `@EnableBatchProcessing` annotation (choose only one approach). The following example uses the `batch` namespace: [source, xml] @@ -2503,10 +2490,7 @@ The following example includes a bean that explicitly defines the `JobScope`: ---- -[NOTE] -==== -There are some practical limitations of using job-scoped beans in multi-threaded +NOTE: There are some practical limitations of using job-scoped beans in multi-threaded or partitioned steps. Spring Batch does not control the threads spawned in these use cases, so it is not possible to set them up correctly to use such beans. Hence, -it is not recommended to use job-scoped beans in multi-threaded or partitioned steps. -==== +we do not recommend using job-scoped beans in multi-threaded or partitioned steps. 
diff --git a/spring-batch-docs/src/main/asciidoc/testing.adoc b/spring-batch-docs/src/main/asciidoc/testing.adoc index d5e3af15c2..d005d256bd 100644 --- a/spring-batch-docs/src/main/asciidoc/testing.adoc +++ b/spring-batch-docs/src/main/asciidoc/testing.adoc @@ -3,7 +3,6 @@ :toclevels: 4 [[testing]] - == Unit Testing ifndef::onlyonetoggle[] @@ -13,36 +12,31 @@ endif::onlyonetoggle[] As with other application styles, it is extremely important to unit test any code written as part of a batch job. The Spring core documentation covers how to unit and integration test with Spring in great detail, so it is not be repeated here. It is important, however, -to think about how to 'end to end' test a batch job, which is what this chapter covers. -The spring-batch-test project includes classes that facilitate this end-to-end test +to think about how to "`end to end`" test a batch job, which is what this chapter covers. +The `spring-batch-test` project includes classes that facilitate this end-to-end test approach. [[creatingUnitTestClass]] - - === Creating a Unit Test Class -In order for the unit test to run a batch job, the framework must load the job's -ApplicationContext. Two annotations are used to trigger this behavior: +For the unit test to run a batch job, the framework must load the job's +`ApplicationContext`. Two annotations are used to trigger this behavior: -* `@RunWith(SpringJUnit4ClassRunner.class)`: Indicates that the class should use Spring's +* `@RunWith(SpringJUnit4ClassRunner.class)` indicates that the class should use Spring's JUnit facilities -* `@ContextConfiguration(...)`: Indicates which resources to configure the +* `@ContextConfiguration(...)` indicates which resources to configure the `ApplicationContext` with. Starting from v4.1, it is also possible to inject Spring Batch test utilities -like the `JobLauncherTestUtils` and `JobRepositoryTestUtils` in the test context -using the `@SpringBatchTest` annotation. +(such as the `JobLauncherTestUtils` and `JobRepositoryTestUtils`) in the test context +by using the `@SpringBatchTest` annotation. -[NOTE] -==== -It should be noted that `JobLauncherTestUtils` requires a `Job` bean and that +NOTE: Note that `JobLauncherTestUtils` requires a `Job` bean and that `JobRepositoryTestUtils` requires a `DataSource` bean. Since `@SpringBatchTest` registers a `JobLauncherTestUtils` and a `JobRepositoryTestUtils` in the test context, it is expected that the test context contains a single autowire candidate for a `Job` and a `DataSource` (either a single bean definition or one that is annotated with `org.springframework.context.annotation.Primary`). -==== [role="javaContent"] The following Java example shows the annotations in use: @@ -70,23 +64,21 @@ public class SkipSampleFunctionalTests { ... } ---- [[endToEndTesting]] - - === End-To-End Testing of Batch Jobs -'End To End' testing can be defined as testing the complete run of a batch job from +"`End To end`" testing can be defined as testing the complete run of a batch job from beginning to end. This allows for a test that sets up a test condition, executes the job, and verifies the end result. Consider an example of a batch job that reads from the database and writes to a flat file. -The test method begins by setting up the database with test data. It clears the CUSTOMER +The test method begins by setting up the database with test data. It clears the `CUSTOMER` table and then inserts 10 new records. The test then launches the `Job` by using the `launchJob()` method. 
The `launchJob()` method is provided by the `JobLauncherTestUtils` class. The `JobLauncherTestUtils` class also provides the `launchJob(JobParameters)` -method, which allows the test to give particular parameters. The `launchJob()` method +method, which lets the test give particular parameters. The `launchJob()` method returns the `JobExecution` object, which is useful for asserting particular information about the `Job` run. In the following case, the test verifies that the `Job` ended with -status "COMPLETED". +a status of `COMPLETED`. [role="xmlContent"] The following listing shows the example in XML: @@ -164,8 +156,6 @@ public class SkipSampleFunctionalTests { ---- [[testingIndividualSteps]] - - === Testing Individual Steps For complex batch jobs, test cases in the end-to-end testing approach may become @@ -192,7 +182,7 @@ execution. That is the goal of two components in Spring Batch: `StepScopeTestExecutionListener` and `StepScopeTestUtils`. The listener is declared at the class level, and its job is to create a step execution -context for each test method, as shown in the following example: +context for each test method, as the following example shows: [source, java] ---- @@ -228,7 +218,7 @@ The other is the Spring Batch `StepScopeTestExecutionListener`. It works by look factory method in the test case for a `StepExecution`, using that as the context for the test method, as if that execution were active in a `Step` at runtime. The factory method is detected by its signature (it must return a `StepExecution`). If a factory method is -not provided, then a default `StepExecution` is created. +not provided, a default `StepExecution` is created. Starting from v4.1, the `StepScopeTestExecutionListener` and `JobScopeTestExecutionListener` are imported as test execution listeners @@ -284,7 +274,6 @@ int count = StepScopeTestUtils.doInStepScope(stepExecution, ---- [[validatingOutputFiles]] - === Validating Output Files When a batch job writes to the database, it is easy to query the database to verify that @@ -293,7 +282,7 @@ important that the output be verified. Spring Batch provides a class called `Ass to facilitate the verification of output files. The method called `assertFileEquals` takes two `File` objects (or two `Resource` objects) and asserts, line by line, that the two files have the same content. Therefore, it is possible to create a file with the expected -output and to compare it to the actual result, as shown in the following example: +output and to compare it to the actual result, as the following example shows: [source, java] ---- @@ -305,13 +294,11 @@ AssertFile.assertFileEquals(new FileSystemResource(EXPECTED_FILE), ---- [[mockingDomainObjects]] - - === Mocking Domain Objects Another common issue encountered while writing unit and integration tests for Spring Batch components is how to mock domain objects. A good example is a `StepExecutionListener`, as -illustrated in the following code snippet: +the following code snippet shows: [source, java] ---- @@ -326,10 +313,10 @@ public class NoWorkFoundStepExecutionListener extends StepExecutionListenerSuppo } ---- -The preceding listener example is provided by the framework and checks a `StepExecution` +The framework provides the preceding listener example and checks a `StepExecution` for an empty read count, thus signifying that no work was done. 
While this example is -fairly simple, it serves to illustrate the types of problems that may be encountered when -attempting to unit test classes that implement interfaces requiring Spring Batch domain +fairly simple, it serves to illustrate the types of problems that you may encounter when +you try to unit test classes that implement interfaces requiring Spring Batch domain objects. Consider the following unit test for the listener's in the preceding example: [source, java] @@ -356,7 +343,7 @@ Because the Spring Batch domain model follows good object-oriented principles, t model, it does make creating stub objects for unit testing verbose. To address this issue, the Spring Batch test module includes a factory for creating domain objects: `MetaDataInstanceFactory`. Given this factory, the unit test can be updated to be more -concise, as shown in the following example: +concise, as the following example shows: [source, java] ---- @@ -374,6 +361,6 @@ public void testAfterStep() { } ---- -The preceding method for creating a simple `StepExecution` is just one convenience method -available within the factory. A full method listing can be found in its +The preceding method for creating a simple `StepExecution` is only one convenience method +available within the factory. You can find a full method listing in its link:$$http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/test/MetaDataInstanceFactory.html$$[Javadoc]. diff --git a/spring-batch-docs/src/main/asciidoc/transaction-appendix.adoc b/spring-batch-docs/src/main/asciidoc/transaction-appendix.adoc index f4e68ae8a6..4f9aa335c6 100644 --- a/spring-batch-docs/src/main/asciidoc/transaction-appendix.adoc +++ b/spring-batch-docs/src/main/asciidoc/transaction-appendix.adoc @@ -3,7 +3,6 @@ :toclevels: 4 [[transactions]] - [appendix] == Batch Processing and Transactions @@ -12,7 +11,7 @@ Consider the following simple example of a nested batch with no retries. It shows a common scenario for batch processing: An input source is processed until exhausted, and -we commit periodically at the end of a "chunk" of processing. +it commits periodically at the end of a "`chunk`" of processing. ---- @@ -29,7 +28,7 @@ we commit periodically at the end of a "chunk" of processing. ---- -The input operation (3.1) could be a message-based receive (such as from JMS), or a +The input operation (3.1) could be a message-based receive (such as from JMS) or a file-based read, but to recover and continue processing with a chance of completing the whole job, it must be transactional. The same applies to the operation at 3.2. It must be either transactional or idempotent. @@ -41,7 +40,7 @@ must roll back the whole chunk. === Simple Stateless Retry It is also useful to use a retry for an operation which is not transactional, such as a -call to a web-service or other remote resource, as shown in the following example: +call to a web-service or other remote resource, as the following example shows: ---- @@ -58,14 +57,14 @@ call to a web-service or other remote resource, as shown in the following exampl This is actually one of the most useful applications of a retry, since a remote call is much more likely to fail and be retryable than a database update. As long as the remote access (2.1) eventually succeeds, the transaction, `TX` (0), commits. If the remote -access (2.1) eventually fails, then the transaction, `TX` (0), is guaranteed to roll +access (2.1) eventually fails, the transaction, `TX` (0), is guaranteed to roll back. 
[[repeatRetry]] === Typical Repeat-Retry Pattern The most typical batch processing pattern is to add a retry to the inner block of the -chunk, as shown in the following example: +chunk, as the following example shows: ---- @@ -89,46 +88,46 @@ chunk, as shown in the following example: ---- -The inner `RETRY` (4) block is marked as "stateful". See <> for a description of a stateful retry. This means that if the +The inner `RETRY` (4) block is marked as "`stateful`". See <> for a description of a stateful retry. This means that, if the retry `PROCESS` (5) block fails, the behavior of the `RETRY` (4) is as follows: . Throw an exception, rolling back the transaction, `TX` (2), at the chunk level, and allowing the item to be re-presented to the input queue. -. When the item re-appears, it might be retried depending on the retry policy in place, -executing `PROCESS` (5) again. The second and subsequent attempts might fail again and +. When the item re-appears, it might be retried, depending on the retry policy in place, and +executing `PROCESS` (5) again. The second and subsequent attempts might fail again and re-throw the exception. . Eventually, the item reappears for the final time. The retry policy disallows another attempt, so `PROCESS` (5) is never executed. In this case, we follow the `RECOVER` (6) -path, effectively "skipping" the item that was received and is being processed. +path, effectively "`skipping`" the item that was received and is being processed. -Note that the notation used for the `RETRY` (4) in the plan above explicitly shows that +Note that the notation used for the `RETRY` (4) in the plan explicitly shows that the input step (4.1) is part of the retry. It also makes clear that there are two alternate paths for processing: the normal case, as denoted by `PROCESS` (5), and the recovery path, as denoted in a separate block by `RECOVER` (6). The two alternate paths are completely distinct. Only one is ever taken in normal circumstances. -In special cases (such as a special `TransactionValidException` type), the retry policy +In special cases (such as a special `TranscationValidException` type), the retry policy might be able to determine that the `RECOVER` (6) path can be taken on the last attempt after `PROCESS` (5) has just failed, instead of waiting for the item to be re-presented. This is not the default behavior, because it requires detailed knowledge of what has happened inside the `PROCESS` (5) block, which is not usually available. For example, if -the output included write access before the failure, then the exception should be +the output included write access before the failure, the exception should be re-thrown to ensure transactional integrity. -The completion policy in the outer `REPEAT` (1) is crucial to the success of the above +The completion policy in the outer `REPEAT` (1) is crucial to the success of the plan. If the output (5.1) fails, it may throw an exception (it usually does, as described), in which case the transaction, `TX` (2), fails, and the exception could -propagate up through the outer batch `REPEAT` (1). We do not want the whole batch to +propagate up through the outer batch `REPEAT` (1). We do not want the whole batch to stop, because the `RETRY` (4) might still be successful if we try again, so we add `exception=not critical` to the outer `REPEAT` (1). 
-Note, however, that if the `TX` (2) fails and we __do__ try again, by virtue of the outer +Note, however, that if the `TX` (2) fails and we _do_ try again, by virtue of the outer completion policy, the item that is next processed in the inner `REPEAT` (3) is not -guaranteed to be the one that just failed. It might be, but it depends on the -implementation of the input (4.1). Thus, the output (5.1) might fail again on either a -new item or the old one. The client of the batch should not assume that each `RETRY` (4) -attempt is going to process the same items as the last one that failed. For example, if +guaranteed to be the one that just failed. It might be, but it depends on the +implementation of the input (4.1). Thus, the output (5.1) might fail again on either a +new item or the old one. The client of the batch should not assume that each `RETRY` (4) +attempt is going to process the same items as the last one that failed. For example, if the termination policy for `REPEAT` (1) is to fail after 10 attempts, it fails after 10 consecutive attempts but not necessarily at the same item. This is consistent with the overall retry strategy. The inner `RETRY` (4) is aware of the history of each item and @@ -170,7 +169,7 @@ asynchronous chunk processing: The individual items in chunks in the <> can also, in principle, be processed concurrently. In this case, the transaction boundary has to move to the level of the individual item, so that each transaction is on a single thread, as -shown in the following example: +the following example shows: ---- @@ -195,7 +194,7 @@ shown in the following example: ---- This plan sacrifices the optimization benefit, which the simple plan had, of having all -the transactional resources chunked together. It is only useful if the cost of the +the transactional resources chunked together. It is useful only if the cost of the processing (5) is much higher than the cost of transaction management (3). [[transactionPropagation]] @@ -227,7 +226,7 @@ Again, and for the same reason, the inner transaction, `TX` (3), can cause the o transaction, `TX` (1), to fail, even if the `RETRY` (2) is eventually successful. Unfortunately, the same effect percolates from the retry block up to the surrounding -repeat batch if there is one, as shown in the following example: +repeat batch if there is one, as the following example shows: ---- @@ -253,25 +252,25 @@ back at the end. What about non-default propagation? * In the preceding example, `PROPAGATION_REQUIRES_NEW` at `TX` (3) prevents the outer -`TX` (1) from being polluted if both transactions are eventually successful. But if `TX` -(3) commits and `TX` (1) rolls back, then `TX` (3) stays committed, so we violate the -transaction contract for `TX` (1). If `TX` (3) rolls back, `TX` (1) does not necessarily +`TX` (1) from being polluted if both transactions are eventually successful. But if `TX` +(3) commits and `TX` (1) rolls back, `TX` (3) stays committed, so we violate the +transaction contract for `TX` (1). If `TX` (3) rolls back, `TX` (1) does not necessarily roll back (but it probably does in practice, because the retry throws a roll back exception). * `PROPAGATION_NESTED` at `TX` (3) works as we require in the retry case (and for a batch with skips): `TX` (3) can commit but subsequently be rolled back by the outer -transaction, `TX` (1). If `TX` (3) rolls back, `TX` (1) rolls back in practice. This +transaction, `TX` (1). If `TX` (3) rolls back, `TX` (1) rolls back in practice. 
This option is only available on some platforms, not including Hibernate or JTA, but it is the only one that consistently works. Consequently, the `NESTED` pattern is best if the retry block contains any database access. -[[specialTransactionOrthogonal]] +[[specialTransactionOrthonogonal]] === Special Case: Transactions with Orthogonal Resources Default propagation is always OK for simple cases where there are no nested database -transactions. Consider the following example, where the `SESSION` and `TX` are not +transactions. Consider the following example, where the `SESSION` and `TX` are not global `XA` resources, so their resources are orthogonal: ---- @@ -287,19 +286,19 @@ global `XA` resources, so their resources are orthogonal: ---- -Here there is a transactional message `SESSION` (0), but it does not participate in other +Here there is a transactional message, `SESSION` (0), but it does not participate in other transactions with `PlatformTransactionManager`, so it does not propagate when `TX` (3) -starts. There is no database access outside the `RETRY` (2) block. If `TX` (3) fails and +starts. There is no database access outside the `RETRY` (2) block. If `TX` (3) fails and then eventually succeeds on a retry, `SESSION` (0) can commit (independently of a `TX` -block). This is similar to the vanilla "best-efforts-one-phase-commit" scenario. The +block). This is similar to the vanilla "`best-efforts-one-phase-commit`" scenario. The worst that can happen is a duplicate message when the `RETRY` (2) succeeds and the `SESSION` (0) cannot commit (for example, because the message system is unavailable). [[statelessRetryCannotRecover]] === Stateless Retry Cannot Recover -The distinction between a stateless and a stateful retry in the typical example above is -important. It is actually ultimately a transactional constraint that forces the +The distinction between a stateless and a stateful retry in the typical example shown earlier is +important. It is actually ultimately a transactional constraint that forces the distinction, and this constraint also makes it obvious why the distinction exists. We start with the observation that there is no way to skip an item that failed and @@ -332,14 +331,14 @@ follows: The preceding example shows a stateless `RETRY` (3) with a `RECOVER` (5) path that kicks in after the final attempt fails. The `stateless` label means that the block is repeated -without re-throwing any exception up to some limit. This only works if the transaction, -`TX` (4), has propagation NESTED. +without re-throwing any exception up to some limit. This works only if the transaction, +`TX` (4), has propagation nested. If the inner `TX` (4) has default propagation properties and rolls back, it pollutes the outer `TX` (1). The inner transaction is assumed by the transaction manager to have corrupted the transactional resource, so it cannot be used again. -Support for NESTED propagation is sufficiently rare that we choose not to support -recovery with stateless retries in the current versions of Spring Batch. The same effect +Support for nested propagation is sufficiently rare that we choose not to support +recovery with stateless retries in the current versions of Spring Batch. The same effect can always be achieved (at the expense of repeating more processing) by using the -typical pattern above. +typical pattern shown earlier. 
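In Spring Batch itself, the stateful repeat-retry plan described in this appendix typically maps onto a fault-tolerant, chunk-oriented step. The following Java sketch is only an illustration of that mapping; the reader and writer beans and the choice of `DeadlockLoserDataAccessException` are placeholders, not part of the appendix:

[source, java]
----
@Bean
public Step chunkWithRetryAndSkip() {
	// REPEAT(size=5) -> chunk(5); stateful RETRY -> retry/retryLimit; RECOVER -> skip/skipLimit
	return this.stepBuilderFactory.get("chunkWithRetryAndSkip")
				.<String, String>chunk(5)
				.reader(itemReader())
				.writer(itemWriter())
				.faultTolerant()
				.retry(DeadlockLoserDataAccessException.class)
				.retryLimit(3)
				.skip(DeadlockLoserDataAccessException.class)
				.skipLimit(10)
				.build();
}
----
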
diff --git a/spring-batch-docs/src/main/asciidoc/whatsnew.adoc b/spring-batch-docs/src/main/asciidoc/whatsnew.adoc index 6854628856..c370d65be9 100644 --- a/spring-batch-docs/src/main/asciidoc/whatsnew.adoc +++ b/spring-batch-docs/src/main/asciidoc/whatsnew.adoc @@ -3,10 +3,9 @@ :toclevels: 4 [[whatsNew]] - == What's New in Spring Batch 5.0 -Spring Batch 5.0 release has the following major themes: +Spring Batch 5.0 has the following major themes: * Java 17 Requirement * Dependencies Re-baseline @@ -15,17 +14,18 @@ Spring Batch 5.0 release has the following major themes: * Pruning For more details about the changes, -please refer to the link:$$https://github.com/spring-projects/spring-batch/wiki/Spring-Batch-5.0-Migration-Guide$$[migration guide]. +see the link:$$https://github.com/spring-projects/spring-batch/wiki/Spring-Batch-5.0-Migration-Guide$$[migration guide]. === Java 17 Requirement Spring Batch follows Spring Framework's baselines for both Java version and third party dependencies. -With Spring Batch 5, the Spring Framework version is being upgraded to Spring Framework 6 which requires Java 17. +With Spring Batch 5, the Spring Framework version is being upgraded to Spring Framework 6, which requires Java 17. As a result, the Java version requirement for Spring Batch is also increasing to Java 17. +[[dependencies-re-baseline]] === Dependencies Re-baseline -In order to continue the integration with supported versions of the third party libraries that Spring Batch uses, +To continue the integration with supported versions of the third party libraries that Spring Batch uses, Spring Batch 5 is updating the dependencies across the board to the following versions: * Spring Framework 6 @@ -37,24 +37,32 @@ Spring Batch 5 is updating the dependencies across the board to the following ve This release also marks the migration to Jakarta EE 9 APIs. -=== Batch infrastructure configuration updates +[[batch-infrastructure-configuration-updates]] +=== Batch Infrastructure Configuration Updates + +Spring Batch 5 includes the following infrastructure configuration updates: + +* <> +* <> -==== DataSource requirement updates +[[datasource-requirement-updates]] +==== DataSource Requirement Updates -Historically, Spring Batch provided a Map-based job repository and job explorer implementations to work with +Historically, Spring Batch provided a map-based job repository and job explorer implementations to work with an in-memory job repository. These implementations were deprecated in version 4 and completely removed in version 5. -The recommended replacement is to use the Jdbc-based implementations with an embedded database like H2, HSQL, etc. +The recommended replacement is to use the JDBC-based implementations with an embedded database, such as H2, HSQL, and others. -In this release, the `@EnableBatchProcessing` annotation will configure a Jdbc-based `JobRepository` which requires a +In this release, the `@EnableBatchProcessing` annotation configures a JDBC-based `JobRepository`, which requires a `DataSource` bean in the application context. The `DataSource` bean could refer to an embedded database to work with an in-memory job repository. -==== Transaction manager bean exposure +[[transaction-manager-bean-exposure]] +==== Transaction Manager Bean Exposure -Up until version 4.3, the `@EnableBatchProcessing` annotation exposed a tranasaction manager bean in the application -context. 
While this was convenient in many cases, the unconditional exposure of a tranasaction manager could -interfere with a user-defined transaction manager. In this release, `@EnableBatchProcessing` does not expose a -transaction manager bean in the application context anymore. +Until version 4.3, the `@EnableBatchProcessing` annotation exposed a transaction manager bean in the application +context. While this was convenient in many cases, the unconditional exposure of a transaction manager could +interfere with a user-defined transaction manager. In this release, `@EnableBatchProcessing` no longer exposes a +transaction manager bean in the application context. === New features @@ -70,7 +78,14 @@ This release introduces the support of SAP HANA as an additional supported datab === Pruning -==== Deprecated APIs removal +Spring Batch 5 removes a number of items that are no longer needed, including: + +* <> +* <> +* <> + +[[deprecated-apis-removal]] +==== Deprecated APIs Removal The following APIs were deprecated in previous versions and have been removed in this release: @@ -109,11 +124,13 @@ The following APIs were deprecated in previous versions and have been removed in * Method `org.springframework.batch.integration.config.annotation.BatchIntegrationConfiguration#remotePartitioningMasterStepBuilderFactory()` * Method `org.springframework.batch.item.util.FileUtils#setUpOutputFile(File file, boolean restarted, boolean overwriteOutputFile)` -==== SQLFire support removal +[[sqlfire-support-removal]] +==== SQLFire Support Removal SqlFire has been announced to be EOL as of November 1st, 2014. The support of SQLFire as a job repository was deprecated in version 4.3 and removed in version 5.0. -==== JSR-352 implementation removal +[[jsr-352-implementation-removal]] +==== JSR-352 Implementation Removal -Due to a lack of adoption, the implementation of the JSR-352 has been discontinued in this release. +Due to a lack of adoption, the implementation of JSR-352 has been removed from this release. From f1c64b6c77eae19b528b8ee7ede68da6a7ab2b2c Mon Sep 17 00:00:00 2001 From: Jay Bryant Date: Mon, 28 Mar 2022 08:40:19 -0500 Subject: [PATCH 2/2] Corrected a typo --- spring-batch-docs/src/main/asciidoc/readersAndWriters.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spring-batch-docs/src/main/asciidoc/readersAndWriters.adoc b/spring-batch-docs/src/main/asciidoc/readersAndWriters.adoc index 1eee0f5c53..1f3b4f4aef 100644 --- a/spring-batch-docs/src/main/asciidoc/readersAndWriters.adoc +++ b/spring-batch-docs/src/main/asciidoc/readersAndWriters.adoc @@ -1694,7 +1694,7 @@ file-1.txt file-2.txt ignored.txt ---- file-1.txt and file-2.txt are formatted the same and, for business reasons, should be -processed together. The `MuliResourceItemReader` can be used to read in both files by +processed together. The `MultiResourceItemReader` can be used to read in both files by using wildcards. [role="xmlContent"]