Spring Batch is a framework for writing batch processing applications. It provides reusable functions for processing large volumes of records, including logging, transactions, restarts and resource management. A typical batch job reads data, processes it, and writes the results. It supports various processing strategies like normal processing, concurrent processing, parallel processing and partitioning. The core components are jobs made up of steps, which use readers, processors and writers to operate on chunks of data in a configurable flow.
Overview of Spring Batch, its purpose in processing large data sets and essential functions.
Introduction to Spring Batch use cases, including batch processing scenarios and various business strategies.
Different strategies for batch processing: normal, concurrent, parallel processing, and partitioning.
Overview of batch architecture and job hierarchy including execution structure.
Details on how to run jobs in different environments and how to launch them with a JobLauncher.
Definition of Steps and Tasklets, discussing their role in batch job processing.
Methods to control the flow of steps in batch jobs including sequential and conditional flows.
The types of ItemReaders for data input, ItemProcessors for transforming data, and ItemWriters for data output within batch jobs, along with logging of item processing and errors.
Using Spring Batch to execute system commands within batch processes.
Mechanisms for passing data to future steps in batch processing.
Strategies for scaling batch jobs through parallel processing and different modes.
Demonstrating Spring Batch features with real-world examples including data conversion scenarios.
Concluding remarks and links to additional resources for learning about Spring Batch.
Agenda
Introduction
Basics: Batch Processing Strategies, Batch Architecture Overview, Job Hierarchy, Running a Job, Step, Chunk-oriented Processing, Tasklet, Controlling Step Flow, ItemReaders, ItemProcessors, ItemWriters
More than basics: Logging Item Processing and Failures, Executing System Commands, Passing Data to Future Steps, Spring Batch Integration, Scaling and Parallel Processing
Introduction
Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.
Usage Scenarios
A typical batch program generally:
Reads a large number of records from a database, file, or queue.
Processes the data in some fashion.
Writes back the data in a modified form.
Usage Scenarios
Business Scenarios:
Commit batch process periodically
Concurrent batch processing: parallel processing of a job
Staged, enterprise message-driven processing
Massively parallel batch processing
Manual or scheduled restart after failure
Sequential processing of dependent steps (with extensions to workflow-driven batches)
Partial processing: skip records (for example, on rollback)
Whole-batch transaction, for cases with a small batch size or existing stored procedures/scripts
Batch Processing Strategies
1. Normal processing in a batch window: the data being updated is not required by on-line users or other batch processes, so concurrency is not an issue and a single commit can be done at the end of the batch run.
2. Concurrent batch or on-line processing: an application processing data that can be simultaneously updated by on-line users should not lock any data (either in the database or in files) that on-line users could need for more than a few seconds, and updates should be committed to the database at the end of every few transactions.
3. Parallel processing: multiple batch runs or jobs run in parallel to minimize the total elapsed batch processing time, provided the jobs do not share the same files, db-tables, or index spaces.
4. Partitioning: multiple versions of large batch applications run concurrently to reduce the elapsed time required to process long batch jobs. Processes that can be successfully partitioned are those where the input file can be split and/or the main database tables partitioned, allowing the application to run against different sets of data.
Running a Job
Launching a batch job requires two things: a Job and a JobLauncher.
Launching from the command line: a new JVM is instantiated for each job, so every job has its own JobLauncher. A minimal job configuration and a command-line launch are sketched below.
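A minimal sketch of such a job, assuming Spring Batch 4 Java configuration (JobBuilderFactory/StepBuilderFactory); the class and bean names (ReportJobConfiguration, reportJob, reportStep) are illustrative, not from the slides:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class ReportJobConfiguration {

    @Bean
    public Job reportJob(JobBuilderFactory jobs, Step reportStep) {
        // A job is a named sequence of steps.
        return jobs.get("reportJob").start(reportStep).build();
    }

    @Bean
    public Step reportStep(StepBuilderFactory steps) {
        // A one-shot tasklet step; chunk-oriented steps are shown later.
        return steps.get("reportStep")
                .tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED)
                .build();
    }
}
```

From the command line, a job like this could then be launched with CommandLineJobRunner, for example `java org.springframework.batch.core.launch.support.CommandLineJobRunner ReportJobConfiguration reportJob`, each run starting its own JVM and JobLauncher.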
Running a Job
Launching from within a web container:
Jobs are launched within the scope of an HttpRequest.
One JobLauncher is configured for asynchronous job launching.
Multiple requests invoke it to launch their jobs. A sketch of such a setup follows.
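A sketch of that setup, assuming Spring MVC and a SimpleJobLauncher backed by a SimpleAsyncTaskExecutor so the HTTP request returns before the job finishes; the controller, mapping path, and bean names are illustrative:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Configuration
class AsyncLauncherConfiguration {

    @Bean
    public JobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
        SimpleJobLauncher launcher = new SimpleJobLauncher();
        launcher.setJobRepository(jobRepository);
        // Hand execution off to another thread so the request thread is freed.
        launcher.setTaskExecutor(new SimpleAsyncTaskExecutor("job-"));
        launcher.afterPropertiesSet();
        return launcher;
    }
}

@Controller
class JobLauncherController {

    private final JobLauncher jobLauncher;
    private final Job job;

    JobLauncherController(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    @PostMapping("/launchJob")
    @ResponseBody
    public String launch() throws Exception {
        // Unique parameter per request so each launch creates a new JobInstance.
        JobParameters params = new JobParametersBuilder()
                .addLong("launchTime", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(job, params); // returns immediately; the job runs asynchronously
        return "job launched";
    }
}
```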
Step
A Step is a domain object that encapsulates an independent, sequential phase of a batch job. A Step contains all of the information necessary to define and control the actual batch processing.
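As a sketch, a chunk-oriented step that reads, processes, and writes items of a hypothetical Person domain class in chunks of 10 might be defined like this (Spring Batch 4 builder style; all names are illustrative):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PersonStepConfiguration {

    // Person is a hypothetical domain class used throughout these sketches.
    @Bean
    public Step personStep(StepBuilderFactory steps,
                           ItemReader<Person> reader,
                           ItemProcessor<Person, Person> processor,
                           ItemWriter<Person> writer) {
        return steps.get("personStep")
                .<Person, Person>chunk(10)   // read and process 10 items, then write and commit
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
```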
Tasklet
A simple interface that has one method, execute, which is called repeatedly until it either returns status FINISHED or throws an exception to signal a failure. Tasklet implementors might call a stored procedure, a script, or a simple SQL update statement.
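For illustration, a Tasklet that issues a single SQL update could look like this (the table, SQL, and JdbcTemplate wiring are assumptions, not from the slides):

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.jdbc.core.JdbcTemplate;

public class ArchiveOrdersTasklet implements Tasklet {

    private final JdbcTemplate jdbcTemplate;

    public ArchiveOrdersTasklet(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // Hypothetical single SQL update; a stored procedure or script call would fit equally well.
        jdbcTemplate.update("UPDATE orders SET archived = 1 WHERE shipped = 1");
        // FINISHED tells the step not to call execute again.
        return RepeatStatus.FINISHED;
    }
}
```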
ItemReaders
Flat File: flat-file item readers read lines of data from a flat file that typically describes records with fields of data defined by fixed positions in the file or delimited by some special character (such as a comma).
XML: XML ItemReaders process XML independently of the technologies used for parsing, mapping, and validating objects. The input data allows for the validation of an XML file against an XSD schema.
Database: a database resource is accessed to return resultsets which can be mapped to objects for processing. The default SQL ItemReader implementations invoke a RowMapper to return objects, keep track of the current row if a restart is required, store basic statistics, and provide some transaction enhancements.
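As an example of the flat-file case, a reader for a hypothetical comma-delimited people.csv with firstName and lastName columns could be built like this (FlatFileItemReaderBuilder is the Spring Batch 4 builder; the file name, field names, and Person class are assumptions):

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.core.io.FileSystemResource;

public class PersonReaderFactory {

    public static FlatFileItemReader<Person> personReader() {
        // Maps each delimited line onto the properties of the hypothetical Person bean.
        BeanWrapperFieldSetMapper<Person> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Person.class);

        return new FlatFileItemReaderBuilder<Person>()
                .name("personReader")                          // key for restart state in the ExecutionContext
                .resource(new FileSystemResource("people.csv"))
                .delimited()                                   // comma-delimited by default
                .names("firstName", "lastName")
                .fieldSetMapper(fieldSetMapper)
                .build();
    }
}
```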
ItemWriters
An ItemWriter is similar in functionality to an ItemReader but with inverse operations. Resources still need to be located, opened, and closed, but they differ in that an ItemWriter writes out rather than reading in. In the case of databases or queues, these operations may be inserts, updates, or sends. The format of the serialization of the output is specific to each batch job.
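For the database case, a writer that inserts each hypothetical Person item with a batched SQL statement could be sketched as follows (table and column names are assumptions):

```java
import javax.sql.DataSource;

import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;

public class PersonWriterFactory {

    public static JdbcBatchItemWriter<Person> personWriter(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
                // Bind :firstName/:lastName from the Person bean properties.
                .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
                .dataSource(dataSource)
                .build();
    }
}
```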
ItemProcessor
Given one object, transform it and return another. The provided object may or may not be of the same type. The point is that business logic may be applied within the process, and it is completely up to the developer to create that logic.
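A small sketch of such business logic: a processor that takes the hypothetical Person and returns a new Person with upper-cased names (returning null instead would filter the item out of the chunk):

```java
import org.springframework.batch.item.ItemProcessor;

public class PersonNameUpperCaseProcessor implements ItemProcessor<Person, Person> {

    @Override
    public Person process(Person person) {
        // Apply the (hypothetical) business rule: normalize names to upper case.
        String firstName = person.getFirstName().toUpperCase();
        String lastName = person.getLastName().toUpperCase();
        return new Person(firstName, lastName);
    }
}
```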
Scaling and Parallel Processing
There are two modes of parallel processing:
Single process, multi-threaded
Multi-process
These break down into the following categories:
Multi-threaded Step (single process)
Parallel Steps (single process)
Remote Chunking of Step (multi-process)
Partitioning a Step (single or multi-process)
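A sketch of the simplest of these, a multi-threaded step: the chunk-oriented step from earlier, given a TaskExecutor so that chunks are processed concurrently. The throttle limit, chunk size, and names are assumptions, and the reader and writer must be thread-safe for this to be correct:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class MultiThreadedStepConfiguration {

    @Bean
    public Step multiThreadedStep(StepBuilderFactory steps,
                                  ItemReader<Person> reader,
                                  ItemWriter<Person> writer) {
        return steps.get("multiThreadedStep")
                .<Person, Person>chunk(100)
                .reader(reader)
                .writer(writer)
                // Each chunk is read, processed, and written on a worker thread.
                .taskExecutor(new SimpleAsyncTaskExecutor("batch-worker-"))
                .throttleLimit(4) // cap the number of concurrent chunk executions
                .build();
    }
}
```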