Spring Batch
• Introduction
• Basics
  • Batch Processing Strategies, Batch Architecture Overview
  • Job Hierarchy, Running Job
  • Step, Chunk-oriented Processing, Tasklet
  • Controlling Step Flow
  • ItemReaders, ItemProcessors, ItemWriters
• More than basics
  • Logging Item Processing and Failures
  • Executing System Commands
  • Passing Data to Future Steps
  • Spring Batch Integration
  • Scaling and Parallel Processing
4/2020 furuCRM

Introduction
• Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems.
• Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.

Usage Scenarios
A typical batch program generally:
• Reads a large number of records from a database, file, or queue.
• Processes the data in some fashion.
• Writes back data in a modified form.

Usage Scenarios
Business scenarios:
• Commit batch process periodically
• Concurrent batch processing: parallel processing of a job
• Staged, enterprise message-driven processing
• Massively parallel batch processing
• Manual or scheduled restart after failure
• Sequential processing of dependent steps (with extensions to workflow-driven batches)
• Partial processing: skip records (for example, on rollback)
• Whole-batch transaction, for cases with a small batch size or existing stored procedures/scripts

Batch Processing Strategies
1. Normal processing in a batch window
   The data being updated is not required by on-line users or other batch processes, so concurrency is not an issue and a single commit can be done at the end of the batch run.
2. Concurrent batch or on-line processing
   Batch applications that process data which on-line users can update at the same time should not lock any data (in the database or in files) that on-line users could need for more than a few seconds. Updates should also be committed to the database at the end of every few transactions.
3. Parallel processing
   Multiple batch runs or jobs run in parallel to minimize the total elapsed batch processing time. This works as long as the jobs do not share the same files, database tables, or index spaces.
4. Partitioning
   Multiple versions of a large batch application run concurrently to reduce the elapsed time required to process long batch jobs. Processes that can be successfully partitioned are those where the input file can be split and/or the main database tables partitioned, so that the application can run against different sets of data.

Batch Architecture Overview
[Diagram]

Job Hierarchy
[Diagram: Job "Daily_Job" with instances "Daily_Job Date1" and "Daily_Job Date2"; executions "Execution Date1 (x)", "Execution Date1 (x)", "Execution Date1 (o)"; steps "Stp1 Validate data", "Stp2 Save to DB", "Stp3 Create email", …]

Running Job
• Launching a batch job requires two things: a Job and a JobLauncher.
• Launching from the command line:
  • A new JVM is instantiated for each Job.
  • Every Job has its own JobLauncher.

Running Job
• Launching from within a web container:
  • The job is launched within the scope of an HttpRequest.
  • One JobLauncher is configured for asynchronous job launching.
  • Multiple requests invoke that JobLauncher to launch their jobs.

Step
• A Step is a domain object that encapsulates an independent, sequential phase of a batch job.
• A Step contains all of the information necessary to define and control the actual batch processing.

Chunk-oriented Processing
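The slide for this topic was a diagram. As a rough illustration of the idea, the following plain-Java sketch reads items one at a time, processes each one, and hands the results to a writer in chunks of a fixed size. It does not use Spring Batch itself: `ChunkSketch`, `runStep`, and the chunk size of 2 are hypothetical names and values chosen for this example, and the comment marking the commit point only indicates where a real step would close its transaction.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java illustration only; these names are hypothetical, not Spring Batch classes.
public class ChunkSketch {

    // Reads items one at a time, processes each, and passes them to the writer
    // in chunks of at most chunkSize items. Returns the number of chunks written.
    public static <I, O> int runStep(Iterator<I> reader,
                                     Function<I, O> processor,
                                     Consumer<List<O>> writer,
                                     int chunkSize) {
        int chunks = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // a real step would commit here
                chunk.clear();
                chunks++;
            }
        }
        if (!chunk.isEmpty()) { // write the final, partial chunk
            writer.accept(new ArrayList<>(chunk));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Function<Integer, String> processor = n -> "item-" + n;
        Consumer<List<String>> writer = written::add;

        int chunks = runStep(List.of(1, 2, 3, 4, 5).iterator(), processor, writer, 2);
        // prints: 3 chunks: [[item-1, item-2], [item-3, item-4], [item-5]]
        System.out.println(chunks + " chunks: " + written);
    }
}
```

Writing per chunk rather than per item is what lets the commit interval be tuned independently of the reader and writer.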

Tasklet
• A simple interface that has one method, execute, which is called repeatedly until it either returns status FINISHED or throws an exception to signal a failure.
• Tasklet implementors might call a stored procedure, a script, or a simple SQL update statement.
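The "called repeatedly until FINISHED" contract can be sketched in plain Java as below. This is not the Spring Batch interface: `SimpleTasklet`, `Status`, and `runToCompletion` are hypothetical stand-ins, and the real `execute` method takes framework arguments that are omitted here.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java illustration only; these types are hypothetical stand-ins.
public class TaskletSketch {

    public enum Status { CONTINUABLE, FINISHED }

    public interface SimpleTasklet {
        Status execute() throws Exception;
    }

    // Calls execute() until it returns FINISHED. An exception thrown by the
    // tasklet propagates to the caller and signals failure.
    // Returns how many times execute() was called.
    public static int runToCompletion(SimpleTasklet tasklet) throws Exception {
        int calls = 0;
        Status status;
        do {
            status = tasklet.execute();
            calls++;
        } while (status != Status.FINISHED);
        return calls;
    }

    public static void main(String[] args) throws Exception {
        // A tasklet with three units of work; it reports FINISHED on the last one.
        AtomicInteger remaining = new AtomicInteger(3);
        int calls = runToCompletion(() ->
                remaining.decrementAndGet() > 0 ? Status.CONTINUABLE : Status.FINISHED);
        System.out.println("execute() was called " + calls + " times"); // 3 times
    }
}
```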

Controlling Step Flow – Sequential Flow

Controlling Step Flow – Conditional Flow
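The flow slides were diagrams. One way to picture a conditional flow is as a lookup from "step plus exit status" to the next step, as in the plain-Java sketch below. The step names, the `FAILED`/`COMPLETED` strings, and the `run` helper are assumptions made for this example; a real job declares its transitions in the job configuration rather than in a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java illustration only; step names and statuses are hypothetical.
public class ConditionalFlowSketch {

    // Runs steps starting at `start`. After each step, the key "<step>:<exitStatus>"
    // is looked up in `transitions` to choose the next step; the flow ends when
    // no transition matches. Returns the executed steps with their exit statuses.
    public static List<String> run(String start,
                                   Map<String, Supplier<String>> steps,
                                   Map<String, String> transitions) {
        List<String> executed = new ArrayList<>();
        String current = start;
        while (current != null) {
            String exitStatus = steps.get(current).get();
            executed.add(current + "=" + exitStatus);
            current = transitions.get(current + ":" + exitStatus);
        }
        return executed;
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> steps = new HashMap<>();
        steps.put("validate", () -> "FAILED");
        steps.put("save", () -> "COMPLETED");
        steps.put("notifyError", () -> "COMPLETED");

        Map<String, String> transitions = new HashMap<>();
        transitions.put("validate:COMPLETED", "save");
        transitions.put("validate:FAILED", "notifyError");

        // prints: [validate=FAILED, notifyError=COMPLETED]
        System.out.println(run("validate", steps, transitions));
    }
}
```

A sequential flow is the special case in which every step has a single transition regardless of its exit status.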

ItemReaders
• Flat File: Flat-file item readers read lines of data from a flat file that typically describes records with fields of data defined by fixed positions in the file or delimited by some special character (such as a comma).
• XML: XML ItemReaders process XML independently of technologies used for parsing, mapping and validating objects. Input data allows for the validation of an XML file against an XSD schema.
• Database: A database resource is accessed to return result sets which can be mapped to objects for processing. The default SQL ItemReader implementations invoke a RowMapper to return objects, keep track of the current row if restart is required, store basic statistics, and provide some transaction enhancements.

ItemReaders
• Database Cursor

ItemReaders
• Database Paging
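These two slides were diagrams. Broadly, a cursor-based reader runs one query and streams through its result set, while a paging reader runs several queries that each fetch one page of rows. The plain-Java sketch below imitates the paging variant over an in-memory list that stands in for a table. `PagingReaderSketch` and its fields are hypothetical, and unlike a real reader it knows the table size up front, so it never issues a final empty query.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; the in-memory list stands in for a database table.
public class PagingReaderSketch {

    private final List<String> table;
    private final int pageSize;
    private List<String> page = new ArrayList<>();
    private int nextPageStart = 0; // offset of the next page to fetch
    private int indexInPage = 0;   // position inside the current page
    private int queries = 0;       // number of simulated page queries

    public PagingReaderSketch(List<String> table, int pageSize) {
        this.table = table;
        this.pageSize = pageSize;
    }

    // Returns the next row, or null when all rows have been read.
    public String read() {
        if (indexInPage >= page.size()) {
            if (nextPageStart >= table.size()) {
                return null;
            }
            int end = Math.min(nextPageStart + pageSize, table.size());
            // Comparable to a query with a row limit of pageSize and an offset of nextPageStart.
            page = table.subList(nextPageStart, end);
            nextPageStart = end;
            indexInPage = 0;
            queries++;
        }
        return page.get(indexInPage++);
    }

    public int queriesIssued() {
        return queries;
    }

    public static void main(String[] args) {
        PagingReaderSketch reader = new PagingReaderSketch(List.of("a", "b", "c", "d", "e"), 2);
        for (String row = reader.read(); row != null; row = reader.read()) {
            System.out.println(row);
        }
        System.out.println("queries: " + reader.queriesIssued()); // queries: 3
    }
}
```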

ItemWriters
• An ItemWriter is similar in functionality to an ItemReader but with inverse operations.
• Resources still need to be located, opened, and closed, but they differ in that an ItemWriter writes out rather than reading in.
• In the case of databases or queues, these operations may be inserts, updates, or sends.
• The format of the serialization of the output is specific to each batch job.

ItemProcessor
• Given one object, transform it and return another.
• The provided object may or may not be of the same type. The point is that business logic may be applied within the process, and it is completely up to the developer to create that logic.

Logging Item Processing and Failures

Executing System Commands

Passing Data to Future Steps
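The slide for this topic was a diagram. A common way to pass data forward is for a step to store values in its own context and for selected keys to be copied ("promoted") into a job-level context that later steps can read. The plain-Java sketch below models both contexts as maps. `PromotionSketch`, `promote`, and the keys `rowCount` and `tempFile` are hypothetical names for this example, not framework API.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; maps stand in for step-level and job-level contexts.
public class PromotionSketch {

    // Copies the listed keys from the step context into the job context.
    // Keys that the step did not set are skipped.
    public static void promote(Map<String, Object> stepContext,
                               Map<String, Object> jobContext,
                               String... keys) {
        for (String key : keys) {
            if (stepContext.containsKey(key)) {
                jobContext.put(key, stepContext.get(key));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> jobContext = new HashMap<>();

        // Step 1 records results in its own context.
        Map<String, Object> step1Context = new HashMap<>();
        step1Context.put("rowCount", 42);
        step1Context.put("tempFile", "step1.tmp");

        // Only the promoted key becomes visible to later steps.
        promote(step1Context, jobContext, "rowCount");

        // Step 2 reads the value from the job-level context.
        System.out.println("rowCount = " + jobContext.get("rowCount"));        // rowCount = 42
        System.out.println("tempFile visible: " + jobContext.containsKey("tempFile")); // false
    }
}
```

Promoting only named keys keeps step-private data out of the shared context.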

Scaling and Parallel Processing
There are two modes of parallel processing:
• Single process, multi-threaded
• Multi-process
These break down into categories as well, as follows:
• Multi-threaded Step (single process)
• Parallel Steps (single process)
• Remote Chunking of Step (multi-process)
• Partitioning a Step (single or multi-process)

Multi-threaded Step

Parallel Steps

Remote Chunking

Partitioning
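The partitioning slide was a diagram. The single-process variant can be pictured as splitting a key range into sub-ranges and letting one worker thread handle each, as in the plain-Java sketch below. `PartitionSketch`, `partition`, `sumInParallel`, and the grid size of 3 are assumptions for this example; summing ids stands in for the real per-partition work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration only; names and the per-partition work are hypothetical.
public class PartitionSketch {

    // Splits the inclusive id range [min, max] into at most gridSize contiguous
    // ranges of near-equal size. Each element is {start, end}, both inclusive.
    public static List<long[]> partition(long min, long max, int gridSize) {
        List<long[]> ranges = new ArrayList<>();
        long total = max - min + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        for (long start = min; start <= max; start += size) {
            ranges.add(new long[] { start, Math.min(start + size - 1, max) });
        }
        return ranges;
    }

    // Processes each partition on its own worker thread and combines the results.
    public static long sumInParallel(long min, long max, int gridSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long[] range : partition(min, max, gridSize)) {
                Callable<Long> worker = () -> {
                    long sum = 0;
                    for (long id = range[0]; id <= range[1]; id++) {
                        sum += id; // stand-in for reading and processing one record
                    }
                    return sum;
                };
                results.add(pool.submit(worker));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for that partition to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        for (long[] range : partition(1, 10, 3)) {
            System.out.println(range[0] + ".." + range[1]); // 1..4, 5..8, 9..10
        }
        System.out.println(sumInParallel(1, 10, 3)); // 55
    }
}
```

Because the ranges do not overlap, the workers never touch the same records, which is the condition the partitioning strategy depends on.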

Spring Batch Demo
• CSV to MySQL
• MySQL to XML (cursor)
• Passing data between steps
• MySQL to CSV (paging)

Thank You
furuCRM
https://docs.spring.io/spring-batch/docs/current/reference/html/index-single.html
https://mkyong.com/tutorials/spring-batch-tutorial/
https://www.tutorialspoint.com/spring_batch/index.htm
https://howtodoinjava.com/spring-batch/
https://github.com/spring-projects/spring-batch
