Understanding Spring Batch in Depth

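"Spring Batch in Action is a book that introduces the Spring Batch framework and explores it in depth. It is aimed at developers who want to understand and master Spring Batch, and it teaches how to develop and manage batch jobs in real projects."

Spring Batch is a project in the Spring portfolio, built on the Spring Framework and designed for processing bulk data and executing tasks. It provides a complete solution that covers reading, processing, and writing large volumes of data, as well as error handling, transaction management, and job monitoring. The book helps readers understand the core concepts of Spring Batch and how to make full use of its features in real applications.

Part 1: Background introduces the basic concepts of Spring Batch, including why batch processing matters, the design philosophy of Spring Batch, and how to start using it. Chapter 1, "Introducing Spring Batch", explains what Spring Batch is for and the value it brings. Chapter 2, "Getting Started with Spring Batch", is a quick-start tutorial that lets readers set up a basic batch project.

Part 2: Core Spring Batch explains the core components of Spring Batch and how to use them. Chapter 4, "Batch Configuration", covers the details of configuring batch jobs, and chapter 5, "Running Batch Jobs", discusses how to launch and manage them. Chapter 6, "Reading Data", and chapter 7, "Writing Data", focus on input and output and explain how to work with various data sources. Chapter 8, "Processing Data", discusses data processing logic, including filtering, transformation, and aggregation. Chapter 9, "Implementing Bulletproof Jobs", focuses on building robust and recoverable batch jobs, and chapter 10, "Transaction Management", explains how to manage transactions correctly in batch processing to keep data consistent.

Part 3: Advanced Spring Batch explores more advanced topics such as handling execution, enterprise integration, and scaling. Chapter 11, "Handling Execution", explains how to deal with complex situations during job execution, such as concurrent execution, retry strategies, and failure recovery. Chapter 12, "Enterprise Integration", may cover integrating Spring Batch with other enterprise systems, such as message queues, databases, and external services.

"Spring Batch in Action" is a comprehensive guide: it covers the basic usage of Spring Batch and goes on to advanced features and practical strategies, and it suits engineers who want to do efficient batch development in a Java environment. After working through the book, readers should be able to use Spring Batch proficiently to solve large-scale data processing problems.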

©Manning Publications Co. Please post comments or corrections to the Author Online forum:

http://www.manning-sandbox.com/forum.jspa?forumID=679

Table 1.2 Technologies supported by Spring Batch for read/write scenarios

Data source type   Technology/format   Description
Database           JDBC                Leverages paging, cursors, and batch updates.
Database           Hibernate           Leverages cursors.
Database           JPA                 Leverages paging.
Database           iBatis              Leverages paging.
File               Flat file           Supports delimited and fixed-length flat files.
File               XML                 Uses StAX for parsing; builds on top of Spring OXM, so it supports tools like JAXB, XStream, or Castor.

As table 1.2 shows, Spring Batch supports many technologies out of the box, which makes it quite versatile. We'll study this support thoroughly in chapters 6 and 7.

Spring Batch is not limited to this built-in support, because it is flexible at several levels. Each component provides hooks where you can plug in your own implementations. If no Spring Batch component fits your needs, you can write your own read or write components by implementing straightforward interfaces (ItemReader and ItemWriter, respectively). Finally, Spring Batch does not limit you to read/write scenarios, as batch applications are also about moving files, calling stored procedures or web services, and so on. A Spring Batch process is therefore usually made of read/write steps but also of more specific steps.
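To give an idea of how small a custom read or write component can be, here is a minimal sketch in plain Java. The SimpleReader and SimpleWriter interfaces are hypothetical, simplified stand-ins defined in the sketch itself so that it compiles without Spring Batch on the classpath. They only mirror the general shape of ItemReader (return the next item, or null when the input is exhausted) and ItemWriter (receive a group of items); they are not the framework's actual declarations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class CustomReadWriteSketch {

    // Hypothetical stand-in for a reader contract: next item, or null at end of input.
    interface SimpleReader<T> {
        T read();
    }

    // Hypothetical stand-in for a writer contract: receives a group of items.
    interface SimpleWriter<T> {
        void write(List<? extends T> items);
    }

    // Custom reader that serves items from an in-memory list.
    static class ListReader<T> implements SimpleReader<T> {
        private final Iterator<T> iterator;

        ListReader(List<T> source) {
            this.iterator = source.iterator();
        }

        @Override
        public T read() {
            return iterator.hasNext() ? iterator.next() : null;
        }
    }

    // Custom writer that collects everything it receives.
    static class CollectingWriter<T> implements SimpleWriter<T> {
        final List<T> written = new ArrayList<>();

        @Override
        public void write(List<? extends T> items) {
            written.addAll(items);
        }
    }

    // Drains the reader into the writer one item at a time; returns the item count.
    static <T> int copy(SimpleReader<T> reader, SimpleWriter<T> writer) {
        int count = 0;
        for (T item = reader.read(); item != null; item = reader.read()) {
            writer.write(Collections.singletonList(item));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ListReader<String> reader = new ListReader<>(Arrays.asList("a", "b", "c"));
        CollectingWriter<String> writer = new CollectingWriter<>();
        int copied = copy(reader, writer);
        System.out.println(copied + " items written: " + writer.written);
        // prints "3 items written: [a, b, c]"
    }
}
```

With the real framework you would implement its own interfaces instead and let Spring Batch call them, rather than writing the copy loop yourself.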

Spring Batch is not a scheduler!

Spring Batch drives batch processes but does not provide advanced support for launching them, especially on a time basis. Spring Batch leaves this job to dedicated tools such as schedulers (Quartz or cron, to name a few). A scheduler usually triggers the launch of Spring Batch processes, either by accessing the Spring Batch runtime directly if it can (Quartz, for example, because it is a Java solution) or by launching a dedicated JVM process (cron, for example). Sometimes a scheduler launches batch processes in sequence: first process A, and then process B if A succeeded or process C if A failed. The scheduler can use files generated by the processes, or their exit codes, to organize the sequence. Spring Batch is also capable of orchestrating such sequences: Spring Batch jobs are made of steps, and the sequence of steps can easily be configured with the XML namespace or Java annotations (this is covered in chapter 10). This is where Spring Batch and a scheduler overlap.
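The conditional sequence described in this sidebar (A, then B on success or C on failure) can be sketched in a few lines of plain Java. This is only an illustration of the decision logic: each IntSupplier is a hypothetical stand-in for "launch a batch process and return its exit code", and the sketch assumes the common convention that exit code 0 means success. It uses neither a scheduler API nor Spring Batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class ExitCodeSequenceSketch {

    // Runs process A, then B if A exited with 0 (success), or C otherwise.
    // Returns the names of the processes that were launched, in order.
    static List<String> runSequence(IntSupplier a, IntSupplier b, IntSupplier c) {
        List<String> launched = new ArrayList<>();
        launched.add("A");
        int exitCode = a.getAsInt();
        if (exitCode == 0) {
            launched.add("B");
            b.getAsInt();
        } else {
            launched.add("C");
            c.getAsInt();
        }
        return launched;
    }

    public static void main(String[] args) {
        System.out.println(runSequence(() -> 0, () -> 0, () -> 0)); // prints "[A, B]"
        System.out.println(runSequence(() -> 1, () -> 0, () -> 0)); // prints "[A, C]"
    }
}
```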

This ends our discovery of the core features of Spring Batch. You've seen how Spring Batch can free you from cumbersome technical code such as I/O handling and let you focus on the business code. In the next section, we explore other Spring Batch features through use cases. These features are mainly about making your batch applications more robust and scalable.

1.3 Use cases

In the very first section of this chapter, we covered the specificities of batch applications: they handle large amounts of data through automatic processing and, as such, must be very robust and reliable.

Licensed to Pedro Rodriguez <prodriguez@opnworks.com>


Spring Batch provides strong foundations for handling these specificities, especially in the way it runs batch processes. This is what we can call the runtime side of Spring Batch's features. To show you in which situations Spring Batch can be useful for your batch applications, we're going to look at different runtime scenarios (transaction management, error handling, and scaling batch applications) and how Spring Batch handles them.

1.3.1 Handling transactions

In read/write scenarios, Spring Batch can manage transactions for you: any operation on a database runs inside a transaction. Spring Batch creates the transaction and decides whether it should be committed (on success) or rolled back (on failure). This is valuable because your code won't be cluttered with transaction management. Moreover, because Spring Batch's transaction management builds on top of Spring's, it supports native database transactions, JTA, Hibernate, and so on. You can therefore switch from one transaction management technology to another without any impact on your code.

Another benefit of letting Spring Batch drive transactions is that it can do so in a batch-oriented way, which helps you handle large volumes of data. When doing a bulk of inserts into a database, you usually don't want a single transaction spanning all of them: if an error occurs, all the inserts are lost, and the database is forced to maintain a large rollback segment. You don't want one transaction per insert either: transactions aren't cheap, and doing so can dramatically degrade performance. The best strategy is usually to handle records in… batches! You handle 10, 100, or 1,000 records in one transaction; the number of records per transaction is called the batch size. Doing so is not difficult, but doing it for all your read/write operations quickly becomes cumbersome, especially when it comes to handling errors (more on this later!). For read/write scenarios, Spring Batch lets you set a batch size, as shown in the following snippet:

<batch:chunk reader="reader" writer="writer" commit-interval="100" />

By setting the commit-interval attribute to 100, we tell Spring Batch to collect 100 records from the reader, send them to the writer as one chunk, and commit the transaction that covers this chunk. Externalizing the batch size is valuable because there's no “best” value for this setting: it depends on many factors, such as the writing instructions, the data, or the database. Being able to set the batch size without touching the code simplifies tuning the batch processes during performance tests.
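To make the role of the batch size concrete, the following sketch shows the kind of loop you would otherwise write by hand. It is a simplified illustration, not Spring Batch's implementation: processInChunks is a hypothetical helper, and each call to chunkWriter merely stands for "write this chunk and commit one transaction" (no real transaction is opened).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class ChunkLoopSketch {

    // Groups the items into chunks of commitInterval and hands each chunk to
    // chunkWriter. Returns the number of chunks, that is, the number of commits
    // a transactional version of this loop would perform.
    static <T> int processInChunks(Iterator<T> reader, int commitInterval,
                                   Consumer<List<T>> chunkWriter) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        int commits = 0;
        List<T> chunk = new ArrayList<>(commitInterval);
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == commitInterval) {
                chunkWriter.accept(chunk);
                commits++;
                chunk = new ArrayList<>(commitInterval);
            }
        }
        if (!chunk.isEmpty()) { // last, partially filled chunk
            chunkWriter.accept(chunk);
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            records.add(i);
        }
        List<Integer> chunkSizes = new ArrayList<>();
        int commits = processInChunks(records.iterator(), 100,
                chunk -> chunkSizes.add(chunk.size()));
        System.out.println(commits + " commits, chunk sizes " + chunkSizes);
        // prints "3 commits, chunk sizes [100, 100, 50]"
    }
}
```

With 250 records and a batch size of 100, the loop commits three times instead of once or 250 times, which is the trade-off discussed above.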

Committing transactions is what happens on sunny days, but how does Spring Batch handle errors? Errors can affect the transaction but also the whole batch process.

1.3.2 Handling errors

Batch processes handle a lot of data automatically, and many things can go wrong: incorrect formats in input files, violations of database constraints, bugs, and so forth. Usually, a batch process is not an all-or-nothing operation: you don't want to stop the whole process because of a tiny error. This is one of the specificities that can make batch applications really tricky to write: you must foresee errors and handle them.

Imagine reading records from a flat file before inserting them into a database: if a line doesn't respect the format, what should we do? By default, Spring Batch throws an exception and stops the process. This default behavior needs no dedicated configuration: it is what you get from a chunk declaration without any skip settings, such as the one shown in section 1.3.1.


But perhaps you can live with an incorrect line, so why not skip it and let the show go on? If you know the types of exception that the reader can throw in case of incorrect data, you can ask Spring Batch to skip the item and read the next one, as shown in the following snippet:

<batch:chunk reader="reader" writer="writer"
             commit-interval="100" skip-limit="5">
  <batch:skippable-exception-classes>
    <batch:include class="org.springframework.batch.item.file.FlatFileParseException"/>
  </batch:skippable-exception-classes>
</batch:chunk>

The previous example implies that the reader can throw a FlatFileParseException when something goes wrong, and in this case we want Spring Batch simply to ignore the exception and keep reading. Any other exception would make Spring Batch stop the batch process. Note the skip-limit attribute set to 5: it tells Spring Batch to tolerate at most 5 skipped records and to stop the process if one more record would have to be skipped, meaning you can live with incorrect lines, but not too many.
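The skip logic can be pictured with the hand-written loop below. It is a sketch under stated assumptions rather than Spring Batch's code: LineParseException is a hypothetical exception that plays the role of a parse error, readAll is a hypothetical helper, and the limit is interpreted as the maximum number of skips that are tolerated before the run fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipLimitSketch {

    // Hypothetical exception standing for a parse error on one input line.
    static class LineParseException extends RuntimeException {
        LineParseException(String line) {
            super("Cannot parse line: " + line);
        }
    }

    // Parses one line as an integer, signalling bad input with LineParseException.
    static int parse(String line) {
        try {
            return Integer.parseInt(line.trim());
        } catch (NumberFormatException e) {
            throw new LineParseException(line);
        }
    }

    // Parses every line and skips the ones that cannot be parsed. Fails with an
    // IllegalStateException when one more line than skipLimit would be skipped.
    static List<Integer> readAll(List<String> lines, int skipLimit) {
        List<Integer> items = new ArrayList<>();
        int skipped = 0;
        for (String line : lines) {
            try {
                items.add(parse(line));
            } catch (LineParseException e) {
                if (skipped >= skipLimit) {
                    throw new IllegalStateException(
                            "Skip limit of " + skipLimit + " exceeded", e);
                }
                skipped++; // tolerated: ignore this line and keep reading
            }
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(readAll(Arrays.asList("1", "oops", "2"), 5)); // prints "[1, 2]"
    }
}
```

Only the exception type named as skippable is tolerated here; any other exception propagates and ends the run, as it does in the configuration above.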

And what happens if Spring Batch has no other choice than to stop the process? A common need is to restart the job exactly where it failed, once the cause of the failure has been found and fixed. Indeed, it would be a pity to reprocess millions of records when only one of the last ones couldn't be processed. The good news is that Spring Batch can store in a database all the information about the batch processes it runs. We'll see in chapter 3 what kinds of information are stored. The point is that Spring Batch can use this information to restart a batch process exactly where it stopped. Storing batch execution data is also useful for monitoring, because you can plug a user interface on top of the database to browse the data and detect problems. We'll study monitoring Spring Batch applications in chapter 12 and learn more about error handling with Spring Batch in chapter 9.
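The idea behind restarting where a process stopped can be sketched as follows. Everything here is a simplified assumption made for illustration: ExecutionStore is a hypothetical in-memory stand-in for the database in which execution data would be kept, and the position is saved after every record rather than after every chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class RestartSketch {

    // Hypothetical stand-in for persistent execution data (a database in practice).
    static final class ExecutionStore {
        private final Map<String, Integer> nextIndexByJob = new HashMap<>();

        int nextIndex(String jobName) {
            return nextIndexByJob.getOrDefault(jobName, 0);
        }

        void save(String jobName, int nextIndex) {
            nextIndexByJob.put(jobName, nextIndex);
        }
    }

    // Processes records starting at the saved position and records progress,
    // so that a later run resumes after the last successfully processed record.
    static void run(String jobName, List<String> records, ExecutionStore store,
                    Consumer<String> processor) {
        for (int i = store.nextIndex(jobName); i < records.size(); i++) {
            processor.accept(records.get(i)); // may throw and abort this run
            store.save(jobName, i + 1);       // remember where to resume
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4");
        ExecutionStore store = new ExecutionStore();
        List<String> processed = new ArrayList<>();
        try {
            run("import", records, store, r -> {
                if (r.equals("r3")) {
                    throw new IllegalStateException("bad record " + r);
                }
                processed.add(r);
            });
        } catch (IllegalStateException e) {
            System.out.println("First run failed: " + e.getMessage());
        }
        // Second run, once the cause is fixed: resumes at r3 instead of r1.
        run("import", records, store, processed::add);
        System.out.println(processed); // prints "[r1, r2, r3, r4]"
    }
}
```

The records r1 and r2 are processed only once, even though the job ran twice.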

We've seen how Spring Batch handles robustness and reliability; now let's see how it can help scale batch processes when they face performance issues.

1.3.3 Scaling batch processes

Batch processes usually have a window they can execute in, but sometimes this window gets too small and you need to make your batch processes execute faster. Spring Batch provides support for executing your batch jobs in a single JVM process or across several JVM processes. There are also different strategies to choose from for executing the steps of a batch (splitting chunk writes, executing steps in parallel). We won't cover all the combinations here and give only an overview, because chapter 13 is dedicated to the topic.

In a read/write scenario, a first scaling strategy is to split the chunk writes: you can share them between several threads in the same process. This is especially useful when you're running on multi-core hardware. Figure 1.5 illustrates splitting chunk writes between threads.
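As a rough picture of sharing chunk writes between threads, the sketch below uses a plain java.util.concurrent thread pool. It is an assumption-laden illustration, not the mechanism Spring Batch uses: writeChunksInParallel is a hypothetical helper, the chunk writer must be thread-safe, and the order in which chunks are written is not guaranteed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class ParallelChunkSketch {

    // Splits items into chunks and writes the chunks on a fixed pool of threads.
    // Waits for all writes to finish and returns the number of chunks written.
    static <T> int writeChunksInParallel(List<T> items, int chunkSize, int threads,
                                         Consumer<List<T>> chunkWriter) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < items.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, items.size());
                List<T> chunk = items.subList(start, end); // read-only view
                pending.add(pool.submit(() -> chunkWriter.accept(chunk)));
            }
            for (Future<?> write : pending) {
                write.get(); // rethrows a failure from the writer thread
            }
            return pending.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while writing chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A chunk write failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            records.add(i);
        }
        AtomicInteger written = new AtomicInteger(); // thread-safe counter
        int chunks = writeChunksInParallel(records, 100, 4,
                chunk -> written.addAndGet(chunk.size()));
        System.out.println(chunks + " chunks, " + written.get() + " records written");
        // prints "3 chunks, 250 records written"
    }
}
```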

