DataStage 8 Parallel Job Tutorial: Official Guide

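Updated 2024-08-02 · PDF, 1 MB

"DataStage 8 is a powerful ETL (Extract, Transform, Load) tool from IBM, used for data integration and data warehouse construction. This tutorial is the official guide to parallel job development. It aims to help users understand and master DataStage 8, covering basic operations such as opening and running the sample job, viewing and compiling a job, and running a job and checking its results."

In DataStage 8, users can build complex data processing flows that cleanse and transform data from different sources and load it into target systems. The tutorial is aimed at beginners and uses a series of modules to guide users step by step through the DataStage working environment and its features.

Chapter 1: Introduction. This chapter typically outlines the basic concepts of DataStage 8, its role in data integration, and the concept of a parallel job. Parallel jobs are a key feature of DataStage: they can use multiple processors or cluster resources to speed up data processing.

Chapter 2: Tutorial project goals. This chapter states the goals of the tutorial, which may include understanding the DataStage 8 workflow, mastering job design and management, and learning how to process and migrate data efficiently.

Chapter 3: Module 1, Opening and running the sample job. This part explains in detail how to start the DataStage Designer client, the main tool for job design. The tutorial then guides users through opening the supplied sample job so that they can learn and practice. Users first learn how to open a job, browse its structure, and understand what each component does.

Lesson 1.1: Opening the sample job. This lesson focuses on getting familiar with the Designer interface and on finding and opening the sample job. Users learn how to navigate and read the topology view of the job.

Lesson 1.2: Viewing and compiling the sample job. Users look at the job in detail and learn about two important components, the Sequential File stage and the Data Set stage. The tutorial then demonstrates how to compile the job to make sure that all components are correct.

Lesson 1.3: Running the sample job. Finally, users learn how to run the job and view the results. After the job runs, the tutorial explains how to inspect the output data set to verify that the job worked as expected.

Each lesson ends with a "Lesson checkpoint", which appears to be a short summary that lets users confirm they have mastered the basic skills covered. Through this tutorial, users not only become familiar with the DataStage 8 working environment but also gain hands-on experience, which lays a solid foundation for further study and use of DataStage 8. Later chapters may cover more advanced topics such as error handling, scheduling, performance tuning, and integration with other IBM WebSphere products such as Information Server.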

[Figure: the Designer client with the sample job open; labeled areas: sample job, design area, repository tree, palette]

Lesson checkpoint

In this lesson, you opened your first job.

You learned the following tasks:

• How to start the Designer client
• How to open a job
• Where to find the tutorial objects in the repository tree

Lesson 1.2: Viewing and compiling the sample job

In this lesson, you view the sample job to understand its components. You compile the job to prepare it to run on your system.

The sample job has a Sequential File stage to read data from the flat file and a Data Set stage to write data to the staging area. The two stages are joined by a link. The data that will flow between the two stages on the link was defined when the job was designed. When the job is run, the data will flow down this link.
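The source-to-target shape described above can be pictured with a minimal Python sketch. This is only an analogy, not how DataStage runs jobs: the function names, the sample text, and the column names `CUSTOMER_NUMBER` and `NAME` are hypothetical, and a lazy iterator of rows stands in for the link.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, str]


def sequential_file_stage(text: str) -> Iterator[Row]:
    """Source stage: yield one row per record of comma-delimited text."""
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    for row in reader:
        yield row


def data_set_stage(link: Iterable[Row]) -> List[Row]:
    """Target stage: collect every row that arrives on the link."""
    return list(link)


# Hypothetical sample data; the real file is supplied with the tutorial.
SAMPLE = "CUSTOMER_NUMBER,NAME\r\n100,Acme\r\n101,Globex\r\n"

# The iterator passed between the two functions plays the role of the link.
staged = data_set_stage(sequential_file_stage(SAMPLE))
```

No rows move until the target function consumes the iterator, which loosely mirrors the point that data flows down the link only when the job is run.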

Chapter 3. Module 1: Opening and running the sample job 7

Exploring the Sequential File stage

To explore the Sequential File stage:

1. In the sample job, double-click the Sequential File stage that is named GlobalCo_billTo_flat. The stage editor opens to the Properties tab of the Output page. All parallel job stages have properties tabs. You use the properties tab to specify the actions that the stage performs when the job is run.
2. Look at the File property under the Source category. You use this property to specify the file that the stage will read when the job runs. In the sample job, the File property points to a file called GlobalCo_BillTo.csv. You specify the directory that contains this file when you run the job. The name of the directory has been defined as a job parameter named #tutorial_direct#; the # characters show that the name is a job parameter. Job parameters are used so that variable information (for example, a file name or directory name) can be specified when the job runs rather than when the job is designed.
3. Look at the First Line is Column Names property under the Options category. In the sample job, this property is set to True because the first line of the GlobalCo_BillTo.csv file contains the names of the columns in the file. The remaining properties have default values.
4. Click the Format tab. The Format tab looks similar to the Properties tab, but the properties that the job designer sets here describe the format of the flat file that the stage reads. In this case the file is comma-delimited, which means that each field within a row is separated by a comma character. The Format tab also specifies that the file has DOS line endings. This setting means that the file can be read even when the file resides on a UNIX system.
5. Click the Columns tab. The Columns tab is where the column metadata for the stage is defined. The column metadata defines the data that will flow down the link to the Data Set stage when the job runs. The GlobalCo_BillTo.csv file contains many columns. All of these columns have the data type VarChar. As you work through the tutorial, you will apply stricter data typing to these columns to cleanse the data.
6. Click the View Data tab in the top right corner of the stage editor window.
7. In the Value field of the Resolve Job Parameter window, specify the name of the directory in which the tutorial data was installed and click OK (you have to specify the directory path whenever you view data or run the job).
8. In the Data Browser window, click OK. A window opens that shows the first 100 rows of the data that the GlobalCo_BillTo.csv file contains (100 rows is the default setting, but you can change it).
9. Click Close to close the Data Browser window.
10. Click OK to close the Sequential File stage editor.
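The settings explored in these steps can be imitated in a few lines of Python. This is a rough sketch under stated assumptions, not DataStage's own logic: `resolve_parameters` and `preview` are hypothetical helpers, the template string `#tutorial_direct#/GlobalCo_BillTo.csv` and the directory `/opt/tutorial` are assumed for illustration, and the sample rows are invented. The sketch shows run-time substitution of `#name#` markers, then a preview that treats the first line as column names, splits on commas, accepts DOS line endings, keeps every value as a string (like the VarChar columns), and stops at 100 rows by default.

```python
import csv
import io
import re
from itertools import islice
from typing import Dict, List

# A job parameter is written between two # characters, for example #name#.
PARAM = re.compile(r"#(\w+)#")


def resolve_parameters(template: str, values: Dict[str, str]) -> str:
    """Replace each #name# marker with the value supplied at run time."""
    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for job parameter {name!r}")
        return values[name]

    return PARAM.sub(lookup, template)


def preview(text: str, limit: int = 100) -> List[Dict[str, str]]:
    """Return at most `limit` rows; the first line supplies the column names."""
    # newline="" keeps \r\n intact so the csv module handles DOS line endings.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(islice(reader, limit))


# Assumed template and directory, for illustration only.
path = resolve_parameters(
    "#tutorial_direct#/GlobalCo_BillTo.csv",
    {"tutorial_direct": "/opt/tutorial"},
)

# Invented sample content with DOS line endings.
SAMPLE = "NAME,CITY\r\nAcme,Boston\r\nGlobex,Austin\r\n"
first_row_only = preview(SAMPLE, limit=1)
```

Raising an error for a missing value is a design choice in this sketch; it loosely corresponds to being prompted for the directory each time you view data or run the job.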

Exploring the Data Set stage

To explore the Data Set stage:

1. In the sample job, double-click the Data Set stage that is named GlobalCoBillTo_ds. The stage editor opens in the Properties tab of the Input page.
2. Look at the File property under the Target category. This property is used to specify the control file for the data set that the stage will write the data to when the job runs. In the sample job, the File property points to a file that is named GlobalCo_BillTo.ds. You specify the directory that contains this file when you run the job. A data set is the internal format for transferring data inside parallel jobs. Data Set stages are used to land data that will be used by another job.
3. Click on the Columns tab. The column metadata for this stage is the same as the column metadata for the Sequential File stage and defines the data that the job will write to the data set.
4. Click OK to close the stage editor.

8 Parallel Job Primer

The Data Set stage editor does not have a Format tab because the data set does not require any formatting data. Although the View Data button is available on this tab, there is no data for this stage yet. If you click the View Data button, you will receive a message that no data exists. The data gets created when the job runs.

Compiling the sample job

To compile the sample job:

1. Select File → Compile. The Compile Job window opens. As the job is compiled, the window is updated with messages from the compiler.
2. When the Compile Job window displays a message that the job is compiled, click OK.

The sample job is now compiled and ready to run.

Lesson checkpoint

In this lesson, you explored a simple data extraction job that reads data from a file and writes it to a staging area.

You learned the following tasks:

• How to open stage editors
• How to view the data that a stage represents
• How to compile a job so that it is ready to run

Lesson 1.3: Running the sample job

In this lesson, you use the Director client to run the sample job and to view the log that the job produces as it runs. You also use the Designer client to look at the data set that is written by the sample job.

You run the job from the Director client. The Director client is the operating console. You use the Director client to run and troubleshoot jobs that you are developing in the Designer client. You also use the Director client to run fully developed jobs in the production environment.

You use the job log to debug any errors you receive when you run the job.

Running the job

To run the job:

1. In the Designer client, select Tools → Run Director. Because you are logged in to the tutorial project through the Designer client, you do not need to start the Director from the start menu and log on to the project. In the Director client, the sample job has a status of compiled, which means that the job is ready to run.
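A compiled job can also be started outside the Director client. The sketch below assumes that the `dsjob` command-line client shipped with the DataStage server is installed. It is not part of this lesson, and the project name, job name, and directory value are placeholders rather than values from the tutorial. The hypothetical helper `build_dsjob_run` only assembles the argument list; it does not launch anything.

```python
from typing import Dict, List


def build_dsjob_run(project: str, job: str, params: Dict[str, str]) -> List[str]:
    """Assemble a `dsjob -run` argument list without executing it."""
    argv = ["dsjob", "-run"]
    for name, value in params.items():
        # Each job parameter is passed as -param name=value.
        argv += ["-param", f"{name}={value}"]
    # -jobstatus makes dsjob wait and report the finishing status of the job.
    argv += ["-jobstatus", project, job]
    return argv


# Placeholder names for illustration only.
argv = build_dsjob_run(
    "tutorial_project",
    "sample_job",
    {"tutorial_direct": "/opt/tutorial"},
)
```

Building the list separately keeps the sketch free of side effects; the list could be handed to `subprocess.run` on a machine where `dsjob` is on the path.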

