Exploring the Sequential File stage
To explore the Sequential File stage:
1. In the sample job, double-click the Sequential File stage that is named GlobalCo_billTo_flat. The
stage editor opens to the Properties tab of the Output page. All parallel job stages have properties
tabs. You use the properties tab to specify the actions that the stage performs when the job is run.
2. Look at the File property under the Source category. You use this property to specify the file that the
stage will read when the job runs. In the sample job, the File property points to a file called
GlobalCo_BillTo.csv. You specify the directory that contains this file when you run the job. The name
of the directory has been defined as a job parameter named #tutorial_direct#. The # characters show
that the name is a job parameter. Job parameters are used so that variable information (for
example, a file name or directory name) can be specified when the job runs rather than when the job is
designed.
3. Look at the First Line is Column Names property under the Options category. In the sample job,
this property is set to True because the first line of the GlobalCo_BillTo.csv file contains the names of
the columns in the file. The remaining properties have default values.
4. Click the Format tab. The Format tab looks similar to the Properties tab, but the properties that
the job designer sets here describe the format of the flat file that the stage reads. In this case the file
is comma-delimited, which means that each field within a row is separated by a comma character.
The Format tab also specifies that the file has DOS line endings. This setting means that the file can
be read even when the file resides on a UNIX system.
5. Click the Columns tab. The Columns tab is where the column metadata for the stage is defined. The
column metadata defines the data that will flow down the link to the Data Set stage when the job
runs. The GlobalCo_BillTo.csv file contains many columns. All of these columns have the data type
VarChar. As you work through the tutorial, you will apply stricter data typing to these columns to
cleanse the data.
6. Click the View Data tab in the top right corner of the stage editor window.
7. In the Value field of the Resolve Job Parameter window, specify the name of the directory in which
the tutorial data was installed and click OK. You must specify the directory path whenever you view
data or run the job.
8. In the Data Browser window, click OK. A window opens that shows the first 100 rows of the data
that the GlobalCo_BillTo.csv file contains (100 rows is the default setting, but you can change it).
9. Click Close to close the Data Browser window.
10. Click OK to close the Sequential File stage editor.
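The behavior that these steps describe can be sketched outside the product. The following Python example is an illustration only, not how the stage is implemented. It assumes that the File property value has the form #tutorial_direct#/GlobalCo_BillTo.csv (the / separator is an assumption), and the names resolve_parameters and preview_rows are hypothetical. It replaces each #name# token with a value supplied at run time, then reads a comma-delimited stream whose first line contains the column names, keeps every field as a string (comparable to the VarChar columns), and returns at most 100 rows.

```python
import csv
import io
import itertools
import re


def resolve_parameters(template, values):
    """Replace each #name# token in template with the value supplied for it.

    Hypothetical helper: mimics supplying job parameter values at run time.
    """
    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for job parameter {name!r}")
        return values[name]

    return re.sub(r"#(\w+)#", lookup, template)


def preview_rows(stream, limit=100):
    """Read a comma-delimited stream whose first line holds the column names.

    Returns (column_names, rows). Every field is kept as a string, and at
    most `limit` data rows are returned.
    """
    reader = csv.reader(stream)          # the default delimiter is a comma
    column_names = next(reader)          # first line is column names
    rows = [dict(zip(column_names, row))
            for row in itertools.islice(reader, limit)]
    return column_names, rows


if __name__ == "__main__":
    # Assumed property value and a made-up directory, for illustration only.
    path = resolve_parameters("#tutorial_direct#/GlobalCo_BillTo.csv",
                              {"tutorial_direct": "/data/tutorial"})
    print(path)

    # Made-up sample content with DOS (CR LF) line endings.
    sample = io.StringIO("ID,NAME\r\n1,Acme\r\n2,Globex\r\n", newline="")
    names, rows = preview_rows(sample)
    print(names, rows)
```

To read a real file instead of the sample string, open it with open(path, newline="") so that the csv module handles the DOS line endings itself, regardless of the platform on which the file resides.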
Exploring the Data Set stage
To explore the Data Set stage:
1. In the sample job, double-click the Data Set stage that is named GlobalCoBillTo_ds. The stage editor
opens to the Properties tab of the Input page.
2. Look at the File property under the Target category. This property is used to specify the control file
for the data set that the stage will write the data to when the job runs. In the sample job, the File
property points to a file that is named GlobalCo_BillTo.ds. You specify the directory that contains this
file when you run the job. A data set is the internal format for transferring data inside parallel jobs.
Data Set stages are used to land data that will be used by another job.
3. Click the Columns tab. The column metadata for this stage is the same as the column metadata for
the Sequential File stage and defines the data that the job will write to the data set.
4. Click OK to close the stage editor.