Kettle 3.0 User Manual: An Introductory Guide to the ETL Tool

Updated 2024-07-19 · 10.2 MB PDF
"Kettle3.0用户手册" Kettle是一款强大的开源ETL(Extract, Transform, Load)工具,由Pentaho公司开发,主要用于数据的抽取、转换和加载。Kettle3.0用户手册是深圳市神盾信息技术有限公司提供的详尽指南,涵盖了从基础到高级的各种功能,帮助用户理解和操作Kettle进行数据处理。 1. **Kettle介绍** - **什么是Kettle**:Kettle是一个图形化的数据集成平台,它允许用户通过拖拽方式构建数据处理流程,包括数据清洗、转换和加载到各种数据存储系统。 - **Kettle的安装**:手册提供了安装步骤,包括下载、配置和启动Kettle的主要组件如Spoon(图形化开发环境)。 - **运行Spoon**:Spoon是Kettle的主要界面,用户可以通过它设计、测试和运行转换和任务。 - **资源库**:Kettle支持资源库管理,用于存储和版本控制转换和任务,便于团队协作。 - **资源库自动登录**:手册详细介绍了如何设置和使用资源库的自动登录功能。 2. **创建转换和任务** - **转换**:转换是Kettle中的数据处理流程,包含一系列步骤,每个步骤负责特定的数据处理任务。 - **任务**:任务是定时或触发执行的脚本,通常用于调度和监控转换。 3. **数据库连接** - **描述**:Kettle提供与多种数据库系统的连接能力,用户可以设置和管理数据库连接。 - **设置窗口**和**选项**:手册指导用户如何配置数据库连接参数,如主机名、端口、用户名和密码。 - **数据库用法**:讨论了如何在Kettle中使用这些连接进行数据操作。 4. **SQL编辑器** - **描述**:内置的SQL编辑器让用户能方便地编写和执行SQL查询,以预览或操作数据库数据。 - **局限性**:可能列出了一些编辑器的功能限制。 5. **数据库浏览器** - **描述**:这个功能允许用户浏览数据库结构,查看表、视图等对象。 - **屏幕截图**:手册可能包含实际界面的示例图片。 6. **节点连接(Hops)** - **描述**:节点连接定义了转换中步骤之间的数据流路径。 - **转换连接**和**任务连接**:分别说明了在转换和任务中如何设置和管理这些连接。 - **创建和拆分连接**,以及**连接颜色**:解释了如何操作连接以满足不同需求。 7. **变量** - **变量使用**:Kettle支持用户自定义变量,手册会展示如何声明和使用它们。 - **变量范围**:分为环境变量、Kettle变量和内部变量,各有其作用域和用法。 8. **转换设置** - **描述**:这部分内容涉及转换的全局设置,如运行时选项。 - **选项**:提供了详细的设置参数及其影响。 9. **转换步骤** - **描述**:Kettle包含大量预定义的转换步骤,用于执行各种数据处理任务。 - **运行步骤的多个副本**和**分发或复制**:讨论了如何配置步骤以并行处理数据。 - **错误处理**:提供了处理转换执行中出现错误的方法。 10. **其他功能** - 可能还包括更多高级特性,如作业(Job)、插件管理、日志和监控等。 Kettle3.0用户手册是学习和使用Kettle的宝贵资源,它详细阐述了各个方面的功能和操作,对于数据工程师和分析师来说,是实现高效数据集成和转换的重要参考资料。
Uploaded 2012-02-09
Contents

Kettle 3.0 User Manual

1. Introduction to Kettle
  1.1 What is Kettle
  1.2 Installing Kettle
  1.3 Running Spoon
  1.4 Repository
  1.5 Repository auto-login
  1.6 Definitions
    1.6.1 Transformation
    1.6.2 Job
  1.7 Options
    1.7.1 General tab
    1.7.2 Look & Feel tab
  1.8 Searching metadata
  1.9 Setting environment variables
2. Creating a transformation or job
3. Database Connections
  3.1 Description
  3.2 Settings window
  3.3 Options
  3.4 Database usage
4. SQL Editor
  4.1 Description
  4.2 Screenshot
  4.3 Limitations
5. Database Explorer
  5.1 Screenshot
  5.2 Description
6. Hops
  6.1 Description
  6.2 Transformation hops
  6.3 Job hops
  6.4 Screenshot
  6.5 Creating a hop
  6.6 Splitting a hop
  6.7 Transformation hop colors
7. Variables
  7.1 Variable usage
  7.2 Variable scope
    7.2.1 Environment variables
    7.2.2 Kettle variables
    7.2.3 Internal variables
8. Transformation Settings
  8.1 Description
  8.2 Screenshot
  8.3 Options
  8.4 Other
9. Transformation Steps
  9.1 Description
  9.2 Running multiple copies of a step
  9.3 Distribute or copy
  9.4 Common error handling
  9.5 Apache Virtual File System (VFS) support
  9.6 Transformation step types
    9.6.1 Text File Input (Text Input)
    9.6.2 Table Input
    9.6.3 Get System Info
    9.6.4 Generate Rows
    9.6.5 De-serialize from file (formerly Cube Input)
    9.6.6 XBase Input
    9.6.7 Excel Input
    9.6.8 XML Input
    9.6.9 Get File Names
    9.6.10 Text File Output
    9.6.11 Table Output
    9.6.12 Insert/Update
    9.6.13 Update
    9.6.14 Delete
    9.6.15 Serialize to file (formerly Cube Output)
    9.6.16 XML Output
    9.6.17 Excel Output
    9.6.18 Microsoft Access Output
    9.6.19 Database Lookup
    9.6.20 Stream Lookup
    9.6.21 Call DB Procedure
    9.6.22 HTTP Client
    9.6.23 Select Values
    9.6.24 Filter Rows
    9.6.25 Sort Rows
    9.6.26 Add Sequence
    9.6.27 Dummy (do nothing)
    9.6.28 Row Normaliser
    9.6.29 Split Fields
    9.6.30 Unique Rows
    9.6.31 Group By
    9.6.32 Null If
    9.6.33 Calculator
    9.6.34 Add XML (XML Add)
    9.6.35 Add Constants
    9.6.36 Row Denormaliser
    9.6.37 Row Flattener (Flattener)
    9.6.38 Value Mapper
    9.6.39 Blocking Step
    9.6.40 Join Rows (Cartesian Product)
    9.6.41 Database Join
    9.6.42 Merge Rows
    9.6.43 Sorted Merge
    9.6.44 Merge Join
    9.6.45 JavaScript Value
    9.6.46 Modified JavaScript Value
    9.6.47 Execute SQL Script
    9.6.48 Dimension Lookup/Update
    9.6.49 Combination Lookup/Update
    9.6.50 Mapping
    9.6.51 Get Rows from Result
    9.6.52 Copy Rows to Result
    9.6.53 Set Variable
    9.6.54 Get Variable
    9.6.55 Get Files from Result
    9.6.56 Set Files in Result
    9.6.57 Injector
    9.6.58 Socket Reader
    9.6.59 Socket Writer
    9.6.60 Aggregate Rows
    9.6.61 Streaming XML Input
    9.6.62 Abort
    9.6.63 Oracle Bulk Loader
10. Job Settings
  10.1 Description
  10.2 Screenshot
  10.3 Options
  10.4 Other
11. Job Entries
  11.1 Description
  11.2 Job entry types
    11.2.1 Special job entries
    11.2.2 Transformation
    11.2.3 Job
    11.2.4 Shell
    11.2.5 Mail
    11.2.6 SQL
    11.2.7 FTP
    11.2.8 Table Exists
    11.2.9 File Exists
    11.2.10 Evaluation (JavaScript)
    11.2.11 SFTP
    11.2.12 HTTP
    11.2.13 Create file
    11.2.14 Delete file
    11.2.15 Wait for file
    11.2.16 File compare
    11.2.17 Put files with secureFTP
    11.2.18 Ping a host
    11.2.19 Wait for
    11.2.20 Display Msgbox info
    11.2.21 Abort job
    11.2.22 XSL transformation
    11.2.23 Zip files
12. Graphical View
  12.1 Description
  12.2 Adding steps or job entries
    12.2.1 Creating a step by drag and drop
    12.2.2 Creating a step from the step type tree
    12.2.3 Creating a step at the position you want
  12.3 Hiding steps
  12.4 Transformation step options (right-click context menu)
    12.4.1 Edit step
    12.4.2 Edit step description
    12.4.3 Data movement
    12.4.4 Copy
    12.4.5 Copy step
    12.4.6 Delete step
    12.4.7 Show input fields
    12.4.8 Show output fields
  12.5 Job entry options (right-click context menu)
    12.5.1 Open transformation/job
    12.5.2 Edit job entry
    12.5.3 Edit job entry description
    12.5.4 Copy job entry
    12.5.5 Copy selected job entries to clipboard
    12.5.6 Align/distribute
    12.5.7 Detach entry
    12.5.8 Delete all copies of the job entry
  12.6 Adding hops
  12.7 Running a transformation
  12.8 Screenshot
  12.9 Execution options
    12.9.1 Where to execute
    12.9.2 Preview
    12.9.3 Use safe mode
    12.9.4 Log level
    12.9.5 Replay date
    12.9.6 Arguments
    12.9.7 Variables
  12.10 Setting up remote or slave servers
    12.10.1 Overview
    12.10.2 Screenshot
13. Logging
  13.1 Logging description
  13.2 Screenshot
  13.3 Log grid
    13.3.1 Transformation log grid
    13.3.2 Job log grid
  13.4 Buttons
    13.4.1 Transformation buttons
    13.4.2 Job buttons
14. Grids
  14.1 Description
  14.2 Functions
  14.3 Navigation
15. Repository Explorer
  15.1 Description
  15.2 Screenshot
  15.3 Right-click functions
  15.4 Backup / repository
16. Shared objects (Share objects)

Technical document, Kettle 3.0 User Manual. © 深圳市神盾信息技术有限公司, 2008. 202 pages.