DataX: extracting fields from multiple tables and combining them
Date: 2023-07-05 21:32:12
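DataX does not ship a join plugin. A job moves data through one reader and one writer, so several separate readers cannot be merged inside a job. To extract fields from multiple tables and combine them, the usual approach is to let the source database perform the join through the reader's `querySql` parameter:
1. Configure a single reader (for example `mysqlreader`) and put the join statement in `connection[].querySql`. When `querySql` is set, the reader ignores `table`, `column` and `where`.
2. Configure the writer with the target table and its column list, in the same order as the columns returned by the query.
3. If the tables live on different database instances, one SQL statement cannot join them. First copy one table into the other table's database with a separate job, then run the join job.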
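The steps above can be sketched in Python by generating the job files instead of hand-writing JSON. This is a minimal sketch, not part of DataX: the helper `mysql_job`, the staging table `table2_stage` and the credential constants are hypothetical, and the code only fills the documented `mysqlreader`/`mysqlwriter` parameters (`connection`, `querySql`, `column`, `writeMode`, `preSql`).

```python
import json

USER, PASSWORD = "root", "123456"  # credentials copied from the example config


def mysql_job(reader_connection, dst_url, dst_table, columns, reader_columns=None, channel=1):
    """Build a DataX job with one mysqlreader and one mysqlwriter (hypothetical helper)."""
    reader_param = {"username": USER, "password": PASSWORD, "connection": [reader_connection]}
    if reader_columns is not None:
        # Table mode: mysqlreader reads these columns from connection[].table.
        reader_param["column"] = list(reader_columns)
    return {
        "job": {
            "setting": {"speed": {"channel": channel}},
            # A job holds a single reader/writer pair.
            "content": [{
                "reader": {"name": "mysqlreader", "parameter": reader_param},
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "username": USER,
                        "password": PASSWORD,
                        "writeMode": "insert",
                        # Same order as the columns the reader returns.
                        "column": list(columns),
                        "preSql": [f"TRUNCATE TABLE {dst_table}"],
                        "connection": [{"jdbcUrl": dst_url, "table": [dst_table]}],
                    },
                },
            }],
        }
    }


SRC_A = "jdbc:mysql://192.168.1.1:3306/test"
SRC_B = "jdbc:mysql://192.168.1.2:3306/test"

# Step 3 (only when the tables are on different instances): copy table2 next to table1.
# "table2_stage" is a hypothetical staging table that must already exist on 192.168.1.1.
copy_job = mysql_job(
    {"table": ["table2"], "jdbcUrl": [SRC_B]},
    SRC_A, "table2_stage", ["id", "age"], reader_columns=["id", "age"],
)

# Steps 1-2: MySQL performs the join; with querySql the reader needs no table/column/where.
join_job = mysql_job(
    {"querySql": ["SELECT t1.id, t1.name, t2.age FROM table1 t1 "
                  "LEFT JOIN table2_stage t2 ON t1.id = t2.id AND t2.age > 18 "
                  "WHERE t1.id > 100"],
     "jdbcUrl": [SRC_A]},
    SRC_A, "table3", ["id", "name", "age"],
)

print(json.dumps(copy_job, indent=2))
print(json.dumps(join_job, indent=2))
```

Save each printed document as its own file and run them in order with the launcher shipped in the DataX distribution, e.g. `python bin/datax.py copy_job.json` followed by `python bin/datax.py join_job.json`.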
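Example configuration (this assumes `table1` and `table2` can be queried from the same MySQL database, `test` on 192.168.1.1):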
```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 2
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "123456",
            "connection": [
              {
                "querySql": [
                  "SELECT t1.id, t1.name, t2.age FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id AND t2.age > 18 WHERE t1.id > 100"
                ],
                "jdbcUrl": ["jdbc:mysql://192.168.1.1:3306/test"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "123456",
            "writeMode": "insert",
            "column": ["id", "name", "age"],
            "preSql": ["TRUNCATE TABLE table3"],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://192.168.1.1:3306/test",
                "table": ["table3"]
              }
            ]
          }
        }
      }
    ]
  }
}
```
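Here the join runs inside MySQL: `mysqlreader` executes the `querySql` statement, and DataX only transfers the resulting rows into `table3`. Because `querySql` is set, the reader needs no `table`, `column` or `where`. The `age > 18` filter sits in the `ON` clause, so `table1` rows without a qualifying match in `table2` are still written with `age` as NULL, which is what a left join should produce. `preSql` empties `table3` once before the write, and the writer's `column` list matches the order of the `SELECT` list.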
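To see why the filter on `table2` belongs in `ON` rather than `WHERE`, here is a small sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL. The LEFT JOIN behaviour it shows is standard SQL, but the sample rows are invented for illustration.

```python
import sqlite3

# In-memory stand-ins for table1/table2; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, age INTEGER);
    INSERT INTO table1 VALUES (101, 'a'), (102, 'b'), (103, 'c');
    INSERT INTO table2 VALUES (101, 30), (102, 15);
""")

# Filter inside ON: every table1 row survives; non-qualifying matches give age = NULL.
in_on = conn.execute("""
    SELECT t1.id, t1.name, t2.age FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id AND t2.age > 18
    WHERE t1.id > 100 ORDER BY t1.id
""").fetchall()

# Same filter in WHERE: NULL ages fail the test, so the LEFT JOIN behaves like an INNER JOIN.
in_where = conn.execute("""
    SELECT t1.id, t1.name, t2.age FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id
    WHERE t1.id > 100 AND t2.age > 18 ORDER BY t1.id
""").fetchall()

print(in_on)     # [(101, 'a', 30), (102, 'b', None), (103, 'c', None)]
print(in_where)  # [(101, 'a', 30)]
```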