Extracting data into a partitioned table with DataX
Date: 2024-05-05 22:17:26
The steps for extracting data into a partitioned table with DataX are as follows:
1. Create the partitioned table, for example:
```sql
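-- MySQL syntax, matching the mysqlwriter used in the job below
CREATE TABLE my_table (
  id int,
  name varchar(50),
  phone varchar(20),
  email varchar(50),
  dt varchar(8)  -- partition column, e.g. '20220101'
)
PARTITION BY RANGE COLUMNS (dt) (
  PARTITION p_before_2023 VALUES LESS THAN ('20230101'),
  PARTITION p_max VALUES LESS THAN (MAXVALUE)
);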
```
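2. In the DataX job configuration file, set the partition parameter, for example:
```json
{
  "job": {
    "setting": {
      "speed": {
        "byte": 1048576
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "123456",
            "column": ["id", "name", "phone", "email", "dt"],
            "where": "dt = '${bizdate}'",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://localhost:3306/my_database"],
                "table": ["source_table"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "123456",
            "writeMode": "insert",
            "column": ["id", "name", "phone", "email", "dt"],
            "preSql": ["DELETE FROM my_table WHERE dt = '${bizdate}'"],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://localhost:3306/my_database",
                "table": ["my_table"]
              }
            ]
          }
        }
      }
    ]
  }
}
```
Note: the partition column dt must be listed in the column arrays of both the reader and the writer. The ${bizdate} placeholder is replaced at launch with the value passed on the command line, so the reader's where filter and the writer's preSql target the same dt value, and rerunning the job for that date reloads the partition instead of duplicating rows. The reader table source_table is a placeholder for the actual source table, which is assumed to have a dt column; my_table is the partitioned target. In mysqlreader, jdbcUrl is an array; in mysqlwriter, it is a single string and preSql is an array at the parameter level.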
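To preview what the job will execute for a given date, the placeholder substitution can be imitated outside DataX. The following is a minimal Python sketch, not DataX's own code: the helper name render_placeholders is hypothetical, and it assumes placeholders take the ${name} form used in the job file.

```python
import json
import re


def render_placeholders(text: str, params: dict) -> str:
    """Replace each ${name} with params[name]; unknown names are left untouched."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: str(params.get(m.group(1), m.group(0))),
        text,
    )


# A fragment of the job file, reduced to the part that carries the placeholder.
job_fragment = {
    "writer": {
        "name": "mysqlwriter",
        "parameter": {
            "preSql": ["DELETE FROM my_table WHERE dt = '${bizdate}'"],
        },
    }
}

# Render the whole JSON text, then parse it back to check it is still valid JSON.
rendered = json.loads(
    render_placeholders(json.dumps(job_fragment), {"bizdate": "20220101"})
)
print(rendered["writer"]["parameter"]["preSql"][0])
```

Printing the rendered preSql before a run is a cheap way to catch a misspelled parameter name, since an unmatched placeholder stays in the statement verbatim.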
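3. When running DataX, pass the partition parameter, for example:
```shell
python datax.py job.json -p "-Dbizdate=20220101"
```
Note: the parameter name (bizdate) must match the ${bizdate} placeholder used in the job file, and it is passed with a -D prefix. Its value is what ends up in the partition column dt; the parameter name itself does not need to match the column name.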
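When several dates need to be loaded, the same command can be generated once per day. The sketch below only builds the argument lists, assuming the -p "-Dbizdate=..." form and that datax.py and job.json are in the working directory; the function name datax_commands is hypothetical.

```python
from datetime import date, timedelta


def datax_commands(start: date, end: date, job_file: str = "job.json"):
    """Return one DataX command (as an argv list) per day in [start, end]."""
    commands = []
    day = start
    while day <= end:
        bizdate = day.strftime("%Y%m%d")  # same yyyymmdd format as the dt column
        commands.append(
            ["python", "datax.py", job_file, "-p", f"-Dbizdate={bizdate}"]
        )
        day += timedelta(days=1)
    return commands


for argv in datax_commands(date(2022, 1, 1), date(2022, 1, 3)):
    print(" ".join(argv))
    # To actually launch the job: subprocess.run(argv, check=True)
```

Keeping the commands as lists rather than shell strings avoids quoting problems if they are later handed to subprocess.run.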
4. Confirm that the data has been extracted into the partitioned table, for example:
```sql
SELECT * FROM my_table WHERE dt='20220101';
```
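The delete-then-load pattern and the verification query can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 as a stand-in (SQLite has no partitioning, so dt is an ordinary column here), and the sample rows are made up for illustration. It shows that loading the same dt twice leaves a single copy of the rows.

```python
import sqlite3


def load_partition(conn, dt, rows):
    """Delete existing rows for dt, then insert the new ones (preSql, then write)."""
    with conn:  # commit both statements together
        conn.execute("DELETE FROM my_table WHERE dt = ?", (dt,))
        conn.executemany(
            "INSERT INTO my_table (id, name, phone, email, dt) VALUES (?, ?, ?, ?, ?)",
            [(*row, dt) for row in rows],
        )


def count_partition(conn, dt):
    """Row count for one dt value, the check from step 4 reduced to COUNT(*)."""
    cur = conn.execute("SELECT COUNT(*) FROM my_table WHERE dt = ?", (dt,))
    return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INT, name TEXT, phone TEXT, email TEXT, dt TEXT)"
)

# Made-up sample rows, without the dt value (it is added per load).
rows = [
    (1, "alice", "555-0100", "alice@example.com"),
    (2, "bob", "555-0101", "bob@example.com"),
]

load_partition(conn, "20220101", rows)
load_partition(conn, "20220101", rows)  # rerun for the same date
print(count_partition(conn, "20220101"))
```

Comparing this count against the row count of the source for the same dt is a stricter check than eyeballing SELECT * output.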