为什么sqoop数据导入hive时会产生一个.java的文件?
时间: 2023-05-30 19:04:54 浏览: 351
Sqoop是一个数据传输工具,可以将关系型数据库中的数据导入到Hadoop生态系统中的其他存储系统,如Hive。当Sqoop将数据导入到Hive时,它会生成一个Java文件。这个Java文件包含了Hive表的DDL(数据定义语言),以及将数据插入到Hive表中的Java代码。这个Java文件可以用于在Hadoop集群上运行Sqoop任务,也可以手动修改为更适合特定需求的代码。
生成Java文件的目的是为了使Sqoop任务更加灵活和可定制化。Sqoop可以自动生成Java代码,但是如果需要进行更高级的操作,例如自定义映射器或减少导入过程中的数据丢失,就需要手动修改生成的Java代码。通过生成Java文件,Sqoop提供了一种简单且可定制的方法,让用户可以更好地控制导入过程。
相关问题
[root@zhaosai conf]# sqoop import --connect jdbc:mysql://zhaosai:3306/mydb --username root --password jqe6b6 --table news --target-dir /user/news --fields-terminated-by “;” --hive-import --hive-table news -m 1 Warning: /opt/programs/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation. Warning: /opt/programs/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. Warning: /opt/programs/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. Warning: /opt/programs/sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation. 23/06/10 16:07:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 23/06/10 16:07:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 23/06/10 16:07:15 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 23/06/10 16:07:15 INFO tool.CodeGenTool: Beginning code generation 23/06/10 16:07:15 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875) at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786) at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289) at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260) at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246) at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327) at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1872) at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1671) at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628) at org.apache.sqoop.Sqoop.run(Sqoop.java:147) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243) at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
这段代码是使用 Sqoop 工具将 MySQL 数据库中的 news 表导入到 Hadoop 的 HDFS 中,并且使用 Hive 表示,并且在导入过程中使用了分隔符“;”。但是在执行过程中出现了一个异常,提示无法加载数据库驱动类 com.mysql.jdbc.Driver。可能是因为没有将 mysql-connector-java.jar 包添加到 Sqoop 的 classpath 中导致的。需要将该 jar 包添加到 Sqoop 的 lib 目录下,并重新执行该命令。
root@zhaosai conf]# sqoop import --connect jdbc:mysql://192.168.160.130:3306/mydb --username root -P --table news --hive-import --hive-table mydb.news --incremental append --check-column id --last-value 0 --split-by id --target-dir /hdfs://zhaosai:9000/user/hive/warehouse/news --num-mappers 1 23/06/07 17:23:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 Enter password: 23/06/07 17:24:04 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 23/06/07 17:24:04 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 23/06/07 17:24:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 23/06/07 17:24:04 INFO tool.CodeGenTool: Beginning code generation Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary. 23/06/07 17:24:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `news` AS t LIMIT 1 23/06/07 17:24:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `news` AS t LIMIT 1 23/06/07 17:24:04 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/programs/hadoop-2.7.6 注: /tmp/sqoop-root/compile/b07035b094b6ac39b87f2ef11261c934/news.java使用或覆盖了已过时的 API。 注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。 23/06/07 17:24:05 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b07035b094b6ac39b87f2ef11261c934/news.jar 23/06/07 17:24:05 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`id`) FROM `news` 23/06/07 17:24:05 ERROR tool.ImportTool: Import failed: java.io.IOException: java.sql.SQLSyntaxErrorException: Unknown column 'id' in 'field list' at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:322) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:511) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628) at org.apache.sqoop.Sqoop.run(Sqoop.java:147) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243) at org.apache.sqoop.Sqoop.main(Sqoop.java:252) Caused by: java.sql.SQLSyntaxErrorException: Unknown column 'id' in 'field list' at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120) at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) at com.mysql.cj.jdbc.StatementImpl.executeQuery(StatementImpl.java:1200) at org.apache.sqoop.tool.ImportTool.getMaxColumnId(ImportTool.java:238) at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:309)
根据你提供的信息,这是一个使用Sqoop从MySQL导入数据到Hive的命令。但是,在执行命令时出现了错误。错误信息是“Unknown column 'id' in 'field list'”,意思是在字段列表中找不到名为“id”的列。这可能是因为在MySQL表中不存在“id”列,或者“id”列的名称不正确。您需要检查MySQL表中是否存在该列,并确保在Sqoop命令中正确指定了列名。
阅读全文