阿里终面:大数据量下10亿数据MySQL高效插入策略

需积分: 0 1 下载量 103 浏览量 更新于2024-08-03 收藏 589KB PDF 举报
在"阿里终面:10亿数据如何快速插入MySQL?"这篇文档中,主要探讨了在大数据场景下如何高效地将10亿级数据快速插入MySQL数据库的问题。MySQL作为关系型数据库系统,其索引结构对性能有着显著影响。文章首先提到,由于MySQL的B+树索引设计,单表建议存储上限为2000万行,这是因为超过此数量会导致索引深度增加,从而降低插入和查询性能。 B+树的叶子节点每页大小为16KB,适合存储1KB大小的数据,这意味着每个叶子节点可以存储大约16条记录。非叶子节点存储的是主键和指向叶子节点的指针,这限制了每个非叶子节点能够关联的叶子节点数量。通过计算,当数据量超过2000万,B+树的层数会从3层升至4层,严重影响性能。 为了优化大量数据的插入,文档提出了以下策略: 1. 批量写入:单条插入效率低,可以通过批量插入来提高性能。比如,可以将数据分为100个批次,每批100条数据进行写入,利用InnoDB存储引擎的事务支持,确保批量写入的原子性。 2. 顺序插入:按照主键ID的顺序进行插入,能最大化利用索引性能。而非主键索引插入可能导致随机访问,影响插入速度。 3. 并发控制:并发写入同一张表可能会导致数据乱序,因此不建议这样做。可以通过增加批量插入的阈值来提升并发度,而不是并发写入单表。 4. 存储引擎选择:MyISAM虽然插入性能优于InnoDB,但缺乏事务支持,不适合大规模批量插入。因此,在追求高性能的同时,应权衡事务保障的需求。 5. 索引管理:尽量避免在表创建后频繁创建或调整非主键索引,以保持插入性能。 总结来说,本文针对10亿数据快速插入MySQL的挑战,提供了基于批量操作、顺序插入和合理配置存储引擎的解决方案,强调了在大数据处理时对数据库性能优化的重要性。

2023-06-06 18:10:33,041 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 2023-06-06 18:10:33,075 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 2023-06-06 18:10:33,218 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 2023-06-06 18:10:33,218 INFO tool.CodeGenTool: Beginning code generation Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary. 2023-06-06 18:10:33,782 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_log` AS t LIMIT 1 2023-06-06 18:10:33,825 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_log` AS t LIMIT 1 2023-06-06 18:10:33,834 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/module/hadoop-3.1.4 注: /tmp/sqoop-root/compile/5f4cfb16d119de74d33f1a0d776d5ae0/user_log.java使用或覆盖了已过时的 API。 注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。 2023-06-06 18:10:35,111 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/5f4cfb16d119de74d33f1a0d776d5ae0/user_log.jar 2023-06-06 18:10:35,125 WARN manager.MySQLManager: It looks like you are importing from mysql. 2023-06-06 18:10:35,126 WARN manager.MySQLManager: This transfer can be faster! Use the --direct 2023-06-06 18:10:35,126 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path. 2023-06-06 18:10:35,126 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql) 2023-06-06 18:10:35,130 ERROR tool.ImportTool: Import failed: No primary key could be found for table user_log. Please specify one with --split-by or perform a sequential import with '-m 1'.

2023-06-07 上传
2023-06-07 上传

[root@QAQ ~]# sudo tail -n 50 /var/log/mysql/error.log 2023-07-14T02:43:01.816556Z 0 [Note] Shutting down plugin 'partition' 2023-07-14T02:43:01.816558Z 0 [Note] Shutting down plugin 'BLACKHOLE' 2023-07-14T02:43:01.816566Z 0 [Note] Shutting down plugin 'ARCHIVE' 2023-07-14T02:43:01.816568Z 0 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA' 2023-07-14T02:43:01.816594Z 0 [Note] Shutting down plugin 'MRG_MYISAM' 2023-07-14T02:43:01.816597Z 0 [Note] Shutting down plugin 'MyISAM' 2023-07-14T02:43:01.816605Z 0 [Note] Shutting down plugin 'INNODB_SYS_VIRTUAL' 2023-07-14T02:43:01.816608Z 0 [Note] Shutting down plugin 'INNODB_SYS_DATAFILES' 2023-07-14T02:43:01.816610Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLESPACES' 2023-07-14T02:43:01.816612Z 0 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN_COLS' 2023-07-14T02:43:01.816614Z 0 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN' 2023-07-14T02:43:01.816616Z 0 [Note] Shutting down plugin 'INNODB_SYS_FIELDS' 2023-07-14T02:43:01.816618Z 0 [Note] Shutting down plugin 'INNODB_SYS_COLUMNS' 2023-07-14T02:43:01.816620Z 0 [Note] Shutting down plugin 'INNODB_SYS_INDEXES' 2023-07-14T02:43:01.816622Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLESTATS' 2023-07-14T02:43:01.816624Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLES' 2023-07-14T02:43:01.816626Z 0 [Note] Shutting down plugin 'INNODB_FT_INDEX_TABLE' 2023-07-14T02:43:01.816628Z 0 [Note] Shutting down plugin 'INNODB_FT_INDEX_CACHE' 2023-07-14T02:43:01.816630Z 0 [Note] Shutting down plugin 'INNODB_FT_CONFIG' 2023-07-14T02:43:01.816632Z 0 [Note] Shutting down plugin 'INNODB_FT_BEING_DELETED' 2023-07-14T02:43:01.816634Z 0 [Note] Shutting down plugin 'INNODB_FT_DELETED' 2023-07-14T02:43:01.816636Z 0 [Note] Shutting down plugin 'INNODB_FT_DEFAULT_STOPWORD' 2023-07-14T02:43:01.816638Z 0 [Note] Shutting down plugin 'INNODB_METRICS' 2023-07-14T02:43:01.816640Z 0 [Note] Shutting down plugin 'INNODB_TEMP_TABLE_INFO' 2023-07-14T02:43:01.816642Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_POOL_STATS' 2023-07-14T02:43:01.816644Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE_LRU' 2023-07-14T02:43:01.816645Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE' 2023-07-14T02:43:01.816647Z 0 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX_RESET' 2023-07-14T02:43:01.816649Z 0 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX' 2023-07-14T02:43:01.816651Z 0 [Note] Shutting down plugin 'INNODB_CMPMEM_RESET' 2023-07-14T02:43:01.816653Z 0 [Note] Shutting down plugin 'INNODB_CMPMEM' 2023-07-14T02:43:01.816655Z 0 [Note] Shutting down plugin 'INNODB_CMP_RESET' 2023-07-14T02:43:01.816657Z 0 [Note] Shutting down plugin 'INNODB_CMP' 2023-07-14T02:43:01.816659Z 0 [Note] Shutting down plugin 'INNODB_LOCK_WAITS' 2023-07-14T02:43:01.816661Z 0 [Note] Shutting down plugin 'INNODB_LOCKS' 2023-07-14T02:43:01.816664Z 0 [Note] Shutting down plugin 'INNODB_TRX' 2023-07-14T02:43:01.816666Z 0 [Note] Shutting down plugin 'InnoDB' 2023-07-14T02:43:01.816703Z 0 [Note] InnoDB: FTS optimize thread exiting. 2023-07-14T02:43:01.816748Z 0 [Note] InnoDB: Starting shutdown... 2023-07-14T02:43:01.916861Z 0 [Note] InnoDB: Dumping buffer pool(s) to /www/server/data/ib_buffer_pool 2023-07-14T02:43:01.917024Z 0 [Note] InnoDB: Buffer pool(s) dump completed at 230714 10:43:01 2023-07-14T02:43:03.438469Z 0 [Note] InnoDB: Shutdown completed; log sequence number 2767449 2023-07-14T02:43:03.439797Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2023-07-14T02:43:03.439814Z 0 [Note] Shutting down plugin 'MEMORY' 2023-07-14T02:43:03.439819Z 0 [Note] Shutting down plugin 'CSV' 2023-07-14T02:43:03.439823Z 0 [Note] Shutting down plugin 'sha256_password' 2023-07-14T02:43:03.439825Z 0 [Note] Shutting down plugin 'mysql_native_password' 2023-07-14T02:43:03.439939Z 0 [Note] Shutting down plugin 'binlog' 2023-07-14T02:43:03.440752Z 0 [Note] /www/server/mysql/bin/mysqld: Shutdown complete

2023-07-15 上传

2023-07-14T05:18:42.813472Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.33) starting as process 30413 2023-07-14T05:18:42.827066Z 0 [Warning] [MY-010075] [Server] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: e64114ff-2205-11ee-870e-080027f67bef. 2023-07-14T05:18:42.831208Z 1 [System] [MY-011012] [Server] Starting upgrade of data directory. 2023-07-14T05:18:42.831274Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2023-07-14T05:18:43.035632Z 1 [ERROR] [MY-013090] [InnoDB] Unsupported redo log format (v0). The redo log was created before MySQL 5.7.9 2023-07-14T05:18:43.035675Z 1 [ERROR] [MY-012930] [InnoDB] Plugin initialization aborted with error Generic error. 2023-07-14T05:18:43.468804Z 1 [ERROR] [MY-011013] [Server] Failed to initialize DD Storage Engine. 2023-07-14T05:18:43.469038Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed. 2023-07-14T05:18:43.469053Z 0 [ERROR] [MY-010119] [Server] Aborting 2023-07-14T05:18:43.469908Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.33) MySQL Community Server - GPL. 2023-07-14T05:25:59.141609Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.33) starting as process 6178 2023-07-14T05:25:59.156401Z 1 [System] [MY-011012] [Server] Starting upgrade of data directory. 2023-07-14T05:25:59.156446Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2023-07-14T05:25:59.356548Z 1 [ERROR] [MY-013090] [InnoDB] Unsupported redo log format (v0). The redo log was created before MySQL 5.7.9 2023-07-14T05:25:59.356610Z 1 [ERROR] [MY-012930] [InnoDB] Plugin initialization aborted with error Generic error. 2023-07-14T05:25:59.793224Z 1 [ERROR] [MY-011013] [Server] Failed to initialize DD Storage Engine. 2023-07-14T05:25:59.793509Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed. 2023-07-14T05:25:59.793525Z 0 [ERROR] [MY-010119] [Server] Aborting @

2023-07-15 上传