
MySQL 8.0 Study 006 Tips

2023-01-30

Study series

  • MySQL Using temporary
  • MySQL Partitioned Tables

MySQL Using temporary

Using temporary means an internal temporary table was created, for example because a sort could not use an index, or because of UNION, subqueries or joins, or the evaluation of expressions such as group_concat() or count(distinct).

Note that this temporary table may live in memory or on disk; naturally an in-memory temporary table costs far less time than an on-disk one.

That does not mean temporary data of any size can be kept in memory: once the data exceeds the maximum size of an in-memory temporary table, it is converted to an on-disk temporary table.

When MySQL needs a temporary table, whether it uses an in-memory or an on-disk one depends on the parameters tmp_table_size and max_heap_table_size: when the required size exceeds the smaller of the two, MySQL stores the data in an on-disk temporary table.

Both parameters can be changed in the MySQL configuration file; the default for each is 16M. A session-scoped tuning sketch follows the SHOW VARIABLES output below.

mysql [localhost:8032] {msandbox} (test) > show global variables like 'max_heap_table_size';
+---------------------+----------+
| Variable_name       | Value    |
+---------------------+----------+
| max_heap_table_size | 16777216 |
+---------------------+----------+
1 row in set (0.01 sec)

mysql [localhost:8032] {msandbox} (test) > show global variables like 'tmp_table_size';
+----------------+----------+
| Variable_name  | Value    |
+----------------+----------+
| tmp_table_size | 16777216 |
+----------------+----------+
1 row in set (0.00 sec)
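
If larger intermediate results should stay in memory, both limits can be raised. A minimal session-scoped sketch; the 64M value is an arbitrary example, not a recommendation (the effective in-memory cap is the smaller of the two variables):

-- Hypothetical tuning sketch: raise both limits for the current session only
SET SESSION tmp_table_size      = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;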

mysql [localhost:8032] {msandbox} (test) > explain select job,count(*) from emp group by job;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra           |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1000 |   100.00 | Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

The MySQL sql_mode here has ONLY_FULL_GROUP_BY enabled.

This mode means: for a GROUP BY aggregation, if a column in the SELECT list does not appear in the GROUP BY clause (and is not inside an aggregate function), the SQL is invalid, because the column is not in the GROUP BY clause.
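
A minimal illustration of this rule, assuming the emp table also has an ename column (a hypothetical column for this example):

-- With ONLY_FULL_GROUP_BY enabled this fails with ERROR 1055,
-- because ename is neither in the GROUP BY clause nor inside an aggregate
SELECT job, ename, COUNT(*) FROM emp GROUP BY job;

-- Either aggregate the extra column (ANY_VALUE) or add it to the GROUP BY list
SELECT job, ANY_VALUE(ename) AS sample_ename, COUNT(*) FROM emp GROUP BY job;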

GROUP BY optimization:

1. Add an index on the grouping column(s) (see the sketch after this list)
2. ORDER BY NULL to skip sorting (if the query already uses an index, or on MySQL 8.0, ORDER BY NULL is unnecessary, since 8.0 no longer sorts GROUP BY results implicitly)
3. Prefer in-memory temporary tables (their size is limited; tmp_table_size is the in-memory temporary-table limit, default 16M, so adjust it appropriately)
4. SQL_BIG_RESULT
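
For point 1, a hedged sketch of adding an index on the grouping column (the index name idx_emp_job is an assumption; with such an index in place, the EXPLAIN above typically changes from Using temporary to Using index):

ALTER TABLE emp ADD INDEX idx_emp_job (job);
EXPLAIN SELECT job, COUNT(*) FROM emp GROUP BY job;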

Example of using SQL_BIG_RESULT:

mysql [localhost:8032] {msandbox} (test) > explain select sql_big_result job,count(*) from emp group by job;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1000 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

ORDER BY optimization

1. Add an index on the sort column(s)
2. SELECT only the columns you need
  - Selecting many columns may push the data past the sort_buffer capacity; once it spills, disk temporary files are needed and sort performance drops sharply
  - When the total size of the selected columns exceeds max_length_for_sort_data, the sort switches from a full-field sort to a rowid sort, which adds an extra lookup back to the table
3. Try increasing sort_buffer_size (see the sketch after this list)
4. Try increasing max_length_for_sort_data
  - If set too high, the total data is more likely to exceed sort_buffer_size; once it spills, disk temporary files are needed and sort performance drops sharply
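
For points 3 and 4, a hedged session-level sketch (the values are arbitrary examples; check your MySQL version's documentation for whether max_length_for_sort_data still has an effect in recent 8.0 releases):

SET SESSION sort_buffer_size         = 4 * 1024 * 1024;  -- 4M per-thread sort buffer
SET SESSION max_length_for_sort_data = 4096;             -- allow wider rows in a full-field sort
SHOW VARIABLES LIKE 'sort_buffer_size';
SHOW VARIABLES LIKE 'max_length_for_sort_data';
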
mysql [localhost:8032] {msandbox} (test) > explain select * from sbtest1 where k<200000 order by c limit 10;
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+---------------------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | sbtest1 | NULL       | range | k_1           | k_1  | 4       | NULL |   29 |   100.00 | Using index condition; Using filesort |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

Notes:

Using index condition in the Extra column means the query used an index (with index condition pushdown) but still had to read rows back from the table.

Using filesort in the Extra column means an internal sort was performed.

When Using filesort appears, it is usually something worth optimizing.

Using filesort: the sort did not use an index; instead MySQL sorts the rows itself, allocating each thread a block of memory for the sort, called the sort_buffer.

The size of the sort_buffer is controlled by the parameter sort_buffer_size.

If the data to be sorted fits within sort_buffer_size, the sort completes in memory inside the sort_buffer; if it exceeds sort_buffer_size, disk files are used to assist the sort.

We can use the following method to determine whether a sorting statement used temporary files.

/* Enable optimizer_trace; it only affects the current session */
SET optimizer_trace='enabled=on';

/* Run the statement */
select * from sbtest1 where k<200000 order by c limit 10;

/* Check the OPTIMIZER_TRACE output */
SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`;


number_of_tmp_files is the number of on-disk temporary files used for the sort.

If number_of_tmp_files > 0, disk files were used for sorting.
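
Rather than scanning the whole trace by eye, the filesort details can be pulled out of the trace JSON directly. A hedged sketch (the exact field names inside filesort_summary vary by MySQL version):

SELECT JSON_PRETTY(JSON_EXTRACT(trace, '$**.filesort_summary')) AS filesort_summary
FROM information_schema.OPTIMIZER_TRACE;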

Rowid sort

Only the columns the query needs for sorting plus the primary key id are put into the sort_buffer.

When does MySQL use a full-field sort, and when a rowid sort?

The switch is controlled by a single parameter, max_length_for_sort_data, the maximum row length MySQL allows when sorting whole rows: if a single row is longer than this value, MySQL considers the row too large and switches to a rowid sort.

Full-field sort: if the sort_buffer is big enough, this is the most efficient; if it is not, disk temporary files are needed and sort performance suffers.
Rowid sort: the sort_buffer can hold more rows, but compared with the full-field sort it needs one extra lookup back to the table.

A MySQL design principle: when memory is sufficient, use memory as much as possible and minimize disk access.

MySQL Partitioned Tables

When creating a table, MySQL can use the PARTITION BY clause to define which data each partition stores.

When a query is executed, the optimizer uses the partition definitions to prune partitions that cannot contain the needed data, so the query does not have to scan every partition, only the partitions that hold the required rows.

Another purpose of partitioning is to store data at a coarser granularity in separate physical pieces.

This keeps related data together, and it also makes it very convenient to delete an entire partition's worth of data in one operation.

Here is a brief overview of the four common partition types:

  • RANGE partitioning: the most commonly used; rows are assigned to partitions based on column values falling within a given contiguous range, most often a time column.
  • LIST partitioning: similar to RANGE, except that LIST uses sets of enumerated values while RANGE uses contiguous ranges of values.
  • HASH partitioning: the partition is chosen by the return value of a user-defined expression computed from the column values of the rows to be inserted. The expression can be any valid MySQL expression that yields a non-negative integer.
  • KEY partitioning: similar to HASH, except that KEY partitioning operates directly on one or more columns and the MySQL server supplies its own hashing function; the columns do not have to be integers.

Of the four types above, RANGE (range) partitioning is the most commonly used. Its partition ranges must be contiguous but must not overlap, and by default the VALUES LESS THAN clause is used, meaning each partition excludes the specified boundary value itself.

Example:

CREATE TABLE members (
    firstname VARCHAR(25) NOT NULL,
    lastname VARCHAR(25) NOT NULL,
    username VARCHAR(16) NOT NULL,
    email VARCHAR(35),
    joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
    PARTITION p0 VALUES LESS THAN (1960),
    PARTITION p1 VALUES LESS THAN (1970),
    PARTITION p2 VALUES LESS THAN (1980),
    PARTITION p3 VALUES LESS THAN (1990),
    PARTITION p4 VALUES LESS THAN MAXVALUE
);

INSERT INTO members VALUES
('a','b','ab','ab@bigdatalyn.com','1959-11-30'),
('b','c','bc','bc@bigdatalyn.com','1959-12-30'),
('c','d','cd','cd@bigdatalyn.com','1960-10-30'),
('d','e','de','de@bigdatalyn.com','1971-12-30'),
('e','f','ef','ef@bigdatalyn.com','1985-1-30'),
('f','g','fg','fg@bigdatalyn.com','1999-2-1');

select * from members;

-- Query the data of a specific partition
select * from members partition(p1);
select * from members partition(p2);

-- Drop a partition
alter table members drop partition p4;
-- Dropping a partition also deletes the data stored in that partition

mysql [localhost:8032] {msandbox} (test) > select * from members partition(p4);
+-----------+----------+----------+-------------------+------------+
| firstname | lastname | username | email             | joined     |
+-----------+----------+----------+-------------------+------------+
| f         | g        | fg       | fg@bigdatalyn.com | 1999-02-01 |
+-----------+----------+----------+-------------------+------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > alter table members drop partition p4;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql [localhost:8032] {msandbox} (test) > 
mysql [localhost:8032] {msandbox} (test) > select * from members;
+-----------+----------+----------+-------------------+------------+
| firstname | lastname | username | email             | joined     |
+-----------+----------+----------+-------------------+------------+
| a         | b        | ab       | ab@bigdatalyn.com | 1959-11-30 |
| b         | c        | bc       | bc@bigdatalyn.com | 1959-12-30 |
| c         | d        | cd       | cd@bigdatalyn.com | 1960-10-30 |
| d         | e        | de       | de@bigdatalyn.com | 1971-12-30 |
| e         | f        | ef       | ef@bigdatalyn.com | 1985-01-30 |
+-----------+----------+----------+-------------------+------------+
5 rows in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 


-- Add partitions
alter table members add partition(
  PARTITION p4 VALUES LESS THAN (2000),
  PARTITION p5 VALUES LESS THAN MAXVALUE
);

-- Merge partitions
alter table members reorganize partition p0,p1 into(
  partition p1 values less than (1970)
);

mysql [localhost:8032] {msandbox} (test) > alter table members reorganize partition p0,p1 into(
    ->   partition p1 values less than (1970)
    -> );
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql [localhost:8032] {msandbox} (test) > select * from members partition(p1);
+-----------+----------+----------+-------------------+------------+
| firstname | lastname | username | email             | joined     |
+-----------+----------+----------+-------------------+------------+
| a         | b        | ab       | ab@bigdatalyn.com | 1959-11-30 |
| b         | c        | bc       | bc@bigdatalyn.com | 1959-12-30 |
| c         | d        | cd       | cd@bigdatalyn.com | 1960-10-30 |
+-----------+----------+----------+-------------------+------------+
3 rows in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

-- Split a partition
alter table members reorganize partition p1 into(
  partition p0 values less than (1960),
  partition p1 values less than (1970)
);

mysql [localhost:8032] {msandbox} (test) > alter table members reorganize partition p1 into(
    ->   partition p0 values less than (1960),
    ->   partition p1 values less than (1970)
    -> );
Query OK, 0 rows affected (0.11 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql [localhost:8032] {msandbox} (test) > select * from members partition(p0);
+-----------+----------+----------+-------------------+------------+
| firstname | lastname | username | email             | joined     |
+-----------+----------+----------+-------------------+------------+
| a         | b        | ab       | ab@bigdatalyn.com | 1959-11-30 |
| b         | c        | bc       | bc@bigdatalyn.com | 1959-12-30 |
+-----------+----------+----------+-------------------+------------+
2 rows in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > select * from members partition(p1);
+-----------+----------+----------+-------------------+------------+
| firstname | lastname | username | email             | joined     |
+-----------+----------+----------+-------------------+------------+
| c         | d        | cd       | cd@bigdatalyn.com | 1960-10-30 |
+-----------+----------+----------+-------------------+------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

-- Truncate the data in a partition

alter table members truncate partition p1;

Contents of partition p1 before the truncate:

+-----------+----------+----------+-------------------+------------+
| firstname | lastname | username | email             | joined     |
+-----------+----------+----------+-------------------+------------+
| c         | d        | cd       | cd@bigdatalyn.com | 1960-10-30 |
+-----------+----------+----------+-------------------+------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > alter table members truncate partition p1;
Query OK, 0 rows affected (0.05 sec)

mysql [localhost:8032] {msandbox} (test) > select * from members partition(p1);
Empty set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

-- Exchange a partition with a table
-- Create a table with the same columns

CREATE TABLE members_archive (
    firstname VARCHAR(25) NOT NULL,
    lastname VARCHAR(25) NOT NULL,
    username VARCHAR(16) NOT NULL,
    email VARCHAR(35),
    joined DATE NOT NULL
);

select * from members partition(p0);

alter table members exchange partition p0 with table members_archive;

mysql [localhost:8032] {msandbox} (test) > select * from members partition(p0);
+-----------+----------+----------+-------------------+------------+
| firstname | lastname | username | email             | joined     |
+-----------+----------+----------+-------------------+------------+
| a         | b        | ab       | ab@bigdatalyn.com | 1959-11-30 |
| b         | c        | bc       | bc@bigdatalyn.com | 1959-12-30 |
+-----------+----------+----------+-------------------+------------+
2 rows in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > CREATE TABLE members_archive (
    ->     firstname VARCHAR(25) NOT NULL,
    ->     lastname VARCHAR(25) NOT NULL,
    ->     username VARCHAR(16) NOT NULL,
    ->     email VARCHAR(35),
    ->     joined DATE NOT NULL
    -> );
Query OK, 0 rows affected (0.05 sec)

mysql [localhost:8032] {msandbox} (test) > alter table members exchange partition p0 with table members_archive;
Query OK, 0 rows affected (0.06 sec)

mysql [localhost:8032] {msandbox} (test) > 
mysql [localhost:8032] {msandbox} (test) > select * from members partition(p0);
Empty set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > select * from members_archive;
+-----------+----------+----------+-------------------+------------+
| firstname | lastname | username | email             | joined     |
+-----------+----------+----------+-------------------+------------+
| a         | b        | ab       | ab@bigdatalyn.com | 1959-11-30 |
| b         | c        | bc       | bc@bigdatalyn.com | 1959-12-30 |
+-----------+----------+----------+-------------------+------------+
2 rows in set (0.01 sec)

mysql [localhost:8032] {msandbox} (test) > 

  • For classic RANGE/LIST/HASH partitioning, the partitioning expression must be an integer or resolve to an integer (the COLUMNS and KEY variants relax this).
  • The partitioning column should be NOT NULL; if a row's partitioning column is NULL, a RANGE-partitioned table places that row in the lowest partition.
  • If the table has a primary key or unique key, the partitioning columns must be included in it (see the sketch below).
  • InnoDB partitioned tables do not support foreign keys.
  • Changing sql_mode may affect the behavior of partitioned tables.
  • Partitioning does not interfere with auto-increment columns.
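
A hedged illustration of the unique-key restriction (the table name t_bad is hypothetical): the statement below fails with ERROR 1503, because the primary key does not include the partitioning column.

CREATE TABLE t_bad (
    id INT PRIMARY KEY,
    created DATE NOT NULL
)
PARTITION BY RANGE COLUMNS(created) (
    PARTITION p0 VALUES LESS THAN ('2023-01-01'),
    PARTITION p_max VALUES LESS THAN (MAXVALUE)
);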

Partition data dictionary:

mysql [localhost:8032] {msandbox} (test) > desc information_schema.partitions;
+-------------------------------+-----------------+------+-----+---------+-------+
| Field                         | Type            | Null | Key | Default | Extra |
+-------------------------------+-----------------+------+-----+---------+-------+
| TABLE_CATALOG                 | varchar(64)     | NO   |     | NULL    |       |
| TABLE_SCHEMA                  | varchar(64)     | NO   |     | NULL    |       |
| TABLE_NAME                    | varchar(64)     | NO   |     | NULL    |       |
| PARTITION_NAME                | varchar(64)     | YES  |     | NULL    |       |
| SUBPARTITION_NAME             | varchar(64)     | YES  |     | NULL    |       |
| PARTITION_ORDINAL_POSITION    | int unsigned    | YES  |     | NULL    |       |
| SUBPARTITION_ORDINAL_POSITION | int unsigned    | YES  |     | NULL    |       |
| PARTITION_METHOD              | varchar(13)     | YES  |     | NULL    |       |
| SUBPARTITION_METHOD           | varchar(13)     | YES  |     | NULL    |       |
| PARTITION_EXPRESSION          | varchar(2048)   | YES  |     | NULL    |       |
| SUBPARTITION_EXPRESSION       | varchar(2048)   | YES  |     | NULL    |       |
| PARTITION_DESCRIPTION         | text            | YES  |     | NULL    |       |
| TABLE_ROWS                    | bigint unsigned | YES  |     | NULL    |       |
| AVG_ROW_LENGTH                | bigint unsigned | YES  |     | NULL    |       |
| DATA_LENGTH                   | bigint unsigned | YES  |     | NULL    |       |
| MAX_DATA_LENGTH               | bigint unsigned | YES  |     | NULL    |       |
| INDEX_LENGTH                  | bigint unsigned | YES  |     | NULL    |       |
| DATA_FREE                     | bigint unsigned | YES  |     | NULL    |       |
| CREATE_TIME                   | timestamp       | NO   |     | NULL    |       |
| UPDATE_TIME                   | datetime        | YES  |     | NULL    |       |
| CHECK_TIME                    | datetime        | YES  |     | NULL    |       |
| CHECKSUM                      | bigint          | YES  |     | NULL    |       |
| PARTITION_COMMENT             | text            | NO   |     | NULL    |       |
| NODEGROUP                     | varchar(256)    | YES  |     | NULL    |       |
| TABLESPACE_NAME               | varchar(268)    | YES  |     | NULL    |       |
+-------------------------------+-----------------+------+-----+---------+-------+
25 rows in set (0.01 sec)

mysql [localhost:8032] {msandbox} (test) > 
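
A hedged sketch of using this view to inspect the members table built above (TABLE_ROWS is an estimate, not an exact count):

SELECT PARTITION_NAME, PARTITION_METHOD, PARTITION_DESCRIPTION, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'members';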

MySQL Partitioned Table Use Case

Use case: convert a single table into a partitioned table by time (partitioned by year).

Prepare the data

Test data spanning 2010-2023.

Note the end and start time points: 2023-01-31 23:59:59 / 2010-01-01 00:00:00. A random datetime between them is generated by this line inside the procedure:

  SET v_logdate = FROM_UNIXTIME(FLOOR((@s := UNIX_TIMESTAMP('2023-01-31 23:59:59')) + RAND() * (UNIX_TIMESTAMP('2010-01-01 00:00:00') - @s + 1)));

A COMMIT is issued every 10,000 rows.

drop table t_log;
drop procedure add_t_log;

create table t_log(
	id int unsigned primary key auto_increment,
	user_no int unsigned not null default 0,
	user_name varchar(20) not null default "",
	logdate datetime not null
);

DELIMITER //
create PROCEDURE add_t_log(in num INT)
BEGIN
DECLARE rowid INT DEFAULT 0;
DECLARE v_user_no INT DEFAULT 0;
DECLARE v_user_name VARCHAR(20);
DECLARE v_logdate datetime;

WHILE rowid < num DO
  SET v_user_no = FLOOR(10000 + RAND() * 10000);
  SET v_user_name = SUBSTRING(md5(rand()),1,15); 
  SET v_logdate = FROM_UNIXTIME(FLOOR((@s := UNIX_TIMESTAMP('2023-01-31 23:59:59')) + RAND() * (UNIX_TIMESTAMP('2010-01-01 00:00:00') - @s + 1)));
  SET rowid = rowid + 1;
  insert INTO t_log(user_no,user_name,logdate) VALUES (v_user_no,v_user_name,v_logdate);
  IF MOD(rowid,10000)=0 THEN
    commit;
  END IF;
END WHILE;
END //
DELIMITER ;

set global innodb_flush_log_at_trx_commit=2;
call add_t_log(120000);

select min(logdate),max(logdate),count(*) from t_log;

+---------------------+---------------------+----------+
| min(logdate)        | max(logdate)        | count(*) |
+---------------------+---------------------+----------+
| 2010-01-01 01:34:53 | 2023-01-31 23:50:29 |   120000 |
+---------------------+---------------------+----------+

select count(*) from t_log where logdate between '2010-01-01' and '2010-12-31';
select count(*) from t_log where logdate between '2011-01-01' and '2011-12-31';
select count(*) from t_log where logdate between '2022-01-01' and '2022-12-31';
select count(*) from t_log where logdate between '2023-01-01' and '2023-12-31';
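
The same check can be done in one statement by grouping on the year (a minor alternative sketch):

select year(logdate) as y, count(*) from t_log group by year(logdate) order by y;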

Export the data (by year)

Export each year's data range to a CSV file via a script

[root@ol8mysql01 ~]# cat .my.cnf
[client]
user=root
password=msandbox
socket=/tmp/mysql_sandbox8032.sock
[root@ol8mysql01 ~]# mkdir /vagrant/csv/
[root@ol8mysql01 ~]# cat t_log_export.sh
#!/bin/sh
for i in `seq 2010 2023` 
do 
    {
        mysql -D test -e "select * from t_log where logdate between '$i-01-01' and '$i-12-31' into outfile '/vagrant/csv/t_log_$i.csv' fields terminated by ',' " 
    } &
done
wait
[root@ol8mysql01 ~]# sh t_log_export.sh
[root@ol8mysql01 ~]# ls -tlr /vagrant/csv
total 5648
-rw-r----- 1 vagrant vagrant  38218 Feb  1 11:54 t_log_2023.csv
-rw-r----- 1 vagrant vagrant 436882 Feb  1 11:54 t_log_2022.csv
-rw-r----- 1 vagrant vagrant 438377 Feb  1 11:54 t_log_2021.csv
-rw-r----- 1 vagrant vagrant 429838 Feb  1 11:54 t_log_2020.csv
-rw-r----- 1 vagrant vagrant 449970 Feb  1 11:54 t_log_2019.csv
-rw-r----- 1 vagrant vagrant 447836 Feb  1 11:54 t_log_2018.csv
-rw-r----- 1 vagrant vagrant 443302 Feb  1 11:54 t_log_2017.csv
-rw-r----- 1 vagrant vagrant 435839 Feb  1 11:54 t_log_2016.csv
-rw-r----- 1 vagrant vagrant 433231 Feb  1 11:54 t_log_2015.csv
-rw-r----- 1 vagrant vagrant 430470 Feb  1 11:54 t_log_2014.csv
-rw-r----- 1 vagrant vagrant 437524 Feb  1 11:54 t_log_2013.csv
-rw-r----- 1 vagrant vagrant 442675 Feb  1 11:54 t_log_2012.csv
-rw-r----- 1 vagrant vagrant 443702 Feb  1 11:54 t_log_2011.csv
-rw-r----- 1 vagrant vagrant 444992 Feb  1 11:54 t_log_2010.csv
[root@ol8mysql01 ~]#  

Create the partitioned table by time (year)

Create 14 tables, one per year; of these, t_log_2023 will become the partitioned table:

for i in `seq 2010 2023`;
do 
  mysql -e"use test;drop table if exists t_log_$i;";
done;

for i in `seq 2010 2023`;
do 
  mysql -e"use test;create table t_log_$i like t_log;";
done;


mysql [localhost:8032] {msandbox} (test) > show tables like 't_log%';
+-------------------------+
| Tables_in_test (t_log%) |
+-------------------------+
| t_log                   |
| t_log_2010              |
| t_log_2011              |
| t_log_2012              |
| t_log_2013              |
| t_log_2014              |
| t_log_2015              |
| t_log_2016              |
| t_log_2017              |
| t_log_2018              |
| t_log_2019              |
| t_log_2020              |
| t_log_2021              |
| t_log_2022              |
| t_log_2023              |
+-------------------------+
15 rows in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

Because MySQL requires that every unique key of a partitioned table, including the primary key, contain all the columns used in the partitioning expression, add the time column to the primary key.

alter table t_log_2023 drop primary key, add primary key(id,logdate);

-- Note the difference: PRIMARY KEY (`id`,`logdate`) vs PRIMARY KEY (`id`)

mysql [localhost:8032] {msandbox} (test) > alter table t_log_2023 drop primary key, add primary key(id,logdate);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql [localhost:8032] {msandbox} (test) > show create table t_log_2023\G
*************************** 1. row ***************************
       Table: t_log_2023
Create Table: CREATE TABLE `t_log_2023` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `user_no` int unsigned NOT NULL DEFAULT '0',
  `user_name` varchar(20) NOT NULL DEFAULT '',
  `logdate` datetime NOT NULL,
  PRIMARY KEY (`id`,`logdate`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > show create table t_log_2022\G
*************************** 1. row ***************************
       Table: t_log_2022
Create Table: CREATE TABLE `t_log_2022` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `user_no` int unsigned NOT NULL DEFAULT '0',
  `user_name` varchar(20) NOT NULL DEFAULT '',
  `logdate` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

Add partitions to table t_log_2023. (It may hold both the current year's and the previous year's data, so partition by day and cover two years; when a new year begins, the old data can simply be moved out.)

The earlier stored-procedure approach, adapted as follows:

DELIMITER $$

USE `test`$$

DROP PROCEDURE IF EXISTS `sp_add_partition_t_log`$$

CREATE PROCEDURE `sp_add_partition_t_log`(
IN f_year_start YEAR,
IN f_year_end YEAR,
IN f_tbname VARCHAR(64)
)
BEGIN
DECLARE v_days INT UNSIGNED DEFAULT 365;
DECLARE v_year DATE DEFAULT '2021-01-01';
DECLARE v_partition_name VARCHAR(64) DEFAULT '';
DECLARE v_logdate DATE;
DECLARE i,j INT UNSIGNED DEFAULT 1;
  SET @stmt = '';
  SET @stmt_begin = CONCAT('ALTER TABLE ',f_tbname,' PARTITION BY RANGE COLUMNS(logdate)(');
  SET i = f_year_start;
  WHILE i <= f_year_end DO
    SET v_year = CONCAT(i,'-01-01');
    SET v_days = DATEDIFF(DATE_ADD(v_year,INTERVAL 1 YEAR),v_year);
    SET j = 1;
    WHILE j <= v_days DO
      SET v_logdate = DATE_ADD(v_year,INTERVAL j DAY);
      SET v_partition_name = CONCAT('p',i,'_',LPAD(j,3,'0'));
      SET @stmt = CONCAT(@stmt,'PARTITION ',v_partition_name,' VALUES LESS THAN(''',v_logdate,'''),');
      SET j = j + 1;
    END WHILE;
    SET i = i + 1;
  END WHILE;
  SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue))';
  SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
  PREPARE s1 FROM @stmt;
  EXECUTE s1;
  DROP PREPARE s1;
  SELECT NULL,NULL,NULL INTO @stmt,@stmt_begin,@stmt_end;
END$$

DELIMITER ;

call sp_add_partition_t_log(2022,2023,'t_log_2023');

mysql [localhost:8032] {msandbox} (test) > call sp_add_partition_t_log(2022,2023,'t_log_2022');
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function (prefixed columns are not considered).
mysql [localhost:8032] {msandbox} (test) > call sp_add_partition_t_log(2022,2023,'t_log_2023');

Query OK, 1 row affected (10.42 sec)

mysql [localhost:8032] {msandbox} (test) > 

Import the original data

Import the original data: the 2023 data goes into table t_log_2023, and the other years go into the history tables t_log_2010 through t_log_2022.

[root@ol8mysql01 ~]# cat t_log_import_2010_2023.sh 
#!/bin/sh
for i in `seq 2010 2023` 
do 
    {
        mysql -D test -e "load data infile '/vagrant/csv/t_log_$i.csv' into table t_log_$i fields terminated by ',' " 
    } &
done
wait
[root@ol8mysql01 ~]# sh t_log_import_2010_2023.sh 
[root@ol8mysql01 ~]# 

Rename the partitioned table t_log_2023 to the current table t_log_current

mysql [localhost:8032] {msandbox} (test) > select count(*) from t_log_2023;
+----------+
| count(*) |
+----------+
|      795 |
+----------+
1 row in set (0.04 sec)

mysql [localhost:8032] {msandbox} (test) > alter table t_log_2023 rename to t_log_current;
Query OK, 0 rows affected (2.51 sec)

mysql [localhost:8032] {msandbox} (test) > 

Test

Query a result set covering the most recent few days: 0.05 sec (original table) vs 0.00 sec (partitioned table); a partition-pruning check follows the timings below.

select * from t_log where logdate between '2023-01-20' and '2023-01-24';
96 rows in set (0.05 sec)

select * from t_log_current where logdate between '2023-01-20' and '2023-01-24';
96 rows in set (0.00 sec)
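
To confirm that the partitioned table is actually pruning, a hedged check: the partitions column of EXPLAIN should list only the handful of daily partitions covering the range, rather than all of them.

EXPLAIN SELECT * FROM t_log_current
WHERE logdate BETWEEN '2023-01-20' AND '2023-01-24';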

Maintenance

Partition data of table t_log_current:

select left(partition_name,5) p,sum(table_rows) cnt from information_schema.partitions where table_name = 't_log_current' group by left(partition_name,5);

select left(partition_name,11) p,sum(table_rows) cnt from information_schema.partitions where table_name = 't_log_current' group by left(partition_name,11);

The t_log_2022 data has not yet been loaded into t_log_current, so insert it:

mysql [localhost:8032] {msandbox} (test) > select left(partition_name,5) p,sum(table_rows) cnt from information_schema.partitions where table_name = 't_log_current' group by left(partition_name,5);
+-------+------+
| p     | cnt  |
+-------+------+
| p2022 |    0 |
| p2023 |  795 |
| p_max |    0 |
+-------+------+
3 rows in set (0.01 sec)

mysql [localhost:8032] {msandbox} (test) > select count(*) from t_log_2022;
+----------+
| count(*) |
+----------+
|     9087 |
+----------+
1 row in set (0.04 sec)

mysql [localhost:8032] {msandbox} (test) > insert into t_log_current select * from t_log_2022;
Query OK, 9087 rows affected (0.19 sec)
Records: 9087  Duplicates: 0  Warnings: 0

mysql [localhost:8032] {msandbox} (test) >
mysql [localhost:8032] {msandbox} (test) > select left(partition_name,5) p,sum(table_rows) cnt from information_schema.partitions where table_name = 't_log_current' group by left(partition_name,5);
+-------+------+
| p     | cnt  |
+-------+------+
| p2022 | 9087 |
| p2023 |  795 |
| p_max |    0 |
+-------+------+
3 rows in set (0.02 sec)

mysql [localhost:8032] {msandbox} (test) > 

At the moment there is only data through January 2023; records dated after the end of 2023 would automatically fall into partition p_max. So before the start of January 1, 2024, the whole year of 2022 data should be moved out into t_log_2022 and the partition definitions for 2024 added.

Write a stored procedure that extends the partitions automatically; it can be driven by an OS job or a MySQL EVENT (a scheduling sketch follows the procedure below). The code is as follows:

DELIMITER $$

USE `test`$$

DROP PROCEDURE IF EXISTS `sp_autoextend_partition_t_log`$$

CREATE PROCEDURE `sp_autoextend_partition_t_log`(
IN f_year YEAR
)
BEGIN
    DECLARE v_days INT UNSIGNED DEFAULT 365;
    DECLARE v_days_interval DATE DEFAULT '2022-12-31';
    DECLARE i INT UNSIGNED DEFAULT 1;
    SET @stmt = '';
    SET v_days =  DATEDIFF(CONCAT(f_year+1,'-01-01'),CONCAT(f_year,'-01-01'));
     
    SET @stmt_begin = 'ALTER TABLE t_log_current REORGANIZE PARTITION p_max into(';
        WHILE i <= v_days DO
           SET v_days_interval = DATE_ADD(CONCAT(f_year,'-01-01'),INTERVAL i DAY);
           SET @stmt = CONCAT(@stmt,'PARTITION p',f_year,'_',LPAD(i,3,"0"),' VALUES LESS THAN (''',v_days_interval,'''),');     
           SET i = i + 1;        
        END WHILE;  
    SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue))';
        SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
        PREPARE s1 FROM @stmt;
        EXECUTE s1;
        DROP PREPARE s1;
        SET @stmt = NULL;
        SET @stmt_begin = NULL;
        SET @stmt_end = NULL;   
    END$$

DELIMITER ;
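
To run this automatically, one option is a MySQL EVENT. A hedged sketch (the event name and schedule are arbitrary, and it assumes the event scheduler is enabled):

SET GLOBAL event_scheduler = ON;

CREATE EVENT IF NOT EXISTS ev_autoextend_t_log_partitions
ON SCHEDULE EVERY 1 YEAR STARTS '2023-12-28 02:00:00'
DO
  CALL test.sp_autoextend_partition_t_log(YEAR(CURDATE()) + 1);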

-- Extend the partitions for 2024
call sp_autoextend_partition_t_log(2024);

select left(partition_name,5) p,sum(table_rows) cnt from information_schema.partitions where table_name = 't_log_current' group by left(partition_name,5);


mysql [localhost:8032] {msandbox} (test) > call sp_autoextend_partition_t_log(2024);
Query OK, 0 rows affected (4.00 sec)

mysql [localhost:8032] {msandbox} (test) > select left(partition_name,5) p,sum(table_rows) cnt from information_schema.partitions where table_name = 't_log_current' group by left(partition_name,5);
+-------+------+
| p     | cnt  |
+-------+------+
| p2022 | 3975 |
| p2023 |  795 |
| p2024 |    0 |
| p_max |    0 |
+-------+------+
4 rows in set (0.04 sec)

mysql [localhost:8032] {msandbox} (test) > 

Then move the 2022 data out to a history table. (Because each year has many daily partitions, partition exchange is not used here, simply for ease of writing.)

Note: partition definitions must follow a consistent naming rule; that makes it much easier to clean up expired data later.

-- Create the history table from the original table structure
create table t_log_2022 like t_log;
-- Insert the 2022 data (a half-open range, so rows from late on Dec 31 are not missed)
insert into t_log_2022 select * from t_log_current where logdate >= '2022-01-01' and logdate < '2023-01-01';
-- Generate the statements that drop the expired 2022 partitions

SELECT CONCAT('alter table test.t_log_current drop partition ',partition_name,';') FROM information_schema.`PARTITIONS`  WHERE table_schema = 'test' AND table_name = 't_log_current'  AND partition_name like 'p2022%' into outfile '/vagrant/drop_expire_partition_2022.sql';

mysql> \. /vagrant/drop_expire_partition_2022.sql
Query OK, 0 rows affected (0.62 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.53 sec)
Records: 0  Duplicates: 0  Warnings: 0

References:

Chapter 24 Partitioning

Mysql数据库表分区深入详解

MySQL random functions

A random day in 2022 (note that FLOOR(RAND()*365) can be 0, in which case MAKEDATE() returns NULL; FLOOR(1 + RAND()*365) avoids that):

mysql [localhost:8032] {msandbox} (test) > select makedate(2022,floor(rand()*365));
+----------------------------------+
| makedate(2022,floor(rand()*365)) |
+----------------------------------+
| 2022-11-11                       |
+----------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 


SELECT UNIX_TIMESTAMP('2020-01-01 00:00:00') timestamp;

SELECT FROM_UNIXTIME(1577811600) date, UNIX_TIMESTAMP('2020-01-01 00:00:00') timestamp;


SELECT UNIX_TIMESTAMP('2010-01-01 00:00:00')                                             start_time,
       UNIX_TIMESTAMP('2020-12-30 23:59:59')                                             end_time,
       UNIX_TIMESTAMP('2020-12-30 23:59:59') - UNIX_TIMESTAMP('2010-01-01 00:00:00') + 1 time_range;

+------------+------------+------------+
| start_time | end_time   | time_range |
+------------+------------+------------+
| 1262275200 | 1609343999 |  347068800 |
+------------+------------+------------+

SELECT FROM_UNIXTIME(FLOOR(1262275200 + RAND() * 347068800)) rand_time;

SELECT FROM_UNIXTIME(
         FLOOR((@s := UNIX_TIMESTAMP('2020-12-30 23:59:59')) + RAND() * (UNIX_TIMESTAMP('2010-01-01 00:00:00') - @s + 1)
)) rand_time;


SELECT FLOOR(10000 + RAND() * 10000);

Reference:

MySQL 分区表案例分享

Have a good work&life! 2023/01 via LinHong

