Lin Hong's TECH Blog! 刀不磨要生锈,人不学习要落后 - Thinking ahead

MySQL 8.0 Study 004 Tips

2023-01-27

MySQL 8.0 Study 004 Tips

学习系列

  • MySQL 慢日志
  • 清理 binlog
  • show profiles
  • 大数据量分页 limit m, n
  • MySQL 定期删除数据
  • MySQL 8.0 IN 查询测试
  • MySQL 优化某些案例

MySQL 慢日志

配置MySQL自动记录慢日志

slow_query_log = OFF #### 是否开启慢日志记录
long_query_time = 2  #### 时间限制,超过此时间,则记录
slow_query_log_file = /data/slow.log  ####   日志文件
log_queries_not_using_indexes = OFF   ####   为使用索引的搜索是否记录
log_slow_admin_statements = ON #### 是否记录管理语句,如alter、create、drop

long_query_time 记录的是 SQL 实际执行时间,不包含锁等待时间,虽然有的 SQL 执行时间很长,但可能是由于锁等待引起的,实际上这种SQL不会被记录到慢查询日志。

Query_time:SQL总执行时间,即实际执行时间 + 锁等待时间
Lock_time:锁等待时间
Rows_sent:发送给客户端的行数
Rows_examined:表示执行过程中扫描了多少行,这个值是在执行器每次调用存储引擎获取数据行的时候累加的,是在Server层统计的

查看:

注:查看当前配置信息:
show variables like '%query%'
修改当前配置:
set global 变量名 = 值

mysqldumpslow -s at -a /data/mysql-slow.log

"""
--verbose    版本
--debug      调试
--help       帮助
 
-v           版本
-d           调试模式
-s ORDER     排序方式
             what to sort by (al, at, ar, c, l, r, t), 'at' is default
              al: average lock time
              ar: average rows sent
              at: average query time
               c: count
               l: lock time
               r: rows sent
               t: query time
-r           反转顺序,默认文件倒序拍。reverse the sort order (largest last instead of first)
-t NUM       显示前N条just show the top n queries
-a           不要将SQL中数字转换成N,字符串转换成S。don't abstract all numbers to N and strings to 'S'
-n NUM       abstract numbers with at least n digits within names
-g PATTERN   正则匹配;grep: only consider stmts that include this string
-h HOSTNAME  mysql机器名或者IP;hostname of db server for *-slow.log filename (can be wildcard),
             default is '*', i.e. match all
-i NAME      name of server instance (if using mysql.server startup script)
-l           总时间中不减去锁定时间;don't subtract lock time from total time
"""

清理 binlog

MySQL server的binlog清理

1.1 使用MySQL参数控制

expire_logs_days

设置二进制日志的过期天数,过了指定天数的日志将被自动删除,可动态修改
如果设置了非0值,则在mysqld启动和日志刷新时,可能执行清理超过定义天数的binlog file
全局变量,动态变量,默认值为0(代表不会自动清理binlog),整型值,取值范围为0~99
自动清理的具体实现是:当binlog文件达到 max_binlog_size自动切换或者手动切换(flush)或者MySQL启动(startup)时,会遍历index文件,找到第一个“最后修改时间”在N天内的binlog文件,然后将该binlog文件之前的所有binlog文件删除掉。

1.2 手动purge清理

通常手动清理binlog是使用MySQL提供的purge命令。purge命令的定义如下:

purge {binary | master} logs to "binlog-file-name"
purge {binary | master} logs before "datetime-expr"

示例:

mysql [localhost:8032] {msandbox} ((none)) > show binary logs;
+------------------+-----------+-----------+
| Log_name         | File_size | Encrypted |
+------------------+-----------+-----------+
| mysql-bin.000001 | 235503643 | No        |
| mysql-bin.000002 |  67737820 | No        |
| mysql-bin.000003 | 704494646 | No        |
| mysql-bin.000004 |      2016 | No        |
| mysql-bin.000005 |  96898437 | No        |
| mysql-bin.000006 |  15330355 | No        |
| mysql-bin.000007 |       220 | No        |
| mysql-bin.000008 |       220 | No        |
| mysql-bin.000009 | 169453946 | No        |
+------------------+-----------+-----------+
9 rows in set (0.01 sec)

mysql [localhost:8032] {msandbox} ((none)) > PURGE BINARY LOGS TO 'mysql-bin.000007';
Query OK, 0 rows affected (0.01 sec)

mysql [localhost:8032] {msandbox} ((none)) > show binary logs;
+------------------+-----------+-----------+
| Log_name         | File_size | Encrypted |
+------------------+-----------+-----------+
| mysql-bin.000007 |       220 | No        |
| mysql-bin.000008 |       220 | No        |
| mysql-bin.000009 | 169453946 | No        |
+------------------+-----------+-----------+
3 rows in set (0.00 sec)

mysql [localhost:8032] {msandbox} ((none)) > 

-- select now();
-- PURGE BINARY LOGS BEFORE '2023-01-01 00:00:00';

1.3 脚本清理

按binlog保留个数持续清理

#!/bin/bash

# binlog index文件路径和名称
binlog_index_name=/var/lib/mysql/archive/mysql-bin.index

# 需要保留的binlog个数
reserve_num=10

# 执行检查间隔,循环检查,发现在该参数指定的时间间隔内有超过reserve_num参数指定的binlog个数,即执行删除,否则等待下一次检查
check_interv=60

# 单个文件删除间隔,在reserve_num参数指定的间隔内发现binlog数量超过reserve_num参数指定的数量就会执行删除,在有多个文件需要删除时,按照rm_interv参数指定的时间间隔逐个删除文件
rm_interv=0

# 执行日志
log_file=/tmp/`basename $0`.log

rm -f $log_file

# 执行清理binlog函数
exec_purge_binlog() {
    total_num=`sed -n '$=' $binlog_index_name`
    let purge_num=${total_num}-${reserve_num}
    if [ "$purge_num" -gt "0" ];then
        purge_files=(`sed -n "1,${purge_num}p" $binlog_index_name`)
        start_file=$(basename $(sed -n '1p' $binlog_index_name))
        end_file=$(basename $(sed -n "${purge_num}p" $binlog_index_name))
        echo
        echo '=================================================================================================================================='
        echo "发现有需要清理的binlog:binlog文件总个数:${total_num},需要保留文件个数:${reserve_num},需要清理的文件个数:${purge_num},将执行清理的文件名称范围:${start_file}-${end_file}"
        echo "---------------------------`date`----------------------------------正在执行binlog的清理"
        
        i=0
        for purge_file in "${purge_files[@]}"
        do
            echo
            echo "正在清理binlog文件:${purge_file} ----------------------`date`-----------------"
            \rm -f "$purge_file"
            sleep "$rm_interv"
            let i++
            echo "完成清理binlog文件:${purge_file} 时间:`date`,标文件名:${end_file},目标文件数:${purge_num},已清理文件个数:${i}"
        done
        
        sed -i "1,${purge_num}d" $binlog_index_name
        echo "本次清理binlog执行完成------------------`date`--------------------"
    else
        echo
        echo '=================================================================================================================================='
        echo "本次未检测到需要清理的binlog,等待下次执行检查中......"
    fi
}


# 执行函数调用
if [ ! -f "$binlog_index_name" ];then
    echo "警告,指定的binlog index文件不存在,脚本退出!!" |tee -a $log_file
    exit 1
else
    while :
    do
        exec_purge_binlog |tee -a $log_file
        sleep "$check_interv"
    done
fi

MySQL binlog server的binlog脚本清理

#!/bin/bash

# binlog server文件路径和名称
binlog_dir=/opt/backup/binlog/

# 需要保留的binlog个数
reserve_num=10

# 执行检查间隔,循环检查,发现在该参数指定的时间间隔内有超过reserve_num参数指定的binlog个数,即执行删除,否则等待下一次检查
check_interv=60

# 单个文件删除间隔,在reserve_num参数指定的间隔内发现binlog数量超过reserve_num参数指定的数量就会执行删除,在有多个文件需要删除时,按照rm_interv参数指定的时间间隔逐个删除文件
rm_interv=0

# 执行日志
log_file=/tmp/`basename $0`.log

rm -f $log_file

# 执行清理binlog函数
exec_purge_binlog() {
    total_num=`ls $binlog_dir | wc -l`
    let purge_num=${total_num}-${reserve_num}
    if [ "$purge_num" -gt "0" ];then
        purge_files=(`ls $binlog_dir | sort -k2 | head -$purge_num`)
        start_file=(`head -1 $purge_files 2>/dev/null`)
        end_file=(`tail -1 $purge_files 2>/dev/null`)
        echo
        echo '=================================================================================================================================='
        echo "发现有需要清理的binlog:binlog文件总个数:${total_num},需要保留文件个数:${reserve_num},需要清理的文件个数:${purge_num},将执行清理的文件名称范围:${start_file}-${end_file}"
        echo "---------------------------`date`----------------------------------正在执行binlog的清理"

        i=0
        for purge_file in "${purge_files[@]}"
        do
            echo
            echo "正在清理binlog文件:${purge_file} ----------------------`date`-----------------"
            \rm -f "$binlog_dir$purge_file"
            sleep "$rm_interv"
            let i++
            echo "完成清理binlog文件:${purge_file} 时间:`date`,标文件名:${end_file},目标文件数:${purge_num},已清理文件个数:${i}"
        done

        echo "本次清理binlog执行完成------------------`date`--------------------"
    else
        echo
        echo '=================================================================================================================================='
        echo "本次未检测到需要清理的binlog,等待下次执行检查中......"
    fi
}


# 执行函数调用
if [ ! -d "$binlog_dir" ];then
    echo "警告,指定的binlog文件不存在,脚本退出!!" |tee -a $log_file
    exit 1
else
    while :
    do
        exec_purge_binlog |tee -a $log_file
        sleep "$check_interv"
    done
fi

参考:

MySQL server的relay log清理

MySQL中binlog和relay log清理方式

show profiles

show profile 是MySQL提供用来分析当前会话中语句执行的资源消耗情况,可以用于SQL的调优的测量。

默认情况下,该参数处于关闭状态,并保持最近15次的运行结果。

在MySQL5.7中, show profile 命令已经开始不推荐使用,

MySQL使用performance_schema 中系统表的信息来替代show profile命令。

show profile中记录的信息实际上是存储在INFORMATION_SCHEMA.PROFILING 这个系统表中的,各种show profile只不过是相当于一个马甲,换一种方式来展现INFORMATION_SCHEMA.PROFILING 中的信息。

设置和查看:

-- 查看配置情况
show variables like 'profiling';
-- 打开当前Session的profiling
set profiling = on; //开启
-- 查询所有的profile信息
show profiles;
-- 查询某个Id的profile信息
show profile cpu,block io for query 问题SQL数字号码
show profile cpu,block io for query 5;

profile 可选查看项目:

all	显示所有的开销信息
cpu	显示CPU相关开销信息
Block io	显示块IO相关开销
ipc	显示发送和接收相关开销信息
memory	显示内存相关开销信息
Page faults	显示页面错误相关开销信息
swaps	显示交换次数相关开销的信息
source	显示和source_funciton、source_file、source_line相关的开销信息
Context switches	上下文切换相关开销

如果在 Status 字段中出现以下描述,则需要注意:

(1)converting HEAP to MYISAM:查询结果太大,内存都不够用了,往磁盘上搬了;
(2)Creating tmp table :创建临时表(拷贝数据到临时表,用完再删除);
(3)Copying to tmp table on disk:把内存中临时表复制到磁盘(很危险!!!)
(4)Locked

Profiling MySQL queries from Performance Schema

25.19.1 Query Profiling Using Performance Schema

27.19.1 Query Profiling Using Performance Schema

SELECT THREAD_ID INTO @my_thread_id
FROM performance_schema.threads WHERE PROCESSLIST_ID = CONNECTION_ID();

SELECT THREAD_ID, EVENT_ID, END_EVENT_ID, SQL_TEXT, NESTING_EVENT_ID
FROM performance_schema.events_statements_history_long
WHERE THREAD_ID = @my_thread_id
  AND EVENT_NAME = 'statement/sql/select'
ORDER BY EVENT_ID DESC LIMIT 3;


SELECT EVENT_NAME, SOURCE, (TIMER_END-TIMER_START)/1000000000 as 'DURATION (ms)'
FROM performance_schema.events_stages_history_long
WHERE THREAD_ID = @my_thread_id AND EVENT_ID BETWEEN 374 AND 392;


--历史记录数据量的配置
show variables like 'performance_schema%history%size';

SELECT 
    EVENT_ID,
    date_sub(now(),INTERVAL (select VARIABLE_VALUE from performance_schema.global_status where variable_name='UPTIME')-TIMER_START*10e-13 second) `start_time` ,
    date_sub(now(),INTERVAL (select VARIABLE_VALUE from performance_schema.global_status where variable_name='UPTIME')-timer_end*10e-13 second) `start_time` , 
    TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, 
    SQL_TEXT
FROM performance_schema.events_statements_history_long 
WHERE SQL_TEXT like '%count%';
 

SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
FROM performance_schema.events_stages_history_long 
WHERE NESTING_EVENT_ID=48903;

-- 27.19.1 Query Profiling Using Performance Schema
SELECT * FROM performance_schema.setup_actors;

UPDATE performance_schema.setup_actors
       SET ENABLED = 'NO', HISTORY = 'NO'
       WHERE HOST = '%' AND USER = '%';

INSERT INTO performance_schema.setup_actors
       (HOST,USER,ROLE,ENABLED,HISTORY)
       VALUES('localhost','root','%','YES','YES');

SELECT * FROM performance_schema.setup_actors;

UPDATE performance_schema.setup_instruments
       SET ENABLED = 'YES', TIMED = 'YES'
       WHERE NAME LIKE '%statement/%';

UPDATE performance_schema.setup_instruments
       SET ENABLED = 'YES', TIMED = 'YES'
       WHERE NAME LIKE '%stage/%';

UPDATE performance_schema.setup_consumers
       SET ENABLED = 'YES'
       WHERE NAME LIKE '%events_statements_%';

UPDATE performance_schema.setup_consumers
       SET ENABLED = 'YES'
       WHERE NAME LIKE '%events_stages_%';

SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
       FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%count%';

SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
       FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%call%';

SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
       FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=1600303;

大数据量分页 limit m, n

关于 MySQL 深分页优化常见的大概有以下三种策略:

子查询优化
延迟关联
书签记录

核心思想就是让 MySQL 尽可能扫描更少的页面,获取需要访问的记录后再根据关联列回原表查询所需要的列

如果主键id有序,可以用子查询优化

如果主键id是无序的,可以用inner join来优化
set profiling=ON;
show variables like 'profi%';

+----------+------------+------------------------------------+
| Query_ID | Duration   | Query                              |
+----------+------------+------------------------------------+
|        1 | 0.00119725 | show variables like 'profi%'       |
|        2 | 1.03761425 | select count(*) from emp           |
|        3 | 0.03029975 | select * from emp limit 100000,10  |
|        4 | 0.63729425 | select * from emp limit 2000000,10 |
|        5 | 1.30070800 | select * from emp limit 4000000,10 |
|        6 | 1.96291875 | select * from emp limit 6000000,10 |
|        7 | 2.91441175 | select * from emp limit 9000000,10 |
+----------+------------+------------------------------------+


|       31 | 3.09750975 | select * from emp limit 9000000,10                                                                 |
|       32 | 1.50881575 | select * from emp t1 inner join (select id from emp t2 limit 9000000,10) t2 on t1.id=t2.id         |
|       33 | 0.00067825 | select * from emp t1 where id>(select id from emp where id>=9000000 limit 1) limit 10              |

子查询

mysql [localhost:8032] {msandbox} (test) > explain select * from emp limit 9000000,10 ;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 9731479 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > explain select * from emp t1 inner join (select id from emp t2 limit 9000000,10) t2 on t1.id=t2.id;
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+-------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref   | rows    | filtered | Extra       |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+-------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL  | 9000010 |   100.00 | NULL        |
|  1 | PRIMARY     | t1         | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | t2.id |       1 |   100.00 | NULL        |
|  2 | DERIVED     | t2         | NULL       | index  | NULL          | PRIMARY | 4       | NULL  | 9731479 |   100.00 | Using index |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

https://blog.csdn.net/qq_37781649/article/details/111689193

当 LIMIT OFFSET 过深时,会使 ORDER BY 普通索引失效(联合、唯一这些索引没有测试)

SORT_BUFFER_SIZE 是 MySQL 为排序开辟的内存。如果排序数据量小于 SORT_BUFFER_SIZE,排序会在内存中完成。如果数据量过大,内存放不下,则会利用磁盘临时文件排序

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                       |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 9731479 |    33.33 | Using where; Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+

Using filesort 表示在索引之外,需要额外进行外部的排序动作,性能必将受到严重影响

MySQL 定期删除数据

create table testdate(
	id int unsigned primary key auto_increment,
    id_date date not null
);

insert into testdate(id_date) select date_sub(now(), interval 1 day);
insert into testdate(id_date) select date_sub(now(), interval 2 day);
insert into testdate(id_date) select date_sub(now(), interval 3 day);
insert into testdate(id_date) select date_sub(now(), interval 4 day);
commit;
select * from testdate;

删除三天前的数据的sql

DELIMITER // 
create procedure del_data()
BEGIN
	DELETE FROM testdate WHERE id_date < DATE_SUB(CURDATE(),INTERVAL 3 DAY);
END//
DELIMITER ;

-- DELIMITER是分割符的意思,声明存储过程前将"//"声明为分隔符,这样存储过程中的“;”才不会被当作分隔符处理。声明结束后再还原分隔符。
-- 存储过程也可以带参数,存储过程名(参数)
-- 在声明存储过程前要先用use database_name切换到想要应用的数据库,否则存储过程会应用到默认数据库中

-- select * from information_schema.ROUTINES where ROUTINE_SCHEMA = 'test';
-- call del_data();

-- 开启事件调度器
SET GLOBAL event_scheduler = ON;

-- 创建事件
create event del_event  
on schedule 
EVERY 1 day  
STARTS '2023-1-28 00:00:00'  
do call del_data()

-- 从明天的零点开始,每隔一天执行del_data存储过程

-- 查看 event 信息
select * from information_schema.events\G

测试存储过程:

mysql [localhost:8032] {msandbox} (test) > select * from testdate;
+----+------------+
| id | id_date    |
+----+------------+
|  1 | 2023-01-26 |
|  2 | 2023-01-25 |
|  3 | 2023-01-24 |
|  5 | 2023-01-23 |
+----+------------+
4 rows in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > call del_data();
Query OK, 1 row affected (0.01 sec)

mysql [localhost:8032] {msandbox} (test) > select * from testdate;
+----+------------+
| id | id_date    |
+----+------------+
|  1 | 2023-01-26 |
|  2 | 2023-01-25 |
|  3 | 2023-01-24 |
+----+------------+
3 rows in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 
mysql [localhost:8032] {msandbox} (information_schema) > select * from information_schema.events\G
*************************** 1. row ***************************
       EVENT_CATALOG: def
        EVENT_SCHEMA: test
          EVENT_NAME: del_event
             DEFINER: msandbox@localhost
           TIME_ZONE: SYSTEM
          EVENT_BODY: SQL
    EVENT_DEFINITION: call del_data()
          EVENT_TYPE: RECURRING
          EXECUTE_AT: NULL
      INTERVAL_VALUE: 1
      INTERVAL_FIELD: DAY
            SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,ALLOW_INVALID_DATES,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
              STARTS: 2023-01-28 00:00:00
                ENDS: NULL
              STATUS: ENABLED
       ON_COMPLETION: NOT PRESERVE
             CREATED: 2023-01-27 21:13:16
        LAST_ALTERED: 2023-01-27 21:13:16
       LAST_EXECUTED: NULL
       EVENT_COMMENT: 
          ORIGINATOR: 8032
CHARACTER_SET_CLIENT: utf8mb4
COLLATION_CONNECTION: utf8mb4_0900_ai_ci
  DATABASE_COLLATION: utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (information_schema) > 

MySql取得日期(前一天、某一天)

附日期格式如下:

%M 月名字(January……December)  
%W 星期名字(Sunday……Saturday)  
%D 有英语前缀的月份的日期(1st, 2nd, 3rd, 等等。)  
%Y 年, 数字, 4 位  
%y 年, 数字, 2 位  
%a 缩写的星期名字(Sun……Sat)  
%d 月份中的天数, 数字(00……31)  
%e 月份中的天数, 数字(0……31)  
%m 月, 数字(01……12)  
%c 月, 数字(1……12)  
%b 缩写的月份名字(Jan……Dec)  
%j 一年中的天数(001……366)  
%H 小时(00……23)  
%k 小时(0……23)  
%h 小时(01……12)  
%I 小时(01……12)  
%l 小时(1……12)  
%i 分钟, 数字(00……59)  
%r 时间,12 小时(hh:mm:ss [AP]M)  
%T 时间,24 小时(hh:mm:ss)  
%S 秒(00……59)  
%s 秒(00……59)  
%p AM或PM  
%w 一个星期中的天数(0=Sunday ……6=Saturday )  
%U 星期(0……52), 这里星期天是星期的第一天
-- 存储过程基本语法
-- 创建
create procedure p1()
begin
select count(*) from t1;
end;
-- 调用
call p1();
-- 查看
select * from information_schema.ROUTINES where ROUTINE_SCHEMA = 'test';
show create procedure p1;
-- 删除
drop procedure if exists p1;

MySQL 8.0 IN查询的测试

MySQL中IN子查询会导致无法使用索引

5.7 版本的IN查询肯能用不到索引 8.0测试则可以使用

create table test1
(
    id int auto_increment primary key,
    pay_id int,
    pay_time datetime,
    other_col varchar(100)
);

-- 测试数据的特点是pay_id可重复,这里在存储过程处理成,循环插入300W条数据的过程中,每隔100条数据插入一条重复的pay_id,时间字段在一定范围内随机

DELIMITER //

CREATE DEFINER=`root`@`%` PROCEDURE `test1_insert`(IN `loopcount` INT)
    LANGUAGE SQL
    NOT DETERMINISTIC
    CONTAINS SQL
    SQL SECURITY DEFINER
    COMMENT ''
BEGIN
    declare cnt int;
    set  cnt = 0;
    while cnt< loopcount do
        insert into test1 (pay_id,pay_time,other_col) values  (cnt,date_add(now(), interval floor(300*rand()) day),uuid());
        if (cnt mod 100 = 0) then
            insert into test1 (pay_id,pay_time,other_col) values  (cnt,date_add(now(), interval floor(300*rand()) day),uuid());
        end if;
        set cnt = cnt + 1;    
    end while;

END;
//
DELIMITER ;

DELIMITER //

CREATE PROCEDURE `test1_insert`(IN `loopcount` INT)
    LANGUAGE SQL
    NOT DETERMINISTIC
    CONTAINS SQL
    SQL SECURITY DEFINER
    COMMENT ''
BEGIN
    declare cnt int;
    set  cnt = 0;
    while cnt< loopcount do
        insert into test1 (pay_id,pay_time,other_col) values  (cnt,date_add(now(), interval floor(300*rand()) day),uuid());
        if (cnt mod 100 = 0) then
            insert into test1 (pay_id,pay_time,other_col) values  (cnt,date_add(now(), interval floor(300*rand()) day),uuid());
        end if;
        set cnt = cnt + 1;    
    end while;

END;
//
DELIMITER ;


-- 插入30300行数据
-- set global innodb_flush_log_at_trx_commit=2;
-- 设置为1,也就是说每次事务提交,都会将innodb日志缓存写入磁盘,对磁盘效率影响很大,将它设置为2,每次事务提交时mysql都会把log buffer的数据写入log file,但是flush(刷到磁盘)操作并不会同时进行。该模式下,MySQL会每秒执行一次 flush(刷到磁盘)操作
-- iostat -k -d -x 1 10

call test1_insert(300000); 

-- 创建索引
create index idx_pay_id on test1(pay_id);

两种子查询的写法

查询大概的意思是查询某个时间段之内的业务Id大于1的数据,于是就出现两种写法。

第一种写法如下:

IN子查询中是某段时间内业务统计行数大于1的业务Id,
外层按照IN子查询的结果进行查询,
业务Id的列pay_id上有索引,

这种写法,5.7 在数据量大的时候确实效率比较低,用不到索引

select * from test1 force index(primary)
where pay_id in (
select pay_id from test1 
where pay_time>="2023-06-01 00:00:00" 
and pay_time<="2023-07-03 12:59:59" 
group by pay_id 
having count(pay_id) > 1
);


explain select * from test1 force index(primary)
where pay_id in (
select pay_id from test1 
where pay_time>="2023-06-01 00:00:00" 
and pay_time<="2023-07-03 12:59:59" 
group by pay_id 
having count(pay_id) > 1
);

select * from test1 force index(idx_pay_id)
where pay_id in (
select pay_id from test1 
where pay_time>="2023-06-01 00:00:00" 
and pay_time<="2023-07-03 12:59:59" 
group by pay_id 
having count(pay_id) > 1
);

explain select * from test1 force index(idx_pay_id)
where pay_id in (
select pay_id from test1 
where pay_time>="2023-06-01 00:00:00" 
and pay_time<="2023-07-03 12:59:59" 
group by pay_id 
having count(pay_id) > 1
);


select * from test1
where pay_id in (
select pay_id from test1 
where pay_time>="2023-06-01 00:00:00" 
and pay_time<="2023-07-03 12:59:59" 
group by pay_id 
having count(pay_id) > 1
);

explain select * from test1
where pay_id in (
select pay_id from test1 
where pay_time>="2023-06-01 00:00:00" 
and pay_time<="2023-07-03 12:59:59" 
group by pay_id 
having count(pay_id) > 1
);

第二种写法,与子查询进行join关联,这种写法相当于上面的IN子查询写法,下面测试发现,效率确实有不少的提高

select tpp1.* from test1 tpp1, 
(
     select pay_id 
     from test1 
       where pay_time>="2023-06-01 00:00:00" 
       and pay_time<="2023-07-03 12:59:59" 
     group by pay_id 
     having count(pay_id) > 1
) tpp2 
where tpp1.pay_id=tpp2.pay_id;

explain select tpp1.* from test1 tpp1, 
(
     select pay_id 
     from test1 
       where pay_time>="2023-06-01 00:00:00" 
       and pay_time<="2023-07-03 12:59:59" 
     group by pay_id 
     having count(pay_id) > 1
) tpp2 
where tpp1.pay_id=tpp2.pay_id;

第一种写法:

(force主键不能用,但并没有使用index---> 如果没 create index idx_pay_id on test1(pay_id);)
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------+
|  1 | PRIMARY     | test1 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 301979 |   100.00 | Using where                  |
|  2 | SUBQUERY    | test1 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 301979 |    11.11 | Using where; Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------+
(force idx_pay_id 能用)->默认是这种执行计划
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+-------------+
|  1 | PRIMARY     | test1 | NULL       | ALL   | NULL          | NULL       | NULL    | NULL | 301930 |   100.00 | Using where |
|  2 | SUBQUERY    | test1 | NULL       | index | idx_pay_id    | idx_pay_id | 5       | NULL | 301930 |    11.11 | Using where |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+-------------+

第二种写法:
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------+--------+----------+------------------------------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref              | rows   | filtered | Extra                        |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------+--------+----------+------------------------------+
|  1 | PRIMARY     | tpp1       | NULL       | ALL  | NULL          | NULL        | NULL    | NULL             | 301979 |   100.00 | Using where                  |
|  1 | PRIMARY     | <derived2> | NULL       | ref  | <auto_key0>   | <auto_key0> | 5       | test.tpp1.pay_id |     10 |   100.00 | Using index                  |
|  2 | DERIVED     | test1      | NULL       | ALL  | NULL          | NULL        | NULL    | NULL             | 301979 |    11.11 | Using where; Using temporary |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------+--------+----------+------------------------------+

Tips:

比如这样的语句是不能正确执行的。 
select * from table where id in (select id from table limit 12); 

但是,只要你再加一层就行。如: 
select * from table where id in (select t.id from (select * from table limit 12)as t) 

这样就可以绕开limit子查询的问题。 

MySQL 某些优化案例

mysql优化案例分析

例子1:

SELECT ID FROM SENDLOG WHERE TO_DAYS(NOW())-TO_DAYS(GMT_CREATE) > 7;

问题:对索引列GMT_CREATE进行了运算,无法使用索引

优化后sql:

select id from sendlog where gmt_create < now() - 7

问题:result列含有索引,但mysql执行计划没有用到。

分析:result类型为char,传递的值1为整型,数据类型不一致,导致没法用索引,对于时间类型gmt_create > ‘2013-10-29 12:40:44’,可以直接使用。

优化后sql:

SELECT * FROM SENDLOG where result = '1' and gmt_create > '2013-10-29 12:40:44' limit 2000;

例子3: 场景:获取某个卖家未读的消息。

select count(*) from mc_msg where receiver='sun098' and status='UNREAD' and title is not null;

问题:有时候db负载飙高,sql响应时间变慢。

分析:导致db负载飙高的原因是多个大卖家并发查询的时,cpu和逻辑读增加,load飙高。由于receiver,status已有索引,sql本身已经没有优化空间,了解业务后发现其实业务不需要精确值,如果大于99条,页面就直接显示为99+

优化后sql:

select count(*) from (select id from mc_msg where receiver='sun098' and status='UNREAD' and title is not null limit 100) a;

例子4:

场景:查看历史订单留言记录,未读留言的放在前面,已读的放在后面,并且按时间递减排序

select * from(
select ID,GMT_CREATE,GMT_MODIFIED,SENDER_ALI_ID,RECEIVER_ALI_ID,UNREAD_COUNT,STATUS,LAST_MESSAGE_ID,RELATION_ID,SELLER_ADMIN_SEQ,IS_READ 
from message_relation_sender
WHERE SENDER_ALI_ID = 119545671  and UNREAD_COUNT > 0 
order by LAST_MESSAGE_ID desc) m 
union all 
select * from(
select ID,GMT_CREATE,GMT_MODIFIED,SENDER_ALI_ID,RECEIVER_ALI_ID,UNREAD_COUNT,STATUS,LAST_MESSAGE_ID,RELATION_ID,SELLER_ADMIN_SEQ,IS_READ from message_relation_sender
WHERE SENDER_ALI_ID = 119545671  and UNREAD_COUNT = 0 
order by LAST_MESSAGE_ID desc) n limit 5000,15;

分析: (1)unread_count表示未读的订单留言记录数目; (2)第一个子查询获取未读留言记录,第二子查询获取已读留言记录; (3)LAST_MESSAGE_ID 递增,最新的订单留言,LAST_MESSAGE_ID最大。 (4)limit 5000,15是分页查询 这里使用union all,主要特点在于union all 不对结果集排序,直接合并,达到了“未读留言的放在前面,已读的放在后面”的效果,但同时也造成了两次扫描索引的结果,每个子查询都需要排序;而且union all还会产生临时表,执行代价会更大。

优化: 这里看到unread_count实际值对这个查询没有实际意义,我们只需要区分已读和未读即可。由于sql本身已经没有优化余地,考虑对表结构进行修改,加一个字段is_read,表示已读和未读。is_read=2表示未读;is_read=1表示已读。通过组合索引(SENDER_ALI_ID,is_read, LAST_MESSAGE_ID),既可以完成过滤,还可以完成排序。

优化后sql:

select ID,GMT_CREATE,GMT_MODIFIED,SENDER_ALI_ID,RECEIVER_ALI_ID,UNREAD_COUNT,STATUS,LAST_MESSAGE_ID,RELATION_ID,SELLER_ADMIN_SEQ,IS_READ from message_relation_sender
where SENDER_ALI_ID = 119545671  order by is_read desc, LAST_MESSAGE_ID desc limit 5000,15

第二次优化: 由于索引不包含所有的返回字段,因此需要回表,而mysql对于limit 5000,15的查询却需要返回5015次,这种无效的返回很影响查询效率。 分页的优化写法:

select t1.ID,GMT_CREATE,GMT_MODIFIED,SENDER_ALI_ID,RECEIVER_ALI_ID,UNREAD_COUNT,STATUS,LAST_MESSAGE_ID,RELATION_ID,SELLER_ADMIN_SEQ,IS_READ from message_relation_sender t1,
(select id 
 from message_relation_sender
 where SENDER_ALI_ID = 119545671 order by is_read desc, LAST_MESSAGE_ID desc limit 5000,15
)t2 where t1.id = t2.id

分析:由于id是主键,不需要回表,通过连接查询,最终只需要15次回表即可。

Referece

参考:

MySQL 千万数据量深分页优化, 拒绝线上故障

MySQL大数据量分页查询方法及其优化

Have a good work&life! 2023/01 via LinHong


Similar Posts

下一篇 OCI SSH Key Tips

Comments