MySQL 8.0 Study 004 Tips
Study series
- MySQL slow query log
- Purging binlogs
- show profiles
- Paging deep into large tables with limit m, n
- Periodically deleting data in MySQL
- MySQL 8.0 IN-subquery tests
- Selected MySQL optimization cases
MySQL slow query log
Configure MySQL to record slow queries automatically:
slow_query_log = OFF #### whether slow query logging is enabled
long_query_time = 2 #### threshold in seconds; statements running longer are logged
slow_query_log_file = /data/slow.log #### log file path
log_queries_not_using_indexes = OFF #### whether to log queries that do not use an index
log_slow_admin_statements = ON #### whether to log administrative statements such as ALTER, CREATE, DROP
long_query_time is compared against the actual SQL execution time, excluding lock-wait time. A statement may take a long time purely because of lock waits; such a statement is not written to the slow query log.
Query_time: total statement time, i.e. actual execution time + lock-wait time
Lock_time: lock-wait time
Rows_sent: number of rows sent to the client
Rows_examined: number of rows scanned during execution; this counter is accumulated each time the executor fetches a row from the storage engine, and it is maintained at the server layer
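If log_output includes TABLE, the same fields can also be inspected with plain SQL instead of reading the log file; a minimal sketch (mysql.slow_log only receives rows when table logging is enabled):
-- most recent slow queries, newest first
select start_time, query_time, lock_time, rows_sent, rows_examined, sql_text
from mysql.slow_log
order by start_time desc
limit 10;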
Inspecting:
Note: view the current settings:
show variables like '%query%';
Change a setting at runtime:
set global <variable_name> = <value>;
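For example, enabling the slow log at runtime without a restart (a minimal sketch; threshold values are illustrative):
set global slow_query_log = ON;
set global long_query_time = 2;
set global log_queries_not_using_indexes = ON;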
mysqldumpslow -s at -a /data/mysql-slow.log
"""
--verbose 版本
--debug 调试
--help 帮助
-v 版本
-d 调试模式
-s ORDER 排序方式
what to sort by (al, at, ar, c, l, r, t), 'at' is default
al: average lock time
ar: average rows sent
at: average query time
c: count
l: lock time
r: rows sent
t: query time
-r 反转顺序,默认文件倒序拍。reverse the sort order (largest last instead of first)
-t NUM 显示前N条just show the top n queries
-a 不要将SQL中数字转换成N,字符串转换成S。don't abstract all numbers to N and strings to 'S'
-n NUM abstract numbers with at least n digits within names
-g PATTERN 正则匹配;grep: only consider stmts that include this string
-h HOSTNAME mysql机器名或者IP;hostname of db server for *-slow.log filename (can be wildcard),
default is '*', i.e. match all
-i NAME name of server instance (if using mysql.server startup script)
-l 总时间中不减去锁定时间;don't subtract lock time from total time
"""
Purging binlogs
Binlog cleanup on the MySQL server
1.1 Retention controlled by MySQL parameters
expire_logs_days
Sets the expiration period, in days, for binary logs; logs older than this are deleted automatically. The value can be changed dynamically.
With a non-zero value, binlog files older than the configured number of days may be purged at mysqld startup and at log flush.
It is a global, dynamic, integer variable; the default is 0 (no automatic purging) and the valid range is 0-99.
The automatic purge works like this: when the binlog rotates (automatically on reaching max_binlog_size, manually via flush, or at server startup), the server walks the index file, finds the first binlog file whose last-modified time falls within the last N days, and deletes every binlog file before it.
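Note that MySQL 8.0 deprecates expire_logs_days in favor of binlog_expire_logs_seconds (default 2592000 seconds, i.e. 30 days); a minimal sketch, retention values illustrative:
-- keep 7 days of binlogs (8.0 style)
set global binlog_expire_logs_seconds = 604800;
-- legacy equivalent, deprecated in 8.0
-- set global expire_logs_days = 7;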
1.2 Manual purging with PURGE
Manual binlog cleanup normally uses the PURGE statement MySQL provides, defined as:
purge {binary | master} logs to 'binlog-file-name'
purge {binary | master} logs before 'datetime-expr'
Example:
mysql [localhost:8032] {msandbox} ((none)) > show binary logs;
+------------------+-----------+-----------+
| Log_name | File_size | Encrypted |
+------------------+-----------+-----------+
| mysql-bin.000001 | 235503643 | No |
| mysql-bin.000002 | 67737820 | No |
| mysql-bin.000003 | 704494646 | No |
| mysql-bin.000004 | 2016 | No |
| mysql-bin.000005 | 96898437 | No |
| mysql-bin.000006 | 15330355 | No |
| mysql-bin.000007 | 220 | No |
| mysql-bin.000008 | 220 | No |
| mysql-bin.000009 | 169453946 | No |
+------------------+-----------+-----------+
9 rows in set (0.01 sec)
mysql [localhost:8032] {msandbox} ((none)) > PURGE BINARY LOGS TO 'mysql-bin.000007';
Query OK, 0 rows affected (0.01 sec)
mysql [localhost:8032] {msandbox} ((none)) > show binary logs;
+------------------+-----------+-----------+
| Log_name | File_size | Encrypted |
+------------------+-----------+-----------+
| mysql-bin.000007 | 220 | No |
| mysql-bin.000008 | 220 | No |
| mysql-bin.000009 | 169453946 | No |
+------------------+-----------+-----------+
3 rows in set (0.00 sec)
mysql [localhost:8032] {msandbox} ((none)) >
-- select now();
-- PURGE BINARY LOGS BEFORE '2023-01-01 00:00:00';
1.3 Script-based cleanup
Continuously purge binlogs, keeping a fixed number of files:
#!/bin/bash
# Path and name of the binlog index file
binlog_index_name=/var/lib/mysql/archive/mysql-bin.index
# Number of binlog files to keep
reserve_num=10
# Check interval: loop forever; whenever the number of binlogs exceeds
# reserve_num, delete the excess, otherwise wait for the next check
check_interv=60
# Per-file delete interval: when several files need deleting, remove them
# one by one, sleeping rm_interv seconds between files
rm_interv=0
# Execution log
log_file=/tmp/`basename $0`.log
rm -f $log_file
# Function that purges binlogs
exec_purge_binlog() {
    total_num=`sed -n '$=' $binlog_index_name`
    let purge_num=${total_num}-${reserve_num}
    if [ "$purge_num" -gt "0" ];then
        purge_files=(`sed -n "1,${purge_num}p" $binlog_index_name`)
        start_file=$(basename $(sed -n '1p' $binlog_index_name))
        end_file=$(basename $(sed -n "${purge_num}p" $binlog_index_name))
        echo
        echo '=================================================================================================================================='
        echo "Binlogs to purge: total files: ${total_num}, files to keep: ${reserve_num}, files to purge: ${purge_num}, file range to purge: ${start_file}-${end_file}"
        echo "---------------------------`date`---------------------------------- purging binlogs"
        i=0
        for purge_file in "${purge_files[@]}"
        do
            echo
            echo "Purging binlog file: ${purge_file} ----------------------`date`-----------------"
            \rm -f "$purge_file"
            sleep "$rm_interv"
            let i++
            echo "Finished purging binlog file: ${purge_file} at `date`; last file in range: ${end_file}, target count: ${purge_num}, purged so far: ${i}"
        done
        sed -i "1,${purge_num}d" $binlog_index_name
        echo "This purge round is complete ------------------`date`--------------------"
    else
        echo
        echo '=================================================================================================================================='
        echo "No binlogs need purging this round; waiting for the next check..."
    fi
}
# Main loop
if [ ! -f "$binlog_index_name" ];then
    echo "Warning: the specified binlog index file does not exist; exiting!" |tee -a $log_file
    exit 1
else
    while :
    do
        exec_purge_binlog |tee -a $log_file
        sleep "$check_interv"
    done
fi
Script-based binlog cleanup for a MySQL binlog server
#!/bin/bash
# Directory holding the binlog server's binlog files
binlog_dir=/opt/backup/binlog/
# Number of binlog files to keep
reserve_num=10
# Check interval: loop forever; whenever the number of binlogs exceeds
# reserve_num, delete the excess, otherwise wait for the next check
check_interv=60
# Per-file delete interval: when several files need deleting, remove them
# one by one, sleeping rm_interv seconds between files
rm_interv=0
# Execution log
log_file=/tmp/`basename $0`.log
rm -f $log_file
# Function that purges binlogs
exec_purge_binlog() {
    total_num=`ls $binlog_dir | wc -l`
    let purge_num=${total_num}-${reserve_num}
    if [ "$purge_num" -gt "0" ];then
        purge_files=(`ls $binlog_dir | sort | head -n $purge_num`)
        start_file=${purge_files[0]}
        end_file=${purge_files[$((purge_num-1))]}
        echo
        echo '=================================================================================================================================='
        echo "Binlogs to purge: total files: ${total_num}, files to keep: ${reserve_num}, files to purge: ${purge_num}, file range to purge: ${start_file}-${end_file}"
        echo "---------------------------`date`---------------------------------- purging binlogs"
        i=0
        for purge_file in "${purge_files[@]}"
        do
            echo
            echo "Purging binlog file: ${purge_file} ----------------------`date`-----------------"
            \rm -f "$binlog_dir$purge_file"
            sleep "$rm_interv"
            let i++
            echo "Finished purging binlog file: ${purge_file} at `date`; last file in range: ${end_file}, target count: ${purge_num}, purged so far: ${i}"
        done
        echo "This purge round is complete ------------------`date`--------------------"
    else
        echo
        echo '=================================================================================================================================='
        echo "No binlogs need purging this round; waiting for the next check..."
    fi
}
# Main loop
if [ ! -d "$binlog_dir" ];then
    echo "Warning: the specified binlog directory does not exist; exiting!" |tee -a $log_file
    exit 1
else
    while :
    do
        exec_purge_binlog |tee -a $log_file
        sleep "$check_interv"
    done
fi
See also: relay log cleanup on the MySQL server
show profiles
SHOW PROFILE is MySQL's facility for analyzing the resource consumption of statements executed in the current session; it is handy for measuring SQL tuning work.
It is disabled by default, and it keeps results for the last 15 statements only.
As of MySQL 5.7, SHOW PROFILE is deprecated;
MySQL recommends the performance_schema tables as the replacement.
The data behind SHOW PROFILE actually lives in the INFORMATION_SCHEMA.PROFILING table; the various SHOW PROFILE forms are just a facade that presents INFORMATION_SCHEMA.PROFILING in a different way.
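Because the data lives in INFORMATION_SCHEMA.PROFILING, SHOW PROFILE output can be reproduced with a plain query; a minimal sketch (the QUERY_ID value is illustrative):
select STATE, DURATION
from INFORMATION_SCHEMA.PROFILING
where QUERY_ID = 5
order by SEQ;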
Setup and usage:
-- Check the current setting
show variables like 'profiling';
-- Enable profiling for the current session
set profiling = on;
-- List all recorded profiles
show profiles;
-- Show the profile for a specific query id
show profile cpu, block io for query <Query_ID>;
show profile cpu, block io for query 5;
Available SHOW PROFILE types:
all: all resource usage information
cpu: CPU usage
block io: block I/O
ipc: messages sent and received
memory: memory usage
page faults: page-fault counts
swaps: swap counts
source: source_function, source_file and source_line information
context switches: context-switch counts
Watch out for the following values in the Status column:
(1) converting HEAP to MyISAM: the result set is too large for memory and is being spilled to disk;
(2) Creating tmp table: a temporary table is created (data is copied in and dropped afterwards);
(3) Copying to tmp table on disk: an in-memory temporary table is being copied to disk (dangerous!);
(4) Locked
Profiling MySQL queries from Performance Schema
25.19.1 Query Profiling Using Performance Schema (MySQL 5.7 Reference Manual)
27.19.1 Query Profiling Using Performance Schema (MySQL 8.0 Reference Manual)
SELECT THREAD_ID INTO @my_thread_id
FROM performance_schema.threads WHERE PROCESSLIST_ID = CONNECTION_ID();
SELECT THREAD_ID, EVENT_ID, END_EVENT_ID, SQL_TEXT, NESTING_EVENT_ID
FROM performance_schema.events_statements_history_long
WHERE THREAD_ID = @my_thread_id
AND EVENT_NAME = 'statement/sql/select'
ORDER BY EVENT_ID DESC LIMIT 3;
SELECT EVENT_NAME, SOURCE, (TIMER_END-TIMER_START)/1000000000 as 'DURATION (ms)'
FROM performance_schema.events_stages_history_long
WHERE THREAD_ID = @my_thread_id AND EVENT_ID BETWEEN 374 AND 392;
-- Size configuration for the history tables
show variables like 'performance_schema%history%size';
SELECT
EVENT_ID,
date_sub(now(), INTERVAL (select VARIABLE_VALUE from performance_schema.global_status where variable_name='UPTIME')-TIMER_START*10e-13 second) `start_time`,
date_sub(now(), INTERVAL (select VARIABLE_VALUE from performance_schema.global_status where variable_name='UPTIME')-TIMER_END*10e-13 second) `end_time`,
TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration,
SQL_TEXT
FROM performance_schema.events_statements_history_long
WHERE SQL_TEXT like '%count%';
SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
FROM performance_schema.events_stages_history_long
WHERE NESTING_EVENT_ID=48903;
-- 27.19.1 Query Profiling Using Performance Schema
SELECT * FROM performance_schema.setup_actors;
UPDATE performance_schema.setup_actors
SET ENABLED = 'NO', HISTORY = 'NO'
WHERE HOST = '%' AND USER = '%';
INSERT INTO performance_schema.setup_actors
(HOST,USER,ROLE,ENABLED,HISTORY)
VALUES('localhost','root','%','YES','YES');
SELECT * FROM performance_schema.setup_actors;
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE '%statement/%';
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE '%stage/%';
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE '%events_statements_%';
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE '%events_stages_%';
SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%count%';
SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%call%';
SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=1600303;
Paging deep into large tables with limit m, n
Common strategies for optimizing deep pagination in MySQL fall into roughly three groups:
Subquery optimization
Deferred join
Bookmark (keyset) records
The core idea is to make MySQL scan as few pages as possible: locate the needed records first, and only then go back to the base table via the join column for the remaining columns.
If the primary key id is ordered, the subquery approach works.
If the primary key id is not ordered, an inner join can be used instead. (A bookmark/keyset sketch follows below.)
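A minimal sketch of the bookmark (keyset) strategy, using the emp table from the tests below (the starting id is illustrative; in practice it is the last id of the previous page):
-- remember where the previous page ended instead of scanning past a huge offset
select * from emp where id > 9000000 order by id limit 10;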
set profiling=ON;
show variables like 'profi%';
show profiles;
+----------+------------+------------------------------------+
| Query_ID | Duration   | Query                              |
+----------+------------+------------------------------------+
|        1 | 0.00119725 | show variables like 'profi%'       |
|        2 | 1.03761425 | select count(*) from emp           |
|        3 | 0.03029975 | select * from emp limit 100000,10  |
|        4 | 0.63729425 | select * from emp limit 2000000,10 |
|        5 | 1.30070800 | select * from emp limit 4000000,10 |
|        6 | 1.96291875 | select * from emp limit 6000000,10 |
|        7 | 2.91441175 | select * from emp limit 9000000,10 |
+----------+------------+------------------------------------+
Later runs, after trying the optimized rewrites (abridged):
| 31 | 3.09750975 | select * from emp limit 9000000,10 |
| 32 | 1.50881575 | select * from emp t1 inner join (select id from emp t2 limit 9000000,10) t2 on t1.id=t2.id |
| 33 | 0.00067825 | select * from emp t1 where id>(select id from emp where id>=9000000 limit 1) limit 10 |
Execution plan comparison:
mysql [localhost:8032] {msandbox} (test) > explain select * from emp limit 9000000,10 ;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
| 1 | SIMPLE | emp | NULL | ALL | NULL | NULL | NULL | NULL | 9731479 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql [localhost:8032] {msandbox} (test) > explain select * from emp t1 inner join (select id from emp t2 limit 9000000,10) t2 on t1.id=t2.id;
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+-------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 9000010 | 100.00 | NULL |
| 1 | PRIMARY | t1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | t2.id | 1 | 100.00 | NULL |
| 2 | DERIVED | t2 | NULL | index | NULL | PRIMARY | 4 | NULL | 9731479 | 100.00 | Using index |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
mysql [localhost:8032] {msandbox} (test) >
https://blog.csdn.net/qq_37781649/article/details/111689193
When the LIMIT offset is very deep, ORDER BY on an ordinary secondary index can stop using the index (composite and unique indexes were not tested here).
sort_buffer_size is the memory MySQL allocates for sorting. If the data to sort fits within sort_buffer_size, the sort completes in memory; if it is too large for memory, MySQL falls back to sorting with temporary files on disk.
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
| 1 | SIMPLE | emp | NULL | ALL | NULL | NULL | NULL | NULL | 9731479 | 33.33 | Using where; Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
Using filesort means an extra external sort is performed beyond what the index provides, which can hurt performance badly.
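A related workaround is to let the deep-offset scan run entirely on a covering secondary index and defer the row lookups; a sketch, assuming a hypothetical secondary index on a sort column emp(ename):
-- a secondary index implicitly contains the primary key id, so the derived
-- table is covered by the index; only the final 10 rows hit the base table
select e.*
from emp e
join (select id from emp order by ename limit 9000000, 10) t
  on e.id = t.id;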
Periodically deleting data in MySQL
create table testdate(
id int unsigned primary key auto_increment,
id_date date not null
);
insert into testdate(id_date) select date_sub(now(), interval 1 day);
insert into testdate(id_date) select date_sub(now(), interval 2 day);
insert into testdate(id_date) select date_sub(now(), interval 3 day);
insert into testdate(id_date) select date_sub(now(), interval 4 day);
commit;
select * from testdate;
SQL to delete data older than three days:
DELIMITER //
create procedure del_data()
BEGIN
DELETE FROM testdate WHERE id_date < DATE_SUB(CURDATE(),INTERVAL 3 DAY);
END//
DELIMITER ;
-- DELIMITER changes the statement delimiter: declare "//" as the delimiter before the procedure so the ";" inside the body is not treated as a terminator, then restore it afterwards
-- Stored procedures can also take parameters: procedure_name(params)
-- Run use database_name before declaring the procedure, otherwise it is created in the default database
-- select * from information_schema.ROUTINES where ROUTINE_SCHEMA = 'test';
-- call del_data();
-- Enable the event scheduler
SET GLOBAL event_scheduler = ON;
-- Create the event
create event del_event
on schedule
EVERY 1 day
STARTS '2023-1-28 00:00:00'
do call del_data();
-- Starting at midnight on the start date, run the del_data procedure once a day
-- Inspect event metadata
select * from information_schema.events\G
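The event can later be paused, resumed, or removed (a minimal sketch):
alter event del_event disable;  -- pause without dropping
alter event del_event enable;   -- resume
drop event if exists del_event; -- remove entirely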
Testing the stored procedure:
mysql [localhost:8032] {msandbox} (test) > select * from testdate;
+----+------------+
| id | id_date |
+----+------------+
| 1 | 2023-01-26 |
| 2 | 2023-01-25 |
| 3 | 2023-01-24 |
| 5 | 2023-01-23 |
+----+------------+
4 rows in set (0.00 sec)
mysql [localhost:8032] {msandbox} (test) > call del_data();
Query OK, 1 row affected (0.01 sec)
mysql [localhost:8032] {msandbox} (test) > select * from testdate;
+----+------------+
| id | id_date |
+----+------------+
| 1 | 2023-01-26 |
| 2 | 2023-01-25 |
| 3 | 2023-01-24 |
+----+------------+
3 rows in set (0.00 sec)
mysql [localhost:8032] {msandbox} (test) >
mysql [localhost:8032] {msandbox} (information_schema) > select * from information_schema.events\G
*************************** 1. row ***************************
EVENT_CATALOG: def
EVENT_SCHEMA: test
EVENT_NAME: del_event
DEFINER: msandbox@localhost
TIME_ZONE: SYSTEM
EVENT_BODY: SQL
EVENT_DEFINITION: call del_data()
EVENT_TYPE: RECURRING
EXECUTE_AT: NULL
INTERVAL_VALUE: 1
INTERVAL_FIELD: DAY
SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,ALLOW_INVALID_DATES,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
STARTS: 2023-01-28 00:00:00
ENDS: NULL
STATUS: ENABLED
ON_COMPLETION: NOT PRESERVE
CREATED: 2023-01-27 21:13:16
LAST_ALTERED: 2023-01-27 21:13:16
LAST_EXECUTED: NULL
EVENT_COMMENT:
ORIGINATOR: 8032
CHARACTER_SET_CLIENT: utf8mb4
COLLATION_CONNECTION: utf8mb4_0900_ai_ci
DATABASE_COLLATION: utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
mysql [localhost:8032] {msandbox} (information_schema) >
Appendix: date format specifiers:
%M month name (January..December)
%W weekday name (Sunday..Saturday)
%D day of the month with English suffix (1st, 2nd, 3rd, ...)
%Y year, numeric, 4 digits
%y year, numeric, 2 digits
%a abbreviated weekday name (Sun..Sat)
%d day of the month, numeric (00..31)
%e day of the month, numeric (0..31)
%m month, numeric (01..12)
%c month, numeric (1..12)
%b abbreviated month name (Jan..Dec)
%j day of the year (001..366)
%H hour (00..23)
%k hour (0..23)
%h hour (01..12)
%I hour (01..12)
%l hour (1..12)
%i minutes, numeric (00..59)
%r time, 12-hour (hh:mm:ss [AP]M)
%T time, 24-hour (hh:mm:ss)
%S seconds (00..59)
%s seconds (00..59)
%p AM or PM
%w day of the week (0=Sunday..6=Saturday)
%U week (0..52), where Sunday is the first day of the week
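A quick usage example of these specifiers with DATE_FORMAT:
select date_format(now(), '%Y-%m-%d %H:%i:%s'); -- e.g. 2023-01-28 21:13:16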
-- Stored procedure basics
-- Create
create procedure p1()
begin
select count(*) from t1;
end;
-- Call
call p1();
-- Inspect
select * from information_schema.ROUTINES where ROUTINE_SCHEMA = 'test';
show create procedure p1;
-- Drop
drop procedure if exists p1;
MySQL 8.0 IN-subquery tests
On MySQL 5.7 an IN subquery may fail to use an index; the same test on 8.0 shows the index being used.
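Which plan you get depends largely on the subquery strategies the optimizer has available (materialization with an auto-generated key, semijoin, and so on); a quick way to inspect what is enabled on your version:
select @@optimizer_switch\G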
create table test1
(
id int auto_increment primary key,
pay_id int,
pay_time datetime,
other_col varchar(100)
);
-- The test data is designed so that pay_id can repeat: while the procedure loops over the inserts, every 100 rows it inserts one extra row with a duplicate pay_id; the time column is random within a range
DELIMITER //
CREATE DEFINER=`root`@`%` PROCEDURE `test1_insert`(IN `loopcount` INT)
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT ''
BEGIN
declare cnt int;
set cnt = 0;
while cnt< loopcount do
insert into test1 (pay_id,pay_time,other_col) values (cnt,date_add(now(), interval floor(300*rand()) day),uuid());
if (cnt mod 100 = 0) then
insert into test1 (pay_id,pay_time,other_col) values (cnt,date_add(now(), interval floor(300*rand()) day),uuid());
end if;
set cnt = cnt + 1;
end while;
END;
//
DELIMITER ;
-- Insert the test rows (loopcount = 300000 inserts 303,000 rows including the duplicates)
-- set global innodb_flush_log_at_trx_commit=2;
-- With the default value 1, every transaction commit flushes the InnoDB log buffer to disk, which is heavy on the disk. With 2, each commit still writes the log buffer to the log file, but the flush to disk does not happen at commit time; in this mode MySQL flushes roughly once per second
-- iostat -k -d -x 1 10
call test1_insert(300000);
-- Create the index
create index idx_pay_id on test1(pay_id);
Two ways to write the subquery
The intent is to find, within a given time window, the rows whose business id occurs more than once; this leads to two ways of writing the query.
First form:
the IN subquery returns the business ids whose row count in the window is greater than 1,
and the outer query filters on the IN result.
The business-id column pay_id is indexed.
With this form, MySQL 5.7 is indeed slow on large data sets because the index is not used.
select * from test1 force index(primary)
where pay_id in (
select pay_id from test1
where pay_time>="2023-06-01 00:00:00"
and pay_time<="2023-07-03 12:59:59"
group by pay_id
having count(pay_id) > 1
);
explain select * from test1 force index(primary)
where pay_id in (
select pay_id from test1
where pay_time>="2023-06-01 00:00:00"
and pay_time<="2023-07-03 12:59:59"
group by pay_id
having count(pay_id) > 1
);
select * from test1 force index(idx_pay_id)
where pay_id in (
select pay_id from test1
where pay_time>="2023-06-01 00:00:00"
and pay_time<="2023-07-03 12:59:59"
group by pay_id
having count(pay_id) > 1
);
explain select * from test1 force index(idx_pay_id)
where pay_id in (
select pay_id from test1
where pay_time>="2023-06-01 00:00:00"
and pay_time<="2023-07-03 12:59:59"
group by pay_id
having count(pay_id) > 1
);
select * from test1
where pay_id in (
select pay_id from test1
where pay_time>="2023-06-01 00:00:00"
and pay_time<="2023-07-03 12:59:59"
group by pay_id
having count(pay_id) > 1
);
explain select * from test1
where pay_id in (
select pay_id from test1
where pay_time>="2023-06-01 00:00:00"
and pay_time<="2023-07-03 12:59:59"
group by pay_id
having count(pay_id) > 1
);
Second form: join against the subquery instead. It is equivalent to the IN form above, and the tests below show a noticeable efficiency gain.
select tpp1.* from test1 tpp1,
(
select pay_id
from test1
where pay_time>="2023-06-01 00:00:00"
and pay_time<="2023-07-03 12:59:59"
group by pay_id
having count(pay_id) > 1
) tpp2
where tpp1.pay_id=tpp2.pay_id;
explain select tpp1.* from test1 tpp1,
(
select pay_id
from test1
where pay_time>="2023-06-01 00:00:00"
and pay_time<="2023-07-03 12:59:59"
group by pay_id
having count(pay_id) > 1
) tpp2
where tpp1.pay_id=tpp2.pay_id;
First form:
(FORCE INDEX(PRIMARY) does not help; no index is used. This is also the plan you get before create index idx_pay_id on test1(pay_id); exists)
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------+
| 1 | PRIMARY | test1 | NULL | ALL | NULL | NULL | NULL | NULL | 301979 | 100.00 | Using where |
| 2 | SUBQUERY | test1 | NULL | ALL | NULL | NULL | NULL | NULL | 301979 | 11.11 | Using where; Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------+
(FORCE INDEX(idx_pay_id) works; this is also the default plan)
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+-------------+
| 1 | PRIMARY | test1 | NULL | ALL | NULL | NULL | NULL | NULL | 301930 | 100.00 | Using where |
| 2 | SUBQUERY | test1 | NULL | index | idx_pay_id | idx_pay_id | 5 | NULL | 301930 | 11.11 | Using where |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+-------------+
Second form:
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------+--------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------+--------+----------+------------------------------+
| 1 | PRIMARY | tpp1 | NULL | ALL | NULL | NULL | NULL | NULL | 301979 | 100.00 | Using where |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 5 | test.tpp1.pay_id | 10 | 100.00 | Using index |
| 2 | DERIVED | test1 | NULL | ALL | NULL | NULL | NULL | NULL | 301979 | 11.11 | Using where; Using temporary |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------+--------+----------+------------------------------+
Tips:
A statement like the following cannot be executed directly:
select * from table where id in (select id from table limit 12);
But wrapping the subquery in one more derived-table layer makes it work:
select * from table where id in (select t.id from (select * from table limit 12) as t);
This sidesteps MySQL's restriction that LIMIT is not supported inside an IN subquery.
Selected MySQL optimization cases
Example 1:
SELECT ID FROM SENDLOG WHERE TO_DAYS(NOW())-TO_DAYS(GMT_CREATE) > 7;
Problem: the indexed column GMT_CREATE is wrapped in an expression, so the index cannot be used.
Optimized SQL:
select id from sendlog where gmt_create < now() - interval 7 day;
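The effect of the rewrite is easy to confirm with EXPLAIN (a sketch, assuming an index exists on gmt_create):
explain select id from sendlog where gmt_create < now() - interval 7 day;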
Example 2:
Problem: the result column is indexed, but the execution plan does not use the index (the original query compared result = 1 with an unquoted integer literal).
Analysis: result is a char column while the literal 1 is an integer; the implicit type conversion prevents index use. Datetime comparisons such as gmt_create > '2013-10-29 12:40:44' can use the index directly.
Optimized SQL:
SELECT * FROM SENDLOG where result = '1' and gmt_create > '2013-10-29 12:40:44' limit 2000;
Example 3: scenario: count a seller's unread messages.
select count(*) from mc_msg where receiver='sun098' and status='UNREAD' and title is not null;
Problem: DB load occasionally spikes and the SQL response time degrades.
Analysis: the load spikes happen when several big sellers run the query concurrently, driving up CPU and logical reads. receiver and status are already indexed, so the SQL itself has no optimization headroom. Reviewing the business requirement showed an exact count is unnecessary: anything above 99 is simply displayed as "99+".
Optimized SQL:
select count(*) from (select id from mc_msg where receiver='sun098' and status='UNREAD' and title is not null limit 100) a;
Example 4:
Scenario: show the history of order messages, with unread messages first and read ones after, each group sorted by time descending.
select * from(
select ID,GMT_CREATE,GMT_MODIFIED,SENDER_ALI_ID,RECEIVER_ALI_ID,UNREAD_COUNT,STATUS,LAST_MESSAGE_ID,RELATION_ID,SELLER_ADMIN_SEQ,IS_READ
from message_relation_sender
WHERE SENDER_ALI_ID = 119545671 and UNREAD_COUNT > 0
order by LAST_MESSAGE_ID desc) m
union all
select * from(
select ID,GMT_CREATE,GMT_MODIFIED,SENDER_ALI_ID,RECEIVER_ALI_ID,UNREAD_COUNT,STATUS,LAST_MESSAGE_ID,RELATION_ID,SELLER_ADMIN_SEQ,IS_READ from message_relation_sender
WHERE SENDER_ALI_ID = 119545671 and UNREAD_COUNT = 0
order by LAST_MESSAGE_ID desc) n limit 5000,15;
Analysis:
(1) unread_count is the number of unread order messages;
(2) the first subquery fetches unread messages, the second fetches read ones;
(3) LAST_MESSAGE_ID is increasing, so the newest order message has the largest LAST_MESSAGE_ID;
(4) limit 5000,15 implements the pagination.
union all is used here because it concatenates the result sets without sorting them, which produces the desired "unread first, read after" order. The price is that the index is scanned twice and each subquery sorts separately; union all also materializes a temporary table, so the execution cost is high.
Optimization: the exact unread_count value is irrelevant to this query; only read vs. unread matters. Since the SQL itself has no room left, change the schema: add an is_read column (is_read=2 means unread, is_read=1 means read). A composite index (SENDER_ALI_ID, is_read, LAST_MESSAGE_ID) then handles both the filtering and the ordering; a DDL sketch follows.
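Hypothetical DDL for the composite index just described (the index name is invented for illustration):
alter table message_relation_sender
  add index idx_sender_read_msg (SENDER_ALI_ID, IS_READ, LAST_MESSAGE_ID);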
Optimized SQL:
select ID,GMT_CREATE,GMT_MODIFIED,SENDER_ALI_ID,RECEIVER_ALI_ID,UNREAD_COUNT,STATUS,LAST_MESSAGE_ID,RELATION_ID,SELLER_ADMIN_SEQ,IS_READ from message_relation_sender
where SENDER_ALI_ID = 119545671 order by is_read desc, LAST_MESSAGE_ID desc limit 5000,15
Second-pass optimization: the index does not cover all returned columns, so every matching row needs a lookup back to the table, and with limit 5000,15 MySQL performs 5015 such lookups, most of them wasted. The pagination-friendly rewrite:
select t1.ID,GMT_CREATE,GMT_MODIFIED,SENDER_ALI_ID,RECEIVER_ALI_ID,UNREAD_COUNT,STATUS,LAST_MESSAGE_ID,RELATION_ID,SELLER_ADMIN_SEQ,IS_READ from message_relation_sender t1,
(select id
from message_relation_sender
where SENDER_ALI_ID = 119545671 order by is_read desc, LAST_MESSAGE_ID desc limit 5000,15
)t2 where t1.id = t2.id
Analysis: id is the primary key, so the inner query is satisfied by the index alone without table lookups; after the join, only the final 15 rows need to be fetched from the table.
Have a good work&life! 2023/01 via LinHong