MySQL 8.0 Study 007 Tips
学习系列
- MySQL 哈希分区
- MySQL 涵盖索引测试
- MySQL 用户管理一些Tips
- MySQL 半链接(Semi join)
MySQL 哈希分区
分区表优势:
分区表主要有以下几种优势:
大幅提升某些查询的性能。
简化日常数据运维工作量、提升运维效率。
并行查询、均衡写 IO 。
对应用透明,不需要在应用层部署路由或者中间层。
哈希分区示例:
create table t1_p_hash_id(
id bigint not null auto_increment,
col1 int default null,
logdate date default null,
primary key(id)
)
partition by hash(id)
partitions 100
;
select left(partition_name,5) p,sum(table_rows) cnt from information_schema.partitions where table_name = 't1_p_hash_id' group by left(partition_name,5);
表 t1_p_hash_id 的分区方式很好理解,按照 ID 来对数据进行 HASH 拆分,也就是按照分区数量求模, 类似于 hash(mod(id,100)) ,数据分布非常均匀。
SQL 1:select count(*) from t1_p_hash_id where id = 1;
SQL 2:select count(*) from t1_p_hash_id where id in (1,2,3,4,5);
SQL 3:select count(*) from t1_p_hash_id where id <=1;
SQL 4:select count(*) from t1_p_hash_id where id <=5;
SQL 1 和 SQL 2 非常适合检索哈希分区表,SQL 3 和 SQL 4 就不太适合。
mysql> explain select count(*) from t1_p_hash_id where id in (1,2,3,4,5)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1_p_hash_id
partitions: p1,p2,p3,p4,p5
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 8
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from t1_p_hash_id where id <=25\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1_p_hash_id
partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 8
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)
mysql>
非等值的 SQL 3 和 SQL 4 需要扫描全部分区,类似全表扫。
所以需要注意的是哈希分区表仅限于等值过滤检索,类似对普通表基于哈希索引的检索。
MySQL 涵盖索引测试
CREATE TABLE t1 (
id bigint(20) not null,
col1 varchar(64) collate utf8_bin not null,
col2 tinyint(4) not null,
col3 bigint(20) default null,
path_name varchar(64) not null,
primary key(id),
key idx_t1_01(col1)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
SELECT col1,
SUM(CASE WHEN col2=0 THEN 1 ELSE 0 END) as folders,
SUM(CASE WHEN col2=1 THEN 1 ELSE 0 END) as files,
SUM(col3)
FROM t1
GROUP BY col1;
下面语句能用到idx_t1_01
索引,但Extra是NULL
,需要回表通过主键再去扫数据
mysql [localhost:8032] {msandbox} (testsql) > explain SELECT col1,
-> SUM(CASE WHEN col2=0 THEN 1 ELSE 0 END) as folders,
-> SUM(CASE WHEN col2=1 THEN 1 ELSE 0 END) as files,
-> SUM(col3)
-> FROM t1
-> GROUP BY col1;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-------+
| 1 | SIMPLE | t1 | NULL | index | idx_t1_01 | idx_t1_01 | 194 | NULL | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql [localhost:8032] {msandbox} (testsql) > explain analyze SELECT col1, SUM(CASE WHEN col2=0 THEN 1 ELSE 0 END) as folders, SUM(CASE WHEN col2=1 THEN 1 ELSE 0 END) as files,
SUM(col3) FROM t1 GROUP BY col1 \G
*************************** 1. row ***************************
EXPLAIN: -> Group aggregate: sum((case when (t1.col2 = 0) then 1 else 0 end)), sum((case when (t1.col2 = 1) then 1 else 0 end)), sum(t1.col3) (cost=0.45 rows=1) (actual time=0.024..0.024 rows=0 loops=1)
-> Index scan on t1 using idx_t1_01 (cost=0.35 rows=1) (actual time=0.022..0.022 rows=0 loops=1)
1 row in set (0.00 sec)
mysql [localhost:8032] {msandbox}
通过创建联合索引,不需要回表了 Extra 是Using index
/ 用 Covering index scan
create index idx_t1_02 on (col1,col2,col3);
mysql [localhost:8032] {msandbox} (testsql) > create index idx_t1_02 on t1(col1,col2,col3);
Query OK, 0 rows affected (0.07 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql [localhost:8032] {msandbox} (testsql) > explain SELECT col1,
-> SUM(CASE WHEN col2=0 THEN 1 ELSE 0 END) as folders,
-> SUM(CASE WHEN col2=1 THEN 1 ELSE 0 END) as files,
-> SUM(col3)
-> FROM t1
-> GROUP BY col1;
+----+-------------+-------+------------+-------+---------------------+-----------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------------+-----------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | idx_t1_01,idx_t1_02 | idx_t1_02 | 204 | NULL | 1 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------------+-----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql [localhost:8032] {msandbox} (testsql) > explain analyze SELECT col1, SUM(CASE WHEN col2=0 THEN 1 ELSE 0 END) as folders, SUM(CASE WHEN col2=1 THEN 1 ELSE 0 END) as files,
SUM(col3) FROM t1 GROUP BY col1\G
*************************** 1. row ***************************
EXPLAIN: -> Group aggregate: sum((case when (t1.col2 = 0) then 1 else 0 end)), sum((case when (t1.col2 = 1) then 1 else 0 end)), sum(t1.col3) (cost=0.45 rows=1) (actual time=0.023..0.023 rows=0 loops=1)
-> Covering index scan on t1 using idx_t1_02 (cost=0.35 rows=1) (actual time=0.022..0.022 rows=0 loops=1)
1 row in set (0.00 sec)
mysql [localhost:8032] {msandbox} (testsql) >
MySQL 用户管理一些Tips
创建用户
-- 1.创建无密码用户
mysql> create user hong@'10.0.0.%';
-- 2.创建有密码用户( identified by)
mysql> create user hong@'10.0.0.%' identified by '12345678';
-- 3.创建有密码用户时指定加密插件(with)
mysql> create user hong@'10.0.0.%' identified with mysql_native_password by '12345678';
-- 4.mysql8.0版本之前 授权和创建用户操作可同时(grant)
mysql> grant all on *.* to hong@'10.0.0.%' identified by '12345678';
-- 查看
mysql> select user,host,authentication_string,plugin from mysql.user;
删除用户
-- 一般不建议删除用户,直接lock用户即可
-- drop user hong@'10.0.0.%';
-- drop user if exists hong@'10.0.0.%';
mysql> alter user hong@'10.0.0.%' account lock;
查看密码的生命周期 default_password_lifetime
设置时间或者修改为永不过期
mysql [localhost:8032] {msandbox} ((none)) > show variables like 'default_password%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| default_password_lifetime | 0 |
+---------------------------+-------+
1 row in set (0.00 sec)
mysql [localhost:8032] {msandbox} ((none)) > set default_password_lifetime=365;
ERROR 1229 (HY000): Variable 'default_password_lifetime' is a GLOBAL variable and should be set with SET GLOBAL
mysql [localhost:8032] {msandbox} ((none)) > set global default_password_lifetime=365;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost:8032] {msandbox} ((none)) > show variables like 'default_password%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| default_password_lifetime | 365 |
+---------------------------+-------+
1 row in set (0.01 sec)
mysql [localhost:8032] {msandbox} ((none)) >
mysql [localhost:8032] {msandbox} ((none)) > alter user hong@'%' password expire never;
Query OK, 0 rows affected (0.01 sec)
mysql [localhost:8032] {msandbox} ((none)) >
caching_sha2_password的一些说明,参考
MySQL caching_sha2_password快速问答
MySQL 半链接(Semi join)
SemiJoin定义
SemiJoin是表上的一个运算。假定R和S是两个表,R SemiJoin S可记为R ⋉ S,结果是在S中满足条件的元组。
结论:
①半连接的结果是左表的子集。
②增加右表一条重复记录,结果并不随之增多。
对于不同类型的子查询,优化器会选择不同的策略。
对于 IN、=ANY 子查询,优化器有如下策略选择:
semijoin
Materialization
exists
对于 NOT IN、<>ALL 子查询,优化器有如下策略选择:
Materialization
exists
对于 derived 派生表,优化器有如下策略选择:
derived_merge,将派生表合并到外部查询中(5.7引入 );
将派生表物化为内部临时表,再用于外部查询。
注意:update 和 delete 语句中子查询不能使用 semijoin、materialization 优化策略
CREATE TABLE `t2` (
`id` int(11) NOT NULL,
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`)
) ENGINE=InnoDB;
drop procedure idata;
delimiter ;;
create procedure idata()
begin
declare i int;
set i=1;
while(i<=1000)do
insert into t2 values(i, i, i);
set i=i+1;
end while;
end;;
delimiter ;
call idata();
create table t1 like t2;
insert into t1 (select * from t2 where id<=100)
mysql [localhost:8032] {msandbox} (test) > select count(*) from t1;
+----------+
| count(*) |
+----------+
| 100 |
+----------+
1 row in set (0.04 sec)
mysql [localhost:8032] {msandbox} (test) > select count(*) from t2;
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.04 sec)
mysql [localhost:8032] {msandbox} (test) > SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE id < 10);
+----+------+------+
| id | a | b |
+----+------+------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
| 7 | 7 | 7 |
| 8 | 8 | 8 |
| 9 | 9 | 9 |
+----+------+------+
9 rows in set (0.00 sec)
mysql [localhost:8032] {msandbox} (test) >
mysql [localhost:8032] {msandbox} (test) > explain format=tree SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE id < 10)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=4.95 rows=9)
-> Filter: (`<subquery2>`.b is not null) (cost=2.83..1.80 rows=9)
-> Table scan on <subquery2> (cost=3.25..5.58 rows=9)
-> Materialize with deduplication (cost=2.96..2.96 rows=9)
-> Filter: (t2.b is not null) (cost=2.06 rows=9)
-> Filter: (t2.id < 10) (cost=2.06 rows=9)
-> Index range scan on t2 using PRIMARY over (id < 10) (cost=2.06 rows=9)
-> Index lookup on t1 using a (a=`<subquery2>`.b) (cost=2.35 rows=1)
1 row in set (0.00 sec)
mysql [localhost:8032] {msandbox} (test) > explain SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE id < 10);
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where |
| 1 | SIMPLE | t1 | NULL | ref | a | a | 5 | <subquery2>.b | 1 | 100.00 | NULL |
| 2 | MATERIALIZED | t2 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 100.00 | Using where |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
mysql [localhost:8032] {msandbox} (test) >
跟exist语句是等价的
mysql [localhost:8032] {msandbox} (test) > explain select * from t1 where exists(select b from t2 where id < 10 and t1.a=t2.b);
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where |
| 1 | SIMPLE | t1 | NULL | ref | a | a | 5 | <subquery2>.b | 1 | 100.00 | NULL |
| 2 | MATERIALIZED | t2 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 100.00 | Using where |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
3 rows in set, 2 warnings (0.00 sec)
mysql [localhost:8032] {msandbox} (test) > show warnings\G
*************************** 1. row ***************************
Level: Note
Code: 1276
Message: Field or reference 'test.t1.a' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `<subquery2>`.`b`) and (`test`.`t2`.`id` < 10))
2 rows in set (0.00 sec)
mysql [localhost:8032] {msandbox} (test) > explain SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE id < 10);
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where |
| 1 | SIMPLE | t1 | NULL | ref | a | a | 5 | <subquery2>.b | 1 | 100.00 | NULL |
| 2 | MATERIALIZED | t2 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 100.00 | Using where |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
mysql [localhost:8032] {msandbox} (test) > show warnings\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `<subquery2>`.`b`) and (`test`.`t2`.`id` < 10))
1 row in set (0.00 sec)
mysql [localhost:8032] {msandbox} (test) >
如果外层表是一个非常大的表,对于外层查询的每一行,子查询都得执行一次,这个查询的性能会非常差。我们很容易想到将其改写成 join 来提升效率:
select t1.* from t1 join t2 on t1.a=t2.b and t2.id<10;
这样优化可以让 t2 表做驱动表,t1表关联字段有索引,查找效率非常高。
但这里会有个问题,join 是有可能得到重复结果的,而 in(select …) 子查询语义则不会得到重复值。
而 semijoin 正是解决重复值问题的一种特殊联接。
在子查询中,优化器可以识别出 in 子句中每组只需要返回一个值,在这种情况下,可以使用 semijoin 来优化子查询,提升查询效率。
这是 MySQL5.6 加入的新特性,MySQL5.6 以前优化器只有 exists 一种策略来“优化”子查询。
一个子查询能够转换成SemiJoin还需满足如下条件(没有全部测试过)
1. 必须是In或any子查询
2. 不包含UNION
3. 不包含GROUP BY 或 HAVING
4. 不包含聚集函数
5. 不包含LIMIT
6. 子查询有ORDER BY时,转换后去掉ORDER BY
7. 外部查询没有STRAIGHT_JOIN(错)
8. 是否相关子查询均支持
Referece
参考:
MySQL全面瓦解29:分库分表之Partition功能详解
Have a good work&life! 2023/02 via LinHong