Lin Hong's TECH Blog! 刀不磨要生锈,人不学习要落后 - Thinking ahead

MySQL 8.0 Study 007 Tips

2023-02-01

MySQL 8.0 Study 007 Tips

学习系列

  • MySQL 哈希分区
  • MySQL 涵盖索引测试
  • MySQL 用户管理一些Tips
  • MySQL 半链接(Semi join)

MySQL 哈希分区

分区表优势:

分区表主要有以下几种优势:
大幅提升某些查询的性能。
简化日常数据运维工作量、提升运维效率。
并行查询、均衡写 IO 。
对应用透明,不需要在应用层部署路由或者中间层。

哈希分区示例:

create table t1_p_hash_id(
  id bigint not null auto_increment,
  col1 int default null,
  logdate date default null,
  primary key(id)
)
partition by hash(id)
partitions 100
;

select left(partition_name,5) p,sum(table_rows) cnt from information_schema.partitions where table_name = 't1_p_hash_id' group by left(partition_name,5);

表 t1_p_hash_id 的分区方式很好理解,按照 ID 来对数据进行 HASH 拆分,也就是按照分区数量求模, 类似于 hash(mod(id,100)) ,数据分布非常均匀。

SQL 1:select count(*) from t1_p_hash_id where id = 1;
SQL 2:select count(*) from t1_p_hash_id where id in (1,2,3,4,5);
SQL 3:select count(*) from t1_p_hash_id where id <=1;
SQL 4:select count(*) from t1_p_hash_id where id <=5;

SQL 1 和 SQL 2 非常适合检索哈希分区表,SQL 3 和 SQL 4 就不太适合。

mysql> explain select count(*) from t1_p_hash_id where id in (1,2,3,4,5)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1_p_hash_id
   partitions: p1,p2,p3,p4,p5
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from t1_p_hash_id where id <=25\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1_p_hash_id
   partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

mysql> 

非等值的 SQL 3 和 SQL 4 需要扫描全部分区,类似全表扫。

所以需要注意的是哈希分区表仅限于等值过滤检索,类似对普通表基于哈希索引的检索。

MySQL 涵盖索引测试

CREATE TABLE t1 (
  id bigint(20) not null,
  col1 varchar(64) collate utf8_bin not null,
  col2 tinyint(4) not null,
  col3 bigint(20) default null,
  path_name varchar(64) not null,
  primary key(id),
  key idx_t1_01(col1)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;


SELECT col1,
       SUM(CASE WHEN col2=0 THEN 1 ELSE 0 END) as folders,
       SUM(CASE WHEN col2=1 THEN 1 ELSE 0 END) as files,
       SUM(col3)
FROM  t1
GROUP BY col1;

下面语句能用到idx_t1_01索引,但Extra是NULL,需要回表通过主键再去扫数据

mysql [localhost:8032] {msandbox} (testsql) > explain SELECT col1,
    ->        SUM(CASE WHEN col2=0 THEN 1 ELSE 0 END) as folders,
    ->        SUM(CASE WHEN col2=1 THEN 1 ELSE 0 END) as files,
    ->        SUM(col3)
    -> FROM  t1
    -> GROUP BY col1;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-------+
|  1 | SIMPLE      | t1    | NULL       | index | idx_t1_01     | idx_t1_01 | 194     | NULL |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8032] {msandbox} (testsql) > explain analyze SELECT col1,        SUM(CASE WHEN col2=0 THEN 1 ELSE 0 END) as folders,        SUM(CASE WHEN col2=1 THEN 1 ELSE 0 END) as files,
  SUM(col3) FROM  t1 GROUP BY col1 \G
*************************** 1. row ***************************
EXPLAIN: -> Group aggregate: sum((case when (t1.col2 = 0) then 1 else 0 end)), sum((case when (t1.col2 = 1) then 1 else 0 end)), sum(t1.col3)  (cost=0.45 rows=1) (actual time=0.024..0.024 rows=0 loops=1)
    -> Index scan on t1 using idx_t1_01  (cost=0.35 rows=1) (actual time=0.022..0.022 rows=0 loops=1)

1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox}

通过创建联合索引,不需要回表了 Extra 是Using index / 用 Covering index scan

create index idx_t1_02 on (col1,col2,col3);

mysql [localhost:8032] {msandbox} (testsql) > create index idx_t1_02 on t1(col1,col2,col3);
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql [localhost:8032] {msandbox} (testsql) > explain SELECT col1,
    ->        SUM(CASE WHEN col2=0 THEN 1 ELSE 0 END) as folders,
    ->        SUM(CASE WHEN col2=1 THEN 1 ELSE 0 END) as files,
    ->        SUM(col3)
    -> FROM  t1
    -> GROUP BY col1;
+----+-------------+-------+------------+-------+---------------------+-----------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys       | key       | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------------+-----------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | index | idx_t1_01,idx_t1_02 | idx_t1_02 | 204     | NULL |    1 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------------+-----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8032] {msandbox} (testsql) > explain analyze SELECT col1,        SUM(CASE WHEN col2=0 THEN 1 ELSE 0 END) as folders,        SUM(CASE WHEN col2=1 THEN 1 ELSE 0 END) as files,
  SUM(col3) FROM  t1 GROUP BY col1\G
*************************** 1. row ***************************
EXPLAIN: -> Group aggregate: sum((case when (t1.col2 = 0) then 1 else 0 end)), sum((case when (t1.col2 = 1) then 1 else 0 end)), sum(t1.col3)  (cost=0.45 rows=1) (actual time=0.023..0.023 rows=0 loops=1)
    -> Covering index scan on t1 using idx_t1_02  (cost=0.35 rows=1) (actual time=0.022..0.022 rows=0 loops=1)

1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (testsql) >  

MySQL 用户管理一些Tips

创建用户

-- 1.创建无密码用户
mysql> create user hong@'10.0.0.%';
-- 2.创建有密码用户( identified by)
mysql> create user hong@'10.0.0.%' identified by '12345678';
-- 3.创建有密码用户时指定加密插件(with)
mysql> create user hong@'10.0.0.%' identified with mysql_native_password by '12345678';
-- 4.mysql8.0版本之前 授权和创建用户操作可同时(grant)
mysql> grant all on *.* to hong@'10.0.0.%' identified by '12345678';

-- 查看
mysql> select user,host,authentication_string,plugin from mysql.user;

删除用户

-- 一般不建议删除用户,直接lock用户即可
-- drop user hong@'10.0.0.%';
-- drop user if exists hong@'10.0.0.%';
mysql> alter user hong@'10.0.0.%' account lock;

查看密码的生命周期 default_password_lifetime 设置时间或者修改为永不过期

mysql [localhost:8032] {msandbox} ((none)) > show variables like 'default_password%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| default_password_lifetime | 0     |
+---------------------------+-------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} ((none)) > set default_password_lifetime=365;
ERROR 1229 (HY000): Variable 'default_password_lifetime' is a GLOBAL variable and should be set with SET GLOBAL
mysql [localhost:8032] {msandbox} ((none)) > set global default_password_lifetime=365;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost:8032] {msandbox} ((none)) > show variables like 'default_password%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| default_password_lifetime | 365   |
+---------------------------+-------+
1 row in set (0.01 sec)

mysql [localhost:8032] {msandbox} ((none)) > 
mysql [localhost:8032] {msandbox} ((none)) > alter user hong@'%' password expire never;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost:8032] {msandbox} ((none)) > 

caching_sha2_password的一些说明,参考

MySQL caching_sha2_password快速问答

MySQL 半链接(Semi join)

SemiJoin定义

SemiJoin是表上的一个运算。假定R和S是两个表,R SemiJoin S可记为R ⋉ S,结果是在S中满足条件的元组。

结论:

①半连接的结果是左表的子集。
②增加右表一条重复记录,结果并不随之增多。

对于不同类型的子查询,优化器会选择不同的策略。

对于 IN、=ANY 子查询,优化器有如下策略选择:
semijoin
Materialization
exists
对于 NOT IN、<>ALL 子查询,优化器有如下策略选择:
Materialization
exists
对于 derived 派生表,优化器有如下策略选择:
derived_merge,将派生表合并到外部查询中(5.7引入 );
将派生表物化为内部临时表,再用于外部查询。
注意:update 和 delete 语句中子查询不能使用 semijoin、materialization 优化策略
CREATE TABLE `t2` (
  `id` int(11) NOT NULL,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`)
) ENGINE=InnoDB;

drop procedure idata;
delimiter ;;
create procedure idata()
begin
  declare i int;
  set i=1;
  while(i<=1000)do
    insert into t2 values(i, i, i);
    set i=i+1;
  end while;
end;;
delimiter ;
call idata();

create table t1 like t2;
insert into t1 (select * from t2 where id<=100)
mysql [localhost:8032] {msandbox} (test) > select count(*) from t1;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.04 sec)

mysql [localhost:8032] {msandbox} (test) > select count(*) from t2;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0.04 sec)

mysql [localhost:8032] {msandbox} (test) > SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE id < 10);
+----+------+------+
| id | a    | b    |
+----+------+------+
|  1 |    1 |    1 |
|  2 |    2 |    2 |
|  3 |    3 |    3 |
|  4 |    4 |    4 |
|  5 |    5 |    5 |
|  6 |    6 |    6 |
|  7 |    7 |    7 |
|  8 |    8 |    8 |
|  9 |    9 |    9 |
+----+------+------+
9 rows in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) >
mysql [localhost:8032] {msandbox} (test) > explain format=tree SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE id < 10)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join  (cost=4.95 rows=9)
    -> Filter: (`<subquery2>`.b is not null)  (cost=2.83..1.80 rows=9)
        -> Table scan on <subquery2>  (cost=3.25..5.58 rows=9)
            -> Materialize with deduplication  (cost=2.96..2.96 rows=9)
                -> Filter: (t2.b is not null)  (cost=2.06 rows=9)
                    -> Filter: (t2.id < 10)  (cost=2.06 rows=9)
                        -> Index range scan on t2 using PRIMARY over (id < 10)  (cost=2.06 rows=9)
    -> Index lookup on t1 using a (a=`<subquery2>`.b)  (cost=2.35 rows=1)

1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > explain SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE id < 10);
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type  | table       | partitions | type  | possible_keys | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL          | NULL |   100.00 | Using where |
|  1 | SIMPLE       | t1          | NULL       | ref   | a             | a       | 5       | <subquery2>.b |    1 |   100.00 | NULL        |
|  2 | MATERIALIZED | t2          | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL          |    9 |   100.00 | Using where |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

跟exist语句是等价的

mysql [localhost:8032] {msandbox} (test) > explain select * from t1 where exists(select b from t2 where id < 10 and t1.a=t2.b);
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type  | table       | partitions | type  | possible_keys | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL          | NULL |   100.00 | Using where |
|  1 | SIMPLE       | t1          | NULL       | ref   | a             | a       | 5       | <subquery2>.b |    1 |   100.00 | NULL        |
|  2 | MATERIALIZED | t2          | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL          |    9 |   100.00 | Using where |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
3 rows in set, 2 warnings (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1276
Message: Field or reference 'test.t1.a' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `<subquery2>`.`b`) and (`test`.`t2`.`id` < 10))
2 rows in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > explain SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE id < 10);
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type  | table       | partitions | type  | possible_keys | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL          | NULL |   100.00 | Using where |
|  1 | SIMPLE       | t1          | NULL       | ref   | a             | a       | 5       | <subquery2>.b |    1 |   100.00 | NULL        |
|  2 | MATERIALIZED | t2          | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL          |    9 |   100.00 | Using where |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `<subquery2>`.`b`) and (`test`.`t2`.`id` < 10))
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (test) > 

如果外层表是一个非常大的表,对于外层查询的每一行,子查询都得执行一次,这个查询的性能会非常差。我们很容易想到将其改写成 join 来提升效率:

select t1.* from t1 join t2 on t1.a=t2.b and t2.id<10;

这样优化可以让 t2 表做驱动表,t1表关联字段有索引,查找效率非常高。

但这里会有个问题,join 是有可能得到重复结果的,而 in(select …) 子查询语义则不会得到重复值。

而 semijoin 正是解决重复值问题的一种特殊联接。

在子查询中,优化器可以识别出 in 子句中每组只需要返回一个值,在这种情况下,可以使用 semijoin 来优化子查询,提升查询效率。

这是 MySQL5.6 加入的新特性,MySQL5.6 以前优化器只有 exists 一种策略来“优化”子查询。

一个子查询能够转换成SemiJoin还需满足如下条件(没有全部测试过)

1. 必须是In或any子查询
2. 不包含UNION
3. 不包含GROUP BY 或 HAVING
4. 不包含聚集函数
5. 不包含LIMIT
6. 子查询有ORDER BY时,转换后去掉ORDER BY
7. 外部查询没有STRAIGHT_JOIN(错)
8. 是否相关子查询均支持

Referece

参考:

MySQL优化器和SemiJoin优化

MySQL 分区表案例分享

MySQL全面瓦解29:分库分表之Partition功能详解

Have a good work&life! 2023/02 via LinHong


Similar Posts

Comments