我们的文章会在微信公众号IT民工的龙马人生和博客网站( www.htz.pw )同步更新 ,欢迎关注收藏,也欢迎大家转载,但是请在文章开始地方标注文章出处,谢谢!
由于博客中有大量代码,通过页面浏览效果更佳。
本文转自朋友的真实案例分享。
MySQL :eq_range_index_dive_limit参数
在学习mysql统计信息和执行计划的过程中,发现了一个挺有意思的问题。就是mysql优化器在没有创建直方图的情况下,在评估rows上能做到非常精准。
oracle CBO在评估rows时,如果没有非常准的频率直方图,是无法做的精准的rows评估的。因为oracle在对象存在统计信息的情况下,所有cost、rows评估都是完全依赖统计信息的。其中rows的评估依赖dba_tables.num_rows、dba_tab_columns.num_distinct、num_nulls、density、low_value、high_value等等。
看一个oracle简单的例子
SQL> explain plan for select * from test.t where c='C';Explained.SQL> select * from table(dbms_xplan.display);PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1386885587---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 350 | 2450 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 350 | 2450 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_T_C | 350 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------2 - access("C"='C')SQL> explain plan for select * from test.t where c='D';Explained.SQL> select * from table(dbms_xplan.display);PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1386885587---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 175 | 1225 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 175 | 1225 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_T_C | 175 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------2 - access("C"='D') SQL> select c,count(*) from test.t group by c; C COUNT(*)
---------------------------------------- ----------
A 1000
B 10
C 40
可以看到c=’C’或’D’都与实际值相差较多。实际上对于rows评估,oracle是完全依赖于统计信息的。
SQL> select NUM_DISTINCT,NUM_NULLS,DENSITY,LOW_VALUE,HIGH_VALUE from dba_tab_columns where owner='TEST' and table_name='T' and column_name='C';NUM_DISTINCT NUM_NULLS DENSITY LOW_VALUE HIGH_VALUE
------------ ---------- ---------- ---------- ----------3 0 .333333333 41 43SQL> select NUM_ROWS from dba_tables where owner='TEST' and table_name='T';NUM_ROWS
----------1050
c=’C’落在LOW_VALUE和HIGH_VALUE之间,rows评估公式如下:
rows=(NUM_ROWS-NUM_NULLS)/NUM_DISTINCT
所以rows=1050/3=350
c=’D’在LOW_VALUE和HIGH_VALUE之外,rows评估公式如下:
rows=NUM_ROWS/(2*DENSITY)
所以rows=1050/2/3=175
可以看出oracle在字段数据倾斜很严重的时候,rows评估会造成比较大的偏差,所以才会有直方图的出现。即使有直方图,oracle也无法非常精准的评估出rows,因为CBO是完全依赖统计信息的。
下面来看看mysql的测试:
mysql> select c,count(*) from t group by c;
+------+----------+
| c | count(*) |
+------+----------+
| A | 2000 |
| B | 1000 |
| C | 1000 |
| D | 500 |
| E | 800 |
| F | 700 |
+------+----------+
6 rows in set (0.01 sec)mysql> explain select * from t where c='D';
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t | NULL | ref | idx_t_c | idx_t_c | 63 | const | 500 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)mysql> explain select * from t where c='F';
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t | NULL | ref | idx_t_c | idx_t_c | 63 | const | 700 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)mysql> explain select * from t where c='E';
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t | NULL | ref | idx_t_c | idx_t_c | 63 | const | 800 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)mysql> explain select * from t where c='B';
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t | NULL | ref | idx_t_c | idx_t_c | 63 | const | 1000 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
在没有直方图的情况下,可以看到rows的评估非常精准。
mysql> select * from mysql.innodb_index_stats where table_name='t';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| lixinyan | t | GEN_CLUST_INDEX | 2023-02-24 21:28:18 | n_diff_pfx01 | 6000 | 14 | DB_ROW_ID |
| lixinyan | t | GEN_CLUST_INDEX | 2023-02-24 21:28:18 | n_leaf_pages | 14 | NULL | Number of leaf pages in the index |
| lixinyan | t | GEN_CLUST_INDEX | 2023-02-24 21:28:18 | size | 15 | NULL | Number of pages in the index |
| lixinyan | t | idx_t_a | 2023-02-24 21:28:18 | n_diff_pfx01 | 1 | 7 | a |
| lixinyan | t | idx_t_a | 2023-02-24 21:28:18 | n_diff_pfx02 | 6000 | 7 | a,DB_ROW_ID |
| lixinyan | t | idx_t_a | 2023-02-24 21:28:18 | n_leaf_pages | 7 | NULL | Number of leaf pages in the index |
| lixinyan | t | idx_t_a | 2023-02-24 21:28:18 | size | 8 | NULL | Number of pages in the index |
| lixinyan | t | idx_t_c | 2023-02-24 21:28:18 | n_diff_pfx01 | 6 | 7 | c |
| lixinyan | t | idx_t_c | 2023-02-24 21:28:18 | n_diff_pfx02 | 6000 | 7 | c,DB_ROW_ID |
| lixinyan | t | idx_t_c | 2023-02-24 21:28:18 | n_leaf_pages | 7 | NULL | Number of leaf pages in the index |
| lixinyan | t | idx_t_c | 2023-02-24 21:28:18 | size | 8 | NULL | Number of pages in the index |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
从统计信息中也无法做出如此精准的判断,这是什么原因呢?
原因就在于mysql在没有直方图之前,如果数据倾斜严重的情况下,容易走错执行计划。所以引入了一个index dive的特性。官方文档介绍如下:
Equality Range Optimization of Many-Valued Comparisons
Consider these expressions, where col_name is an indexed column:col_name IN(val1, …, valN)
col_name = val1 OR … OR col_name = valN
Each expression is true if col_name is equal to any of several values. These comparisons are equality range comparisons (where the “range” is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:If there is a unique index on col_name, the row estimate for each range is 1 because at most one row can have the given value.
Otherwise, any index on col_name is nonunique and the optimizer can estimate the row count for each range using dives into the index or index statistics.
With index dives, the optimizer makes a dive at each end of a range and uses the number of rows in the range as the estimate. For example, the expression col_name IN (10, 20, 30) has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields an estimate of the number of rows that have the given value.Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.The eq_range_index_dive_limit system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To permit use of index dives for comparisons of up to N equality ranges, set eq_range_index_dive_limit to N + 1. To disable use of statistics and always use index dives regardless of N, set eq_range_index_dive_limit to 0.To update table index statistics for best estimates, use ANALYZE TABLE.Prior to MySQL 8.0, there is no way of skipping the use of index dives to estimate index usefulness, except by using the eq_range_index_dive_limit system variable. In MySQL 8.0, index dive skipping is possible for queries that satisfy all these conditions:The query is for a single table, not a join on multiple tables.
A single-index FORCE INDEX index hint is present. The idea is that if index use is forced, there is nothing to be gained from the additional overhead of performing dives into the index.
The index is nonunique and not a FULLTEXT index.
No subquery is present.
No DISTINCT, GROUP BY, or ORDER BY clause is present.
For EXPLAIN FOR CONNECTION, the output changes as follows if index dives are skipped:For traditional output, the rows and filtered values are NULL.
For JSON output, rows_examined_per_scan and rows_produced_per_join do not appear, skip_index_dive_due_to_force is true, and cost calculations are not accurate.
Without FOR CONNECTION, EXPLAIN output does not change when index dives are skipped.After execution of a query for which index dives are skipped, the corresponding row in the Information Schema OPTIMIZER_TRACE table contains an index_dives_for_range_access value of skipped_due_to_force_index.
该特性的意思是mysql在评估满足index dive生效条件的执行计划时,会实际去访问一些page,从而获取更加精准的rows。以便得到正确的执行计划。并不是完全依靠统计信息。
index dive的参数由eq_range_index_dive_limit控制。
mysql> show variables like '%eq_range%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| eq_range_index_dive_limit | 200 |
+---------------------------+-------+
0 :始终使用index dive的方式。
1 :不使用index dive的方式。
N :1+N 条件个数。
index dive的适用条件:
非唯一索引字段的in或者or,col_name IN (val1, ..., valN)/col_name = val1 OR … OR col_name = valN
非唯一索引字段的(< > <= >=)的范围扫描
非唯一索引字段的等值查询col_name=val
index dive的限制:
在 MySQL 8.0 之前,只能使用eq_range_index_dive_limit跳过index dive。在 MySQL 8.0 中,满足这些条件的查询可能会跳过index dive:
多表关联
使用了FORCE INDEX HINT
存在子查询
存在DISTINCT, GROUP BY, or ORDER BY
索引为全文索引
通过optimizer_trace验证是否使用了index dive:
mysql> set optimizer_trace='enabled=on';
Query OK, 0 rows affected (0.00 sec)mysql> select * from t where c='D';
...mysql> select * from information_schema.optimizer_trace\G
*************************** 1. row ***************************QUERY: select * from t where c='D'
..."analyzing_range_alternatives": {"range_scan_alternatives": [{"index": "idx_t_c","ranges": ["c = 'D'"],"index_dives_for_eq_ranges": true,"rowid_ordered": true,"using_mrr": false,"index_only": false,"in_memory": 1,"rows": 500,"cost": 175.26,"chosen": true}]
...
后续小菜鸟还初探了一下源码,可以看到每次index dive采样的page最多10个page,以及rows评估的公式。
/* Do not read more than this number of pages in order not to hurtperformance with this code which is just an estimation. If we readthis many pages before reaching slot2->page_no then we estimate theaverage from the pages scanned so far. */constexpr uint32_t N_PAGES_READ_LIMIT = 10;page_id_t page_id(dict_index_get_space(index), slot1->page_no);const fil_space_t *space = fil_space_get(index->space);ut_ad(space);const page_size_t page_size(space->flags);level = slot1->page_level;do {mtr_t mtr;page_t *page;buf_block_t *block;mtr_start(&mtr);/* Fetch the page. Because we are not holding theindex->lock, the tree may have changed and we may beattempting to read a page that is no longer part ofthe B-tree. We pass Page_fetch::POSSIBLY_FREED in order tosilence a debug assertion about this. */block =buf_page_get_gen(page_id, page_size, RW_S_LATCH, nullptr,Page_fetch::POSSIBLY_FREED, UT_LOCATION_HERE, &mtr);page = buf_block_get_frame(block);/* It is possible that the tree has been reorganized in themeantime and this is a different page. If this happens thecalculated estimate will be bogus, which is not fatal asthis is only an estimate. We are sure that a page withpage_no exists because InnoDB never frees pages, onlyreuses them. */if (!fil_page_index_page_check(page) ||btr_page_get_index_id(page) != index->id ||btr_page_get_level(page) != level) {/* The page got reused for something else */mtr_commit(&mtr);goto inexact;}/* It is possible but highly unlikely that the page wasoriginally written by an old version of InnoDB that didnot initialize FIL_PAGE_TYPE on other than B-tree pages.For example, this could be an almost-empty BLOB pagethat happens to contain the magic values in the fieldsthat we checked above. */n_pages_read++;if (page_id.page_no() != slot1->page_no) {/* Do not count the records on slot1->page_no,we already counted them before this loop. */n_rows += page_get_n_recs(page);}page_id.set_page_no(btr_page_get_next(page, &mtr));mtr_commit(&mtr);if (n_pages_read == N_PAGES_READ_LIMIT || page_id.page_no() == FIL_NULL) {/* Either we read too many pages orwe reached the end of the level without passingthrough slot2->page_no, the tree must have changedin the meantime */goto inexact;}} while (page_id.page_no() != slot2->page_no);return (n_rows);inexact:*is_n_rows_exact = false;/* We did interrupt before reaching slot2->page */if (n_pages_read > 0) {/* The number of pages on this level isn_rows_on_prev_level, multiply it by theaverage number of recs per page so far */n_rows = n_rows_on_prev_level * n_rows / n_pages_read;} else {/* The tree changed before we could evenstart with slot1->page_no */n_rows = 10;}return (n_rows);
}
N_PAGES_READ_LIMIT 10
n_rows = n_rows_on_prev_level * n_rows / n_pages_read
------------------作者介绍-----------------------
姓名:黄廷忠
现就职:Oracle中国高级服务团队
曾就职:OceanBase、云和恩墨、东方龙马等
电话、微信、QQ:18081072613
个人博客: (http://www.htz.pw)
CSDN地址: (https://blog.csdn.net/wwwhtzpw)
博客园地址: (https://www.cnblogs.com/www-htz-pw)