1. Background and Architecture We all know that we can use programs to dynamically implement the logic required by our applications, so that we can get the results we need when the programs are executed. Then the database is an application that quickly obtains data by inputting SQL strings. Of course, assuming there is no system application such as a database, how can it be implemented with a program? We may find that no matter how the data is stored or whether the data is accessed concurrently, we still need to constantly modify the program to handle different requests for data from different applications. For example, in the field of big data, we usually obtain data through the API of non-relational databases. However, although this method is easy to get started, it is extremely difficult to maintain and is not very versatile. Even if you continue to design the software architecture or abstract reconstruction, you still need to constantly change the application. This is why non-relational databases turn back to embrace database SQL optimizers. The SQL optimizer is essentially an implementation of a highly abstract data interface. With this design, customers can use the more general and easy-to-understand SQL language to operate and process data without having to pay attention to and abstract their own data interfaces, which greatly liberates customer applications. This article will use graphical explanations to explain how the MySQL 8.0 SQL optimizer turns a simple string (SQL) into an execution sequence that the database executor can understand, and finally returns the data to the customer. A powerful optimizer does not require customers to pay attention to how to write SQL better to obtain the required data faster, so the optimizer will definitely make some equivalent changes to the original SQL. In the article "Detailed Explanation of the Latest Architecture of the Server Layer in MySQL 8.0", we focused on the overall introduction of the latest version of MySQL on the server layer parser, optimizer, and executor, including a detailed display of some code structures and changes, and used the simple_joins function to show how the MySQL optimizer simplifies the optimization of nested Joins in logical transformation. In this article, we will take you step by step into the magical optimizer details, and understand in detail how each step of the optimizer optimization part changes the final execution of a SQL. This article is based on the latest MySQL version 8.0.25. Since the optimizer conversion part is quite long, we divide it into two articles. The first part introduces the parsing conversion process based on the basic structure of Setup and Resolve. The second part introduces the complex conversion process of more complex subqueries, partition tables and joins. The outline is as follows: Setup and Resolve
2. Detailed conversion process The entire framework of the conversion is a top-down or bottom-up process from Query_expression to Query_block calling the prepare function (sql/sql_resolver.cc) according to the requirements of different conversion rules. 1 Pass null to the inner table list of join (propagate_nullability) Prepare starts with processing nullable tables, which means that a table may contain rows that are all null. According to the JOIN relationship (top_join_list), null rows can be propagated. If a table is determined to be nullable, some optimizations will be degraded, such as the access method cannot be EQ_REF, and outer join cannot be optimized to inner join. 2 Parsing the leave_tables (setup_tables) of the setup query block
Before setup_table is called, leaf_tables of each Query_block is 0. The function is to build leaf_tables, including base tables and derived tables, for subsequent optimization. setup_tables will not be called recursively, but only solve the tables of this layer and count the number of derived tables of this layer. However, it will then call resolve_placeholder_tables()->resolve_derived()->derived(Query_expression)::prepare->Query_block::prepare to specifically recursively process the Query_expression corresponding to the derived table. Next, we will continue to look at the resolve_placeholder_tables function for derived table processing according to the call order of prepare. 3. Resolve query block Derived Table, View, Table function (resolve_placeholder_tables) This function is used to process derived tables, views, and table functions. If the table has been merged, or if it is called using transform_grouped_to_derived() and the materialized table method has been decided, it will be ignored. The function of resolve_derived() has been introduced before. Now let’s focus on the merge_derived() function. merge_derived changes the Query_expression/Query_block framework structure and merges the derived table or view into the query block. merge_derived processes and merges Derived table 1) Prerequisites for merge_derived transformation Whether the outer query block allows merge (allow_merge_derived) The outer query block is nullptr The subquery of the outer query expression is nullptr, and the derived table is the first-level subquery The outer query block of the outer layer can have allow_merge_derived=true, or if the outer query block does not include the outer layer, is it SELECT/SET Use the derived_table->nested_join structure to assist in handling OUTER JOIN situations. merge_derived diagram process It seems that the official derived merge is still not perfect and cannot perform a bottom-up recursive merge.
This optimization can be controlled by set optimizer_switch="derived_merge=on/off". setup_materialized_derived Set up the materialized Derived Table For the remaining derived tables that cannot be processed by the merge algorithm, they will be processed by the materialize method. However, at this time, only some preprocessing such as variable settings is performed, and the actual materialization execution is performed in the executor stage. setup_materialized_derived_tmp_table(): Sets up a temporary table containing all row data of the materialized Derived Table.
setup_table_function handles table functions If there is a table function in the query block, the whole process will be processed twice. The first time, the table of the table function will be skipped, and the second time, the above logic will be executed again on the table of the table function. The consideration here should be to resolve the external environment (relative to the table function) first, because it is possible that the function parameters will depend on the external derived table.
4 Expand the wildcard of SELECT * into specific fields (setup_wild) 5. Set up base_ref_items at the query_block level (setup_base_ref_items) base_ref_items records the location of all items, so that other items in the query block can refer to them, or directly refer to them through Item_ref and its Item_ref subclasses, such as subquery references (Item_view_ref), aggregate function references (Item_aggregate_ref), outer query column references (Item_outer_ref), and subquery reference assistance for NULL values (Item_ref_null_helper). Let's take an example to illustrate a more complex Item_outer_ref: 6 Fix_fields() and column permission check for select_fields (setup_fields) The following figure shows a more complex fixed field process with subqueries. Some fields are associated with tables, and some need to add corresponding Item_xxx_ref references. 7 Parsing and fixed_fields WHERE conditions and Join conditions (setup_conds) If there is nested_join in setup_join_cond, setup_join_cond will be recursively called for parsing and setting. Here we also introduce the function of simplify_const_condition function. If a const Item that can be deleted is found, Item_func_true/Item_func_false will be used to replace the entire condition, as shown in the figure. 8 Parsing and setting ROLLUP statements (resolve_rollup) In a database query statement, adding the WITH ROLLUP statement after the GROUP BY expression allows you to perform analysis and statistics on data at different levels through a single query statement.
Sorting has the problem of NULL, so the hierarchical aggregation is very difficult to achieve. In addition, the SQL complexity changes when the group column changes. ROLLUP can achieve the effect very easily. Let's see what kind of conversion rollup does in the parsing process to achieve unexpected results. 9 Parsing and setting GROUP BY/ORDER BY statements (setup_group/setup_order) One of the functions, find_order_in_list(), tries to find a column that can be mapped in the select fields. Otherwise, the current column must be added to the final projected all fields, and fix_fields is also performed. m_having_cond->fix_fields: fix_fields for the having condition. remove_redundant_subquery_clause : For the expression of Table Subquery, usually IN/ANY/ALL/EXISTS/etc, if there is no aggregate function and Having clause, you can usually consider removing unnecessary ORDER/DISTINCT/GROUP BY. This function supports three types: REMOVE_ORDER | REMOVE_DISTINCT | REMOVE_GROUP. If it is a subquery of SINGLEROW_SUBS, only consider removing REMOVE_ORDER.
Determine whether unnecessary DISTINCT statements can be deleted. The deletion condition is that all GROUP BY columns are in the SELECT list and there are no ROLLUP and Window functions.
For example, scenario: SELECT DISTINCT c1, max(c2) from t1 group by c1; 10 Parse and set Window function (Window::setup_windows1) |
<<: Number One "Thousand Faces" - What you see may not be me
>>: Summary of the State Council Information Office press conference, involving 5G, chips, etc.
[51CTO.com original article] With the trend of cl...
Although 6G will not be implemented until 2030, t...
The wonderful opening ceremony of the 2022 Beijin...
No matter how important an activity is, it seems ...
Will the "joker" break your cognitive b...
In the era of information explosion, consumers ar...
Preface This article conducts an in-depth analysi...
After the rapid development in 2020, 2021 is a cr...
Author | Lei Ge Source | Java interview questions...
5G will become one of the hot technologies in the...
With the rise of IoT applications and the increas...
The 2017 Asia-Pacific CDN Summit was held in Beij...
[[416937]] Experimental requirements ISP-1 and IS...
The 2018 Yunnan-Huawei Software Industry Summit w...
[[417906]] A new study by Juniper Research found ...