Prefix Index Optimization#
Build an index on the first few characters of a string field.
Limitations:
- Cannot use prefix index with order by.
- Cannot use prefix index as a covering index.
Covering Index Optimization#
Meaning: Refers to all fields selected in the SQL select statement, which can be found in the leaf nodes of the secondary index B+Tree. This means that the records can be queried from the secondary index B+Tree without the need to perform a lookup through the clustered index, thus avoiding the need for a round trip operation.
For example, when only the name and price of a product are needed, a composite index is created with [Product ID, Name, Price] as a composite index. If this data exists in the index, the query will not need to retrieve the primary key index again, thus avoiding a round trip (reducing I/O operations).
Primary Key Index is Best to be Auto-Incremented#
By default, InnoDB creates a primary key index as a clustered index, and the data is stored in the leaf nodes of the B+Tree (leaf nodes are automatically sorted).
Using an auto-incremented primary key: Each time a new record is inserted, it is an append operation and does not require data to be moved.
Using a non-auto-incremented primary key: Since the index value of the primary key inserted each time is random, when inserting new data, it may be inserted into a position in the middle of an existing data page, which will require moving other data to accommodate the insertion of new data, and may even need to copy data from one page to another. We usually refer to this situation as page splitting. Page splitting can also cause a large amount of memory fragmentation, resulting in a less compact index structure, which affects query efficiency.
For example, suppose the data in a data page is 1, 3, 5, 9, and the data page is full. Now, if you want to insert data 7, you need to split the data page into two:
When page splitting occurs, moving records from one page to another will affect performance and reduce the utilization of page space, resulting in wasted storage space.
However, if the records are sequentially inserted, for example, when inserting data 11, only a new data page needs to be allocated, and no page splitting will occur:
The length of the primary key field should not be too large, because the smaller the length of the primary key field, the smaller the leaf nodes of the secondary index (the leaf nodes of the secondary index store the primary key values), so the space occupied by the secondary index is also smaller.
Index Columns Should be Set as NOT NULL#
To better utilize the index, the index columns should be set with the NOT NULL constraint. There are two reasons for this:
- Having NULL values in the index columns will make the optimizer more complex and difficult to optimize when selecting indexes, because nullable columns make index selection, index statistics, and value comparisons more complex. For example, when performing index statistics, count will omit rows with NULL values.
- NULL values are meaningless, but they occupy physical space, so they will cause storage space issues. When InnoDB stores records, if there are fields that allow NULL values in the table, at least 1 byte of space will be used to store the NULL value list in the row format, as shown in the purple part of the following figure:
Prevent Index Inefficiency#
Using an index does not guarantee that it will be used during a query. Common cases of index inefficiency include:
- Using left or left-right fuzzy matching, such as
like '%xx'
orlike '%xx%'
. - Performing calculations, functions, or type conversions on indexed columns in the WHERE clause.
- Not following the principle of leftmost matching for composite indexes.
- In the WHERE clause, using
where indexed column or non-indexed column
.
In practice, it is necessary to check the execution plan (explain) to determine whether the index is used. The following figure shows a query statement that performs a full table scan without using an index:
For the results of the execution plan:
- The possible_keys field indicates the possible indexes that can be used.
- The key field indicates the actual index used. If this item is NULL, it means that no index is used.
- The key_len field indicates the length of the index.
- The rows field indicates the number of rows scanned.
- Important!! The type field indicates the type of data scan, and the execution efficiency is in the order from low to high:
- All (full table scan)!! Avoid if possible.
- index (full index scan)!! Avoid if possible.
- range (index range scan): At least reach this level.
- ref (non-unique index scan): Duplicate values exist.
- eq_ref (unique index scan): eq_ref is usually used in multi-table joins.
- const (scan of a primary key or unique index with only one result): const is used for comparison with constants, which is faster.
- Extra:
- !! Avoid Using filesort: When the query statement includes group by and cannot use an index for sorting.
- !! Avoid Using temporary: Using a temporary table to store intermediate results. MySQL uses a temporary table when sorting the query results, commonly seen in order by and group by queries.
- !! Good Using index: The required data can be obtained from the secondary index B+Tree, so there is no need to perform a round trip to retrieve the data one by one. This means that a covering index is used to avoid round trip operations.