Overview
Contents:
- MySQL architecture overview
- SQL optimization principles
- SQL optimization examples
1) MySQL basic architecture
Clients are the various front ends that connect to the server, such as command-line clients, Workbench, or GUI tools like Navicat. The server side is split into the SQL layer and the storage engine layer.
After a query is executed, the executor returns results. The executor may write results to the query cache so subsequent identical queries can be served from cache, and it also returns the response to the client.
Query engines
To list supported engines:
show engines;
To check the storage engine variable:
show variables like "%storage_engine%";
Specifying a storage engine for a table
create table tb(
id int(4) auto_increment,
name varchar(5),
dept varchar(5),
primary key(id)
) engine=myISAM auto_increment=1 default charset=utf8;
2) SQL optimization
Why optimize SQL?
Complex operations such as multi-table joins and subqueries can execute very slowly if SQL is not written efficiently. In addition to correctness, SQL performance has major impact on the overall system responsiveness.
MySQL query composition and parsing
Typical query writing order:
select dinstinct ..from ..join ..on ..where ..group by ..having ..order by ..limit ..
MySQL parsing order:
from .. on.. join ..where ..group by ..having ..select dinstinct ..order by ..limit ..
Main focus: index optimization
Indexing is the primary method for optimizing SQL. An index is like a dictionary index. Using an index lets the database quickly locate rows without scanning the entire table.
What is an index?
An index is a data structure that helps MySQL retrieve rows efficiently. MySQL commonly uses a B+ tree structure for indexes.
Index illustration
An index maps key values to the identifiers (physical addresses) of table rows. With an index on the column age, for example, age=50 in the index points to the corresponding row in the base table.
Without an index, a query like
select * from student where age=33
requires a full table scan. With an index, the database can search the tree structure, reducing the number of comparisons significantly.
Disadvantages of indexes
- Indexes consume storage space and memory/disk resources, especially when the data size is large.
- Indexes are not suitable for every situation: a) small tables; b) frequently updated columns; c) rarely queried columns.
- Indexes speed up queries but slow down insert, delete, and update operations, because indexes must also be maintained when the base table changes.
Advantages of indexes
- Improved query performance and reduced I/O.
- Lower CPU usage for operations such as ORDER BY. Indexes are smaller than the base table and the index entries are already ordered, so sorting can be much cheaper or avoided.
B+ tree illustration
MySQL uses B+ tree structures for indexes.
Notes on B+ trees:
- When people refer to B-tree in databases, they usually mean B+ tree. All actual data entries are stored in the leaf nodes.
- Internal nodes hold separators and pointers to direct searches to the correct child node.
- A B+ tree of height 3 can store millions of records by increasing the node fanout.
- The number of comparisons to find any record in a B+ tree is proportional to the tree height.
3) Index types and creation
Index classification
- Single-column index
- Unique index
- Composite (multi-column) index
Single-column index
An index built on a single column. A table can have multiple single-column indexes, one per column as needed.
Unique index
A single-column index that enforces uniqueness on the column values, such as an ID or student number.
Composite index
An index on multiple columns, for example (name, age). The composite index is first keyed on name, and when name values are equal, age is used to refine the search. Note that not all columns in a composite index must be used in a query for the index to be useful, but the leftmost prefix rule applies.
Creating indexes
Syntax example:
create index index_name on table(column);
Example table definitions:
create table course
(
cid int(3),
cname varchar(20),
tid int(3)
);
create table teacher
(
tid int(3),
tname varchar(20),
tcid int(3)
);
create table teacherCard
(
tcid int(3),
tcdesc varchar(200)
);
4) EXPLAIN execution plan keywords
1) id
Example: retrieve teacher records where course id is 2 or teacherCard id is 3.
# Check execution plan
explain select t.*
from teacher t,course c,teacherCard tc
where t.tid = c.tid and t.tcid = tc.tcid
and (c.cid = 2 or tc.tcid = 3);
2) select_type: query type
- simple: simple SELECT without subqueries or UNION
- primary: the outermost SELECT when subqueries are present
- subquery: nested subquery SELECT
- derived: derived table, i.e., uses a temporary table
3) type: join type
Values like system or const are ideal cases. Practical optimization often reaches up to index > range > ref levels.
4) possible_keys and key
possible_keys lists indexes that might be used. key shows the index actually used.
5) key_len
The length of the key used, useful for determining whether a composite index (a,b,c) is being fully utilized.
6) ref
Shows which column or constant is used with the key for lookups.
7) rows
Estimated number of rows examined to produce the result for that part of the query.
8) Extra
Additional information about the execution plan, often useful for diagnosing issues.
5) Optimization example
Setup
create table test03
(
a1 int(4) not null,
a2 int(4) not null,
a3 int(4) not null,
a4 int(4) not null
);
create index a1_a2_a3_test03 on test03(a1,a2,a3);
Check execution plan
explain select a3 from test03 where a1=1 and a2=2 and a3=3;
Summary and suggestions
- Prefer leftmost prefix usage: maintain consistent order between index definition and column usage in queries.
- Index design should be iterative: add indexes as needed and remove obsolete indexes.
- Place range queries such as those using IN at the end of WHERE conditions when possible to avoid index loss.
In the example where both Using where (requires returning to the base table) and Using index (does not require returning to the base table) appear: the condition authorid=1 and typeid in(2,3) with index (authorid,typeid,bid) means authorid can be used from the index without returning to the base table; however, a range condition on typeid like IN can make the typeid portion of the composite index unusable, causing a table lookup and resulting in Using where.
For example, removing the IN will avoid Using where:
explain select bid from book where authorid=1 and typeid=3 order by typeid desc;