MySQL Architecture and SQL Optimization Guide

Overview

Contents:

MySQL architecture overview
SQL optimization principles
SQL optimization examples

1) MySQL basic architecture

Clients are the various front ends that connect to the server, such as command-line clients, Workbench, or GUI tools like Navicat. The server side is split into the SQL layer and the storage engine layer.

After a query is executed, the executor returns results. The executor may write results to the query cache so subsequent identical queries can be served from cache, and it also returns the response to the client.

Query engines

To list supported engines:

show engines;

To check the storage engine variable:

show variables like "%storage_engine%";

Specifying a storage engine for a table

create table tb( id int(4) auto_increment, name varchar(5), dept varchar(5), primary key(id) ) engine=myISAM auto_increment=1 default charset=utf8;

2) SQL optimization

Why optimize SQL?

Complex operations such as multi-table joins and subqueries can execute very slowly if SQL is not written efficiently. In addition to correctness, SQL performance has major impact on the overall system responsiveness.

MySQL query composition and parsing

Typical query writing order:

select dinstinct ..from ..join ..on ..where ..group by ..having ..order by ..limit ..

MySQL parsing order:

from .. on.. join ..where ..group by ..having ..select dinstinct ..order by ..limit ..

Main focus: index optimization

Indexing is the primary method for optimizing SQL. An index is like a dictionary index. Using an index lets the database quickly locate rows without scanning the entire table.

What is an index?

An index is a data structure that helps MySQL retrieve rows efficiently. MySQL commonly uses a B+ tree structure for indexes.

Index illustration

An index maps key values to the identifiers (physical addresses) of table rows. With an index on the column age, for example, age=50 in the index points to the corresponding row in the base table.

Without an index, a query like

select * from student where age=33 requires a full table scan. With an index, the database can search the tree structure, reducing the number of comparisons significantly.

Disadvantages of indexes

Indexes consume storage space and memory/disk resources, especially when the data size is large.
Indexes are not suitable for every situation: a) small tables; b) frequently updated columns; c) rarely queried columns.
Indexes speed up queries but slow down insert, delete, and update operations, because indexes must also be maintained when the base table changes.

Advantages of indexes

Improved query performance and reduced I/O.
Lower CPU usage for operations such as ORDER BY. Indexes are smaller than the base table and the index entries are already ordered, so sorting can be much cheaper or avoided.

B+ tree illustration

MySQL uses B+ tree structures for indexes.

Notes on B+ trees:

When people refer to B-tree in databases, they usually mean B+ tree. All actual data entries are stored in the leaf nodes.
Internal nodes hold separators and pointers to direct searches to the correct child node.
A B+ tree of height 3 can store millions of records by increasing the node fanout.
The number of comparisons to find any record in a B+ tree is proportional to the tree height.

3) Index types and creation

Index classification

Single-column index
Unique index
Composite (multi-column) index

Single-column index

An index built on a single column. A table can have multiple single-column indexes, one per column as needed.

Unique index

A single-column index that enforces uniqueness on the column values, such as an ID or student number.

Composite index

An index on multiple columns, for example (name, age). The composite index is first keyed on name, and when name values are equal, age is used to refine the search. Note that not all columns in a composite index must be used in a query for the index to be useful, but the leftmost prefix rule applies.

Creating indexes

Syntax example:

create index index_name on table(column);

Example table definitions:

create table course ( cid int(3), cname varchar(20), tid int(3) ); create table teacher ( tid int(3), tname varchar(20), tcid int(3) ); create table teacherCard ( tcid int(3), tcdesc varchar(200) );

4) EXPLAIN execution plan keywords

1) id

Example: retrieve teacher records where course id is 2 or teacherCard id is 3.

# Check execution plan explain select t.* from teacher t,course c,teacherCard tc where t.tid = c.tid and t.tcid = tc.tcid and (c.cid = 2 or tc.tcid = 3);

2) select_type: query type

simple: simple SELECT without subqueries or UNION
primary: the outermost SELECT when subqueries are present
subquery: nested subquery SELECT
derived: derived table, i.e., uses a temporary table

3) type: join type

Values like system or const are ideal cases. Practical optimization often reaches up to index > range > ref levels.

4) possible_keys and key

possible_keys lists indexes that might be used. key shows the index actually used.

5) key_len

The length of the key used, useful for determining whether a composite index (a,b,c) is being fully utilized.

6) ref

Shows which column or constant is used with the key for lookups.

7) rows

Estimated number of rows examined to produce the result for that part of the query.

8) Extra

Additional information about the execution plan, often useful for diagnosing issues.

5) Optimization example

Setup

create table test03 ( a1 int(4) not null, a2 int(4) not null, a3 int(4) not null, a4 int(4) not null ); create index a1_a2_a3_test03 on test03(a1,a2,a3);

Check execution plan

explain select a3 from test03 where a1=1 and a2=2 and a3=3;

Summary and suggestions

Prefer leftmost prefix usage: maintain consistent order between index definition and column usage in queries.
Index design should be iterative: add indexes as needed and remove obsolete indexes.
Place range queries such as those using IN at the end of WHERE conditions when possible to avoid index loss.

In the example where both Using where (requires returning to the base table) and Using index (does not require returning to the base table) appear: the condition authorid=1 and typeid in(2,3) with index (authorid,typeid,bid) means authorid can be used from the index without returning to the base table; however, a range condition on typeid like IN can make the typeid portion of the composite index unusable, causing a table lookup and resulting in Using where.

For example, removing the IN will avoid Using where:

explain select bid from book where authorid=1 and typeid=3 order by typeid desc;