Index? Yes, please

When altering some tables, I got into optimization adding indexes.

I have one table – users

mysql> select count(1) from users;
+----------+
| count(1) |
+----------+
|   389900 |
+----------+
1 row in set (0.08 sec)

and doing one query:

mysql> SELECT COUNT(*) FROM `users`  WHERE (last_active_at > '2014-01-07 00:17:42') AND (last_logoff <> last_active_at or last_logoff is null);
+----------+
| COUNT(*) |
+----------+
|      913 |
+----------+
1 row in set (0.23 sec)

and slow-log complains

# Schema: test  Last_errno: 0  Killed: 0
# Query_time: 0.365658  Lock_time: 0.000060  Rows_sent: 1  Rows_examined: 389894  Rows_affected: 0
# Bytes_sent: 65
SET timestamp=1389055417;
SELECT COUNT(*) FROM `users`  WHERE (last_active_at > '2014-01-07 00:23:36') AND (last_logoff <> last_active_at or last_logoff is null);

from his you can see it did full table scan (Rows_examined: 389894) which is kinda wrong.

so, let’s explain

mysql> explain SELECT COUNT(*) FROM `users`  WHERE (last_active_at > '2014-01-07 00:17:42') AND (last_logoff <> last_active_at or last_logoff is null);
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 385645 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)

Hmm, “type” shows no index has been used and ALL table scanned. WRONG

Let’s add an index – we have two fields, last_active_at and last_logoff, let’s try with last_logoff only

mysql> alter table users add index ix_last_logoff (last_logoff);
Query OK, 0 rows affected (2.16 sec)
mysql> explain SELECT COUNT(*) FROM `users`  WHERE (last_active_at > '2014-01-07 00:17:42') AND (last_logoff <> last_active_at or last_logoff is null);
+----+-------------+-------+------+----------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys  | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+------+----------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | users | ALL  | ix_last_logoff | NULL | NULL    | NULL | 385646 | Using where |
+----+-------------+-------+------+----------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)

hmm, still using full table scan. Let’s drop and create combined index

mysql> alter table users drop index ix_last_logoff;
Query OK, 0 rows affected (1.40 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table users add index ix_last_logoff (last_logoff, last_active_at);
Query OK, 0 rows affected (2.20 sec)
Records: 0  Duplicates: 0  Warnings: 0

and what our explain friend shows now?

mysql> explain SELECT COUNT(*) FROM `users`  WHERE (last_active_at > '2014-01-07 00:17:42') AND (last_logoff <> last_active_at or last_logoff is null);
+----+-------------+-------+-------+----------------+----------------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys  | key            | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+----------------+----------------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | users | index | ix_last_logoff | ix_last_logoff | 12      | NULL | 385646 | Using where; Using index |
+----+-------------+-------+-------+----------------+----------------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)
mysql> SELECT COUNT(*) FROM `users`  WHERE (last_active_at > '2014-01-07 00:17:42') AND (last_logoff <> last_active_at or last_logoff is null);
+----------+
| COUNT(*) |
+----------+
|      913 |
+----------+
1 row in set (0.23 sec)

Oops. Explain shows it’s using index (bit better than ALL), but still fetches 385646 rows. Still wrong. Let’s switch columns in index.

mysql> alter table users drop index ix_last_logoff;
Query OK, 0 rows affected (3.95 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table users add index ix_last_logoff (last_active_at, last_logoff);                                                         
Query OK, 0 rows affected (3.81 sec)
Records: 0  Duplicates: 0  Warnings: 0

What explain says now?

mysql> explain SELECT COUNT(*) FROM `users`  WHERE (last_active_at > '2014-01-07 00:17:42') AND (last_logoff <> last_active_at or last_logoff is null);
+----+-------------+-------+-------+----------------+----------------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys  | key            | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+----------------+----------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | users | range | ix_last_logoff | ix_last_logoff | 6       | NULL |  963 | Using where; Using index |
+----+-------------+-------+-------+----------------+----------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

Great! and select?

mysql> SELECT COUNT(*) FROM `users`  WHERE (last_active_at > '2014-01-07 00:17:42') AND (last_logoff <> last_active_at or last_logoff is null);
+----------+
| COUNT(*) |
+----------+
|      924 |
+----------+
1 row in set (0.00 sec)

0.20secs improvement.

Explanation? It’s simple. last_active_at is doing range scan against a value and against second column, thus first referenced. If last_logoff is first in combined index, it can’t be used for last_active_at > part of the query.