Unneeded MySQL indices

We have one of our tables in our database that is starting to be pretty big : 10M rows 2.14G for data
3.55G for indices

I was pretty surprised to see that the indices are almost twice as big as the data itself :/

So I showed the indices :

show index from entries;
+---------+------------+----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table   | Non_unique | Key_name                               | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| entries |          0 | PRIMARY                                |            1 | id          | A         |    13538389 |     NULL | NULL   |      | BTREE      |         | 
| entries |          0 | index_entries_on_link_and_feed_id      |            1 | link        | A         |    13538389 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          0 | index_entries_on_link_and_feed_id      |            2 | feed_id     | A         |    13538389 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          0 | index_entries_on_unique_id_and_feed_id |            1 | unique_id   | A         |    13538389 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          0 | index_entries_on_unique_id_and_feed_id |            2 | feed_id     | A         |    13538389 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          1 | index_entries_on_feed_id               |            1 | feed_id     | A         |       81556 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          1 | index_entries_on_time                  |            1 | time        | A         |      967027 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          1 | index_entries_on_created_at            |            1 | created_at  | A         |      846149 |     NULL | NULL   | YES  | BTREE      |         | 
+---------+------------+----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
8 rows in set (1.35 sec)

As far as I can tell our code uses all the indices, yet some information might be duplicated : I am thinking that the index index_entries_on_feed_id might be a duplicate since both index_entries_on_link_and_feed_id and index_entries_on_unique_id_and_feed_id use it.

Am I right?

13.10.2009 17:18:45
I would expect MySQL to not use those other indexes because feed_id is not the first field in sequence. You should not be surprised that the indexes sometimes take up more space than the data.
Joe 13.10.2009 17:29:36
interesting. Does that mean that if I change the order in any of index_entries_on_link_and_feed_id or index_entries_on_unique_id_and_feed_id, I don't need the index_entries_on_feed_id index?
Julien Genestoux 13.10.2009 23:33:59
2 ОТВЕТА
РЕШЕНИЕ

A few observations:

If unique_id is really unique, then I'd check carefully to see if feed_id is really necessary: even if it's for a one-field lookup, the gain in performance is very small.

What's the difference between id (primary) and unique_id?

It's quite possible for indices to use more space than the data if you are indexing a relatively small row several different ways.

10M rows isn't really very large, unless you're scanning the whole thing, in which case it's way too big. Providing your queries are making appropriate use of the indexes, it shouldn't really matter for another 100M rows or more.

If you're writing any moderately complex queries, involving joining 2 or 3 tables, I'd strongly recommend using EXPLAIN to check the query plan - I've had some surprising improvements from hand-tuning complex MySQL queries.

2
13.10.2009 17:41:02
Thanks Mike. The unique_id is actually 'external data', and so the name might be misleading. In practice, we found that it's not really unique in our DB, but unique in the context of the external data. (Think to it as the <id> element if ATOM feed entries for example). Ok, then, I guess our situation is fine. We don't have a real performance impact so far... it's just that I wanted to be sure we're not "over-indexing", for future scalability, since this DB has a lot of writes, and quite few selects (80%/20%), and writes are where the number of index matters.
Julien Genestoux 13.10.2009 18:09:39

you can use EXPLAIN followed by your SQL queries to get info about used indices. If some indices are not used, you can drop them.

Also, your indices: index_entries_on_link_and_feed_id index_entries_on_unique_id_and_feed_id

are the same, even their size is the same, so you can delete them...

-1
13.10.2009 17:21:24
Well, no, they are different indices! And they're used by different queries :/ and yes, we used EXPLAIN and all the index are used... which doesn't mean that if we delete one, the query solver will not try to use -successfully- another one :/
Julien Genestoux 13.10.2009 17:27:35