Background
InnoDB gathers statistics for the data in user tables, which are used by the MySQL optimizer to choose the best query plan. For a long time the imprecision and instability of these statistics have been creating problems for users.
The problem is that these statistics are recalculated at any of the following events:
* When the table is opened
* When the table has changed a lot (1/16th of the table has been updated/deleted or inserted)
* When ANALYZE TABLE is run
* When SHOW TABLE STATUS or SHOW INDEX FROM is run
* When InnoDB Monitor is turned ON
* others
and so their recalculation must be quick an unnoticeable. Thus the quick algorithm just picks 8 random pages and could give a wildly varying results.
Quick glance
To solve this problem MySQL 5.6.2 introduces the InnoDB Persistent Statistics feature, which:
* Uses a more precise sampling algorithm (which is also slower) in an attempt to better inform the MySQL optimizer so it can choose the best possible query plan.
* Does not recalculate the statistics automatically, only when ANALYZE TABLE is run. This means that the same query plan will always be chosen even when MySQL is restarted (see below). A query plan using the global statistics can only change after ANALYZE TABLE has been run (manually). This also means that the user and/or DBA is responsible for running ANALYZE TABLE regularly if the data in the table has been changed a lot.
* Saves the statistics on disk in normal user-visible and updatable SQL tables.
How to turn it ON?
To turn ON the persistent statistics for a given table you must first create the persistent statistics storage by executing the SQL script storage/innobase/scripts/persistent_storage.sql which is shipped in the source distribution (in future versions it may be executed automatically by mysql_install_db). This script will create a new database named “innodb” with two tables in it – “table_stats” and “index_stats”. InnoDB will read/write the statistics from/to those tables. Second you must set the new parameter innodb_analyze_is_persistent to ON (it is OFF by default). This will instruct InnoDB to use the better (and slower!) sampling algorithm during ANALYZE TABLE and to save the results to the stats tables. To change the number of leaf pages that are being sampled during such a “persistent-results” ANALYZE TABLE set the parameter innodb_stats_persistent_sample_pages to a higher value.
Once the stats “meta” tables are present and stats for a given table exist in them, then those stats will be used whenever stats are needed – including open table/show table status/show index from and others. To stop using them you must DELETE the corresponding rows from innodb.table_stats and innodb.index_stats.
Beware of bugs
Some bugs in persistent stats code which may cause InnoDB to crash have been fixed, but are not included in 5.6.2. See:
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/revision/3167.15.262
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/revision/3167.15.266
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/revision/3167.15.271
April 27th, 2011 at 1:08 pm
[...] ניתן לקרוא עוד בלינק הבא. [...]
December 31st, 2011 at 12:27 pm
Thanks for sharing this. Make innodb statistics persistent is a great job. In this slide, i notice one paragraph,which says that:
Does not recalculate the statistics automatically, only when ANALYZE TABLE is run. This means that the same query plan will always be chosen even when MySQL is restarted (see below).
I think it is not true. Mysql’s range query optimization will call function records_in_range to estimate rows in the range, and records_in_range calculate this value on the fly every time it is called, may result in different value. Different value returned by records_in_range will lead to different range query plan. An i right?
[Reply]
Vasil Dimov Reply:
January 16th, 2012 at 11:03 am
@he dengcheng,
Yes, ha_innobase::records_in_range() does estimate the number of records in a given range every time it is called and is not part of persistent statistics.
This is because it is not feasible to pre-estimate the number of rows in _all possible_ ranges in advance and reuse the results later.
But ha_innobase::records_in_range()’s algorithm does not use random sampling and is much more stable – for the same input it will always return the same result, given the part of the tree between the two records has not changed.
Btw ha_innobase::records_in_range() was improved in Aug 2010 in MySQL 5.5 and 5.6 to return more precise estimates.
[Reply]
February 5th, 2012 at 10:33 am
[...] can be different) – you can do SET GLOBAL innodb_analyze_is_persistent = 1; (read more: InnoDB Persistent Statistics at last, InnoDB Persistent Statistics Save the [...]
April 11th, 2012 at 9:13 am
[...] introducing InnoDB persistent statistics in MySQL 5.6 we have now dressed it up in a nice UI and refactored the internals a bit to make the [...]
April 22nd, 2012 at 2:22 pm
[...] typed the title of this post into Google and came across this post which gave me the clues I needed to solve the problem. It seems that by default InnoDB uses 8 [...]