To speed up bulk loading of data, InnoDB implements an insert buffer, a special index in the InnoDB system tablespace that buffers modifications to secondary indexes when the leaf pages are not in the buffer pool. Batched merges from the insert buffer to the index pages result in less random access patterns than when updating the pages directly. This speeds up the operation on hard disks.
In MySQL 5.5, the insert buffer has been extended to a change buffer, which covers all modifications of secondary index leaf pages. This will improve the performance of bulk deletes and updates, transaction rollback and the purging of deleted records (reducing the “purge lag”).
To assess the benefits of the extended buffering, you may want to run benchmarks with the settings innodb_change_buffering=all, innodb_change_buffering=inserts, and innodb_change_buffering=none. Users of solid-state storage, where random reads are about as fast as sequential reads, might benefit from disabling the buffering altogether.
Read on to learn how the change buffering works.
Operations on Secondary Indexes
InnoDB can perform three kinds of modifications on secondary index records. If the affected index page is not in the buffer pool, the modifications can be buffered in the change buffer. When an index lookup or scan needs a page that is not in the buffer pool, the page will be read from the tablespace and any buffered changes will be merged to it.
The following operations can modify secondary index pages:
- Insert
- Inserting a record; supported in all versions of InnoDB
- Delete-mark
- Marking a record for deletion
- Purge
- Removing a deleted record that is no longer accessible by active transactions
Before MySQL 5.5, UPDATE, DELETE and purge operations were performed directly on the index pages, resulting in random-access I/O. In MySQL 5.5, all of these operations can be buffered.
Implications of InnoDB Multiversioning
In InnoDB, there are two types of indexes: the clustered index B-tree, where the records are stored in the PRIMARY KEY order, and secondary index B-trees, which identify rows by primary key. InnoDB multiversion concurrency control (MVCC) treats these indexes differently.
Records in the clustered index can be updated in place, and their hidden system columns DB_TRX_ID, DB_ROLL_PTR point to undo log entries from which earlier versions can be reconstructed. InnoDB secondary index records do not contain any system columns, and their data is never updated in place. An UPDATE of an indexed column requires the operations Delete-mark(old), Insert(new) and eventually Purge(old) in the secondary index. An UPDATE of a PRIMARY KEY results in Delete-mark, Insert and eventually Purge in all indexes.
When a secondary index record has been marked for deletion or when the page has been updated by a newer transaction, InnoDB will look up the clustered index record. In the clustered index, it suffices to check the DB_TRX_ID and only retrieve the correct version from the undo log when the record was modified after the reading transaction started.
To Buffer or not to Buffer
When a page is in the buffer pool, it will always be updated directly. When a page is loaded to the buffer pool, any buffered changes will be merged to it, so that users never see unmerged changes.
Because change buffering works on individual leaf pages, we cannot buffer changes that would result into page splits or merges, but must perform such changes on the B-tree pages directly.
The insert buffer bitmap keeps track on the available space on pages and prevents overflows when buffering inserts. Delete-marking records can always be buffered, because the flag will be updated in place. Purging a delete-marked record could result in an empty page, something that we do not allow. We determine the non-emptiness of a page from previously buffered operations on the same page. If there are no previously buffered operations, the purge will have to load the index page to the buffer pool.
InnoDB refuses to buffer an operation when the on-disk change buffer tree would grow bigger than ⅓ of the in-memory buffer pool (innodb_buffer_pool_size). This might be a good rule-of-thumb, but some setups could benefit from the ability of setting the change buffer size independently of the buffer pool size.
Conclusion
The InnoDB change buffer is a persistent data structure and a complex mechanism that comes into play when the workload does not fit in the buffer pool. Because it trades random I/O with a larger amount of sequential I/O, it speeds up operation on hard disks, where random access is much slower than sequential access.
On solid-state storage, there is not much difference between sequential and random access times. Change buffering may still be useful if writes to solid-state storage are expensive, either in terms of speed or the consumption of limited program/erase cycles. Change buffering could reduce the write load on user tablespaces and cause more writes to the system tablespace (which contains the insert buffer) and the redo log. These should be placed on a hard disk.
September 20th, 2010 at 1:35 am
[...] URL: Transactions on InnoDB » Blog Archive » MySQL 5.5: InnoDB Change … a-change-buffer, all-modifications, and-updates, change-buffer, improve-the-performance, [...]
September 23rd, 2010 at 5:09 am
Thanks for all of the details. It will take me some time to remember all of this but knowing this will help us get more from InnoDB.
[Reply]
September 23rd, 2010 at 5:16 am
Thank you for the details. Queries that use a secondary index scan are index only except when the seconary index page has been updated by a more recent transaction. I will try to remember that one.
[Reply]
September 24th, 2010 at 7:15 am
wow, an update on secondary index columns became delete, insert and purge three operations. that hurts.
[Reply]
Marko Mäkelä Reply:
September 27th, 2010 at 8:38 pm
@Yingkuan, it is the multiversioning and the “covering indexes” feature of MySQL/InnoDB that causes the complexity. Kristian Köhntopp blogged on Covering indexes und MVCC some time ago. It is in German, but some quotes are in English. In the beginning it points to a comparison of advanced indexing between MySQL and Postgres.
If you want to be really hurt, update the
PRIMARY KEY. That one will become delete, insert and purge in each and every index B-tree of the table, including the clustered index. While you are at it, use a multi-column primary key instead of a 32-bitauto_incrementinteger, so that the secondary index records will be longer.[Reply]
October 30th, 2010 at 5:04 pm
When an update is done for a secondary index page (delete-marked + insert) and there are no buffered changes for that page must the page be loaded to determine that the insert buffer would not cause a page split? Or is the insert buffer bitmap sufficient?
I ask because a common case for me is an IO-bound workload and I want change buffering for the first change to a secondary index page.
[Reply]
Marko Mäkelä Reply:
November 1st, 2010 at 10:13 pm
When the secondary index leaf page is not in the buffer pool, a Delete-mark operation will always be buffered without consulting the bitmap. Insert operations can lead to page splits and Purge operations can lead to page merges.
An update of a secondary index record is implemented as a combination of Insert and Delete-mark operations.
Even if there are no buffered operations for a page, there will be an up-to-date insert buffer bitmap that covers the page. The bitmap page repeats every page_size pages. For instance, the bitmap of an uncompressed tablespace would be at pages 1, 16385, 32769, ….
Let us assume that the secondary index leaf page is not in the buffer pool, innodb_change_buffering=all, and no operations for the page exist in the change buffer. If the 2-bit ‘free space’ count (
IBUF_BITMAP_FREE) on the bitmap page indicates that there is enough free space on the page, the insert will be buffered. The delete-mark operation (which can be on the same or on a different page than the insert) can always be buffered, since it would update an existing record in place. The attempt to buffer a purge operation would consult the previously buffered entries for the page. In our example, the delete-mark operation would always be buffered on the page. Having at most one buffered operation on the page would not guarantee nonemptiness of the page after applying the purge operation.Only if the insert operation was buffered for the same page, then [unless we get a hash collision in ibuf_get_volume_buffered_hash()] we would know that there will exist at least two records on the page at the time when the purge gets merged, and the page will be nonempty after merging the purge operation.
I believe that it is unusual for the insert and delete-mark of an
UPDATEstatement to be buffered on the same page. You could have that when updating thePRIMARY KEY, I guess. Thus, normally the purge would force a change buffer merge and an in-buffer update, unless there were multiple user operations buffered for the page before the purge kicked in.I could have countered this by allocating a some bits for estimating the number of records on the page, but this flaw did not occur to me. In fact, the original design, which was not done by me, was supposed to leave empty leaf pages in the index B-trees and work around any resulting issues. This turned out to be inefficient and challenging (an endless source of bugs).
An anecdote: The insert operation could be merged by delete-unmarking a delete-marked record. This may be wrong in case-insensitive indexes, as Bug #56680 shows.
[Reply]
April 6th, 2011 at 12:12 am
[...] Normally, InnoDB would update all indexes of a table when rows are inserted or deleted. If you update an indexed column, InnoDB would have to delete the old value and insert the new value in the corresponding index. If you update a primary key column, the row would be deleted and inserted in every index of the table. This can be slow even despite of the change buffering in MySQL 5.5. [...]
April 6th, 2011 at 10:52 am
[...] multiple entries for the same key, these need to be removed (garbage collected) too by Purge (see InnoDB Change Buffering for details regarding secondary [...]
April 12th, 2011 at 7:56 am
[...] scheme. First, the master thread is also tasked to do other background activities. It has to do change buffer merges and possibly purge (though starting with 5.5 we have an option to use a dedicated thread for [...]