Apr 16

What a busy week! Lots of MySQL 5.5 announcements that just happened to coincide with the MySQL Conference and Expo in Silicon Valley. Here are some highlights of the performance and scalability work that the InnoDB team was involved with.

A good prep for the week of news is the article Introduction to MySQL 5.5, which includes information about the major performance and scalability features. That article will lead you into the MySQL 5.5 manual for general features and the InnoDB 1.1 manual for performance & scalability info.

Then there were the conference presentations from InnoDB team members, which continued the twin themes of performance and scalability:

Read the rest of this entry »

Apr 14
With the exception of Windows, InnoDB has used ‘simulated AIO’ on all platforms to perform certain IO operations. The IO requests performed in this ‘simulated AIO’ way are the write requests and the readahead requests for the datafile pages. Let us first look at what ‘simulated AIO’ means in this context.

We call it ‘simulated AIO’ because it appears asynchronous from the context of a query thread, but from the OS perspective the IO calls are still synchronous. The query thread simply queues the request in an array and then returns to its normal work. One of the IO helper threads, which are background threads, then takes the request from the queue and issues a synchronous IO call (pread/pwrite), meaning it blocks on the IO call. Once it returns from the pread/pwrite call, the helper thread calls the IO completion routine on the block in question. In case of a read, this includes merging buffered operations, if any. In case of a write, the block is marked as ‘clean’ and is removed from the flush_list. Some other bookkeeping also happens in the IO completion routine.
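The queue-and-helper pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not InnoDB's code: the names `io_helper` and `demo`, the temporary four-page file, and the completion callback are all made up for the example, and `os.pread` is assumed to be available (it is on Linux and other Unix systems).

```python
import os
import queue
import tempfile
import threading

PAGE_SIZE = 16 * 1024  # 16K, matching the block size mentioned in these posts


def io_helper(requests, fd):
    """Background helper: take queued requests and issue blocking reads."""
    while True:
        req = requests.get()
        if req is None:  # sentinel value: shut the helper down
            break
        offset, on_complete = req
        data = os.pread(fd, PAGE_SIZE, offset)  # synchronous; blocks this thread only
        on_complete(offset, data)  # stands in for the IO completion routine


def demo():
    # Hypothetical data file: four pages, each filled with its own page number.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for page_no in range(4):
            f.write(bytes([page_no]) * PAGE_SIZE)
        path = f.name

    fd = os.open(path, os.O_RDONLY)
    requests = queue.Queue()
    completed = {}
    all_done = threading.Event()

    def on_complete(offset, data):
        completed[offset // PAGE_SIZE] = data[0]
        if len(completed) == 4:
            all_done.set()

    helper = threading.Thread(target=io_helper, args=(requests, fd))
    helper.start()

    # The "query thread" only queues the requests and returns immediately.
    for page_no in range(4):
        requests.put((page_no * PAGE_SIZE, on_complete))

    all_done.wait()
    requests.put(None)
    helper.join()
    os.close(fd)
    os.remove(path)
    return completed
```

From the caller's point of view the reads look asynchronous, yet every request still costs one blocking system call in the helper thread.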

What we have changed in the InnoDB Plugin 1.1 is to use the native AIO interface on Linux. Note that this feature requires libaio to be installed on your system. libaio is a thin wrapper around the kernelized AIO on Linux. It is different from POSIX AIO, which requires user-level threads to service AIO requests. There is a new boolean switch, innodb_use_native_aio, to choose between simulated and native AIO; native AIO is the default.

Read the rest of this entry »

Apr 13
One of the well-known and much-written-about complaints regarding InnoDB recovery is that it does not scale well on high-end systems. Well, not anymore. In InnoDB plugin 1.0.7 (which is GA) and plugin 1.1 (which is part of MySQL 5.5.4) this issue has been addressed. Two major improvements, apart from some other minor tweaks, have been made to the recovery code. In this post I’ll explain these issues and our solutions to them.

The first issue, reported here, is that the available-memory check eats up too much CPU. During recovery, the first phase, called the redo scan phase, is where we read the redo logs from disk and store them in a hash table. In the second phase, the redo application phase, these redo log entries are applied to the data pages. The hash table that stores the redo log entries grows in the buffer pool, i.e., memory for the entries is allocated in 16K blocks from the buffer pool. We have to ensure that the hash table does not end up allocating all the memory in the buffer pool, leaving us with no room to read in pages during the redo log application phase. For this we have to keep checking the size of the heap that we are using to allocate memory for the hash table entries. So why would that kill performance? Because we do not have the total size of the heap available to us. We calculate it by traversing the list of blocks allocated so far. Imagine that we have gigabytes of redo log to apply (it can be up to 4G). That would mean hundreds of thousands of blocks in the heap! And we have to make this check roughly whenever we read in a new redo page during the scan. This is an O(n * m) algorithm, where ‘n’ is the number of blocks in the heap and ‘m’ is the number of redo pages that have to be scanned.

What is the solution we came up with? Store the total size of a heap in its header. Simple and effective. Our algorithm now becomes O(m).
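To make the difference concrete, here is a small Python sketch that compares the two ways of answering "how big is the heap?". It is a toy model under stated assumptions, not the InnoDB implementation: the `Heap` class, the `scan` function, and the rule that every scanned redo page adds exactly one 16K block are all invented for the illustration.

```python
BLOCK_SIZE = 16 * 1024  # blocks are allocated from the buffer pool in 16K units


class Heap:
    def __init__(self):
        self.blocks = []         # list of allocated block sizes
        self.total_size = 0      # running total, as if stored in the heap header
        self.blocks_visited = 0  # counts the work done by traversals

    def add_block(self):
        self.blocks.append(BLOCK_SIZE)
        self.total_size += BLOCK_SIZE  # keep the cached total up to date

    def size_by_traversal(self):
        """Old approach: walk every block each time the size is needed."""
        total = 0
        for block in self.blocks:
            self.blocks_visited += 1
            total += block
        return total

    def size_cached(self):
        """New approach: read the total kept in the header, O(1)."""
        return self.total_size


def scan(redo_pages, use_cache):
    """Simulate the redo scan; assumes one new heap block per redo page."""
    heap = Heap()
    size = 0
    for _ in range(redo_pages):
        heap.add_block()
        size = heap.size_cached() if use_cache else heap.size_by_traversal()
    return size, heap.blocks_visited
```

With 1000 simulated redo pages, the traversal variant visits 1 + 2 + ... + 1000 = 500500 blocks in total, while the cached variant visits none and reports the same heap size.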

Read the rest of this entry »

Apr 13

Performance Schema Support in InnoDB

With the plugin 1.1 release, InnoDB will have full support for Performance Schema, a new feature of the MySQL 5.5 release. This allows a user to peek into some critical server synchronization events and obtain their usage statistics. On the other hand, to make sense of the instrumented results, you might need some understanding of InnoDB internals, especially in the area of synchronization with mutexes and rwlocks.

With this effort, the following four modules have been instrumented for Performance Schema.

1. Mutex
2. RWLOCKs
3. File I/O
4. Thread

Almost all mutexes (42), rwlocks (10) and 6 types of threads are instrumented. Most mutex/rwlock instrumentation is turned on by default; a few of them are enabled only under a special define. For File I/O, the statistics are categorized into Data, Log and Temp file I/O.

This blog post gives you a quick overview of this new machinery.

Read the rest of this entry »

Apr 13
Background

The original motivation behind this patch was the infamous Bug#26590: MySQL does not allow more than 1023 open transactions. Actually, the 1024 limit has to do with the number of concurrent update transactions that can run within InnoDB. Where does this magic number come from? 1024 is the total number of UNDO log list slots on one rollback segment header page. And in the past InnoDB created just one rollback segment header page during database creation. This rollback segment header page is anchored in the system header page. There is space there for 128 rollback segments, but only one was being created and used, resulting in the 1024 limit. Each slot in the rollback segment header array consists of {space_id, page_no}, where both space_id and page_no are of type uint32_t. Currently the space id is “unused” and always points to the system tablespace, which is tablespace 0. Now, on to the rollback segment header page. This page contains a rollback segment header (details of which are outside the scope of this blog entry :-) ), followed by an array of 1024 UNDO slots. Each slot is the base node of a file-based linked list of UNDO logs. Each node in this file-based list contains UNDO log records, containing the data updated by a transaction. A single UNDO log node can contain UNDO entries from several different transactions.

Performance ramifications

When a transaction is started it is allocated a rollback segment to write its modifications. Multiple transactions can write to the same rollback segment but only one transaction is allowed to write to any one UNDO slot during its lifetime. This should make clear where the 1024 limit comes from. Each rollback segment is protected by its own mutex and when we have a single rollback segment this rollback segment mutex can become a high contention mutex.

Requirements

Backward compatibility in file formats is something we take very seriously at InnoDB. InnoDB has always had the ability to use up to 128 pages, but before this fix it created only one rollback segment. We had to figure out a way to make the multiple rollback segments change backward compatible, without breaking any assumptions in the code of older versions of InnoDB about absolute locations of system pages and changes to system data. The 128 limit is a result of the latter. While there is space for 256 rollback segments, InnoDB uses only 7 bits from that field. Once we fix that, we could enable 256 rollback segments in the future. However, 128 seems to be sufficient for now. There are other scalability issues that need to be addressed before 128K concurrent transactions become an issue :-) .
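The numbers quoted above follow from simple arithmetic, which the short Python sketch below spells out. The constant and function names are made up for the illustration. The only inputs taken from the text are 1024 UNDO slots per rollback segment, one update transaction per slot, and a 7-bit versus 8-bit field for the rollback segment id.

```python
UNDO_SLOTS_PER_SEGMENT = 1024  # UNDO slots on one rollback segment header page


def max_rollback_segments(id_bits):
    """Number of rollback segments addressable with an id field of id_bits bits."""
    return 1 << id_bits


def max_concurrent_update_trx(segments):
    """Each update transaction occupies one UNDO slot for its lifetime."""
    return segments * UNDO_SLOTS_PER_SEGMENT


old_limit = max_concurrent_update_trx(1)  # a single rollback segment
new_limit = max_concurrent_update_trx(max_rollback_segments(7))  # 7 bits in use
possible = max_concurrent_update_trx(max_rollback_segments(8))  # if 8 bits were used

print(old_limit, new_limit, possible)
```

So one rollback segment gives the old limit of 1024, 128 segments give 131072 (the "128K" mentioned above), and using all 8 bits would double that again.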

Read the rest of this entry »

Apr 1

The InnoDB Plugin manual is now available on the MySQL web site.