Dec 20

Last month we did a few improvements in InnoDB memory usage. We solved a challenging issue about how InnoDB uses memory in certain places of the code.

The symptom of the issue was that under a certain workloads the memory used by InnoDB kept growing infinitely, until OOM killer kicked in. It looked like a memory leak, but Valgrind wasn’t reporting any leaks and the issue was not reproducible on FreeBSD – it only happened on Linux (see Bug#57480). Especially the latest fact lead us to think that there is something in the InnoDB memory usage pattern that reveals a nasty side of the otherwise good-natured Linux’s memory manager.

It turned out to be an interesting memory fragmentation caused by a storm of malloc/free calls of various sizes. We had to track and analyze each call to malloc during the workload, including the code path that lead to it. We collected a huge set of analysis data – some code paths were executed many 10’000s of times! A hurricane of allocations and deallocations! We looked at the hottest ones hoping that some of them are not necessary, can be eliminated, avoided, minimized or stuck together. Luckily there were plenty of them!

After an extensive testing we did a numerous improvements, allocating the smallest chunks of the memory from the stack instead of from the heap, grouping allocations together where possible, removing unnecessary allocations altogether, estimating exactly how much memory will be consumed by a given operation and allocating it in advance and others and others and others.

This not only fixed Bug#57480 but improved InnoDB memory usage in general.

Read the rest of this entry »

Dec 20

In the 5.6.4 release it is now possible to create an InnoDB database with 4k or 8k page sizes in addition to the original 16k page size. Previously, it could be done by recompiling the engine with a different value for UNIV_PAGE_SIZE_SHIFT and UNIV_PAGE_SIZE. With this release, you can set –innodb-page-size=n when starting mysqld, or put innodb_page_size=n in the configuration file in the [mysqld] section where n can be 4k, 8k, 16k, or 4096, 8192, 16384.

The support of smaller page sizes may be useful for certain storage media such as SSDs. Performance results can vary depending on your data schema, record size, and read/write ratio. But this provides you more options to optimize your performance.

When this new setting is used, the page size is set for all tablespaces used by that InnoDB instance. You can query the current value with;

SHOW VARIABLES LIKE ‘innodb_page_size’;
or
SELECT variable_value FROM information_schema.global_status  WHERE LOWER(variable_name) = ‘innodb_page_size’;

It is a read-only variable while the engine is running since it must be set before InnoDB starts up and creates a new system tablespace. That happens when InnoDB does not find ibdata1 in the data directory. If you start mysqld with a page size other than the standard 16k, the error log will contain something like this;

Read the rest of this entry »

Jul 27

In April of 2011, InnoDB team published the early access of NoSQL to InnoDB with memcached, plus several new features as part of MySQL 5.6.2 milestone release. This week, we announced additional early access to new InnoDB features for the community to test, and provide feedback.

There are two release packages from InnoDB team on MySQL Labs: InnoDB full-text search, and InnoDB new features.

InnoDB Full-Text Search

MySQL 5.5 makes InnoDB the default storage engine, so everyone can benefit from ACID-compliant transactions, referential integrity, crash recovery.  However, some users need InnoDB to have built-in full-text search, similar to MyISAM’s full-text search.

InnoDB full-text search provides users with the ability to build full text indices and search for specific text-based content stored in InnoDB tables.  This new functionality supports fast and accurate search on document content using natural language, boolean, wildcard, and proximity search.

Read the rest of this entry »

Apr 11

MySQL is the most popular open source SQL database. The ever-increasing performance demands of web-based services have generated significant interest in providing NoSQL access methods to MySQL. Today, MySQL is announcing the preview of the NoSQL to InnoDB via memcached. This offering provides users with the best of both worlds – maintain all of the advantages of rich SQL query language, while providing better performance for simple queries via direct access to shared data.

In this preview release, memcached is implemented as a MySQL plugin daemon, accessing InnoDB directly via the native InnoDB API:

Features provided in the current release:

  • Memcached as a daemon plugin of mysqld: both mysqld and memcached are running in the same process space, with very low latency access to data
  • Direct access to InnoDB: bypassing SQL parser and optimizer
  • Support standard protocol (memcapable): support both memcached text-based protocol and binary protocol; all 55 memcapable tests are passed
  • Support multiple columns: users can map multiple columns into “value”. The value is separated by a pre-defined “separator” (configurable).
  • Optional local caching: three options – “cache-only”, “innodb-only”, and “caching” (both “cache” and “innodb store”). These local options can apply to each of four Memcached operations (set, get, delete and flush).
  • Batch operations: user can specify the batch commit size for InnoDB memcached operations via “daemon_memcached_r_batch_size” and “daemon_memcached_w_batch_size” (default 32)
  • Support all memcached configure options through MySQL configure variable “daemon_memcached_option”

Read the rest of this entry »

Apr 11

In MySQL 5.6.2 we have introduced a new background thread named the page_cleaner in InnoDB. Adaptive flushing of modified buffer pool pages in the main background thread, async flushing by a foreground query thread if the log files are near to wrapping around, idle flushing and shutdown flushing are now moved to this thread, leaving only the last resort sync flushing in foreground query threads. We’ve also added counters for these activities.

As page_cleaner is all about the flushing of dirty pages to disk it’ll do you a world of good if you can go through this post where I have explained different types of flushing that happen inside InnoDB and the conditions that trigger flushing. The page_cleaner thread is only concerned with flush_list flushing (this may change in future releases). So let us dig a bit deeper into why flush_list flushing happens and why it would make sense to do this flushing in a separate thread. As is usually the case I have to skip some details to keep this note simple.

On a busy server flush_list flushing (which I’ll simply call flushing from this point onwards) happens under four conditions. In order to understand these conditions let us familiarize ourselves with the concept of checkpoint_age. The checkpoint_age is the difference between the current_lsn (the latest change to the database) and the last checkpoint_lsn (the lsn when last checkpoint happened). We obviously don’t want to let this difference grow beyond the log file size because if that happens then we end up overwriting redo log entries before the corresponding dirty pages are flushed to the disk, losing the ability to recover them. In order to avoid the above situation we maintain two high water marks to indicate if we are nearing the end of reusable redo log space. Lets call these water marks async_water_mark and sync_water_mark, where the later represents a more urgent situation than the former. Now that we have clarified the checkpoint_age concept let us get back to the four conditions under which flushing of dirty pages happens:

  1. checkpoint_age < async_water_mark
    • This condition means that we have enough reusable redo space. As such there is no hurry to flush dirty pages to the disk. This is the condition where we’d like our server to be most of the time.
    • Based on adaptive_flushing heuristics we flush some dirty pages in this state. This flushing happens in the background master thread.
    • During the flushing no other threads are blocked, so queries continue normally.

    Read the rest of this entry »

Apr 6

Purpose

What does purge exactly do and why is it needed? If you have ever wondered then read on. It is really a type of garbage collector. When a user issues a DML like “DELETE FROM t WHERE c = 1;”, InnoDB doesn’t remove the matching record. This is what happens under the hood:

  1. It marks the record as deleted by setting a bit in the control bits of the record.
  2. Stores the before image of the modified columns to the UNDO log
  3. Updates the system columns DB_TRX_ID and DB_ROLL_PTR in the clustered index record. DB_TRX_ID identifies the transaction that made the last change, and DB_ROLL_PTR points to the new UNDO log record. This UNDO log record contains the old values of DB_TRX_ID and DB_ROLL_PTR, possibly pointing to an older transaction and undo log entry.

From this you should be able to visualise that the UNDO log records related to the modified clustered record are in a disk based linked list, with the head anchored in the clustered index record. For the sake of simplicity I’ve ignored the UNDO log pages and the case where DML updates a record. This information is required by  rollback and MVCC (multi version concurrency control). For MVCC we need the entry to exist in the clustered index so that we can follow a pointer back to where the “before” changes were written in the UNDO log and use that information to construct a previous version of the record. For rollback we need the before  information of the record  (UNDO entry) so that we can restore it when a transaction is rolled back.

Another benefit of purging separately in the background is that expensive B+Tree block merge operations, if the removal of the record were to lead to underflow, can be done asynchronously by purge and not by user transactions.

Garbage collection

Read the rest of this entry »

Apr 16

What a busy week! Lots of MySQL 5.5 announcements that just happened to coincide with the MySQL Conference and Expo in Silicon Valley. Here are some highlights of the performance and scalability work that the InnoDB team was involved with.

A good prep for the week of news is the article Introduction to MySQL 5.5, which includes information about the major performance and scalability features. That article will lead you into the MySQL 5.5 manual for general features and the InnoDB 1.1 manual for performance & scalability info.

Then there were the conference presentations from InnoDB team members, which continued the twin themes of performance and scalability:

Read the rest of this entry »

May 13

Well, it took us a little while (we’ve been busy ;-) !), but we’ve now posted our presentations on InnoDB from the MySQL Conference and Expo 2009. You can download these presentations by Heikki Tuuri, Ken Jacobs and Calvin Sun from the InnoDB website, as follows:

The description of these and other presentations about InnoDB are available here.