Dec 23

Continues from Improving InnoDB memory usage.

Here are some numbers from the fixups described in the above article:

The workload consists of 10 partitioned tables, each one containing 1000 partitions. This means 10’000 InnoDB tables. We truncate the tables, then restart mysqld and run:

1. INSERT a single row into each of the 10 tables
2. SELECT * from each table
3. FLUSH TABLES (this causes the tables to be closed and reopened on the next run)
4. wait for 10 seconds

we repeat the above steps 10 times. Here is the total memory consumption by mysqld with 1GB InnoDB buffer pool during the workload:

Read the rest of this entry »

Dec 20

Last month we did a few improvements in InnoDB memory usage. We solved a challenging issue about how InnoDB uses memory in certain places of the code.

The symptom of the issue was that under a certain workloads the memory used by InnoDB kept growing infinitely, until OOM killer kicked in. It looked like a memory leak, but Valgrind wasn’t reporting any leaks and the issue was not reproducible on FreeBSD – it only happened on Linux (see Bug#57480). Especially the latest fact lead us to think that there is something in the InnoDB memory usage pattern that reveals a nasty side of the otherwise good-natured Linux’s memory manager.

It turned out to be an interesting memory fragmentation caused by a storm of malloc/free calls of various sizes. We had to track and analyze each call to malloc during the workload, including the code path that lead to it. We collected a huge set of analysis data – some code paths were executed many 10’000s of times! A hurricane of allocations and deallocations! We looked at the hottest ones hoping that some of them are not necessary, can be eliminated, avoided, minimized or stuck together. Luckily there were plenty of them!

After an extensive testing we did a numerous improvements, allocating the smallest chunks of the memory from the stack instead of from the heap, grouping allocations together where possible, removing unnecessary allocations altogether, estimating exactly how much memory will be consumed by a given operation and allocating it in advance and others and others and others.

This not only fixed Bug#57480 but improved InnoDB memory usage in general.

Read the rest of this entry »

Dec 20

The problem and its cause

There have been several complaints over the years about InnoDB’s inability to scale beyond 256 connections. One of the main issues behind this scalability bottleneck was the read view creation that is required for MVCC (Multi Version Concurrency Control) to work. When the user starts a transaction this is what InnoDB does under the hood:

  • Create or reuse a transaction instance – usually it is reused, the transactions are reused from a pool (trx_sys_t::mysql_trx_list).
  • Initialize the transaction start time and assign a rollback segment
  • Append the transaction to an active  transaction list ordered on trx_t::id in descending order

The append to  the trx_sys_t::trx_list and corresponding remove during commit is covered by trx_sys_t::mutex. After the transaction is “started” and if the transaction has an isolation greater than or equal to REPEATABLE-READ then before the first record/row is accessed by the transaction, InnoDB creates a view (snapshot) of the running system state. It does this by examining the transactions that are active at the time of the MVCC snapshot, so that their changes can be excluded from the creating transaction’s read view. This read view creation is also covered by the trx_sys_t::mutex. As the number of active transactions in the system increases this read view creation takes longer and longer. This increases the wait times on the trx_sys_t::mutex (during transaction start and commit) and once threads are forced to wait on a condition variable (in contrast to simply spinning while waiting for the mutex) the system throughput drops dramatically.

The solution

While investigating this problem there were two observations that I made: Read the rest of this entry »

Dec 20

InnoDB full-text search (FTS) is finally available in MySQL 5.6.4 release. The feature has been on trial through MySQL’s summer lab release, thus we had several blogs covering the feature. In this blog, we will leave the “how to” part of the feature to those blogs, and focus on some important characteristics of this new feature, so you will have a better understanding when trying on the feature.

The InnoDB Full-text Index as an Inverted Index

When comes to the basic design, InnoDB takes a traditional way to implementation the full-text index, which is a so called “Inverted Index”. It composes of a set of auxiliary “index tables” that stores the “Word” “Doc ID” pair, as well as each word’s position in the original text.

Incoming text strings in the inserting records are extracted out, tokenized and decomposed into individual words, and such words are then inserted into the auxiliary “index tables” along with their position info and the Doc ID associated with the record. With the position info, we will be able to support proximity search that is lacking in MyISAM FTS.

Create the Full-text Index in Parallel

Read the rest of this entry »

Dec 20

In the 5.6.4 release it is now possible to create an InnoDB database with 4k or 8k page sizes in addition to the original 16k page size. Previously, it could be done by recompiling the engine with a different value for UNIV_PAGE_SIZE_SHIFT and UNIV_PAGE_SIZE. With this release, you can set –innodb-page-size=n when starting mysqld, or put innodb_page_size=n in the configuration file in the [mysqld] section where n can be 4k, 8k, 16k, or 4096, 8192, 16384.

The support of smaller page sizes may be useful for certain storage media such as SSDs. Performance results can vary depending on your data schema, record size, and read/write ratio. But this provides you more options to optimize your performance.

When this new setting is used, the page size is set for all tablespaces used by that InnoDB instance. You can query the current value with;

SHOW VARIABLES LIKE ‘innodb_page_size’;
or
SELECT variable_value FROM information_schema.global_status  WHERE LOWER(variable_name) = ‘innodb_page_size’;

It is a read-only variable while the engine is running since it must be set before InnoDB starts up and creates a new system tablespace. That happens when InnoDB does not find ibdata1 in the data directory. If you start mysqld with a page size other than the standard 16k, the error log will contain something like this;

Read the rest of this entry »