Apr 13

In our 5.6 release, we introduced a new feature – “InnoDB Metrics Table” as part of our effort to extend InnoDB diagnostic and monitoring capability. The “Metrics Table” feature consolidates all InnoDB related Performance and Resource related counters (as of today, 176 counters) and makes them available through an information schema table. In addition, it gives the user the ability to control these counters for their monitoring need.

In following sections, we will go over the feature in detail and focus on its usage to get you started with “metrics counters”.

1) The user interface – InnoDB Metrics Table

Before this feature, InnoDB already had dozens of performance counters. They are exposed to the user through “MySQL System Variables”. These counters are permanent counters, and there are no external controls on them. However, they had become important probes that allow users to peek into the server running state and make appropriate decisions to tune the system to its best state.

In an effort to extend our system monitoring capability, we decided to consolidate the monitoring mechinery, and created a dedicated monitoring module. And the interface to the monitor counters will be a relational table, so users can perform SQL queries on the result. And thus we have the “information_schema.innodb_metrics” table as the interfaces of the “Metrics Table” feature.

Read the rest of this entry »

Apr 11

As Calvin mentioned in “NoSQL to InnoDB with Memcached“, we just released a “technology preview” of the feature that makes memcached a MySQL Daemon Plugin. And this “technology preview” release demonstrates how user can go around SQL Optimizer and Query Processing and directly interact with InnoDB Storage Engine through InnoDB APIs. Here, we would like to walk with you step by step to see how to get the memcached Daemon Plugin set up and get it running.

If you would just like to get a brief introduction on the setup steps, there is a “README-innodb_memcached” in the mysql-5.6.2-labs-innodb-memcached package. This is a more elaborated description on these steps.

1) Prerequisite:

Currently, the Memcached Daemon Plugin prototype is only supported on Linux platform. And as a prerequisite, you must have libevent installed, since it is required by memcached.

If you have the source code release, then there is a libevent 1.4.3 included in the package (plugin/innodb_memcached/libevent). You can go to the directory and do autoconf, ./configure, make and make install to make the libevent installed.

Read the rest of this entry »

Apr 11

MySQL is the most popular open source SQL database. The ever-increasing performance demands of web-based services have generated significant interest in providing NoSQL access methods to MySQL. Today, MySQL is announcing the preview of the NoSQL to InnoDB via memcached. This offering provides users with the best of both worlds – maintain all of the advantages of rich SQL query language, while providing better performance for simple queries via direct access to shared data.

In this preview release, memcached is implemented as a MySQL plugin daemon, accessing InnoDB directly via the native InnoDB API:

Features provided in the current release:

  • Memcached as a daemon plugin of mysqld: both mysqld and memcached are running in the same process space, with very low latency access to data
  • Direct access to InnoDB: bypassing SQL parser and optimizer
  • Support standard protocol (memcapable): support both memcached text-based protocol and binary protocol; all 55 memcapable tests are passed
  • Support multiple columns: users can map multiple columns into “value”. The value is separated by a pre-defined “separator” (configurable).
  • Optional local caching: three options – “cache-only”, “innodb-only”, and “caching” (both “cache” and “innodb store”). These local options can apply to each of four Memcached operations (set, get, delete and flush).
  • Batch operations: user can specify the batch commit size for InnoDB memcached operations via “daemon_memcached_r_batch_size” and “daemon_memcached_w_batch_size” (default 32)
  • Support all memcached configure options through MySQL configure variable “daemon_memcached_option”

Read the rest of this entry »

Apr 11

For those interested in InnoDB internals, this post tries to explain why the global kernel mutex was required and the new mutexes and rw-locks that now replace it. Along with the long term benefit from this change.

InnoDB’s core sub-systems up to v5.5 are protected by a global mutex called the Kernel mutex. This makes it difficult to do even some common sense optimisations. In the past we tried optimising the code but it would invariably upset the delicate balance that was achieved by tuning of the code that used the global Kernel mutex, leading to unexpected performance regression. The kernel mutex is also abused in several places to cover operations unrelated to the core e.g., some counters in the server thread main loop.

The InnoDB core sub-systems are:

  1. The Locking sub-system
  2. The Transaction sub-system
  3. MVCC  views

For any state change in the above sub-systems we had to acquire the kernel mutex and this would reduce concurrency and made the kernel mutex very highly contended. A transaction that is creating a lock would end up blocking read view creation (for MVCC) and transaction start or commit/rollback. With the the finer granularity mutexes and rw-locks, a transaction that is creating a lock will not block transaction start or commit/rollback. MVCC read view creation will however block transaction create and commit/rollback because of the shared trx_sys_t::trx_list. But MVCC read view creations will not block each other because they will acquire an S lock.

Read the rest of this entry »

Apr 11

Background

InnoDB gathers statistics for the data in user tables, which are used by the MySQL optimizer to choose the best query plan. For a long time the imprecision and instability of these statistics have been creating problems for users.

The problem is that these statistics are recalculated at any of the following events:

* When the table is opened

* When the table has changed a lot (1/16th of the table has been updated/deleted or inserted)

* When ANALYZE TABLE is run

Read the rest of this entry »

Apr 11

One of the most important things a database user or DBA wants to know is what columns, indexes etc. a table has. There are a few ways to find these things out, such as show tables. However, to really reveal all of the detailed metadata information in an InnoDB database, many have tried the “innodb_table_monitor” to peek into internal data dictionary. By creating an “innodb_table_monitor” table, InnoDB will print out the contents of metadata periodically. Unfortunately, it is printed out in an unstructured text for each table, and to find out what you need you would have to either carefully scan the output or have your own parser to do some additional analysis on the result if you want to display them systematically.

Well, in the MySQL 5.6 release, “innodb_table_monitor” can become history, and you will no longer need to search the text output to find out system metadata information. Instead, you can query the InnoDB internal system tables through Information Schema System Tables we implemented in this release.

To begin with, let’s take a look at six of the system tables in InnoDB:

SYS_TABLES
SYS_INDEXES
SYS_FIELDS
SYS_COLUMNS
SYS_FOREIGN_COLS
SYS_FOREIGN

The name of these tables explain what each system table contains, SYS_TABLES contains basic information about the table, each table’s index information is stored in SYS_INDEXES, and its column information is stored in SYS_COLUMNS. Each index’s field information is stored in SYS_FIELDS. And if a table contains foreign keys, that information is contained in SYS_FOREIGN_COLS and SYS_FOREIGN. You can get a good idea what a table or index look like after digging into all the detailed information stored in these system tables.

Read the rest of this entry »

Apr 11

In MySQL 5.6.2 we have introduced a new background thread named the page_cleaner in InnoDB. Adaptive flushing of modified buffer pool pages in the main background thread, async flushing by a foreground query thread if the log files are near to wrapping around, idle flushing and shutdown flushing are now moved to this thread, leaving only the last resort sync flushing in foreground query threads. We’ve also added counters for these activities.

As page_cleaner is all about the flushing of dirty pages to disk it’ll do you a world of good if you can go through this post where I have explained different types of flushing that happen inside InnoDB and the conditions that trigger flushing. The page_cleaner thread is only concerned with flush_list flushing (this may change in future releases). So let us dig a bit deeper into why flush_list flushing happens and why it would make sense to do this flushing in a separate thread. As is usually the case I have to skip some details to keep this note simple.

On a busy server flush_list flushing (which I’ll simply call flushing from this point onwards) happens under four conditions. In order to understand these conditions let us familiarize ourselves with the concept of checkpoint_age. The checkpoint_age is the difference between the current_lsn (the latest change to the database) and the last checkpoint_lsn (the lsn when last checkpoint happened). We obviously don’t want to let this difference grow beyond the log file size because if that happens then we end up overwriting redo log entries before the corresponding dirty pages are flushed to the disk, losing the ability to recover them. In order to avoid the above situation we maintain two high water marks to indicate if we are nearing the end of reusable redo log space. Lets call these water marks async_water_mark and sync_water_mark, where the later represents a more urgent situation than the former. Now that we have clarified the checkpoint_age concept let us get back to the four conditions under which flushing of dirty pages happens:

  1. checkpoint_age < async_water_mark
    • This condition means that we have enough reusable redo space. As such there is no hurry to flush dirty pages to the disk. This is the condition where we’d like our server to be most of the time.
    • Based on adaptive_flushing heuristics we flush some dirty pages in this state. This flushing happens in the background master thread.
    • During the flushing no other threads are blocked, so queries continue normally.

    Read the rest of this entry »

Apr 8

In MySQL 5.6 we’ve added a new feature that closes and unloads table instances from the InnoDB internal data dictionary, once a user configurable threshold is reached. This ends the situation where you could have hundreds of megabytes caching rarely used entries until the server was restarted and will be particularly appreciated by hosting and software as a service providers.

For this we’ve used an existing MySQL config variable table-definition-cache.  This cache limit is a soft limit. This means that if the user has more than table-definition-cache tables open then InnoDB will not force eviction of the table from the InnoDB data dictionary cache.

Users don’t really have to do anything to make use of this feature. It should pick up the existing setting (default or explicitly set) and just work. This feature should benefit users that have 1000s of tables and open a subset of tables only rarely. The LRU mechanism will eventually mark them for eviction and remove them from the data dictionary cache.

Other benefits

Some of the additional benefits of this change are internal to InnoDB and help in improving the design. We now have well defined interfaces for opening and closing tables. All internal sub-systems  need to use the same interface as the MySQL table handler. This has helped in loosening the tight coupling between various InnoDB sub-systems and the data dictionary. By loosening this tight coupling, it will help us in adding features to the InnoDB data dictionary more easily.

Read the rest of this entry »

Apr 7

Data Dictionary Language (DDL) operations have traditionally been slow in MySQL. Any change to the table definition would be implemented by creating a copy of the table and index layout with the requested changes, copying the table contents row by row, and finally renaming tables and dropping the original table.

The InnoDB Plugin for MySQL 5.1 implements a more efficient interface for creating and dropping indexes. Indexes can be created or dropped without rebuilding the entire table.

Speeding up Bulk INSERT, UPDATE and DELETE Operations

Normally, InnoDB would update all indexes of a table when rows are inserted or deleted. If you update an indexed column, InnoDB would have to delete the old value and insert the new value in the corresponding index. If you update a primary key column, the row would be deleted and inserted in every index of the table. This can be slow even despite the change buffering in MySQL 5.5.

Bulk data imports, bulk deletes and bulk updates could be executed faster if they did not have to update too many secondary indexes. Provided that there are no operations running that would benefit from the existence of the indexes while the bulk operation is in progress, it could be useful to drop the affected indexes before starting a bulk operation and to create the indexes after the operation.

Read the rest of this entry »

Apr 6

Purpose

What does purge exactly do and why is it needed? If you have ever wondered then read on. It is really a type of garbage collector. When a user issues a DML like “DELETE FROM t WHERE c = 1;”, InnoDB doesn’t remove the matching record. This is what happens under the hood:

  1. It marks the record as deleted by setting a bit in the control bits of the record.
  2. Stores the before image of the modified columns to the UNDO log
  3. Updates the system columns DB_TRX_ID and DB_ROLL_PTR in the clustered index record. DB_TRX_ID identifies the transaction that made the last change, and DB_ROLL_PTR points to the new UNDO log record. This UNDO log record contains the old values of DB_TRX_ID and DB_ROLL_PTR, possibly pointing to an older transaction and undo log entry.

From this you should be able to visualise that the UNDO log records related to the modified clustered record are in a disk based linked list, with the head anchored in the clustered index record. For the sake of simplicity I’ve ignored the UNDO log pages and the case where DML updates a record. This information is required by  rollback and MVCC (multi version concurrency control). For MVCC we need the entry to exist in the clustered index so that we can follow a pointer back to where the “before” changes were written in the UNDO log and use that information to construct a previous version of the record. For rollback we need the before  information of the record  (UNDO entry) so that we can restore it when a transaction is rolled back.

Another benefit of purging separately in the background is that expensive B+Tree block merge operations, if the removal of the record were to lead to underflow, can be done asynchronously by purge and not by user transactions.

Garbage collection

Read the rest of this entry »