Dec 20

InnoDB full-text search (FTS) is finally available in MySQL 5.6.4 release. The feature has been on trial through MySQL’s summer lab release, thus we had several blogs covering the feature. In this blog, we will leave the “how to” part of the feature to those blogs, and focus on some important characteristics of this new feature, so you will have a better understanding when trying on the feature.

The InnoDB Full-text Index as an Inverted Index

When comes to the basic design, InnoDB takes a traditional way to implementation the full-text index, which is a so called “Inverted Index”. It composes of a set of auxiliary “index tables” that stores the “Word” “Doc ID” pair, as well as each word’s position in the original text.

Incoming text strings in the inserting records are extracted out, tokenized and decomposed into individual words, and such words are then inserted into the auxiliary “index tables” along with their position info and the Doc ID associated with the record. With the position info, we will be able to support proximity search that is lacking in MyISAM FTS.

Create the Full-text Index in Parallel

Read the rest of this entry »

Oct 4

In our earlier Labs release, we presented a MySQL NoSQL solution through InnoDB Memcached Daemon Plugin (see earlier Calvin’s and my posts). That earlier release allows the Memcached Plugin directly interacting with InnoDB, completely bypassing MySQL optimizer and QP layers. It differs with another popular MySQL NoSQL solution, HandlerSocket, by skipping even the Handler API layer and directly access the InnoDB through InnoDB APIs. Thus, in theory, it would be simpler and more efficient.

However, there is one major functionality we did not implement in the first release: the replication (binlog) capability. And this important feature is being done in the latest labs release (mysql-5.6.4-labs-innodb-memcached), in which all memcached commands are now supported for binlogging.

In the following sections, we will quick walk you through the prototype and show how to use the binlog capability.

Set up the InnoDB Memcached:

Our earlier post (Get started with InnoDB Memcached Daemon plugin) details the step to set up the InnoDB Memcached. We will go very briefly on some important steps here: Read the rest of this entry »

Jul 25

This Summer’s lab release includes an important feature of InnoDB – InnoDB Fulltext Search. This feature would greatly enhance InnoDB’s capability in Text search and retrieving. Since the feature is designed for our transactional storage engine, its underlying architecture design and implementation are completely different with those of MyISAM. So it is worth to give a brief technology review of this feature, familiarize users with some important concepts in the InnoDB FTS so that they can better utilize this feature.

There are a few other posts on the subject. John Russell will give a brief tutorial on the InnoDB fulltext search command and syntax. I will also discuss some comparison with MyISAM fulltext search in another post. And Vinay in our server testing will give some performance number from his experiments.

The Design:

To begin with, I will go over briefly on some key design concepts, which would help you better understand the feature.

    Read the rest of this entry »

    Jul 25

    With this lab release, InnoDB now has its own Fulltext Search (FTS) support. People might wonder how different is this FTS comparing with that of MyISAM. From the usage standpoint, they seem very similar with no apparent difference. However, in the underlying FTS engine, the difference is huge, since two are built on different storage engines with totally different design. So in some aspect such as DML, it might be difficult to make it an apple to apple comparison. However, we do try to list and explain the difference, so to make it easier for user to understand our design and make the transition to use InnoDB FTS easier.

    The Design:

    The most significant difference is the design. Like most transaction storage engine, InnoDB stores the inverted index in auxiliary relational tables. All queries and DMLs eventually route to these tables. On the other hand, MyISAM stores the inverted list in a two-level B-Tree. The first level contains records of word and its count, and the second level contains records of this word’s weight and the rowid. So it is more of a special B-tree “index”.

    However, the MyISAM design would not be suitable for highly concurrent transaction storage engine like InnoDB. And its special structure limits its extensibility, as it still can’t include the position information in its indexing to support proximity search.

    For InnoDB, by storing the inverted index in auxiliary tables, it renders all the benefit of relational table manipulations to the FullText index. It already applied the partition concept so that multiple tables are used to handle inverted index. This not allows the create index to be parallelized, it also allow its query can be parallelized in the future. So InnoDB design apparently wins in terms of future extensibility and scalability.

    Read the rest of this entry »

    Apr 13

    In our 5.6 release, we introduced a new feature – “InnoDB Metrics Table” as part of our effort to extend InnoDB diagnostic and monitoring capability. The “Metrics Table” feature consolidates all InnoDB related Performance and Resource related counters (as of today, 176 counters) and makes them available through an information schema table. In addition, it gives the user the ability to control these counters for their monitoring need.

    In following sections, we will go over the feature in detail and focus on its usage to get you started with “metrics counters”.

    1) The user interface – InnoDB Metrics Table

    Before this feature, InnoDB already had dozens of performance counters. They are exposed to the user through “MySQL System Variables”. These counters are permanent counters, and there are no external controls on them. However, they had become important probes that allow users to peek into the server running state and make appropriate decisions to tune the system to its best state.

    In an effort to extend our system monitoring capability, we decided to consolidate the monitoring mechinery, and created a dedicated monitoring module. And the interface to the monitor counters will be a relational table, so users can perform SQL queries on the result. And thus we have the “information_schema.innodb_metrics” table as the interfaces of the “Metrics Table” feature.

    Read the rest of this entry »

    Apr 11

    As Calvin mentioned in “NoSQL to InnoDB with Memcached“, we just released a “technology preview” of the feature that makes memcached a MySQL Daemon Plugin. And this “technology preview” release demonstrates how user can go around SQL Optimizer and Query Processing and directly interact with InnoDB Storage Engine through InnoDB APIs. Here, we would like to walk with you step by step to see how to get the memcached Daemon Plugin set up and get it running.

    If you would just like to get a brief introduction on the setup steps, there is a “README-innodb_memcached” in the mysql-5.6.2-labs-innodb-memcached package. This is a more elaborated description on these steps.

    1) Prerequisite:

    Currently, the Memcached Daemon Plugin prototype is only supported on Linux platform. And as a prerequisite, you must have libevent installed, since it is required by memcached.

    If you have the source code release, then there is a libevent 1.4.3 included in the package (plugin/innodb_memcached/libevent). You can go to the directory and do autoconf, ./configure, make and make install to make the libevent installed.

    Read the rest of this entry »

    Apr 11

    One of the most important things a database user or DBA wants to know is what columns, indexes etc. a table has. There are a few ways to find these things out, such as show tables. However, to really reveal all of the detailed metadata information in an InnoDB database, many have tried the “innodb_table_monitor” to peek into internal data dictionary. By creating an “innodb_table_monitor” table, InnoDB will print out the contents of metadata periodically. Unfortunately, it is printed out in an unstructured text for each table, and to find out what you need you would have to either carefully scan the output or have your own parser to do some additional analysis on the result if you want to display them systematically.

    Well, in the MySQL 5.6 release, “innodb_table_monitor” can become history, and you will no longer need to search the text output to find out system metadata information. Instead, you can query the InnoDB internal system tables through Information Schema System Tables we implemented in this release.

    To begin with, let’s take a look at six of the system tables in InnoDB:

    SYS_TABLES
    SYS_INDEXES
    SYS_FIELDS
    SYS_COLUMNS
    SYS_FOREIGN_COLS
    SYS_FOREIGN

    The name of these tables explain what each system table contains, SYS_TABLES contains basic information about the table, each table’s index information is stored in SYS_INDEXES, and its column information is stored in SYS_COLUMNS. Each index’s field information is stored in SYS_FIELDS. And if a table contains foreign keys, that information is contained in SYS_FOREIGN_COLS and SYS_FOREIGN. You can get a good idea what a table or index look like after digging into all the detailed information stored in these system tables.

    Read the rest of this entry »

    Apr 13

    Performance Schema Support in InnoDB

    With the plugin 1.1 release, InnoDB will have full support of Performance Schema, a new feature of MySQL 5.5 release. This allows a user to peak into some critical server synchronization events and obtain their usage statistics. On the other hand, in order to make a lot of sense of the instrumented result, you might need some understanding of InnoDB internals, especially in the area of synchronization with mutexes and rwlocks.

    With this effort, the following four modules have been performance schema instrumented.

    1. Mutex
    2. RWLOCKs
    3. File I/O
    4. Thread

    Almost all mutexes (42), rwlocks (10) and 6 types of threads are instrumented. Most mutex/rwlock instrumentations are turned on by default, a few of them are under special define. For File I/O, their statistics are categorized into Data, Log and Temp file I/O.

    This blog is to give you a quick overview on this new machinery.

    Read the rest of this entry »