<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Transactions on InnoDB</title>
	<atom:link href="http://blogs.innodb.com/wp/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.innodb.com/wp</link>
	<description>&#34;The word&#34; about InnoDB Products and Technology</description>
	<lastBuildDate>Mon, 17 May 2010 16:28:23 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>InnoDB recovery gets even faster in Plugin 1.1, thanks to native AIO</title>
		<link>http://blogs.innodb.com/wp/2010/05/innodb-recovery-gets-even-faster-in-plugin-1-1-thanks-to-native-aio/</link>
		<comments>http://blogs.innodb.com/wp/2010/05/innodb-recovery-gets-even-faster-in-plugin-1-1-thanks-to-native-aio/#comments</comments>
		<pubDate>Mon, 17 May 2010 16:28:23 +0000</pubDate>
		<dc:creator>Michael Izioumtchenko</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.innodb.com/wp/?p=596</guid>
		<description><![CDATA[InnoDB Plugin 1.1 doesn&#8217;t add any recovery-specific improvements on top of what we already have in Plugin 1.0.7. The details on the latter are available in this blog. Yet, when I recovered another big dataset I had created, I got the following results for total recovery time:

Plugin 1.0.7: 46min 21s
Plugin 1.1: 32min 41s

Plugin [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">InnoDB Plugin 1.1 doesn&#8217;t add any recovery-specific improvements on top of what we already have in Plugin 1.0.7; the details on the latter are available in this <a href="http://blogs.innodb.com/wp/2010/04/innodb-performance-recovery">blog</a>. Yet, when I recovered another big dataset I had created, I got the following results for total recovery time:</p>
<ul>
<li>Plugin 1.0.7: 46min 21s</li>
<li>Plugin 1.1: 32min 41s</li>
</ul>
<p><strong>Plugin 1.1 recovery is about 1.4 times faster</strong>. Why would that happen? The numerous concurrency improvements in Plugin 1.1 and MySQL 5.5 can&#8217;t really affect recovery. The credit goes to native asynchronous IO on Linux. Let&#8217;s try without it:</p>
<ul>
<li>Plugin 1.1 with --innodb-use-native-aio=0: 49min 07s</li>
</ul>
<p>which is about the same as the 1.0.7 time. Across my many other recovery runs, random fluctuations accounted for 2-3 minutes in a 30-45 minute test.</p>
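<p>As a quick check on these ratios, the timings above can be converted to seconds and divided. This is a small C sketch of that arithmetic. The helper names are our own, and nothing here is InnoDB code.</p>

```c
#include <stdio.h>

/* Convert an "Xmin Ys" duration into seconds. */
int to_seconds(int minutes, int seconds) { return minutes * 60 + seconds; }

/* How many times faster `fast` is than `slow` (both in seconds). */
double speedup(int slow, int fast) { return (double)slow / (double)fast; }

/* Print the ratios for the three recovery runs reported above. */
void print_recovery_speedups(void) {
    int v107       = to_seconds(46, 21); /* Plugin 1.0.7                          */
    int v11_native = to_seconds(32, 41); /* Plugin 1.1, native AIO (default)      */
    int v11_sim    = to_seconds(49, 7);  /* Plugin 1.1, --innodb-use-native-aio=0 */

    printf("1.1 vs 1.0.7:            %.2fx\n", speedup(v107, v11_native));
    printf("1.1 native vs simulated: %.2fx\n", speedup(v11_sim, v11_native));
}
```

<p>Calling print_recovery_speedups() prints 1.42x for Plugin 1.1 against 1.0.7 and 1.50x for native against simulated AIO within Plugin 1.1.</p>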
<p><span id="more-596"></span></p>
<p>Why is native AIO good for you? Why is it better than the simulated AIO we already have? Here&#8217;s what Inaam Rana, our IO expert and the author of the AIO patch, says:</p>
<ul>
<li>During recovery, redo log application is typically performed by the IO helper threads in the IO completion routine.</li>
<li>With simulated AIO, the helper thread waits for each IO to complete and then calls the completion routine.</li>
<li>With native AIO, the thread doesn&#8217;t have to wait for a particular IO to complete; instead, it picks up any completed request and applies redo to it.</li>
</ul>
<p>Read more about native AIO <a href="http://blogs.innodb.com/wp/2010/04/innodb-performance-aio-linux/">here</a>.</p>
<p>You don&#8217;t have to do anything to take advantage of this feature. It is enabled by default and is used where available as determined by <em>configure</em>.</p>
<p>Here are some details about the test environment:</p>
<p>Hardware: HP DL480, 32G RAM, 2&#215;4-core Intel(R) Xeon(R) CPU E5450 @ 3.00GHz, RAID5, about 1T total storage</p>
<p>Dataset: 1757549 dirty pages, 2808364565 bytes of redo. For the curious, it was a sysbench table with 400 million rows, and the workload was random row updates driven by a simple Perl script. Note that this is over 28G worth of dirty pages, which means I had to use very abusive settings of innodb_buffer_pool_size=29G and innodb_max_dirty_pages_pct=99, given only 32G of RAM. The recovery was done using the same settings, and in the first few attempts it would fail because of what was eventually diagnosed as <a href="http://bugs.mysql.com/bug.php?id=53122">bug 53122</a>: InnoDB recovery uses some memory outside of the buffer pool, and it wanted more of that memory than was really necessary.</p>
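<p>For reference, here is how the dirty-page count converts to bytes, and why "28G" depends on the unit. This sketch assumes the default 16KB InnoDB page size. The constant and function names are our own.</p>

```c
#include <stdint.h>
#include <stdio.h>

#define INNODB_PAGE_BYTES 16384ULL /* assumed: default 16KB page size */

/* Bytes occupied by `pages` dirty pages of the default page size. */
uint64_t dirty_page_bytes(uint64_t pages) { return pages * INNODB_PAGE_BYTES; }

/* Print the volume in both decimal gigabytes and binary gibibytes. */
void print_dirty_volume(uint64_t pages) {
    uint64_t bytes = dirty_page_bytes(pages);
    printf("%llu bytes = %.1f GB (decimal) = %.1f GiB (binary)\n",
           (unsigned long long)bytes, bytes / 1e9, bytes / 1073741824.0);
}
```

<p>print_dirty_volume(1757549) prints 28795682816 bytes = 28.8 GB (decimal) = 26.8 GiB (binary). In either unit, the dirty pages alone occupy most of the 32G of RAM.</p>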
<p>InnoDB configuration parameters:</p>
<p>--innodb-buffer-pool-size=28g<br />
--innodb-log-file-size=2047m<br />
--innodb-adaptive-flushing=0<br />
--innodb-io-capacity=100<br />
--innodb-additional-mem-pool-size=16m<br />
--innodb-log-buffer-size=16m<br />
--innodb-adaptive-hash-index=0<br />
--innodb-flush-log-at-trx-commit=2<br />
--innodb-max-dirty-pages-pct=99</p>
<p>This is a highly artificial setup that aims to maximize the generation of dirty pages and redo, and to use as much memory as possible for those dirty pages.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.innodb.com/wp/2010/05/innodb-recovery-gets-even-faster-in-plugin-1-1-thanks-to-native-aio/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Post-Conference Roundup of InnoDB-related Info</title>
		<link>http://blogs.innodb.com/wp/2010/04/post-conference-roundup-of-innodb-related-info/</link>
		<comments>http://blogs.innodb.com/wp/2010/04/post-conference-roundup-of-innodb-related-info/#comments</comments>
		<pubDate>Fri, 16 Apr 2010 05:02:49 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[John Russell]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Scalability]]></category>

		<guid isPermaLink="false">http://blogs.innodb.com/wp/?p=590</guid>
		<description><![CDATA[What a busy week! Lots of MySQL 5.5 announcements that just happened to coincide with the MySQL Conference and Expo in Silicon Valley. Here are some highlights of the performance and scalability work that the InnoDB team was involved with.
A good prep for the week of news is the article Introduction to MySQL 5.5, which includes [...]]]></description>
			<content:encoded><![CDATA[<p>What a busy week! Lots of MySQL 5.5 announcements that just happened to coincide with the MySQL Conference and Expo in Silicon Valley. Here are some highlights of the performance and scalability work that the InnoDB team was involved with.</p>
<p>A good prep for the week of news is the article <a href="http://dev.mysql.com/tech-resources/articles/introduction-to-mysql-55.html">Introduction to MySQL 5.5</a>, which includes information about the major performance and scalability features. That article will lead you into the <a href="http://dev.mysql.com/doc/refman/5.5/en/">MySQL 5.5 manual</a> for general features and the <a href="http://dev.mysql.com/doc/innodb-plugin/1.1/en/">InnoDB 1.1 manual</a> for performance &amp; scalability info.</p>
<p>Then there were the conference presentations from InnoDB team members, which continued the twin themes of performance and scalability:</p>
<ul>
<li>InnoDB: Status, Architecture, and New Features: <a href="http://www.innodb.com/wp/wp-content/uploads/2010/04/Whats_New_in_InnoDB_2010.pdf">Slides</a>, <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13502">Rate / leave feedback</a></li>
<li>InnoDB Plugin: Performance Features and Benchmarks: <a href="http://www.innodb.com/wp/wp-content/uploads/2010/04/InnoDB_Performance_benchmarks_2010.pdf">Slides</a>, <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13503">Rate / leave feedback</a></li>
<li>What&#8217;s New in MySQL 5.5? Performance/Scale Unleashed!: <a href="http://www.innodb.com/wp/wp-content/uploads/2010/04/Performance_Change_Analysis_2010.pdf">Slides</a>, <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13363">Rate / leave feedback</a></li>
<li>What&#8217;s New in MySQL 5.5?: Performance and Scalability Benchmarks: <a href="http://www.innodb.com/wp/wp-content/uploads/2010/04/Benchmark_Analysis_Final_2010.pdf">Slides</a>, <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/14298">Rate / leave feedback</a></li>
<li>Introduction to InnoDB Monitoring System and Resource &amp; Performance Tuning: <a href="http://www.innodb.com/wp/wp-content/uploads/2010/04/InnoDB_Monitoring_System_2010.pdf">Slides</a>, <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13508">Rate / leave feedback</a></li>
<li>Backup Strategies with InnoDB Hot Backup: <a href="http://www.innodb.com/wp/wp-content/uploads/2010/04/Backup_Strategies_with_MySQL_Enterprise_Backup_2010.pdf">Slides</a>, <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13505">Rate / leave feedback</a></li>
</ul>
<p><span id="more-590"></span></p>
<p>We hope that a good and useful time was had by all. Best regards to our European friends and colleagues whose return plans were disrupted by the Icelandic volcano!</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.innodb.com/wp/2010/04/post-conference-roundup-of-innodb-related-info/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>InnoDB now supports native AIO on Linux</title>
		<link>http://blogs.innodb.com/wp/2010/04/innodb-performance-aio-linux/</link>
		<comments>http://blogs.innodb.com/wp/2010/04/innodb-performance-aio-linux/#comments</comments>
		<pubDate>Wed, 14 Apr 2010 16:55:28 +0000</pubDate>
		<dc:creator>Inaam Rana</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.innodb.com/wp/?p=498</guid>
		<description><![CDATA[With the exception of Windows, InnoDB has used &#8216;simulated AIO&#8217; on all other platforms to perform certain IO operations. The IO requests performed in a &#8216;simulated AIO&#8217; way are the write requests and the readahead requests for datafile pages. Let us first look at what &#8216;simulated AIO&#8217; means in this [...]]]></description>
			<content:encoded><![CDATA[<div id="_mcePaste">With the exception of Windows, InnoDB has used &#8216;simulated AIO&#8217; on all other platforms to perform certain IO operations. The IO requests performed in a &#8216;simulated AIO&#8217; way are the write requests and the readahead requests for datafile pages. Let us first look at what &#8216;simulated AIO&#8217; means in this context.</p>
</div>
<div id="_mcePaste">We call it &#8217;simulated AIO&#8217; because it appears asynchronous from the context of a query thread but from the OS perspective the IO calls are still synchronous. The query thread simply queues the request in an array and then returns to the normal working. One of the IO helper thread, which is a background thread, then takes the request from the queue and issues a synchronous IO call (pread/pwrite) meaning it blocks on the IO call. Once it returns from the pread/pwrite call, this helper thread then calls the IO completion routine on the block in question which includes doing a merge of buffered operations, if any, in case of a read. In case of a write, the block is marked as &#8216;clean&#8217; and is removed from the flush_list. Some other book keeping stuff also happens in IO completion routine.</p>
</div>
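<p>The simulated scheme can be sketched in a few lines of C. This is a simplified, single-threaded model based on our own assumptions, not InnoDB's actual code. The struct and function names are hypothetical, the helper thread's loop is written as a plain function, and the 256-slot array size matches the per-helper-thread limit mentioned in this post.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open() and O_* flags, for callers */
#include <sys/types.h>
#include <unistd.h>

#define SLOTS_PER_HELPER 256 /* requests one helper thread can hold */

struct io_request {
    int fd;
    off_t offset;
    size_t len;
    char *buf;
    ssize_t result;
};

struct aio_array {
    struct io_request slots[SLOTS_PER_HELPER];
    int n;
};

static int completed;

/* Completion routine: InnoDB would merge buffered changes (reads) or mark
 * the page clean and remove it from the flush_list (writes) at this point. */
static void io_complete(const struct io_request *r) {
    if (r->result > 0) completed++;
}

/* Query thread side: queue the request and return immediately. */
int sim_aio_enqueue(struct aio_array *a, int fd, off_t off, size_t len, char *buf) {
    if (a->n == SLOTS_PER_HELPER) return -1; /* array full: caller must wait */
    struct io_request r = { fd, off, len, buf, 0 };
    a->slots[a->n++] = r;
    return 0;
}

/* IO helper thread side: one blocking pread() per request, strictly one at
 * a time, so the last request waits for all earlier ones to finish. */
int sim_aio_helper_drain(struct aio_array *a) {
    completed = 0;
    for (int i = 0; i < a->n; i++) {
        struct io_request *r = &a->slots[i];
        r->result = pread(r->fd, r->buf, r->len, r->offset); /* blocks here */
        io_complete(r);
    }
    a->n = 0;
    return completed;
}
```

<p>The important detail is the loop in sim_aio_helper_drain(): the kernel only ever sees one request from this helper at a time.</p>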
<div id="_mcePaste">What we have changed in InnoDB Plugin 1.1 is to use the native AIO interface on Linux. Note that this feature requires libaio to be installed on your system. libaio is a thin wrapper around the kernel&#8217;s native AIO interface on Linux. It is different from POSIX AIO, which requires user-level threads to service AIO requests. A new boolean switch, innodb_use_native_aio, chooses between simulated and native AIO; the default is to use native AIO.</p>
<p><span id="more-498"></span></p>
</div>
<div id="_mcePaste">How does this change the design of the InnoDB IO subsystem? Now, instead of enqueueing the IO request, the query thread dispatches the request to the kernel and returns to its normal work. The IO helper thread, instead of picking up enqueued requests, waits on the IO wait events for any completed IO requests. As soon as the kernel notifies it that a request has completed, it calls the IO completion routine on that request and then goes back to waiting on the IO wait events. In this new design, the IO requesting thread becomes a kind of dispatcher, while the background IO thread takes on the role of a collector.</p>
</div>
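<p>The dispatcher/collector split can be illustrated with a minimal Linux-only sketch. InnoDB goes through libaio. To keep this example self-contained, the sketch assumes Linux kernel headers are available and calls the underlying io_setup/io_submit/io_getevents/io_destroy system calls directly via syscall(). The function name and the single-function structure are illustrative assumptions, not InnoDB's code. In a real server, the dispatching and the collecting would happen in different threads.</p>

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SLOTS_PER_HELPER 256

static int completed;

/* Completion routine, called by the collector for each finished request. */
static void io_complete(const struct io_event *ev) {
    if ((int64_t)ev->res > 0) completed++;
}

/* Read `n` consecutive pages of `path`: dispatch them all to the kernel with
 * one io_submit(), then collect completions as the kernel reports them.
 * Returns the number of successful reads, or -1 on setup failure. */
int native_aio_read_pages(const char *path, int n, size_t page_size) {
    struct iocb cbs[SLOTS_PER_HELPER], *ptrs[SLOTS_PER_HELPER];
    struct io_event events[SLOTS_PER_HELPER];
    aio_context_t ctx = 0; /* io_setup requires a zeroed context */
    if (n <= 0 || n > SLOTS_PER_HELPER) return -1;

    int fd = open(path, O_RDONLY);
    char *bufs = malloc((size_t)n * page_size);
    if (fd < 0 || bufs == NULL || syscall(SYS_io_setup, (unsigned)n, &ctx) < 0) {
        if (fd >= 0) close(fd);
        free(bufs);
        return -1;
    }

    /* Dispatcher: the requesting thread hands every read to the kernel. */
    for (int i = 0; i < n; i++) {
        memset(&cbs[i], 0, sizeof cbs[i]);
        cbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
        cbs[i].aio_fildes = (uint32_t)fd;
        cbs[i].aio_buf = (uint64_t)(uintptr_t)(bufs + (size_t)i * page_size);
        cbs[i].aio_nbytes = page_size;
        cbs[i].aio_offset = (int64_t)((size_t)i * page_size);
        ptrs[i] = &cbs[i];
    }
    long submitted = syscall(SYS_io_submit, ctx, (long)n, ptrs);

    /* Collector: wait for whichever requests finish, in any order. */
    completed = 0;
    for (long done = 0; done < submitted;) {
        long got = syscall(SYS_io_getevents, ctx, 1L, (long)SLOTS_PER_HELPER, events, NULL);
        if (got <= 0) break;
        for (long j = 0; j < got; j++) io_complete(&events[j]);
        done += got;
    }

    syscall(SYS_io_destroy, ctx);
    free(bufs);
    close(fd);
    return completed;
}
```

<p>Unlike the simulated version, all n requests are in the kernel at once, so the IO scheduler and the storage can work on them in parallel.</p>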
<div id="_mcePaste">What will this buy us? The answer is simple: scalability. For example, consider a system that is heavily IO bound. In InnoDB, one IO helper thread works on a maximum of 256 IO requests at a time. Assume that the heavy workload fills up the queue. With simulated AIO, the IO helper thread goes through these requests one by one, making a synchronous call for each. This means serialisation: the request that is serviced last has to wait for the other 255 requests before it gets a chance. It follows that with simulated AIO there can be at most &#8216;n&#8217; IO requests in parallel inside the kernel, where &#8216;n&#8217; is the total number of IO helper threads. (This is not entirely true, because query threads are allowed to issue synchronous requests as well, but I&#8217;ll gloss over that detail for now.) With native AIO, all 256 requests are dispatched to the kernel, and if the underlying OS can service more requests in parallel, we&#8217;ll take advantage of that.</p>
</div>
<div id="_mcePaste">The job of coalescing contiguous requests is now offloaded to the kernel/IO scheduler. This means that the IO scheduler you use and the properties of your RAID/disk controller may now have more effect on overall IO performance. This is all the more true because many more IO requests will now be inside the kernel than before. Though we have not run tests to certify any particular IO scheduler, the conventional wisdom has been that for database workloads the noop or deadline scheduler may give the best performance. I have heard that a lot of improvements have lately gone into cfq as well. It is for you to try, and as always, YMMV. We look forward to hearing your story.</p>
</div>
<div id="_mcePaste">NOTE: InnoDB has always used native AIO on Windows, and it continues to do so in Plugin 1.1. innodb_use_native_aio has no effect on Windows.</div>
]]></content:encoded>
			<wfw:commentRss>http://blogs.innodb.com/wp/2010/04/innodb-performance-aio-linux/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>InnoDB recovery is now faster&#8230;much faster!</title>
		<link>http://blogs.innodb.com/wp/2010/04/innodb-performance-recovery/</link>
		<comments>http://blogs.innodb.com/wp/2010/04/innodb-performance-recovery/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 09:29:20 +0000</pubDate>
		<dc:creator>Inaam Rana</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.innodb.com/wp/?p=482</guid>
		<description><![CDATA[One of the well-known and much-written-about complaints regarding InnoDB recovery is that it does not scale well on high-end systems. Well, not any more. In InnoDB plugin 1.0.7 (which is GA) and plugin 1.1 (which is part of MySQL 5.5.4) this issue has been addressed. Two major improvements, apart from some other [...]]]></description>
			<content:encoded><![CDATA[<div id="_mcePaste">One of the well-known and much-written-about complaints regarding InnoDB recovery is that it does not scale well on high-end systems. Well, not any more. In InnoDB Plugin 1.0.7 (which is GA) and Plugin 1.1 (which is part of MySQL 5.5.4), this issue has been addressed. Two major improvements, apart from some other minor tweaks, have been made to the recovery code. In this post I&#8217;ll explain these issues and our solutions for them.</p>
</div>
<div id="_mcePaste">The first issue, reported <a href="http://bugs.mysql.com/bug.php?id=49535">here</a>, is that the available-memory check was eating up too much CPU. During recovery, the first phase, called the redo scan phase, is where we read the redo logs from disk and store them in a hash table. In the second phase, the redo application phase, these redo log entries are applied to the data pages. The hash table that stores the redo log entries grows in the buffer pool, i.e., memory for the entries is allocated in 16K blocks from the buffer pool. We have to ensure that the hash table does not end up allocating all the memory in the buffer pool, leaving no room to read in pages during the redo application phase. For this, we have to keep checking the size of the heap that we use to allocate memory for the hash table entries. So why would this kill performance? Because the total size of the heap is not available to us directly: we calculate it by traversing the list of blocks allocated so far. Imagine we have gigabytes of redo log to apply (it can be up to 4G). That would mean hundreds of thousands of blocks in the heap! And we have to make this check roughly every time we read in a new redo page during the scan. The result is an O(n * m) algorithm, where &#8216;n&#8217; is the number of blocks in the heap and &#8216;m&#8217; is the number of redo pages that have to be scanned.</p>
</div>
<div id="_mcePaste">What is the solution we came up with? Store the total size of a heap in its header. Simple and effective. Our algorithm now becomes O(m).</p>
<p><span id="more-482"></span></p>
</div>
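<p>The two ways of checking the heap size can be compared in a simplified model. This sketch is based on our own assumptions. The structs and functions are hypothetical stand-ins, not InnoDB's memory heap code, and only block sizes are tracked, not the memory itself. Walking the block list costs O(n) per check, so doing it for each of m redo pages gives O(n * m). Keeping a running total in the heap header makes each check O(1), so the scan is O(m).</p>

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified heap: a linked list of blocks (payload omitted). */
struct heap_block {
    size_t size;
    struct heap_block *next;
};

struct heap {
    struct heap_block *first;
    size_t total_size; /* the fix: running total kept in the header */
};

/* Before: walk every block on each call, O(n) per check. */
size_t heap_size_by_walk(const struct heap *h) {
    size_t total = 0;
    for (const struct heap_block *b = h->first; b != NULL; b = b->next)
        total += b->size;
    return total;
}

/* After: read the header, O(1) per check. */
size_t heap_size_cached(const struct heap *h) { return h->total_size; }

/* Adding a block is the only place where the running total must be updated. */
int heap_add_block(struct heap *h, size_t size) {
    struct heap_block *b = malloc(sizeof *b);
    if (b == NULL) return -1;
    b->size = size;
    b->next = h->first;
    h->first = b;
    h->total_size += size;
    return 0;
}

/* Release all blocks and reset the running total. */
void heap_free_all(struct heap *h) {
    while (h->first != NULL) {
        struct heap_block *next = h->first->next;
        free(h->first);
        h->first = next;
    }
    h->total_size = 0;
}
```
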
<div id="_mcePaste">Let&#8217;s talk about the second issue, reported <a href="http://bugs.mysql.com/bug.php?id=29847">here</a>. During the redo application phase, data pages to which redo log entries are applied have to be inserted into a list called the flush_list. This list is ordered by the LSN of the earliest modification to a page, i.e., oldest_modification. During normal operation the LSN increases monotonically, so insertion into the flush_list always happens at the head. During recovery, however, we have to search the flush_list linearly to find the appropriate spot for each insertion. The flush_list can grow as long as the number of modified pages (called dirty pages in database parlance) we had at the time of the crash. On high-end systems with multi-gigabyte buffer pools, this number can be very high: a million or more dirty pages. A linear search for insertion won&#8217;t scale.</p>
</div>
<div id="_mcePaste">There has been talk in the community about how to fix this. Various solutions have been suggested, and some were implemented by the community as well. What we at InnoDB have finally implemented is an auxiliary data structure (a red-black tree in this case) that is active only during recovery and is used to speed up sorted insertions into the flush_list. The flush_list remains a list. After recovery is over, the red-black tree is discarded, because during normal operation we only ever add to the head of the flush_list.</p>
</div>
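<p>The following C sketch contrasts the two insertion strategies. It is illustrative only. The types and functions are hypothetical, and to keep the example short, a plain (unbalanced) binary search tree stands in for the red-black tree that InnoDB uses. Both produce the same list order, but only a balanced tree such as a red-black tree guarantees O(log n) lookups regardless of insertion order.</p>

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;

/* Hypothetical dirty-page descriptor. */
struct page {
    lsn_t oldest_modification;
    struct page *prev, *next;  /* flush_list links, newest at the head */
    struct page *left, *right; /* recovery-only search tree links      */
};

struct flush_list {
    struct page *head, *tail;
    struct page *root; /* auxiliary tree, only used during recovery */
};

/* Link p into the list right after `after`, or at the head if NULL. */
static void list_insert_after(struct flush_list *fl, struct page *after, struct page *p) {
    p->prev = after;
    p->next = after ? after->next : fl->head;
    if (p->next) p->next->prev = p; else fl->tail = p;
    if (after) after->next = p; else fl->head = p;
}

/* Before: linear search from the head, O(length of flush_list). */
void flush_list_insert_linear(struct flush_list *fl, struct page *p) {
    struct page *after = NULL;
    for (struct page *q = fl->head; q && q->oldest_modification > p->oldest_modification; q = q->next)
        after = q;
    list_insert_after(fl, after, p);
}

/* After (sketch): the tree finds the list neighbour, i.e. the page with the
 * smallest oldest_modification still greater than p's, in O(tree depth). */
void flush_list_insert_with_tree(struct flush_list *fl, struct page *p) {
    struct page **link = &fl->root, *after = NULL;
    p->left = p->right = NULL;
    while (*link) {
        struct page *q = *link;
        if (q->oldest_modification > p->oldest_modification) {
            after = q; /* candidate neighbour; look for a closer one */
            link = &q->left;
        } else {
            link = &q->right;
        }
    }
    *link = p;
    list_insert_after(fl, after, p);
}

/* Once recovery ends, the tree is simply dropped; the list stays as is. */
void flush_list_discard_tree(struct flush_list *fl) { fl->root = NULL; }
```
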
<div id="_mcePaste">So much for the theory. Now let us see if we can walk the talk. To evaluate the effectiveness of this fix, we&#8217;d need a crash with a lot of redo log to apply and a lot of dirty pages. I asked Michael Izioumtchenko (InnoDB&#8217;s QA supremo) to come up with something, and he did the following:</div>
<div id="_mcePaste">The dataset was obtained by running a 60m sysbench readwrite uniform distribution in memory workload with prewarmed cache using the following configuration parameters:</div>
<div><span style="font-family: monospace;">&#8211;innodb-buffer-pool-size=18g</span></div>
<div><span style="font-family: monospace;">&#8211;innodb-log-file-size=2047m</span></div>
<div><span style="font-family: monospace;">&#8211;innodb-adaptive-flushing=0</span></div>
<div><span style="font-family: monospace;">&#8211;innodb-io-capacity=100</span></div>
<div id="_mcePaste">The latter two are used to throttle flushing in order to maximize the number of dirty pages.</div>
<div id="_mcePaste">It took only about 20 min of running the workload, including cache prewarming, to arrive at the test dataset.</div>
<div id="_mcePaste">So at the time of the crash we had:</div>
<div><span style="font-family: monospace;">Modified db pages  1007907</span></div>
<div><span style="font-family: monospace;">Redo bytes: 3050455773</span></div>
<div id="_mcePaste">And the recovery times were:</div>
<div><span style="font-family: monospace;">Plugin 1.0.7 (also Plugin 1.1): 1m52s scan, 12m04s apply, total 13m56s</span></div>
<div><span style="font-family: monospace;">Plugin 1.0.6: 31m39s scan, 7h06m21s apply, total 7h38m</span></div>
<div><span style="font-family: monospace;">1.0.7 (and Plugin 1.1) is better 16.95x on scan, 35.33x on apply, 32.87x overall</span></div>
<div id="_mcePaste">Note that all this comes to you transparently. You don&#8217;t have to set any parameter to take advantage of this feature. My only suggestion would be to use log files as large as you can (total log file size is limited to 4G). I know users have been using smaller log files to avoid recovery running into hours. They have taken a hit on throughput during normal running to avoid longer recovery times. You don&#8217;t have to do that any more. InnoDB recovery will never run into hours. It&#8217;s a guarantee!</div>
]]></content:encoded>
			<wfw:commentRss>http://blogs.innodb.com/wp/2010/04/innodb-performance-recovery/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>InnoDB Performance Schema</title>
		<link>http://blogs.innodb.com/wp/2010/04/innodb-performance-schema/</link>
		<comments>http://blogs.innodb.com/wp/2010/04/innodb-performance-schema/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 09:26:54 +0000</pubDate>
		<dc:creator>jimmy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.innodb.com/wp/?p=481</guid>
		<description><![CDATA[Performance Schema Support in InnoDB
With the plugin 1.1 release, InnoDB will have full support for Performance Schema, a new feature of the MySQL 5.5 release. This allows a user to peek into some critical server synchronization events and obtain their usage statistics. On the other hand, in order to make a lot of sense of the [...]]]></description>
			<content:encoded><![CDATA[<p>Performance Schema Support in InnoDB</p>
<p>With the plugin 1.1 release, InnoDB will have full support for Performance Schema, a new feature of the MySQL 5.5 release. This allows a user to peek into some critical server synchronization events and obtain their usage statistics. To make full sense of the instrumented results, however, you may need some understanding of InnoDB internals, especially in the area of synchronization with mutexes and rwlocks.</p>
<p>As part of this effort, the following four modules have been instrumented for Performance Schema:</p>
<p>1. Mutex<br />
2. RWLOCKs<br />
3. File I/O<br />
4. Thread</p>
<p>Almost all mutexes (42), rwlocks (10), and 6 types of threads are instrumented. Most mutex/rwlock instrumentation is turned on by default; a few are controlled by a special compile-time define. File I/O statistics are categorized into Data, Log, and Temp file I/O.</p>
<p>This post gives you a quick overview of this new machinery.</p>
<p><span id="more-481"></span></p>
<p><strong>Start the MySQL Server with Performance Schema</strong></p>
<p>To start with, you may want to take a quick look at MySQL&#8217;s <a href="http://dev.mysql.com/doc/refman/5.5/en/performance-schema.html">Performance Schema manual</a>, which gives an overview of the general Performance Schema features.</p>
<p>The Performance Schema is built into the MySQL 5.5 release by default. However, it stays disabled unless you add &#8220;--performance_schema&#8221; to your server command line or enable the performance_schema system variable in your server configuration file. You can specify the &#8220;performance_schema&#8221; variable with no value or with a value of 1 to enable it, or with a value of 0 to disable it.</p>
<p>When the server starts, look for the following line in the server error log:</p>
<p>&#8220;100407 16:13:02 [Note] Buffered information: Performance schema enabled.&#8221;</p>
<p>This means the server started with the Performance Schema running fine.</p>
<p>It could also display a message such as:</p>
<p>&#8220;100407 16:13:02 [Note] Buffered information:  Performance schema disabled (reason: start parameters)&#8221;</p>
<p>This means the Performance Schema is disabled because neither the &#8220;performance_schema&#8221; startup option nor the corresponding variable in the configuration file was set.</p>
<p>A third type of message is &#8220;Performance schema disabled (reason: init failed)&#8221;, which indicates a Performance Schema initialization failure (for example, because a memory allocation failed). This message is relatively rare, and I have not encountered it myself. If you do hit it, check the other Performance Schema related system variables to see whether any of them are set outside a reasonable range.</p>
<p><strong>Performance Schema Database and its Tables</strong></p>
<p>Assuming the server starts fine with Performance Schema enabled, your first stop is probably the new database called &#8220;performance_schema&#8221;. All Performance Schema related tables are in this database:</p>
<p>mysql&gt; use performance_schema</p>
<p>mysql&gt; show tables;</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="79" valign="top"></td>
<td width="474" valign="top"><code>Tables_in_performance_schema</code></td>
</tr>
<tr>
<td width="79" valign="top"><code>1</code></td>
<td width="474" valign="top"><code>COND_INSTANCES</code></td>
</tr>
<tr>
<td width="79" valign="top"><code>2</code></td>
<td width="474" valign="top"><code>FILE_INSTANCES</code></td>
</tr>
<tr>
<td width="79" valign="top"><code>3</code></td>
<td width="474" valign="top"><code>MUTEX_INSTANCES </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>4</code></td>
<td width="474" valign="top"><code>RWLOCK_INSTANCES </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>5</code></td>
<td width="474" valign="top"><code>EVENTS_WAITS_CURRENT </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>6</code></td>
<td width="474" valign="top"><code>EVENTS_WAITS_HISTORY </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>7</code></td>
<td width="474" valign="top"><code>EVENTS_WAITS_HISTORY_LONG </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>8</code></td>
<td width="474" valign="top"><code>EVENTS_WAITS_SUMMARY_BY_EVENT_NAME </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>9</code></td>
<td width="474" valign="top"><code>EVENTS_WAITS_SUMMARY_BY_INSTANCE </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>10</code></td>
<td width="474" valign="top"><code>EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME</code></td>
</tr>
<tr>
<td width="79" valign="top"><code>11</code></td>
<td width="474" valign="top"><code>PROCESSLIST </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>12</code></td>
<td width="474" valign="top"><code>SETUP_CONSUMERS </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>13</code></td>
<td width="474" valign="top"><code>SETUP_INSTRUMENTS </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>14</code></td>
<td width="474" valign="top"><code>SETUP_OBJECTS </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>15</code></td>
<td width="474" valign="top"><code>SETUP_TIMERS </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>16</code></td>
<td width="474" valign="top"><code>PERFORMANCE_TIMERS </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>17</code></td>
<td width="474" valign="top"><code>FILE_SUMMARY_BY_EVENT_NAME </code></td>
</tr>
<tr>
<td width="79" valign="top"><code>18</code></td>
<td width="474" valign="top"><code>FILE_SUMMARY_BY_INSTANCE </code></td>
</tr>
</tbody>
</table>
<p>These 18 tables can be categorized into a few broad groups, such as &#8220;Instance&#8221; tables, &#8220;Wait&#8221; tables with &#8220;History&#8221;, &#8220;Wait&#8221; tables with &#8220;Summary&#8221;, and &#8220;Setup&#8221; tables.</p>
<p>In the next few sections, I will go through a few tables in this list that I think are important.</p>
<p><strong>Find Instrumented Events with INSTANCE TABLES</strong></p>
<p>To see which InnoDB events are active and being instrumented, check the following four &#8220;Instance&#8221; tables for the corresponding modules:</p>
<p>MUTEX_INSTANCES<br />
RWLOCK_INSTANCES<br />
PROCESSLIST<br />
FILE_INSTANCES</p>
<p>mysql&gt; SELECT DISTINCT(NAME)<br />
-&gt; FROM MUTEX_INSTANCES<br />
-&gt; WHERE NAME LIKE '%innodb%';</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="403" valign="top"><code>name </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/analyze_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/mutex_list_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/ibuf_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/rseg_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/autoinc_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/flush_list_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>…..</code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/thr_local_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/srv_monitor_file_mutex</code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/buf_pool_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/recv_sys_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/fil_system_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/trx_doublewrite_mutex </code></td>
</tr>
<tr>
<td width="403" valign="top"><code>wait/synch/mutex/innodb/flush_order_mutex </code></td>
</tr>
</tbody>
</table>
<p>35 rows in set (0.00 sec)</p>
<p>Note that there can be multiple instances of a mutex in the server:</p>
<p>mysql&gt; SELECT COUNT(*)<br />
-&gt; FROM MUTEX_INSTANCES<br />
-&gt; WHERE NAME LIKE '%rseg_mutex%';</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="91" valign="top"><code>COUNT(*)</code></td>
</tr>
<tr>
<td width="91" valign="top"><code>128 </code></td>
</tr>
</tbody>
</table>
<p>1 row in set (0.92 sec)</p>
<p>This is why we need to use &#8220;SELECT DISTINCT (NAME)&#8221; clause in the initial query to get only the distinct mutex names from the MUTEX_INSTANCES table. Without the DISTINCT, there could be hundreds of instances of mutex being displayed. This also applies to other instance tables.</p>
<p>Also please note, if the mutex is not yet created, it will not be listed in the instance table, so you might see fewer events/instances than you might expected.</p>
<p>One last point for this section: the buffer block mutex and rwlock are instrumented, but excluded from Performance Schema instrumentation by default. The reason is that there is one mutex/rwlock pair per 16K buffer block, so a server with a large buffer pool could easily create thousands of instances of these mutexes/rwlocks. That easily exceeds the default maximum number of mutex/rwlock instances (1000), and users would need to raise the limit by setting the system variables performance_schema_max_mutex_instances and/or performance_schema_max_rwlock_instances.<br />
Because the block mutex/rwlock are nevertheless instrumented, you can enable them by changing the code to un-define &#8220;PFS_SKIP_BUFFER_MUTEX_RWLOCK&#8221; and rebuilding the server.</p>
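<p>A quick back-of-the-envelope sketch in Python shows why the default limit is far too small. It assumes exactly one mutex/rwlock pair per 16K buffer block, as stated above, and ignores all other mutexes in the server; the 2G pool size is only an example.</p>

```python
PAGE_SIZE = 16 * 1024          # one buffer block per 16K page
DEFAULT_MAX_INSTANCES = 1000   # default limit quoted above


def block_latch_instances(buffer_pool_bytes, page_size=PAGE_SIZE):
    """Block mutex instances needed (and the same number of rwlock
    instances), assuming one mutex/rwlock pair per buffer block."""
    return buffer_pool_bytes // page_size


pool = 2 * 1024 ** 3           # e.g. a 2G buffer pool
n = block_latch_instances(pool)
print(n)                              # 131072
print(n > DEFAULT_MAX_INSTANCES)      # True
```

<p>So even a modest 2G buffer pool would need roughly 131 times the default number of mutex instances, and as many rwlock instances again.</p>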
<p><strong>Find out what is going on with EVENTS_WAITS_CURRENT table</strong></p>
<p>The next table you might be interested in is EVENTS_WAITS_CURRENT:</p>
<p>mysql&gt; SELECT THREAD_ID, EVENT_NAME, SOURCE<br />
-&gt; FROM EVENTS_WAITS_CURRENT<br />
-&gt; WHERE EVENT_NAME LIKE '%innodb%';</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="115" valign="top"><code>THREAD_ID </code></td>
<td width="322" valign="top"><code>EVENT_NAME </code></td>
<td width="153" valign="top"><code> SOURCE </code></td>
</tr>
<tr>
<td width="115" valign="top"><code>2</code></td>
<td width="322" valign="top"><code>wait/synch/mutex/innodb/ios_mutex </code></td>
<td width="153" valign="top"><code>srv0start.c:495</code></td>
</tr>
<tr>
<td width="115" valign="top"><code>8</code></td>
<td width="322" valign="top"><code>wait/synch/mutex/innodb/log_sys_mutex </code></td>
<td width="153" valign="top"><code>log0log.ic:405 </code></td>
</tr>
<tr>
<td width="115" valign="top"><code>9</code></td>
<td width="322" valign="top"><code>wait/synch/mutex/innodb/kernel_mutex </code></td>
<td width="153" valign="top"><code>srv0srv.c:2182</code></td>
</tr>
<tr>
<td width="115" valign="top"><code>10</code></td>
<td width="322" valign="top"><code>wait/synch/mutex/innodb/thr_local_mutex </code></td>
<td width="153" valign="top"><code>thr0loc.c:127 </code></td>
</tr>
</tbody>
</table>
<p>4 rows in set (0.00 sec)</p>
<p>This table shows the latest instrumented activity for each thread. The nice part is that it records the exact file name and line number of each event, so in a hang or blocking situation caused by a mutex/rwlock, you can tell which mutex or rwlock is actually involved.</p>
<p><strong>Check into &#8220;Limited history&#8221; with HISTORY tables</strong></p>
<p>There are a couple of &#8220;HISTORY&#8221; tables that record instrumented events. The EVENTS_WAITS_HISTORY table contains the most recent 10 events per thread, and EVENTS_WAITS_HISTORY_LONG contains the most recent 10,000 events overall by default. Both also include the SOURCE column with the file name and line number, so you can aggregate over them to find interesting behavior.</p>
<p>For example, the following query lists the InnoDB wait events and source locations that accumulated the most total wait time in the history table:</p>
<p>mysql&gt; SELECT EVENT_NAME, SUM(TIMER_WAIT), COUNT(*), SOURCE<br />
-&gt; FROM EVENTS_WAITS_HISTORY_LONG<br />
-&gt; WHERE EVENT_NAME LIKE '%innodb%'<br />
-&gt; GROUP BY EVENT_NAME, SOURCE<br />
-&gt; ORDER BY SUM(TIMER_WAIT) DESC;</p>
<p>Or you can find the ones with the longest average wait time:</p>
<p>mysql&gt; SELECT EVENT_NAME, SUM(TIMER_WAIT) / COUNT(*), SOURCE<br />
-&gt; FROM EVENTS_WAITS_HISTORY_LONG<br />
-&gt; WHERE EVENT_NAME LIKE '%innodb%'<br />
-&gt; GROUP BY EVENT_NAME, SOURCE<br />
-&gt; ORDER BY SUM(TIMER_WAIT) / COUNT(*) DESC;</p>
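<p>To make clear what the two queries compute, here is a Python sketch of the same aggregation over a few hypothetical history rows. The SOURCE values are borrowed from the EVENTS_WAITS_CURRENT output above, but the TIMER_WAIT numbers are invented. It shows that ranking by total and ranking by average can single out different locations.</p>

```python
from collections import defaultdict

# Hypothetical (EVENT_NAME, SOURCE, TIMER_WAIT in ps) rows.
history = [
    ("wait/synch/mutex/innodb/kernel_mutex", "srv0srv.c:2182", 900),
    ("wait/synch/mutex/innodb/kernel_mutex", "srv0srv.c:2182", 700),
    ("wait/synch/mutex/innodb/log_sys_mutex", "log0log.ic:405", 1200),
    ("wait/synch/mutex/innodb/ios_mutex", "srv0start.c:495", 100),
    ("wait/synch/mutex/innodb/ios_mutex", "srv0start.c:495", 200),
    ("wait/synch/mutex/innodb/ios_mutex", "srv0start.c:495", 300),
]


def aggregate(rows):
    """GROUP BY EVENT_NAME, SOURCE -> {(event, source): (sum_wait, count)}."""
    acc = defaultdict(lambda: [0, 0])
    for event, source, wait in rows:
        acc[(event, source)][0] += wait
        acc[(event, source)][1] += 1
    return {key: (total, count) for key, (total, count) in acc.items()}


agg = aggregate(history)
# ORDER BY SUM(TIMER_WAIT) DESC
by_total = sorted(agg, key=lambda k: agg[k][0], reverse=True)
# ORDER BY SUM(TIMER_WAIT) / COUNT(*) DESC
by_average = sorted(agg, key=lambda k: agg[k][0] / agg[k][1], reverse=True)

print(by_total[0][1])    # srv0srv.c:2182 (1600 ps in total)
print(by_average[0][1])  # log0log.ic:405 (1200 ps per wait)
```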
<p>As mentioned, the history tables have a limited size: 10 events per thread for EVENTS_WAITS_HISTORY and 10,000 for EVENTS_WAITS_HISTORY_LONG. However, you can extend the history length of these two tables by changing the<br />
&#8220;performance_schema_events_waits_history_size&#8221; and &#8220;performance_schema_events_waits_history_long_size&#8221; system variables. performance_schema_events_waits_history_long_size can be raised to a maximum of one million rows. Even so, do not expect that to be enough: on a busy system, one million events probably covers only a few seconds of server operation.</p>
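<p>A fixed-size history behaves like a ring buffer: once it is full, each new event pushes out the oldest one. The Python sketch below models that with <code>collections.deque(maxlen=...)</code>. It is an illustration, not the server's actual data structure. It replays 1,925,253 events (the buf_pool_mutex wait count from the summary table later in this post) into a one-million-row history.</p>

```python
from collections import deque


class WaitHistory:
    """Keeps only the most recent `size` events (a model of a
    fixed-size history table, not the server's implementation)."""

    def __init__(self, size=10_000):
        # A deque with maxlen drops the oldest entry automatically.
        self.events = deque(maxlen=size)

    def record(self, event):
        self.events.append(event)


h = WaitHistory(size=1_000_000)        # the maximum mentioned above
for i in range(1_925_253):             # events numbered 0 .. 1,925,252
    h.record(i)

print(len(h.events), h.events[0])      # 1000000 925253
```

<p>Even this single mutex produced enough waits to push the first 925,253 events out of the maximum-size history.</p>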
<p><strong>Find out aggregated information from SUMMARY tables</strong></p>
<p>To get overall aggregated values for these instances, use the &#8220;SUMMARY&#8221; tables. There are five of them:</p>
<p>EVENTS_WAITS_SUMMARY_BY_EVENT_NAME<br />
EVENTS_WAITS_SUMMARY_BY_INSTANCE<br />
EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME<br />
FILE_SUMMARY_BY_EVENT_NAME<br />
FILE_SUMMARY_BY_INSTANCE</p>
<p>As their names suggest, these tables hold event statistics aggregated by different criteria. Digging into them gives you some idea of where the contention could be.</p>
<p>For example, the following query shows the hottest InnoDB mutexes (the timer values are in picoseconds):</p>
<p>mysql&gt; SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT, AVG_TIMER_WAIT<br />
-&gt; FROM EVENTS_WAITS_SUMMARY_BY_EVENT_NAME<br />
-&gt; WHERE EVENT_NAME LIKE '%innodb%'<br />
-&gt; ORDER BY COUNT_STAR DESC;</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="148" valign="top">EVENT_NAME</td>
<td width="148" valign="top">COUNT_STAR</td>
<td width="148" valign="top">SUM_TIMER_WAIT</td>
<td width="148" valign="top">AVG_TIMER_WAIT</td>
</tr>
<tr>
<td width="148" valign="top">buf_pool_mutex</td>
<td width="148" valign="top">1925253</td>
<td width="148" valign="top">264662026992</td>
<td width="148" valign="top">137468</td>
</tr>
<tr>
<td width="148" valign="top">buffer_block_mutex</td>
<td width="148" valign="top">720640</td>
<td width="148" valign="top">80696897622</td>
<td width="148" valign="top">111979</td>
</tr>
<tr>
<td width="148" valign="top">kernel_mutex</td>
<td width="148" valign="top">243870</td>
<td width="148" valign="top">44872951662</td>
<td width="148" valign="top">184003</td>
</tr>
<tr>
<td width="148" valign="top">purge_sys_mutex</td>
<td width="148" valign="top">162085</td>
<td width="148" valign="top">12238011720</td>
<td width="148" valign="top">75503</td>
</tr>
<tr>
<td width="148" valign="top"><code>….</code></td>
<td width="148" valign="top"><code>…</code></td>
<td width="148" valign="top"><code>…</code></td>
<td width="148" valign="top"><code>..</code></td>
</tr>
</tbody>
</table>
<p>In this experiment, buf_pool_mutex was the hottest mutex. However, ordering by AVG_TIMER_WAIT shows that ibuf_mutex (not among the rows shown above) was the one we waited on the longest, even though it is accessed much less frequently.</p>
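<p>Picosecond values are hard to read at a glance. The short Python sketch below converts the four rows from the table above into milliseconds and nanoseconds and ranks them two ways. In these rows, AVG_TIMER_WAIT equals SUM_TIMER_WAIT / COUNT_STAR truncated to an integer; the sketch checks that rather than assuming it holds in general. Among these four rows only, kernel_mutex has the longest average wait.</p>

```python
# (EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT, AVG_TIMER_WAIT) from the table above.
# Timer values are in picoseconds.
rows = [
    ("buf_pool_mutex", 1925253, 264662026992, 137468),
    ("buffer_block_mutex", 720640, 80696897622, 111979),
    ("kernel_mutex", 243870, 44872951662, 184003),
    ("purge_sys_mutex", 162085, 12238011720, 75503),
]

PS_PER_NS = 1_000
PS_PER_MS = 1_000_000_000

for name, count, total_ps, avg_ps in rows:
    # In these rows the average is the integer-truncated SUM / COUNT.
    assert total_ps // count == avg_ps
    print(f"{name:20} total {total_ps / PS_PER_MS:8.1f} ms"
          f"  avg {avg_ps / PS_PER_NS:6.1f} ns")

hottest = max(rows, key=lambda r: r[1])[0]   # most waits (COUNT_STAR)
slowest = max(rows, key=lambda r: r[3])[0]   # longest average wait
print(hottest, slowest)                      # buf_pool_mutex kernel_mutex
```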
<p>Also note that these tables can be truncated, which essentially resets the wait values so that counting and aggregation start afresh.</p>
<p>Again, really understanding and interpreting the information in these tables requires some knowledge of the internals of the modules where these mutexes/rwlocks reside. These tables mainly target advanced users and developers who want to analyze performance bottlenecks. However, ordinary users may still be able to infer useful information from them and find creative uses for these statistics. For example, on some I/O-bound servers you might find the doublewrite buffer mutex near the top of the list in terms of total wait time; you might then consider turning off the doublewrite buffer option.</p>
<p><strong>Performance Impact: </strong><br />
The last point we discuss is that the Performance Schema comes with a cost: it has a visible performance impact. A simple dbt2 test with 50 warehouses and 32 connections on a server with a 2G buffer pool showed about an 8% performance impact with all Performance Schema events turned on. Some sysbench tests confirmed this.</p>
<p>To reduce the performance impact, the Performance Schema lets you turn off counting for individual events through the SETUP tables: use SETUP_CONSUMERS to turn logging into the history tables etc. on or off, and SETUP_INSTRUMENTS to turn counting on or off for a particular mutex/rwlock etc. However, turning off event counting cannot completely eliminate the overhead of the Performance Schema; this is something that remains to be improved.</p>
<p><strong>Summary:</strong><br />
In summary, we provide a rich set of mutex, rwlock, I/O and thread usage information through Performance Schema instrumentation. It can be used to diagnose server performance bottlenecks, find hot spots in the server, and gain a better understanding of the behavior and access patterns of the modules where these mutexes/rwlocks reside. However, it does come at a cost to server performance, so it is more suitable for tuning and study on development servers. You might want to leave it out of production servers.</p>
<p>-Jimmy Yang</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.innodb.com/wp/2010/04/innodb-performance-schema/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Better Scalability with Multiple Rollback Segments</title>
		<link>http://blogs.innodb.com/wp/2010/04/innodb-multiple-rollback-segments/</link>
		<comments>http://blogs.innodb.com/wp/2010/04/innodb-multiple-rollback-segments/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 09:25:12 +0000</pubDate>
		<dc:creator>Sunny Bains</dc:creator>
				<category><![CDATA[Bug fix]]></category>
		<category><![CDATA[Feature]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://blogs.innodb.com/wp/?p=485</guid>
		<description><![CDATA[Background
The original motivation behind this patch was the infamous Bug#26590 &#8211; MySQL does not allow more than 1023 open transactions. Actually the 1024 limit has to do with the number of concurrent update transactions that can run within InnoDB.  Where does this magic number come from ? 1024 is the total number of UNDO [...]]]></description>
			<content:encoded><![CDATA[<div><strong>Background</strong></div>
<p>The original motivation behind this patch was the infamous Bug#<a href="http://bugs.mysql.com/bug.php?id=26590">26590</a> &#8211; <em>MySQL does not allow more than 1023 open transactions</em>. The 1024 limit actually applies to the number of concurrent update transactions that can run within InnoDB. Where does this magic number come from? 1024 is the total number of UNDO log list slots on one rollback segment header page, and in the past InnoDB created just one rollback segment header page during database creation. This rollback segment header page is anchored in the system header page, which has space for 128 rollback segments, but only one was being created and used, resulting in the 1024 limit. Each slot in the rollback segment header array consists of <em>{space_id, page_no}</em>, where both space_id and page_no are of type uint32_t. Currently the space id is &#8220;unused&#8221; and always points to the system tablespace, which is tablespace 0. Now, on to the rollback segment header page. This page contains a rollback segment header (details of which are outside the scope of this blog entry <img src='http://blogs.innodb.com/wp/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />), followed by an array of 1024 UNDO slots. Each slot is the base node of a file-based linked list of UNDO logs. Each node in this list contains UNDO log records holding the data updated by a transaction. A single UNDO log node can contain UNDO entries from several different transactions.</p>
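<p>The slot layout and the resulting limit can be sketched in a few lines of Python. The byte order (big-endian) and the exact packing of the two uint32_t fields are assumptions made for illustration; the 1024 UNDO slots per header page and space id 0 come from the description above.</p>

```python
import struct

# Assumed layout of one slot in the rollback segment array:
# {space_id, page_no}, two unsigned 32-bit integers, big-endian.
RSEG_SLOT = struct.Struct(">II")
UNDO_SLOTS_PER_RSEG_PAGE = 1024   # UNDO log list slots per header page
SYSTEM_TABLESPACE = 0             # space_id currently always 0


def encode_slot(page_no, space_id=SYSTEM_TABLESPACE):
    return RSEG_SLOT.pack(space_id, page_no)


def decode_slot(raw):
    space_id, page_no = RSEG_SLOT.unpack(raw)
    return space_id, page_no


def undo_slot_limit(n_rseg_header_pages):
    """Upper bound on concurrent update transactions, one UNDO slot each."""
    return n_rseg_header_pages * UNDO_SLOTS_PER_RSEG_PAGE


raw = encode_slot(page_no=6)              # page number is made up
print(RSEG_SLOT.size, decode_slot(raw))   # 8 (0, 6)
print(undo_slot_limit(1))                 # 1024: one header page
```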
<p><strong>Performance ramifications</strong></p>
<p>When a transaction starts, it is allocated a rollback segment to write its modifications to. Multiple transactions can write to the same rollback segment, but only one transaction is allowed to write to any one UNDO slot during its lifetime. This makes clear where the 1024 limit comes from. Each rollback segment is protected by its own mutex, and when there is only a single rollback segment, its mutex can become highly contended.</p>
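<p>The following Python sketch models why more rollback segments help on both counts. It uses a hypothetical round-robin policy for handing out segments; InnoDB's actual assignment code may differ. Each simulated segment has 1024 UNDO slots, and each update transaction holds one slot. With one segment, every transaction goes through the same segment (and mutex) and the 1025th transaction fails. With 128 segments (the 7-bit limit discussed below), consecutive transactions land on different segments and the ceiling rises to 128K.</p>

```python
UNDO_SLOTS_PER_RSEG = 1024   # UNDO slots on one rollback segment header page


class RollbackSegment:
    def __init__(self, rseg_id):
        self.rseg_id = rseg_id
        self.free_slots = UNDO_SLOTS_PER_RSEG
        self.assigned = 0    # times this segment (and its mutex) was chosen


class TrxSystem:
    """Hypothetical round-robin assignment of update transactions."""

    def __init__(self, n_rsegs):
        self.rsegs = [RollbackSegment(i) for i in range(n_rsegs)]
        self.next_rseg = 0

    def start_update_trx(self):
        # Try each segment at most once, continuing after the last one used.
        for _ in range(len(self.rsegs)):
            rseg = self.rsegs[self.next_rseg]
            self.next_rseg = (self.next_rseg + 1) % len(self.rsegs)
            if rseg.free_slots > 0:
                rseg.free_slots -= 1   # the transaction owns one UNDO slot
                rseg.assigned += 1
                return rseg.rseg_id
        raise RuntimeError("all UNDO slots in use")


def max_concurrent(n_rsegs):
    """Start transactions (never committing) until no slot is left."""
    trx_sys = TrxSystem(n_rsegs)
    started = 0
    while True:
        try:
            trx_sys.start_update_trx()
        except RuntimeError:
            return started
        started += 1


print(max_concurrent(1))        # 1024: the old single-segment limit
print(max_concurrent(2 ** 7))   # 131072, i.e. 128K
```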
<p><strong>Requirements</strong></p>
<p>Backward compatibility in file formats is something we take very seriously at InnoDB. InnoDB has always had the ability to use up to 128 rollback segment header pages, but before this fix it created only one rollback segment. We had to find a way to make the multiple rollback segments change backward compatible, without breaking any assumptions in the code of older versions of InnoDB about the absolute locations of system pages and changes to system data. The 128 limit is a result of the latter: while there is space for 256 rollback segments, InnoDB uses only 7 bits from that field. Once we fix that, we could enable 256 rollback segments in the future; however, 128 seems to be sufficient for now. There are other scalability issues that need to be addressed before 128K concurrent transactions become an issue <img src='http://blogs.innodb.com/wp/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</p>
<p><span id="more-485"></span></p>
<p><strong>The solution</strong></p>
<p>To keep backward compatibility, when the rollback segments are created in a new instance, the additional rollback segments are created after the doublewrite buffer. We keep the total number of rollback segments at 128 rather than 256 and set the remaining slots to NULL, because older versions will try to scan up to 256. This means that older versions of InnoDB can also benefit from these extra rollback segments: if you create the extra rollback segments with a newer version but then revert to an older version, the older version should be able to use them. Newer versions of InnoDB that contain this fix will create the additional segments in existing instances only if the innodb_force_recovery flag is not set and the database instance was shut down cleanly. The additional segments are created by default when creating new instances.</p>
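<p>The compatibility rules above can be summarized in a small Python sketch. The NULL marker value (0xFFFFFFFF) and the page numbers are assumptions for illustration. The 256-slot array, the 128-segment cap, the old-version scan of every non-NULL slot, and the conditions for creating the extra segments follow the text.</p>

```python
N_RSEG_SLOTS = 256        # slots in the array in the system header page
N_RSEGS_MAX = 128         # only 7 bits of the rollback segment id are used
FIL_NULL = 0xFFFFFFFF     # assumed marker for an unused ("NULL") slot


def build_rseg_array(rseg_header_pages):
    """Newer versions: fill up to 128 slots, set the rest to NULL."""
    if len(rseg_header_pages) > N_RSEGS_MAX:
        raise ValueError("at most 128 rollback segments")
    unused = N_RSEG_SLOTS - len(rseg_header_pages)
    return list(rseg_header_pages) + [FIL_NULL] * unused


def rsegs_seen_by_old_version(slots):
    """Older versions scan all 256 slots and use every non-NULL one."""
    return [i for i, page in enumerate(slots) if page != FIL_NULL]


def creates_extra_rsegs(new_instance, force_recovery, clean_shutdown):
    """When a newer version adds the extra rollback segments."""
    if new_instance:
        return True        # created by default for new instances
    return force_recovery == 0 and clean_shutdown


slots = build_rseg_array(range(1000, 1128))   # 128 made-up page numbers
print(len(slots), len(rsegs_seen_by_old_version(slots)))   # 256 128
```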
]]></content:encoded>
			<wfw:commentRss>http://blogs.innodb.com/wp/2010/04/innodb-multiple-rollback-segments/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>InnoDB Plugin Doc now on dev.mysql.com</title>
		<link>http://blogs.innodb.com/wp/2010/04/innodb-plugin-doc-now-on-dev-mysql-com/</link>
		<comments>http://blogs.innodb.com/wp/2010/04/innodb-plugin-doc-now-on-dev-mysql-com/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 17:53:31 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[doc]]></category>
		<category><![CDATA[Plugin]]></category>

		<guid isPermaLink="false">http://blogs.innodb.com/wp/2010/04/innodb-plugin-doc-now-on-dev-mysql-com/</guid>
		<description><![CDATA[The InnoDB Plugin manual is now available on the MySQL web site.
]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://bit.ly/cq5IYM">InnoDB Plugin manual</a> is now available on the MySQL web site.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.innodb.com/wp/2010/04/innodb-plugin-doc-now-on-dev-mysql-com/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>New InnoDB Plugin with MORE Performance: Thanks, Community!</title>
		<link>http://blogs.innodb.com/wp/2009/08/new-innodb-plugin4-with-more-performance-thanks-community/</link>
		<comments>http://blogs.innodb.com/wp/2009/08/new-innodb-plugin4-with-more-performance-thanks-community/#comments</comments>
		<pubDate>Tue, 11 Aug 2009 21:33:48 +0000</pubDate>
		<dc:creator>Ken Jacobs</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[contributions]]></category>
		<category><![CDATA[patches]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[third-party]]></category>

		<guid isPermaLink="false">http://blogs.innodb.com/wp/?p=450</guid>
		<description><![CDATA[Today, the InnoDB team announced the latest release of the InnoDB Plugin, release 1.0.4.  Some of the performance gains in this release are quite remarkable!


As noted in the announcement, this release contains contributions from Sun Microsystems, Google and Percona, Inc., for which we are very appreciative.  This page briefly describes each of the [...]]]></description>
			<content:encoded><![CDATA[<p>Today, the InnoDB team <a href="http://www.innodb.com/wp/2009/08/11/innodb-plugin-104-released/">announced</a> the latest release of the InnoDB Plugin, release 1.0.4.  Some of the performance gains in this release are quite remarkable!
</p>
<p>
As noted in the announcement, this release contains contributions from Sun Microsystems, Google and Percona, Inc., for which we are very appreciative.  <a href="http://www.innodb.com/wp/products/innodb_plugin/license/third-party-contributions-in-innodb-plugin-1-0-4/">This page</a> briefly describes each of the contributions and the way we treated them. The purpose of this post is to describe the general approach the InnoDB team takes toward third party contributions.
</p>
<p>
In principle, we appreciate third party contributions. However, we simply don&#8217;t have the resources to seriously evaluate every change that someone proposes. When we do undertake to evaluate a patch, we have some clear criteria in mind:</p>
<ul>
<li>The patch has to be technically sound, reliable, and effective</li>
<li>The change should fit with the architecture, and our overall plans and philosophy for InnoDB</li>
<li>The contribution must be available to us under a suitable license</li>
</ul>
<p>Let&#8217;s consider, in general terms, what these criteria mean in practice.
</p>
<p><span id="more-450"></span></p>
<p>
We have to expend a fair bit of effort to carefully evaluate and possibly modify a patch before we can include it in a release. Some of the third party contributions we&#8217;ve seen have not been portable, or have been developed just for Linux. It can take time to find an approach that enables a platform to take advantage of a new feature, even if the platform has the required capabilities. Some of the patches we&#8217;ve evaluated have contained actual bugs that would impact reliability, cause deadlocks or have other negative implications. InnoDB is a clean and elegant piece of code, yet some of its internal algorithms and behaviors are subtle and complex. Therefore, changes in the &#8220;guts&#8221; of InnoDB (or any storage engine) must be made carefully and thoroughly tested. Some patches that have been offered make a difference, but only when compared to an inappropriate &#8220;baseline&#8221;. At any given point, we would include a patch only if it makes a significant improvement over the &#8220;best&#8221; version or configuration of InnoDB available at the time. We like to test each patch in isolation, to assess its individual value. This requires rigorous performance testing with multiple workloads.
</p>
<p>
From time to time, third parties have made suggestions for changes that may seem attractive at first, but don&#8217;t make sense longer term.   In general, we may have a more comprehensive approach to a problem or requirement that we would like to implement, rather than incorporate a patch that would introduce a feature that would ultimately be made obsolete.  We prefer to have fewer &#8220;knobs&#8221; and tuning complexity, so we&#8217;re more inclined to implement heuristic, self-tuning capabilities than we are to add new configuration parameters.   Lastly, we take care to protect the ability to upgrade and downgrade user databases with the file format management features in the InnoDB Plugin.  If a patch requires an on-disk change, we will defer its incorporation until the time comes to implement a new file format.
</p>
<p>
For us to be able to make continued investment in InnoDB, we must be able to license the software commercially. OEMs and ISVs who incorporate MySQL with InnoDB in their products may not wish to release their products in open source form. Therefore, for each contribution we accept, we must have clear legal rights to the change.</p>
<p>
Beyond all that, of course, we take care to carefully document each new feature, both in terms of form and function. We try hard to explain the implications of a feature, providing information about <em>what </em>it does, and <em>when</em> and <em>where</em> to use a feature, as well as <em>how</em> to do so. And, generally speaking, we are committed to upward compatibility and to supporting a feature once it is introduced.</p>
<p>
It&#8217;s pretty clear that the integrity of InnoDB, with its broad adoption and importance everywhere it is used, is paramount to you and to us.  You can trust the InnoDB team to protect InnoDB now and in the future, while being open to suggestions and contributions.  Let us know if you think we&#8217;re doing a good job!</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.innodb.com/wp/2009/08/new-innodb-plugin4-with-more-performance-thanks-community/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>InnoDB Conference Presentations Now Online</title>
		<link>http://blogs.innodb.com/wp/2009/05/innodb-conference-presentations-now-aonline/</link>
		<comments>http://blogs.innodb.com/wp/2009/05/innodb-conference-presentations-now-aonline/#comments</comments>
		<pubDate>Wed, 13 May 2009 00:30:48 +0000</pubDate>
		<dc:creator>Ken Jacobs</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Concurrency]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[internals]]></category>
		<category><![CDATA[Locking]]></category>
		<category><![CDATA[Plugin]]></category>
		<category><![CDATA[Recovery]]></category>
		<category><![CDATA[Row Formats]]></category>

		<guid isPermaLink="false">http://blogs.innodb.com/wp/?p=434</guid>
		<description><![CDATA[

Well, it took us a little while (we&#8217;ve been busy   !), but we&#8217;ve now posted our presentations on InnoDB from the MySQL Conference and Expo 2009.  You can download these presentations by Heikki Tuuri, Ken Jacobs and Calvin Sun from the InnoDB website, as follows:

Ken and Heikki: InnoDB: Innovative Technologies for Performance [...]]]></description>
			<content:encoded><![CDATA[<p>
<p>
Well, it took us a little while (we&#8217;ve been busy <img src='http://blogs.innodb.com/wp/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  !), but we&#8217;ve now posted our presentations on InnoDB from the MySQL Conference and Expo 2009.  You can download these presentations by <a href="http://www.mysqlconf.com/mysql2009/public/schedule/speaker/1311">Heikki Tuuri</a>, <a href="http://www.mysqlconf.com/mysql2009/public/schedule/speaker/1312">Ken Jacobs</a> and <a href="http://www.mysqlconf.com/mysql2009/public/schedule/speaker/12396">Calvin Sun</a> from the InnoDB website, as follows:</p>
<ul>
<li>Ken and Heikki: <a href="http://www.innodb.com/wp/wp-content/uploads/2009/05/innovative-technologies-final.pdf">InnoDB: Innovative Technologies for Performance and Data Protection</a></li>
<li>Heikki: <a href="http://www.innodb.com/wp/wp-content/uploads/2009/05/innodbcrashrecovery-final.pdf">Crash Recovery and Media Recovery in InnoDB</a></li>
<li>Heikki: <a href="http://www.innodb.com/wp/wp-content/uploads/2009/05/concurrencycontrol.pdf">Concurrency Control: How it Really Works</a></li>
<li>Calvin and Heikki: <a href="http://www.innodb.com/wp/wp-content/uploads/2009/05/innodb-file-formats-and-source-code-structure.pdf">InnoDB File Formats and Source Code Structure</a></li>
</ul>
<p>Descriptions of these and other presentations about InnoDB are available <a href="http://www.innodb.com/products/innodb/info/">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.innodb.com/wp/2009/05/innodb-conference-presentations-now-aonline/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Only God can make random selections</title>
		<link>http://blogs.innodb.com/wp/2009/04/only-god-can-make-random-selections/</link>
		<comments>http://blogs.innodb.com/wp/2009/04/only-god-can-make-random-selections/#comments</comments>
		<pubDate>Fri, 17 Apr 2009 22:51:28 +0000</pubDate>
		<dc:creator>Vasil Dimov</dc:creator>
				<category><![CDATA[InnoDB Builtin]]></category>
		<category><![CDATA[InnoDB Plugin]]></category>

		<guid isPermaLink="false">http://blogs.innodb.com/wp/?p=392</guid>
		<description><![CDATA[Recently, it was reported (see MySQL bug #43660) that &#8220;SHOW INDEXES/ANALYZE does NOT update cardinality for indexes of InnoDB table&#8221;.   The problem appeared to happen only on 64-bit systems, but not 32-bit systems.  The bug turns out to be a case of mistaken identity.  The real criminal here wasn&#8217;t the SHOW [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, it was reported (see MySQL <a href="http://bugs.mysql.com/43660">bug #43660</a>) that &#8220;SHOW INDEXES/ANALYZE does NOT update cardinality for indexes of InnoDB table&#8221;.   The problem appeared to happen only on 64-bit systems, but not 32-bit systems.  The bug turns out to be a case of mistaken identity.  The real criminal here wasn&#8217;t the SHOW INDEXES or the ANALYZE command, but something else entirely.  It wasn&#8217;t specific to 64-bit platforms, either.  Read on for the interesting story about this mystery and its solution &#8230;</p>
<p>InnoDB estimates statistics for the query optimizer by picking random pages from an index. Upon detailed analysis, we found that the algorithm that picks random pages for estimation always picked the same page, thus producing the same result every time. This made it appear that the index cardinality was not updated by ANALYZE TABLE. Going deeper, the reason the algorithm always selected the same page was that the random number generator produced numbers that, when divided by 3, always gave the same remainder (2).</p>
<p>The sampling algorithm selects a random leaf page by starting from the root page and then selecting a random record from it, descending into its child page and so on until it reaches a leaf page. In the particular case that was reported in the bug report, the root page contained only 3 records and the tree height was only 2 (i.e., the leaf pages were all just below the root page).</p>
<p>You can already guess what happened.  The &#8220;random&#8221; numbers generated, not being so random, caused the algorithm to always pick the same record from the root page (the second one) and then descend to the leaf page below it. Every time.  So, the 8 random pages that were sampled in order to get an estimate of the whole picture were in fact the same page, even in isolated ANALYZE TABLE runs.  </p>
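<p>The effect is easy to reproduce in a toy model. The Python sketch below uses a hypothetical generator that, like the flawed one described above, only ever returns numbers of the form 3k+2. It samples 8 leaf pages from a height-2 tree whose root page has 3 records. Which record a given remainder selects is an assumption of the sketch; the point is that the choice never changes.</p>

```python
def flawed_rng(k=0):
    """Hypothetical generator with the flaw described above:
    every value it yields has the form 3k + 2."""
    while True:
        yield 3 * k + 2
        k += 1


def sample_leaf(root_records, rng):
    """Height-2 tree: pick a 'random' record on the root page and
    return the leaf page below it."""
    return root_records[next(rng) % len(root_records)]


root = ["leaf A", "leaf B", "leaf C"]    # root page with 3 records
rng = flawed_rng()
samples = [sample_leaf(root, rng) for _ in range(8)]   # 8 sampled pages
print(len(set(samples)))   # 1: every sample hits the same leaf page
```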
<p>So, clearly there was a problem with the random number generator. But why did this problem seem to appear only on 64-bit platforms? The random number generator, always generating numbers like 3k+2 of type unsigned long, at some point wrapped around 4 billion on 32-bit machines and started generating numbers like 3k+1. On 64-bit machines, where unsigned long is much bigger, this wrap did not occur, although it would have if we had run the test for 1000 years!</p>
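<p>The arithmetic behind the wrap can be checked with a Python sketch. It uses a hypothetical counter that adds a multiple of 3 each step, so the remainder mod 3 never changes on its own, and it reduces modulo 2<sup>32</sup> or 2<sup>64</sup> to mimic a 32-bit or 64-bit unsigned long. This is not InnoDB's generator. Because 2<sup>32</sup> leaves remainder 1 when divided by 3, each 32-bit wrap turns a 3k+2 value into a 3k+1 value. A 64-bit counter needs 2<sup>32</sup> times as many steps before its first wrap.</p>

```python
STEP = 300_000_003   # hypothetical increment: 3 * 100_000_001


def residues(bits, n_steps, start=2):
    """Remainders mod 3 of a counter that adds STEP and wraps at 2**bits."""
    x = start
    out = []
    for _ in range(n_steps):
        x = (x + STEP) % (1 << bits)   # unsigned wrap-around
        out.append(x % 3)
    return out


r32 = residues(32, 20)
r64 = residues(64, 20)
print(r32)   # 2 for the first 14 values, then 1 after the wrap
print(r64)   # 2 every time: no wrap within 20 steps
```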
<p><span id="more-392"></span></p>
<p>So, on 32-bit machines, at some point the first record from the root page was picked instead of the second one, and this caused some changes in the results produced by ANALYZE TABLE. Yet, on 64-bit machines, for all practical purposes, this &#8220;never&#8221; happens.  By only looking at the symptoms, one would get the impression that the flaw existed only on 64-bit machines and that 32-bit systems were ok.</p>
<p>Well, what about the fix?  A possible fix would be to change InnoDB in 64-bit environments to behave the same way it does in 32-bit environments. People who are used to the behavior of InnoDB on 32-bit machines and upgrade to a 64-bit machine might be satisfied, because the problem on 64-bit systems was &#8220;solved&#8221;.  But in reality, this approach in no way would fix the underlying problem.  The real solution is to replace the random number generator with a better one (fully realizing that algorithmic random number generators are only <em>pseudo</em>-random number generators).</p>
<p>Yet even that is not so simple.  Making any change would have caused changes to index cardinality estimations, thereby causing changes in decisions made by the optimizer, resulting in different execution plans &#8230; and different, possibly worse, performance for queries. Because MySQL 5.1 and 5.0 are frozen for such drastic changes, we fixed this bug in the upcoming 1.0.4 release of the <a href="http://www.innodb.com/innodb_plugin/">InnoDB Plugin</a>.</p>
<p>In order not to break existing applications, and since many people wanted a fix for MySQL 5.0 and 5.1, we implemented the fix there under the control of a new configuration parameter (innodb_use_legacy_cardinality_algorithm), which is turned on by default, preserving past behavior. Because the &#8220;right fix&#8221; is to permanently change the random number generator, this configuration parameter is not present in the InnoDB Plugin, and the &#8220;more random&#8221; random number generator is always used.</p>
<p>And that is the end of the case of mistaken identity.   It turns out that it is really hard to generate truly random numbers, hence the title of this blog post.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.innodb.com/wp/2009/04/only-god-can-make-random-selections/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
