Introduction
The InnoDB UNDO entries reside in a special system table called the UNDO log. This log is made up of several segments. These segments are called rollback segments. A segment in InnoDB is similar to what a file would be in a file system,e.g., user tables and indexes are also stored as separate segments within the same tablespace, only their format is different. In that sense there is nothing special about InnoDB UNDO logs. This feature allows storing of the UNDO log across several tablespaces.
Purpose
UNDO logs contain the before image of modified records. There are two types of UNDO records, one for insert and another for updates. The insert UNDO records can be discarded on transaction rollback. The update records are used for rollback, MVCC and by purge. It is because of purge that we can’t just remove the UNDO log records once the UNDO logs are not referenced by any running transaction’s snapshot view (for MVCC). When a transaction is started it is assigned a rollback segment in a round robin scheme. Multiple transactions can be assigned the same rollback segment to write their changes. Up to a maximum of 1024 transactions can use a the same rollback segment. If you have more rollback segments then there is less contention around the rollback segment mutex.
Purge
The purge thread(s) run in the background and they read the UNDO log records from oldest to the latest, up to but not including the oldest active snapshot view in the system. It parses the UNDO log records and for entries that represent delete operations it uses the stored index key to search for the records in both the secondary and primary index and purges the entries, modifying the index tree structure if required. Normal DML operations simply delete mark the records but they don’t physically purge them, unless it is an insert that is being rolled back. This is to avoid expensive tree modifying operations in DML code. Once purge is finished with the UNDO entries it then truncates the UNDO log up to where it has processed the entries. For MVCC user transactions we need to follow the DATA_ROLL_PTR pointer to the UNDO log to build a previous version of the row.
Why did we make this change
All this activity results in lots of RANDOM IO. In order to reduce the of random IO this feature helps by allowing the user to place the UNDO logs anywhere in the file system. This flexibility can be used to place them on an SSD for example.
New configuration variables
- innodb_undo_dir – This should be set to the directory where all the configured UNDO tablespaces will be created. Can be an absolute path too. Default is “.”.
- innodb_undo_tablespaces – This should be set to the number of UNDO tabespaces that you want to created. Note: This must be set at the time of database creation. Once created this cannot be changed.
- innodb_undo_logs (renamed from innodb_rollback_segments) – This should be set to the minimum number of rollback segments that you want. More can be added later but they cannot be removed. However, setting it to a smaller value at runtime will ensure that only the number set will be used when assigning to transactions.
There are a few gotchas here that you should be aware off. The innodb_undo_tablespaces can be set only at the time of database creation and the value must be the same for the lifetime of the database. More UNDO tablespaces cannot be added and existing tablespaces cannot be dropped. We may add better UNDO tablespace management in the future.
Case 1. 128 segments spread across 128 segments.
- innodb_undo_logs = 128
- innodb_undo_tablespaces = 127 (we always create one log in tablespace 0, this is to avoid breaking the system tablespace layout)
Case 2. Start with one rollback segment but may want to create more later and spread them across say 4 UNDO tablespaces
- innodb_undo_tablespaces = 4
- innodb_undo_logs = 1
This will create the UNDO tablespaces but they will be empty, the single rollback segment that is created in tablespace 0 will work. Later if you want to add more tablespaces. You will need to shutdown the server and set the innodb_undo_logs variable to the number that you want. Lets say we add 2 more spaces. We would restart the server with the following parameters:
- innodb_undo_tablespaces = 4
- innodb_undo_logs = 3 - This will create two new rollback segments in the UNDO tablespace 1 and UNDO tablespace 2. Also this will disable the use of the rollback segment in tablespace 0.
If there were some UNDO records in rollback segment 1 they will be processed as normal. Only change is that no new ones will be written to rollback segment 1. It will be skipped when assigning rollback segments to new transactions.
Future work
Mapping of tablespace id to tablespace name is something that we would like to add so that users can change the path of individual tablespaces instead of all being lumped together in innodb_undo_dir. Managing these extra tablespaces is also something that we think is important, especially the adding and dropping of UNDO tablespaces including better management of the UNDO segments.
Conclusion
As we have seen writing to the UNDO logs and accessing them can result in a lot of random IO. One way to reduce that overhead is to place the UNDO tablespaces on SSD. This feature allows the placement of the UNDO logs anywhere on the file system in a directory that is controlled by the configuration variable innodb_undo_dir. Note: this change is not backward compatible, databases created with this feature cannot be opened by older versions of InnoDB. The main problems are that the old code doesn’t open these new system tablespaces before applying REDO and secondly there are some checks that assume space id can only be 0. The file format is not the problem as such.
July 26th, 2011 at 7:41 am
[...] similar to what a file would be in a file system,e.g., user tables and indexes are also stored… Read more… Categories: MySQL Share | Related [...]
January 14th, 2012 at 5:07 am
In the topic of undo records . I was wondering ( because i didnt see it in the code) is there a good place where one could add a new session sys_var that would allow the skipping for reuse or assign of an undo_no. Specfically I would like to make something like
set disable_undo=1;
create table t2
select * from t1 where x y ;
If this query was to fail the table would be removed anyhow, but this lets us specifically tell innodb not to bother with undo segments.
Another example might be
/*copy t1 to t1_old before hand*/
$query=’set disable_undo=1;
update t1 set userid=(select newid from user_conversion where legacyid=userid);’;
$db->query($query) or $db->query(“Rename table t1 to t1_failed;rename table t1_old to t1″);
echo “Change failed reverted to previous version please check logs”;
Allowing you to do an update for a massive data alteration that you know you want to bypass the rollback/undo phase for to be much faster.
In this example I would much rather have a very fast process that even if it fails I could just rename my tables back and report the change window a failure.
Having the undo segments would case the table to be locked and also would require 2x the io and storage as the undo would save all those rows into the shared tablespace.
[Reply]
January 14th, 2012 at 6:06 am
David,
If I understand your comment, you are asking for two things:
1. Transactional DDL for temp tables within a session
2. Avoid UNDO altogether for certain RW queries.
The problems with disabling the UNDO log write on a session basis for all tables touched by that session are:
1. The session that disables writing to the UNDO log will break MVCC for other sessions too, those that are reading and writing the global tables.
2. Session changes will not be garbage collected by purge, leading to table bloat.
It may however be possible to do something for session specific temp tables.
For [1], the DDL changes need to be rolled back on transaction commit/rollback. This will require persistent tracking of the operation so that it can be undone in case of crash recovery in the worst case. Therefore it will require some sort of logging and the UNDO log is IMO a natural fit. In the case of session specific temp tables we can simply drop the temp table during the recover phase and remove the table from the InnoDB data dictionary.
For [2], for temp tables that are session specific it may be possible to avoid UNDO logging. This is because the changes to the temp table are not visible to other transactions, therefore MVCC is irrelevant. The only gotcha is that if there are delete marked records in the temp table, purge will not be able to do any garbage collection, but that may be an acceptable trade off.
Hope that helps.
[Reply]
January 14th, 2012 at 7:22 pm
The issue is when we have a table say 80G in size….
By doing something like
create table xxx select t1.c1,t1.24,t2.c1,t2.c8,/*some other columns*/…. from t1 join t2 on t1.c99=t2.c1;
If this fails the undo system actually is removing 1 row at a time as each row was a new undo entry.
In such a case we should really just drop that table that was created rather than do rollback of the insert segments. This could also be of interest in a case where you are creating a table ten inserting into a table vs create select.
I fail to see how this would break MVCC for all other sessions, as you are simply bypassing adding the undo entries, as you know if err is flagged you could then issue drop table XXX internally to clean up things.
As this is a create table no other sessions would be able to touch the table yet do to the table not row level lock that is occurring.
I felt a sys_var would be simpler than asking for an addition to the language construct. We could limit this action by having something like
CREATE DIRTY TABLE xxxx select t1.c1,t1.24,t2.c1,t2.c8,/*some other columns*/…. from t1 join t2 on t1.c99=t2.c1;
or
CREATE TABLE xxx (…);
INSERT DIRTY into xxx select t1.c1,t1.24,t2.c1,t2.c8,/*some other columns*/…. from t1 join t2 on t1.c99=t2.c1;
however it has typical been the standard to prefer a sys_var to a yacc and language type change.
IF we wanted to be safe and you were doing a UPDATE/DELETE DIRTY you could even have that keyword escalate it to a table level exclusive lock, preventing any other transaction from reading or touching the data.
This really comes back to the fact there are some specific cases for use. Like where you might know you don’t need/want the full ACID level inside of innodb for an expensive action , however you would like to suspend that for a very specific item. Currently the all or nothing approach is a bit to black/white when you have large data-sets.
[Reply]
March 9th, 2012 at 1:10 pm
To storage session best idea is storing it in memcached servers.
It do very good performance and simply use i php scripts.
[Reply]