Difference between revisions of "iRefIndex Maintenance"

From irefindex
(→‎Dumping and Restoring Databases: Added background-capable commands.)
(Added note.)
 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
{{Note|
 +
This page describes maintenance issues related to the code supporting iRefIndex release 9 and earlier.
 +
}}
 +
 
When [[iRefIndex Build Process|building iRefIndex]] there can be a need to manage the database system and to assess whether enough disk space is available. The use of MySQL's single tablespace can lead to a very large single file in the filesystem that can appear to use most of the available space (as reported by <tt>df -h</tt>):
 
When [[iRefIndex Build Process|building iRefIndex]] there can be a need to manage the database system and to assess whether enough disk space is available. The use of MySQL's single tablespace can lead to a very large single file in the filesystem that can appear to use most of the available space (as reported by <tt>df -h</tt>):
  
Line 23: Line 27:
 
* [http://vdachev.net/2007/02/22/mysql-reducing-ibdata1/ MySQL: Reducing ibdata1]
 
* [http://vdachev.net/2007/02/22/mysql-reducing-ibdata1/ MySQL: Reducing ibdata1]
 
* [http://dev.mysql.com/doc/refman/5.0/en/innodb-configuration.html MySQL 5.0 Reference Manual :: 13 Storage Engines :: 13.2 The InnoDB Storage Engine :: 13.2.2 InnoDB Configuration]
 
* [http://dev.mysql.com/doc/refman/5.0/en/innodb-configuration.html MySQL 5.0 Reference Manual :: 13 Storage Engines :: 13.2 The InnoDB Storage Engine :: 13.2.2 InnoDB Configuration]
 +
* [http://dev.mysql.com/doc/refman/5.0/en/multiple-tablespaces.html MySQL 5.0 Reference Manual :: 13 Storage Engines :: 13.2 The InnoDB Storage Engine :: 13.2.2 InnoDB Configuration :: 13.2.2.1 Using Per-Table Tablespaces]
  
 
Unfortunately, reducing the database footprint on the disk requires substantial administrative work in single tablespace mode.
 
Unfortunately, reducing the database footprint on the disk requires substantial administrative work in single tablespace mode.
Line 63: Line 68:
 
nohup mysql -h <host> -u <username> --password=<password> < <dump file> > <log file> 2>&1 &
 
nohup mysql -h <host> -u <username> --password=<password> < <dump file> > <log file> 2>&1 &
 
</pre>
 
</pre>
 +
 +
== Promising but Ultimately Unusable Solutions ==
 +
 +
Alternatives to dumping and dropping all InnoDB databases, deleting the shared tablespace, then restoring the databases do not really exist. Here are some approaches that almost offered an alternative.
 +
 +
=== Moving Tables into Separate Table Files ===
 +
 +
To move existing tables into separate files, the server must be stopped:
 +
 +
mysqladmin -u root --password=<password> shutdown
 +
 +
Then, the <tt>my.cnf</tt> file needs to be changed to enable the single file per table mode. In the <tt>[mysqld]</tt> section, the following line can be added:
 +
 +
innodb_file_per_table
 +
 +
The server can then be started:
 +
 +
mysqld_safe &
 +
 +
At this point, existing tables residing in the single, common table storage file will continue to be accessible, and only new tables will be created in separate files, but it is possible to move existing tables into separate files by either using an <tt>alter table</tt> command on each table or by running <tt>[http://dev.mysql.com/doc/refman/5.0/en/mysqlcheck.html mysqlcheck]</tt> as follows:
 +
 +
mysqlcheck --optimize --databases <database>...
 +
 +
This is discussed in the user comments of the [http://dev.mysql.com/doc/refman/5.0/en/multiple-tablespaces.html Using Per-Table Tablespaces] section of the MySQL documentation.
 +
 +
After this process has completed, a directory for the database will appear inside the directory specified by <tt>datadir</tt> in the defaults (<tt>my.cnf</tt>) file. However, the shared tablespace (<tt>ibdata1</tt>) will not be reduced or removed, and InnoDB will retain state information about the tables, regardless of their location, such that removing the shared tablespace will render the tables inaccessible.
 +
 +
Unfortunately, the InnoDB "data dictionary" maintained in the <tt>ibdata1</tt> file cannot be easily replaced, and although various hacks exist (such as [http://www.chriscalender.com/?p=28 Recovering an InnoDB table from only an .ibd file.]) to attempt to register separate tables with a new, clean "data dictionary", the process seems somewhat speculative.
 +
 +
=== XtraDB-related Tools ===
 +
 +
[http://www.percona.com/docs/wiki/percona-xtrabackup:xtrabackup_manual XtraBackup] and [http://www.percona.com/docs/wiki/percona-xtradb:patch:innodb_expand_import innodb_expand_import] should together be able to handle the migration of .ibd files. Unfortunately, the software does not even build according to a variety of ways of interpreting the build instructions.
 +
 +
== Recommendations ==
 +
 +
A full version of iRefIndex 7.0 requires approximately 150GB in the database. With only two full versions residing in MySQL, approximately two thirds of the 475GB volume will be used, leaving around 150GB space. When importing data into the databases, collections of data in the filesystem require around 60GB per version, meaning that if this data is to be situated locally - for improved performance - there needs to be enough local space for at least one of these collections at build-time. When dumping data, over 100GB is required for the output files, which should also be situated locally if this process is to run at the maximum speed.
 +
 +
Thus, the 475GB volume can be used as follows:
 +
 +
* 150GB - 170GB (one iRefIndex version) for MySQL
 +
* or 300GB - 340GB (two iRefIndex versions) for MySQL
 +
* 100GB - 120GB for a single database dump
 +
* 60GB - 70GB for source data
 +
 +
Clearly, avoiding two co-resident versions of iRefIndex in MySQL is desirable.
 +
 +
Once a database is dropped from MySQL, although this space should be reusable by MySQL, it will not be recovered for general filesystem use. Thus, a policy is required of removing databases that do not need to reside on cn1, and since the resources of cn1 are only really needed when building databases or generating products from databases, should a live instance of a database need to be available, it could arguably reside on another machine.
 +
 +
Removing source data and database dumps after use can also help to avoid disk space issues. Source files can be copied from other machines when needed, removed after use, and any output or dump files can be moved away after being generated.
 +
 +
== Related Useful Resources ==
 +
 +
* [http://www.debian-administration.org/articles/442 Resetting a forgotten MySQL root password]
  
 
[[Category:iRefIndex]]
 
[[Category:iRefIndex]]

Latest revision as of 16:05, 26 October 2012

NoteNote

This page describes maintenance issues related to the code supporting iRefIndex release 9 and earlier.

When building iRefIndex there can be a need to manage the database system and to assess whether enough disk space is available. The use of MySQL's single tablespace can lead to a very large single file in the filesystem that can appear to use most of the available space (as reported by df -h):

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             996M  575M  370M  61% /
/dev/sda3              24G  885M   22G   4% /biotek/cn1/programs
/dev/sda8             475G  443G  8.0G  99% /biotek/cn1/storage
tmpfs                 7.9G     0  7.9G   0% /dev/shm
/dev/sda5             9.5G  151M  8.9G   2% /tmp
/dev/sda6             7.6G  3.2G  4.1G  44% /usr
/dev/sda7             1.5G  437M  941M  32% /var

The file itself will look like this (as reported by ls -lh):

-rwxr-xr-x 1 nobody nobody 440G Feb 17 13:39 /biotek/cn1/storage/mysql/var/ibdata1

The following resources describe the situation and potential solutions:

Unfortunately, reducing the database footprint on the disk requires substantial administrative work in single tablespace mode.

Useful Common Options

When dealing with multiple instances of MySQL it can be useful to remember the --defaults-file option to the MySQL tools. For example:

mysql --defaults-file=/home/mysql/etc/my.cnf -h localhost -u root -p -A

This option is omitted from the examples given below.

Dumping and Restoring Databases

MySQL supports SQL and delimited/tabular dumps. Although the latter is arguably more elegant, it does not offer much help with the task of restoring the tables in the correct order so that foreign key constraints are always satisfied. Thus, only the SQL-based dump format is discussed here.

To dump a database:

mysqldump -h <host> -u <username> -p --databases <database>... > <dump file>

To do so in the background:

nohup mysqldump -h <host> -u <username> --password=<password> --databases <database>... > <dump file> 2> <log file> &

To restore a database:

mysql -h <host> -u <username> -p < <dump file>

To do so in the background:

nohup mysql -h <host> -u <username> --password=<password> < <dump file> > <log file> 2>&1 &

Promising but Ultimately Unusable Solutions

Alternatives to dumping and dropping all InnoDB databases, deleting the shared tablespace, then restoring the databases do not really exist. Here are some approaches that almost offered an alternative.

Moving Tables into Separate Table Files

To move existing tables into separate files, the server must be stopped:

mysqladmin -u root --password=<password> shutdown

Then, the my.cnf file needs to be changed to enable the single file per table mode. In the [mysqld] section, the following line can be added:

innodb_file_per_table

The server can then be started:

mysqld_safe &

At this point, existing tables residing in the single, common table storage file will continue to be accessible, and only new tables will be created in separate files, but it is possible to move existing tables into separate files by either using an alter table command on each table or by running mysqlcheck as follows:

mysqlcheck --optimize --databases <database>...

This is discussed in the user comments of the Using Per-Table Tablespaces section of the MySQL documentation.

After this process has completed, a directory for the database will appear inside the directory specified by datadir in the defaults (my.cnf) file. However, the shared tablespace (ibdata1) will not be reduced or removed, and InnoDB will retain state information about the tables, regardless of their location, such that removing the shared tablespace will render the tables inaccessible.

Unfortunately, the InnoDB "data dictionary" maintained in the ibdata1 file cannot be easily replaced, and although various hacks exist (such as Recovering an InnoDB table from only an .ibd file.) to attempt to register separate tables with a new, clean "data dictionary", the process seems somewhat speculative.

XtraDB-related Tools

XtraBackup and innodb_expand_import should together be able to handle the migration of .ibd files. Unfortunately, the software does not even build according to a variety of ways of interpreting the build instructions.

Recommendations

A full version of iRefIndex 7.0 requires approximately 150GB in the database. With only two full versions residing in MySQL, approximately two thirds of the 475GB volume will be used, leaving around 150GB space. When importing data into the databases, collections of data in the filesystem require around 60GB per version, meaning that if this data is to be situated locally - for improved performance - there needs to be enough local space for at least one of these collections at build-time. When dumping data, over 100GB is required for the output files, which should also be situated locally if this process is to run at the maximum speed.

Thus, the 475GB volume can be used as follows:

  • 150GB - 170GB (one iRefIndex version) for MySQL
  • or 300GB - 340GB (two iRefIndex versions) for MySQL
  • 100GB - 120GB for a single database dump
  • 60GB - 70GB for source data

Clearly, avoiding two co-resident versions of iRefIndex in MySQL is desirable.

Once a database is dropped from MySQL, although this space should be reusable by MySQL, it will not be recovered for general filesystem use. Thus, a policy is required of removing databases that do not need to reside on cn1, and since the resources of cn1 are only really needed when building databases or generating products from databases, should a live instance of a database need to be available, it could arguably reside on another machine.

Removing source data and database dumps after use can also help to avoid disk space issues. Source files can be copied from other machines when needed, removed after use, and any output or dump files can be moved away after being generated.

Related Useful Resources