iRefIndex Maintenance
Note |
This page describes maintenance issues related to the code supporting iRefIndex release 9 and earlier. |
When building iRefIndex there can be a need to manage the database system and to assess whether enough disk space is available. The use of MySQL's single tablespace can lead to a very large single file in the filesystem that can appear to use most of the available space (as reported by df -h):
Filesystem Size Used Avail Use% Mounted on /dev/sda1 996M 575M 370M 61% / /dev/sda3 24G 885M 22G 4% /biotek/cn1/programs /dev/sda8 475G 443G 8.0G 99% /biotek/cn1/storage tmpfs 7.9G 0 7.9G 0% /dev/shm /dev/sda5 9.5G 151M 8.9G 2% /tmp /dev/sda6 7.6G 3.2G 4.1G 44% /usr /dev/sda7 1.5G 437M 941M 32% /var
The file itself will look like this (as reported by ls -lh):
-rwxr-xr-x 1 nobody nobody 440G Feb 17 13:39 /biotek/cn1/storage/mysql/var/ibdata1
The following resources describe the situation and potential solutions:
- MySQL Forums :: Database Administration :: how to shrink a MySQL database
- MySQL: Reducing ibdata1
- MySQL 5.0 Reference Manual :: 13 Storage Engines :: 13.2 The InnoDB Storage Engine :: 13.2.2 InnoDB Configuration
- MySQL 5.0 Reference Manual :: 13 Storage Engines :: 13.2 The InnoDB Storage Engine :: 13.2.2 InnoDB Configuration :: 13.2.2.1 Using Per-Table Tablespaces
Unfortunately, reducing the database footprint on the disk requires substantial administrative work in single tablespace mode.
Contents
Useful Common Options
When dealing with multiple instances of MySQL it can be useful to remember the --defaults-file option to the MySQL tools. For example:
mysql --defaults-file=/home/mysql/etc/my.cnf -h localhost -u root -p -A
This option is omitted from the examples given below.
Dumping and Restoring Databases
MySQL supports SQL and delimited/tabular dumps. Although the latter is arguably more elegant, it does not offer much help with the task of restoring the tables in the correct order so that foreign key constraints are always satisfied. Thus, only the SQL-based dump format is discussed here.
To dump a database:
mysqldump -h <host> -u <username> -p --databases <database>... > <dump file>
To do so in the background:
nohup mysqldump -h <host> -u <username> --password=<password> --databases <database>... > <dump file> 2> <log file> &
To restore a database:
mysql -h <host> -u <username> -p < <dump file>
To do so in the background:
nohup mysql -h <host> -u <username> --password=<password> < <dump file> > <log file> 2>&1 &
Promising but Ultimately Unusable Solutions
Alternatives to dumping and dropping all InnoDB databases, deleting the shared tablespace, then restoring the databases do not really exist. Here are some approaches that almost offered an alternative.
Moving Tables into Separate Table Files
To move existing tables into separate files, the server must be stopped:
mysqladmin -u root --password=<password> shutdown
Then, the my.cnf file needs to be changed to enable the single file per table mode. In the [mysqld] section, the following line can be added:
innodb_file_per_table
The server can then be started:
mysqld_safe &
At this point, existing tables residing in the single, common table storage file will continue to be accessible, and only new tables will be created in separate files, but it is possible to move existing tables into separate files by either using an alter table command on each table or by running mysqlcheck as follows:
mysqlcheck --optimize --databases <database>...
This is discussed in the user comments of the Using Per-Table Tablespaces section of the MySQL documentation.
After this process has completed, a directory for the database will appear inside the directory specified by datadir in the defaults (my.cnf) file. However, the shared tablespace (ibdata1) will not be reduced or removed, and InnoDB will retain state information about the tables, regardless of their location, such that removing the shared tablespace will render the tables inaccessible.
Unfortunately, the InnoDB "data dictionary" maintained in the ibdata1 file cannot be easily replaced, and although various hacks exist (such as Recovering an InnoDB table from only an .ibd file.) to attempt to register separate tables with a new, clean "data dictionary", the process seems somewhat speculative.
XtraBackup and innodb_expand_import should together be able to handle the migration of .ibd files. Unfortunately, the software does not even build according to a variety of ways of interpreting the build instructions.
Recommendations
A full version of iRefIndex 7.0 requires approximately 150GB in the database. With only two full versions residing in MySQL, approximately two thirds of the 475GB volume will be used, leaving around 150GB space. When importing data into the databases, collections of data in the filesystem require around 60GB per version, meaning that if this data is to be situated locally - for improved performance - there needs to be enough local space for at least one of these collections at build-time. When dumping data, over 100GB is required for the output files, which should also be situated locally if this process is to run at the maximum speed.
Thus, the 475GB volume can be used as follows:
- 150GB - 170GB (one iRefIndex version) for MySQL
- or 300GB - 340GB (two iRefIndex versions) for MySQL
- 100GB - 120GB for a single database dump
- 60GB - 70GB for source data
Clearly, avoiding two co-resident versions of iRefIndex in MySQL is desirable.
Once a database is dropped from MySQL, although this space should be reusable by MySQL, it will not be recovered for general filesystem use. Thus, a policy is required of removing databases that do not need to reside on cn1, and since the resources of cn1 are only really needed when building databases or generating products from databases, should a live instance of a database need to be available, it could arguably reside on another machine.
Removing source data and database dumps after use can also help to avoid disk space issues. Source files can be copied from other machines when needed, removed after use, and any output or dump files can be moved away after being generated.