Back to Blog
Nov 07 2014
Why Mydumper?How many times in your last web development project have you had to load a mysql/mariadb database? If your answer was "too many", and you've been frustrated by how slow the process can be, this article may be for you. "Mydumper" and its companion "myloader", are relatively new open source database utilities written by Domas Mituzas and colleagues, providing built-from-scratch reimplementations of
mysqlload functionality. They clean up a bunch of command line inconsistencies found in the mysql utilities, making them easier to script. They also fix some errors and omissions. But their main feature is speed. By parallelizing the operation of dumping and loading database tables, quite dramatic speedups can be obtained. An additional advantage (note the standard mysql utilities can be coerced to do this too) is that the database dumps are in one-file-per-table format in a directory, making it possible to version control (with appropriate care) your DB. This also makes keeping two large remote databases in sync a little faster (e.g. sync from production to staging) since utilities like
rsynccan compute smaller deltas.
Some example numbersSample Database: a real world mySQL Drupal database with a 1.9GB database dump file, totalling over 15 million rows. Environment: 4 x Xeon(R) CPU E5-2680 v2 @ 2.80GHz running Ubuntu 14.04LTS mysqldump foo > dump.sql: real: 1m5.197s mysql foo < dump.sql: real: 10m5.720s mydumper: real: 0m37.230s myloader: real: 3m17.649s As we can see, for this combination of database and environment, mydumper is roughly twice as fast as mysqldump, and myloader is more than three times faster than importing using mysql as a client. In our sample database, there are several large logging/debugging tables, which are significantly larger than other tables. The task associated to dumping or loading these will determine the running time of the entire job, no matter how many independent threads we have available (c.f. Amdahl's law). With many tables of a similar size, roughly linear speedup (with respect to the number of cores) can be expected. For example, this benchmarking article at palominodb.com suggests that on a 2x6 core machine, myloader was able to restore a 31GB database in 9m12.548s. So, if you find yourself needing to dump and restore very large databases, or dump and restore databases quite frequently (think automated testing and continuous integration), mydumper and myloader might be the tools you've been looking for.
Obtaining and installing MydumperThe good news is that Mydumper is available as a package for most popular Linux distros, convenient for installation on your server. It is also apparently available using MacPorts, though I have not tried this. The bad news is that if you're not a MacPorts users, but you still you want to install mydumper on MacOS X, you'll have to do a bit more work. There used to be a convenient homebrew formula for mydumper, but it ran into issues with the packaging of the homebrew mysql recipe. Nevertheless, brew makes the installation of mydumper relatively straightforward. Here's an overview of the process:
- Download the mydumper source code.
- Use homebrew to install the build tools and pre-requisites.
- Download mysql source code (to get the one file missing from the homebrew installation).
- Move this file to a location visible to the mydumper build process.
- Build mydumper using the supplied build configuration.
- Run some tests to make sure it built without obvious errors.
- Install the resulting mydumper and myloader binaries locally.
MacOS X build in more detailThis is assuming you have Homebrew set up and working properly, and that it installs files in the default location (
- Download the mydumper source code (generic linux) from https://launchpad.net/mydumper.Note: I had success with version 0.6.1, but 0.6.2 introduces some changes that I've not been able to work around yet on MacOS X.
- Unpack to MYDUMPER_SOURCE_DIR
- Make sure you have the necessary dependencies installed. These can all be obtained using Homebrew as follows: % brew tap homebrew/dupes % brew install cmake % brew install glib % brew install zlib % brew install pcre % brew install pgconfig % brew install mysql Note: the brew version of mysql I have is 5.6.21.
- Download the source code Generic Linux (Architecture Independent), Compressed TAR Archive distribution for MySQL
- Unpack to MYSQL_SOURCE_DIR (in this example
mysql-5.6.21) % tar -zxf mysql-5.6.21.tar.gz % ls mysql-5.6.21-tar.gz mysql-5.6.21/
hash.hfrom the MySQL source to the brew mysql installation % cp MYSQL_SOURCE_DIR/include/hash.h /usr/local/Cellar/mysql/5.xx.yy/include/mysql/hash.h
- Go to the MYDUMPER_SOURCE_DIR. Follow the instructions in the README to build mydumper (the build should complete without errors). % cd ~/mydumper-0.6.1 % cmake -DCMAKE_INSTALL_PREFIX=/opt/local \ -DZLIB_LIBRARY=/usr/local/Cellar/zlib/1.2.8/lib/libz.dylib \ -DGLIB2_LIBRARIES=/usr/local/Cellar/glib/2.42.0/lib/libglib-2.0.0.dylib \ -DBUILD_DOCS=OFF % make Replace the paths to
libglibwith the actual paths to the versions installed in
/usr/local/Cellarby homebrew. The option
makewhere to install the resulting files. The given prefix will install the binaries in
/opt/local/bin, a reasonable option if your homebrew is based in
- Try some tests % ./mydumper -V mydumper 0.6.1, built against MySQL 5.6.21 % ./mydumper --database=mysql --user=root \ --password=mysql_rootpw --outputdir=~/dumpertest % ls -d ~/dumpertest /Users/imagex/dumpertest/ Verify that the
dumpertestdirectory is created in your home directory and contains one SQL file per table present in the mysql database. % mysqladmin -u root -p create myloader_testdb Enter password: mysql_rootpw % ./myloader --database=myloader_testdb --directory=/tmp/foobar \ --user=root --password='mysql_rootpw' Explore the
myloader_testdbdatabase to verify that it has the structure and contents you expect.
- If you are satisfied with the results of your test, install the binaries. % sudo make install
Coming up nextIn my next post, I'll show how to make
drushto better integrate with Drupal development processes, and how this can be applied to automated testing and continuous integration workflows.