How many times in your last web development project have you had to
load a mysql/mariadb database? If your answer was "too many", and
you've been frustrated by how slow the process can be, this article
may be for you.
"Mydumper" and its companion "myloader", are relatively new open
source database utilities written by Domas Mituzas and colleagues, providing
built-from-scratch reimplementations of
functionality. They clean up a bunch of command line inconsistencies
found in the mysql utilities, making them easier to script. They also fix
some errors and omissions. But their main feature is speed. By
parallelizing the operation of dumping and loading database tables, quite dramatic
speedups can be obtained.
An additional advantage (note the standard mysql utilities can be
coerced to do this too) is that the database dumps are in
one-file-per-table format in a directory, making it possible to
version control (with appropriate care) your DB. This also
makes keeping two large remote databases in sync a little
faster (e.g. sync from production to staging) since utilities
rsync can compute smaller deltas.
Some example numbers
Sample Database: a real world mySQL Drupal database with a 1.9GB
database dump file, totalling over 15 million rows.
Environment: 4 x Xeon(R) CPU E5-2680 v2 @ 2.80GHz running Ubuntu 14.04LTS
mysql foo < dump.sql:
As we can see, for this combination of database and environment,
mydumper is roughly twice as fast as mysqldump, and myloader is more
than three times faster than importing using mysql as a client.
In our sample database, there are several large logging/debugging
tables, which are significantly larger than other tables. The task
associated to dumping or loading these will determine the running time
of the entire job, no matter how many independent threads we have
available (c.f. Amdahl's law).
With many tables of a similar size, roughly linear speedup (with
respect to the number of cores) can be expected.
For example, this benchmarking article at palominodb.com suggests that on a
2x6 core machine, myloader was able to restore a 31GB database in
So, if you find yourself needing to dump and restore very large
databases, or dump and restore databases quite frequently (think
automated testing and continuous integration), mydumper and myloader
might be the tools you've been looking for.
Obtaining and installing Mydumper
The good news is that Mydumper is available as a package for most
popular Linux distros, convenient for installation on your server.
It is also apparently available using MacPorts, though I have not
The bad news is that if you're not a MacPorts users, but you still
you want to install mydumper on MacOS X, you'll have to do a bit
more work. There used to be a convenient homebrew formula for mydumper, but it ran into
issues with the packaging of the homebrew mysql recipe. Nevertheless,
brew makes the installation of mydumper relatively
straightforward. Here's an overview of the process:
- Download the mydumper source code.
- Use homebrew to install the build tools and pre-requisites.
- Download mysql source code (to get the one file missing from the homebrew installation).
- Move this file to a location visible to the mydumper build process.
- Build mydumper using the supplied build configuration.
- Run some tests to make sure it built without obvious errors.
- Install the resulting mydumper and myloader binaries locally.
MacOS X build in more detail
This is assuming you have Homebrew set up
and working properly, and that it installs files in the default
- Download the mydumper source code (generic linux) from https://launchpad.net/mydumper.
Note: I had success with version 0.6.1, but 0.6.2 introduces some
changes that I've not been able to work around yet on MacOS X.
- Unpack to MYDUMPER_SOURCE_DIR
- Make sure you have the necessary dependencies installed. These can
all be obtained using Homebrew as follows:
% brew tap homebrew/dupes
% brew install cmake
% brew install glib
% brew install zlib
% brew install pcre
% brew install pgconfig
% brew install mysql
Note: the brew version of mysql I have is 5.6.21.
- Download the source code Generic Linux (Architecture
Independent), Compressed TAR Archive distribution for MySQL
- Unpack to MYSQL_SOURCE_DIR (in this example
% tar -zxf mysql-5.6.21.tar.gz
hash.hfrom the MySQL source to the brew mysql
% cp MYSQL_SOURCE_DIR/include/hash.h /usr/local/Cellar/mysql/5.xx.yy/include/mysql/hash.h
- Go to the MYDUMPER_SOURCE_DIR. Follow the instructions in the README to build mydumper (the build
should complete without errors).
% cd ~/mydumper-0.6.1
% cmake -DCMAKE_INSTALL_PREFIX=/opt/local \
Replace the paths to
libglibwith the actual paths to the versions installed in
install the resulting files. The given prefix will install the
/opt/local/bin, a reasonable option if
your homebrew is based in
- Try some tests
% ./mydumper -V
mydumper 0.6.1, built against MySQL 5.6.21
% ./mydumper --database=mysql --user=root \
% ls -d ~/dumpertest
Verify that the
dumpertestdirectory is created in your home
directory and contains one SQL file per table present in the mysql
% mysqladmin -u root -p create myloader_testdb
Enter password: mysql_rootpw
% ./myloader --database=myloader_testdb --directory=/tmp/foobar \
myloader_testdbdatabase to verify that
it has the structure and contents you expect.
- If you are satisfied with the results of your test, install the binaries.
% sudo make install
Coming up next
In my next post, I'll show how to make
drush to better integrate with Drupal development processes, and
how this can be applied to automated testing and continuous
I'm grateful to Mark Labrecque for taking time to review the
instructions I've presented here, and offering useful improvements.