Why Mydumper?

How many times in your last web development project have you had to
load a mysql/mariadb database? If your answer was "too many", and
you've been frustrated by how slow the process can be, this article
may be for you.

"Mydumper" and its companion "myloader", are relatively new open
source database utilities written by Domas Mituzas and colleagues, providing
built-from-scratch reimplementations of mysqldump and mysql load
functionality. They clean up a bunch of command line inconsistencies
found in the mysql utilities, making them easier to script. They also fix
some errors and omissions. But their main feature is speed. By
parallelizing the operation of dumping and loading database tables, quite dramatic
speedups can be obtained.

An additional advantage (note the standard mysql utilities can be
coerced to do this too) is that the database dumps are in
one-file-per-table format in a directory, making it possible to
version control (with appropriate care) your DB. This also
makes keeping two large remote databases in sync a little
faster (e.g. sync from production to staging) since utilities
like rsync can compute smaller deltas.

Some example numbers

Sample Database: a real world mySQL Drupal database with a 1.9GB
database dump file, totalling over 15 million rows.

Environment: 4 x Xeon(R) CPU E5-2680 v2 @ 2.80GHz running Ubuntu 14.04LTS

mysqldump foo > dump.sql:
real: 1m5.197s

mysql foo < dump.sql:
real: 10m5.720s

real: 0m37.230s

real: 3m17.649s


As we can see, for this combination of database and environment,
mydumper is roughly twice as fast as mysqldump, and myloader is more
than three times faster than importing using mysql as a client.

In our sample database, there are several large logging/debugging
tables, which are significantly larger than other tables. The task
associated to dumping or loading these will determine the running time
of the entire job, no matter how many independent threads we have
available (c.f. Amdahl's law).
With many tables of a similar size, roughly linear speedup (with
respect to the number of cores) can be expected.

For example, this benchmarking article at palominodb.com suggests that on a
2x6 core machine, myloader was able to restore a 31GB database in

So, if you find yourself needing to dump and restore very large
databases, or dump and restore databases quite frequently (think
automated testing and continuous integration), mydumper and myloader
might be the tools you've been looking for.

Obtaining and installing Mydumper

The good news is that Mydumper is available as a package for most
popular Linux distros, convenient for installation on your server.
It is also apparently available using MacPorts, though I have not
tried this.

The bad news is that if you're not a MacPorts users, but you still
you want to install mydumper on MacOS X, you'll have to do a bit
more work. There used to be a convenient homebrew formula for mydumper, but it ran into
issues with the packaging of the homebrew mysql recipe. Nevertheless,
brew makes the installation of mydumper relatively
straightforward. Here's an overview of the process:

  1. Download the mydumper source code.
  2. Use homebrew to install the build tools and pre-requisites.
  3. Download mysql source code (to get the one file missing from the homebrew installation).
  4. Move this file to a location visible to the mydumper build process.
  5. Build mydumper using the supplied build configuration.
  6. Run some tests to make sure it built without obvious errors.
  7. Install the resulting mydumper and myloader binaries locally.

MacOS X build in more detail

This is assuming you have Homebrew set up
and working properly, and that it installs files in the default
location (/usr/local)

  1. Download the mydumper source code (generic linux) from https://launchpad.net/mydumper.

    Note: I had success with version 0.6.1, but 0.6.2 introduces some
    changes that I've not been able to work around yet on MacOS X.

  3. Make sure you have the necessary dependencies installed. These can
    all be obtained using Homebrew as follows: 

    % brew tap homebrew/dupes
    % brew install cmake
    % brew install glib
    % brew install zlib
    % brew install pcre
    % brew install pgconfig
    % brew install mysql

    Note: the brew version of mysql I have is 5.6.21.

  4. Download the source code Generic Linux (Architecture
    Independent), Compressed TAR Archive distribution for MySQL
  5. Unpack to MYSQL_SOURCE_DIR (in this example mysql-5.6.21) % tar -zxf mysql-5.6.21.tar.gz
    % ls
  6. Copy hash.h from the MySQL source to the brew mysql

    % cp MYSQL_SOURCE_DIR/include/hash.h /usr/local/Cellar/mysql/5.xx.yy/include/mysql/hash.h

  7. Go to the MYDUMPER_SOURCE_DIR. Follow the instructions in the README to build mydumper (the build
    should complete without errors). 

    % cd ~/mydumper-0.6.1
    % cmake -DCMAKE_INSTALL_PREFIX=/opt/local \
    -DZLIB_LIBRARY=/usr/local/Cellar/zlib/1.2.8/lib/libz.dylib \
    -DGLIB2_LIBRARIES=/usr/local/Cellar/glib/2.42.0/lib/libglib-2.0.0.dylib \
    % make

    Replace the paths to libz and libglib with the actual paths to the versions installed in
    /usr/local/Cellar by homebrew.

    The option MAKE_INSTALL_PREFIX directs make where to
    install the resulting files. The given prefix will install the
    binaries in /opt/local/bin, a reasonable option if
    your homebrew is based in /usr/local.

  8. Try some tests % ./mydumper -V
    mydumper 0.6.1, built against MySQL 5.6.21
    % ./mydumper --database=mysql --user=root \
    --password=mysql_rootpw --outputdir=~/dumpertest
    % ls -d ~/dumpertest

    Verify that the dumpertest directory is created in your home
    directory and contains one SQL file per table present in the mysql

    % mysqladmin -u root -p create myloader_testdb
    Enter password: mysql_rootpw
    % ./myloader --database=myloader_testdb --directory=/tmp/foobar \
    --user=root --password='mysql_rootpw'

    Explore the myloader_testdb database to verify that
    it has the structure and contents you expect.

  9. If you are satisfied with the results of your test, install the binaries. % sudo make install

Coming up next

In my next post, I'll show how to make mydumper and myloader available
via drush to better integrate with Drupal development processes, and
how this can be applied to automated testing and continuous
integration workflows.


I'm grateful to Mark Labrecque for taking time to review the
instructions I've presented here, and offering useful improvements.