Using Mydumper with Drush
In my first post about mydumper
I explained why developers would want to use it (it's fast!), and how to build it for MacOS. Since that post, there's been some positive developments. First, as pointed out by Max Bube in the comments, the latest stable version of mydumper (0.6.2) now builds cleanly on MacOS 10.9 and later. Secondly, there is a pull request in for a Homebrew formula, so I'm hoping that soon installation via brew
will be a snap again for Mac users.
Integration with Drush
In this post I will describe how to extend drush
to be able to dump and restore database using mydumper and myloader.
As a Drupal developer, you're no doubt familiar with using drush to make backups of your site databases.
drush sql-dump > ./mydb-dump.sql
The beauty of drush is that it handles all of the ugly details of the database connections and remote credentials, by looking up a given site's configuration from settings.php
and drush "rc" files.
Our goal is to enable drush to use mydumper rather than the standard mysql utilities in commands such as sql-dump
. We're going to set this up for Drush version 7.0.
We need to address the following differences between the standard MySQL tools and mydumper.
- Mydumper creates a directory containing the backup, while mysqldump generates a single dumpfile.
- Mydumper has no direct provisions for skipping tables in its dump.
- Mydumper requires RELOAD privileges on the DB server.
- Mydumper seems to have no way to generate a schema-only dump.
Additionally, the names and presentations of command line arguments differ between the two tools sets.
Implementation
The latest version of this code is available in my fork on GitHub (on this branch). Please fork this and give it a try. I'm looking forward to hearing about your experiences. I hope to be submitting this as a pull request against the official repo once I've finished tweaking and writing some more tests.
To deal with the issues we've identified we will need to override the dumpCmd()
, dumpFile()
, and dump()
methods in the Sql drivers. We'll also implement new loadCmd()
and load()
methods to support loading backups via myloader.
Let's consider the permissions issue first. In order to create backups, mydumper needs RELOAD privileges on the mysql server. This is a global (not per-table) privilege, unlike most other privileges. So, at the mysql CLI (or via your favourite GUI), create a user mydumper (@localhost) on the mysql server, and give it global RELOAD privileges, as well as the usual Drupal mysql DB privileges. Give it a good password.
Now comes the interesting part. Most developers never have to look beyond the standard 'default' database target in their sites' settings.php
files. This is a perfect application for creating an additional database target for your database. We're going to be accessing the same database, but using a different driver and a different user. In the following example, I've added the target 'backup'
. This is identical to the default
target, except for the database user (we use the mydumper
user we just created) and the database driver, which we'll specify as 'mydumper'
.
<?php $databases['default']['default'] = array( 'driver' => 'mysql', 'database' => 'dumpertest', 'username' => 'dbuser', 'password' => 'abc123', 'host' => 'localhost', 'prefix' => '', 'collation' => 'utf8_general_ci', 'pdo' => array( PDO::MYSQL_ATTR_LOCAL_INFILE=>1, ), ); $databases['default']['backup'] = array( 'driver' => 'mydumper', 'database' => 'dumpertest', 'username' => 'mydumper', 'password' => '123xyz', 'host' => 'localhost', 'prefix' => '', 'collation' => 'utf8_general_ci', 'pdo' => array( PDO::MYSQL_ATTR_LOCAL_INFILE=>1, ), );
Where does this mydumper
driver live? Well, we have to create it. While experimenting with this, I've used the drush
installed by homebrew, and it provides version 7.0.0
by default. It gets installed in /usr/local/Cellar/drush/7.0.0
for me, and within this directory, libexec/lib/Drush/Sql/
contains the SQL drivers.
Drush 7 has a beautiful, clean, object oriented design. Sqlmysql.php
provides the class Sqlmysql
which extends the base class SQL
. Drush finds the appropriate classes from the drivers defined in the database targets in settings.php
. So providing a class Sqlmydumper
extending Sqlmysql
in a fileSqlmydumper.php
in the Sql drivers directory does what we need.
Our jumping off point is the drush_sql_dump
command, which implements sql-dump
. This can be found in libexec/commands/sql/sql.
.
/** * Command callback. Outputs the entire Drupal database in SQL format using mysqldump or equivalent. */ function drush_sql_dump() { drush_sql_bootstrap_further(); $sql = drush_sql_get_class(); return $sql->dump(drush_get_option('result-file', FALSE)); }
We see that we need to implement dump()
. Tweaking the definition from SqlBase.php
, we arrive at the following. Note that the dumpFile()
method takes care of generating a default dump directory if no filepath is specified. This takes care of the first issue in our list above.
/* * Dump the database using mydumper and return the path to the resulting dump directory. * * @param string|bool @file * The path where the dump directory should be created. If TRUE, generate a path * based on usual backup directory and current date. */ public function dump($output_dir = '') { $table_selection = $this->get_expanded_table_selection(); $output_dir = drush_escapeshellarg($this->dumpFile($output_dir)); $cmd = $this->dumpCmd($table_selection, $output_dir); // Avoid the php memory of the $output array in drush_shell_exec(). if (!$return = drush_op_system($cmd)) { if ($output_dir) { drush_log(dt('Database dump saved to !path', array('!path' => $output_dir)), 'success'); drush_backend_set_result($output_dir); } } else { return drush_set_error('DRUSH_SQL_DUMP_FAIL', 'Database dump failed'); } }
Most of the work goes on in dumpCmd()
. We need to optionally take care of 'structure-only' tables. These can be specified in a drushrc.php
file, as in this example:
/** * List of tables whose *data* is skipped by the 'sql-dump' and 'sql-sync' * commands when the "--structure-tables-key=common" option is provided. * You may add specific tables to the existing array or add a new element. */ $options['structure-tables']['common'] = array( 'cache', 'cache_*', 'history', 'search_*', 'sessions', 'watchdog', );
We use mysqldump
to generate a schema-only dump of these 'structure' tables. Also, we explicitly generate the list of tables to dump, excluding structure and 'skipped' tables.
public function dumpCmd($table_selection, $output_dir = '') { $parens = FALSE; $skip_tables = $table_selection['skip']; $structure_tables = $table_selection['structure']; $tables = $table_selection['tables']; $skip_tables = array_merge($structure_tables, $skip_tables); $data_only = drush_get_option('data-only'); $exec = 'mydumper '; // Start building up arguments for the command // Silent operation. $extra = " --verbose 0 --build-empty-files "; $output_dir = drush_escapeshellarg($output_dir); if (!empty($output_dir)) { $extra .=" --outputdir $output_dir "; } // Mydumper can't read credentials from a file, yet $exec .= $this->creds(FALSE); if (drush_get_option('gzip')) { $extra .= ' --compress'; } if (isset($data_only)) { $extra .= ' --no-schemas'; } $exec .= $extra; if (!empty($tables)) { $exec .= ' --tables-list ' . implode(',', $tables); } else { $parens = TRUE; $tables = array_diff(parent::listTables(), $skip_tables); $exec .= ' --tables-list '. implode(',', $tables); // Output_dir is not empty in default case where this is called from dump(). if (empty($output_dir)) { $output_dir = '.'; } // Run mysqldump and append output if we need some structure only tables. if (!empty($structure_tables)) { $only_db_name = str_replace('--database=', ' ', $this->creds()); $extra = ' --no-autocommit --single-transaction --opt -Q'; // NB: myloader is fussy about the files in $output_dir. // Hence schema_sql is ignored, but schema.sql causes a segfault. $exec .= " && mysqldump " . $only_db_name . " --no-data $extra " . implode(' ', $structure_tables) . " > $output_dir/schema_sql"; } } return $parens ? "($exec)" : $exec; }
Restoring (Loading) Databases
So far our integration has looked at the sql-dump
command. What about sql-load
? It turns out that there is no such drush command. I think this is partly because sql-cli
does the job pretty well for mysql dump files. However, in the case of restoring with myloader
, we need to be able to deal with the dump directories created by mydumper
. So a symmetrical sql-load
command makes sense. We've implemented this in the file sqlload.drush.inc
. This relies on load()
and associated methods from the SQLmydumper
driver.
The main thing to take care of in loading a drush-generated dump directory is the optional 'structure-only' tables. We need to reverse the process used to create the schema_sql
file and create the required empty tables from this schema file.
Next up: I still need to improve the help and examples, add tests, decide how to pass additional parameters in to mydumper
and myloader
. It would be great to get this to work nicely with sql-sync
.
Conclusions
Mydumper and myloader are great time-savers, but that's only the start of the fun. Having a one-file-per-table dump format can be useful for doing "database diffs", for versioning database-stored configuration, and other tasks. I hope that having these tools more conveniently available via drush will speed up and simplify your workflows and inspire some interesting experimentation. Happy Drupal Hacking!