Recursively fetch dependent rows with mysqldump

July 5th, 2021 by Philip Iezzi 5 min read

How do you fetch a row from a MySQL database recursively, following all foreign key (FK) constraints and fetching all dependent rows as well? That question has bothered me for the last 20+ years as a MySQL administrator. Isn't there a standard tool, some extended mysqldump, that comes with that power? Short answer: no, there is no such tool. I gave up searching. It's just too complex to write a general-purpose tool that works for any kind of database schema.

Bill Karwin sums it up:

It's sometimes easier to write a custom script just for your specific schema, than for someone to write a general-purpose tool that works for everyone's schema.
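That said, the FK graph itself is easy to discover: MySQL's standard information_schema exposes all foreign key relationships, which is a good starting point when writing your own schema-specific script. A minimal sketch (the query is standard; the helper name and the `customers` table are just examples):

```python
# Sketch: build a query that lists all tables whose foreign keys
# reference a given table, via information_schema.KEY_COLUMN_USAGE.
def fk_dependents_query(schema: str, table: str) -> str:
    return (
        "SELECT TABLE_NAME, COLUMN_NAME, REFERENCED_COLUMN_NAME "
        "FROM information_schema.KEY_COLUMN_USAGE "
        f"WHERE REFERENCED_TABLE_SCHEMA = '{schema}' "
        f"AND REFERENCED_TABLE_NAME = '{table}'"
    )

# run this against your server to see which tables depend on `customers`
print(fk_dependents_query('mydb', 'customers'))
```

Walking this graph transitively is where the complexity explodes (cycles, nullable FKs, application-level relations without constraints), which is why a hand-written script per schema remains the pragmatic choice.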

Now you may ask why we actually need this. Simple use case: you have accidentally deleted a customer record, which resulted in the deletion of thousands of related rows spread over 20 different tables. Restoring the full database from backup may not be an option, as other customers have made plenty of changes (DML such as UPDATE/INSERT/DELETE) in the meantime, or a full restore would simply cause an unacceptably long downtime.

So let's assume we can get a full database backup restored on some other MySQL server (maybe locally) where we can grab the relevant data from. We pick our favorite scripting language for such tasks, which is Python. (Really, if you don't use Python for such helper scripts yet, learn it! It's usually a much better choice than plain old Bash or a PHP script: simple, powerful, and you can mostly do without external libraries.) We then end up with something like this:

IMPORTANT: The code below is just a snippet, an example of how you could accomplish this for your own db schema. It is not an all-purpose script that can be used out of the box. Make sure you define your own queries for every related table.

I also assume you have correctly configured your db credentials in .my.cnf or .mylogin.cnf (using mysql_config_editor), so you don't need to provide a username/password to mysqldump.

mysql-recursive-dump.py
#!/usr/bin/env python3
#
# Copyright (c) 2021 Philip Iezzi, Onlime GmbH - https://www.onlime.ch
#

import argparse
import subprocess

header = """SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT;
SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS;
SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION;
SET NAMES utf8mb4;
SET @OLD_TIME_ZONE=@@TIME_ZONE;
SET TIME_ZONE='+00:00';
SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO';
SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0;
"""
footer = """SET TIME_ZONE=@OLD_TIME_ZONE;
SET SQL_MODE=@OLD_SQL_MODE;
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;
SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS;
SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT;
SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS;
SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION;
SET SQL_NOTES=@OLD_SQL_NOTES;
"""

def prepend_line(filename: str, line: str):
    with open(filename, 'r') as f:
        data = f.read()
    with open(filename, 'w') as f:
        f.write(line + data)

def append_line(filename: str, line: str):
    with open(filename, 'a') as fd:
        fd.write(line)

class RecursiveDumper:

    def __init__(self, database: str, model: str, id: int, dryrun: bool = False):
        self.dumpfile = f'dump-{model}-{id}.sql'
        self.database = database
        self.model = model
        self.id = id
        self.dryrun = dryrun

    def dump(self):
        self.dump_customer() if self.model == 'customer' else self.dump_webabo()

    def dump_customer(self):
        custid = self.id
        customers_query = f'customer_id = {custid}'
        webabos_query = f'webabo_id IN (SELECT id FROM webabos WHERE {customers_query})'
        # ... (stripped down)
        queries = {
            'customers': f'id = {custid}',
            'webabos': f'customer_id = {custid}',
            'subdomains': webabos_query,
            'mailaccounts': webabos_query,
            # ...
        }
        self.run_dumps(queries)

    def dump_webabo(self):
        webaboid = self.id
        webabos_query = f'webabo_id = {webaboid}'
        subdomains_query = f'subdomain_id IN (SELECT id FROM subdomains WHERE {webabos_query})'
        addondomains_query = f'addondomain_id IN (SELECT id FROM addondomains WHERE {webabos_query} OR {subdomains_query})'
        dnszones_query = f'{webabos_query} OR {addondomains_query}'
        # ... (stripped down)
        queries = {
            'webabos': f'id = {webaboid}',
            'logins': webabos_query,
            'subdomains': webabos_query,
            'mailaccounts': webabos_query,
            # ...
        }
        self.run_dumps(queries)
    
    def run_dumps(self, queries: dict):
        with open(self.dumpfile, 'w') as f:
            for table, where in queries.items():
                cmd = f"mysqldump --skip-extended-insert --skip-triggers --replace --compact --no-create-info --lock-all-tables --where '{where}' {self.database} {table}"
                print(cmd) if self.dryrun else subprocess.run(cmd, stdout=f, shell=True)
        prepend_line(self.dumpfile, header)
        append_line(self.dumpfile, footer)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('model', choices=['customer', 'webabo'], help='Object to dump')
    parser.add_argument('id', type=int, help='Customer or webabo ID')
    parser.add_argument('--database', type=str, default='mydb', help='Airpane database')
    parser.add_argument('--dryrun', '--dry-run', action='store_true', help='Dry-run without dumping any data')
    args = parser.parse_args()

    dumper = RecursiveDumper(args.database, args.model, args.id, args.dryrun)
    dumper.dump()

This script generates a dump of all related data using selects and subselects in mysqldump --where. All data is merged into a single dump file whose header/footer statements ensure the data can be loaded correctly, most importantly by turning off FK constraint checks during the restore process.

The heart of the script is a series of simple mysqldump invocations that look like this:

$ mysqldump --skip-extended-insert --skip-triggers --replace --compact --no-create-info --lock-all-tables --where '<where_condition>' <db_name> <tbl_name>

Explanation:

  • --skip-extended-insert (optional): This could be omitted, but we prefer having one INSERT statement per row for easier review of the full dump.
  • --skip-triggers: Don't dump any trigger information as we simply want to restore data.
  • --replace (optional): Write REPLACE statements rather than INSERT statements, just in case we dump a bit too much, i.e. rows that were not actually deleted by our accidental fatal deletion.
  • --compact: Only dump INSERT lines, no extras, as we merge all dumps together into one file and care about header/footer statements by ourselves.
  • --no-create-info: Do not write CREATE TABLE statements that create each dumped table.
  • --lock-all-tables: This is needed because our --where conditions use subselects on tables other than the one currently being dumped, so we want a consistent view across all tables.
  • --where: Dump only rows selected by the given WHERE condition.

The above script could be used like this:

$ ./mysql-recursive-dump.py --help
usage: mysql-recursive-dump.py [-h] [--database DATABASE] [--dryrun] {customer,webabo} id

$ ./mysql-recursive-dump.py customer 1234 --dryrun
# only prints out mysqldump commands without dumping any data

$ ./mysql-recursive-dump.py customer 1234
# produces: dump-customer-1234.sql

$ ./mysql-recursive-dump.py webabo 999
# produces: dump-webabo-999.sql

After reviewing the dump, you could upload it to your production server and restore data from it:

$ mysql db_name < dump-customer-1234.sql
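Reviewing a large dump by eye is error-prone. A tiny helper that counts REPLACE statements per table gives a quick plausibility check before restoring; this is a sketch, and the regex assumes mysqldump's default backtick-quoted table names together with --skip-extended-insert (one statement per row):

```python
import re
from collections import Counter

def count_replace_rows(dump_sql: str) -> Counter:
    # with --replace and --skip-extended-insert, mysqldump emits one
    # "REPLACE INTO `table` ..." statement per dumped row
    return Counter(re.findall(r'^REPLACE INTO `(\w+)`', dump_sql, re.MULTILINE))

sample = (
    "REPLACE INTO `customers` VALUES (1234, 'ACME');\n"
    "REPLACE INTO `webabos` VALUES (1, 1234);\n"
    "REPLACE INTO `webabos` VALUES (2, 1234);\n"
)
print(count_replace_rows(sample))  # Counter({'webabos': 2, 'customers': 1})
```

If a table you expected to see is missing from the counts, you probably forgot to add its query to the queries dict.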

In case you have triggers set up on any affected table you're restoring data to, consider removing them before the restore and re-creating them right afterwards.
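If you want to script that step too, one common recipe is to snapshot the trigger definitions with mysqldump before dropping them, and replay that file afterwards. A sketch building the relevant commands (the flag combination dumps only trigger definitions, no schema and no rows; the trigger name below is a made-up example):

```python
def trigger_snapshot_cmd(database: str) -> list:
    # --no-data + --no-create-info leave only the trigger definitions
    # (--triggers is mysqldump's default, spelled out here for clarity)
    return ['mysqldump', '--no-data', '--no-create-info', '--triggers', database]

def drop_trigger_sql(trigger: str) -> str:
    return f'DROP TRIGGER IF EXISTS `{trigger}`;'

print(' '.join(trigger_snapshot_cmd('mydb')))
print(drop_trigger_sql('webabos_after_insert'))
```

Restore the snapshot with `mysql db_name < triggers.sql` once your data restore has completed.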