Using mysqldump with WHERE clause but WHERE clause only applying to specific table?

by 0xC0000022L   Last Updated October 19, 2019 22:06

This question is related to a question I posted on another StackExchange site.

Question: How to apply the --where argument only to a specific table when mysqldump-ing a whole database?


So I have a database containing the contents of a WordPress blog. The table names look like this:

MariaDB [blog]> show tables;
+-------------------------+
| Tables_in_blog          |
+-------------------------+
| wp_commentmeta          |
| wp_comments             |
| wp_links                |
| wp_options              |
| wp_postmeta             |
| wp_posts                |
| wp_term_relationships   |
| wp_term_taxonomy        |
| wp_termmeta             |
| wp_terms                |
| wp_usermeta             |
| wp_users                |
+-------------------------+
12 rows in set (0.00 sec)

There is one table here (wp_comments) whose contents I'd like to filter during backups, without dropping said records from the actual table; i.e. deleting them and then taking a backup isn't an option!

MariaDB [blog]> describe wp_comments;
+----------------------+---------------------+------+-----+---------------------+----------------+
| Field                | Type                | Null | Key | Default             | Extra          |
+----------------------+---------------------+------+-----+---------------------+----------------+
| comment_ID           | bigint(20) unsigned | NO   | PRI | NULL                | auto_increment |
| comment_post_ID      | bigint(20) unsigned | NO   | MUL | 0                   |                |
| comment_author       | tinytext            | NO   |     | NULL                |                |
| comment_author_email | varchar(100)        | NO   | MUL |                     |                |
| comment_author_url   | varchar(200)        | NO   |     |                     |                |
| comment_author_IP    | varchar(100)        | NO   |     |                     |                |
| comment_date         | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| comment_date_gmt     | datetime            | NO   | MUL | 0000-00-00 00:00:00 |                |
| comment_content      | text                | NO   |     | NULL                |                |
| comment_karma        | int(11)             | NO   |     | 0                   |                |
| comment_approved     | varchar(20)         | NO   | MUL | 1                   |                |
| comment_agent        | varchar(255)        | NO   |     |                     |                |
| comment_type         | varchar(20)         | NO   |     |                     |                |
| comment_parent       | bigint(20) unsigned | NO   | MUL | 0                   |                |
| user_id              | bigint(20) unsigned | NO   |     | 0                   |                |
+----------------------+---------------------+------+-----+---------------------+----------------+
15 rows in set (0.00 sec)

The filtering condition should be where comment_approved not in ('spam', 'trash'). For comparison of the number of comments:

MariaDB [blog]> select count(*) from wp_comments where comment_approved not in ('spam', 'trash');
+----------+
| count(*) |
+----------+
|     1392 |
+----------+
1 row in set (0.01 sec)

MariaDB [blog]> select count(*) from wp_comments;
+----------+
| count(*) |
+----------+
|    12477 |
+----------+
1 row in set (0.00 sec)

Roughly a 1:8 ratio of ham to spam among the comments. Anyway, the gist is that I'd like to thin out my backups a bit: at this moment the spam/trash comments alone blow up the SQL dump of the wp_comments table by roughly 42 MiB, so I'd like to exclude them from the hourly backups altogether:

42.957 MiB      ./spam.sql
928.000 KiB     ./nospam.sql

Now I understand that I could write myself some ugly shell script to achieve what I want by basically:

  1. enumerating the tables of the DB to be dumped
  2. dumping the tables one by one, all the while
    • applying WHERE clauses specific to the respective tables

However, if I am not careful (e.g. if I hardcoded the list of tables), this could turn out to be a rather brittle (and lossy) backup solution down the road, for example when WordPress introduces a new table and I only find out months later ...
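
For illustration, the brittle version I'm worried about would look roughly like this (a hypothetical sketch; connection options are omitted, the output file name is made up, and a table added by a future WordPress version would silently be missing from the backup):

#!/bin/bash
# Hypothetical sketch of the brittle approach: the table list is hardcoded.
for t in wp_commentmeta wp_comments wp_links wp_options wp_postmeta \
         wp_posts wp_term_relationships wp_term_taxonomy wp_termmeta \
         wp_terms wp_usermeta wp_users; do
        if [ "$t" = "wp_comments" ]; then
                # the one table that gets a table-specific WHERE clause
                mysqldump --where="comment_approved not in ('spam', 'trash')" blog "$t"
        else
                mysqldump blog "$t"
        fi
done > blog.sql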

I guess what I'd need would be something along the lines of a command line option that allows giving the predicate of a WHERE clause in a table-specific fashion, e.g. in a text file with tablename: predicate entries (instead of the global --where=predicate) ... or a more forgiving --where option that lets me say comment_approved not in ('spam', 'trash') and doesn't barf when the columns in the predicate don't exist on a given table. However, I can see how the latter would be impossible for more complex WHERE clauses (where t1.colx = t2.colz). The last option that comes to mind would be something like --where:tablename=predicate, which would complicate command line parsing and possibly introduce ambiguity as well.
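
For comparison, this is what the existing global --where does today, as far as I can tell: the predicate is applied to every single table, so the dump aborts with an unknown-column error as soon as mysqldump reaches a table (e.g. wp_posts) that has no comment_approved column:

# The predicate below is applied to EVERY table in the dump; this
# fails once mysqldump hits a table without a comment_approved column:
mysqldump --where="comment_approved not in ('spam', 'trash')" blog > blog.sql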

Does anyone have a concise (i.e. not involving too much scripting on my part) solution for the problem of backing up one whole database while filtering out rows for a specific table based on a WHERE clause?

NB: I am using (on an Ubuntu 16.04 machine):

$ mysql --version
mysql  Ver 15.1 Distrib 10.0.38-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
Tags: mysqldump


Answers 1


Table names that are added in (future) versions of WordPress should not be a problem when you script this solution.

If I take a look at my 'test' database with:

SELECT 
  TABLE_NAME, 
  CASE TABLE_NAME WHEN 'big' THEN "i<10" ELSE "1" END AS filter 
FROM information_schema.tables 
WHERE TABLE_SCHEMA='test';

I see something like this:

+---------------------------+--------+
| TABLE_NAME                | filter |
+---------------------------+--------+
| a                         | 1      |
| abd                       | 1      |
| below                     | 1      |
| big                       | i<10   |
| categories                | 1      |

These results could also be stored in a table (e.g. MY_BACKUP_FILTERS), and a real script guru could easily create a script around this output.
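
A minimal sketch of what that could look like (the name MY_BACKUP_FILTERS and the 'test' schema are just the examples from above):

-- Filter table: one row per table that needs a predicate.
CREATE TABLE MY_BACKUP_FILTERS (
  TABLE_NAME varchar(64) PRIMARY KEY,
  FILTER     varchar(255) NOT NULL
);

INSERT INTO MY_BACKUP_FILTERS VALUES ('big', 'i<10');

-- Every table in the schema with its predicate; '1' keeps all rows.
SELECT t.TABLE_NAME,
       COALESCE(f.FILTER, '1') AS filter
FROM information_schema.tables t
LEFT JOIN MY_BACKUP_FILTERS f ON f.TABLE_NAME = t.TABLE_NAME
WHERE t.TABLE_SCHEMA = 'test';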

The surrounding script could then look something like this:

#!/bin/bash

DATABASE=test
HOST=******
USERNAME=luuk
PASSWORD=*******

# One row per table: the table name plus the WHERE predicate to dump with.
# '1' (always true) keeps all rows; add more WHEN branches for more filters.
FILTERSQL="SELECT
  TABLE_NAME,
  CASE TABLE_NAME
  WHEN 'big' THEN 'i<10'
  ELSE '1' END AS filter
FROM information_schema.tables
WHERE TABLE_SCHEMA='$DATABASE';"

TMP=$(mktemp)
trap 'rm -f "$TMP"' EXIT

# -B -N: batch mode without column headers -> "table<TAB>filter" lines
mysql -h "$HOST" -u "$USERNAME" -p"$PASSWORD" -B -N -e "$FILTERSQL" > "$TMP"

while IFS=$'\t' read -r table filter
do
        #echo "TABLE:$table, FILTER:$filter"
        mysqldump -h "$HOST" -u "$USERNAME" -p"$PASSWORD" \
                -w "$filter" "$DATABASE" "$table"
done < "$TMP"

This script will dump all tables in full, except for the table 'big', for which only the records matching WHERE i<10 are dumped.
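
Since all dumps go to standard output, you would run it with a redirect, e.g. (file name illustrative):

./filtered_dump.sh > test_filtered.sql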

Adding or changing 'WHEN ... THEN ...' branches in the CASE expression of FILTERSQL will add filters for other tables.
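
For the wp_comments case from the question, the filter query would look something like this (a sketch; only the schema name and the WHEN branch change):

SELECT
  TABLE_NAME,
  CASE TABLE_NAME
  WHEN 'wp_comments' THEN "comment_approved not in ('spam', 'trash')"
  ELSE '1' END AS filter
FROM information_schema.tables
WHERE TABLE_SCHEMA='blog';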

Luuk
October 20, 2019 09:09 AM
