Mysqldumpsplitter can extract a single database, a single table, all databases, all tables, or tables matching a regular expression from a mysqldump file. It can also write compressed SQL output to a specified directory.
The tool has undergone a complete rewrite, with many new features added. Please follow the GitHub repository for future changes and improvements.
Download the mysqldumpsplitter utility from the GitHub repository.
I have written a separate blog post on getting started with mysqldumpsplitter; it serves as a guide to the operations the script supports. Read the mysqldumpsplitter recipe blog post here.
The current mysqldumpsplitter version is 6.1.
Watch the video of mysqldumpsplitter in action (Ver. 4.0):
– I will provide more videos covering the latest version.
Suggestions are welcome. I’ll update it whenever time permits.
Download mysqldumpsplitter script:
MySQLDump Splitter
The shell script follows, though downloading it is preferred:
#!/bin/sh
# Current Version: 6.1
# Extracts database, table, all databases, all tables or tables matching on regular expression from the mysqldump.
# Includes output compression options.
# By: Kedar Vaijanapurkar
# Website: http://kedar.nitty-witty.com
# Original Blog Post: http://kedar.nitty-witty.com/mydumpsplitter-extract-tables-from-mysql-dump-shell-script
# Follow GIT: https://github.com/kedarvj/mysqldumpsplitter/
## Version Info:
# Ver. 1.0: Feb 11, 2010
# ... Initial version extract table(s) based on name, regexp or all of them from database-dump.
# Ver. 2.0: Feb, 2015
# ... Added database extract and compression
# Ver. 3.0: March, 2015
# ... Complete rewrite.
# ... Extract all databases.
# Ver. 4.0: March, 2015
# ... More validations and bug fixes.
# ... Support for config file.
# ... Detecting source dump types (compressed/sql).
# ... Support for compressed backup and bz2 format.
# Credit: Andrzej Wroblewski (andrzej.wroblewski@packetstorm.pl) for his inputs on compressed backup & bz2 support.
# Ver. 5.0: Apr, 2015
# ... Describing the dump, listing all databases and tables
# ... Extracting one or more tables from single database
# Ver. 6.1: Oct, 2015
# ... Bug fixing in REGEXP extraction functionality
# ... Bug fixing in describe functionality
# ... Preserving time_zone & charset env settings in extracted sqls.
# Credit: @PeterTheDBA helped understanding the possible issues with environment variable settings included in first 17 lines of mysqldump.
##

# ToDo: Work with streaming input

## Formatting colour
# Text color variables
txtund=$(tput sgr 0 1)  # Underline
txtbld=$(tput bold)     # Bold
txtred=$(tput setaf 1)  # Red
txtgrn=$(tput setaf 2)  # Green
txtylw=$(tput setaf 3)  # Yellow
txtblu=$(tput setaf 4)  # Blue
txtpur=$(tput setaf 5)  # Purple
txtcyn=$(tput setaf 6)  # Cyan
txtwht=$(tput setaf 7)  # White
txtrst=$(tput sgr0)     # Text reset

## Variable Declaration
SOURCE='';
MATCH_STR='';
EXTRACT='';
OUTPUT_DIR='out';
EXT="sql.gz";
TABLE_NAME='';
DB_NAME='';
COMPRESSION='gzip';
DECOMPRESSION='cat';
VERSION=6.1

## Usage Description
usage()
{
    echo "\n\t\t\t\t\t\t\t${txtgrn}${txtund}************ Usage ************ \n"${txtrst};
    echo "${txtgrn}sh mysqldumpsplitter.sh --source filename --extract [DB|TABLE|DBTABLES|ALLDBS|ALLTABLES|REGEXP] --match_str string --compression [gzip|pigz|bzip2|none] --decompression [gzip|pigz|bzip2|none] --output_dir [path to output dir] [--config /path/to/config] ${txtrst}"
    echo "${txtund}                                                 ${txtrst}"
    echo "OPTIONS:"
    echo "${txtund}                                                 ${txtrst}"
    echo "  --source: mysqldump filename to process. It could be a compressed or regular file."
    echo "  --desc: This option will list out all databases and tables."
    echo "  --extract: Specify what to extract. Possible values DB, TABLE, ALLDBS, ALLTABLES, REGEXP"
    echo "  --match_str: Specify match string for extract command option."
    echo "  --compression: gzip/pigz/bzip2/none (default: gzip). Extracted file will be of this compression."
    echo "  --decompression: gzip/pigz/bzip2/none (default: gzip). This will be used against input file."
    echo "  --output_dir: path to output dir. (default: ./out/)"
    echo "  --config: path to config file. You may use --config option to specify the config file that includes following variables."
    echo "\t\tSOURCE=\n\t\tEXTRACT=\n\t\tCOMPRESSION=\n\t\tDECOMPRESSION=\n\t\tOUTPUT_DIR=\n\t\tMATCH_STR=\n"
    echo "${txtund}                                                 ${txtrst}"
    echo "Ver. $VERSION"
    exit 0;
}

## Parsing and processing input
parse_result()
{

## Validate SOURCE is provided and exists
if [ -z "$SOURCE" ]; then
    echo "${txtred}ERROR: Source file not specified or does not exist. (Entered: $SOURCE)${txtrst}"
    exit 2;
elif [ ! -f "$SOURCE" ]; then
    echo "${txtred}ERROR: Source file does not exist. (Entered: $SOURCE)${txtrst}"
    exit 2;
fi

## Parse Extract Operation
case $EXTRACT in
    ALLDBS|ALLTABLES|DESCRIBE )
        if [ "$MATCH_STR" != '' ]; then
            echo "${txtylw}Ignoring option --match_str.${txtrst}"
        fi;
        ;;
    DB|TABLE|REGEXP|DBTABLE )
        if [ "$MATCH_STR" = '' ]; then
            echo "${txtred}ERROR: Expecting input for option --match_str.${txtrst}"
            exit 1;
        fi;
        ;;
    * ) echo "${txtred}ERROR: Please specify correct option for --extract.${txtrst}"
        usage;
esac;

## Parse compression
if [ "$COMPRESSION" = 'none' ]; then
    COMPRESSION='cat';
    EXT="sql"
    echo "${txtgrn}Setting no compression.${txtrst}";
elif [ "$COMPRESSION" = 'pigz' ]; then
    which $COMPRESSION >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "${txtred}WARNING: $COMPRESSION appears to have issues, using default gzip.${txtrst}";
        COMPRESSION="gzip";
    fi;
    echo "${txtgrn}Setting compression as $COMPRESSION.${txtrst}";
    EXT="sql.gz"
elif [ "$COMPRESSION" = 'bzip2' ]; then
    which $COMPRESSION >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "${txtred}WARNING: $COMPRESSION appears to have issues, using default gzip.${txtrst}";
        COMPRESSION="gzip";
    fi;
    echo "${txtgrn}Setting compression as $COMPRESSION.${txtrst}";
    EXT="sql.bz2";
else
    COMPRESSION='gzip';
    echo "${txtgrn}Setting compression $COMPRESSION (default).${txtrst}";
    EXT="sql.gz"
fi;

## Parse decompression
if [ "$DECOMPRESSION" = 'none' ]; then
    DECOMPRESSION='cat';
    echo "${txtgrn}Setting no decompression.${txtrst}";
elif [ "$DECOMPRESSION" = 'pigz' ]; then
    which $DECOMPRESSION >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "${txtred}WARNING: $DECOMPRESSION appears to have issues, using default gzip.${txtrst}";
        DECOMPRESSION="gzip -d -c";
    else
        DECOMPRESSION="pigz -d -c";
    fi;
    echo "${txtgrn}Setting decompression as $DECOMPRESSION.${txtrst}";
elif [ "$DECOMPRESSION" = 'bzip2' ]; then
    which $DECOMPRESSION >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "${txtred}WARNING: $DECOMPRESSION appears to have issues, using default gzip.${txtrst}";
        DECOMPRESSION="gzip -d -c";
    else
        DECOMPRESSION="bzip2 -d -c";
    fi;
    echo "${txtgrn}Setting decompression as $DECOMPRESSION.${txtrst}";
else
    DECOMPRESSION="gzip -d -c";
    echo "${txtgrn}Setting decompression $DECOMPRESSION (default).${txtrst}";
fi;

## Verify file type:
filecommand=`file $SOURCE`
echo $filecommand | grep "compressed" 1>/dev/null
if [ `echo $?` -eq 0 ]
then
    echo "${txtylw}File $SOURCE is a compressed dump.${txtrst}"
    if [ "$DECOMPRESSION" = 'cat' ]; then
        echo "${txtred} The input file $SOURCE appears to be a compressed dump, \n while the decompression is set to none.\n Please specify ${txtund}--decompression [gzip|bzip2|pigz]${txtrst}${txtred} argument.${txtrst}";
        exit 1;
    fi;
else
    echo "${txtylw}File $SOURCE is a regular dump.${txtrst}"
    if [ "$DECOMPRESSION" != 'cat' ]; then
        echo "${txtred} Default decompression method for source is gzip. \n The input file $SOURCE does not appear to be a compressed dump. \n ${txtylw}We will try using no decompression. Please consider specifying ${txtund}--decompression none${txtrst}${txtylw} argument.${txtrst}";
        DECOMPRESSION='cat'; ## Auto correct decompression to none for regular files.
    fi;
fi;

# Output directory
if [ "$OUTPUT_DIR" = "" ]; then
    OUTPUT_DIR="out";
fi;
mkdir -p $OUTPUT_DIR
if [ $? -eq 0 ]; then
    echo "${txtgrn}Setting output directory: $OUTPUT_DIR.${txtrst}";
else
    echo "${txtred}ERROR: Issue while checking output directory: $OUTPUT_DIR.${txtrst}";
    exit 2;
fi;

echo "${txtylw}Processing: Extract $EXTRACT $MATCH_STR from $SOURCE with compression option as $COMPRESSION and output location as $OUTPUT_DIR${txtrst}";

}

# Include first 17 lines of full mysqldump - preserve time_zone/charset/environment variables.
include_dump_info()
{
    if [ "$1" = "" ]; then
        echo "${txtred}Couldn't find output file while preserving time_zone/charset settings!${txtrst}"
        exit;
    fi;
    OUTPUT_FILE=$1

    echo "Including environment settings from mysqldump."
    $DECOMPRESSION $SOURCE | head -17 | $COMPRESSION > $OUTPUT_DIR/$OUTPUT_FILE.$EXT
    echo "" | $COMPRESSION >> $OUTPUT_DIR/$OUTPUT_FILE.$EXT
    echo "/* -- Splitted with mysqldumpsplitter (http://goo.gl/WIWj6d) -- */" | $COMPRESSION >> $OUTPUT_DIR/$OUTPUT_FILE.$EXT
    echo "" | $COMPRESSION >> $OUTPUT_DIR/$OUTPUT_FILE.$EXT
}

## Actual dump splitting
dump_splitter()
{
    case $EXTRACT in
        DB)
            # Include first 17 lines of standard mysqldump to preserve time_zone and charset.
            include_dump_info $MATCH_STR
            echo "Extracting Database: $MATCH_STR";
            $DECOMPRESSION $SOURCE | sed -n "/^-- Current Database: \`$MATCH_STR\`/,/^-- Current Database: /p" | $COMPRESSION >> $OUTPUT_DIR/$MATCH_STR.$EXT
            echo "${txtbld} Database $MATCH_STR extracted from $SOURCE at $OUTPUT_DIR${txtrst}"
            ;;

        TABLE)
            # Include first 17 lines of standard mysqldump to preserve time_zone and charset.
            include_dump_info $MATCH_STR
            echo "Extracting $MATCH_STR."
            # Extract table-specific dump to tablename.sql
            $DECOMPRESSION $SOURCE | sed -n "/^-- Table structure for table \`$MATCH_STR\`/,/^-- Table structure for table/p" | $COMPRESSION >> $OUTPUT_DIR/$MATCH_STR.$EXT
            echo "${txtbld} Table $MATCH_STR extracted from $SOURCE at $OUTPUT_DIR${txtrst}"
            ;;

        ALLDBS)
            for dbname in $($DECOMPRESSION $SOURCE | grep -E "^-- Current Database: " | awk -F"\`" {'print $2'})
            do
                # Include first 17 lines of standard mysqldump to preserve time_zone and charset.
                include_dump_info $dbname
                echo "Extracting Database $dbname..."
                # Extract database-specific dump to database.sql.gz
                $DECOMPRESSION $SOURCE | sed -n "/^-- Current Database: \`$dbname\`/,/^-- Current Database: /p" | $COMPRESSION >> $OUTPUT_DIR/$dbname.$EXT
                DB_COUNT=$((DB_COUNT+1))
                echo "${txtbld}Database $dbname extracted from $SOURCE at $OUTPUT_DIR/$dbname.$EXT${txtrst}"
            done;
            echo "${txtbld}Total $DB_COUNT databases extracted.${txtrst}"
            ;;

        ALLTABLES)
            for tablename in $($DECOMPRESSION $SOURCE | grep "Table structure for table " | awk -F"\`" {'print $2'})
            do
                # Include first 17 lines of standard mysqldump to preserve time_zone and charset.
                include_dump_info $tablename
                # Extract table-specific dump to tablename.sql
                $DECOMPRESSION $SOURCE | sed -n "/^-- Table structure for table \`$tablename\`/,/^-- Table structure for table/p" | $COMPRESSION >> $OUTPUT_DIR/$tablename.$EXT
                TABLE_COUNT=$((TABLE_COUNT+1))
                echo "${txtbld}Table $tablename extracted from $SOURCE at $OUTPUT_DIR/$tablename.$EXT${txtrst}"
            done;
            echo "${txtbld}Total $TABLE_COUNT tables extracted.${txtrst}"
            ;;

        REGEXP)
            TABLE_COUNT=0;
            for tablename in $($DECOMPRESSION $SOURCE | grep -E "Table structure for table \`$MATCH_STR" | awk -F"\`" {'print $2'})
            do
                # Include first 17 lines of standard mysqldump to preserve time_zone and charset.
                include_dump_info $tablename
                echo "Extracting $tablename..."
                # Extract table-specific dump to tablename.sql
                $DECOMPRESSION $SOURCE | sed -n "/^-- Table structure for table \`$tablename\`/,/^-- Table structure for table/p" | $COMPRESSION >> $OUTPUT_DIR/$tablename.$EXT
                echo "${txtbld}Table $tablename extracted from $SOURCE at $OUTPUT_DIR/$tablename.$EXT${txtrst}"
                TABLE_COUNT=$((TABLE_COUNT+1))
            done;
            echo "${txtbld}Total $TABLE_COUNT tables extracted.${txtrst}"
            ;;

        DBTABLE)
            MATCH_DB=`echo $MATCH_STR | awk -F "." {'print $1'}`
            MATCH_TBLS=`echo $MATCH_STR | awk -F "." {'print $2'}`
            if [ "$MATCH_TBLS" = "*" ]; then
                MATCH_TBLS='';
            fi;
            TABLE_COUNT=0;

            for tablename in $( $DECOMPRESSION $SOURCE | sed -n "/^-- Current Database: \`$MATCH_DB\`/,/^-- Current Database: /p" | grep -E "^-- Table structure for table \`$MATCH_TBLS" | awk -F '\`' {'print $2'} )
            do
                echo "Extracting $tablename..."
                # Include first 17 lines of standard mysqldump to preserve time_zone and charset.
                include_dump_info $tablename
                # Extract table-specific dump to tablename.sql
                $DECOMPRESSION $SOURCE | sed -n "/^-- Current Database: \`$MATCH_DB\`/,/^-- Current Database: /p" | sed -n "/^-- Table structure for table \`$tablename\`/,/^-- Table structure for table/p" | $COMPRESSION >> $OUTPUT_DIR/$tablename.$EXT
                echo "${txtbld}Table $tablename extracted from $SOURCE at $OUTPUT_DIR/$tablename.$EXT${txtrst}"
                TABLE_COUNT=$((TABLE_COUNT+1))
            done;
            echo "${txtbld}Total $TABLE_COUNT tables extracted from $MATCH_DB.${txtrst}"
            ;;

        *)
            echo "Wrong option, exiting.";
            usage;
            exit 1;;
    esac
}

missing_arg()
{
    echo "${txtred}ERROR: Missing argument $1.${txtrst}"
    exit 1;
}

if [ "$#" -eq 0 ]; then
    usage;
    exit 1;
fi

# Accepts Parameters
while [ "$1" != "" ]; do
    case $1 in
        --source|-S )   shift
                        if [ -z "$1" ]; then
                            missing_arg --source
                        fi;
                        SOURCE=$1 ;;
        --extract|-E )  shift
                        if [ -z "$1" ]; then
                            missing_arg --extract
                        fi;
                        EXTRACT=$1 ;;
        --compression|-C ) shift
                        if [ -z "$1" ]; then
                            missing_arg --compression
                        fi;
                        COMPRESSION=$1 ;;
        --decompression|-D ) shift
                        if [ -z "$1" ]; then
                            missing_arg --decompression
                        fi;
                        DECOMPRESSION=$1 ;;
        --output_dir|-O ) shift
                        if [ -z "$1" ]; then
                            missing_arg --output_dir
                        fi;
                        OUTPUT_DIR=$1 ;;
        --match_str|-M ) shift
                        if [ -z "$1" ]; then
                            missing_arg --match_str
                        fi;
                        MATCH_STR=$1 ;;
        --desc )        EXTRACT="DESCRIBE"
                        parse_result
                        echo "-------------------------------";
                        echo "Database\t\tTables";
                        echo "-------------------------------";
                        $DECOMPRESSION $SOURCE | grep -E "(^-- Current Database:|^-- Table structure for table)" | sed 's/-- Current Database: /-------------------------------\n/' | sed 's/-- Table structure for table /\t\t/' | sed 's/`//g';
                        echo "-------------------------------";
                        exit 0; ;;
        --config )      shift;
                        if [ -z "$1" ]; then
                            missing_arg --config
                        fi;
                        if [ ! -f "$1" ]; then
                            echo "${txtred}ERROR: Config file $1 does not exist.${txtrst}"
                            exit 2;
                        fi;
                        . ./$1 ;;
        -h )            usage
                        exit ;;
        * )             echo "";
                        usage
                        exit 1
    esac
    shift
done

parse_result
dump_splitter
exit 0;
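A typical invocation, per the usage text in the script, would be `sh mysqldumpsplitter.sh --source full_dump.sql --extract TABLE --match_str users --decompression none --compression none`. To see the core sed-range technique the script relies on in isolation, here is a minimal, self-contained sketch; the dump contents and table names below are fabricated for illustration:

```shell
# A fabricated two-table dump to demonstrate the range extraction:
cat > /tmp/mini_dump.sql <<'EOF'
-- Table structure for table `users`
CREATE TABLE `users` (id INT);
INSERT INTO `users` VALUES (1);
-- Table structure for table `orders`
CREATE TABLE `orders` (id INT);
INSERT INTO `orders` VALUES (2);
EOF

# The same range pattern the script uses for --extract TABLE:
# print from the `users` header up to the next table header.
sed -n "/^-- Table structure for table \`users\`/,/^-- Table structure for table/p" \
    /tmp/mini_dump.sql > /tmp/users.sql
```

The extracted file holds everything between the `users` header and the next table header; note that sed's range is inclusive, so the next header line itself lands at the end of the output.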
MySQL Dump Splitter Version Info:
Version-1: Initial script (Feb-11, 2010)
Version-2: (Feb, 2015)
– The script is updated to generate compressed output of sqls.
– Script can be used to extract a database.
Version-3: (March-2, 2015)
– Complete rewrite of the script
– Script can extract all databases.
Version-4: (March-15, 2015)
– More validations and bug fixes.
– Detecting source dump types (compressed/sql).
– Support for compressed backup and bz2 format.
– Support for config file.
Version-5: (April-25, 2015)
– Describing the dump, listing all databases and tables
– Extracting one or more tables from single database
Version-6.1: (Oct, 2015)
– Bug fixing in REGEXP extraction functionality
– Bug fixing in describe functionality
– Preserving time_zone & charset env settings in extracted sqls.
The old (very first) version of mysqldumpsplitter is also available for download.
58 comments
Hey, looks like this script is just like what I need. But, it doesn’t work for me.
Could you please help? I get this when run without any arguments. The filename is “sp.sh”. The output is:
: command not found
: command not found
sp.sh: line 23: syntax error near unexpected token `elif'
sp.sh: line 23: `elif [ $# = 1 ]; then
Please help.
omg, thanx man!!
nice tips, thanks for share.
@guroot: Welcome dude!
@will pink: So where can we have it anyways ? Glad to know it helped as it meant to be 🙂
I definitely owe you a beer, started scripting one out then did a quick google and found your scripts many thanks!
You saved my life ! Thanks dude !!!
1. Script Updated.
2. Video “How to use” attached!!
p.s.: It’s my first screen cast – I’ll improve soon.
Hi Robert,
I’m glad to see it’s working as expected.
Now about your regexp: if you want to extract only the “tasks” table, you shouldn’t use REGEXP mode. Specify the table name directly instead, since “tasks_XYZ” will obviously match the regexp as well.
I hope I got what you’ve said?!!
Meanwhile, for extracting databases from a full dump, the following will help:
# Shell script to extract each database from a full dump
for dbz in $(grep "\-\- Current Database" $1 | sed 's/`//g' | awk -F ": " {'print $2'} | sort | uniq)
do
    echo $dbz;
    sed -n "/^-- Current Database: \`$dbz\`/,/^-- Current Database: \`/p" $1 > $dbz.db.sql
done;
For extracting databases from a mysql dump, the following PHP script is much quicker. The shell script parses the file for table names, then passes the whole file to sed once per table, which takes quite a while for a large dump (I’m currently looking at a 30 GB dump). The PHP script gets the databases in one pass over the file, and it uses little memory since it only holds the currently examined line. The first argument to the script is the filename of the full dump. The script is quick and dirty, so there is no error checking and no option to extract just one db, but both could be added trivially. Hope someone finds this useful.
<?php
$db = null;
$outfp = null;
$infp = fopen($argv[1], 'r');
while($line = fgets($infp)){
    if(preg_match('/-- Current Database: `([^`]+)`/', $line, $matches)){
        if($outfp){
            fclose($outfp);
        }
        $db = $matches[1];
        $outfp = fopen($db.'.db.sql', 'w');
    }
    if($outfp){
        fwrite($outfp, $line);
    }
}
Thank you! It works great. Exactly what I was looking for!
Thanks Nick,
That’s a great piece of code. Somehow I missed writing a Linux sed command for doing that!
For extracting a database from a complete dump we may use the following:
sed -n "/CREATE DATABASE.*\`DATABASE_NAME\`/,/CREATE DATABASE/p" COMPLETE_DUMP_FILE > DATABASE_NAME.sql
Thanks,
Kedar.
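To see the above sed range in action, here is a toy run; the dump contents and database names are invented for the example:

```shell
# Fabricated two-database dump:
cat > /tmp/full_dump.sql <<'EOF'
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `shop`;
USE `shop`;
CREATE TABLE t1 (id INT);
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `blog`;
USE `blog`;
CREATE TABLE t2 (id INT);
EOF

# Extract only the `shop` section:
sed -n "/CREATE DATABASE.*\`shop\`/,/CREATE DATABASE/p" /tmp/full_dump.sql > /tmp/shop.sql
```

Since the range is inclusive, the next database’s CREATE DATABASE line ends up as the last line of the extracted file and may need trimming.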
Thanks! This saved me a bunch of time. I had a 35 GB dump file from which I needed to extract two tables. It took about 28 minutes to process the file. That was faster than loading the whole dumpfile and then dumping just the tables I needed.
One issue I had was that the script didn’t separate tables with the same name, but in different databases. So, I ended up with one .sql file with the drop, create and insert statements for two tables. I was confused for a while, because the resulting table was much too small. Finally, I realized that the smaller second table was replacing the first table when I loaded that .sql file.
Also, when specifying the table name, a regular expression is used that causes any suffix to also match. I had a table named “tasks” and a much larger table named “tasks_archive”, but I wanted only the statements for the tasks table. I assume it took a little extra time to also write out the tasks_archive table statements, but it wasn’t that big of a deal, since I could obviously ignore it.
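One way to avoid that suffix match (a sketch, not part of the script itself; the table names below are from the example above): include the closing backtick in the pattern so the table name is anchored exactly.

```shell
# Fabricated dump with a table and a suffixed sibling:
cat > /tmp/dump.sql <<'EOF'
-- Table structure for table `tasks`
CREATE TABLE `tasks` (id INT);
-- Table structure for table `tasks_archive`
CREATE TABLE `tasks_archive` (id INT);
EOF

# The closing backtick pins the exact table name,
# so `tasks` no longer matches `tasks_archive`.
sed -n '/^-- Table structure for table `tasks`/,/^-- Table structure for table/p' \
    /tmp/dump.sql > /tmp/tasks.sql
```

The extracted file then contains only the statements for the exact table named, not those of any suffixed tables.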
Hi Kedar,
Good work!
I was also considering writing such a script (python). What I see is still missing is as follows:
– working with streaming (e.g. mysqldump mydb | mydumpsplit.sh)
– Managing those little things, such as disabling FK checks per split table, charsets…
– Handling triggers, stored routines.
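The per-file FK-checks idea could be sketched like this (a hypothetical helper, not part of mysqldumpsplitter; the function and file names are invented):

```shell
# Wrap an extracted SQL file so it loads regardless of FK ordering.
wrap_fk() {
    {
        echo "SET FOREIGN_KEY_CHECKS=0;"
        cat "$1"
        echo "SET FOREIGN_KEY_CHECKS=1;"
    } > "$1.wrapped"
}

# Example: wrap a (fabricated) single-table split file.
echo "CREATE TABLE t (id INT);" > /tmp/t.sql
wrap_fk /tmp/t.sql
```

Each split file would then be safe to load individually without worrying about foreign key order between tables.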
PS, doing the reverse: loading the distinct tables back into mysql can be easily accomplished with Maatkit’s mk-parallel-restore (which also serves as a general-purpose parallelizing tool last I checked).