How To Generate Random test Data In MySQL

mysql-random-data-generator
mysql-random-data-generator

Are you tired of manually generating test data for your MySQL tables? If you’re looking for random data generator, look no further! Introducing the MySQL Random Data Generator, a powerful tool designed to effortlessly create random test data for your tables. There are tools that will generate random data for you but they’re not free, though random data generator is free and open source. Importantly it remains in your control.

Why Use MySQL Random Data Generator

  • Table-Aware: My tool leverages MySQL’s deep understanding of its own tables to automatically generate data based on table definitions.
  • Effortless: Say goodbye to time-consuming perl or shell scripts with loops. Let MySQL handle the data generation efficiently.
  • Free and Open Source: Download the tool from my GitHub repository for free. I encourage you to report bugs and contribute to improvements.

I hate generating dummy data, yes I do and I assume you do too! I think that’s the major reason I wrote these MySQL functions and procedures for Generating dummy test data.
I wrote combination of MySQL store routines and procedures that will generate the random test data for you without needing to specify table information. The tool will automatically generate the random data based on the table definition. You can download the data generator tool for MySQL from Github for free. Please consider reporting bugs and improvements.

Download: MySQL Random Data Generator.


People can manage these things with a simple perl / shell script with loops but again that always need your time.
Why can’t MySQL generate data for it’s own table when MySQL better knows the table than anyone else !!

Below are a set of functions and a stored procedure that will make our life easy and of-course it’s free.

Download Random Data Generator Tool For MySQL:

1. Download the SQL code: Generate dummy data for MySQL.
2. Download SQL for generating random data for foreign-key dependent child tables: populate_fk.sql

How to install and generate dummy data:

mysql -uUSER -pPASSWORD DATABASE < populate_dummy_data.txt

How generate test data

To generate test data of 1000 rows for sakila.film table execute following SQL command:

call populate('sakila','film',1000,'N');

MySQL set of functions to get random values generated for individual data-types.

Important Note: This is very first version (2012) of the script consider getting the latest from github.

## MySQL function to generate random string of specified length
DROP function if exists get_string;
delimiter $
CREATE FUNCTION get_string(in_strlen int) RETURNS VARCHAR(500) DETERMINISTIC
BEGIN 
set @var:='';
while(in_strlen>0) do
set @var:=concat(@var,IFNULL(ELT(1+FLOOR(RAND() * 53), 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',' ','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'),'Kedar'));
set in_strlen:=in_strlen-1;
end while;	
RETURN @var;
END $
delimiter ;


## MySQL function to generate random Enum-ID from specified enum definition
DELIMITER $
DROP FUNCTION IF EXISTS get_enum $
CREATE FUNCTION get_enum(col_type varchar(100)) RETURNS VARCHAR(100) DETERMINISTIC
	RETURN if((@var:=ceil(rand()*10)) > (length(col_type)-length(replace(col_type,',',''))+1),(length(col_type)-length(replace(col_type,',',''))+1),@var);
$
DELIMITER ;


## MySQL function to generate random float value from specified precision and scale.
DELIMITER $
DROP FUNCTION IF EXISTS get_float $
CREATE FUNCTION get_float(in_precision int, in_scale int) RETURNS VARCHAR(100) DETERMINISTIC
	RETURN round(rand()*pow(10,(in_precision-in_scale)),in_scale) 
$
DELIMITER ;



## MySQL function to generate random date (of year 2012).
DELIMITER $
DROP FUNCTION IF EXISTS get_date $
CREATE FUNCTION get_date() RETURNS VARCHAR(10) DETERMINISTIC
	RETURN DATE(FROM_UNIXTIME(RAND() * (1356892200 - 1325356200) + 1325356200))
#	Below will generate random data for random years
#	RETURN DATE(FROM_UNIXTIME(RAND() * (1577817000 - 946665000) + 1325356200))
$
DELIMITER ;


## MySQL function to generate random time.
DELIMITER $
DROP FUNCTION IF EXISTS get_time $
CREATE FUNCTION get_time() RETURNS INTEGER DETERMINISTIC
	RETURN TIME(FROM_UNIXTIME(RAND() * (1356892200 - 1325356200) + 1325356200))
$
DELIMITER ;

## MySQL function to generate random int.
DELIMITER $
DROP FUNCTION IF EXISTS get_int $
CREATE FUNCTION get_int() RETURNS INTEGER DETERMINISTIC
	RETURN floor(rand()*10000000) 
$
DELIMITER ;

## MySQL function to generate random tinyint.
DELIMITER $
DROP FUNCTION IF EXISTS get_tinyint $
CREATE FUNCTION get_tinyint() RETURNS INTEGER DETERMINISTIC
	RETURN floor(rand()*100) 
$
DELIMITER ;

## MySQL function to generate random varchar column of specified length(alpha-numeric string).
DELIMITER $
DROP FUNCTION IF EXISTS get_varchar $
CREATE FUNCTION get_varchar(in_length varchar(500)) RETURNS VARCHAR(500) DETERMINISTIC
	RETURN SUBSTRING(MD5(RAND()) FROM 1 FOR in_length)
$
DELIMITER ;

## MySQL function to generate random datetime value (any datetime of year 2012).
DELIMITER $
DROP FUNCTION IF EXISTS get_datetime $
CREATE FUNCTION get_datetime() RETURNS VARCHAR(30) DETERMINISTIC
	RETURN FROM_UNIXTIME(ROUND(RAND() * (1356892200 - 1325356200)) + 1325356200)
$
DELIMITER ;

The MySQL Stored procedure that populates MySQL tables with dummy data

DELIMITER $

DROP PROCEDURE IF EXISTS populate $
CREATE PROCEDURE populate(in_db varchar(50), in_table varchar(50), in_rows int, in_debug char(1)) 
BEGIN
/*
|
| Developer: Kedar Vaijanapurkar
| USAGE: call populate('DATABASE-NAME','TABLE-NAME',NUMBER-OF-ROWS,DEBUG-MODE);
| EXAMPLE: call populate('sakila','film',100,'N');
| Debug-mode will print an SQL that's executed and iterated.
|
*/

DECLARE col_name VARCHAR(100);
DECLARE col_type VARCHAR(100); 
DECLARE col_datatype VARCHAR(100);
DECLARE col_maxlen VARCHAR(100); 
DECLARE col_extra VARCHAR(100);
DECLARE col_num_precision VARCHAR(100);
DECLARE col_num_scale VARCHAR(100);
DECLARE func_query VARCHAR(1000);
DECLARE i INT;

DECLARE done INT DEFAULT 0;
DECLARE cur_datatype cursor FOR
 SELECT column_name,COLUMN_TYPE,data_type,CHARACTER_MAXIMUM_LENGTH,EXTRA,NUMERIC_PRECISION,NUMERIC_SCALE FROM information_schema.columns WHERE table_name=in_table AND table_schema=in_db;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;


SET func_query='';
OPEN cur_datatype;
datatype_loop: loop

FETCH cur_datatype INTO col_name, col_type, col_datatype, col_maxlen, col_extra, col_num_precision, col_num_scale;
#SELECT CONCAT(col_name,"-", col_type,"-", col_datatype,"-", IFNULL(col_maxlen,'NULL'),"-", IFNULL(col_extra,'NULL')) AS VALS;
  IF (done = 1) THEN
    leave datatype_loop;
  END IF;

CASE 
WHEN col_extra='auto_increment' THEN SET func_query=concat(func_query,'NULL, ');
WHEN col_datatype in ('int','bigint') THEN SET func_query=concat(func_query,'get_int(), ');
WHEN col_datatype in ('varchar','char') THEN SET func_query=concat(func_query,'get_string(',ifnull(col_maxlen,0),'), ');
WHEN col_datatype in ('tinyint', 'smallint','year') or col_datatype='mediumint' THEN SET func_query=concat(func_query,'get_tinyint(), ');
WHEN col_datatype in ('datetime','timestamp') THEN SET func_query=concat(func_query,'get_datetime(), ');
WHEN col_datatype in ('float', 'decimal') THEN SET func_query=concat(func_query,'get_float(',col_num_precision,',',col_num_scale,'), ');
WHEN col_datatype in ('enum','set') THEN SET func_query=concat(func_query,'get_enum("',col_type,'"), ');
WHEN col_datatype in ('GEOMETRY','POINT','LINESTRING','POLYGON','MULTIPOINT','MULTILINESTRING','MULTIPOLYGON','GEOMETRYCOLLECTION') THEN SET func_query=concat(func_query,'NULL, ');
ELSE SET func_query=concat(func_query,'get_varchar(',ifnull(col_maxlen,0),'), ');
END CASE;


end loop  datatype_loop;
close cur_datatype;

SET func_query=trim(trailing ', ' FROM func_query);
SET @func_query=concat("INSERT INTO ", in_db,".",in_table," VALUES (",func_query,");");
	IF in_debug='Y' THEN
		select @func_query;
	END IF;
SET i=in_rows;
populate :loop
	WHILE (i>0) DO
	  PREPARE t_stmt FROM @func_query;
	  EXECUTE t_stmt;
	SET i=i-1;
END WHILE;
LEAVE populate;
END LOOP populate;
SELECT "Kedar Vaijanapurkar" AS "Developed by";
END
$
DELIMITER ;

Let me know if you like it and share if you find it useful. Please report any bug.

Conclusion

In the realm of MySQL database management, the process of generating test data has long been a tedious endeavor. My MySQL Random Data Generator seeks to revolutionize this experience by offering a free, efficient, and automated solution. Consider star-ing and forking the repository if you’re keep to contribute.

19 comments
  1. I like this approach very much. Unfortunately in my case it doesn’t work with MySql 8.x. When I run the procedure, it keeps saying that there is an error in the SQL syntax near ‘NULL’ – any idea ?

  2. Very useful, very nice.

    Proc populate() needs …

    WHEN col_datatype in (‘date’) THEN
    SET func_query=concat(func_query,’get_date(), ‘);

    … before WHEN … (‘datetime’… )

  3. hey,
    I am getting this error:
    ERROR 1172 (42000): Result consisted of more than one row
    any suggestions ?

    1. Well that sounds like a bug, though I use this tool fine! Can you provide with more info (like say schema/command)?

      Thanks,
      Kedar.

  4. Hi Kedar, thanks for this wonderful utility. I have a query for you. Often the tables are linked using PK-FK. How do i take care of this relationship in the test data generated. i.e The entries in FK column should be based on PK values. thanks a ton.

    1. Somehow missed this!! Well presently we do not have this implemented here. What our function does is actually disable fk check and loads the dummy data.
      To make actual relation more work is required here. I will keep it in my todo list but not sure when I’d be able to justify that!
      If you can, do share.

      We do have git if you want to fork: https://github.com/kedarvj/mysql-random-data-generator

      Regards,
      Kedar

  5. hi, I keep getting this error on calling populate
    “ERROR 1267 (HY000) : Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_unicode_ci,IMPLICIT) for operation ‘='”

  6. I like your approach. It’s quick and simple, however when I generate data it inserts nulls and 0 into the database. Any idea?

    I have one suggestion for the size of the table names. 20 is too short. It needs to be longer to support longer table names.

    1. I figured out my problem. I wasn’t passing in the correct db name. Thanks much for the script! It’s exactly what I needed. Not many scripts out there that can:

      1. generate data based on the schema
      2. generate random data
      3. can generate data without any configuration

      This is nice. I hope you continue to build on it!

  7. The routine to create test data can and should be created from PHP (or your programming language). The logic should never be in the database, even for the testing logic.

    The above code can be confusing and I’m not sure if it works across the board – what if you have images stored in the database? Will the code above work?

    1. Hello itoctopus,

      As already said, these are mysql stored procedure & functions combinely generates dummy data that aimed to be used only for testing purpose. I found it useful and hence shared.

      Logically this script fetches table’s mata data and as per each field’s datatype, it generates random data for respective datatype. If any of the datatype is not handled it generates a varchar data.

      I’ve tested it and use it for testing purposes. I do keep updating it as per requirements.

      About your question, ofcourse BLOB datatype hasn’t been considered here but it will generate dummy VARCHAR data for that! You can ofcourse write a function and update procedure to generate BLOB data.

      I hope I cleared your concern. Do try and let me know.

      Thanks,
      Kedar.

    2. Alright, most of us know that. This is an SQL, not a PHP solution.

      Maybe. You haven’t tried it yet have only negative things to say.

      I personally like seeing his method and also look and find how I might change things.

  8. Its nice..

    Reduced my work to insert records for testing purpose. Just need to access the function…

Leave a Reply to Ralph Smith Cancel reply

Your email address will not be published. Required fields are marked *