Loading CSV or other delimited data files into a MySQL database is a very common task that people ask about frequently, and almost every time LOAD DATA INFILE comes to the rescue.
Here we will walk through some of the most common scenarios for loading data into a MySQL database.
The Load Data Syntax:
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE tbl_name
[CHARACTER SET charset_name]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number LINES]
[(col_name_or_user_var,...)]
[SET col_name = expr,...]
Suppose we have to load a file with the following contents:
#File-name: example.csv
col-1,col-2,col-3
a,2,3
b,4,5
c,6,7
** An Excel file can easily be exported as a comma-separated / delimited file (CSV) with the File > Save As option before loading.
1. A simple comma-separated file with a column header:
#table structure: example
col-1 col-2 col-3
Assuming our MySQL table has the same column sequence as the file, we can issue:
LOAD DATA INFILE 'path/to/example.csv' INTO TABLE example FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;
This is a very common and simple scenario.
Quick updates:
- Of course, if we don’t have column headers (col-1,col-2,col-3) in example.csv, IGNORE 1 LINES is not required.
- Note the file path. On Windows, make sure the slashes are right: a single backslash is treated as an escape character inside the quoted file name.
You may give the path as: C:\\path\\file.csv or C:/path/file.csv.
- If the data file to be loaded is stored on the client (not on the server), we add the LOCAL keyword as given in the syntax.
So, the command becomes:
LOAD DATA LOCAL INFILE 'path/to/example.csv' INTO TABLE example FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;
- If we want rows loaded from the file to replace existing rows that have the same primary or unique key value, we add the REPLACE keyword before INTO TABLE.
Similarly, if we want input rows that duplicate an existing row on a unique key value to be skipped, we use the IGNORE keyword before INTO TABLE.
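The path note above can be sketched in a few lines of Python. This is only an illustration of the forward-slash form, not something LOAD DATA requires; the helper name mysql_friendly_path is made up for this example.

```python
from pathlib import PureWindowsPath


def mysql_friendly_path(windows_path: str) -> str:
    """Rewrite a Windows path with forward slashes (hypothetical helper)."""
    # as_posix() only swaps the separators; it does not touch the file system.
    return PureWindowsPath(windows_path).as_posix()


converted = mysql_friendly_path(r"C:\path\file.csv")
print(converted)
```

The forward-slash form avoids having to double every backslash inside the quoted file name.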
2. The column sequence in the file and the table is different.
#table structure: example
col-2 col-1 col-3
In this case we need to specify the column sequence of the CSV file so that the data gets loaded into the proper columns. Column names that contain a hyphen must be quoted with backticks.
LOAD DATA INFILE 'path/to/example.csv' INTO TABLE example FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (`col-1`,`col-2`,`col-3`);
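If the file carries a header row, the column list can be derived from it instead of being typed by hand. Below is a minimal Python sketch of that idea; the function names are hypothetical, and it assumes the header names match the table's column names exactly and that the path contains no single quotes.

```python
import csv
import io


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_load_statement(header_line: str, table: str, path: str) -> str:
    """Build a LOAD DATA statement whose column list follows the CSV header."""
    columns = next(csv.reader(io.StringIO(header_line)))
    column_list = ",".join(quote_identifier(c.strip()) for c in columns)
    return (
        f"LOAD DATA INFILE '{path}' INTO TABLE {quote_identifier(table)} "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
        f"IGNORE 1 LINES ({column_list});"
    )


statement = build_load_statement(
    "col-1,col-2,col-3", "example", "path/to/example.csv"
)
print(statement)
```

The generated statement lists the columns in file order, so the order of the columns in the table no longer matters.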
3. The CSV / load data file has fewer columns than the target table
#table structure: example
col-1 col-2 col-3 col-4
Suppose col-1 is an auto-increment column and is not provided in the CSV.
LOAD DATA INFILE 'path/to/example.csv' INTO TABLE example FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (`col-2`,`col-3`,`col-4`) SET `col-1` = NULL;
Passing a NULL value makes col-1 take the next auto-increment value.
Using SET you can assign values to columns that are not available in the CSV but are declared NOT NULL.
You may also use a function to compute the value to set.
e.g. SET `col-x` = RAND();
4. Filling the extra date columns:
This is very similar to scenario 3. Here we need col-4 to be filled with the current timestamp. A very simple way to do that is to alter the table:
ALTER TABLE example CHANGE COLUMN `col-4` `col-4` TIMESTAMP DEFAULT CURRENT_TIMESTAMP;
And then,
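LOAD DATA INFILE 'path/to/example.csv' INTO TABLE example FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (`col-1`,`col-2`,`col-3`) SET `col-4` = NULL;
It should automatically fill in the CURRENT_TIMESTAMP value for us. Note that this relies on the older TIMESTAMP behavior: where explicit_defaults_for_timestamp is enabled (the default in MySQL 8.0), assigning NULL stores NULL in a nullable column, so omit the SET clause or use SET `col-4` = CURRENT_TIMESTAMP instead.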
5. Loading data with calculated columns:
#table: example
col-1 col-2 col-3 col-4
LOAD DATA INFILE 'path/to/example.csv' INTO TABLE example FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (`col-1`,`col-2`,`col-3`, @var1) SET `col-4` = @var1/100;
Similarly, we can transform a string value by applying a function to the variable:
SET `col-4` = REPLACE(@var1, 'find', 'replace')
6. Loading csv with table value lookup:
Suppose you’ve got a CSV with col-1 and col-2 data, and the data for the third column is available in another table. You can load the referenced data using a sub-query as follows. You have to make sure the sub-query returns a single row, for example by using a DISTINCT or LIMIT clause.
LOAD DATA INFILE 'path/to/example.csv' INTO TABLE example FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (`col-1`,`col-2`) SET `col-3` = (SELECT `field-name` FROM linkedTable WHERE keyfield = `col-1`);
7. Other ways of loading separated files to MySQL:
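CREATE TABLE csv_foo LIKE foo;
ALTER TABLE csv_foo MODIFY COLUMN id INT(10) UNSIGNED NOT NULL;
-- remove auto increment
ALTER TABLE csv_foo DROP PRIMARY KEY;
-- drop key as no keys are supported in csv storage engine
ALTER TABLE csv_foo ENGINE=CSV;
-- switch the copy to the CSV storage engine (this step is implied by the comments above)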
Alternatively you may do:
CREATE TABLE csv_foo AS SELECT * FROM foo LIMIT 0;
-- Ignores key definitions and auto-increment
-- Make sure you don't have any nullable columns.
Now,
Stop the MySQL server.
In the data directory, replace the csv_foo.csv file with the available data-file.csv (rename it to csv_foo.csv).
Start the MySQL server.
We may then need to run: REPAIR TABLE csv_foo;
Well, this is not a “good” way though.
8. Loading multiple files:
The documentation says that LOAD DATA cannot do this for us, since each statement reads a single file.
A separate client utility is available for this purpose.
Refer: mysqlimport
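If you would rather stay with LOAD DATA, one statement per file can be generated by a small script. The Python sketch below is an assumption-laden illustration: it borrows mysqlimport's convention of taking the table name from the file name without its extension, the function name load_statements and the file names are made up, and the field and line terminators are the ones used throughout this post.

```python
from pathlib import PurePosixPath


def load_statements(paths):
    """Return one LOAD DATA statement per file (hypothetical helper)."""
    statements = []
    for path in paths:
        # "data/orders.csv" -> "orders": file name without its extension.
        table = PurePosixPath(path).stem
        statements.append(
            f"LOAD DATA INFILE '{path}' INTO TABLE `{table}` "
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
            "IGNORE 1 LINES;"
        )
    return statements


batch = load_statements(["data/orders.csv", "data/customers.csv"])
for statement in batch:
    print(statement)
```

The printed statements can then be run one after another from any MySQL client.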
Conclusion: I hope we have covered the common scenarios, which should help in most cases; any remaining questions can be asked in the comments.
Finally, if you want to load data into a MySQL server, use LOAD DATA.
Just tried it on my site. Works flawlessly. Thanks a bunch!
Hi Paul,
Good to hear it worked for you.
You can of course go through the other topics if you haven’t already.
Great article!
There is a third party utility to do the same.
http://www.sqldbu.com/eng/sections/tips/normalize.html
How can I convert a date from an Excel CSV to a date in MySQL?
I have to define the column in MySQL as VARCHAR so that it can be read from the CSV, but if I change it to DATE it reads as 00-00-0000.
Any idea?
Hi mcheali!
I’m not sure what format your dates are stored in within the CSV. MySQL expects DATE values as YYYY-MM-DD.
If you have the CSV as:
#x.csv
1,2010-05-21
2,2010-05-22
3,2010-05-20
#table x:
CREATE TABLE `x` (
`a` int(10) NOT NULL,
`b` date default NULL
) ENGINE=MyISAM;
then a very simple load data query will fill the date values correctly.
#Query:
LOAD DATA INFILE 'c:/x.csv' INTO TABLE x
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
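If the dates exported from Excel are not in YYYY-MM-DD form, one option is to rewrite them before loading. The Python sketch below assumes the dates look like 21/05/2010 (day/month/year) and sit in the second column; both are guesses about the file, and the function name convert_dates is hypothetical.

```python
import csv
import io
from datetime import datetime


def convert_dates(csv_text, date_column=1, source_format="%d/%m/%Y"):
    """Rewrite one column of a CSV from source_format to YYYY-MM-DD."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        parsed = datetime.strptime(row[date_column], source_format)
        row[date_column] = parsed.strftime("%Y-%m-%d")
        writer.writerow(row)
    return out.getvalue()


converted = convert_dates("1,21/05/2010\n2,22/05/2010\n")
print(converted)
```

The same conversion can also be done inside the load itself by reading the field into a user variable and applying STR_TO_DATE in the SET clause, for example (a, @d) SET b = STR_TO_DATE(@d, '%d/%m/%Y').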
….greetings….
I have a bunch of CSV files where the column order differs from file to file and also differs from the table in the database.
How can I load the data into the correct fields?
The table has about 50 columns and each CSV file has only about 20-30 columns.
Any suggestion?
Greetings zeera,
Well, I believe that if the column order differs between the files and the table, it’s a combination of cases #2 and #3 described above. You have to find out the proper field positions and then change the order in the LOAD DATA command accordingly. In other words, you must specify a column list whenever the order of the fields in the input file differs from the order of the columns in the table.
E.g.
LOAD DATA INFILE 'file.csv' INTO TABLE tablename (col1,col3,col5,col2…);
About 50 vs 20/30 columns: if the columns missing from the file are nullable, things will be fine.
Thanks Kedar, I know it will work.
But what if the CSV file is uploaded by the user?
How am I supposed to know the column order?
And I won’t know which columns are nullable for the file.
- I’m using PHP with MySQL -
Any idea how to solve my problem? Thanks in advance.
Sorry for the wrong wording.
- What if the file is uploaded by the user? How do I know the column order?
Zeera,
- You should ask for formatted input; maybe you can publish a template and reject user uploads that do not follow it.
- You can use ‘DESC tablename’, which shows the column names in order. It also shows the data type of each column and whether the column allows NULL.
- As for parsing user input, I believe that instead of using LOAD DATA directly you might need a data-parsing script when your inputs are not consistent.
Hope this helps.
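The template idea above could look roughly like the following. Zeera uses PHP, but the sketch is in Python just to show the logic: read the header of the uploaded file, reject it if it names anything outside the published template, and otherwise turn the header into the column list for LOAD DATA. The column names in ALLOWED_COLUMNS and the function name are invented for the example.

```python
import csv
import io

# Hypothetical template: the only column names an upload may contain.
ALLOWED_COLUMNS = {"col1", "col2", "col3", "col5"}


def column_list_from_upload(csv_text, allowed):
    """Validate the header row and return a LOAD DATA column list."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    columns = [name.strip() for name in header]
    unknown = [name for name in columns if name not in allowed]
    if not columns or unknown:
        raise ValueError(f"upload rejected, unexpected columns: {unknown}")
    if len(set(columns)) != len(columns):
        raise ValueError("upload rejected, duplicate column names")
    # Only names from the template reach this point, so quoting is safe.
    return "(" + ",".join(f"`{name}`" for name in columns) + ")"


accepted = column_list_from_upload("col3,col1\n7,8\n", ALLOWED_COLUMNS)
print(accepted)
```

The returned list goes at the end of the LOAD DATA statement together with IGNORE 1 LINES, so each upload is loaded in its own column order.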