How to minimize impact on SQL Server
But what field terminator?
One of the key problems is deciding which field terminator to use when extracting the data. Traditional field terminators are tabs (\t), commas (,), or pipes (|), yet with the kinds of events, messages, and logs stored these days, those same characters are likely to appear inside the data extract as well. For example, if a Hive external table is built against comma-delimited files whose fields themselves contain commas, your schema will get messed up.
By default, Hive itself uses ^A (0x01) as its field terminator; if you do not specify a field terminator when creating a Hive table, the files Hive generates will terminate fields with ^A. So if you can create a bulk extract from SQL Server that is ^A-delimited, you’re set!
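On the Hive side, an external table over the extract can declare the ^A terminator explicitly as the octal escape '\001'. A sketch of the DDL; the table name, columns, and HDFS location below are illustrative assumptions, not from the original post:

```sql
-- Illustrative external table over a ^A-delimited extract.
-- Table name, columns, and LOCATION are assumptions.
CREATE EXTERNAL TABLE tx_log (
  tx_id   STRING,
  tx_date STRING,
  amount  STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'   -- ^A, Hive's default terminator
LOCATION '/user/hive/warehouse/tx_log';
```

Since ^A is already Hive's default, the FIELDS TERMINATED BY clause could be omitted entirely here; spelling it out just makes the contract with the extract explicit.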
So how do I create a ^A-delimited file via BCP from SQL Server?
To create a ^A-delimited file via BCP from SQL Server, the syntax is as follows:
bcp "select * from [table] with (nolock) where tx_date >= '12/26/2013 00:00:00' and tx_date < '12/27/2013 00:00:00'" queryout test.log -S $servername -T -c -t 0x01 -a 65535 -b 100000
As noted in the BCP syntax above, the -t (field terminator) parameter has the value 0x01, which corresponds to ^A (01) in Windows.
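You can sanity-check what that delimiter looks like on disk without running BCP at all. A minimal sketch, assuming nothing beyond a POSIX shell; the file name and column values are made up:

```shell
# Write one record the way "bcp ... -t 0x01" would delimit it:
# fields separated by the single byte 0x01 (^A, octal \001).
printf 'col1\001col2\001col3\n' > sample.log

# Dump the bytes; the delimiter shows up as 01 between the fields.
od -An -tx1 sample.log
```

In the od output you should see the hex byte 01 sitting between the field bytes, confirming the extract uses the same terminator Hive expects by default.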
By bulk exporting your data in this fashion, you can copy / put it into your Hadoop cluster already delimited with Hive’s default field terminator.
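The upload itself is then a standard HDFS copy. A sketch, assuming the extract is named test.log as above; the target HDFS path is an assumption, and the hdfs command is shown as a comment since it needs a configured Hadoop client:

```shell
# Stand-in for the BCP extract (a real test.log would come from bcp):
printf '1\00112/26/2013\001100.00\n' > test.log

# Local sanity check: the \001 delimiter appears as "001" in od -c output.
od -An -c test.log

# On an edge node with a Hadoop client, the upload is one command
# (target directory is an assumption):
#   hdfs dfs -put test.log /user/hive/warehouse/tx_log/
```

Once the file lands in the table's directory, the Hive external table can read it as-is, with no reformatting step in between.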
Nice. Is there any way, using BCP, to create the data file directly on HDFS instead of copying it from local to HDFS?
I’m afraid there isn’t a way to do this via BCP. BCP will allow you to import / export data from SQL Server, but to get it onto HDFS you will probably want to use something like Sqoop or upload the data to HDFS via the hdfs dfs -put command.