How to minimize impact on SQL Server
But what field terminator?
But one of the key problems is what field terminator should you use to extract out the data. Traditional field terminators are tabs (\t), commas (,), or pipes (|) – yet with the types of events, messages, and logs stored these days the previously mentioned field terminators will likely be in the data extract as well. For example, if a Hive external table is built against comma-delimited files with the fields themselves containing commas, your schema will get messed up.
By default, Hive itself uses the ^A (01) as its field terminator; if you do not specify a field terminator when creating a Hive table, the file(s) Hive generates it will terminate fields using ^A. So if you could create bulk extract from SQL Server that is ^A-delimited, you’re set!
So how do I create a ^A-delimited file via BCP from SQL Server
To create a ^A-delimited file via BCP from SQL Server, the syntax is as follows:
bcp “select * from [table] with (no lock) where tx_date >= ’12/26/2013 00:00:00′ and tx_date < ’12/27/2013 00:00:00′” queryout test.log -S $servername -T -c -t 0x01 -a 65535 -b 100000
As noted in the BCP syntax above, the -t (field terminator parameter) has the value of 0x01 which corresponds to ^A (01) in Windows.
By bulk exporting your data in this fashion, you can now copy / put your data into your Hadoop cluster in Hive’s default field terminator.