Ramblings of a data dork: from BI and Big Data to Travel and Food
One of the odd HiveODBC error messages I recently encountered on a project occurs when extracting data from my Hive/Hadoop cluster using the HiveODBC driver; I end up with an error message similar to:
OLE DB provider 'MSDASQL' for linked server '[MyHadoopCluster]' returned data that does not match expected data length for column '[MSDASQL].error_message'. The (maximum) expected data length is 334, while the returned data length is 387.
In this case, the error message was generated by connecting my SQL Server to a Mac OSX Hadoop 1.0.1 cluster running Hive 0.8.1 using the HiveODBC driver (why I insist on experimenting on my Macbook Air is a blog post for another time!). This is similar to the design described in SQL Server Analysis Services to Hive: A Klout Case Study. The first thing I thought was curious was that when I created a HiveODBC linked server connection from my SQL Server and then ran a select statement through openquery,
select * from openquery([MyHadoopCluster], 'select * from hive_weblogs')
then all the string columns were automatically set to varchar(334).
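For reference, a linked server like the one above can be set up roughly as follows; the DSN name and login values here are placeholders, not from the original setup:

```sql
-- Sketch of creating the HiveODBC linked server; 'HiveDSN' is a
-- placeholder for whatever system DSN you configured for the driver.
EXEC master.dbo.sp_addlinkedserver
    @server     = N'MyHadoopCluster',
    @srvproduct = N'Hive',
    @provider   = N'MSDASQL',
    @datasrc    = N'HiveDSN';

-- Optional: map a login for the Hive connection (placeholder credentials)
EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname  = N'MyHadoopCluster',
    @useself     = N'false',
    @rmtuser     = N'hadoop',
    @rmtpassword = N'hadoop';
```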
Digging into the data, I found that I did in fact have a string column where a bunch of rows had lengths much greater than 334 characters. In my case, I had error messages that were potentially thousands of characters long. As Dave Mariani (@dmariani) and some colleagues over at Platon (thanks Stig Torngaard Hammeken and Morten Post) reminded me, this is in fact a bug noted in JIRA HIVE-3382: Strings truncated to length 334 when Hive data is accessed through the HiveODBC driver.
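If you want to verify whether your own data is hitting this limit, a quick Hive-side check (run from the Hive CLI) will count the offending rows; the table and column names below match the example in this post:

```sql
-- Count rows whose string column exceeds the 334-character limit
-- that the ODBC driver assumes for string columns.
select count(*)
from hive_weblogs
where length(error_message) > 334;
```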
So until this is fixed, what can you do about it?
A tad frustrating at times, but in many BI-related cases the full string isn't required, so the workarounds noted in the JIRA issue work fine.
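One such workaround sketch, assuming you can tolerate the truncation: cut the long string down inside the pass-through HiveQL itself, so the driver never returns more than the 334 characters it expects. The column list here is illustrative:

```sql
-- Truncate the long column on the Hive side so the returned data
-- stays within the driver's expected 334-character length.
select *
from openquery([MyHadoopCluster],
    'select substr(error_message, 1, 334) as error_message
     from hive_weblogs');
```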
Hopefully this can help you too!