While I’m a big fan of CDH5 and Hue, sometimes I will see some funkiness that’s a tad irritating. Specifically, there is a database with a name similar to cloudera_manager_metastore_canary_test_db_hive_hivemetastore_$guid$_2014_10_06_11_20_41. Even more irritating, there is a table called cm_test_table which cannot be deleted (or renamed, or even described):

hive> describe cm_test_table;
FAILED: SemanticException [Error 10001]: Table not found cm_test_table
hive> alter table cm_test_table RENAME to cm_test_table2;
FAILED: SemanticException [Error 10001]: Table not found cm_test_table
hive> drop table cm_test_table;
FAILED: SemanticException [Error 10001]: Table not found cm_test_table

To work around this problem, it’s a matter of using the CASCADE reference to…
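A minimal sketch of that workaround, assuming the canary database follows the naming pattern above (substitute your own GUID and timestamp): DROP DATABASE … CASCADE removes the database together with any tables still inside it, phantom cm_test_table included.

hive> -- CASCADE drops the database even though it still contains tables
hive> DROP DATABASE cloudera_manager_metastore_canary_test_db_hive_hivemetastore_$guid$_2014_10_06_11_20_41 CASCADE;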
Tag: hive
Quick Tip for extracting SQL Server data to Hive
While I have documented various techniques to transfer data from Hadoop to SQL Server / Analysis Services (e.g. How Klout changed the landscape of social media with Hadoop and BI Slides Updated, SQL Server Analysis Services to Hive, etc.), this post calls out the reverse – how to quickly extract SQL Server data to Hadoop / Hive. This is a common scenario where SQL Server is being used as your transactional store and you want to push data to some other repository for analysis where you are mashing together semi-structured and structured data. How to minimize impact on SQL Server…
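A hedged sketch of the Hive side of such an extraction: assuming the SQL Server table has already been exported to tab-delimited files (for example via bcp or Sqoop) and copied into HDFS under a hypothetical /data/sqlserver/orders path, an external table makes that data immediately queryable without making another copy.

hive> -- external table over the exported files; path and columns below are illustrative
hive> CREATE EXTERNAL TABLE orders_export (
        order_id INT,
        customer_id INT,
        order_date STRING,
        order_total DOUBLE
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/data/sqlserver/orders';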
Hive and Windows Auth – the curse of the backslash
Captain Avery: Put down the sword. A sword could kill us all, girl. Amy: Yeah. Thanks. That’s actually why I’m pointing it at you. — from “Doctor Who: The Curse of the Black Spot” Background Typically when you get Hive / Hadoop up and running, everything runs pretty smoothly especially if you use one of the demo VMs (e.g. I’m currently using the Cloudera QuickStart VM). But if you are in production and you want to secure login access to your environment, you may have Windows authentication turned on for access to one of the boxes on your Hadoop cluster…
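As a hedged illustration of why the backslash bites (the table and login below are hypothetical): Hive treats the backslash as an escape character in string literals, so a Windows-style DOMAIN\user name must be written with the backslash doubled, otherwise the literal will not match what you intend.

hive> -- 'CONTOSO\\jdoe' yields the literal string CONTOSO\jdoe
hive> SELECT *
      FROM audit_log
      WHERE login_name = 'CONTOSO\\jdoe';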
Quick Tech Tip: SETting Cloudera Hue Beeswax to create a compressed Hive table
I’m currently playing with CDH 4.1 and was having fun with Hue – specifically Beeswax to execute Hive queries from a nice web UI. As noted in Hadoop compression codecs and optimizing Hive joins (and using compression to do it), using compression gives you more space and in many cases can improve query performance. Yet to my dismay, when I tried to execute a bunch of SET statements, I ended up getting the OK FAILED parse exception. Of course this is what happens when you haven’t played with a particular tech in a while and don’t bother to do the tutorials! On the…
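For reference, a hedged sketch of the statements themselves as they would run at the hive CLI (the source table name below is hypothetical); in Beeswax the equivalent key/value pairs are supplied through its Settings section rather than as inline SET statements in the query text.

hive> -- compress the output of subsequent queries
hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.type=BLOCK;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
hive> -- materialize a compressed copy of the table
hive> CREATE TABLE source_table_compressed AS SELECT * FROM source_table;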
Quick Tips and Q&A for SQL Server Analysis Services to Hive
Over the last few weeks I’ve fielded some questions concerning the whitepaper / case study SQL Server Analysis Services to Hive that Dave Mariani (@dmariani) and I contributed to; below is an aggregate of those tips – hope this helps!

Q: I’m running into the HiveODBC error message “..expected data length is 334…”
A: Check out the post HiveODBC error message “..expected data length is 334…” for details on how to potentially resolve this.

Q: Can I connect Analysis Services Tabular to Hive instead of Multidimensional?
A: Yes! Ayad Shammout (@aashammout) has a couple of great…
Project “ChâteauKebob”: Big Data to BI End-to-End Healthcare Auditing Compliance
Originally posted on Ayad Shammout's SQL & BI Blog:
Authors: Ayad Shammout & Denny Lee It may sound like a rather odd name for an End-to-End Auditing Compliance project – and the roots admittedly enough are based on the authors’ predilection toward great food in the city of Montréal – but there actually is an analogous association! Château means manor house or palace and kebob refers to meat that is cooked over or next to flames; large or small cuts of meat, or even ground meat, it may be served on plates or in sandwiches (mouth watering yet?). Château…
Hadoop/Hive – Writing a Custom SerDe (Part 1)
Rui Martin has started a great blog post series on how to write a custom SerDe. In the first post of the series, he provides the fundamentals (summarized in a diagram in his post). For more information, please review Rui’s great post: Hadoop/Hive – Writing a Custom SerDe (Part 1).
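Without reproducing Rui’s Java, here is a hedged sketch of where a custom SerDe plugs into Hive once it has been compiled into a jar (the jar path, class name, and columns are hypothetical):

hive> -- make the custom SerDe visible to this session
hive> ADD JAR /path/to/my-custom-serde.jar;
hive> CREATE TABLE custom_format_table (
        col1 STRING,
        col2 INT
      )
      ROW FORMAT SERDE 'com.example.hive.MyCustomSerDe';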
Import Hadoop Data into SQL BI Semantic Model Tabular
Originally posted on Ayad Shammout's SQL & BI Blog:
Hadoop brings scale and flexibility that don’t exist in the traditional data warehouse. Hive serves as a data warehouse for Hadoop, facilitating easy data summarization, ad-hoc queries, and the analysis of large datasets. Although Hive supports ad-hoc queries for Hadoop through HiveQL, query performance is often prohibitive for even the most common BI scenarios. A better solution is to bring relevant Hadoop data into a SQL Server Analysis Services Tabular model by using HiveQL. Analysis Services can then serve up the data for ad-hoc analysis and reporting. But, there…
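A hedged sketch of that idea (the table and column names are hypothetical): push the summarization down into Hive as a view, so the Tabular model imports a small, pre-aggregated result set rather than scanning the raw data at query time.

hive> -- pre-aggregate in Hive; the Tabular model then imports this view
hive> CREATE VIEW sales_summary AS
      SELECT region, product_category,
             SUM(amount) AS total_amount,
             COUNT(*) AS order_count
      FROM sales
      GROUP BY region, product_category;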