Import Hadoop Data into SQL BI Semantic Model Tabular

Originally posted on Ayad Shammout's SQL & BI Blog:
  Hadoop brings scale and flexibility that don’t exist in the traditional data warehouse. Hive serves as a data warehouse for Hadoop, facilitating easy data summarization, ad-hoc queries, and the analysis of large datasets. Although Hive supports ad-hoc queries for Hadoop through HiveQL, query performance is often prohibitive for even the most common BI scenarios. A better solution is to bring relevant Hadoop data into a SQL Server Analysis Services Tabular model by using HiveQL. Analysis Services can then serve up the data for ad-hoc analysis and reporting. But, there…

Compile and add Hive UDF via ADD JAR in HDInsight on Azure

If you have the Hadoop source code, the right way to compile a Hive UDF is to use Maven with the Hive repository, so that you compile your JAR against the exact version of the source code and jars you are working with.  For more information on how to use Maven, check out: http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html But in situations where you do not have access to the source code but do have all the necessary jars (like the jars within the lib folder), you can work around this by manually compiling the Hive UDFs.  To do this, let’s…

Add Built-In Hive UDFs on HDInsight Azure

In the last few weeks, a number of customers have pinged me about how to use various Hive UDFs.  The first ask was how to use some of the UDFs that are already built into Hive.  For example, if you want a generated row sequence number (i.e. an IDENTITY column), you can use the Hive UDF UDFRowSequence.  This UDF is already built and included in hive-contrib-0.9.0.jar, but that jar is not already loaded in the distributed cache (run list jars from the Hive CLI to verify).  Below is a quick code snippet that allows you to run the generated…

Optimizing Joins running on HDInsight Hive on Azure at GFS

“…to look at the stars always makes me dream, as simply as I dream over the black dots of a map representing towns and villages…” - Vincent Van Gogh (Image Source: Vincent Van Gogh Painting Tilt Shifted: http://coolvibe.com/2011/16-van-gogh-paintings-tilt-shifted/tilt-shift-van-gogh-15/) Introduction: To analyze hardware utilization within their data centers, Microsoft’s Online Services Division – Global Foundation Services (GFS) is working with Hadoop / Hive via HDInsight on Azure.  A common scenario is to perform joins between the various tables of data.  This quick blog post provides a little context on how we managed to take a query from >2h to <10min and…

Healthcare Compliance with Big Data and BI

Originally posted on Ayad Shammout's SQL & BI Blog:
Over the past few years, Denny Lee (Technical Principal Program Manager within Microsoft’s SQL Business Intelligence Group) and I have worked on some very exciting SQL Server projects. Earlier this month we presented “Big Data, BI, and Compliance in Healthcare” at the PASS BA Conference in Chicago, IL. A few years ago, we implemented a “Centralized Audit Framework” to manage and view the audits of an entire SQL Server environment; it parses, loads, and reports on all of the audit logs. Expanding on the “Reaching Compliance: SQL…

Using Avro with HDInsight on Azure at 343 Industries

By Michael Wetzel, Tamir Melamed, Mark Vayman, Denny Lee. Reviewed by Pedro Urbina Escos, Brad Sarsfield, Rui Martins. Thanks to Krishnan Kaniappan, Che Chou, Jennifer Yi, and Rob Semsey. As noted in the Windows Azure Customer Solution Case Study, “Halo 4 developer 343 Industries Gets New User Insights from Big Data in the Cloud”, a critical component to achieve faster Hadoop query and processing performance and keep file sizes small (thus Azure storage savings, faster query performance, and reduced network overhead) was to utilize Avro sequence files. Avro was designed for Hadoop to help make Hadoop more interoperable with other…

Getting Hadoop and protobufs up and running with Elephant Bird on Mac OSX Mountain Lion

“No, not Angry Bird – Elephant Bird!” (said no one) In a few of my customer projects, we started diving into using protocol buffers (protobufs) as the sequence file format to be stored within our Hadoop infrastructure.  While these were HDInsight on Azure projects, most of the native Hadoop code was originally written on Linux and carries implicit assumptions about directory paths, etc.  Therefore, one of the first things I usually do is try to install said software on my handy MacBook Air so that if and when I run into issues getting the code to…
