Import Hadoop Data into SQL BI Semantic Model Tabular

Originally posted on Ayad Shammout's SQL & BI Blog:
  Hadoop brings scale and flexibility that don’t exist in the traditional data warehouse. Hive serves as a data warehouse for Hadoop, facilitating easy data summarization, ad-hoc queries, and the analysis of large datasets. Although Hive supports ad-hoc queries for Hadoop through HiveQL, query performance is often prohibitive for even the most common BI scenarios. A better solution is to bring relevant Hadoop data into a SQL Server Analysis Services Tabular model by using HiveQL. Analysis Services can then serve up the data for ad-hoc analysis and reporting. But, there…

Compile and add Hive UDF via ADD JAR in HDInsight on Azure

If you have the Hadoop source code, the right way to compile a Hive UDF is to use Maven with the Hive repository, so you can compile your JAR against the exact version of the source code and JARs that you are working with. For more information on how to use Maven, see http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html. In situations where you do not have access to the source code but do have all the necessary JARs (like the JARs within the lib folder), you can work around this by manually compiling the Hive UDFs. To do this, let’s…

Add Built-In Hive UDFs on HDInsight Azure

In the last few weeks, I have had a number of customers ping me about how to utilize various Hive UDFs. The first ask was how to use some of the UDFs that are already built into Hive. For example, if you want a generated row sequence number (i.e. an IDENTITY column), you can use the Hive UDF UDFRowSequence. This UDF is already built and included in hive-contrib-0.9.0.jar, which is not already loaded in the distributed cache (run list jars from the Hive CLI to verify). Below is a quick code snippet that allows you to run the generated…

Optimizing Joins running on HDInsight Hive on Azure at GFS

“…to look at the stars always makes me dream, as simply as I dream over the black dots of a map representing towns and villages…” (Vincent Van Gogh) Image Source: Vincent Van Gogh Painting Tilt Shifted: http://coolvibe.com/2011/16-van-gogh-paintings-tilt-shifted/tilt-shift-van-gogh-15/ Introduction: To analyze hardware utilization within their data centers, Microsoft’s Online Services Division – Global Foundation Services (GFS) is working with Hadoop / Hive via HDInsight on Azure. A common scenario is to perform joins between the various tables of data. This quick blog post provides a little context on how we managed to take a query from >2h to <10min and…

Healthcare Compliance with Big Data and BI

Originally posted on Ayad Shammout's SQL & BI Blog:
Over the past few years, Denny Lee (Technical Principal Program Manager within Microsoft’s SQL Business Intelligence Group) and I have been working on some very exciting SQL Server projects. Earlier this month we presented “Big Data, BI, and Compliance in Healthcare” at the PASS BA Conference in Chicago, IL. A few years ago, we implemented a “Centralized Audit Framework” to manage and view the audits of an entire SQL Server environment; it parses, loads, and reports on all of the audit logs. Expanding on the “Reaching Compliance: SQL…

Updated HDInsight on Azure ASV paths for multiple storage accounts

If you’ve joined the HDInsight Preview, you will notice many new changes, including the tight integration with Windows Azure and that HDInsight defaults to ASV. As noted in Why use Blob Storage with HDInsight on Azure, there are some interesting technical (performance) and business reasons for utilizing Azure storage accounts. But if you had been playing with the HadoopOnAzure.com beta and switched over to the Windows Azure HDInsight Service Preview, you may have noticed a quick change in the way ASV paths work. Here’s a quick cheat sheet for you. In general, to access ASV sources: #ls asv://$container$@$storage_account$.blob.core.windows.net/$path$…
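As a rough illustration of the path template quoted in the excerpt above, here is a minimal Python sketch that fills in the three placeholders. The helper name `asv_path` and the sample values (`mycontainer`, `mystorageaccount`, `data/logs`) are hypothetical and not from the original post; only the `asv://$container$@$storage_account$.blob.core.windows.net/$path$` shape comes from it.

```python
def asv_path(container: str, storage_account: str, path: str = "") -> str:
    """Build an ASV URI following the template quoted in the post.

    Hypothetical helper for illustration only; it formats a string and
    does not contact Azure or validate that the container exists.
    """
    if not container or not storage_account:
        raise ValueError("container and storage_account are required")
    # Drop any leading slash so the URI has exactly one separator before the path.
    return f"asv://{container}@{storage_account}.blob.core.windows.net/{path.lstrip('/')}"


# Placeholder values, not real account names.
print(asv_path("mycontainer", "mystorageaccount", "data/logs"))
# prints asv://mycontainer@mystorageaccount.blob.core.windows.net/data/logs
```

The resulting string is what would follow `#ls` in the cheat-sheet form shown in the excerpt.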

Why use Blob Storage with HDInsight on Azure

By Brad Sarsfield and Denny Lee. One of the questions we are commonly asked concerning HDInsight, Azure, and Azure Blob Storage is why one should store their data in Azure Blob Storage instead of HDFS on the HDInsight Azure Compute nodes. After all, Hadoop is all about moving compute to data vs. traditionally moving data to compute, as noted in Moving data to compute or compute to data? That is the Big Data question. The network is often the bottleneck, and making it performant can be expensive. Yet the practice for HDInsight on Azure is to place the data into…
