Using Avro with HDInsight on Azure at 343 Industries

By Michael Wetzel, Tamir Melamed, Mark Vayman, and Denny Lee. Reviewed by Pedro Urbina Escos, Brad Sarsfield, and Rui Martins. Thanks to Krishnan Kaniappan, Che Chou, Jennifer Yi, and Rob Semsey.

As noted in the Windows Azure Customer Solution Case Study, Halo 4 Developer 343 Industries Gets New User Insights from Big Data in the Cloud, a critical component in achieving faster Hadoop query and processing performance while keeping file sizes small (yielding Azure storage savings, faster queries, and reduced network overhead) was the use of Avro sequence files. Avro was designed for Hadoop to help make Hadoop more interoperable with other…
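As a rough illustration of the format the excerpt describes: an Avro schema is plain JSON, and a minimal sketch for a hypothetical game-telemetry record (all field and record names invented for illustration, not taken from the 343 Industries project) might look like:

```json
{
  "type": "record",
  "name": "GameEvent",
  "namespace": "com.example.telemetry",
  "fields": [
    {"name": "playerId",  "type": "string"},
    {"name": "eventType", "type": "string"},
    {"name": "timestamp", "type": "long"},
    {"name": "score",     "type": ["null", "int"], "default": null}
  ]
}
```

Because the schema travels with the data and the files are splittable and compressible, records serialized against a schema like this stay compact in blob storage while remaining directly consumable by Hadoop jobs.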

Presentation: Yahoo!, Big Data, and Microsoft BI: Bigger and Better Together

About three and a half years ago, I virtually joined the Yahoo! Targeting, Analytics, and Optimization (TAO) Engineering team, where we embarked on an incredible journey to create the largest single-instance Analysis Services cube. Mind you, that was not our actual goal – our goal was to deliver fast interactive analytics against a massive amount of display advertising data from Yahoo! sites. The requirements were staggering, as noted in the slide below. Ultimately, we took 2PB of data from one of Yahoo!’s large Hadoop clusters and created a 24TB Analysis Services cube so users could do…

Getting Hadoop and protobufs up and running with Elephant Bird on Mac OSX Mountain Lion

“No, not Angry Bird – Elephant Bird!” – said no one.

In a few of my customer projects, we started diving into using protocol buffers (protobufs) as the sequence file format stored within our Hadoop infrastructure. While these were HDInsight on Azure projects, most native Hadoop code was originally written on Linux and carries implicit assumptions about directory paths and the like. Therefore, one of the first things I usually do is try to install said software on my handy MacBook Air so that if and when I run into issues getting the code to…

#PASSBAC – Ensuring Compliance of Patient Data with Big Data and BI

Over the past seven years, Ayad Shammout (@aashamout), Principal Business Intelligence Consultant at Beth Israel Deaconess Medical Center (a teaching hospital of Harvard Medical School), and I have worked on a variety of very exciting SQL Server projects, including (but not limited to) Healthcare Group Upgrading to SQL Server 2008 to Better Protect 2 Terabytes of Data, Healthcare Group Improves Availability and Security of Mission-Critical Databases, Healthcare Group to Enhance Information Access with Powerful Business Intelligence Tools, and SQL Server Reporting Services Disaster Recovery Case Study. We’ve worked on some pretty hinky ones, including the infamous PowerPivot for SharePoint / Windows Authenticated Users…

Yahoo! 24TB SSAS Big Data Case Study + Slides

In my post from last year, I asked the rhetorical question What’s so BIG about “Big Data”. I had the honor of announcing the largest known Analysis Services cube – at 24TB – whose source is 2PB of data from a huge Hadoop cluster. For those who attended the PASS 2011 session Tier-1 BI in the world of Big Data, Thomas Kejser, Kenneth Lieu, and I were honored to discuss the details surrounding this uber-cube. At that time, I had promised the case study would be only months away… Alas, it took a little while…

Getting your Pig to eat ASV blobs in Windows Azure HDInsight

Recently I was asked how to get Pig scripts to access files stored in Azure Blob Storage from the command-line prompt. While it is possible to do this from the HDInsight Interactive JavaScript console, to automate scripts and use the grunt interactive shell it is easier to run these commands from the command line. To do this, you will need to:

- Ensure your HDInsight Azure cluster is connected to your Azure Blob Storage subscription / account
- Familiarize yourself with the pig / grunt interactive shell

Connecting HDInsight Azure to Azure Blob Storage

1) To do this, go to the…
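Once the cluster is wired to blob storage, a Pig script can address files through the asv:// scheme just like HDFS paths. A minimal sketch (container name, file path, and record schema are all hypothetical, purely for illustration) might look like:

```pig
-- Hypothetical example: count events in a tab-delimited log stored in
-- Azure Blob Storage. The asv:// scheme resolves against the storage
-- account the HDInsight cluster was connected to above.
logs    = LOAD 'asv://mycontainer/logs/events.tsv'
          USING PigStorage('\t')
          AS (user:chararray, event:chararray, ts:long);
grouped = GROUP logs BY event;
counts  = FOREACH grouped GENERATE group AS event, COUNT(logs) AS n;
DUMP counts;
```

The same script runs unchanged from the grunt shell or via `pig myscript.pig` at the command line, which is what makes the command-line route convenient for automation.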
