The Future of Hadoop: A deeper look at Apache Spark

Understand why Apache Spark has seen such wide adoption, and learn about some current Spark use cases. There is also a technical deep dive into the architecture, our vision for the Hadoop ecosystem, and why we believe Spark is the successor to MapReduce for Hadoop data processing. Here is the link to The Future of Hadoop: A deeper look at Apache Spark webinar.


Simplifying Big Data: An Introductory Hadoop Primer

Back in July, I had the honor of speaking at the July 2014 AM webinar on Big Data, moderated by Michael Zeller. If you are interested in learning more about Big Data from a business/analyst perspective, here is our webinar on YouTube. Abstract: What’s the Big Deal with Big Data? And, more importantly, what is the business case for Big Data? In this session, we will focus on the fundamentals of Hadoop, as it is the foundation for Big Data. We’ll talk about the technology, but also the business cases: what you can and cannot do with…


Quick Tip for Compressing Many Small Text Files within HDFS via Pig

One of the good (or bad, depending on your point of view) habits when working with Hadoop is that you can push your files into the Hadoop cluster and worry about making sense of the data at a later time. One of the many issues with this approach is that you may rapidly run out of disk space on your cluster or your cloud storage. A good way to alleviate this issue (outside of deleting the data) is to compress the data within HDFS. More information on how the script works is embedded within the comments. /* ** Pig Script:…
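The original script is truncated above, but the general shape of this approach in Pig Latin is short: read the directory of small files and store them back out with output compression enabled, letting Pig combine small input splits along the way. This is a minimal sketch, not the original script; the paths, split size, and codec choice are illustrative assumptions.

```pig
-- Combine small input files into fewer, larger splits (size in bytes is an assumption)
SET pig.maxCombinedSplitSize 268435456;
-- Enable compressed output; BZip2 is a common choice because it remains splittable
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;

-- Hypothetical input path holding many small text files
raw = LOAD '/data/logs/incoming' USING PigStorage();

-- Storing back out yields fewer, compressed part files
STORE raw INTO '/data/logs/compressed' USING PigStorage();
```

Once the compressed copy is verified, the original uncompressed directory can be removed to reclaim space.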


Quick Tip for extracting SQL Server data to Hive

While I have documented various techniques to transfer data from Hadoop to SQL Server / Analysis Services (e.g. How Klout changed the landscape of social media with Hadoop and BI Slides Updated, SQL Server Analysis Services to Hive, etc.), this post calls out the reverse – how to quickly extract SQL Server data to Hadoop / Hive. This is a common scenario where SQL Server is your transactional store and you want to push data to some other repository for analysis, mashing together semi-structured and structured data. How to minimize impact on SQL Server…
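For context on what such an extraction looks like, one widely used route – not necessarily the technique in the truncated post above – is Apache Sqoop, which pulls rows over JDBC and lands them directly in a Hive table. The host, database, table, and credential names below are illustrative assumptions.

```shell
# Hedged sketch: import a SQL Server table into Hive via Sqoop.
# Connection string, table, and credentials are hypothetical.
sqoop import \
  --connect "jdbc:sqlserver://dbserver:1433;databaseName=Sales" \
  --username etl_reader \
  --password-file /user/etl/sqlserver.password \
  --table Orders \
  --hive-import \
  --hive-table orders \
  --num-mappers 4
```

Keeping `--num-mappers` modest is one simple lever for limiting load on the source SQL Server during the extract.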


Quick Tips and Q&A for SQL Server Analysis Services to Hive

Over the last few weeks I’ve fielded some questions concerning the whitepaper / case study SQL Server Analysis Services to Hive that Dave Mariani (@dmariani) and I contributed to; below is an aggregate of those tips – hope this helps! Q: I’m running into the HiveODBC error message “..expected data length is 334…” A: Check out the post HiveODBC error message “..expected data length is 334…” for details on how to potentially resolve this. Q: Can I connect Analysis Services Tabular to Hive instead of Multidimensional? A: Yes! Ayad Shammout (@aashammout) has a couple of great…


Project “ChâteauKebob”: Big Data to BI End-to-End Healthcare Auditing Compliance

Originally posted on Ayad Shammout's SQL & BI Blog:
Authors: Ayad Shammout & Denny Lee It may sound like a rather odd name for an End-to-End Auditing Compliance project – and the roots admittedly are based on the authors’ predilection toward great food in the city of Montréal – but there actually is an analogous association! Château means manor house or palace, and kebob refers to meat cooked over or next to flames – large or small cuts, or even ground meat – which may be served on plates or in sandwiches (mouth watering yet?). Château…


Import Hadoop Data into SQL BI Semantic Model Tabular

Originally posted on Ayad Shammout's SQL & BI Blog:
Hadoop brings scale and flexibility that don’t exist in the traditional data warehouse. Hive serves as a data warehouse for Hadoop, facilitating easy data summarization, ad-hoc queries, and the analysis of large datasets. Although Hive supports ad-hoc queries for Hadoop through HiveQL, query performance is often prohibitive for even the most common BI scenarios. A better solution is to bring relevant Hadoop data into a SQL Server Analysis Services Tabular model by using HiveQL. Analysis Services can then serve up the data for ad-hoc analysis and reporting. But, there…
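Since the Tabular model only needs the relevant slice of the Hadoop data, a common pattern is to pre-aggregate in Hive and point the import at the summary rather than the raw fact data. A minimal HiveQL sketch of that idea follows; the table and column names are illustrative assumptions, not from the post above.

```sql
-- Hedged sketch: expose a pre-aggregated view for the Tabular model to import.
-- web_sales and its columns are hypothetical.
CREATE VIEW IF NOT EXISTS daily_sales_summary AS
SELECT
  order_date,
  product_id,
  SUM(quantity) AS total_quantity,
  SUM(amount)   AS total_amount
FROM web_sales
GROUP BY order_date, product_id;
```

The Tabular model can then query `daily_sales_summary` over HiveQL, importing a far smaller dataset than the underlying fact table.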
