About three and a half years ago, I virtually joined the Yahoo! Targeting, Analytics, and Optimization (TAO) Engineering team, where we embarked on an incredible journey to create the largest single-instance Analysis Services cube. Mind you, that was not our actual goal – our actual goal was to deliver fast interactive analytics against a massive amount of display advertising data from Yahoo! sites. The requirements were staggering, as noted in the slide below.
Ultimately, we took 2PB of data from one of Yahoo!’s large Hadoop clusters and built a 24TB Analysis Services cube so users could run fast interactive analytics in a matter of seconds!
Where can I learn more?
There is some pretty good information available on how we built this cube, including:
– SQLPASS 2010 Keynote with Ted Kummert – Yahoo! VP of User Data Analytics David Mariani (@dmariani) speaks about the Yahoo! TAO cube (16TB at the time).
– Yahoo! TAO Case Study excerpt from the SQLPASS 2011 conference with Thomas Kejser (@tkejser), Kenneth Lieu (Yahoo!), and myself.
But a great place to dig more deeply into this solution – and to ask questions about it – is the PASS Business Analytics Conference! Yahoo! Lead Developer Dianne Eckloff and I will be co-presenting a session on the Yahoo! TAO cube, from Hadoop to BI.
Speaker(s): Denny Lee, Dianne Eckloff
Duration: 60 minutes
Track: Big Data Innovations and Integration
Would you like to know more about how Yahoo! built one of the world’s largest SQL Server Analysis Services cubes at 24TB? Join us for this technical deep dive showcasing how Yahoo! leverages Analysis Services with Hadoop to deliver more meaningful and useful analytical data faster. As a leading digital media company, Yahoo! provides a range of online services, including a group of popular consumer websites that attract more than 700 million unique visitors a month. To improve ad campaign effectiveness and increase revenue, Yahoo! implemented a solution integrating the Yahoo! Hadoop data processing framework with Analysis Services. This session will focus on the design decisions and best practices for implementing a large-scale analysis environment, as learned through this implementation.
Hope to see you there!