
I realized from my tweets and Facebook posts (thanks, Facebook Timeline) that I often step outside the realm of being a nerd and display all sorts of passion for travel, outdoor stuff, and of course food!

So one of the things I’m going to start adding to my repertoire of blog posts is Foodie Friday.  Every Friday (well, maybe every second Friday), I’ll blog about some awesome food that I like and that you may want to try!

Taiwanese “Small Eats” and Night Markets

So for my first post, let’s talk about 芋圓 – also known as taro circles.  In Taiwanese, it’s pronounced like ong-yi.  Taiwan is famous for its many 小吃 – or “small eats”.  Instead of eating a large portion of one particular item, the key is to have small portions of A LOT of different items.  Walk down any of the many 夜市 – night markets – in almost any Taiwanese city and you’ll find tons of small vendors that make just a few items and have specialized in them for years.  It’s the old adage of the old Taiwanese cook who has used the same iron pot for 30 years to make fried rice: the fried rice tastes better because the pot carries the layers of the same fried rice made the same way for the last 30 years.

Taro Circles or 芋圓

So back to the 芋圓 – basically it’s a simple dessert typically made with multi-colored taro circles, herbal black jelly, tapioca, and green beans and/or red beans on top of crushed ice swimming in sugar / glucose water.

Yet, it is deliciously yummy and very light – very typical of many Taiwanese desserts.

You can find it at pretty much any night market in Taiwan, and there are usually plenty of stands on every major street.  Typical of many Taiwanese foods, people will have different takes and preferences – and debate adamantly about which stand serves better 芋圓 than another.  There could be three stands all next to each other – and you’ll see the vast majority of people lined up at just one of them because it was the first, or has the better red bean, or better peanuts, or better taro, etc.

The thought of creating one in Seattle has occurred to me many, many times as I visit my favorite place, 青麥芋圓, in 斗六 (Douliu), which is in central Taiwan about 40 minutes southwest of 台中 (Taichung).  I wonder how well it would do …

P.S. if you ever are in 斗六, do check out 青麥芋圓 and order the 芋圓 #4 and #5.  Enjoy!


.

Funny how you can say something in your head and it sounds fine

– Doctor Who (Matt Smith in Amy’s Choice)

.

Recently I posted the wiki article: Hadoop on Azure Scenario: Query a web log via HiveQL.

In it, I described how to analyze a sample web log using HiveQL on the Hadoop on Azure CTP (HadoopOnAzure.com) using:

  • The interactive Hive console
  • The interactive JavaScript console
  • Secure FTP using curl to transfer data to HDFS
  • An EXTERNAL table created against compressed log files
  • Some simple HiveQL queries


Screenshot of the Hadoop on Azure Interactive Hive Console executing a HiveQL query.

A bunch of people followed up with me directly on the article with the question:

Gee – do I have to do all of these steps to do work with Hadoop on Azure?!

The quick answer is: No. The purpose of the wiki article was to showcase how one can interact with Hadoop on Azure – which is very different from the traditional command line interface (CLI).  For a great post on how the design of Hadoop on Azure was conceived, jump over to Dave Vronay’s blog post: The Design of the Portal for HadoopOnAzure.com.

The wiki entry is a “scenarios” article, written so that you can try all of the different functionality available to you on the Hadoop on Azure portal – like using the interactive consoles, running HiveQL queries, etc.


Screenshot of the Hadoop on Azure portal where you interact with live tiles to open ports, remote desktop in, and use an interactive JavaScript or Hive console to run your queries.

Most importantly – because of the cool design, try out the scenarios not only on your desktop/laptop but also on your mobile devices, eh?!

Enjoy!


 

For some people, small, beautiful events is what life is all about!

– Doctor Who: Earthshock, Peter Davison as the 5th Doctor (Story 122)

 

 

 

 

 

 

A tad off course, but today I was just wowed by my Apple Magic Mouse.

Why, you ask?

Because of the beauty of its industrial design and the wonderful merging of function and form.  The mouse itself is quite sleek and low profile, and I like using the multi-gesture swipes all within the comfort of a few finger moves.

 

But the thing that got my attention was the inside of the mouse and how it works.  When the mouse was running low on power, it let me know in an unobtrusive fashion: the mouse still worked but the gestures didn’t operate any more.  So it didn’t fill up my laptop screen real estate with an unnecessary dialog box, nor did it wait until the battery was completely out of power.  Instead, I could still do my work, but losing the convenience of the multi-gesture swipes reminded me that the mouse was running low on power.  To check, I quickly turned it over and, sure enough, the light was red, so I knew what the problem was.


 

But what made it even nicer was that, on opening up the mouse, I found not the usual hanging springs and half-enclosed case but a fully enclosed battery case with metal covers around the springs to ensure that they do not get damaged.

 

Yes, it’s a nerdy thing to point out – but whether it’s about Analysis Services I/O, Hadoop performance, … or a mouse, it is all about the little things inside – i.e. how well you design and engineer it – that count.

All your HBase are belong to us

Friends, coders, beta testers – lend me your data!

.

We have more tag lines like that all within the Hadoop on Azure portal!  Click on the image to check out the Hadoop on Azure video – in addition to funky electronica, it showcases how you can integrate Hadoop, Hive, Pig Latin, Hadoop JavaScript, Azure DataMarket, and Excel!

.

After working over the past eight months with the Isotope development team led by Alexander Stojanovic (@stojanovic) – the Founder and General Manager of Hadoop on Windows and Azure – I’m proud to personally announce that Hadoop on Azure is available as a Community Technology Preview.

.

For more reference, please review the linked blog posts:

.

But what’s great about all of this effort is that we’re not just about making Hadoop performant on the Windows platform, integrating it with the Microsoft stack so that Open Source and Microsoft technologies can work better together, or making our distribution integrate many other cool technologies (more on this later).  We’re also making it easier.  As noted in the SQLCAT blog post, we built a Metro UI so you can easily work with Hadoop in the cloud whether you are an Apache Hadoop expert, Microsoft BI / DW specialist, Information Worker, or wanna-be-geek.

.

What’s even cooler is that below is a screenshot of the same Hadoop on Azure Metro UI running on my iPad. Isn’t it cool that I can spin up a multi-node Hadoop cluster in the cloud and run Hive / Pig-Latin / JavaScript MR queries – all from my iPad, iPhone, Windows Phone, etc.?!

We have more to do – Hadoop on Windows is next on the docket – and lots of other cool integrations.  But having been there from almost the beginning, it’s great to see something so amazing released in such a short amount of time!

Thanks!

Soldier: Damn Trekkies, always crashing the party pretending they’re time travelers.
Claudia: What a nerd.

– Warehouse 13 episode “Queen for a Day”

I was asked the question “What’s the point of a data warehouse if I have Hadoop”?  And if you know me – I tend to answer in terms of an analogy:

The best way to think about it is that a data warehouse (in the relational world) is like a Costco – lots of different stuff but still within a particular business domain (e.g. consumer goods, from a year’s supply of toilet paper to laptops). Hadoop, meanwhile, is the uber-warehouse – like a UPS warehouse which contains anything and everything. All of it is in pre-sized boxes (Hadoop breaks everything down into 64MB or 128MB chunks) and can be sent anywhere and everywhere.

Both are needed and important – they do similar things conceptually, yet do very different things implementation wise.
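To make the “pre-sized boxes” part of the analogy a bit more concrete, here is a minimal Python sketch of the arithmetic only. It does not talk to a real cluster; `count_blocks` is a name I made up for illustration, and the 64MB / 128MB sizes are the ones mentioned above.

```python
MB = 1024 * 1024


def count_blocks(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of fixed-size blocks needed to hold a file.

    The last block may be only partly filled, so we round up.
    (Hypothetical helper for illustration, not a Hadoop API.)
    """
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding on huge files.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


one_gb = 1024 * MB
print(count_blocks(one_gb, 64 * MB))       # 16
print(count_blocks(one_gb, 128 * MB))      # 8
print(count_blocks(one_gb + 1, 128 * MB))  # 9 (one extra byte needs one more box)
```

So a 1GB file becomes 16 boxes at 64MB or 8 boxes at 128MB, and each box can be shipped to (and processed on) a different node.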


 

Aggressive optimism and a no enemies policy pays dividends

– Eric14 aka Eric Baldeschwieler (@jeric14) during ApacheCon 2011 North America Thursday Keynote.

 

 

After all of these years doing BI, why have I jumped on the bandwagon of Hadoop and Big Data? Well, if you read some of my previous blog posts like Revelations – rolling the Hard Six to SQL BI and Hadoop or At a Crossroads … from SSAS to BigData! – I never left the Big Data wagon in the first place. Whether it is Tier-1 BI, web analytics, or Big Data – it’s all about solving some really complex problems.

It’s not just your “Big Data” problems, it’s about your BIG “Data Problems”. – Alexander Stojanovic (@stojanovic), General Manager and Founder Hadoop on Azure / Windows Server, at ApacheCon North America 2011 Meetup

It’s about the openness of the Open Source community (apologies for the pun) that allows us to focus on solving the actual problem instead of trying to understand how a particular system works. It’s the rich interactions, from the various Apache mailing lists to Facebook and Netflix providing details on how they solved their problems. Some of my favorite examples include:

And that’s why I have an aggressive optimism for Hadoop – because it allows us to solve the data problem instead of needlessly focusing on the technical problem.

.


 

Sometimes, you have to roll a hard six.

– Commander Adama in Battlestar Galactica “Revelations”

 

 

In May of this year, I noted that I was At a Crossroads … from SSAS to Big Data!.  After I wrote that post, many people pinged me wondering if I had left the world of BI and/or left Microsoft altogether. [Note, I have called out that I AM a Microsoft employee but the opinions here are my own]  My subsequent posts ranged from Cloud to a little bit of Analysis Services.

By the way, I did forget to create a separate blog post on the Analysis Services 2008 R2 Performance Guide (fortunately, I did do it on the sqlcat.com site – whew!).

 

Over time, I became a bit more explicit on the subject of big data, including posts like the potential of Big Data and “Hadoop: A movement, not just a technology”.  But all of this time, what I was excited about was when we would finally be able to showcase some of our cool stuff, including the embracing of Apache Hadoop™ – yes, this may sound like marketing speak, but there are good reasons why I’m using it (more later).    I even got a little cheeky last week with my recent blog post You know that I’m tired when… – especially with the last two lines:

 


Every time I think about big data, I conjure up Proboscidea

Proboscidea is the scientific order that elephants belong to – and the icon for Hadoop is a yellow elephant (Doug Cutting named Hadoop after his son’s toy elephant).

 


 

I’ve been hearing “go in bar” a lot lately … or perhaps that’s my dyslexia

No, I wasn’t actually thinking about drinking … that much … but the phrase “go in bar”, if you flip it around, sounds like “embargo”.  That is, until today (10/12, 9:00am PST), our work with Hadoop had to be kept quiet, or “embargoed”.

 

 

 

 

Get to the point!

Okay!  With Ted Kummert’s Day 1 keynote at the SQL Server PASS Summit 2011 today, I had the honor of demonstrating how SQL BI and Hadoop rock together!   As you can see from the Port 25 Microsoft, Hadoop, and Big Data and the Microsoft News Center for SQL Server 2012 posts, there are a number of cool things happening:

  • It started with the Hadoop connectors for SQL Server and PDW.  The key call out here is that these connectors are bi-directional, allowing data movement back and forth between SQL Server and Hadoop.
  • Windows Server and Windows Azure optimized Hadoop distributions; out of the box (or cloud), the distributions include support for HDFS, Hive, Pig-Latin, FTP, etc.
  • Our partnership with Hortonworks to help us push forward faster with optimizing Hadoop to run on Windows, as noted in their post Bringing Apache Hadoop to Windows.
  • As part of the demo today, I showed the integration of the SQL BI stack with Hadoop by having PowerPivot (for Excel and SharePoint) interact with a Hadoop on Windows cluster via Hive and the soon-to-be-released HiveODBC driver.
  • Not shown today, but just as cool, will be the release of the Excel Hive Add-in.

More information will be posted at www.microsoft.com/bigdata as it becomes available, eh?!

 

Cool, so why did I use “embrace Hadoop”?

A key call out during my conversation with Ted during the keynote is that our offering is 100% compatible with Apache Hadoop – if your code works on Apache Hadoop then it will work on ours and vice versa.  But it’s not just about the code; it’s also about the shift: we are embracing the open source community!

For example, one of the key demos that I have shown is the ability to write Map Reduce jobs in JavaScript (as opposed to Java). This is what I would like to call:

Our VB moment in Big Data

 

That is, we made Visual Basic a powerful language for developers and, with .NET, opened the door for these developers to go into the enterprise.  By making JavaScript a first class language for Big Data, we are helping to enable the millions of JavaScript developers to enter the realm of Big Data.  Even more awesome, JavaScript on Hadoop is an example of one of our proposals back to the Apache Hadoop community.
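If you have never seen the shape of a MapReduce job, here is a minimal in-memory word count sketch. To be clear about the assumptions: it is written in Python rather than JavaScript, it does not use the Hadoop on Azure JavaScript API (which I am not reproducing here), and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration. It only shows the map, shuffle, and reduce steps that a real cluster would spread across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the framework does this on a cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence, all in memory."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(word_count(["Big data", "big BI data"]))  # {'big': 2, 'data': 2, 'bi': 1}
```

The point of the “VB moment” is that the developer only writes the two small functions (map and reduce); the framework takes care of the shuffle and the distribution.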

 

So why is Big Data / Hadoop important for a BI dude or dudette?

I’ll probably have a number of posts for this question alone, but let me give you one answer right now – this is an excerpt from my post: “Hadoop: A movement, not just a technology”

 

Why am I excited about Hadoop and Big Data even though I’m a Microsoft BI person for most of my career? Because first and foremost, BI is all about making sense of the information. And the greatness of Big Data isn’t just about exploring, understanding, and asking even more questions of this information, but doing it in distribution (vs. silos) and putting more emphasis on the data (i.e. this is where the real IP is)

 

Any other cool information on Big Data at SQLPASS this week?

Both Ted Kummert and David DeWitt’s keynotes will cover Big Data.  If you cannot attend, check out the SQL Server PASS Summit 2011 Live Streaming.  As well, there are two breakout sessions on Big Data, both on Thursday:

 

Also, don’t forget that I will be hosting the Big Data table at the Birds of a Feather luncheon, and a bunch of us will be floating around the Big Data Kiosk in the product pavilion.

Whew! I think that’s it for today!


Okay kid, this is where it gets complicated.

– Amy Pond from Doctor Who’s “The Big Bang”

The SQL PASS Summit 2011 starts tomorrow, with lots of craziness and technical learnings from SQL Karaoke to the Birds of a Feather luncheon.   Just a friendly reminder that on Thursday, October 13th, Thomas Kejser (@thomaskejser) and I – with special guest Kenneth Lieu from Yahoo! – will be presenting the SQLPASS breakout: SQLCAT: Tier-1 BI in the world of Big Data.


Come by if you want to learn more about how to build more complex BI systems utilizing the SQL Server BI stack – including some of the details behind Yahoo!’s 24TB Analysis Services cube (this is correct, I did not make a typo here, I wrote terabytes).  For example, the data source behind this massively large but cool cube is none other than a multi-petabyte Hadoop cluster (again, no typos here).

So loop on by and enjoy the session – we’re in room 602-604 – and/or follow the guys wearing both green shirts and lab coats!

Enjoy!

Sorry that I haven’t blogged in a while, but there are lots of interesting things happening these days, all in prep for the SQLPASS summit, eh?!  It’s been long nights and lots of fun chaos!

Saying this, you know that I’m tired when…

  • I’m getting <3h of sleep / night
  • I’m running all of my team meetings via conference calls from my car
  • I walk into a customer executive briefing wearing shorts
  • I realize my Mandarin Chinese is so bad that I’m learning new words while watching “Ni Hao, Kai Lan”
  • I’m actively thinking about joining sales
  • I’m losing a real debate with my three year old
  • Every time I think about big data, I conjure up Proboscidea
  • I’ve been hearing “go in bar” a lot lately … or perhaps that’s my dyslexia

But most importantly, the main reason I know I’m tired is because I have five episodes of “Doctor Who” still in the queue.


 

First things first, but not necessarily in that order!

– words of wisdom from Doctor Who

 

 

 

 

The tweet below got me thinking about the importance of privacy.

Data mining demands sound privacy policies in age of ‘big data’ http://t.co/1N0gZhz #datamining #BI #bigdata

 

After all, in this day and age, with access to all sorts of disparate information, it becomes easier and easier to uniquely identify a person by their behaviors and external patterns.   In Dr. Latanya Sweeney’s groundbreaking paper k-Anonymity: a model for protecting privacy, she noted the following startling observations:

  • Based on the 1990 census, over 80% of the US population was uniquely identifiable from just three attributes: 5-digit zip code, birth date, and gender
  • By combining the state of Massachusetts voter list with supposedly anonymized healthcare records, she was able to identify the medical records of then-Governor William Weld
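To get a feel for why those three attributes are so revealing, here is a minimal Python sketch that measures how identifiable a table is. The records are made up for illustration, and `k_anonymity` and `unique_fraction` are hypothetical helper names, not something from the paper. A table is k-anonymous when every combination of the quasi-identifier columns is shared by at least k rows.

```python
from collections import Counter
from typing import Dict, List, Sequence


def _group_sizes(records: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> int:
    """The k of the table: the size of the smallest group (0 for an empty table)."""
    groups = _group_sizes(records, quasi_identifiers)
    return min(groups.values()) if groups else 0


def unique_fraction(records: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> float:
    """Fraction of records that are the only one with their combination of values."""
    if not records:
        return 0.0
    groups = _group_sizes(records, quasi_identifiers)
    return sum(1 for size in groups.values() if size == 1) / len(records)


# Made-up sample data, for illustration only.
records = [
    {"zip": "98101", "birth_date": "1970-01-01", "gender": "F"},
    {"zip": "98101", "birth_date": "1970-01-01", "gender": "F"},
    {"zip": "98101", "birth_date": "1982-05-17", "gender": "M"},
    {"zip": "98052", "birth_date": "1982-05-17", "gender": "M"},
]
qi = ("zip", "birth_date", "gender")

print(k_anonymity(records, qi))           # 1 (at least one person stands alone)
print(unique_fraction(records, qi))       # 0.5
print(k_anonymity(records, ("gender",)))  # 2 (coarser attributes hide people better)
```

The more attributes you join in, the smaller the groups get, which is exactly how a voter list plus “anonymous” health records can point at one person.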

 

With the power of Big Data, it is easy to forget that the more we dig and mine, the more we are potentially invading privacy.   So now more than ever, we need to make use of privacy mechanisms such as k-anonymity or privacy-preserving histograms with epsilon noise, as described in Analyzing Data while Protecting Privacy – A Case Study.
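As a rough idea of what “epsilon noise” on a histogram can look like, here is a minimal Python sketch of the standard Laplace mechanism. This is my assumption about the general technique, not necessarily what the linked case study implements; `laplace_noise` and `noisy_histogram` are made-up names, and the sketch assumes each person contributes exactly one value, so adding or removing one person changes one count by 1.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """One Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(
    values: Iterable[str],
    bins: Sequence[str],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> Dict[str, float]:
    """Count values per bin, then add Laplace noise with scale 1/epsilon to each count.

    Every bin in `bins` gets noise, even empty ones, so the output does not
    reveal which bins were actually present. Values outside `bins` are ignored.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    counts = Counter(values)
    scale = 1.0 / epsilon  # sensitivity of 1 divided by epsilon (assumption above)
    return {b: counts[b] + laplace_noise(scale, rng) for b in bins}


# Made-up page visits, for illustration only.
visits = ["home", "search", "home", "cart", "home"]
print(noisy_histogram(visits, bins=["home", "search", "cart", "checkout"], epsilon=0.5))
```

A smaller epsilon means more noise and more privacy; a larger epsilon means counts closer to the truth and less privacy.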
