Ramblings of a data dork: from BI and Big Data to Travel and Food
Over the last few days, @PowerPivotPro has been calling me out for diving into the deep end into his world of Excel and getting the proverbial egg on my face (Joker picture not-withstanding). He’s called me out, challenged my geek hood, and sliced / diced my postings (pun intended). I was going to let this all slide…and then he started using “eh?!” in his postings!!!! So the gloves are coming off!!!
… Or are they?
For starters, in paying homage to someone who is vastly superior in Excel (I’m sure in other things too, but let’s focus on Excel for now), Rob (@PowerPivotPro) has re-introduced the cube formulas concept in his posting: Introducing CubeSet() and CubeRankedMember().
So how do I respond to this posting? ….
Read it and make sure you download the workbook as well. It’s a very cool and fast way to create a Top 10 without using pivots. It does not require MDX and has sort_order and sort_by parameters. And once you get used to the logic of using these cube formulas, it’s a really nice way to do this.
Wait…didn’t I just say the gloves were coming off?
For starters, Kasper De Jonge (@kjonge) made the first salvo before I could even start (thanks Kasper!). Please read his comments directly in the Rob’s posting (as he is much more eloquent than I am) but the key thing here is that you cannot easily filter the data that is used by the CubeSet() function. If you want to do something straightforward, by all means cube formulas are great but it doesn’t look like you can filter them by slicers. Meanwhile, the technique I showed in Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 3 of 3), you will notice that you can filter by slicers!
Though, I do have to admit, if you do not have to use slicers in this context, i.e. to filter the top 10 users so you can create a report that needs to be filtered by those top 10 users, it would be easier by using CubeSet() and CubeRankedMember() because it is much more straightforward to create a PowerPivot Linked Table against the CubeRankedMember() values then my workaround (Excel Expressions against a PivotTable since you cannot created a linked table against a PivotTable). But hopefully that will be resolved in the next version of PowerPivot (there is always hope!)
Another interesting point is that it appears that the CubeSet() uses more memory then if I did a NamedSet as I talked about in Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 2 of 3).
By the way, this is where i get all geeky now and stand resolute that my geekiness has indeed NOT been challenged … yes Rob, I’m calling YOU out now
Why do I say this, well I had used a customer workbook (sorry, I cannot share this yet) where there were 543K distinct users and 44M rows in the fact data as noted below.
And i wanted to run a quick test to see whether NamedSet or CubeSets used more memory? And while this was a real quick test, the repeatable outcome was this:
The concern here of course is that once you go to millions of unique users, how much more memory is allocated? Now of course, not using CubeSet() because of 500MB is hardly a good reason (you probably have plenty of memory anyways). But I had to add something nice and geeky here and figured looking through memory allocation was geeky enough, eh?!
For starters, you have to read Rob’s posting Pivots, I have not forgotten ye! if you’re a “The Princess Bride” fan – In-con-CEIV-able! But after that, there is some great information about pivots and how to do Top 10 within the report. Even better still, it goes into misconceptions and clarifies how filtering and sorting work. And if you’re going to be doing with a number of rows that Excel can easily handle (e.g. 18,000+ unique customer emails), then I would absolutely do as Rob suggested since its easier and more straightforward way of doing Top 10.
So why did I go talk about the need of named sets for Top 10 calculations within Excel? Well, as noted in the posting Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 1 of 3), my concern here was
Even Rob had to admit that:
…but when you first add the field to the pivot, before you’ve had a chance to define the top N filter…ok, there’s a place where it’s gonna be slow (and potentially truncated).
In my posting, I was talking about dealing with millions of users – and while that’s great for PowerPivot, if you were to put millions of users into the row label of an Excel pivot, it would be veeeerry slow.
No, I’m not doing a Verizon commercial here, though if the iPhone ever goes to Verizon I might…but I digress.
As you can see, even after all of Rob’s call outs and challenge on my geekiness, using named sets allows you to easily filter the top 10 list by slicers, potentially use less memory, and handle millions of rows easily very quickly. All you need is a little MDX … and as for Rob’s comment on staying away from “MDX Madness” by using cube formulas – geez, it really that much difference going from a cubeset to named set?
|CUBESET(“Sandbox”, “[Table].[Column].children”, “All Users”, 2, “[Measures].[Events]“)||TOPCOUNT(“[Table].[Column].children”, 10, “[Measures].[Events]”)|
Heck, there’s less writing involved with Named Sets because it’s a set that gets added to your PowerPivot Field list while cube formulas require you to write a formula for each cell! … admittedly, you can just copy/paste most of it
But all calling out aside, Rob’s posts are definitely enlightening in that you can use cube formulas like CubeSet and CubeRankedMember. As well, there’s a lot of misconceptions kyboshed concerning how “Top 10” works within Excel. My three part posting provide an interesting way to use named sets to solve the problem so you can filter the list by slicers and handle millions of unique users and tens of millions of rows. But if you’re dealing with a more regular data set, this may be akin to taking a jackhammer to the problem if all you need is a mallet.
So in the end – choose which method you want by simply choosing the right tool for the job, eh?!
Oh, and as for you Rob – I’ve come back to reclaim my “eh?!”, eh?!