Pivots, Cube Formulas, Named Sets … Oh my!

Over the last few days, @PowerPivotPro has been calling me out for diving into the deep end into his world of Excel and getting the proverbial egg on my face (Joker picture not-withstanding).  He’s called me out, challenged my geek hood, and sliced / diced my postings (pun intended).  I was going to let this all slide…and then he started using “eh?!” in his postings!!!!  So the gloves are coming off!!! 

… Or are they?

CubeSet and CubeRankedMember cube formulas are great!

For starters, in paying homage to someone who is vastly superior in Excel (I’m sure in other things too, but let’s focus on Excel for now), Rob (@PowerPivotPro) has re-introduced the cube formulas concept in his posting: Introducing CubeSet() and CubeRankedMember().

So how do I respond to this posting? ….

Read it and make sure you download the workbook as well.  It’s a very cool and fast way to create a Top 10 without using pivots.  It does not require MDX and has sort_order and sort_by parameters.  And once you get used to the logic of using these cube formulas, it’s a really nice way to do this.

Wait…didn’t I just say the gloves were coming off?

Cannot use CubeSet() within the context of slicers

For starters, Kasper De Jonge (@kjonge) made the first salvo before I could even start (thanks Kasper!).  Please read his comments directly in the Rob’s posting (as he is much more eloquent than I am) but the key thing here is that you cannot easily filter the data that is used by the CubeSet() function.  If you want to do something straightforward, by all means cube formulas are great but it doesn’t look like you can filter them by slicers. Meanwhile, the technique I showed in Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 3 of 3), you will notice that you can filter by slicers! 

image24[1]

Though, I do have to admit, if you do not have to use slicers in this context, i.e. to filter the top 10 users so you can create a report that needs to be filtered by those top 10 users, it would be easier by using CubeSet() and CubeRankedMember() because it is much more straightforward to create a PowerPivot Linked Table against the CubeRankedMember() values then my workaround (Excel Expressions against a PivotTable since you cannot created a linked table against a PivotTable).  But hopefully that will be resolved in the next version of PowerPivot (there is always hope!)

Another interesting point is that it appears that the CubeSet() uses more memory then if I did a NamedSet as I talked about in Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 2 of 3).

By the way, this is where i get all geeky now and stand resolute that my geekiness has indeed NOT been challenged … yes Rob, I’m calling YOU out now 🙂

Why do I say this, well I had used a customer workbook (sorry, I cannot share this yet) where there were 543K distinct users and 44M rows in the fact data as noted below.

image

And i wanted to run a quick test to see whether NamedSet or CubeSets used more memory?  And while this was a real quick test, the repeatable outcome was this:

Type Memory Allocated
Named Set ~100MB
CubeSet ~600MB

 

The concern here of course is that once you go to millions of unique users, how much more memory is allocated?  Now of course, not using CubeSet() because of 500MB is hardly a good reason (you probably have plenty of memory anyways).  But I had to add something nice and geeky here and figured looking through memory allocation was geeky enough, eh?!

Pivots, I have always remembered you!

For starters, you have to read Rob’s posting Pivots, I have not forgotten ye! if you’re a “The Princess Bride” fan – In-con-CEIV-able! But after that, there is some great information about pivots and how to do Top 10 within the report.  Even better still, it goes into misconceptions and clarifies how filtering and sorting work.  And if you’re going to be doing with a number of rows that Excel can easily handle (e.g. 18,000+ unique customer emails), then I would absolutely do as Rob suggested since its easier and more straightforward way of doing Top 10.

So why did I go talk about the need of named sets for Top 10 calculations within Excel?  Well, as noted in the posting Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 1 of 3), my concern here was

  • If you have a lot (e.g. millions) of users in your data set, it would take a long time for the native Pivot Table to render this – if at all.
  • You cannot easily apply these Top 10 users by any other criteria or combination of criteria to have better insight on what is going on.

Even Rob had to admit that:

…but when you first add the field to the pivot, before you’ve had a chance to define the top N filter…ok, there’s a place where it’s gonna be slow (and potentially truncated).

In my posting, I was talking about dealing with millions of users – and while that’s great for PowerPivot, if you were to put millions of users into the row label of an Excel pivot, it would be veeeerry slow.

Rob … can you hear me now?!

No, I’m not doing a Verizon commercial here, though if the iPhone ever goes to Verizon I might…but I digress. 

As you can see, even after all of Rob’s call outs and challenge on my geekiness, using named sets allows you to easily filter the top 10 list by slicers, potentially use less memory, and handle millions of rows easily very quickly.  All you need is a little MDX … and as for Rob’s comment on staying away from “MDX Madness” by using cube formulas – geez, it really that much difference going from a cubeset to named set?

CubeSet NamedSet
CUBESET(“Sandbox”, “[Table].[Column].children”, “All Users”, 2, “[Measures].[Events]”) TOPCOUNT(“[Table].[Column].children”, 10, “[Measures].[Events]”)

 

Heck, there’s less writing involved with Named Sets because it’s a set that gets added to your PowerPivot Field list while cube formulas require you to write a formula for each cell!  … admittedly, you can just copy/paste most of it 😉

Okay, the gloves come back on…almost

But all calling out aside, Rob’s posts are definitely enlightening in that you can use cube formulas like CubeSet and CubeRankedMember.  As well, there’s a lot of misconceptions kyboshed concerning how “Top 10” works within Excel.  My three part posting provide an interesting way to use named sets to solve the problem so you can filter the list by slicers and handle millions of unique users and tens of millions of rows.  But if you’re dealing with a more regular data set, this may be akin to taking a jackhammer to the problem if all you need is a mallet. 

So in the end – choose which method you want by simply choosing the right tool for the job, eh?!

Oh, and as for you Rob – I’ve come back to reclaim my “eh?!”, eh?! 🙂

8 Comments

  1. […] Pivots, Cube Formulas, Named Sets … Oh my! […]

  2. Good article,

    And this is pretty much where I am in my reporting and generating a fully dynamic excel sheet. I ran into one problem and it revolves around the use of the “TOPCOUNT()” MDX.

    In my report I have a User Inputted time frame in the format yyyy-mm-dd. I have set up cells to go into my cube sets and pull back all the relevent measures and summarize them on the report (and even a neat little validation function that allows the users to change the column headers to a generated list of ALL measures availible and auto-replace those values!)

    The problem if I now want to do a top 10 by location. But I want the top ten in the provided timeframe.

    So the user puts in: 2009-01-01 to 2009-01-15

    Using a cubeset() with the time values of {[2009-01-01]:[2009-01-15]} works great! And thats were the summaries work perfectly.

    So I tried applying it to a top 10 cube set…

    =CUBESET(“myCube”,”TopCount([Dim Payto Provider].[Provider Name].Children,10,([Measures].[Paid Amount],{[2009-01-01]:[2009-01-15]}))”,”PTPName”)

    And you get an error!

    =CUBESET(“myCube”,”TopCount([Dim Payto Provider].[Provider Name].Children,10,([Measures].[Paid Amount],[2008]))”,”PTPName”)

    ^Works fine

    The top 10 for a particular Unit of time may not be the same for a time span within it. And that brings me to my problem!

    (The goal is to use the top 10 locations as row headers and string them into a cubevalue function to get my measures)

    Any ideas?

  3. Apologies Derek – slammed due to RTM stuff and finally coming back up to breathe. Were you able to resolve the above issue?

    1. Denny,

      I’m struggling with the exact same problem as Derek.

      Any ideas?

      1. Off hand I do not know if this has been solved – not sure if this is a DAX or Excel engine issue. Would advise that if you can create a repro, please fill out a bug at http://connect.microsoft.com. HTH!

  4. Andrew Cliff

    Denny – I have a curly one for you. Is there a way of using cell references in a spread sheet when using “sets” in formula’s , because as far as I can see when you use { } in a formula ie =CUBEMEMBER(“Connection”,{“[ItemLineClassRefFullName].[ALL].[Council Contribution]”,”[VendorRefFullName].[ALL].[ADMINISTRATION]”,”[Description].[ALL].[Town Planning]”}) any thing in the middle turns to text and the formula does not like if you insert cell references in to the formula. With out the set {} convention it is fine.
    Any ideas would be appreciated
    Andrew Cliff

    1. I’m not sure about this one – best to head over to http://powerpivotpro.com as they are easily the Excel experts. Sorry that I couldn’t be more helpful, eh?!

    2. A year on.. INDIRECT is fantastic for generating cell references in texf form. You can use conditionality in formulas to build strings.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s