Ramblings of a data dork: from BI and Big Data to Travel and Food
I was recently working with a customer to help them solve an interesting PowerPivot problem where they had wanted to get the Top 10 users within their data set and create reports based on that. One approach is to create a rank function within PowerPivot as noted in a Rank function thread on the SQL Server 2008 R2 PowerPivot for Excel forum? Another way is to create a Pivot Table where the UserID was placed in the [rows] and the measure in question in the [data] area; and sort the data from largest to smallest.
Figure 1: Events by UserID sorted by Events Largest to Smallest
You can even do value filter so you only see the Top 10 by right clicking on the Row Labels icon > Value Filters > Top 10
Figure 2: Choose Row Labels Icon > Value Filters > Top 10 to only view the top 10 values
In this hospital scenario example, the business analyst was trying to understand what are the most commonly accessed systems by the top 10 users of the system. To do this, the analyst simply put the systems category into column labels and voila, she could identify that the ICU and the Lab systems were the most commonly accessed systems.
Figure 3: Adding systems category, could see that the top 10 users typically accessed the ICU and the Lab systems
While this is great, there are a number of issues that come to mind.
So perhaps there is another way to do this?
Thanks to Siva Harinath (from the Analysis Services Team) for reminding me about Excel Named Sets which allow you to input your own MDX statement to query your VertiPaq database. A great blog posting on the topic can be found at PivotTable Named Sets in Excel 2010.
To do this, create a simple pivot with the measure only.
Figure 4: Simple Pivot with only the measure added to the PivotTable.
Leaving the cursor within the PivotTable Data area, go to the Office Ribbon and choose PivotTable Tools > Options > Field, Items, & Sets
Figure 5: Accessing Excel Named Sets
From here click Manage Sets… > New to create a new Named Sets. Below is the new “Top 10 Users by Events” Named Set that I have created.
Figure 6: Creating an Excel Named Set to query my VertiPaq database within my workbook
The red oval call outs is for the MDX statement that I used to create my TopCount statement. The statement in this example is
[Measures].[Measures].[Sum of event_count]
You will want to modify it for your scenario, you can find more information at TopCount (MDX).
Once you have created this new Named Set, it will show up under the “Sets” in the PowerPivot Field List. Click on the checkbox for your newly created set and then you will have just to the top 10 Users by Event Count listed in your PivotTable workbook.
Figure 7: Top 10 Users by Events using Named Sets
What is great about using Excel Named Sets is that it will only show the top 10 users. So if you have millions of users, Excel will not be spending its time to render this information. It will simply show you those Top 10 users and you’re good to go.
But if you look at Named Sets, one of the issues with them is that you can only put them in the Column Labels and Row Labels. That is, you cannot filter by them so you can look at the most common departmental systems that the top 10 users are accessing. Another business problem that may be easier to relate to (with the similar business logic to this hospital departmental scenario) is what are the most watched TV programs of the top 10 markets (i.e. cities).
Now, you have the option to filter all the data by these top 10 users. Since the most recent PivotTable (Figure 7) has the top ten UserIDs, you could create a calculated column using the DAX IF function to exclude all UserIDs that are not listed in the above PivotTable. e.g.:
([Table].[UserID].members = 52159102 || …
[Table].[UserID].members = 54159102),
This DAX statement would create a new column in your table where 1 indicates the user is in the “Top 10” and 0 – the user is not in the “Top 10”. This would be a new column within the PivotTable Field list that you could then choose and filter by 1 so that all queries to the pivot table are now filtered by the Top 10 users.
The other options include
The reason I haven’t blogged either of these techniques more in-depth is because in the next blog posting, we will cover how to “filter” by using the Named Set you just created. In the third posting of this series, we’ll also cover how one can dynamically change the criteria for the Top 10 Users (e.g. also filter by state, year, other columns in your data source, etc.).
The next blog posting is: Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 2 of 3) which will be available shortly.