Feeds:
Posts
Comments

Over the last few days, @PowerPivotPro has been calling me out for diving into the deep end into his world of Excel and getting the proverbial egg on my face (Joker picture not-withstanding).  He’s called me out, challenged my geek hood, and sliced / diced my postings (pun intended).  I was going to let this all slide…and then he started using “eh?!” in his postings!!!!  So the gloves are coming off!!! 

… Or are they?

CubeSet and CubeRankedMember cube formulas are great!

For starters, in paying homage to someone who is vastly superior in Excel (I’m sure in other things too, but let’s focus on Excel for now), Rob (@PowerPivotPro) has re-introduced the cube formulas concept in his posting: Introducing CubeSet() and CubeRankedMember().

So how do I respond to this posting? ….

Read it and make sure you download the workbook as well.  It’s a very cool and fast way to create a Top 10 without using pivots.  It does not require MDX and has sort_order and sort_by parameters.  And once you get used to the logic of using these cube formulas, it’s a really nice way to do this.

Wait…didn’t I just say the gloves were coming off?

Cannot use CubeSet() within the context of slicers

For starters, Kasper De Jonge (@kjonge) made the first salvo before I could even start (thanks Kasper!).  Please read his comments directly in the Rob’s posting (as he is much more eloquent than I am) but the key thing here is that you cannot easily filter the data that is used by the CubeSet() function.  If you want to do something straightforward, by all means cube formulas are great but it doesn’t look like you can filter them by slicers. Meanwhile, the technique I showed in Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 3 of 3), you will notice that you can filter by slicers! 

image24[1]

Though, I do have to admit, if you do not have to use slicers in this context, i.e. to filter the top 10 users so you can create a report that needs to be filtered by those top 10 users, it would be easier by using CubeSet() and CubeRankedMember() because it is much more straightforward to create a PowerPivot Linked Table against the CubeRankedMember() values then my workaround (Excel Expressions against a PivotTable since you cannot created a linked table against a PivotTable).  But hopefully that will be resolved in the next version of PowerPivot (there is always hope!)

Another interesting point is that it appears that the CubeSet() uses more memory then if I did a NamedSet as I talked about in Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 2 of 3).

By the way, this is where i get all geeky now and stand resolute that my geekiness has indeed NOT been challenged … yes Rob, I’m calling YOU out now :-)

Why do I say this, well I had used a customer workbook (sorry, I cannot share this yet) where there were 543K distinct users and 44M rows in the fact data as noted below.

image

And i wanted to run a quick test to see whether NamedSet or CubeSets used more memory?  And while this was a real quick test, the repeatable outcome was this:

Type Memory Allocated
Named Set ~100MB
CubeSet ~600MB

 

The concern here of course is that once you go to millions of unique users, how much more memory is allocated?  Now of course, not using CubeSet() because of 500MB is hardly a good reason (you probably have plenty of memory anyways).  But I had to add something nice and geeky here and figured looking through memory allocation was geeky enough, eh?!

Pivots, I have always remembered you!

For starters, you have to read Rob’s posting Pivots, I have not forgotten ye! if you’re a “The Princess Bride” fan – In-con-CEIV-able! But after that, there is some great information about pivots and how to do Top 10 within the report.  Even better still, it goes into misconceptions and clarifies how filtering and sorting work.  And if you’re going to be doing with a number of rows that Excel can easily handle (e.g. 18,000+ unique customer emails), then I would absolutely do as Rob suggested since its easier and more straightforward way of doing Top 10.

So why did I go talk about the need of named sets for Top 10 calculations within Excel?  Well, as noted in the posting Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 1 of 3), my concern here was

  • If you have a lot (e.g. millions) of users in your data set, it would take a long time for the native Pivot Table to render this – if at all.
  • You cannot easily apply these Top 10 users by any other criteria or combination of criteria to have better insight on what is going on.

Even Rob had to admit that:

…but when you first add the field to the pivot, before you’ve had a chance to define the top N filter…ok, there’s a place where it’s gonna be slow (and potentially truncated).

In my posting, I was talking about dealing with millions of users – and while that’s great for PowerPivot, if you were to put millions of users into the row label of an Excel pivot, it would be veeeerry slow.

Rob … can you hear me now?!

No, I’m not doing a Verizon commercial here, though if the iPhone ever goes to Verizon I might…but I digress. 

As you can see, even after all of Rob’s call outs and challenge on my geekiness, using named sets allows you to easily filter the top 10 list by slicers, potentially use less memory, and handle millions of rows easily very quickly.  All you need is a little MDX … and as for Rob’s comment on staying away from “MDX Madness” by using cube formulas – geez, it really that much difference going from a cubeset to named set?

CubeSet NamedSet
CUBESET(“Sandbox”, “[Table].[Column].children”, “All Users”, 2, “[Measures].[Events]“) TOPCOUNT(“[Table].[Column].children”, 10, “[Measures].[Events]”)

 

Heck, there’s less writing involved with Named Sets because it’s a set that gets added to your PowerPivot Field list while cube formulas require you to write a formula for each cell!  … admittedly, you can just copy/paste most of it ;-)

Okay, the gloves come back on…almost

But all calling out aside, Rob’s posts are definitely enlightening in that you can use cube formulas like CubeSet and CubeRankedMember.  As well, there’s a lot of misconceptions kyboshed concerning how “Top 10” works within Excel.  My three part posting provide an interesting way to use named sets to solve the problem so you can filter the list by slicers and handle millions of unique users and tens of millions of rows.  But if you’re dealing with a more regular data set, this may be akin to taking a jackhammer to the problem if all you need is a mallet. 

So in the end – choose which method you want by simply choosing the right tool for the job, eh?!

Oh, and as for you Rob – I’ve come back to reclaim my “eh?!”, eh?! :-)

Background

This blog posting is a continuation of previous post, Create a #PowerPivot report filtered by the Top X Users by (Part 1 of 3), where I had created an Excel Named Set based on the Top 10 Users by Events as noted in Figure 1.  What is handy about using named sets is that it will only render the top 10 rows as opposed to the millions of rows of users this data set has.  While it is easy to show the Top 10 users, it is not straightforward to filter by those Top 10 users because Excel Named Sets cannot be used as a filter.  Example reports with  this type of business logic include “most common hospital departmental systems that the top 10 users are accessing” or “what are the most watched TV programs of the top 10 markets (i.e. cities)”.

image

Figure 1: Top 10 Users by Events using an Excel  Named Set

 

In the second blog posting of this series, Create a #PowerPivot report filtered by the Top X Users by (Part 2 of 3), I showed how to filter by this same named set in a static fashion.    In this third part of this blog series, let’s figure out how to filter by a named set in a dynamic fashion.

 

“Filtering” by the Named Set (Dynamic)

So the ask here is how can I “filter” by the named set that I just created.  One potential idea would be to create a linked table using the PivotTable containing the Named Set which you can then join to and filter by.  Alas, for SQL Server 2008 R2 PowerPivot, creating a linked table against an existing PivotTable is not supported.  But while you cannot create a direct link table, you can create an indirect link table.   That is, use Excel expressions to create a new regular table based on your PivotTable and then create the linked table based on the former. Thanks to Bradley Ouellette from the Analysis Services for reminding me of that trick!

 

image

Figure 2: PivotTable created based on the Top 10 Named Set with slicers and regular table (red).

 

As you can see in Figure 2, a PivotTable based on the Top 10 Users by Events Named Set has been created and a number of slicers have been included like year, state, database and department (server_group).  Any time I click on a slicer, a different listing of UserIDs are shown in my PivotTable.

The red oval is a regular table that uses Excel expressions to copy the data from the PivotTable.  The Excel expression for the first row of the UserID table with the value 112103102 is simply the Excel expression:

=D12

This is repeated for all ten rows of the PivotTable.

 

image

Figure 3: PivotTable, Regular Table, and PowerPivot window all noting the same set of UserIDs

 

Now I have a regular table which I can then create as a Linked Table within the PowerPivot window in Figure 3.

 

ReportByTop10Users-AllData

Figure 4: Department / Database Report filtered by Top 10 Users (of all users within the data set)

 

Based on the top 10 users within the entire data set (all states, all years), these ten users managed to create almost 28.6 million events for the Lab department and General database.  But now I want to view the top 10 users for only Washington State.  To do this, I can go back to my PivotTable created based on the Top 10 Named Set and click on the State slicer.

 

Top10Pivot-WAState

Figure 5: Altering the PivotTable created based on the Top 10 Named Set to include only Washington State.

 

Notice how clicking on the WA state code updated both the PivotTable and the linked regular table with a different set of UserIDs. Now that you have the new UserIDs in your Pivot Table, click on the Update All button

 

image

Figure 6: Click on the Update All button to refresh the linked data set within the VertiPaq database.

 

The linked table within the PowerPivot window will now be updated with the new values.

 

image

Figure 7: Linked table within the PowerPivot window updated based on the regular table, based on the PowerPivot table for your Top 10 users named set.

 

Now you can go back to your original Department / Database Report filtered by Top 10 Users (of all users within the data set).  Note you may need to refresh the Top 10 users filter so that way you will only include the ten users.  Note the last checkbox is not checked which represents the users that are not in the top 10 list.

 

ReportByTop10Users-WAState-Refresh

Figure 8: Top 10 Users Named Set within the PivotTable Filter

 

As well, it may be necessary to click on PivotTable Tools > Options > Refresh to ensure any schema changes to the pivot due to the change of the Top 10 are taken into account.

 

Once these steps are done, you will notice the same Department / Database report has been updated for the top 10 users of Washington state.

ReportByTop10Users-WAState

Figure 9: Department / Database Report filtered by Top 10 Users (of Washington State users)

 

Discussion

While not the most straightforward way of doing this by making use of named sets and regular tables linked to a PivotTable, you can dynamically change your report to filter by different set of Top 10 users by clicking on the slicers (and the extra few steps). 

 

Enjoy!

To help troubleshoot your PowerPivot for Excel workbook, you can click on Settings within the PowerPivot selection in the Office Ribbon which will bring up the Support & Diagnostics tab within the PowerPivot Options & Diagnostics dialog.  Click on the checkbox next to “Client tracing is enabled” and after opening the PowerPivot window, you will have a trace recorded on your desktop.

image

Figure 1: Click the checkbox to enable client tracing

 

In order to view this trace, you will use SQL Server Profiler to view it.  Yet, if you do not have a SQL Server 2008 R2 client components installed on your box, you will get the following error:

image

Figure 2: SQL Server Profiler cannot open the trace

 

The error message says:

Unable to location trace definition file Microsoft Analysis Services TraceDefinition 10.50.0.xml for open trace file …

 

In many cases, you will not since your business analysts are using Excel as their primary tool.  If they have SQL Server client components installed, commonly it is the SQL Server 2008 components as opposed to the SQL Server 2008 R2 components.

 

So how can your SQL Server 2008 Profiler read this trace so that way you do not have to uninstall / install the SQL Server 2008 R2 client components?  You have two options:

  1. Connect your SQL Server 2008 Profiler to a SQL Server 2008 R2 Analysis Services server.  The profiler caches these tracedefinition files within the \Program Files (x86)\Microsoft SQL Server\100\Tools\Profiler\Tracedefinitions folder.  So when you connect to a SQL Server 2008 R2 Analysis Services server, the profiler will obtain from the server the trace definition of that server if it does not already have it.
  2. Copy the trace definition file to your profiler directory.  Since the trace definitions are located in the \Program Files (x86)\Microsoft SQL Server\100\Tools\Profiler\TraceDefinitions, just go to this folder onto a server that has R2 running on it.  Copy the Microsoft Analysis Services TraceDefinition 10.50.0 file to your own box.

Thanks to Mike Vovchik from the Analysis Services team for helping me solve the problem as well.

Enjoy!

Background

This blog posting is a continuation of previous post, Create a #PowerPivot report filtered by the Top X Users by (Part 1 of 3), where I had created an Excel Named Set based on the Top 10 Users by Events as noted in Figure 1.  What is handy about using named sets is that it will only render the top 10 rows as opposed to the millions of rows of users this data set has.

image

Figure 1: Top 10 Users by Events using an Excel  Named Set

While it is easy to show the Top 10 users, it is not straightforward to filter by those Top 10 users because Excel Named Sets cannot be used as a filter.  Example reports with  this type of business logic include “most common hospital departmental systems that the top 10 users are accessing” or “what are the most watched TV programs of the top 10 markets (i.e. cities)”.

“Filtering” by the Named Set (Static)

As noted in the figure above and the previous blog, with your named set you have a list of the Top 10 Users by Events.  Now you want to filter your data by those ten users.  A quick way to do this is to simply copy/paste the top ten values from the PivotTable you created to a new PowerPivot Table.  To do this, copy the entire table and then open the PowerPivot window.  Because there is data in a tabular format within the clipboard, the Paste from Clipboard panel will be active and you will be able to click the To New Table button

image

Figure 2: Copy and Paste the PivotTable data into a new PowerPivot table

Once you’ve clicked on the To New Table button, you will be then provided the Paste Preview dialog (right side of Figure 3) so you can create a new table within the PowerPivot Client window.

image

Figure 3: The original PivotTable of Top 10 Users (left) and creating a new PowerPivot table (right)

Don’t forget to create your relationship between the UserIDs within the pasted table.  In this example, this is the Row Labels column within your “Top 10 All Users by Events” table which contains the top 10 UserIDs and the UserID column within your fact table.

image

Figure 4: Creating the relationship between the fact table and the related lookup table.

Note: You probably want to actually rename the “Row Labels” column to UserID so its more easy to understand but I kept the “Row Labels” column here for the purpose of this example.

 

Now that you’ve created the relationship, let’s go back and create the report of interest – in this case this is a department / database report as noted in Figure 5.

image

Figure 5: Department by Database Report

So how do I filter this by my Top 10 users Named Set?  Drag the Row Labels column Top 10 All Users by Events to the Report Filter pane of the PowerPivot Field List.  Open up the filter and note the last checkbox is not checked which represents the users that are not in the top 10 list.

ReportByTop10Users-WAState-Refresh

Figure 6: Top 10 Users Named Set within the PivotTable Filter

Once these steps are done, you will notice the same Department / Database report has been updated for the top 10 users of Washington state.

ReportByTop10Users-WAState

Figure 7: Department / Database Report filtered by Top 10 Users (of Washington State users)

Discussion

So now you have the ability to filter by your Top 10 users for your reports.   This will get you a long way but every time you have a new Top 10 set of users, you’ll need to update the original table and/or create a new table and perform more relationships to do this.

In the next blog posting Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 3 of 3), I’ll go over how to setup the Top 10 named set so it is dynamic in nature so that way you can change the top 10 criteria and with a few button clicks automatically update the report based on that criteria as well.

Enjoy!

Background

I was recently working with a customer to help them solve an interesting PowerPivot problem where they had wanted to get the Top 10 users within their data set and create reports based on that.  One approach is to create a rank function within PowerPivot as noted in a Rank function thread on the SQL Server 2008 R2 PowerPivot for Excel forum?  Another way is to create a Pivot Table where the UserID was placed in the [rows] and the measure in question in the [data] area; and sort the data from largest to smallest.

image

Figure 1: Events by UserID sorted by Events Largest to Smallest

You can even do value filter so you only see the Top 10 by right clicking on the Row Labels icon > Value Filters > Top 10

image

Figure 2: Choose Row Labels Icon > Value Filters > Top 10 to only view the top 10 values

In this hospital scenario example, the business analyst was trying to understand what are the most commonly accessed systems by the top 10 users of the system.  To do this, the analyst simply put the systems category into column labels and voila, she could identify that the ICU and the Lab systems were the most commonly accessed systems.

 

image 

Figure 3: Adding systems category, could see that the top 10 users typically accessed the ICU and the Lab systems

While this is great, there are a number of issues that come to mind.

  • It’s not a very clean looking report
  • If you have a lot (e.g. millions) of users in your data set, it would take a long time for the native Pivot Table to render this – if at all.
  • You cannot easily apply these Top 10 users by any other criteria or combination of criteria to have better insight on what is going on.

 

Solving Using Excel Named Sets

So perhaps there is another way to do this? 

Thanks to Siva Harinath (from the Analysis Services Team) for reminding me about Excel Named Sets which allow you to input your own MDX statement to query your VertiPaq database.  A great blog posting on the topic can be found at PivotTable Named Sets in Excel 2010

To do this, create a simple pivot with the measure only.

image

Figure 4: Simple Pivot with only the measure added to the PivotTable.

Leaving the cursor within the PivotTable Data area, go to the Office Ribbon and choose PivotTable Tools > Options > Field, Items, & Sets

image

Figure 5: Accessing Excel Named Sets

From here click Manage Sets… > New to create a new Named Sets.  Below is the new “Top 10 Users by Events” Named Set that I have created.

image

Figure 6: Creating an Excel Named Set to query my VertiPaq database within my workbook

The red oval call outs is for the MDX statement that I used to create my TopCount statement.  The statement in this example is

TopCount(

     {[vAuditLog_ServerActions].[UserID].[UserID].members},

      10,

      [Measures].[Measures].[Sum of event_count]

)

You will want to modify it for your scenario, you can find more information at TopCount (MDX).

Once you have created this new Named Set, it will show up under the “Sets” in the PowerPivot Field List.  Click on the checkbox for your newly created set and then you will have just to the top 10 Users by Event Count listed in your PivotTable workbook.

image

Figure 7: Top 10 Users by Events using Named Sets

What is great about using Excel Named Sets is that it will only show the top 10 users.  So if you have millions of users, Excel will not be spending its time to render this information.  It will simply show you those Top 10 users and you’re good to go.

But if you look at Named Sets, one of the issues with them is that you can only put them in the Column Labels and Row Labels.  That is, you cannot filter by them so you can look at the most common departmental systems that the top 10 users are accessing.  Another business problem that may be easier to relate to (with the similar business logic to this hospital departmental scenario) is what are the most watched TV programs of the top 10 markets (i.e. cities). 

Discussion

Now, you have the option to filter all the data by these top 10 users.  Since the most recent PivotTable (Figure 7) has the top ten UserIDs, you could create a calculated column using the DAX IF function to exclude all UserIDs that are not listed in the above PivotTable.  e.g.:

if(

      ([Table].[UserID].members = 52159102 || …

       [Table].[UserID].members = 54159102),

      1,

      0

)

This DAX statement would create a new column in your table where 1 indicates the user is in the “Top 10” and 0 – the user is not in the “Top 10”.  This would be a new column within the PivotTable Field list that you could then choose and filter by 1 so that all queries to the pivot table are now filtered by the Top 10 users.

The other options include

  • To query the original source system to get your Top 10 users, import it in as a new table in the PowerPivot window, and then join to that table.
  • Use a CUBESET function to query the Top 10 out of the VertiPaq database directly.
  • Another option is to use VBA to create your named set and then you can filter on the VBA function

The reason I haven’t blogged either of these techniques more in-depth is because in the next blog posting, we will cover how to “filter” by using the Named Set you just created.  In the third posting of this series, we’ll also cover how one can dynamically change the criteria for the Top 10 Users (e.g. also filter by state, year, other columns in your data source, etc.).

The next blog posting is: Create a #PowerPivot report filtered by the Top X Users by <Logic> (Part 2 of 3) which will be available shortly.

HTH!

As many of you already know, in order to figure out which workbooks had succeeded or failed execute in their data refresh, you need only to go the PowerPivot Management Dashboard (Central Administration > General Application Settings > PowerPivot Management Dashboard) in order to see the status of your data refresh as noted in the figure below.

image

Figure 1: PowerPivot Management Dashboard: Data Refresh Web Parts

 

But how about if you wanted to create a SSRS report of the same (or even more) information so that way you can create alerts and/or have an atom feed of this information?  After all, as an IT Pro you’ll have tens or hundreds of these workbooks.  You really want to have an automated process that notifies you of the status of these workbooks instead of constantly reviewing that information.

 

Well, let’s take advantage of SSRS integrated within our SharePoint Farm to produce this report.  Your data source can be either the ITOps_<Guid> Analysis Services database that is the source for your PowerPivot Management Dashboard or it can be the SQL database (PowerPivotServiceApplication_<Guid>) that is the source of that ITOps_<Guid> cube.  In this case, I’ve chosen the SQL database since I have a multi-server SharePoint Farm and I can just query my dedicated SQL Server database to get this data.  To do this, create a Report Builder 3.0 report that uses the SQL database as its data source.  You’ll notice that there are tables with the DataRefresh schema within that database.  A rather simple query that you can modify for your purposes to query for today and yesterday’s Data Refresh run is noted below.

 

select f.FriendlyName, r.Result, f.RunStartTime, f.RunEndTime, f.[Source], f.[Catalog], f.ConnectionString
  from DataRefresh.RunDetails f
    inner join DataRefresh.RunResults r
      on r.ResultKey = f.ResultKey
where RunEndTime >= dateadd(dd, -1, cast(cast(GETDATE() as varchar(11)) as smalldatetime))
order by
       f.FriendlyName

 

With this query as my source, I created a chart and table to go against this dataset.  As you can see from the figure below, now you have a simple SSRS report that provides you with the day’s refresh events in a chart and sort-able tablix.  You can also get the sample RDL at my Skydrive Location Data Refresh RDL (you will need to replace the $<>$ values to your environment)

 

image

Figure 2: PowerPivot Data Refresh SSRS Report

 

You will probably want to modify it so it lists the data by day, or put a date parameter, or … anything!  Nevertheless, now that you have a SSRS report you can subscribe to it and/or export it as a data feed within the PowerPivot Gallery.  Or you can place the report in a regular SSRS instance (after all, it’s just a RDL report off a SQL database, not off one of the PowerPivot workbooks) and you can setup alerts or subscriptions.

 

Enjoy!

When you start playing with the PowerPivot Gallery, some of you will be enamored by the various views of the Silverlight control included as part of the PowerPivot Gallery – I know I was!

image

But if I want to modify or view the properties of the individual report – how do I do this?  Well, it’s all about the different views within the PowerPivot Gallery.

To change the views go to the Ribbon > Library and under the “Current View” you can change the view of your PowerPivot Gallery (the above is the Carousel view).

image

Choose the Public > All Documents one and then you get your friendly SharePoint documents view within the PowerPivot Gallery.  From there, you can click on the individual document and choose the various menu options associated with the individual file.

image

HTH!

I was asked a pretty good question recently on what are the pros and cons of deploying SQL Server Integration Services packages on the same server that has your SQL Engine Server as well.  While we covered a lot in Top 10 SQL Server Integration Services Best Practices, we never really addressed this issue specifically – so let’s do it:

Advantages of placing SSIS and SQL on the same server:

  • You can use the SQL Server Destination instead of OLE DB which allows you to load data into SQL faster.
  • You remove network as a bottleneck for insertion of data by SSIS into SQL
  • If you have these newer beefy boxes which has a lot of memory, CPU, and a solid direct attach storage (and separate volumes with their own set of physical disks for each set of processes) it may make more sense to consolidate.  After all, if SSIS and SQL individually are using less than half of the available resources, you can consolidate AND you can get faster performance.

 

Disadvantages of placing SSIS and SQL on the same server:

  • If you have a number of SSIS packages that need to run in parallel and/or need to use a lot of memory (especially ones that have a lot of complex transforms), you’d have memory allocation conflicts between SQL and SSIS.  SQL will typically win this battle which means SSIS pages to file.  Once SSIS pages to file, you will have suboptimal processing by SSIS – i.e. don’t do it!
  • The same can be said for CPU – you need to ensure there are enough processors allocated for SSIS and SQL separately.  If the two compete, SQL will typically win which will mean SSIS will run much more slowly.

 

I’m sure many other people can suggest other very good advantages and disadvantages.  Saying this, from my experience these are the top advantages / disadvantages for the deployment of SSIS and SQL on the same box.

Enjoy!

Just in case you didn’t already know from the tweets, I figured I’d also give a shout out (again) to the “Great PowerPivot FAQ” and note the fact that it is now accepting applications for moderators. 

To become a moderator – is it a trial by fire?  Does one need to know how to unlock the secrets of chair that nearly killed General O’Neill (Stargate Universe reference)?  Or does someone need to quote Admiral Adama?  (BSG reference) While those are great – the answer is less complicated and more PowerPivot!

So to find out… check out the blog posting at PowerPivotPro.com -> The FAQ is now accepting applications! and if you’re ready to show your PowerPivot (as opposed to your SciFi show chops), please do apply!

Enjoy!

Just in case you’re not already aware of this, some of great contributors to the PowerPivot community are now adding and moderating to the Great PowerPivot FAQ.  Please check it out and email us your questions as well!

Enjoy!

Older Posts »