OLAP Distinct Count – how to make it faster

Distinct Count

To get the full performance out of Analysis Services for distinct count, insure that a distinct count measure is in its own cube and you can perform custom aggregations for the dimensions within the cube.  But the most powerful way to improve performance is to use a partitioning strategy.

The idea is to create partitions where the distinct IDs are spread out through all of the partitions (i.e. partition by dimension(s) that have the IDs repeated in most/all partitions); which is slightly counter-intuitive.  At query time, this design forces AS2k to utilize multiple threads to calculate distinct count instead of just one thread; i.e. distributed query calculations in parallel. From real-world scenarios, there was a 4-10 time improvement w/ this design.

Note, you willneed to be careful with your partitioning strategy – you can actually run the risk where you peg the processors if there are too many threads running; hence the importance of multi-proc OLAP machines.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s