Ramblings of a data dork: from BI and Big Data to Travel and Food
I recently received a great question concerning the whitepaper Scale-Out Querying with Analysis Services:
Question, does this also apply for SQL Server 2008?
What are you experiences with the scale out? I’m in the process of recommending using this technology to a large customer of mine and want to make sure we are actually able to deploy such a setup.
So let’s tackle these issues one-by-one.
Okay, now a more verbose answer here . The paper is applicable to both SQL Server 2008 and SQL Server 2008 R2 as there were no fundamental architecture changes to the Analysis Services engine. What I would add here is that you may also want to review the technical note Analysis Services Synchronization Best Practices which provides you the main methods to synchronize the AS database between the query and processing servers.
As well, we will (on sqlcat.com) be publishing a new whitepaper on Scale Out Querying with Analysis Services using the Read-Only feature of SQL Server 2008 in the near future. The quick gist of the paper will provide you in detail how to use the Analysis Services Read-Only feature to scale out query Analysis Services using a single copy of the database.
It works! Otherwise, we wouldn’t write it
Melodrama aside, actually the main reason we publish anything on sqlcat.com is that we’ve seen it successfully work on many customer implementations. The key issue that we find with scale out is more about the process and expectations. A little on these:
Because you’re building a scale out environment with separate processing and querying servers, you avoid the issue of long-running queries preventing processing from completing AND you are providing yourself some infrastructure bandwidth to handle query concurrency.
But to do this, you’ll need to build processes so there you can synchronize the cube from the processing server to the query server(s). This requires some time window and some mechanism to switch users to different sets of servers (e.g. set your network load balancer to query only two of the four query servers so that way you can initially synchronize the two dormant servers). This means you will need to work with your infrastructure team to automate these mechanisms and to test that the mechanisms work.
The process is not about the actual synchronize but the processes surrounding them – i.e. systems engineering or infrastructure – to get them to work in an automated fashion. So make sure you work well with your infrastructure team when doing this.
While this mechanism solves many query concurrency issues – it does not solve all queries. As a friend of mine once said:
Poorly written MDX? You’re s****ed buddy!
A little harsh but right to the point here – issues ranging from bad MDX, too many users asking for all the data in the cube, poorly designed cubes, poorly partitioned cubes, bad aggregations, etc. These will always come to play and scaling out will not solve these problems.
So just as long as you understand why you are scaling out – to handle more concurrent users and ensuring querying does not block processing – then you are in great shape.
Hope this helps!