Hi there, my name is Denny Lee and I’ve been a long-time data geek (hence the focus on Apache Spark, Delta Lake, Data Engineering, Data Sciences, Big Data, SQL, and Analysis Services) with an inkling toward Healthcare.
Denny is a Databricks Developer Advocate and a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale data platforms and predictive analytics systems. He also holds a Master of Biomedical Informatics from Oregon Health & Science University and has implemented powerful data solutions for enterprise Healthcare customers. His current technical focuses include Distributed Systems, Apache Spark, Deep Learning, Machine Learning, and Genomics.
He has extensive experience building greenfield teams as well as serving as a turnaround/change catalyst. Prior to joining Databricks, Denny worked as Senior Director of Data Sciences Engineering at Concur and was part of the incubation team that built Hadoop on Windows and Azure (now known as HDInsight).
He is active on social media (@dennylee) and continuously uses these channels to share deep technical learnings. Examples can be found in content ranging from Analysis Services and Healthcare Informatics (https://dennyglee.com/books-2/) to comprehensive technical guides, including the Analysis Services Performance and Operations Guides (https://dennyglee.com/articles/), and blog posts at databricks.com, sqlcat.com, and dennyglee.com. He helped create the first OLAP Services reporting application in production at Microsoft and is a co-author of:
- Learning Spark, 2nd Edition
- Learning PySpark
- Professional Microsoft SQL Server 2012 Analysis Services with MDX and DAX
- Professional Microsoft PowerPivot for Excel and SharePoint
- Professional Microsoft SQL Server Analysis Services 2008 with MDX
- Professional Microsoft SQL Server 2000 Data Warehousing with Analysis Services
Specialties: Apache Spark, Delta Lake, MLflow, Data Engineering, Data Science, Business Intelligence, Analysis Services (OLAP), PowerPivot, Big Data, Hadoop for Windows, Hadoop for Azure, Hadoop, Data Warehousing, Data Mining, Web Analytics, Medical Informatics, Healthcare, Smart Grid, Advertising Analytics
So there you have it! Hope this blog is useful to you and provides some great information.
I have a question. I’m looking forward to using the Excel PowerPivot November CTP add-in, but after a successful installation I don’t have a ribbon in Excel.
How can I solve this issue? Thanks in advance.
Sorry about the delay in response, you can find more information about this at: http://powerpivotfaq.com/Lists/TGPPF/DispForm.aspx?ID=5&Source=http%3A%2F%2Fpowerpivotfaq%2Ecom%2FLists%2FTGPPF%2FAllItems%2Easpx.
We are using Analysis Services heavily at my company to deliver analytics to users (www.retailsolutions.com).
We have had problems with cube synchronization – sometimes with a “transport layer” error, sometimes with others. I am wondering if you can tell me whether there is any sync problem if the cube processing server is on SQL 2008 and the query server (facing end users) is on SQL 2008 R2.
Any direction you can provide is greatly appreciated.
Ahh, technically synchronizing from SQL Server 2008 to SQL Server 2008 R2 is not officially supported. While I can see it working, this is one of those features where cross-version compatibility isn’t supported, eh?!
Thanks; I was hoping there was something we were missing.
Recently we upgraded to SQL Server 2008 R2 and are seeing some strange behavior with cube synchronization. The sync times seem to have increased by about 30–40%. Are there any articles about Analysis Services on R2 that you can refer me to? Have any of the best practices that apply to memory management of Analysis Services on 2008 changed for 2008 R2?
Also, do you know of any consultants who work on performance tuning for Analysis Services that you can refer me to?
That’s a little surprising – R2 synchronizations running longer than 2008. If you are consistently seeing this, I would open a case at http://connect.microsoft.com/sqlserver and provide Profiler traces of the two scenarios to showcase why this is happening. As for memory management and configurations, it’s pretty much the same between 2008 and 2008 R2. As for consultants, there are quite a few of them out there, but at this time I cannot say, as I do not want to reveal any bias. Sorry!
I’m currently working on a large project for one of our clients. There is a need for extracting and analyzing various data, and PowerPivot seems to be a good tool for the job. The challenge I’ve had is making a PowerPivot solution flexible for deployment. We’ve got several development, testing, and production environments, and our solutions are deployed to these environments as needed.
What we would like is a way to automate the deployment process to the various environments – especially setting which database PowerPivot should use by altering a config file of some form. I know this is possible in plain Excel by using ODC files, but I haven’t found a similar solution for PowerPivot.
For V1 of the PowerPivot product, there is no straightforward way to build an automated deployment solution. The reason is that V1 of PowerPivot is about self-service BI – allowing your analysts to build reports themselves against the real data. For now, I would suggest creating an alias for the database so that you can refer to the same server/database in each environment – or rebuilding. Yeah, not the best options, eh!?
While I cannot commit to what is being considered for future releases of PowerPivot, we are considering your scenario, where IT can easily automate and deploy PowerPivot workbooks on behalf of their customers.
Hope this helps!
I grew up in Wedgewood Park and we went to the same school (you had a year or two on me, I believe). I saw your name while researching a tempdb issue in our DW, and I had to see if it was you. Lo and behold, it was. Wow.
Anyway, I have a cube build that’s blowing up tempdb (50 GB). I’ve isolated the query and there are no bottlenecks per se, just about a half-dozen hash matches (none > 13%). The result set is about 40 million+ rows and it’s pretty wide. They’re being stingy about bumping up tempdb, so I’m trying to dial down all the hash matching. I’ve tried some covering indexes and had moderate success. The data model is tiptop dimensional – all surrogate, integer-based keys. Our DW is about 200 GB.
You probably need more information, but I just wanted to say hello, really.
talk to ya,
Wow – what a great blast from the past, eh?! I’m a little confused, though – why is there an extreme amount of hash matching if the data model is all int and the DW is only 200 GB? I probably wouldn’t use too many covering indexes; instead, I would examine the query plan of the cube build to determine exactly where the slowdown is, eh?!
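As a quick sketch (adjust for your environment – the thresholds and grouping here are just illustrative), a DMV query along these lines can show which running requests are allocating the most tempdb space while the cube build is processing; internal-object allocations are where hash and sort spills land:

```sql
-- Rough sketch: tempdb allocation per session for currently running requests.
-- internal_objects = hash/sort spills; user_objects = temp tables, table vars.
SELECT  tsu.session_id,
        SUM(tsu.internal_objects_alloc_page_count) * 8 / 1024 AS internal_mb,
        SUM(tsu.user_objects_alloc_page_count)     * 8 / 1024 AS user_mb,
        MAX(t.text) AS sql_text
FROM    sys.dm_db_task_space_usage AS tsu
JOIN    sys.dm_exec_requests AS r
        ON r.session_id = tsu.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
GROUP BY tsu.session_id
ORDER BY internal_mb DESC;
```

If the cube-build query dominates the `internal_mb` column, that corroborates the hash matches spilling to tempdb, and the actual execution plan for that statement is the next place to look.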
I got your book Microsoft PowerPivot for Excel and SharePoint and have started reading it. I don’t have much experience with this, so I want to understand the relationship between data warehouses and PowerPivot. For example, my company wants to implement PowerPivot – do we need to create a data warehouse or data mart (SQL 2008 R2) with tables like those in the Contoso sample database, or can we use our company’s existing Excel sheets and data? Any help understanding this would be appreciated.
Actually, data warehouses are not a requirement for PowerPivot. PowerPivot is really about using Excel as your analysis tool – it allows the power Excel user to perform complex queries against large volumes of data right from their own desktop. Underneath it all, a VertiPaq cube is being created and queried against, but this is transparent to the user. As for a data warehouse, it would be a good source for your PowerPivot workbook, but it is not necessary. In addition to databases, PowerPivot can query text files, reports, Excel workbooks, etc. So the Contoso sample database is just about providing you a sample system for PowerPivot to query – it isn’t strictly necessary, at least from the standpoint of PowerPivot.
I’m fairly new to SSAS and you look like the biggest expert I have seen on the web, so I am hoping you can help me. I am trying to determine the best/least manual way to:
1. Migrate SSAS content across environments, i.e. Dev -> Staging -> Production
2. Create a DR environment that mirrors the Prod environment.
Through my research I have seen many different ways to do the migration; however, I am looking for your recommendation on which one to implement. So far it looks like syncing servers is the best. If you have any advice, and a link to good articles explaining it, that would be greatly appreciated! Thanks
There are a number of ways to synchronize data between environments – check out “Analysis Services Synchronization Best Practices” at http://sqlcat.com/technicalnotes/archive/2008/03/16/analysis-services-synchronization-best-practices.aspx. This should handle both the migration aspect (presuming they are all using the same data source; otherwise, perhaps just deploy the cube XMLA) and the DR environment. HTH!
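For reference, a synchronization can also be kicked off directly with an XMLA Synchronize command executed against the target (query) server, e.g. from SSMS or ascmd – the server and database names below are placeholders, so substitute your own:

```xml
<Synchronize xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Source>
    <ConnectionString>Provider=MSOLAP;Data Source=ProcessingServer;Integrated Security=SSPI</ConnectionString>
    <Object>
      <DatabaseID>MySsasDatabase</DatabaseID>
    </Object>
  </Source>
  <SynchronizeSecurity>SkipMembership</SynchronizeSecurity>
  <ApplyCompression>true</ApplyCompression>
</Synchronize>
```

Running this on a schedule (e.g. via a SQL Agent job) is one way to automate the Dev -> Staging -> Production flow described above.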
I posted the following in the MSDN Forums Database Engine section:
One of my replies there references your book “Professional Microsoft® PowerPivot for Excel® and SharePoint®”. The steps in the book say to install SQL 2008 R2 and the Contoso BI sample database. However, my attempt at restoring the .bak file fails with the error described in the forum question.
I assume that you must have been able to make this database work with R2. Can you please tell me how?
As per the thread, I take it that this has been resolved now?
This is resolved, yes. Thanks!
– Mark Z.
Hi Denny, I read about the Maestro Program, but I didn’t see any kind of prerequisites on the official Microsoft site. Do you have any information about them? Thx!
Hello Denny, great information in the Report Server Catalog Best Practices article! In the section about running the SSRS Windows/web services and the report server catalog on the same server, how could one distinguish/quantify what would be considered a “small” environment versus an “enterprise” environment? Thanks
Good call-out – in general, the difference between enterprise and small is the sheer volume of users (especially concurrent users) trying to use your system, but it’s also how heavy a workload is applied to it. Once we’re talking about hundreds or thousands of users with queries that run >5 min (some going into hours), I think we’re safely in the enterprise category. HTH!
My company is working on anonymizing health data and cross-referencing it with voter data to provide PII-compliant online ad targeting. Your work in 2007 was fascinating to me – I’d love to chat if you have the time and inclination.
Sure – sorry for the delay! DM via @dennylee and let’s chat, eh?!
I recently tested connecting Spark to Tableau with a Hive 0.13 configuration and, surprisingly, we found that Spark is not thread-safe: multiple users are not able to connect from Tableau and work with it at the same time, while for one user it works perfectly fine. Do you have any comments on that, or perhaps a procedure we need to follow to avoid this problem?