Working with my compatriots Ayad Shammout (@aashammout) and Peter Myers, we had found out that PowerPivot at times can be really insensitive…in terms of case that is. That is by default, the collation for PowerPivot is case-insensitive.
For example, if you import the table below:
Within the PowerPivot window, you’ll notice that the lower case “a” has now become an upper case “A”.
If “a” had come before “A”, then both rows would be “a” instead of “A” 😉
Why are you so in-sensitive?
Actually, it isn’t THAT in-sensitive. When importing, the PowerPivot client picks up the regional settings and sets the database language and engine defaults the collation based on that language (thanks to Deva Kaladipet Muthukumarasamy for clarifying this).
The technical reason behind this had to do with the great compression that you get with PowerPivot’s VertiPaq Column-based store. For example, if you a data source that has the rows:
And your regional settings are set to case insensitive, then you would certainly save more space and compress better if what was stored was just “PowerPivot” (the appropriate branding name) instead of “Powerpivot” which is wrong branding-wise anyways 🙂
Some additional thoughts
One way to work around this is to build a view that would map the different codes so that “a” is “something” and “A” is something else. You could also add modify it slightly such as “a.” so you keep the lower case. The PowerPivot development team is also considering allowing users to change this in the future.
This shouldn’t be a major hassle since there is a workaround and I’ve yet to see a lot of people run into this issue. Saying this, if it is a significant problem – I’d like to hear it in the comments section, eh?!
Thanks!
[…] Continue reading: PowerPivot, you are so insensitive! (case that is) […]
I just ran into this problem, and it took me completely by surprise!
I am importing some external data feeds, that use strings of random characters as IDs and I have two tables related by that ID. Suddenly, I found an entry where by some weird chance there are two IDs that only differ by a case of one letter, and everything blew up. That is, the data can not be imported anymore, because the table where the ID column is treated as a PK of the relationship complains that there are duplicate entries.
In my opinion this IS a major hassle, I’m surprised that this was the only post I could find abbout this issue on Google.
I didn’t quite understand, when you say that “When importing, the PowerPivot client picks up the regional settings and sets the database language and engine defaults the collation based on that language”, does that mean that there is a way to influence this behaviour, so that PowerPivot becomes case-sensitive?
Thanks for having this post, I think I would have spent a good few more hours banging my head on the desk otherwise!
Glad that this was helpful! You should be able to modify the settings within Excel and/or the OS itself to influence which collation it works with. But in terms of just the case-sensitivity, I haven’t found a particular way to do it yet. I did try all sorts of different collations – but admittedly, not all of them – but no luck in that. Hence why the blog posting that this is something that needs to be looked out for, eh?!
Denny, it seems that we have at least one case where case insensitive is a real issue:
http://social.msdn.microsoft.com/Forums/en-US/sqlkjpowerpivotforexcel/thread/a45f4f8c-4621-41d1-adcd-87694ce27a77
Any thoughts?
Actually, I’ve seen a few more cases too hence why I decided to blog about it. Overall, I’d love to see this be a little more user-collation friendly BUT in its current mode, it is much more efficient with compressions (i.e. memory) which is extremely important too! As Deva noted, it should be able to take on the collation of the language you’re working on but for now there isn’t any straight forward way to change the default collations, eh?!
Is there any new information about this problem? I ran into the same situation an imported IDcolumn with weird combination of upper and lowercase letters is no more usable in powerpivot, because the table now has duplicate entries.
I hope very much that there will be a solution or workaround for this problem!
Not yet – we are tracking a possible solution right now but I’m afraid there are no details I can provide just yet.
Hi Denny, Thanks for the post. I wanted to get your thoughts on this. I created a UDF that converts a binary value to a hex string since I think PowerPivot does not natively supports varbinary as a column type. When building the PowerPivot table I used a SQL query that first CASTs the case-sensitive string column to varbinary and then passes the binary value to the UDF. It appears that this yields a unique string column that can be used as the key.
select CaseSensitiveId
, dbo.udf_convertBinaryToHexString(CAST(CaseSensitiveId as varbinary)) AS PowerPivotId
FROM TableWithCaseSensitiveId
DAX measures built using the PowerPivotId column seem to work as required like distinct counts and the column maintains its uniqueness if case-sensitive values are added to the source database. Of course PowerPivot still aggregates over the case-sensitive column in a case-insensitive manner.
Just thought I’d throw this idea out as a way of possibly keeping loads from blowing up and get your thoughts. You see any issues with this approach? Am I missing something?
Not sure – it sounds like it makes sense though. Why not test it out and post back on the results? Or write a new blog post if it works? 🙂
Hi Denny, I did some additional testing and have a short write-up if interested, I learned that SQL 2008 and above have built-in binary to Hex string conversion using CONVERT, so the PowerPivot Table Import SQL does not require a UDF and becomes:
select CaseSensitiveKey
, CONVERT(varchar(MAX)
,CAST(CaseSensitiveKey as varbinary)
,1
) AS PowerPivotKey
FROM TableWithCaseSensitiveKey
Table relationships built on the PowerPivotKey appear to be safe from duplicate key errors when case-sensitive unique keys are added to the source database. This is really just another version of the workaround you suggested but can be done on the client without creating a view on the server.
Yes, that would be very intresting – please do send me the write up and perhaps I can link back to your post as well, eh?!