Choose hashkey

Category: azure sql data warehouse


Hennie7863 on Sun, 01 Oct 2017 09:02:57


I was wondering about choosing the hashkey. In examples they say choose productkey as hashkey so that the data is stored together for the same productkey for the facts and dimensions. Ok, but what about queries about other dimensions. Then we have not a performance gain anymore?!

So, I don't understand saying that you have the hash on the productkey. What about the other possible queries?

Or should you create multipe tables ? No that seems not ok to me.

So help me understand this...



Allan.Miller on Mon, 02 Oct 2017 05:04:06


Choosing a hashkey is all about getting your data evenly spread across the distributions. This allows for even compute times, giving better performance. If your data is not distributed well for the join, data movement will occur which is a time consuming operation.

Maybe the best way to understand it is to look at what happens to a round robin distributed table. Data movement will likely happen on every join operation so that the data needed to properly compute is aligned on the correct distribution.

It appears that you get the information above, I just wanted to try and make it more clear.

If you need to join on more than one column, it is completely acceptable to use CTAS and distribute on another column. What you decide is up to you. Test, use explain plans and understand your data.