This article is an end-to-end guide to building a data pipeline with Snowflake and Azure offerings, in which the data is consumed by Power BI enabled with SSO.
The above architecture helps you to achieve the following:
1. Secure the traffic to Snowflake by creating a private link (with private IPs assigned, the traffic does not traverse the public internet).
2. Load data from Azure Blob storage to Snowflake.
3. Integrate Snowflake authentication with Azure Active Directory via single sign-on (SSO).
4. Build Power BI reports with Snowflake data and enable scheduled dataset refresh on Power BI Online Service.
Here is what you need to prepare before we start:
1. An Azure subscription (where you have permission to create resources) under your tenant.
2. A Snowflake instance. When you create a new Snowflake account, you can specify which cloud provider it resides in, as instructed here. We want Azure as the cloud provider.
3. A Snowflake community account (to create support tickets). This must be the same account you used when creating your Snowflake account. Otherwise, your support ticket will not go through and will return this error:
“Your request to Snowflake support through the user validation request flow in the community was unsuccessful. We were not able to map your user to the requested account.”
4. An Azure virtual network with your desired IP range. You do not need a big range for just a private link. A /24 should be more than enough unless you will be running many virtual machines in the same virtual network.
5. An Azure storage account deployed in the above virtual network. Allow traffic from all networks in the Firewalls and virtual networks setting. If you need the blob storage to be accessible only from “Selected networks”, you must additionally whitelist Snowflake’s VNet subnet ID in the blob’s firewall and virtual network settings, as described in the “Attention” note in the private link setup section below. Some important considerations from the official Snowflake documentation when you choose the storage account type:
“Snowflake currently supports loading from blob storage only. Snowflake supports the following types of storage accounts:
Data Lake Storage Gen2 — Supported as a preview feature.
Snowflake does not support Data Lake Storage Gen1.”
6. The latest version of Power BI desktop (verify you can find the Snowflake connector).
7. A Power BI Online workspace.
Yes, that is a lot of preparation. Now take a deep breath because the work only gets more complex.
What is Azure Private Link?
When you google Azure Private Link, other similarly named Azure services pop up: Azure Private Endpoint and Azure Service Endpoint.
I will not discuss the differences between them in this article. Here are some good reads that can give you a better idea:
Azure Service Endpoint vs Azure Private Endpoint
Azure Private Link
Azure Private Endpoint
1. Go to the virtual network you prepared before and create an Azure private link for Snowflake following the instruction here.
2. You need an Azure private DNS zone attached to the same virtual network that hosts your private link. Instructions on creating a private DNS zone and linking it to a virtual network are here.
3. Make sure you add a record in the private DNS zone that resolves your Snowflake address to the private IP address of the private link. The instructions are here.
4. This step requires Snowflake support: you need to request that Snowflake approve the private link in your Azure subscription. Instructions on creating a support ticket are here. In the ticket, include your Azure subscription ID and your Snowflake account name.
5. It may take Snowflake two business days to respond.
6. Once your private link is approved (instructions on checking the approval status are here), test access to your Snowflake instance by creating a virtual machine in the same virtual network and running nslookup on the DNS name you created a record for in step 3. It is a success if the command returns the private IP address of your Snowflake private link.
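If you are unsure which hostnames need records in the private DNS zone, or which URL to test with nslookup, Snowflake exposes a system function that lists the private link endpoints for your account. A minimal sketch (run as ACCOUNTADMIN in a Snowflake worksheet; the exact output fields may vary by Snowflake release):

```sql
-- Returns a JSON object containing, among other fields, the
-- privatelink-account-url and privatelink-ocsp-url that need
-- A records in your private DNS zone.
select system$get_privatelink_config();
```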
Attention: if you want to secure your Snowflake instance by locking network access down completely to private IPs (no public internet access), you should use network policies. However, if you do so, the Power BI Online service will have great trouble accessing Snowflake, because the Power BI service IP range is not whitelisted in the network policy. Whitelisting the whole range is not realistic, especially since Microsoft updates it weekly.
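For reference, a network policy is created and applied roughly like this. The policy name and the IP range below are hypothetical placeholders; substitute the subnet range of the virtual network hosting your private link:

```sql
-- Hypothetical example: only allow the private subnet hosting
-- the Snowflake private link; everything else is blocked.
create network policy corp_private_only
  allowed_ip_list = ('10.0.1.0/24')  -- placeholder: your VNet subnet
  comment = 'Restrict access to the VNet hosting the private link';

-- Apply the policy account-wide (requires SECURITYADMIN or higher).
alter account set network_policy = corp_private_only;
```

This is exactly the setting that would block the Power BI Online service described above, so weigh the trade-off before applying it account-wide.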
We will be using Snowflake’s COPY INTO command. Follow the instructions here and then here to bulk load data from Azure Blob storage into Snowflake. To complete this step you need the `ACCOUNTADMIN` role in Snowflake, the Storage Blob Data Owner/Contributor role on the Azure Blob storage, and permission to create app registrations in Azure Active Directory.
Note: your COPY INTO command still reads data from Azure Blob storage via the public endpoint. But since your Snowflake account is hosted on Azure, this traffic traverses the Azure backbone.
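The steps behind those links boil down to Snowflake SQL along these lines. All object names, the tenant ID, and the storage URL are hypothetical placeholders for your own values:

```sql
-- 1. Create a storage integration that delegates authentication
--    to an Azure AD app registration (requires ACCOUNTADMIN).
create storage integration azure_blob_int
  type = external_stage
  storage_provider = 'AZURE'
  enabled = true
  azure_tenant_id = '<your-azure-tenant-id>'
  storage_allowed_locations = ('azure://mystorageaccount.blob.core.windows.net/mycontainer/');

-- 2. Create an external stage on top of the integration.
create stage my_azure_stage
  storage_integration = azure_blob_int
  url = 'azure://mystorageaccount.blob.core.windows.net/mycontainer/'
  file_format = (type = csv skip_header = 1);

-- 3. Bulk load the staged files into a target table.
copy into my_table
  from @my_azure_stage
  pattern = '.*[.]csv';
```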
To further secure the traffic between the blob storage and Snowflake, you need to:
1. Lock the blob storage down by allowing only “Selected networks” in the blob storage Firewalls and virtual networks settings, as instructed here.
2. Whitelist the Snowflake VNet subnet ID in the Azure Blob storage firewall, as instructed here.
Next, integrate Snowflake authentication with Azure Active Directory SSO. Both Microsoft and Snowflake have detailed instructions here.
Note: you have the option to replace the identifier URL with the private Snowflake URL you specified in step 3 here. If you do so, your SSO traffic will only work when you log into Snowflake from within the virtual network that hosts your Snowflake private link.
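For orientation, the security integration those instructions walk you through looks roughly like the sketch below. The integration name, tenant ID, and key URLs are hypothetical placeholders; take the exact values from your Azure AD app registration as the linked docs describe:

```sql
-- Hypothetical sketch of an external OAuth integration for Azure AD.
create security integration azure_ad_oauth
  type = external_oauth
  enabled = true
  external_oauth_type = azure
  external_oauth_issuer = 'https://sts.windows.net/<tenant-id>/'
  external_oauth_jws_keys_url = 'https://login.microsoftonline.com/<tenant-id>/discovery/v2.0/keys'
  external_oauth_audience_list = ('https://analysis.windows.net/powerbi/connector/Snowflake')
  external_oauth_token_user_mapping_claim = 'upn'
  external_oauth_snowflake_user_mapping_attribute = 'login_name';
```

The `external_oauth_audience_list` parameter here is the same one involved in the Power BI Online bug discussed further down.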
1. Here is the instruction on enabling SSO for Power BI on Snowflake.
2. Here is the instruction on connecting to Snowflake via Power BI Snowflake connector via SSO.
3. Here is the instruction on scheduling the Power BI dataset refresh.
You might run into a known bug when authenticating via SSO to Snowflake from Power BI Online service:
Error : “Cannot load model” “Invalid OAuth access token”
If your SSO works fine with Power BI Desktop but fails with the above error in the Online service, here is the fix:
Add an extra entry to external_oauth_audience_list in step 1 here, spelling “snowflake” with the opposite casing of your existing entry.
It should end up like this:
```sql
external_oauth_audience_list = ('https://analysis.windows.net/powerbi/connector/Snowflake', 'https://analysis.windows.net/powerbi/connector/snowflake')
```
It was a long journey, but you followed along!! Now close the half a million tabs you opened in your browser and give yourself a round of applause.
We have now set up a secure data pipeline (traffic to Snowflake does not traverse the public internet) that reads data from Azure Blob storage into Snowflake, where it is eventually consumed by Power BI.
I might expand on the content of this article in a follow-up series. Leave a comment to help me prioritise!