
Snowflake and Power BI

In this article I will give you a short introduction to creating tables and views in Snowflake.

Snowflake is essentially a cloud data warehouse that can contain many databases.

For this article, let’s assume that you already have a user created, and that your user has been added to one or more roles with permission to create tables. When you log in for the first time, you may see the following view:

We are going to click on the “Classic Console” option in the left pane of the page, as you can see above.

You’ll be redirected to sign in to Snowflake again; that is normal. After you’ve signed in, the Classic Console view will have seven tabs.

For this article, only Databases and Worksheets will be described. The “Databases” section shows all the databases you have permission to see, while the “Worksheets” section is where queries can be executed; the Snowflake documentation gives a very detailed explanation of how it works.

Is SQL Supported in Snowflake?

Snowflake supports a common standardized version of SQL: ANSI. This means that most of the common operations needed in data warehousing, like CREATE, UPDATE, INSERT, and SELECT, are supported.

CREATING TABLES

Here’s an example of creating a table:

Before you run the query, make sure you’ve chosen the right data warehouse, database, and schema where the table is going to be created:

Then you can write your SQL statement:
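For instance, a minimal sketch; the warehouse, database, and table names used here (MY_WH, MY_DB, EMPLOYEES) are hypothetical placeholders:

```sql
-- Set the session context (warehouse, database, and schema).
USE WAREHOUSE MY_WH;
USE DATABASE MY_DB;
USE SCHEMA PUBLIC;

-- Create a simple table; the columns are illustrative.
CREATE OR REPLACE TABLE EMPLOYEES (
    EMPLOYEE_ID NUMBER,
    FIRST_NAME  VARCHAR(100),
    LAST_NAME   VARCHAR(100),
    HIRE_DATE   DATE
);
```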

Now that our table exists, we can load data from a CSV file:

1- Go to the Databases section

2- Choose the database of the table:

3- Choose the table and click on “Load Table”

4- Follow the steps of the pop-up window:

  • It’s important to set the “Header lines to skip” property of the “File Format” step to 1, to avoid loading the CSV’s column headers (if you prefer SQL, see the COPY INTO sketch after these steps).

5- Check that your data has been uploaded.
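The same load can also be scripted instead of using the wizard. A minimal sketch, assuming the hypothetical EMPLOYEES table from before and a local employees.csv file (the PUT command must be run from SnowSQL rather than the web console):

```sql
-- Upload the local CSV to the table's internal stage
-- (PUT compresses the file automatically).
PUT file:///tmp/employees.csv @%EMPLOYEES;

-- Load the staged file into the table, skipping the header row;
-- this mirrors the "Header lines to skip = 1" wizard setting.
COPY INTO EMPLOYEES
FROM @%EMPLOYEES
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```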

CREATING VIEWS

Let’s create a simple view in Snowflake.

1- As with tables, make sure to select the correct data warehouse, database, and schema.

2- Insert the SQL statement in the worksheet console (a sketch follows these steps):

3- Refresh and check that the view has been created:

4- Preview the data to check that the view is the desired one.
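As an illustration for step 2, a minimal sketch that reuses the hypothetical EMPLOYEES table from earlier:

```sql
-- A simple view counting employees per hire year.
CREATE OR REPLACE VIEW EMPLOYEES_BY_HIRE_YEAR AS
SELECT
    YEAR(HIRE_DATE) AS HIRE_YEAR,
    COUNT(*)        AS EMPLOYEE_COUNT
FROM EMPLOYEES
GROUP BY YEAR(HIRE_DATE);
```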

SNOWFLAKE AND POWER BI

Now that we have our table and view, let’s connect our source to Power BI.

1- Go to “Get Data” and then search for the Snowflake source:

2- You need to know the server URL and the name of the warehouse. In the advanced options you can also insert a SQL statement as part of the connection to the source; in our case we won’t use that.

3- Enter your credentials; once your identity is verified, you will have access to choose the table or view you need.

4- You can choose “Load” or “Transform Data”, whichever suits your needs.

5- Finally, you can use the data from Snowflake in Power BI.

So… this has been a very quick and practical guide to start getting familiar with Snowflake. I hope this article is useful.


Azure Synapse Link for SQL

Azure Synapse Link for SQL enables near-real-time analytics over operational data in an external SQL database.

Synapse Link can now be used with SQL Server 2022 and Azure SQL Database, and it works by automatically replicating the specified tables into a Synapse SQL pool in an incremental way. This avoids having to create a (possibly) complex ETL process to keep the data warehouse in sync.

This integration can be done with SQL Server and with Azure SQL Database. For SQL Server, a master key needs to be created in both the data warehouse and the SQL Server, and a self-hosted integration runtime hosted on a machine with access to the server is required.
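For reference, the master key is created with the standard T-SQL statement below; the password is a placeholder:

```sql
-- Run in both the Synapse dedicated SQL pool and the SQL Server database.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<YourStrongPasswordHere>';
```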

For Azure SQL Database the process is much simpler, but there’s a pricing limitation, as Synapse Link can only be used on databases with at least the Standard tier (so no Basic tier) and 100 DTUs. This costs 147.19 USD per month, so the limitation is something to keep in mind:

Minimum specification needed to use Synapse Link. Lowering the data max size doesn’t have an impact on the cost.

Creating a Synapse Link for Azure SQL Database

In this case I’m going to do this example using the “Azure SQL Database” path, as it’s simpler. So the first step is to create a linked service connecting to the Azure SQL Database as usual (in this case I’ve used system-assigned managed identity for authentication). As stated previously, the database needs to be at least Standard tier, have 100 DTUs, and have managed identity enabled.

After that we can create the Link:

During the process you can select which SQL pool you want to copy into and which tables are going to be replicated. It is important to keep in mind that only tables with a primary key can be used.
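A quick way to spot tables that would be excluded is to query the catalog views of the source database; a minimal sketch:

```sql
-- List user tables that lack a primary key
-- (these cannot be selected for replication).
SELECT s.name AS schema_name,
       t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0;
```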

With the link connection created, you have the chance to further customize the process: renaming the tables, changing the distribution type and the structure type (which needs to be set to ‘Heap’ whenever your table contains nvarchar(max) columns), and deciding whether or not to drop already existing tables. Once everything is set up, the link connection can be published and started. After this, no further steps need to be taken in Synapse. You can stop the link at any point and monitor the connection:

After this, every insert, update, or delete in the source will be reflected in the SQL pool in less than a minute.
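A simple way to verify this end to end, assuming a hypothetical replicated table dbo.Orders:

```sql
-- Run against the source Azure SQL Database:
INSERT INTO dbo.Orders (OrderId, CustomerName, OrderDate)
VALUES (1001, 'Test Customer', GETDATE());

-- Shortly afterwards, run against the Synapse SQL pool:
SELECT * FROM dbo.Orders WHERE OrderId = 1001;
```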


Data Transformation at Different Stages in PowerBI 

There are numerous ways to apply data transformations when creating a dashboard. One of the most important aspects is when to apply these transformations. This decision may impact not only the performance but also the complexity of the dashboard.  

Let us review four different options and analyze which one works best under which conditions.

| | Option 1 | Option 2 | Option 3 | Option 4 |
|---|---|---|---|---|
| Language | DAX (used in PBI presentation layer) | M Language (used in PBI data layer) | SQL (query database from PBI) | SQL (view in database) |
| Transformation needs user input | Yes | No | No | No |
| Complexity | Low to high | Low | High | High |
| Reusability | No | No | No | Yes |
| Sample | Average delivery time in user-selected period | Time to deliver a product (delivery date – order date) | Time to deliver a product | Time to deliver a product |

Option 1

This is the most frequent option. Transformations are applied at the PBI presentation layer by using DAX in measures or calculated columns. These transformations are only recommended when they depend on some user input; otherwise, since the user input is unknown at earlier stages, every possible choice would need to be considered.

Note that these kinds of transformations are recalculated every time the dashboard UI changes and therefore may negatively influence the report performance. 

For example, this approach would fit the need to calculate an average value according to a time period selected by the user. 

On the other hand, this option is not recommended for transformations that do not depend on the user context. For example, in a fact table about product deliveries, the delivery time could be calculated by subtracting the order date from the delivery date.

Option 2

Transformations are applied at the PBI data layer by using the M Language in the Power Query editor. This is a very powerful tool, since it offers a wide range of transformations with little to no code. This option is ideal when the given transformation does not involve user interaction and neither of the two following options is available.

Note that these transformations are recalculated by the Power Query Engine every time the report’s dataset is refreshed. 

For example, returning to the misuse of the first option described above, the “DeliveryTime” column can be added with a Power Query step. This is a better approach than the first option, but the next two options can be better still.

Option 3

Transformations are applied by using a custom SQL query within PowerBI when the data source is being imported into the report. This option is ideal when the given transformation does not involve user interaction and we cannot create views in the database.  

Note that these queries are executed by the database engine itself every time the report’s dataset is refreshed. 

For example, to achieve the same result as in the previous example, the following import configuration could be used. 
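A minimal sketch of such a custom SQL statement, assuming a hypothetical dbo.FactDeliveries table with OrderDate and DeliveryDate columns:

```sql
-- Compute the delivery time (in days) at import time,
-- so the report never has to recalculate it.
SELECT
    d.OrderId,
    d.OrderDate,
    d.DeliveryDate,
    DATEDIFF(day, d.OrderDate, d.DeliveryDate) AS DeliveryTime
FROM dbo.FactDeliveries AS d;
```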

Option 4

Transformations are made directly in the data source by creating a view that contains the data with the desired transformations. This is the best option as long as the transformations do not involve user interaction and we have sufficient rights in the database. Also, once the view is created, it can be reused in multiple dashboards.

Note that these queries are also executed by the database engine itself every time the view is accessed. 

For example, to achieve the same result as in the previous example, a “delivery_time” view can be created containing the corresponding calculation (see the sketch below).
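A minimal sketch, reusing the hypothetical dbo.FactDeliveries table from the previous option:

```sql
-- Encapsulate the transformation in a view so any dashboard can reuse it.
CREATE VIEW dbo.delivery_time AS
SELECT
    d.OrderId,
    d.OrderDate,
    d.DeliveryDate,
    DATEDIFF(day, d.OrderDate, d.DeliveryDate) AS DeliveryTime
FROM dbo.FactDeliveries AS d;
```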

Conclusion

There is a key aspect to keep in mind when deciding how and when to apply data transformations: user interaction. If a transformation depends on any user interaction, it must be performed as in the first option. Otherwise, it should be performed as in one of the other three options.

Among these other options, the fourth is the most preferable, but sometimes we do not have the rights required to create views in the database, and then option three becomes useful. If option three is not available due to the type of data source, the second option is the way to go.