Data warehouses are extremely important when we’re talking about reporting systems. We’ll analyze what is doable using SqlDBM to create a DWH data model and then connecting it with Power BI. In this, first, part we’ll go through the most important DWH concepts and present our data model and in the second part, we’ll analyze data stored in DWH using Power BI and SQL queries.
Data warehouse (DWH) is a relational database used to store “compressed” data we need for reporting.
There a are few reasons why we should use DWH instead of running queries that will generate reports on a live database. The most important are performance and user rights.
DWH is populated with new data in these moments we find the best regarding performance. This way we can ensure that the impact on the live database will be minimal. Also, the DWH is populated on a regular basis, e.g. once each day. That way, we’ll split the larger task into a number of smaller ones. DWH is obviously stored separately from the live database so when users will query it, they won’t affect live database performance.
The second advantage is everything related to user rights in databases. The idea of the data warehouse is that data are aggregated so we won’t store individual data. This way users would be able to see only aggregated numbers without knowing e.g. customer details. If we have larger DWH we can also give certain user permissions to use only parts of DWH they need.
Tables in DWH are either dimension, either fact tables. Of course, that tables and their structures will depend directly on our reporting needs.
The fact table is used to store aggregated data. This is the place where we’ll store results of calculations we need, e.g. number of products sold, the total amount charged, the number of customers etc. Basically, we’ll store aggregated all data we want to include in any of our reports. This aggregating will significantly reduce the size of DWH and therefore queries will run much faster.
Dimension tables are those that contain the values we’ll use as categories in our reports. One obvious dimension table, you could expect almost in any DWH, is the time dimension. You could also expect other categories, like locations, products, and departments here. The granulation directly deepens of the granulation we want to have in our reports. If we’ll have weekly reports than the time dimension will contain weeks and we’ll insert data once a week.
DWH Data Model
The idea for our model was to build a marketing or sales DWH. We want to aggregate results of our operations so we can produce our reports faster.
Our model consists of six tables, one fact table, placed in the center of the model and five dimension tables. You’ll notice that all dimension tables have a primary key attribute id, while the primary key in the fact table is composed of all referenced foreign keys from dimension tables. This way, we’ll ensure we’ll have exactly one set of aggregated values for one unique combination.
Now we’ll take a closer look at all six tables separately.
The first dimension table I’ll mention is the time dimension. The reason for that is that you’ll meet this dimension almost in any, if not in all, data warehouses, you’ll see. That is due to the fact that we’ll produce data in live database daily and we definitely need to track changes during the time. The idea is that we’ll store new data once each day, so we’ll add exactly one record in this table. Besides storing new date, I’ll also store date parts and a flag if the date was a holiday or not. The reason for that is to simplify writing quires for end users.
The location dimension is also the one you could meet often in data warehouses you’ll work with. There obviously will be differences in doing business with customers from different geographical locations and we need to track that in our reports. I’ve simplified it here and went with only three attributes. Their combination is also the alternate key in this table. This is needed to avoid adding the same location more than once. We can expect we won’t add locations often, but at the moment new location is inserted in the live database.
If we work with customers we’ll offer them our products and services. These could vary regarding the industry, but in databases, these are just codes (SKU) and names. This list will also get updated only when we add new products/services into our live database.
We also want to know who made sales so we can create reports on the salesperson level too. In some cases, these could be departments and not persons.
The last dimension table in our model is the sales category dimension. We’ll probably have more than one way how we sell our products and services to customers. These could be direct sales, phone sales, via our web page. And yes, we want to have our reports granulated also on that level.
It’s important to include as many dimensions we think we’ll use now or in the future in order to avoid changes later. Of course, if we included dimension we don’t need, we’ll have more rows in the fact table. So, the problem is to find the balance between what we need now and what we think we’ll need later.
The last table in our model is the fact table. This is the place where we’ll store aggregated data. Each record in the fact table shall contain a unique combination of foreign keys (this is also the primary key in the fact table) and rows to store aggregated data.
In our case, we’ll store the number of sold products, total cost we had for these products and total amount we charged to our customers. And all of that will be granulated using all five dimension tables.
Forward Engineer – Generate SQL
After completing the model in sqlDBM, we’ll transfer it to the SQL Server. I’ve installed SQL Server Express locally on my machine.
In sqlDBM, click on the icon in the left lower part of your screen (look at the picture above). It’s called “Forward Engineer – Generate SQL” and it will do the magic, generate all the code directly from your model.
After clicking on the “Generate SQL” button you’ll have the code and you can simply copy-paste it into SQL Server.
The situation on my SQL Server is like the one presented in the picture above. Now we’re ready for the fun part.
Today we’ve seen how to create the DWH data model using SqlDBM for SQL Server. In the next part, we’ll relate our database with Power BI and we’ll see what happens. Stay tuned!