Lab 1 - Accessing and Preparing Data
• You must download the Power BI Content: Create a folder called DIAD on the C drive of your local
computer. Copy all contents from the folder called Dashboard in a Day Assets to the DIAD folder
you just created (C:\DIAD).
Document Structure
This document and the documents that follow have two main sections:
• Power BI Desktop: This section highlights the features available in Power BI Desktop and walks the
user through the process of bringing in data from the data source, modeling and creating
visualizations.
• Power BI Service: This section highlights the features available in Power BI Service including the
ability to publish the Power BI Desktop model to the web, creating and sharing a dashboard, and
Q & A.
The lab includes steps for the user to follow along together with associated screenshots that provide a
visual aid. In the screenshots, sections are highlighted with red or orange boxes to indicate the area the
user needs to focus on.
Users should use their files from Lab 1 through Lab 5. The solutions provided for each lab are a final
product to reference. The solutions are not meant to be the starting point for each lab.
NOTE: This lab uses real, anonymized data provided by ObviEnce, LLC. Visit their site to learn about their
services: www.obvience.com. This data is property of ObviEnce, LLC and has been shared for the
purpose of demonstrating Power BI functionality with industry sample data. Any use of this data must
include this attribution to ObviEnce, LLC.
• How to load data from Microsoft Excel and Comma-Separated Values (CSV) sources
• How to manipulate the data to prepare it for reporting
• How to prepare the tables in Power Query and load them into the model
Learning these steps will prepare you for the reporting exercises in Lab 2.
Dataset
The dataset you will use today is a sales and market share analysis. This type of analysis is very
common for a Chief Marketing Officer (CMO). Unlike the Chief Financial Officer (CFO), a CMO is focused
not only on the company’s performance internally (how well do our products sell) but also externally (how
well do we do against competing products).
The company, VanArsdel, manufactures expensive retail products that can be used for fun as well as work.
It sells them directly to consumers nationwide as well as in several other countries.
By the end of the class, you will build a report which will look like the screenshot below. The CMO can use
this report to analyze VanArsdel’s performance.
USA sales data is in a CSV file located in the USSales subfolder within the Data folder (/Data/USSales).
Sales data for all other countries is in the InternationalSales subfolder within the Data folder
(/Data/InternationalSales). Each country’s sales data is in a separate CSV file in this folder.
Product, Geography, and Manufacturer information is in a Microsoft Excel file called bi_dimensions.xlsx in
the USSales subfolder within the Data folder (/Data/USSales/).
1. Open the bi_dimensions.xlsx file. Notice that the first sheet has Product information. This sheet has a
header, and product data is in a named table. Also notice that the Category column has numerous empty
cells.
The Manufacturer sheet has data laid out across the sheet, no column headers, several blank rows, and a
note in row seven.
The Geo sheet has the geography information. The first few rows have data details. Actual data starts on
row four.
Let’s set the Locale to US English to make it convenient in the rest of this lab.
6. From the ribbon, click File, then click Options and settings, then click Options.
The next step is to load data to Power BI Desktop. We will load USA Sales data which is in CSV files.
10. From the ribbon, click Home and then click the Get Data drop-down arrow.
11. Click Text/CSV.
12. Browse to DIAD, double-click Data, double-click the USSales folder, and then click sales.csv.
13. Click the Open button.
Power BI detects the data type within each column. There are options to detect the data type based on
the first 200 rows, to detect it based on the entire dataset, or to not detect it at all. Since our dataset
is large and scanning the complete dataset would take time and resources, we will leave the default
option of detecting the data type based on the first 200 rows.
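The trade-off behind sampling can be illustrated outside Power BI. The following is a minimal Python sketch, not Power BI's actual detection logic: the infer_type function and the sample column are hypothetical. It shows how a guess based on the first 200 values can differ from a guess based on every value.

```python
def infer_type(values, sample_size=200):
    """Guess a column type from the first `sample_size` values only."""
    sample = [v for v in values[:sample_size] if v != ""]
    if sample and all(v.lstrip("-").isdigit() for v in sample):
        return "whole number"
    try:
        for v in sample:
            float(v)
        return "decimal number" if sample else "text"
    except ValueError:
        return "text"


# Hypothetical column: 200 numeric-looking values followed by one that is not numeric.
column = [str(10000 + i) for i in range(200)] + ["N/A"]

sampled_guess = infer_type(column)                          # inspects the first 200 values
full_guess = infer_type(column, sample_size=len(column))    # inspects every value
print(sampled_guess, "|", full_guess)
```

The sampled guess is cheaper to compute, but it can miss values that appear only later in the file. This is one reason to review the detected types in the steps that follow.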
After completing your selection, you have three options: Load, Transform Data, or Cancel.
• Load adds the data from the source into Power BI Desktop for you to start creating reports.
• Transform Data allows you to perform data shaping operations such as merging columns, adding
additional columns, changing data types of columns as well as bringing in additional data.
• Cancel gets you back to the main canvas.
14. Click Transform Data as shown in the screenshot. A new window opens.
Note: You will bring in sales data from other countries as well as perform certain data shaping
operations.
15. Notice that Power BI has set the Zip field to the data type Whole Number. To ensure that the leading
zero is not dropped from Zip codes that start with zero, we will format them as Text. To do this, select
the Zip column. Then, from the ribbon, click Home, click Data Type, and change it to Text.
16. The Change Column Type dialog box opens. Click the Replace Current button which overwrites Power
BI’s predicted data type.
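To see why the Zip field should be Text, consider this small Python sketch. The two-row extract and its column names are hypothetical and only illustrate the effect of treating a Zip code as a number.

```python
import csv
import io

# Hypothetical two-row extract; the column names are illustrative only.
raw = "Zip,Units\n02134,3\n98052,1\n"

rows = list(csv.DictReader(io.StringIO(raw)))

as_number = [int(r["Zip"]) for r in rows]   # 02134 becomes 2134: the leading zero is lost
as_text = [r["Zip"] for r in rows]          # "02134" keeps its leading zero
print(as_number, as_text)
```

A Zip code is an identifier rather than a quantity, so nothing is lost by storing it as text.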
17. From the ribbon, click Home, click New Source, and then click Excel.
18. Browse to DIAD, double-click Data, double-click the USSales folder, and then click
bi_dimensions.xlsx.
19. Click the Open button. The Navigator dialog box opens.
Note: Table names are differentiated from Worksheet names by using different icons.
22. From the left panel, click geo. In the preview panel, notice that the first few rows are headers and are
not part of the data. We will remove them shortly.
23. From the left panel, click manufacturer. In the preview panel, notice that the last couple of rows are
footers and are not part of the data. We will remove them shortly.
25. On the Home tab of the Query Editor, click on the New Source drop-down menu.
26. Click More… as shown in the figure.
Note: This approach will load all the files located in the folder. This is useful when you have a group that
puts files on an FTP site each month and you are not always sure of the names of the files or the number
of files. All the files must be of the same file type with columns in the same order.
The dialog box will display the list of files in the folder.
37. Click Combine & Transform Data.
The Combine Files dialog box will open. By default, Power BI will again detect the data type based on the
first 200 rows. Notice there is an option to select various file Delimiters. The file we are working with is
Comma delimited, so let’s leave the Delimiter option as Comma.
There is also an option to select each individual file in the folder (using Example File drop-down) to
validate the format of the files.
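Conceptually, combining a folder means reading every file with the same layout and stacking the rows. The Python sketch below is an illustration under assumptions, not the function Power BI generates: combine_folder, the file names, and the sample rows are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def combine_folder(folder):
    """Read every CSV in `folder` and stack the rows, tagging each with its file name."""
    combined = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row["Source.Name"] = path.name   # remember which file the row came from
                combined.append(row)
    return combined


# Build a throwaway folder with two hypothetical country files.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "canada.csv").write_text("Zip,Units,Country\nK1A,2,Canada\n")
    Path(folder, "mexico.csv").write_text("Zip,Units,Country\n01000,5,Mexico\n")
    international_sales = combine_folder(folder)

print(len(international_sales), "rows combined")
```

Because the files are discovered by pattern rather than by name, a file added to the folder later is picked up on the next run without any change to the logic.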
You will now be in the Query Editor window with a new query named InternationalSales.
39. If you do not see the Queries pane on the left, click on the > (greater than) icon to expand it.
40. If you do not see the Query Settings pane on the right as shown in the figure, click on View in the
ribbon and click Query Settings to see the pane.
41. Click on the Query InternationalSales.
42. Highlight the Zip column and change the Data Type to Text.
43. The Change Column Type dialog box will open. Click the Replace Current button.
IMPORTANT! Changing the data type is a big deal to perform later.
In the Queries panel, notice that a Transform File from the InternationalSales folder is created. This
contains the function used to load each of the files in the folder.
44. We do not need the Source.Name column. Click the Source.Name column and from the ribbon, click
Home, click Remove Columns, and then click Remove Columns again.
You will see the countries Australia, Canada, Germany, Japan, Mexico, and Nigeria.
• If the formula bar is disabled, you can turn it on from the View ribbon. This enables you to
see the “M” code generated by each click on the ribbon.
• Click the options available on the ribbon, Home, Transform, Add Column, and View, to review the
various features available.
1. In the Queries panel, collapse the Transform File from InternationalSales folder.
2. Click each query name in the Other Queries section.
3. Navigate to Query Settings, and then the Properties section to rename the queries as shown below:
Notice how all the null values are filled with the appropriate Category values.
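The fill-down behavior can be sketched in a few lines of Python. This is a minimal illustration: fill_down is a hypothetical helper, and the Category values and ProductID numbers are made-up sample data.

```python
def fill_down(rows, column):
    """Replace empty values in `column` with the last non-empty value above them."""
    last_seen = None
    filled = []
    for row in rows:
        value = row[column]
        if value is None:
            value = last_seen      # reuse the value carried down from above
        else:
            last_seen = value      # remember the most recent real value
        filled.append({**row, column: value})
    return filled


# Hypothetical Product rows with gaps in the Category column.
products = [
    {"Category": "Mix", "ProductID": 1},
    {"Category": None, "ProductID": 2},
    {"Category": "Urban", "ProductID": 3},
    {"Category": None, "ProductID": 4},
]
filled_products = fill_down(products, "Category")
print([r["Category"] for r in filled_products])
```

Fill down only makes sense when the rows are already in the order in which the blanks were meant to inherit their value, as they are in this sheet.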
Note: If the delimiter occurs multiple times, the Split at section provides the option to split only once
(either left most or right most) or the option to split the column on each occurrence of the delimiter.
In this scenario, the delimiter occurs only once, therefore the Product column is split into two columns.
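The three split options behave differently only when the delimiter occurs more than once. The Python sketch below assumes a "|" delimiter and uses hypothetical values; neither is taken from the lab files.

```python
# Hypothetical value in which the delimiter occurs twice.
value = "A|B|C"

left_most = value.split("|", 1)      # split once, at the left-most delimiter
right_most = value.rsplit("|", 1)    # split once, at the right-most delimiter
each_occurrence = value.split("|")   # split at every delimiter

# When the delimiter occurs only once, all three options give the same two parts.
product = "Widget 01|Moderation"     # hypothetical Product value
product_1, product_2 = product.split("|")
print(left_most, right_most, each_occurrence, product_1, product_2)
```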
13. Click the Product.1 column, and then right-click next to the column name.
14. Click Rename… from the selection menu.
15. Rename the field to Product.
16. Following these steps, also rename Product.2 to Segment.
Notice that all the steps we performed on the Product query are being recorded under APPLIED STEPS in
the right panel.
Notice that after you press Enter, Power BI knows you want to split the Price column. The formula it uses is
displayed as well.
27. Double click the column header Text Before Delimiter to rename it.
28. Rename the column to Currency.
29. Click OK to apply the changes.
Now that we have split the Price column into the MSRP and Currency columns, we don’t need the Price
column. Let’s remove it.
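The same shaping can be sketched in Python. This is an illustration under assumptions: the "USD 412.13" format, the space delimiter, and the split_price helper are hypothetical, since the actual Price values are not shown here.

```python
from decimal import Decimal


def split_price(price, delimiter=" "):
    """Return (text before the delimiter, numeric text after the delimiter)."""
    currency, _, amount = price.partition(delimiter)
    return currency, Decimal(amount)


# Hypothetical Price values.
rows = [{"Price": "USD 412.13"}, {"Price": "USD 99.00"}]

shaped = []
for row in rows:
    currency, msrp = split_price(row["Price"])
    # The original Price column is not carried over, mirroring its removal.
    shaped.append({"Currency": currency, "MSRP": msrp})

print(shaped)
```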
Notice the first row in the Geography query now contains the column names. Let’s make it the header.
With that step, Power BI will predict the data type of each field again.
Notice that the column Zip was changed to the number data type. Let’s change it to text as we did earlier.
If we don’t, we will see errors when we load the data.
38. Click 123 next to the Zip Column. From the dialog box, click Text.
39. Click Replace Current in the Change Column Type dialog box.
40. From the left panel, click the Manufacturer query. Notice the bottom three rows are not part of the
data. Let’s remove them.
41. From the ribbon, click Home, click Remove Rows, and then click Remove Bottom Rows.
42. The Remove Bottom Rows dialog box opens. Enter 3 in the Number of rows text box.
43. Click OK.
44. From the left panel, click the Manufacturer query. Notice that the ManufacturerID, Manufacturer,
and Logo data is laid out across rows. Also notice that the header is not useful. We need to transpose the
table to meet our needs.
45. From the ribbon click Transform and then click Transpose.
Notice that this transposes the data into columns. Now we need the first row to be the header.
46. From the ribbon click Home and then click Use First Row as Headers.
Notice that now the Manufacturer table is laid out the way we need it with a header and values along
columns.
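The three Manufacturer steps (remove bottom rows, transpose, use first row as headers) can be sketched in Python as follows. The sheet contents are hypothetical sample data that only mimic the layout described above.

```python
# Hypothetical sheet: each field runs across a row, followed by three footer rows.
sheet = [
    ["ManufacturerID", 1, 2],
    ["Manufacturer", "VanArsdel", "Maker B"],
    ["Logo", "vanarsdel.png", "makerb.png"],
    [None, None, None],
    [None, None, None],
    ["Hypothetical footer note", None, None],
]

body = sheet[:-3]                                   # remove the bottom three rows
transposed = [list(col) for col in zip(*body)]      # rows become columns
header, *records = transposed                       # first row becomes the header
manufacturers = [dict(zip(header, record)) for record in records]
print(manufacturers)
```

The order matters: the footer rows are removed first so that they do not turn into extra columns when the table is transposed.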
Also notice that on the right panel under APPLIED STEPS you will see the list of transformations and steps
that have been applied. You can navigate through each change made to the data by clicking on the step.
47. Click Sales in the Queries window in the left panel as shown above.
48. From the ribbon click Home and then click Append Queries.
The Append dialog box opens. There is an option to append Two tables or Three or more tables. Leave
Two tables selected since we are appending just two tables.
49. Click International Sales from the drop-down and then click OK.
You will now see a new column in the Sales table called Country. Since the International Sales query had
the additional column for Country, Power BI Desktop added the column to the Sales table when it loaded
the values from the International Sales query.
You will see null values in the Country column by default for the Sales table rows because that column did
not exist for the table with USA data. We will now add the value “USA” as a data shaping operation.
50. From the ribbon click Add Column and then click Conditional Column.
This reads: if the current Country value equals null, then the value should be USA; otherwise, use the
current Country value.
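The append and the conditional column can be sketched together in Python. This is a minimal illustration: the append helper and the sample rows are hypothetical, and the new column is named CountryName as in the steps that follow.

```python
def append(first, second):
    """Stack two tables; a column missing from one table is filled with None."""
    all_rows = first + second
    columns = list(dict.fromkeys(key for row in all_rows for key in row))
    return [{column: row.get(column) for column in columns} for row in all_rows]


# Hypothetical rows: the USA table has no Country column.
sales = [{"Zip": "02134", "Units": 3}]
international_sales = [{"Zip": "K1A", "Units": 2, "Country": "Canada"}]

appended = append(sales, international_sales)

# Conditional column: if Country is null then "USA", otherwise the current Country.
for row in appended:
    row["CountryName"] = "USA" if row["Country"] is None else row["Country"]

print([row["CountryName"] for row in appended])
```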
59. You will see the CountryName column in the Query Editor window.
60. Right-click on the Country column and click Remove as shown in the figure.
When the data is refreshed, it will process through all the “Applied Steps” that you have created.
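One way to picture this is as an ordered list of functions that is replayed against whatever the source contains at refresh time. The Python sketch below is a conceptual illustration only: the step functions and sample rows are hypothetical and are not the M code Power BI generates.

```python
from functools import reduce


def remove_source_name(rows):
    return [{k: v for k, v in r.items() if k != "Source.Name"} for r in rows]


def add_country_name(rows):
    return [{**r, "CountryName": r.get("Country") or "USA"} for r in rows]


def remove_country(rows):
    return [{k: v for k, v in r.items() if k != "Country"} for r in rows]


# The recorded steps, in the order they were created.
applied_steps = [remove_source_name, add_country_name, remove_country]


def refresh(source_rows):
    """Replay every recorded step, in order, against the current source data."""
    return reduce(lambda rows, step: step(rows), applied_steps, source_rows)


first_load = refresh([{"Source.Name": "a.csv", "Country": None, "Units": 1}])
later_load = refresh([{"Source.Name": "b.csv", "Country": "Japan", "Units": 4}])
print(first_load, later_load)
```

Because the steps are stored rather than their results, new source data receives exactly the same shaping without any manual rework.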
The newly named Country column will have names for all countries, including the USA. You can validate
this by clicking on the drop-down menu next to the Country column to see the unique values.
64. At first, you will only see USA data. Click Load more to validate you have data from all seven countries.
65. Click OK to close this filter.
Our dataset has data from 2013 to 2019. For our analysis we want to start with the last three years of data
(2017-2019). We don’t yet know how many rows will result. We can filter by year to get the subset.
68. The Filter Rows dialog box opens. Enter 3 in the text box next to is in the previous.
69. Click years from the drop-down menu.
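A relative filter such as "is in the previous 3 years" depends on the date on which it is evaluated. The Python sketch below assumes that the filter counts whole calendar years before the current one and uses a hypothetical reference date in 2020; both are assumptions made for illustration.

```python
from datetime import date


def in_previous_n_years(day, n, today):
    """True when `day` falls in one of the `n` calendar years before today's year."""
    return today.year - n <= day.year < today.year


reference_day = date(2020, 6, 1)   # assumed evaluation date, not taken from the lab

# One hypothetical order date per year of the dataset (2013 to 2019).
order_dates = [date(year, 1, 15) for year in range(2013, 2020)]

kept = [d for d in order_dates if in_previous_n_years(d, 3, reference_day)]
kept_years = [d.year for d in kept]
print(kept_years)
```

With a different evaluation date the same filter keeps a different set of years, which is worth remembering when the report is refreshed later.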
Now that the International Sales data is appended to the Sales query, we don’t need the International
Sales table to load into the data model. Let’s prevent the International Sales table from loading into the
data model.
71. From the Queries panel on the left, click the International Sales query.
72. Right-click and then click Enable Load to clear its check mark. This will disable loading International Sales.
Note: The appropriate data from the International Sales table will load into the Sales table each time the
model is refreshed. By not loading the International Sales table separately, we avoid loading duplicate
data into the model and increasing its file size. In some instances, storing very large amounts of data
affects the data model performance.
This opens the Query Dependencies dialog box. The dialog box shows the source of each query and its
dependencies. For example, we see that the Sales query has a CSV file source and a dependency on the
International Sales query. This is useful information to share with your team members.
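The dependencies described above form a small directed graph. As a rough illustration, the Python sketch below records them in a dictionary and derives a valid evaluation order. The source node names are hypothetical labels, and this is not how Power BI stores dependencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9 or later

# Each query maps to the things it depends on (source labels are hypothetical).
dependencies = {
    "Sales": {"sales.csv", "International Sales"},
    "International Sales": {"InternationalSales folder"},
}

# A valid evaluation order lists every dependency before the query that needs it.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```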
You have now successfully completed import and data shaping operations and are ready to load the data
into the Power BI Desktop data model to visualize the data.
75. Click File and then click Close & Apply. This will close the Power Query window and apply all
changes.
76. Click File and then click Save to save the file after the data loading is complete. Name the file as
“MyFirstPowerBIModel”. Save the file in the DIAD Reports (\DIAD\Reports) folder.
References
Dashboard in a Day introduces you to some of the key functions available in Power BI. In the ribbon of the
Power BI Desktop, the Help section has links to some great resources.
Here are a few more resources that will help you with your next steps with Power BI.
• Getting started: http://powerbi.com
• Power BI Desktop: https://powerbi.microsoft.com/desktop
• Power BI Mobile: https://powerbi.microsoft.com/mobile
• Community site: https://community.powerbi.com/
• Power BI Getting started support page:
https://support.powerbi.com/knowledgebase/articles/430814-get-started-with-power-bi