From the course: Power BI Data Methods

Connecting to CSV files - Power BI Tutorial

From the course: Power BI Data Methods

Connecting to CSV files

- [Instructor] Text files often use delimiters like commas or tabs to separate the data fields within each record that appear as individual rows in a data file. A comma-separated values file or CSV file for short separates these values within a row using commas. Let's first take a look at the CSV file storing the Santa Barbara weather in 2023. If we open this file in Notepad, we see each record on a separate line and commas between each of the fields, along with the header names in the first row of the file. Text files theoretically store an unlimited amount of data, subject to constraints for the application we use to open it. We can however see that viewing it this way is difficult for humans to interpret. Excel for example can open CSV files but it only displays a million rows or so. Although it makes it a lot easier to see the transformed tabular structure that we ultimately want to get within Power Query. Within Power Query, let's set up a new connection to this data using the text CSV data connector and pointing it to the location of the 2023 weather CSV file. We'll then transform our data. Once we confirm our connection, we see in the source step that the file.content function gets the data from the file path and the csv.document function reads the binary object to create a table object. We can change the CSV configuration details by opening the source step. One common issue when connecting to data with many columns like this is the potential to have more columns added in the future. To mitigate this issue during our data connection step, we can set the columns in the second parameter to null. We can also connect to CSV data online, including those stored in ZIP files like the NOAA weather data. We can access this data through the CDO portal by selecting data sets. Let's expand the Daily Summaries section. We'll then navigate to the linked FTP page. We'll then select the By Station folder. We often see data stored in FTP folders like this for bulk downloads because it makes it easier to work with huge amounts of data that they contain. The developers of this portal also update the data on a daily basis to include the latest data measurements directly in the folder. From our perspective, this means that we can also easily refresh our data model to include the latest weather measurements and gain efficiencies along the way. Since we want to get Santa Barbara weather data from these folders, let's navigate to its station ID. There are a lot of stations in this list so I'm going to search for the last five digits of its station number. So it's going to be 23174. Yeah, so we're going to choose this link. This matches our station ID that we worked with earlier. Once we navigate to the station ID, we'll then right-click to copy the folder location URL. Let's connect to the web URL directly through our eponymous web data connector in Power Query. We see that the file extension has .gz on the end, which is a type of ZIP file using the GZIP compression utility. We cannot typically open it using many built-in native applications on our computers. Once we confirm our web connection, we still don't see anything in our current query space other than a link to this URL file path. If we go into the source step formula, it uses the web.contents function to reference the URL to create a binary object but we can't open it directly. And we see that when we click on it, it doesn't do anything. The reason for this is that we first need to unzip the binary object storing the weather data before we can read the data in it. To do so, we'll use the binary.decompress function around our web contents function to create a decompressed copy of the input. To decompress a binary value, we need to add a second parameter in this function for the compression type as compression.gzip. We can then hit Enter or navigate out of the step to make sure Power Query updates this. We now see a link to our CSV file in the middle of our query. To access its data, we can click directly on it. After the query's last applied step, when we open the CSV file we now see an imported CSV step added to our applied steps list. The data now displays as a data table that we can use in our models. This is a neat trick that I use all the time in Power BI and also Excel to easily unzip compressed files without having to download other applications. We'll save this file as Santa Barbara weather and we'll continue to use this file in other examples of transformation functionalities throughout this course.

Contents