From the course: Python in Excel: Working with pandas DataFrames

Resampling and correlation

- [Instructor] Changing the frequency of time series and looking at their mutual correlation coefficient are important concepts in time series analysis. Let's see how it's done. On the Prices sheet here, I have the data of the prices data frame from the previous video. In other words, it's already a clean dataset. So, let's switch over to the Calculations sheet and turn it back into a data frame with the date as the index here in cell B1. Having a quick look at the data here in the preview shows us that these are daily time series, but if you are a long-term investor, a daily time series may be too noisy for statistical analysis. So, we want to change the frequency of the observations from daily to weekly. This is called resampling. And accordingly, pandas has a method called resample, as we can see here in cell B4. As an argument, you have to provide the desired frequency. In our case, I am choosing weekly on Fridays. For other possible frequency strings, you can check the pandas' docs. Then, you need to chain another method to define how you resample in the cell below here. In my case, I am using last, so if Friday is a holiday, it takes the value of the last business day of that week, which might be Thursday or even earlier in the week. We're storing the result and the variable weekly prices. And looking at the preview here confirms that the index now has a weekly frequency. With weekly prices, we can move on to calculate the weekly log returns here in cell B6 using the formula from an earlier video. So, to get an idea about how much the various stock prices move together over time, we can look at the correlation coefficient of the returns. As a quick recap, the correlation coefficient moves between -1 and +1, where 1 means the strongest correlation and 0 means no correlation at all. In pandas, to calculate a correlation matrix, all I have to do is use the core method here in cell B9 on the data frame with our log returns. Looking at the preview gives us the full correlation matrix where you can read off the correlation of each stock with the other one. I personally love this functionality as the built-in Excel function CORREL only accepts two datasets and can't calculate a full correlation matrix. Now, looking at all these numbers may make you a bit dizzy, so to make things easier, we can visualize this correlation matrix as a heat map. To do so, we will use seaborn here in cell B12. As mentioned in an earlier video, seaborn is built on top of Matplotlib. And it makes statistical plotting a lot easier, including our heat map. Seaborn is already imported with the alias sns in the initialization pane, so we can directly use it in cell B12 here. Looking at the preview, this gives us something, but it isn't really usable just yet. What we want is a literal heat map, so the more red, the hotter or bigger the number. We can fix this by giving it the cool warm color map here in the cell below. Looking at the preview now gives us a red and blue color scheme. But the correlations move between -1 and +1. So, we want to fix the scale to those values by providing the vmin and vmax argument here. So, let's right click on it and select Display over Cells. And as you can see, this is a pretty decent heat map. So, now you can see for example here between Apple and Netflix, there is a lower correlation than in many of the other areas. And you could use that information to pick the right stocks for your portfolio. All right, that was how you can have some fun with correlation matrices.

Contents