How to Calculate Correlations in Python

A good starting point for performing statistical analyses in Python is to learn how to conduct correlation analyses, and the current page reviews how to do so. Before reading this page, however, you should be familiar with importing your data to Python. If you are not, go check out my page on importing your data to Python. If you have any questions or comments about anything in Python, feel free to email me at MHoward@SouthAlabama.edu. I am always happy to answer your questions.

Below is an easy guide to calculating correlations in Python using an example dataset. If you need a dataset, click here to download the example dataset. Be aware, however, that this dataset is in the .xlsx format, and the current guide requires the file to be in .csv format. For this reason, you must convert this file from .xlsx format to .csv format before you can follow along using this dataset. If you do not know how to do this, please visit my page on converting a file to .csv format. While that page was written for R, the portion on converting .xlsx files to .csv files is completed using Excel, so it applies here as well. After converting the file, you can continue with this guide.

First, you must import your data. If you don’t know how to do this, refer to my guide on importing data in Python. For the current example, we are going to label our data as: MyData. You can see my syntax for doing this below.
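As a minimal sketch of this step, the code below reads comma-separated data into an object named MyData using the pandas library, which provides the .corr() method used in the next step. The small table of numbers is made up so that the code runs without any file; only the column names Var1 and Var2 come from this guide, and Var3 and the file name in the comment are hypothetical.

```python
import io

import pandas as pd

# Stand-in for the converted example file. These values are made up for
# illustration and are NOT the example dataset from this page.
csv_text = """Var1,Var2,Var3
1,2,5
2,1,4
3,4,3
4,3,1
5,5,2
"""

# With a real file you would pass its path instead, for example
# pd.read_csv("MyData.csv") (the file name here is hypothetical).
MyData = pd.read_csv(io.StringIO(csv_text))

# Show the first few rows to confirm the import worked.
print(MyData.head())
```

Printing the first few rows is a quick way to check that the column names and values were read the way you expected before you run any analyses.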

Once your data is imported, create a new line in your syntax window by pressing enter. Then, type in: print(MyData.corr()) and press enter again. By default, corr() calculates Pearson correlations between each pair of columns in your data.
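To make this step concrete, here is a small sketch that builds a made-up dataset, prints its correlation matrix, and then pulls out a single correlation by column name. The numbers are invented for illustration (they are not the example dataset), and the names corr_matrix and r_var1_var2 are only labels I chose for this sketch.

```python
import pandas as pd

# Made-up illustrative data, not the example dataset from this page.
MyData = pd.DataFrame({
    "Var1": [1, 2, 3, 4, 5],
    "Var2": [2, 1, 4, 3, 5],
    "Var3": [5, 4, 3, 1, 2],
})

# Pearson correlations between every pair of columns.
corr_matrix = MyData.corr()
print(corr_matrix.round(3))

# A single correlation can be read out of the matrix by column name.
r_var1_var2 = corr_matrix.loc["Var1", "Var2"]
print(round(r_var1_var2, 3))
```

Storing the matrix in a variable, rather than only printing it, lets you look up individual correlations later without recalculating them.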

And that is all there is to it! You should receive your correlation matrix.

But how can we determine whether our correlations are statistically significant? A very easy method is to plug these results into a correlation p-value calculator. My favorite website to find the p-value of a correlation is the following: http://www.socscistatistics.com/pvalues/pearsondistribution.aspx . Just enter your correlation and your sample size, then click Calculate. For the current example, our sample size is 339. So, to calculate the p-value for the correlation between Var1 and Var2, we would enter .062 for the R Score and 339 for the N. We would then press Calculate.

From this result, we can see that our p-value is greater than .05, so our correlation is not statistically significant at the conventional .05 level. Go ahead and try to calculate the p-values for the other correlations. For more practice and information about this calculator, just visit my page on calculating a correlation in Excel. It’s a very useful tool!
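If you are curious what a calculator like this is doing, the sketch below reproduces the same kind of calculation from just the correlation and the sample size. It assumes the standard two-tailed t-test for a Pearson correlation with n - 2 degrees of freedom (I have not confirmed that the website uses exactly this method), and it relies on the SciPy library. The function name correlation_p_value is hypothetical.

```python
import math

from scipy import stats


def correlation_p_value(r, n):
    """Two-tailed p-value for a Pearson correlation r from a sample of size n.

    Assumes the usual t-test: t = r * sqrt((n - 2) / (1 - r^2)), df = n - 2.
    """
    df = n - 2
    t_stat = r * math.sqrt(df / (1 - r ** 2))
    # sf() is the upper-tail probability; doubling it gives the two-tailed p.
    return 2 * stats.t.sf(abs(t_stat), df)


# Values from the example on this page: r = .062 and N = 339.
p = correlation_p_value(0.062, 339)
print(round(p, 3))
print(p > 0.05)  # True means the correlation is not significant at .05
```

The same function can be reused for the other correlations in the matrix by changing the first argument.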

Of course, there are ways that you can get the correlation p-value using Python alone. This requires a little more work than the steps above, and I may include a guide on it in the future. For now, though, you should be able to calculate a correlation in Python and obtain its p-value with the calculator using the steps above. If you have any questions or comments, just email me at MHoward@SouthAlabama.edu!
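In the meantime, one possible approach (a sketch, not a full guide) is the pearsonr function from the SciPy library, which returns both the correlation and its two-tailed p-value directly from the data. The data below are made up for illustration, and the name pair is just a label for this sketch. Rows with missing values are dropped first because pearsonr does not skip them automatically.

```python
import pandas as pd
from scipy import stats

# Made-up illustrative data, not the example dataset from this page.
MyData = pd.DataFrame({
    "Var1": [1, 2, 3, 4, 5],
    "Var2": [2, 1, 4, 3, 5],
})

# Keep only the rows where both variables are present.
pair = MyData[["Var1", "Var2"]].dropna()

# pearsonr returns the correlation and its two-tailed p-value.
r, p = stats.pearsonr(pair["Var1"], pair["Var2"])

print(round(r, 3))
print(round(p, 3))
```

With only five made-up rows the p-value is not meaningful in practice; the point is only to show where the two numbers come from.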

Dr. Matt C. Howard

My research interests include (1) statistics and methodologies, (2) health and well-being, (3) personality and individual differences, as well as (4) technology-enhanced training and development.
