Sometimes you want to know the relationship between X and Y when accounting for Z. This is a particularly good situation to apply a partial correlation analysis. Partial correlations are not pre-programmed into Excel’s Data Analysis add-on, but they are very easy to calculate in Python. For this reason, this page is a brief lesson on how to calculate partial correlations in Python. As always, if you have any questions, please email me at MHoward@SouthAlabama.edu!
A partial correlation determines the linear relationship between two variables when accounting for one or more other variables. Typically, researchers and practitioners apply partial correlation analyses when (a) a variable is known to bias a relationship, or (b) a certain variable is already known to have an impact, and you want to analyze the relationship of two variables beyond this other known variable. Thus, partial correlations can be used to answer the following questions and similar others:
- What is the relationship of leader ability and job performance when accounting for follower job satisfaction?
- What is the relationship of hours studied and test grades when accounting for prior test performance?
- What is the relationship of employee mood and job performance when accounting for social desirability?
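To make the idea of "accounting for Z" concrete, here is a minimal sketch that uses only numpy and simulated data (the variable names, sample size, and data are assumptions for illustration, not the example dataset). It computes a partial correlation two ways: with the standard first-order formula, and by correlating what is left of X and Y after a straight-line fit on Z has been removed. The two approaches should agree.

```python
import numpy as np

def partial_corr_formula(x, y, z):
    """First-order partial correlation from the three pairwise correlations."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def partial_corr_residuals(x, y, z):
    """Regress x and y on z (with an intercept), then correlate the residuals."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

# Simulated data: x and y are related only because both depend on z.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
x = z + rng.normal(size=200)
y = z + rng.normal(size=200)

r_raw = np.corrcoef(x, y)[0, 1]
r_formula = partial_corr_formula(x, y, z)
r_resid = partial_corr_residuals(x, y, z)

print("Ordinary correlation of x and y:", r_raw)
print("Partial correlation (formula):  ", r_formula)
print("Partial correlation (residuals):", r_resid)
```

In this simulation the ordinary correlation is sizable, while the partial correlation is much closer to zero, because z is the only thing linking x and y.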
Of course, there is more nuance to partial correlations, but we will keep it simple. To answer these questions, we can use Python to calculate a partial correlation. If you don’t have a dataset, you can download the example dataset here. As you can see, the dataset is in the .xlsx format. To analyze it in Python, we first need to convert it to .csv format. If you do not know how to do this, check out my page on opening .csv files in R. The first few steps cover transforming an .xlsx file to .csv format, and they only require the use of Excel. So, it should be pretty easy to follow. Once you have converted the file, you can continue with this guide.
To calculate a partial correlation in Python, we are going to use the pingouin and pandas modules. Therefore, we first need to install these modules. To learn how to download and install modules in Python, please refer to my page on installing modules in Python. Once you have installed the modules, you should enter the following into your IDLE window: import pingouin as pg. Then press enter. You should then enter the following into your IDLE window: import pandas as pd. Then press enter again. This will load the modules so that we can use them.
Then, we need to input our data into Python. You can label your dataset anything that you want, but we are going to use the following label for the current page: MyData. If you do not know how to input your data into Python, reference my page on opening .csv files in Python.
From these preliminary steps, you should have something similar to the following in your syntax window:
Do you? If not, go back and try to figure out where you differed from the instructions above.
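As a rough sketch of the data-input step, the block below reads a .csv file into a pandas DataFrame labeled MyData. So that it runs anywhere, it first writes a tiny made-up file to a temporary folder; the file name MyData.csv and the five rows of numbers are assumptions for illustration only, not the example dataset. In your own session you would also run import pingouin as pg, and you would point pd.read_csv at wherever you saved your converted file.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the converted example file (illustrative values only).
csv_text = """Var1,Var2,Var3,Var4
3.1,2.4,5.0,1.2
4.7,3.9,4.1,2.8
2.2,4.5,3.3,3.6
5.0,1.8,2.9,4.1
3.8,3.2,4.4,2.0
"""

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "MyData.csv")  # hypothetical file location
    with open(path, "w") as handle:
        handle.write(csv_text)

    # This is the step that matters: read the .csv file into a DataFrame.
    MyData = pd.read_csv(path)

# Quick checks that the data arrived as expected.
print(MyData.head())
print(MyData.dtypes)
```

Printing the first few rows and the column types is a quick way to confirm that the variable names match what you plan to type into the partial correlation command.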
Now, we can enter our syntax to calculate the partial correlation. First, we want to type: pg.partial_corr(. This identifies the command that we want to use. Second, we want to type: data=MyData,. This identifies the dataset that we will be using. Third, we want to type: x='Var1',. This identifies the first variable that we want to correlate. In the current example, this is Var1. Fourth, we want to type: y='Var2',. This identifies the second variable that we want to correlate. In the current example, this is Var2. Fifth, we want to type: covar='Var3'). This identifies the variable that we want to control for. In the current example, this is Var3. Together, the full syntax is: pg.partial_corr(data=MyData, x='Var1', y='Var2', covar='Var3'). This will correlate Var1 and Var2 while controlling for Var3. Once you have entered all the syntax, press enter.
Did you get this result? Or something similar? If so, great! The r value is the partial correlation (-.0007), whereas the last number is the p-value (.9899). Because the p-value is well above .05, our result would not be statistically significant in this example.
But what if we want to control for two variables (e.g., Var3 and Var4)? Fortunately, this is pretty easy. We would start with the same first part of the syntax: pg.partial_corr(data=MyData, x='Var1', y='Var2', covar=. But, when identifying the variables that we want to control for, we would enter ['Var3', 'Var4']. We would then close our parenthesis and hit enter.
Our effect size is .051, and our p-value is >.05. Still not statistically significant!
So there you go. Now, you should be able to calculate a partial correlation on your own. You can use the example dataset to practice with other combinations. If you still need help, feel free to email me at MHoward@SouthAlabama.edu with any questions or comments!