Many student’s first exposure to inferential statistics is the correlation. Because these students are getting used to statistics in general, correlations can be hard to understand. This page is a brief lesson on how to calculate a set of correlations in Excel. As always, if you have any questions, please email me at MHoward@SouthAlabama.edu!
Correlations identify a linear relationship between two variables. In other words, correlations can tell you the relatedness of two things, and correlations can be used to answer the following questions and similar others:
- What is the relationship between job satisfaction and job performance?
- What is the relationship between hours studied and test grades?
- What is the relationship between size of university and number of basketball team wins?
Of course there is more nuance to correlations, but we will keep it simple for this post. Anyways, to answer these types of questions, we can use Excel to calculate correlation coefficients. If you don’t have a dataset, you can download the example dataset here. In the dataset, we are investigating the relationships job satisfaction, self-efficacy, and job performance.
The data should look something like this:
If your dataset looks differently, you should try to reformat it to resemble the picture above. The instructions below may be a little confusing if your data looks a little different.
Once you have the data open, the first step is to click on the Data tab at the top. Then click on Data Analysis, as seen below:
If it worked, the following window should have appeared. You’ll want to click on Correlation, and then press OK.
The following window should pop up. On this window, you need to first click on the icon to identify your Input Range. Then, you need to highlight (click and drag) your data and labels. Then press the icon again. This will identify your relevant data.
Now, click on Labels in First Row and then click on OK.
Now we get results! When reading the table below, pay close attention to where the column meets the row. For instance, column B is job satisfaction, and row 3 is self-efficacy. The number where this row and column intersect is .16. This means that there is a .16 correlation between job satisfaction and self-efficacy. Neat!
So, from these results, we can determine that…
- Job satisfaction and self-efficacy have a correlation of .16.
- Job satisfaction and job performance have a correlation of .51.
- Self-efficacy and job performance have a correlation of .21.
But, how can we tell whether these are statically significant? Sadly (and stupidly), Excel does not provide p-values, so we need to go elsewhere to do this. My favorite website to find the p-value of a correlation is the following: http://www.socscistatistics.com/pvalues/pearsondistribution.aspx . Just enter your correlation and your sample size, then click Calculate. For the relationship of job satisfaction and self-efficacy, we would enter .16 for the correlation and 29 for the sample size, as seen below.
From this, we can see that our p-value is about .41. This is much greater than .05, so we would say that the relation of job satisfaction and self-efficacy is not statistically significant. If we were to do the others, the p-value for job satisfaction and job performance is about .004, whereas the p-value for self-efficacy and job performance is about .27. This means that the former is statistically significant, whereas the latter is not. Neat!
Now you should be able to perform a correlation in Excel. As always, if you have any questions or comments, please email me a MHoward@SouthAlabama.edu!