T-tests are used to identify the mean difference between two groups. But what do you do if you want to compare the means of more than two groups? Well, as you’ve probably guessed, you can perform an ANOVA. Because ANOVA is a commonly used statistical tool, I created the page below to provide a step-by-step guide to calculating an ANOVA in Python. This page is for a one-way ANOVA, which is when you have a single grouping variable and a continuous outcome. As always, if you have any questions, please email me at MHoward@SouthAlabama.edu!
As mentioned, an ANOVA is used to test whether the means of more than two groups differ, and a one-way ANOVA is the version to use when you have a single grouping variable and a continuous outcome. So, a one-way ANOVA is used to answer questions that are similar to the following:
- What is the mean difference of test grades between Dr. Howard’s class, Dr. Smith’s class, and Dr. Kim’s class?
- What is the mean difference in total output of five different factories?
- What is the mean difference in performance of four different training groups?
Now that we know what a one-way ANOVA is used for, we can calculate a one-way ANOVA in Python. To begin, open your data in Python. If you don’t have a dataset, download the example dataset here. In the example dataset, we are simply comparing the means of three different groups on a single continuous outcome. You can imagine that the groups and the outcome are anything that you want.
Also, this dataset is in the .xlsx format, and the current guide requires the file to be in .csv format. For this reason, you must convert this file from .xlsx format to .csv format before you can follow along using this dataset. If you do not know how to do this, please visit my page on converting a file to .csv format. While this page was written for R, you can follow the initial steps to convert .xlsx to .csv by using Excel alone. After converting the file, you can continue with this guide.
We are going to be using the pingouin and pandas modules. If you don’t know how to install modules, you can look at my guide for installing Python modules here. Likewise, you need to open your .csv file with Python. If you don’t know how to do so, you can look at my guide for opening .csv files in Python. In the current example, I named my dataset: MyData . Your initial code should look like the following:
Fortunately, conducting a one-way ANOVA in Python is extremely easy. You begin by typing in your command, which is: pg.anova( . Then, you identify your outcome variable. This is Outcome for the example dataset, which is identified by typing: dv='Outcome', . You next identify your grouping variable, which is identified by typing: between='Groups', . You lastly identify your dataset by typing: data=MyData) . Put together, the full command is: pg.anova(dv='Outcome', between='Groups', data=MyData) . Note that the quotation marks must be straight quotes, not curly ones, or Python will raise a syntax error. Once you have entered all of that, press enter.
Did you get these results? Great! From these results, we can see that our test statistic, the F-value, is 117.4. We can also see that our p-value is extremely small, with a value of 4.8e-14. In non-scientific notation, this is .000000000000048. Likewise, our results are highlighted with ***, which indicates that our p-value is less than .001. So, clearly, our results are statistically significant, and at least one of our group means differs from the others. Note that the ANOVA alone does not tell us which groups differ from one another.
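If you are curious where an F-value comes from, it is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it by hand with pandas on a small made-up dataset (not the example dataset, so the number will not match 117.4). The variable names are illustrative choices, not part of any library.

```python
import pandas as pd

# Made-up data: three groups of five observations each (illustrative only).
MyData = pd.DataFrame({
    "Groups": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "Outcome": [4, 5, 6, 5, 5,
                7, 8, 9, 8, 8,
                10, 11, 12, 11, 11],
})

grouped = MyData.groupby("Groups")["Outcome"]
grand_mean = MyData["Outcome"].mean()
group_means = grouped.mean()
group_sizes = grouped.count()

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = (group_sizes * (group_means - grand_mean) ** 2).sum()

# Within-group sum of squares: how far each observation is from its own group mean.
ss_within = ((MyData["Outcome"] - grouped.transform("mean")) ** 2).sum()

# Degrees of freedom: (number of groups - 1) and (observations - number of groups).
df_between = len(group_means) - 1
df_within = len(MyData) - len(group_means)

# F is the ratio of the two mean squares.
f_value = float((ss_between / df_between) / (ss_within / df_within))
print(f_value)  # 90.0 for these made-up numbers
```

A large F-value means the group means are spread far apart relative to the variation inside each group, which is why it goes hand in hand with a small p-value.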
From here, you may want to calculate means and post hoc tests. These require different functions than calculating the one-way ANOVA, so I do not presently cover them in this guide. If you have any questions about the ANOVA portion, however, please email me at MHoward@SouthAlabama.edu. I am always happy to help!