Two-Way ANOVA in Python

From learning about the one-way ANOVA, we know that ANOVA is used to identify the mean difference between more than two groups.  A one-way ANOVA is used when we have one grouping variable and a continuous outcome.  But what should we do if we have two grouping variables?  As you’ve probably guessed, we can conduct a two-way ANOVA.  Because this situation is fairly common, I created this page to provide a step-by-step guide to calculating a two-way ANOVA in Python.  As always, if you have any questions, please email me!

As mentioned, an ANOVA is used to identify the mean difference between more than two groups, and a two-way ANOVA does the same when you have two grouping variables and a continuous outcome. So, a two-way (or two-factor) ANOVA is used to answer questions that are similar to the following:

  • What is the mean difference of test grades between left- and right-handed students, students in Dr. Howard’s and Dr. Smith’s classes, and the combinations of these groups?
  • What is the mean difference in total output of factories defined by location as well as industry?
  • What is the mean difference in performance for four different training programs, each performed at four different locations, and the combination of training program and location?

Now that we know what a two-way ANOVA is used for, we can calculate one in Python.  To begin, open your data in Python.  If you don’t have a dataset, download the example dataset here. In the example dataset, we are simply comparing the means across two different grouping variables, each with three different groups, on a single continuous outcome. You can imagine that the groups and the outcome are anything that you want.

Also, this dataset is in the .xlsx format, and the current guide requires the file to be in .csv format.  For this reason, you must convert this file from .xlsx format to .csv format before you can follow along using this dataset.  If you do not know how to do this, please visit my page on converting a file to .csv format.  While this page was written for R, you can follow the initial steps to convert .xlsx to .csv by using Excel alone. After converting the file, you can continue with this guide.

We are going to be using the pingouin and pandas modules. If you don’t know how to install modules, you can look at my guide for installing Python modules here. Likewise, you need to open your .csv file with Python. If you don’t know how to do so, you can look at my guide for opening .csv files in Python. In the current example, I named my dataset: MyData . Your initial code should look like the following:
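A minimal sketch of that setup, assuming the column names Grouping1, Grouping2, and Outcome that are used later in this guide. The inline text below is a made-up stand-in for the converted file so that the example runs on its own; the file name in the comment is hypothetical. The pingouin import is only needed at the ANOVA step.

```python
import io

import pandas as pd

# Hypothetical stand-in for the converted .csv file. The column names match
# the ones used later in this guide; the values are made up for illustration.
csv_text = """Grouping1,Grouping2,Outcome
1,1,4.2
1,2,5.1
2,1,6.3
2,2,5.8
3,1,7.0
3,2,6.4
"""

# With a real file, pass its path instead, e.g. pd.read_csv("MyData.csv").
MyData = pd.read_csv(io.StringIO(csv_text))

# Quick check that the columns were read as expected.
print(MyData.head())
```

Printing the first few rows is a cheap way to confirm that the grouping columns and the outcome column came through with the names you expect before running any analysis.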

Fortunately, conducting a two-way ANOVA is very easy in Python. You first need to specify your command, which can be done by typing: MyData.anova( . You then identify your outcome variable by typing: dv='Outcome', . Now that we have two grouping variables, we need to identify them both by typing: between=['Grouping1', 'Grouping2']) . Finally, we want to round our output to three decimal places so that the table is easier to read. This can be done by typing: .round(3) . Put together, the full command is: MyData.anova(dv='Outcome', between=['Grouping1', 'Grouping2']).round(3) . Note that the quotation marks must be straight quotes, not curly ones. Once you have all of that, press enter.

Did you get these results? Great!  We can see that the F-value for Grouping 1 is 6.634, the F-value for Grouping 2 is 1.609, and the F-value for the interaction term is 1.099.  We can also see that the p-value for Grouping 1 is < .05, whereas the p-values for the other two effects are > .05.  So, only the effect of Grouping 1 is statistically significant, and the groups differ only with regard to the Grouping 1 variable.  Neat!

At this point, there are a few different things we could do to probe the nature of the group differences.  The most effective would likely be to run a one-way ANOVA with only the Grouping 1 variable, followed by post hoc tests.  If we instead ran post hoc tests on the two-way ANOVA (e.g., Tukey’s HSD across every combination of the two grouping variables), the nine group combinations alone would give us 36 different pairwise comparisons!  That would be a lot to analyze individually, and our interaction term was not significant anyway.  So, I would recommend the former: running a one-way ANOVA.

That’s all for two-way ANOVAs.  If you have any questions or comments, please email me at