Typically, I tell students that the two primary categories of “basic” statistics is whether they (a) determine the relationship between things or (b) the differences between groups. Sometimes, however, you want to do both. To do this, dummy-coded regression can help out. This page is a brief lesson on how to perform a dummy-coded regression in Excel. As always, if you have any questions, please email me at MHoward@SouthAlabama.edu!
The typical type of regression is a linear regression, which identifies a linear relationship between predictor(s) and an outcome. Believe it or not, a linear regression can also identify the differences between groups pretty well – as long as we know how to code our predictors correctly. This is where dummy coding can come into play, which can be used to answer the following questions and similar others:
- What is the relationship of people’s training groups on their job performance while accounting for their job satisfaction?
- What is the relationship of people’s county of residence on their life satisfaction while accounting for their income?
- What is the relationship of a widget’s manufacturing process on its assessed quality while accounting for the machine operator’s tenure?
Of course, there is more nuance to dummy-coded regression, but we will keep it simple. To answer these questions, we can use Excel to calculate a regression equation. If you don’t have a dataset, you can download the example dataset here. In the dataset, we are investigating the relationships of three training groups and conscientiousness with sales.
The data should look something like this:
If your dataset looks differently, you should try to reformat it to resemble the picture above. The instructions below may be a little confusing if your data looks a little different.
To perform a dummy-coded regression, we first need to create a new variable for the number of groups we have minus one. In this case, we will make a total of two new variables (3 groups – 1 = 2). To do so in Excel, we should first right-click on our outcome column, and then click on Insert. Then do this again.
This should create two new columns. In the first new column, you’ll want each person in Group 2 to have a 1 for their value, and each other person to have a 0. In the second new column, you’ll want each person in Group 3 to have a 1 for their value, and each other person to have a 0. When conducting these analyses, you’ll have Group 1 as the “baseline group” that Groups 2 and 3 are compared against. If you do it correctly, your dataset should look like the picture below:
Once you have the correct dummy codes, we are going to perform a regression as usual. Click on Data Analysis.
Then Regression and OK.
Then, click on the button below to identify your outcome data (your Y-Range).
Highlight your outcome data, including the label. Then click the button shown below.
Now, click the button below to identify your predictor data (your X-Range).
Now, highlight BOTH your predictor variable and its squared values, including their labels. Then click the button shown below.
Lastly, click on the box for Labels and press OK.
We should get results! Yay!
If you need help reading this table, take a look at my Regression in Excel guide. Otherwise, we can clearly see that the unstandardized beta for Dummy Code 1 is 27; the unstandardized beta for Dummy Code 2 is 297, and the unstandardized beta for Dummy Code 3 is 37. Dummy Code 1 is not statistically significant (p > .05), suggesting that there was not a significant difference between Groups 1 and 2. Dummy Code 2 was statistically significant (p < .001), suggesting that there was a significant difference between Groups 1 and 3. The effect of conscientiousness was significant (p < .05), suggesting that this predictor had a significant influence on sales. Lastly, the overall R-Square is .84, which is very high for the social sciences!
Of course, the results provide other information, which may be useful for your certain purposes, but the current guide just covers the basics.
Also, it should be noted that you can perform other types of coding to compare groups via regression, such as effects-coding. So, if you see someone use coding numbers other than 0 and 1, do not assume that they are wrong!
Now you should be able to perform a dummy-coded regression in Excel. As always, if you have any questions or comments, please email me a MHoward@SouthAlabama.edu!