Whether working in industry or academia, people often need to make scales. At first, this may seem like an easy task. I mean, how hard is it to write some words on an online form and send it to respondents, right? Well, creating a scale can be a long and difficult task. For this reason, I have several Statistics Help pages on scale development.
Recently, however, I had someone request a page on item writing. Generally, my Statistics Help pages focus on the mathematics behind certain statistical procedures, but I had completely neglected the most important part of survey development – the item writing process. For this reason, I include several steps below which may help practitioners and academics during the item writing process.
Also, as a note of clarification, the term “item” below refers to an individual question or statement that respondents are meant to answer or reply to. The term “scale” refers to a collection of items intended to measure the same construct, and the items almost always have the same response format. Alternatively, a “survey” may be a collection of scales or assorted items without correct/incorrect answers, and the response format of the scales and/or items may or may not differ. Lastly, a “test” is a collection of items with correct/incorrect answers.
With that out of the way, I include some tips for the item writing process below.
Define Your Construct!
When starting out, most people have a general idea about what they want to measure – but how can you create a scale if you don’t know exactly what you want to measure? For example, one of my research interests is courage. If I started making a scale on courage, I might include items like, “I do risky things without even thinking about it.” Without a definition, this might seem like an apt item to measure courage, as courage probably involves performing risky acts; however, if I were to apply the definition of courage from Rate et al. (2007; 2010), “(a) A willful, intentional act, (b) executed after mindful deliberation, (c) involving objective substantial risk to the actor, (d) primarily motivated to bring about a noble good or worthy end” (p. 95), then I would notice that my item does not satisfy this definition – at all. When using this definition, I would create items like, “I go above and beyond for others, even if it means making sacrifices.” For this reason, you should always create a definition before writing any items.
Also, the definition for your construct should be based upon previous research as much as possible. In the courage example, the definition is completely taken from previous research. While your construct may not be already defined, you should certainly find research that strongly supports your definition.
With the construct defined, we can move to the next phase – determine your item format.
Determining Your Item Format
Now that you know exactly what you want to measure, you should decide exactly how to measure it. The first decision is whether you want to receive quantitative or qualitative responses. That is, do you want your respondents to reply with numbers (e.g., on a scale from 1 to 10) or words? Most often, quantitative response formats are used, but qualitative formats can be extremely helpful. To determine which response format you need, ask yourself what you want to learn from your scale. In general, if you want fairly basic results, such as how much courage someone possesses, you should use quantitative response formats. If you want more complex results, such as how someone defines courage, then you should use qualitative response formats. This is not a decisive rule, however, and you may consider writing items in both formats to determine which is more apt for your purposes.
If you’ve decided to use qualitative items, then you can move on to item creation. If you’ve decided to use quantitative items, you need to make a few more decisions about your answer format. When a person responds to your items, what do you want the numbers to mean? There are many options to choose from.
First, you need to determine what scale of measurement you want to use (not to be confused with a scale, which is a collection of items). There are four scales of measurement: nominal, ordinal, interval, and ratio. We won’t go into their differences, but (in addition to other things) nominal scales refer to categories (e.g., write 1 for male and 2 for female), ordinal/interval scales refer to responses on a range (e.g., on a scale from 1 to 7…), and ratio scales refer to count variables (e.g., age, number of times). If you are using nominal or ratio, your response format is pretty much decided for you. If you are using ordinal/interval, then you have a few more choices to make.
Second, if using ordinal/interval, you need to determine whether your answers will be levels of agreement, frequency, or something else altogether. Here is a wonderful PDF which includes many different types of quantitative response formats: https://www.clemson.edu/centers-institutes/tourism/documents/sample-scales.pdf . The response format should be closely related to your construct.
Third, you need to decide how many response options to give the respondents. Should they answer on a 1 to 5 format? 1 to 6? 1 to 9? In my experience, most researchers and practitioners use a 1 to 7 format, and some empirical research has demonstrated that very little changes once you use any format larger than 1 to 9.
Fourth, it is probably more important to consider whether you want a midpoint (i.e., an odd rather than an even number of response options). Often, midpoints represent a neutral option between the two endpoints. For some constructs and response formats, however, this may mean something entirely different. For example, I once did a study on college student alcohol use and driving behaviors. I asked students whether they Agreed/Disagreed that they could drive in their current state (after drinking). If a student responded with a midpoint answer (i.e., 4 – neither agree nor disagree), that could mean that they had no idea about their intoxication level, rather than being somewhat agreeable and disagreeable towards their perceived ability to drive. So, decide whether a midpoint for your construct would actually represent something between agree and disagree, or whether it would mean something else entirely. If the former is the case, then a midpoint should be alright.
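To make the even/odd point concrete, here is a minimal Python sketch. The function name midpoint is my own, and it assumes the response options are numbered from 1 upward, as in the 1 to 7 example above.

```python
def midpoint(n_options):
    """Return the middle value of a 1..n_options format, or None if there is none."""
    if n_options < 2:
        raise ValueError("a response format needs at least two options")
    # Only an odd number of options has a single middle option.
    if n_options % 2 == 1:
        return (n_options + 1) // 2
    return None


print(midpoint(7))  # 4, the "neither agree nor disagree" option on a 1 to 7 format
print(midpoint(6))  # None, so respondents must lean one way or the other
```

Whether that middle option actually means "neutral" for your construct is still a judgment call that no code can make for you.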
Once you’ve decided upon your response format, you can create your items. One last note should be made, though, about determining your item format. Sometimes, after defining your construct, you might be unsure about your item format. That is okay. You might need to make a few items, and then determine which response format fits the items. Once you’ve recognized which response format fits most of your items, you can begin writing items catered to that format.
Creating Your Items
Creating your items is the most important part of the scale development process – but it is also amongst the most difficult parts. The best advice that I can give is to create wayyy too many items which measure your defined construct. Go overboard. If you need 10, make 30. If you need 20, make 50. You can always remove items later. If you are completely stuck on how to make items, look at other successful scales. Other authors have already made excellent measures for many constructs, and they can set a great example. Also, when making items, keep the following mistakes in mind. Then, after making the items, review your items for these common mistakes.
Grammar – Make sure you use proper grammar. This includes avoiding double-negatives, run-on sentences, and the like. Although you may believe that the item is clearer if you use improper grammar, most respondents will likely think otherwise. Also, respondents may believe that the survey is bad or pointless if they repeatedly see items with bad grammar.
Double Barreled – Double barreled items contain two different qualifiers in the same item. For example, a double barreled item would be, “I am a happy productive employee.” Some employees may be happy but not productive, and vice versa. It would certainly be best to separate this item into two different items (or remove one of the qualifiers – happy or productive). When making items, make sure that someone cannot agree with one part of the item while disagreeing with the other part. Items should express a single idea.
Too Complex or Long – Items should be as short and simple as possible. If items are too long, readers may be quickly bored by your survey and their responses will be poor. Also, not everyone has an advanced reading level. Long sentences can be confusing, leaving respondents unsure about their responses. So, make your survey short and sweet.
Slang – Sometimes, people want to include slang in items. This is a big no-no. Slang words are usually region-specific, and respondents from other locations may not be familiar with the words. Also, non-native English speakers are usually unfamiliar with slang words, possibly biasing their responses. Lastly, the meanings of slang words tend to change over time, meaning your survey can quickly become outdated.
Contractions – Never use contractions in scale items. Respondents may overlook the contraction and inappropriately answer the item.
Negatively Worded – These days, many authors suggest that items should never be negatively worded. Negatively worded items have the opposite coding of the other items in the scale. For example, if we wanted to measure courage, a negatively worded item may be, “I would never put others before myself.” There are several reasons to not use negatively worded items, but I won’t go into them. Just try to avoid them unless you have an extremely good reason to use them… And catching insufficient motivation in responses is NOT a good enough reason.
Leading Questions – Certain aspects of a question may be leading. Take the following example: “Do you agree with the proposal set forth by Bill Gates, one of the most successful businessmen in America?” Of course, people will likely agree with this item, because the latter portion suggests that Bill Gates is smart and his proposals are likely a good idea. Take this other example: “Do you think the proposal is a good idea?” Respondents may favor this item more often than not, because it is already leaning on the good side. Take this final example: “Do you think the proposal is a good or bad idea?” This example is completely neutral and avoids biases. Try to make your items akin to this last one.
False Premise – Avoid making items which require respondents to go along with a premise that they may not accept. For example, consider the item, “To improve healthcare, do you think taxes should be raised or services should be cut?” Many respondents may be uneasy answering this question if they do not want to change healthcare.
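A few of these mistakes can be roughly screened for with a short script before you review the items by hand. The Python sketch below is only an illustration: the function name flag_item, the 20-word limit, and the word patterns are all my own assumptions rather than established rules, and the flags are prompts for a second look, not verdicts.

```python
import re

# Hypothetical threshold and patterns, chosen for illustration only.
MAX_WORDS = 20
CONTRACTION = re.compile(r"\b\w+[’'](?:t|s|re|ve|ll|d|m)\b", re.IGNORECASE)
CONJUNCTION = re.compile(r"\b(?:and|or)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:not|never|no)\b", re.IGNORECASE)


def flag_item(item):
    """Return a list of possible problems for one item, to be reviewed by a person."""
    flags = []
    if len(item.split()) > MAX_WORDS:
        flags.append("long")
    if CONTRACTION.search(item):
        # Possessives such as "manager's" are also caught by this pattern.
        flags.append("contraction")
    if CONJUNCTION.search(item):
        flags.append("possible double-barreled")
    if NEGATION.search(item):
        flags.append("possible negative wording")
    return flags


items = [
    "I would never put others before myself.",
    "I am a happy and productive employee.",
    "I go above and beyond for others, even if it means making sacrifices.",
]
for item in items:
    print(flag_item(item), item)
```

Note that the third item is flagged only because it contains the word “and”, even though it expresses a single idea, and that the original “I am a happy productive employee” would slip through because it has no conjunction at all. A check like this cannot catch slang, leading questions, or false premises, so it supplements a careful read rather than replacing one.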
What to Do Now?
Now that you have your over-representative item list, you should look over it for these common mistakes. Then, look over it again for repetitive items, and remove any items which are identical or almost identical. Once you are done with that, you have completed the first step of the scale development process!
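If your item list is long, a script can help surface candidates for the "almost identical" check. The sketch below uses Python's standard difflib module to compare every pair of items by character similarity. The function name near_duplicates and the 0.85 cutoff are my own assumptions, and the example items are made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(items, threshold=0.85):
    """Return (item_a, item_b, similarity) for pairs at or above the threshold."""
    pairs = []
    for a, b in combinations(items, 2):
        # ratio() is a 0-to-1 similarity score based on matching characters.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


# Made-up items, for illustration only.
items = [
    "I take risks to help others.",
    "I take risks to help other people.",
    "I enjoy quiet evenings at home.",
]
for a, b, score in near_duplicates(items):
    print(score, "|", a, "|", b)
```

Character similarity only catches items that are worded alike. Two items that say the same thing in different words will not be flagged, so you still need to read through the list yourself.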
Now you need to reduce your over-representative item list. There are many ways to do this, but one of the most popular is the item-sort task. Fortunately, I have a Statistics Help page on best methods in item-sort tasks. Hop on over there to find out more – click here!
As always, I hope this page helped. If you have any questions or requests for new pages, please email me at firstname.lastname@example.org .