In many introductory stats courses, at both the undergraduate and graduate levels, percentiles are an early topic. While percentiles rarely appear in academic articles, they are sometimes used in practice. For instance, most people probably have a negative perception of percentiles from taking the SAT, GRE, or other standardized tests. How do you know whether a 10^{th} percentile score should have you jumping for joy or down in the dumps? Likewise, how are we supposed to feel about a 90^{th} percentile score?

To understand percentiles, it is best to start with their definition – which is often their most confusing aspect. Believe it or not, percentiles have two (maybe even three) popular definitions. They are:

Definition 1: A measure that tells us what percent of the total frequency scored **at or below** (aka **less than or equal to**) that measure. In this case, you’re looking for the smallest value that is **greater than or equal to** a certain percentage of the scores. For instance, if the number 54 is the 90^{th} percentile for a certain variable, 90 percent of the values for that variable are **less than or equal to** 54. When we use this definition, we often say that a certain value or cutoff is **in** a certain percentile (i.e., my score was in the 90^{th} percentile).

Definition 2: A measure that tells us what percent of the total frequency scored **below** (aka **less than**) that measure. In this case, you’re looking for the smallest value that is **greater than** a certain percentage of the scores. For instance, if the number 54 is the 90^{th} percentile for a certain variable, 90 percent of the values for that variable are **less than** 54. When we use this definition, we often say that a certain value or cutoff is *at* a certain percentile (i.e., my score was at the 90^{th} percentile).

So, the main difference between these two definitions is whether the percentile value itself is included in or excluded from the percentage. Confusing, right? For our purposes, we are going to use the first definition. Note that choosing the second definition would slightly alter some of the formulas below, so be careful if you want to apply it.
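To make the difference between the two definitions concrete, here is a minimal Python sketch. The function names and the list of scores are hypothetical and chosen only for illustration; the code simply counts the percent of values at or below (Definition 1) versus strictly below (Definition 2) a given score.

```python
def percent_at_or_below(values, x):
    """Definition 1: percent of values less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)


def percent_below(values, x):
    """Definition 2: percent of values strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)


# Hypothetical scores, used only for illustration.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(percent_at_or_below(scores, 9))  # 90.0
print(percent_below(scores, 9))        # 80.0
```

In this made-up dataset, the same score of 9 sits at 90 percent under the first definition but only 80 percent under the second, which is exactly the included-versus-excluded distinction.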

To determine a percentile, one of the easiest methods is to approximate it using the following formula:

Location of Percentile = (n + 1) * (p / 100)

where p is your desired percentile (the 90^{th} percentile would be 90, for example) and n is your sample size. When using this formula, if the result is an integer, then it indicates the location of the desired percentile in the dataset (i.e., a result of 10 would indicate that the 10^{th} value in the **ordered dataset** is the desired percentile value). If your result is not an integer, then you interpolate between the two values on either side of that location. For instance, if the result was 10.50, you would identify the number that is halfway between the 10^{th} and 11^{th} values in the ordered dataset. Or, if the result was 10.25, you would identify the value that is ¼ of the way from the 10^{th} to the 11^{th} value in the ordered dataset. This is the formula that the =PERCENTILE.EXC command uses in Excel.
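The procedure just described can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name `percentile_exclusive` and the five scores are hypothetical, and raising an error when the location falls outside the data is a choice made for this sketch rather than something stated above.

```python
def percentile_exclusive(values, p):
    """Percentile from location = (n + 1) * (p / 100), with linear interpolation."""
    data = sorted(values)            # the formula assumes an ordered dataset
    n = len(data)
    location = (n + 1) * p / 100     # 1-based position in the ordered dataset
    if location < 1 or location > n:
        # Assumption for this sketch: refuse locations outside the data.
        raise ValueError("location falls outside the dataset for this formula")
    lower = int(location)            # position of the value just below the location
    fraction = location - lower      # how far to move toward the next value
    if fraction == 0:
        return data[lower - 1]       # integer location: take that value directly
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


# Hypothetical scores, used only for illustration.
scores = [15, 20, 35, 40, 50]
print(round(percentile_exclusive(scores, 40), 2))  # 26.0
```

Here the location is (5 + 1) * 0.40 = 2.4, so the result sits 0.4 of the way from the 2^{nd} value (20) to the 3^{rd} value (35), which is 26.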

The following formula can also be used to approximate a percentile:

Location of Percentile = [(n – 1) * (p / 100)] + 1

where the meaning of the letters is the same as before. When using this formula, we follow the same procedure as before depending on whether our result is an integer or not. This is the formula that the =PERCENTILE.INC and =PERCENTILE commands use in Excel. The differences between the two formulas (and the Excel commands) can be found at this link (here). Theoretically the two methods differ, as noted in the links, but you will likely receive similar results from the two in practice.
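As a rough illustration of this second formula, the sketch below follows the same interpolation steps with the location computed as [(n - 1) * (p / 100)] + 1. The function name `percentile_inclusive`, the range check on p, and the five scores are assumptions made for the example.

```python
def percentile_inclusive(values, p):
    """Percentile from location = (n - 1) * (p / 100) + 1, with linear interpolation."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    data = sorted(values)                # the formula assumes an ordered dataset
    n = len(data)
    location = (n - 1) * p / 100 + 1     # 1-based position, always between 1 and n
    lower = int(location)                # position of the value just below the location
    fraction = location - lower          # how far to move toward the next value
    if fraction == 0:
        return data[lower - 1]           # integer location: take that value directly
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


# Hypothetical scores, used only for illustration.
scores = [15, 20, 35, 40, 50]
print(round(percentile_inclusive(scores, 40), 2))  # 29.0
```

With these scores the location is (5 - 1) * 0.40 + 1 = 2.6, so the result is 0.6 of the way from 20 to 35, which is 29. Unlike the first formula, this location never falls outside the dataset, so the 0^{th} and 100^{th} percentiles are simply the smallest and largest values.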

Lastly, certain resources (including Wikipedia) list the following formula to determine the percentile:

Location of Percentile = n * (p / 100)

where the meaning of the letters is the same as before. When using this formula, if the result is an integer, then it indicates the location of the desired percentile in the dataset. If it is not an integer, then we simply round up.
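This round-up rule needs no interpolation, so a sketch of it is short. The function name `percentile_nearest_rank`, the restriction to p greater than 0, and the five scores are assumptions made for this illustration.

```python
import math


def percentile_nearest_rank(values, p):
    """Percentile from location = n * (p / 100), rounded up to a whole position."""
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    data = sorted(values)                    # the formula assumes an ordered dataset
    location = math.ceil(len(data) * p / 100)  # round up non-integer locations
    return data[location - 1]                # convert the 1-based position to an index


# Hypothetical scores, used only for illustration.
scores = [15, 20, 35, 40, 50]
print(percentile_nearest_rank(scores, 30))  # 20
print(percentile_nearest_rank(scores, 40))  # 20
```

The 30^{th} percentile gives a location of 1.5, which rounds up to 2, while the 40^{th} percentile gives exactly 2, so both land on the 2^{nd} value. Because this approach always returns a value that actually appears in the dataset, nearby percentiles can share the same result.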

So, let’s use the following example. Take a dataset of ten values that are numbered 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. If we wanted to use the first formula to determine the 30^{th} percentile, it would look like this:

Location of Percentile = (10 + 1) * (30 / 100) = 3.3

If we wanted to use the second formula to determine the 30^{th} percentile, it would look like this:

Location of Percentile = [(10 – 1) * (30 / 100)] + 1 = 3.7

If we wanted to use the third formula to determine the 30^{th} percentile, it would look like this:

Location of Percentile = 10 * (30 / 100) = 3

In these cases, the 30^{th} percentile would be 3.3 using the first formula, 3.7 using the second formula, and 3 using the third formula. (Because each value in this dataset equals its position in the ordered dataset, the location and the percentile value are the same number.) Using the first formula, we would say that 30 percent of the values were less than or equal to 3.3. Using the second formula, we would say that 30 percent of the values were less than or equal to 3.7. Using the third formula, we would say that 30 percent of the values were less than or equal to 3. Given the current example and dataset, each of these is true, although they provide different numbers.
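If you would like to check this arithmetic yourself, one option is Python's standard library. The sketch below assumes that the `"exclusive"` and `"inclusive"` methods of `statistics.quantiles` line up with the first and second formulas, which matches their documented behavior as far as I can tell; the third formula is computed directly by rounding up. The variable names are my own.

```python
import math
import statistics

data = list(range(1, 11))   # the ten values 1, 2, ..., 10
p = 30                      # desired percentile

# With n=100, quantiles() returns 99 cut points; index p - 1 is the p-th percentile.
first = statistics.quantiles(data, n=100, method="exclusive")[p - 1]
second = statistics.quantiles(data, n=100, method="inclusive")[p - 1]

# Third formula: location = n * (p / 100), rounded up, then look up that position.
third = sorted(data)[math.ceil(len(data) * p / 100) - 1]

print(round(first, 2), round(second, 2), third)  # 3.3 3.7 3

# Under Definition 1, check what percent of the values is at or below each result.
shares = [100 * sum(v <= x for v in data) / len(data) for x in (first, second, third)]
print(shares)  # [30.0, 30.0, 30.0]
```

All three results leave exactly three of the ten values (1, 2, and 3) at or below them, which is why each can be called the 30^{th} percentile of this dataset.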

Hopefully this clears some things up about percentiles. As always, if you have any lingering questions, feel free to contact me at MHoward@SouthAlabama.edu.