Percentiles

In many introductory stats courses, both at the undergrad and graduate level, the concept of percentiles is an early topic.  While percentiles rarely appear in academic articles, they are sometimes used in practice.  For instance, most people probably have a negative perception of percentiles from taking the SAT, GRE, or other standardized tests.  How do you know whether a 10th percentile score should have you jumping for joy or down in the dumps?  Likewise, how are we supposed to feel about a 90th percentile score?

To understand percentiles, it is best to start with their definition – which is often their most confusing aspect.  Believe it or not, percentiles have two (maybe even three) popular definitions.  They are:

Definition 1: A measure that tells us what percent of the total frequency scored at or below (aka less than or equal to) that measure.  In this case, you’re looking for the smallest value that is greater than or equal to a certain percentage of the scores.  For instance, if the number 54 is the 90th percentile for a certain variable, 90 percent of the values for that variable are less than or equal to 54.  When we use this definition, we often say that a certain value or cutoff is in a certain percentile (e.g. my score was in the 90th percentile).

Definition 2:  A measure that tells us what percent of the total frequency scored below (aka less than) that measure.  In this case, you’re looking for the smallest value that is greater than a certain percentage of the scores.  For instance, if the number 54 is the 90th percentile for a certain variable, 90 percent of the values for that variable are less than 54.  When we use this definition, we often say that a certain value or cutoff is at a certain percentile (e.g. my score was at the 90th percentile).

So, the main difference between these two definitions is whether the percentile value itself is included in or excluded from that percentile.  Confusing, right?  For our purposes, we are going to use the first definition.  It should be noted that choosing the second definition would slightly alter some of the formulas below, so be careful if you want to apply the second definition.
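
To make the difference between the two definitions concrete, here is a minimal Python sketch.  The function names and the list of scores are made up for illustration; they are not taken from any particular test or software package.

```python
def percent_at_or_below(values, x):
    """Definition 1: percent of values less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)


def percent_below(values, x):
    """Definition 2: percent of values strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)


# Hypothetical scores, invented for this illustration only.
scores = [40, 45, 47, 50, 51, 52, 53, 53, 54, 60]

print(percent_at_or_below(scores, 54))  # 90.0
print(percent_below(scores, 54))        # 80.0
```

With these scores, the same value (54) sits at 90 percent under the first definition but only 80 percent under the second, because the score of 54 itself is counted in one case and not in the other.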

To determine a percentile, one of the easiest methods is to approximate it using the following formula:

Location of Percentile = (n + 1) * (p / 100)

where p is your desired percentile (90th percentile would be 90, for example) and n is your sample size.  When using this formula, if the result is an integer, then it indicates the location of the desired percentile in the ordered dataset (e.g. a result of 10 would indicate that the 10th value in the ordered dataset is the desired percentile value).  If your result is not an integer, then you interpolate between the two neighboring values.  For instance, if the result was 10.50, you would identify the number that is halfway between the 10th and 11th values in the ordered dataset.  Or, if the result was 10.25, you would identify the value that is ¼ of the way from the 10th value to the 11th value in the ordered dataset.  This is the formula that the =PERCENTILE.EXC command uses in Excel.
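
The (n + 1) formula and its interpolation step can be sketched in a few lines of Python.  This is only an illustration under some assumptions: the function name percentile_exclusive is hypothetical, and raising an error when the location falls outside the dataset is a choice made for this sketch.

```python
def percentile_exclusive(values, p):
    """Percentile via Location = (n + 1) * (p / 100), with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100  # 1-based position in the ordered data

    # Assumption for this sketch: refuse locations that fall outside the data.
    if location < 1 or location > n:
        raise ValueError("location falls outside the dataset")

    lower = int(location)          # whole part, e.g. 10 for 10.25
    fraction = location - lower    # fractional part, e.g. 0.25 for 10.25
    if lower == n:                 # landed exactly on the last value
        return ordered[-1]

    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [10, 20, 30, 40]
print(percentile_exclusive(data, 25))  # location 1.25 -> 12.5
print(percentile_exclusive(data, 50))  # location 2.5  -> 25.0
```

For the 25th percentile of these four values, the location is 1.25, so the result is ¼ of the way from the 1st value (10) to the 2nd value (20).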

Additionally, the following formula can also be used to get an approximation of a percentile:

Location of Percentile = [(n – 1) * (p / 100)] + 1

where the meaning of the letters is the same as before.  When using this formula, we use the same procedures as before with regard to whether our result is an integer or not.  This is the formula that the =PERCENTILE.INC and =PERCENTILE commands use in Excel.  The differences between the two formulas (and the Excel commands) can be found at this link (here).  Theoretically the two methods differ, as noted in the link, but you will likely receive similar results from the two in practice.
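
Here is a comparable Python sketch for the (n – 1) formula.  The function name percentile_inclusive is hypothetical.  As a cross-check, the sketch also calls statistics.quantiles from Python's standard library (version 3.8 or later) with method="inclusive", which, as far as I can tell, places its cut points using the same location rule.

```python
import statistics


def percentile_inclusive(values, p):
    """Percentile via Location = (n - 1) * (p / 100) + 1, with linear interpolation."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    location = (n - 1) * p / 100 + 1  # 1-based position in the ordered data

    lower = int(location)             # whole part of the location
    fraction = location - lower       # fractional part of the location
    if lower >= n:                    # p = 100 lands on the last value
        return ordered[-1]

    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [10, 20, 30, 40]
print(percentile_inclusive(data, 25))                            # location 1.75 -> 17.5
print(statistics.quantiles(data, n=4, method="inclusive")[0])    # 17.5
```

With only four values, the 25th percentile from this formula is 17.5, whereas the (n + 1) formula places the location at 1.25 and gives 12.5.  The gap between the two methods tends to shrink as the dataset gets larger.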

Lastly, certain resources (including Wikipedia) list the following formula to determine the percentile:

Location of Percentile = n * (p / 100)

where the meaning of the letters is the same as before.  When using this formula, if the result is an integer, then it indicates the location of the desired percentile in the ordered dataset.  If it is not an integer, then we simply round up to the next whole number.
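
This round-up rule needs no interpolation, so a sketch of it is short.  The function name percentile_nearest_rank is hypothetical, and restricting p to values above 0 is an assumption made here so that the location is never 0.

```python
import math


def percentile_nearest_rank(values, p):
    """Percentile via Location = n * (p / 100), rounded up to a whole position."""
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(values)
    n = len(ordered)
    location = math.ceil(n * p / 100)  # round up any non-integer location
    return ordered[location - 1]       # convert 1-based location to 0-based index


data = [15, 20, 35, 40, 50]
print(percentile_nearest_rank(data, 30))   # location 1.5 rounds up to 2 -> 20
print(percentile_nearest_rank(data, 40))   # location 2 exactly        -> 20
print(percentile_nearest_rank(data, 100))  # location 5                -> 50
```

Because the location is always a whole number, the result is always one of the values that actually appears in the dataset.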

So, let’s use the following example.  Take a dataset of the ten values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.  If we wanted to use the first formula to determine the 30th percentile, it would look like this:

Location of Percentile = (10 + 1) * (30 / 100) = 3.3

If we wanted to use the second formula to determine the 30th percentile, it would look like this:

Location of Percentile = [(10 – 1) * (30 / 100)] + 1 = 3.7

If we wanted to use the third formula to determine the 30th percentile, it would look like this:

Location of Percentile = 10 * (30 / 100) = 3

In these cases, the location of the 30th percentile would be 3.3 using the first formula, 3.7 using the second formula, and 3 using the third formula.  Because the values in this dataset are the same as their positions, these locations are also the percentile values.  Using the first formula, we would say that 30 percent of the values were less than or equal to 3.3.  Using the second formula, we would say that 30 percent of the values were less than or equal to 3.7.  Using the third formula, we would say that 30 percent of the values were less than or equal to 3.  Given the current example and dataset, each of these is true although they provide different numbers.
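
The claim that all three results are true for this dataset can be checked directly.  The short Python sketch below (variable names are my own) computes each location for n = 10 and p = 30 and then counts what percent of the ten values are less than or equal to it.  It relies on the fact that, in this particular dataset, each value equals its position.

```python
data = list(range(1, 11))  # the ten values 1 through 10
n, p = len(data), 30

# Location of the 30th percentile under each of the three formulas.
locations = {
    "first": (n + 1) * p / 100,
    "second": (n - 1) * p / 100 + 1,
    "third": n * p / 100,
}

# Percent of values less than or equal to each location (Definition 1).
shares = {
    name: 100 * sum(v <= loc for v in data) / n
    for name, loc in locations.items()
}

for name, loc in locations.items():
    print(f"{name} formula: {loc:.1f} -> {shares[name]:.0f}% at or below")
```

Each line of output reports 30%, since only the values 1, 2, and 3 fall at or below 3, 3.3, or 3.7.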

Hopefully this clears some things up about percentiles.  As always, if you have any lingering questions, feel free to contact me at MHoward@SouthAlabama.edu.