The most appropriate measure of central tendency for nominal level data is the mode

Learning Outcomes

  • Recognize, describe, and calculate the measures of the center of data: mean, median, and mode.

By now, everyone should know how to calculate mean, median and mode. They each give us a measure of Central Tendency (i.e. where the center of our data falls), but often give different answers. So how do we know when to use each? Here are some general rules:

  1. Mean is the most frequently used measure of central tendency and is generally considered the best measure of it. However, there are some situations where either the median or the mode is preferred.
  2. Median is the preferred measure of central tendency when:
    1. There are a few extreme scores in the distribution of the data. (NOTE: Remember that a single outlier can have a great effect on the mean.)
    2. There are some missing or undetermined values in your data.
    3. There is an open-ended distribution. For example, if a data field measures number of children and the options are [latex]0[/latex], [latex]1[/latex], [latex]2[/latex], [latex]3[/latex], [latex]4[/latex], [latex]5[/latex] or “[latex]6[/latex] or more,” then the “[latex]6[/latex] or more” field is open ended and makes calculating the mean impossible, since we do not know the exact values for this field.
    4. You have data measured on an ordinal scale.
  3. Mode is the preferred measure when data are measured on a nominal (and sometimes even ordinal) scale.
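To illustrate the open-ended case in rule 2, the sketch below finds the median category from a frequency table even though the “6 or more” values are unknown. The counts are hypothetical, invented only for illustration, and `median_category` is a helper name introduced here, not something from the text.

```python
# Hypothetical survey counts: number of children -> number of households.
# The last category is open ended, so its exact values are unknown
# and a mean cannot be computed.
counts = [("0", 12), ("1", 18), ("2", 25), ("3", 9),
          ("4", 4), ("5", 2), ("6 or more", 3)]


def median_category(freq_table):
    """Return the category that contains the middle observation.

    Categories must be listed in ascending order. With an even total,
    this simplified sketch returns the category of the upper of the
    two middle observations.
    """
    total = sum(n for _, n in freq_table)
    middle = (total + 1) / 2  # position of the median observation
    running = 0
    for category, n in freq_table:
        running += n
        if running >= middle:
            return category
    raise ValueError("empty frequency table")


print(median_category(counts))  # prints 2
```

The median only needs the order of the observations, which is why it survives an open-ended top category as long as the middle observation does not fall inside it.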



What are the measures of central tendency?

A measure of central tendency (also called a measure of centre or central location) is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution.


There are three main measures of central tendency: the mode, the median and the mean. Each of these measures describes a different indication of the typical or central value in the distribution.


What is the mode?

The mode is the most commonly occurring value in a distribution.

Consider this dataset showing the retirement age of 11 people, in whole years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

This table shows a simple frequency distribution of the retirement age data.

| Age | Frequency |
|-----|-----------|
| 54  | 3         |
| 55  | 1         |
| 56  | 1         |
| 57  | 2         |
| 58  | 2         |
| 60  | 2         |


The most commonly occurring value is 54, therefore the mode of this distribution is 54 years.

Advantage of the mode: The mode has an advantage over the median and the mean as it can be found for both numerical and categorical (non-numerical) data.

Limitations of the mode: There are some limitations to using the mode. In some distributions, the mode may not reflect the centre of the distribution very well. When the distribution of retirement age is ordered from lowest to highest value, it is easy to see that the centre of the distribution is 57 years, but the mode is lower, at 54 years.

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

It is also possible for there to be more than one mode for the same distribution of data (bi-modal or multi-modal). The presence of more than one mode can limit the ability of the mode to describe the centre or typical value of the distribution, because a single value to describe the centre cannot be identified.

In some cases, particularly where the data are continuous, the distribution may have no mode at all (i.e. if all values are different). In cases such as these, it may be better to consider using the median or mean, or to group the data into appropriate intervals and find the modal class.
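As a minimal sketch, the frequency table and mode above can be reproduced with Python's standard library. The `colours` list is a hypothetical categorical example added here to show a multi-modal result; it is not data from the text.

```python
from collections import Counter
from statistics import multimode

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Frequency distribution: value -> number of occurrences.
frequency = Counter(ages)
print(frequency.most_common(1))  # [(54, 3)]

# multimode returns every value tied for the highest frequency.
modes = multimode(ages)
print(modes)  # [54]

# The mode also works for categorical data (hypothetical values),
# and a distribution can have more than one mode.
colours = ["red", "blue", "red", "green", "blue"]
colour_modes = multimode(colours)
print(colour_modes)  # ['red', 'blue']
```

`multimode` is used rather than `mode` so that ties are reported explicitly instead of silently returning a single value.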

What is the median?

The median is the middle value in a distribution when the values are arranged in ascending or descending order.

The median divides the distribution in half (there are 50% of observations on either side of the median value). In a distribution with an odd number of observations, the median value is the middle value. Looking at the retirement age distribution (which has 11 observations), the median is the middle value, which is 57 years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

When the distribution has an even number of observations, the median value is the mean of the two middle values. In the following distribution, the two middle values are 56 and 57, therefore the median equals 56.5 years:

52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

Advantage of the median: The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical.

Limitation of the median: The median cannot be identified for categorical nominal data, as the categories cannot be logically ordered.
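The odd and even cases can be checked with a short sketch using `statistics.median` from Python's standard library; the variable names are chosen here for illustration.

```python
from statistics import median

# 11 observations: the median is the 6th value once sorted.
ages_odd = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
median_odd = median(ages_odd)
print(median_odd)  # 57

# 12 observations: the median is the mean of the 6th and 7th values.
ages_even = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
median_even = median(ages_even)
print(median_even)  # 56.5
```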

What is the mean?

The mean is the sum of the value of each observation in a dataset divided by the number of observations. This is also known as the arithmetic average.

Looking at the retirement age distribution again:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

The mean is calculated by adding together all the values (54+54+54+55+56+57+57+58+58+60+60 = 623) and dividing by the number of observations (11), which equals 56.6 years.

Advantage of the mean: The mean can be used for both continuous and discrete numeric data.

Limitations of the mean: The mean cannot be calculated for categorical data, as the values cannot be summed. As the mean includes every value in the distribution, it is influenced by outliers and skewed distributions.

What else do I need to know about the mean? The population mean is indicated by the Greek symbol [latex]\mu[/latex] (pronounced ‘mu’). When the mean is calculated on a distribution from a sample it is indicated by the symbol [latex]\bar{x}[/latex] (pronounced X-bar).
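The arithmetic can be verified with a few lines of Python. This is only a sketch of the calculation described above; the variable names are illustrative.

```python
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

total = sum(ages)       # sum of all observations
n = len(ages)           # number of observations
mean_age = total / n    # arithmetic mean

print(total, n)            # 623 11
print(round(mean_age, 1))  # 56.6
```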

How does the shape of a distribution influence the Measures of Central Tendency?

Symmetrical distributions: When a distribution is symmetrical, the mode, median and mean are all in the middle of the distribution. The following graph shows a larger retirement age dataset with a distribution which is symmetrical. The mode, median and mean all equal 58 years.

Skewed distributions: When a distribution is skewed, the mode remains the most commonly occurring value and the median remains the middle value in the distribution, but the mean is generally ‘pulled’ in the direction of the longer tail. In a skewed distribution, the median is often a preferred measure of central tendency, as the mean is not usually in the middle of the distribution.

A distribution is said to be positively or right skewed when the tail on the right side of the distribution is longer than the left side. In a positively skewed distribution it is common for the mean to be ‘pulled’ toward the right tail of the distribution. Although there are exceptions to this rule, generally, most of the values, including the median value, tend to be less than the mean value. The following graph shows a larger retirement age dataset with a distribution which is right skewed. The data has been grouped into classes, as the variable being measured (retirement age) is continuous. The mode is 54 years, the modal class is 54-56 years, the median is 56 years and the mean is 57.2 years.

A distribution is said to be negatively or left skewed when the tail on the left side of the distribution is longer than the right side. In a negatively skewed distribution, it is common for the mean to be ‘pulled’ toward the left tail of the distribution. Although there are exceptions to this rule, generally, most of the values, including the median value, tend to be greater than the mean value. The following graph shows a larger retirement age dataset with a distribution which is left skewed. The mode is 65 years, the modal class is 63-65 years, the median is 63 years and the mean is 61.8 years.
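The larger datasets behind the graphs are not reproduced in the text, so the sketch below uses two small hypothetical samples, invented here, to show the typical ordering of mode, median and mean under right and left skew. The values will not match the figures quoted above.

```python
from statistics import mean, median, multimode

# Hypothetical right-skewed sample: long tail of high retirement ages.
right_skewed = [54, 54, 54, 55, 55, 56, 56, 57, 58, 60, 63, 67, 70]

# Hypothetical left-skewed sample: long tail of low retirement ages.
left_skewed = [48, 52, 56, 59, 60, 61, 62, 63, 63, 64, 65, 65, 65]


def centre(values):
    """Return (mode, median, mean) for a sample with a single mode."""
    return multimode(values)[0], median(values), mean(values)


r_mode, r_median, r_mean = centre(right_skewed)
l_mode, l_median, l_mean = centre(left_skewed)

# Right skew: the mean is pulled above the median.
print(r_mode < r_median < r_mean)  # True
# Left skew: the mean is pulled below the median.
print(l_mean < l_median < l_mode)  # True
```

As the text notes, this ordering is typical rather than guaranteed; other skewed samples can break it.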


How do outliers influence the measures of central tendency?

Outliers are extreme or atypical data values that are notably different from the rest of the data.

It is important to detect outliers within a distribution, because they can alter the results of the data analysis. The mean is more sensitive to the existence of outliers than the median or mode.

Consider the initial retirement age dataset again, with one difference: the last observation of 60 years has been replaced with a retirement age of 81 years. This value is much higher than the other values, and could be considered an outlier. However, it has not changed the middle of the distribution, and therefore the median value is still 57 years.

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 81

As all values are included in the calculation of the mean, the outlier will influence the mean value: (54+54+54+55+56+57+57+58+58+60+81 = 644), divided by 11 = 58.5 years. In this distribution the outlier value has increased the mean value.

Despite the existence of outliers in a distribution, the mean can still be an appropriate measure of central tendency, especially if the rest of the data is normally distributed. If the outlier is confirmed as a valid extreme value, it should not be removed from the dataset. Several common regression techniques can help reduce the influence of outliers on the mean value.
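A short sketch comparing the two datasets side by side makes the effect concrete: replacing one value with 81 shifts the mean but leaves the median where it was. Variable names are illustrative.

```python
from statistics import mean, median

original = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
with_outlier = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 81]

mean_before = mean(original)
mean_after = mean(with_outlier)
median_before = median(original)
median_after = median(with_outlier)

print(round(mean_before, 1), round(mean_after, 1))  # 56.6 58.5
print(median_before, median_after)                  # 57 57
```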



What is the most appropriate measure of central tendency for nominal level data?

Three measures of central tendency are the mode, the median and the mean. The mode is used almost exclusively with nominal-level data, as it is the only measure of central tendency available for such variables.

Is median used for nominal data?

No. The median cannot be identified for categorical nominal data, as the categories cannot be logically ordered. For data that can be ordered, the median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical.

Is median the best measure of central tendency?

The median is the most informative measure of central tendency for skewed distributions or distributions with outliers. For example, the median is often used as a measure of central tendency for income distributions, which are generally highly skewed.

Which measure of central tendency is used for nominal and ordinal data?

The measures of central tendency you can use depend on the level of measurement of your data. For a nominal level, you can only use the mode to find the most frequent value. For an ordinal level or ranked data, you can also use the median to find the value in the middle of your data set.
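The rule above can be sketched as a small helper that reports only the measures valid for a given level of measurement. The function name `summarize`, the level labels, and the sample categories are assumptions made for this illustration. Treating numeric (interval or ratio) data as supporting all three measures follows the earlier sections rather than this paragraph.

```python
from statistics import mean, median, median_low, multimode


def summarize(values, level, order=None):
    """Return the measures of central tendency valid for `level`.

    level: "nominal", "ordinal" or "numeric" (labels assumed here).
    order: for ordinal data, the categories from lowest to highest.
    """
    result = {"mode": multimode(values)}  # valid at every level
    if level == "ordinal":
        if order is None:
            raise ValueError("ordinal data needs an explicit category order")
        # Rank each category, then take the lower middle rank so the
        # median is always an actual category.
        ranks = [order.index(v) for v in values]
        result["median"] = order[median_low(ranks)]
    elif level == "numeric":
        result["median"] = median(values)
        result["mean"] = mean(values)
    elif level != "nominal":
        raise ValueError(f"unknown level: {level}")
    return result


# Hypothetical nominal data: only the mode is reported.
pets = summarize(["cat", "dog", "cat"], "nominal")
print(pets)  # {'mode': ['cat']}

# Hypothetical ordinal data: mode and median.
ratings = summarize(
    ["low", "high", "medium", "medium", "high"],
    "ordinal",
    order=["low", "medium", "high"],
)
print(ratings["median"])  # medium
```

Using `median_low` for ordinal data is a design choice for this sketch: averaging two category ranks would produce a value that is not itself a category.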