How do you determine if the difference between two numbers is significant?

When reading about or conducting research, you are likely to come across the term ‘statistical significance’. ‘Significance’ generally refers to something having particular importance – but in research, ‘significance’ has a very different meaning. Statistical significance is a term used to describe how certain we are that a difference or relationship between two variables exists and isn’t due to chance. When a result is identified as being statistically significant, this means that you are confident that there is a real difference or relationship between two variables, and it’s unlikely that it’s a one-off occurrence.

However, it’s commonplace for statistical significance (i.e., being confident that chance wasn’t involved in your results) to be confused with general significance (i.e., having importance). A statistically significant finding may, or may not, have any real-world utility. Therefore, having a thorough understanding of what statistical significance is, and what factors contribute to it, is important for conducting sound research.

1 – Hypotheses:

A hypothesis is a particular type of prediction for what the outcomes of research will be, and comes in two forms. A null hypothesis predicts that there is no difference or relationship between two groups or variables of interest, and therefore that the two groups or variables are equal. In contrast, an alternative hypothesis predicts that there is a difference or relationship between two groups or variables of interest. In this case, the two groups or variables are not equal, and so one could be greater or less than the other.

A key purpose of statistical significance testing is to determine whether your results could plausibly have occurred by chance. If the results could easily have occurred by chance, then we do not reject (retain) the null hypothesis and conclude there is no difference. Because the result is consistent with chance, it is not likely to reflect a real difference in the real world. However, if the results would be very unlikely to occur by chance, then we reject the null hypothesis and conclude there is a difference. Because chance is an unlikely explanation, the difference is likely to hold in the real world. This in turn will affect the conclusions that you can draw from your research.

2 – The Likelihood of Error:

When dealing with chance, there is always the possibility of error, including Type I and Type II errors. A Type I error occurs when the null hypothesis is rejected when it should have been retained (i.e., a false positive). This means that the results are identified as significant when they actually occurred by chance; because they occurred by chance, they are unlikely to hold in the real world and should have been identified as non-significant. A Type II error occurs when the null hypothesis is retained when it should have been rejected (i.e., a false negative). This means that the results are identified as non-significant when they actually did not occur by chance; not occurring by chance suggests the effect is likely to exist in the real world, and so it should have been identified as significant.
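
To make the idea of a false positive concrete, here is a minimal simulation sketch (using Python with numpy and scipy, and an illustrative alpha of .05; the sample sizes and population values are made up). Both groups are drawn from the same population, so the null hypothesis is true by construction, and roughly 5% of the tests still come out "significant" purely by chance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05            # significance threshold
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    # Both groups come from the SAME population, so any "difference" is chance.
    group_a = rng.normal(loc=100, scale=15, size=30)
    group_b = rng.normal(loc=100, scale=15, size=30)
    result = stats.ttest_ind(group_a, group_b)
    if result.pvalue < alpha:
        false_positives += 1  # a Type I error: flagged significant by chance alone

print(f"Type I error rate: {false_positives / n_experiments:.3f}")  # close to 0.05
```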

3 – Alpha and p Values:

Prior to any statistical analyses, it is important to decide what you will consider statistically significant. This threshold is referred to as the alpha value, and represents the probability that you will make a Type I error (i.e., reject the null hypothesis when it is true). Alpha values are typically set at .05 (5%), meaning that we accept a 5% risk of making a Type I error. However, more conservative tests will use smaller alpha values such as .01 (1%), meaning that we accept only a 1% risk of a Type I error. Alpha is not to be confused with the p value, which is the calculated probability of obtaining your result (or a more extreme one) by chance if the null hypothesis is true. For statistical significance, alpha is used as the threshold and the p value is compared against it. If the p value is above the alpha value (p > .05), the result is not statistically significant. If it is below alpha (p < .05), then it is statistically significant.
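
The decision rule itself is simple enough to express in a few lines of Python. The sketch below is illustrative only; the function name and its default alpha are my own choices, not a standard library API:

```python
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Return True if the result is statistically significant at the given alpha.

    Note: alpha must be chosen BEFORE the analysis, not after seeing the p value.
    """
    return p_value < alpha

print(is_significant(0.03))              # True  (p < .05)
print(is_significant(0.03, alpha=0.01))  # False (fails the stricter threshold)
```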

4 – One or Two Tailed Tests:

Your hypotheses will determine which type of significance test you need to conduct. A one-tailed hypothesis predicts a specific direction for the difference (higher, lower) or relationship (positive, negative) between the two groups or variables of interest. With a one-tailed test, your alpha value stays the same, but the entire rejection region sits in one tail of the distribution; in practice, this means the one-tailed p value is half the corresponding two-tailed p value, provided the result falls in the predicted direction. On the other hand, a two-tailed hypothesis does not predict a specific direction of the difference or relationship, and so a two-tailed test uses the full p value. Two-tailed tests are more widely used in research than one-tailed tests.
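
As an illustration, scipy's independent-samples t-test accepts an `alternative` argument that switches between two-sided and one-sided tests. The sketch below uses made-up data (the group means, spreads, and sizes are assumptions for the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
treatment = rng.normal(loc=105, scale=15, size=40)  # hypothesized to score higher
control = rng.normal(loc=100, scale=15, size=40)

# Two-tailed: is there a difference in either direction?
two_tailed = stats.ttest_ind(treatment, control, alternative="two-sided")

# One-tailed: is the treatment mean specifically GREATER than the control mean?
one_tailed = stats.ttest_ind(treatment, control, alternative="greater")

print(f"two-tailed p = {two_tailed.pvalue:.4f}")
# Half the two-tailed p when the observed difference is in the predicted direction:
print(f"one-tailed p = {one_tailed.pvalue:.4f}")
```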

5 – Sample Size and Power:

Statistical power refers to the probability that the statistical test you are using will correctly reject a false null hypothesis. Type II errors are reduced by having enough statistical power, which is generally kept at 80% or higher. Statistical power is increased by having an adequate sample size; if your study is underpowered because you don't have enough participants, this will affect statistical significance. Generally, if the alternative hypothesis is true and there is a difference or relationship to be observed, then a larger sample increases the chances of detecting it. If you see a difference or relationship between two small groups, a larger sample will not make the underlying effect bigger, but it will make your test far more likely to detect that effect reliably.
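
Power and sample size can be worked out before collecting data. As a rough sketch, the statsmodels package can solve for the per-group sample size needed to reach 80% power; the effect size of 0.5 (a medium effect on Cohen's d) is an illustrative assumption:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Solve for the per-group sample size needed to detect a medium effect
# (Cohen's d = 0.5) with alpha = .05 and 80% power, two-tailed.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                   alternative="two-sided")
print(f"Required sample size per group: {n_per_group:.1f}")  # roughly 64
```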

Determining Statistical Significance Using Hand Calculations:

  1. Determine your thresholds and tailed tests: Before performing any analyses, decide what your alpha value is (.05 or .01), and whether you are performing a one-tailed or two-tailed test.
  2. Determine your critical value: This step is unique to calculations done by hand. A critical value is a number that corresponds to the probability equal to your pre-determined alpha value, and for hand calculations it serves as the threshold for significance. Critical values depend on the number of tails in your test and your alpha value, which is why these parameters are determined first. There are different sets of critical values for each type of statistical test you are conducting; these are easily accessible in statistics textbooks, or online.
  3. Calculate your test statistic: With your parameters set, perform the hand calculations needed. Your observed test statistic is the final numerical result.
  4. Compare your observed test statistic (Step 3) to the critical value (Step 2), and draw your conclusions:
    a. If your observed statistic is greater than the critical value (observed > critical; for most tests, compare the absolute value of the observed statistic), reject the null hypothesis. This means that the probability of this finding occurring by chance is less than your alpha value (e.g., 5%), which is evidence in support of a likely real-world difference or relationship between two groups or variables.
    b. If your observed statistic is less than the critical value (observed < critical), retain the null hypothesis. This means that the probability of your finding occurring by chance is greater than your alpha value, which suggests that there is no evidence of a real-world difference or relationship between two groups or variables. A worked sketch of these four steps appears below.
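
To illustrate the four steps, here is a minimal sketch of an independent-samples t-test computed "by hand" in Python. The scores are made up, and scipy is used only to look up the critical value (standing in for a textbook table):

```python
import numpy as np
from scipy import stats

group_a = np.array([104, 98, 110, 95, 102, 99, 107, 101])  # made-up scores
group_b = np.array([96, 92, 100, 89, 94, 97, 91, 95])

alpha = 0.05                       # Step 1: two-tailed test at alpha = .05
n1, n2 = len(group_a), len(group_b)
df = n1 + n2 - 2

# Step 2: critical value from the t distribution (replaces a textbook table)
t_critical = stats.t.ppf(1 - alpha / 2, df)

# Step 3: hand-calculate the t statistic using a pooled standard deviation
s_pooled = np.sqrt(((n1 - 1) * group_a.var(ddof=1) +
                    (n2 - 1) * group_b.var(ddof=1)) / df)
t_observed = (group_a.mean() - group_b.mean()) / (s_pooled * np.sqrt(1/n1 + 1/n2))

# Step 4: compare the observed statistic to the critical value
print(f"t observed = {t_observed:.3f}, t critical = {t_critical:.3f}")
if abs(t_observed) > t_critical:
    print("Reject the null hypothesis: the difference is statistically significant.")
else:
    print("Retain the null hypothesis: the difference is not significant.")
```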

Determining Statistical Significance Using Software Packages:

  1. Determine your thresholds and tailed tests: Before performing any analyses, decide what your alpha value is (.05 or .01), and whether you are performing a one-tailed or two-tailed test.
  2. Calculate your test statistic: With your parameters set, perform the calculations needed. Your observed test statistic is the final numerical result, and you will note that a p value is reported alongside it; software packages calculate the exact p value for you.
  3. Compare your observed p value (Step 2) to your alpha value (Step 1), and draw your conclusions:
    a. If your p value is less than your alpha value (p < .05), reject the null hypothesis. This means that the probability of this finding occurring by chance is less than your alpha value (e.g., 5%), which is evidence in support of a likely real-world difference or relationship between two groups or variables.
    b. If your p value is greater than your alpha value (p > .05), retain the null hypothesis. This means that the probability of your finding occurring by chance is greater than your alpha value, which suggests that there is no evidence of a real-world difference or relationship between two groups or variables. A minimal sketch of this workflow follows below.
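
In practice, most packages report the p value directly alongside the test statistic. A minimal sketch with scipy, reusing the made-up scores from the hand-calculation example above, would be:

```python
from scipy import stats

group_a = [104, 98, 110, 95, 102, 99, 107, 101]  # made-up scores
group_b = [96, 92, 100, 89, 94, 97, 91, 95]

alpha = 0.05
result = stats.ttest_ind(group_a, group_b)       # two-tailed by default
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")

if result.pvalue < alpha:
    print("Reject the null hypothesis (statistically significant).")
else:
    print("Retain the null hypothesis (not statistically significant).")
```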

Effect Sizes:

Just because a result has statistical significance, it doesn't mean that the result has any real-world importance. To help 'translate' the result to the real world, we can use an effect size. An effect size is a numerical index of how much your dependent variable of interest is affected by the independent variable, and helps determine whether the observed effect is important enough to matter in the real world. Therefore, effect sizes should be interpreted alongside your significance results. Two of the most common effect sizes are Cohen's d and eta-squared. Cohen's d indexes the size of the difference between two groups in units of standard deviation; a d of 0.2 is a small effect, 0.5 is a medium effect, and 0.8 is a large effect. Eta-squared measures the proportion of variance in the dependent variable that is explained by the independent variable; an eta-squared of .01 is a small effect, .06 is a medium effect, and .14 is a large effect. Both of these effect sizes can be calculated by hand, or as part of your statistics software.
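
Cohen's d in particular is straightforward to compute by hand: it is the difference between the group means divided by the pooled standard deviation. A minimal sketch, again reusing the made-up scores from the earlier examples:

```python
import numpy as np

group_a = np.array([104, 98, 110, 95, 102, 99, 107, 101])  # made-up scores
group_b = np.array([96, 92, 100, 89, 94, 97, 91, 95])

# Pooled standard deviation across the two groups
n1, n2 = len(group_a), len(group_b)
s_pooled = np.sqrt(((n1 - 1) * group_a.var(ddof=1) +
                    (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2))

# Cohen's d: the mean difference expressed in standard deviation units
d = (group_a.mean() - group_b.mean()) / s_pooled
print(f"Cohen's d = {d:.2f}")  # values of 0.8 or above count as a large effect
```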


How to determine if there is a significant difference between two groups?

If the difference between the two group means is large relative to what we would expect to occur from sample to sample, we consider the difference to be significant. If the difference between the group means is small relative to the amount of sampling variability, the difference will not be significant.

What does it mean if the difference between two means is statistically significant?

Here's a recap of statistical significance: Statistically significant means a result is unlikely due to chance. The p-value is the probability of obtaining the difference we saw from a sample (or a larger one) if there really isn't a difference for all users.