When you look at survey results, do you ask yourself ‘Are these numbers statistically significant? Yes or no?’
‘Significance’ is a commonly used word in statistics. Significance tests are taken to signal confidence in survey findings: that the findings matter and are meaningful. But asking whether a finding is ‘meaningful’ may be a more helpful way to assess statistics than relying on significance tests.
Significance testing has been a widely accepted approach for determining whether statistics are worth reporting. It is about determining the level of confidence that a survey would generate the same results if repeated, or that differences between findings taken at two points in time are reliable. It is a way of trying to control the risk that a finding is unreliable. Typically, in the literature, a result is deemed significant if it reaches the 95 or 99 per cent level of confidence. This suggests that anything even slightly under this arbitrary threshold is not worth reporting.
In reality, the level of risk, or degree of confidence, falls along a spectrum. Findings do not suddenly become significant when they cross a particular threshold.
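To make the idea of a spectrum concrete, here is a minimal Python sketch. It assumes two hypothetical survey waves of 1,000 respondents each and uses a pooled two-proportion z-test as one common choice of significance test; the function name, sample sizes and response counts are all invented for illustration and do not come from any real survey.

```python
import math


def two_proportion_p_value(yes_a, n_a, yes_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    # Pooled proportion under the assumption of no real difference
    pooled = (yes_a + yes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 500 of 1,000 respondents said 'yes' in the first wave.
BASELINE_YES = 500
SAMPLE_SIZE = 1000

for follow_up_yes in (530, 540, 543, 544, 545, 550, 560):
    p = two_proportion_p_value(follow_up_yes, SAMPLE_SIZE, BASELINE_YES, SAMPLE_SIZE)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{follow_up_yes} yes responses: p = {p:.4f} ({verdict} at the 95% level)")
```

Under these assumed numbers, the p-value shrinks smoothly as the second-wave count rises, yet the yes-or-no label flips between 543 and 544 responses. One extra respondent answering ‘yes’ changes the verdict, although the strength of the evidence has barely moved.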
It is widely accepted that any analysis of statistics should be conducted in a way that is appropriate to the research problem. Issues of validity and reliability are important considerations. Validity refers to the extent to which a research instrument measures what it claims to measure. Reliability is about the consistency of the research instrument over time.
Current literature suggests that significance testing is often used indiscriminately, as a default way of determining whether survey findings are meaningful. Current industry thinking is that significance tests do not necessarily offer any advantage when deciding whether differences between indicators are important. Applying significance tests and mechanically adhering to their results can be highly problematic and detrimental to critical thinking.
Informed human judgement is an important attribute for a researcher or evaluator. Human judgement implies that decision making in statistical inference is partly subjective, context dependent and goal oriented. Surveys are rarely conducted in isolation; they are often carried out following a needs analysis or an assessment of a social situation that identified a need for inquiry. Surveys are commonly conducted as part of a wider study that gathers information in a range of ways, perhaps including qualitative research or other mixed methods approaches. Consequently, survey findings shouldn’t be analysed in isolation; they need to be considered within a broader contextual setting. Significance testing analyses survey results in isolation, without factoring in these other influences.
There can be no universal standard for determining meaningful changes in data, as each case must be judged on its own merits. It is becoming a more commonly held view among industry experts that informed quantitative thinking is better than mindless statistics.
Ultimately, significance tests cannot make decisions for us. Rather than considering findings to be ‘significant’, it may be more helpful to think of what may be ‘notable’, or ‘meaningful’. This approach requires greater informed human judgement than a simple arbitrary test of significance. Only informed humans can make decisions that incorporate the use of statistics, analysed in a way that responds to the nature of the inquiry.