# Quantitative Data Types and Tests

**OVERVIEW**

Quantitative data is data which can be expressed numerically to indicate a quantity, amount, or measurement

- not all numbers constitute quantitative data (e.g. tax file number!)
- distinct from qualitative data

Quantitative data collection involves measurement of variables

A variable is a characteristic of a unit being observed that may assume more than one of a set of values, to which a numerical measure or a category from a classification can be assigned (e.g. age, weight). In other words, a variable is a “data point” whose value can differ between observations.

Variables are dependent (outcome or response variable) or independent (predictor variable)

- Independent variables – values do *not* depend on other variables; the cause or predictor in an experimental study (causality cannot be implied in observational studies)
- Dependent variables – values depend on other variables; the effect or outcome in an experimental study

**DATA TYPES**

Quantitative variables can be continuous or discrete and have varying levels of measurement

Continuous and discrete variables

- Continuous variables can take any value within a continuous range (e.g. height)
- the distinction between discrete and continuous variables is often blurred
- Discrete variables can only take certain values (e.g. whole numbers), such as the number of cases of influenza (you can’t have half a case)
- discrete variables can generally be treated as continuous variables for the purpose of statistical testing (e.g. average number of children per family = 2.4)

continuous variables can be made discrete depending on how they are measured (e.g. a measuring tape that only measures height to the nearest cm)

Levels of measurement (lowest to highest)

- Interval data – continuous or discrete variables measured on a scale with equal intervals but no true zero (e.g. gauge pressure, or temperature in °C: 20 °C is not twice as hot as 10 °C)
- Ratio data – interval data with a true zero (e.g. an absolute pressure of 200 kPa is twice as great as 100 kPa)
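Temperature makes the interval/ratio distinction concrete. This short Python sketch is an illustration only: it assumes the standard 273.15 offset between °C and kelvin, where kelvin is a ratio scale because 0 K is absolute zero. The helper name `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert °C (interval scale) to kelvin (ratio scale: 0 K is absolute zero)."""
    return celsius + 273.15


# On the Celsius (interval) scale the arithmetic ratio is 2.0,
# but this does NOT mean 20 °C is "twice as hot" as 10 °C.
ratio_celsius = 20 / 10

# On the kelvin (ratio) scale the same two temperatures have a ratio of only about 1.035.
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Differences are meaningful on both scales: a 10-degree gap either way.
diff_celsius = 20 - 10
diff_kelvin = celsius_to_kelvin(20) - celsius_to_kelvin(10)

print(ratio_celsius, round(ratio_kelvin, 3), diff_celsius, round(diff_kelvin, 6))
```

Differences survive the change of scale, but ratios only make sense once the zero point is a true zero.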

Categorical variables are qualitative data, not quantitative, even though they may be labelled as numbers

- Categorical variables are made up of categories that identify separate entities (e.g. gender, colours)

**DATA COLLECTION**

Data collection involves either:

- observational/correlational/cross-sectional research – observe what naturally occurs without intervention
  - e.g. point prevalence studies, case-control studies, retrospective or prospective cohort studies

- experimental research – observe the effects of an intervention
  - e.g. controlled experiments, crossover studies, randomised controlled trials

and experimental research can have either:

- independent design (aka between groups or between subjects) – different people are exposed to the intervention or not
- repeated measures design (within subject) – same people are exposed to the intervention at different times

**VARIATION**

Variation in data from quantitative research can be systematic or unsystematic

- systematic variation results from the differential effect of the research process on one experimental group compared with another; if this is not due to the intended experimental intervention, then there is bias and a loss of internal validity
  - decreased by:
    - randomisation and blinding
    - counterbalancing order in repeated measures designs (e.g. to overcome practice effects and boredom effects)

- unsystematic variation results from random factors; it does not cause bias or reduce internal validity, although it does reduce precision
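Randomisation and counterbalancing are simple enough to sketch in code. The following is a minimal illustration using Python's `random` module, not a trial-grade allocation system: the function names `randomise` and `counterbalance`, the two-condition AB/BA scheme and the participant labels are all hypothetical. Blinding is a procedural safeguard and is not shown.

```python
import random


def randomise(participants, seed=None):
    """Randomly allocate participants to two groups of (near) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # every ordering equally likely, so allocation is by chance alone
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


def counterbalance(participants, seed=None):
    """Assign AB or BA condition orders for a repeated measures design.

    Participants are shuffled first and orders then alternate, so each order is
    used (as near as possible) equally often, spreading order effects such as
    practice or boredom evenly across the two conditions.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    orders = [("A", "B"), ("B", "A")]
    return {p: orders[i % 2] for i, p in enumerate(shuffled)}


groups = randomise([f"P{i}" for i in range(1, 11)], seed=1)
orders = counterbalance(["P1", "P2", "P3", "P4"], seed=1)
print(groups)
print(orders)
```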

Many statistical tests work by separating the variation in the data into systematic and unsystematic components and comparing them (e.g. as a ratio)
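The F ratio from a one-way ANOVA is one familiar example of this comparison: between-group (systematic) variation divided by within-group (unsystematic) variation. Below is a minimal Python sketch with made-up data. It computes only the F statistic, because converting F to a p-value needs the F distribution, which the Python standard library does not provide.

```python
from statistics import fmean


def one_way_f(*groups):
    """One-way ANOVA F ratio: systematic (between-group) over unsystematic (within-group) variation."""
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    k, n = len(groups), len(values)
    # Systematic variation: spread of the group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Unsystematic variation: spread of observations around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)  # degrees of freedom: k - 1
    ms_within = ss_within / (n - k)    # degrees of freedom: n - k
    return ms_between / ms_within


print(one_way_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
```

A large F means the group means differ by much more than the random scatter within groups would suggest; identical group means give F = 0.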

**CORRELATION AND CAUSATION**

Correlation is the presence of an association between two variables (e.g. an independent variable and a dependent variable).

- causation cannot be proven from observational studies

Causation is when the independent variable is the cause and the dependent variable is the effect

- can only be established from experimental studies; Hume’s requirements for inferring a cause and effect relationship are:
  - contiguity – cause and effect must be temporally associated
  - sequentiality – cause must precede effect
  - necessity – effect should never occur in the absence of the cause

note that Hume’s requirements are simplistic

- sometimes effect is not evident until a large amount of time after exposure to the cause (e.g. adult melanoma and sun exposure as a child)
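- sometimes an effect can have more than one cause (e.g. hypotension from beta blocker overdose versus calcium channel blocker overdose), and confounders (factors associated with both the exposure and the outcome) can produce an association without causation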
- plausibility is also useful, i.e. there is a mechanism that explains the cause and effect relationship
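A small simulation shows how a confounder can produce a correlation without causation. This Python sketch is our own illustration: a hypothetical confounder `z` drives both an exposure `x` and an outcome `y`, while `x` has no effect on `y`. The seed, sample size and noise levels are arbitrary assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_confounding(n=10_000, seed=42):
    """Return (crude r, r after removing the confounder) for simulated data."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical confounder
    x = [zi + rng.gauss(0, 1) for zi in z]   # exposure: depends on z only
    y = [zi + rng.gauss(0, 1) for zi in z]   # outcome: depends on z only, never on x
    crude_r = pearson_r(x, y)
    # Subtract the confounder's known contribution (possible only because we simulated it)
    adjusted_r = pearson_r([xi - zi for xi, zi in zip(x, z)],
                           [yi - zi for yi, zi in zip(y, z)])
    return crude_r, adjusted_r


crude, adjusted = simulate_confounding()
print(f"crude r = {crude:.2f}, adjusted r = {adjusted:.2f}")
```

The crude correlation comes out close to 0.5 even though `x` never influences `y`. Once the confounder's contribution is removed, the remaining correlation is close to zero. With real data, the confounder's effect is not known exactly and has to be handled through study design or statistical adjustment.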

Mill proposed an additional criterion for causation

- exclusivity – all other causes must be ruled out

**DATA ANALYSIS**

Quantitative data analysis involves both:

- graphical analysis – plotting data as graphs to visualise patterns in the distribution of data
- statistical tests – fitting statistical models to the data

**DESCRIPTIVE STATISTICS**

Describe the distribution of data

- central tendency
  - mean
  - median
  - mode

- dispersion
  - range
  - inter-quartile range
  - deviation: distance between an observed score and the mean for the variable under consideration
  - variance: average of the squared deviations (for a sample, the sum of squared deviations divided by n − 1)
  - standard deviation: square root of the variance

- shape
  - normal distribution: bell-shaped with skew = 0 and excess kurtosis = 0
  - skewness: lack of symmetry; positive skew = long tail at higher values, negative skew = long tail at lower values
  - kurtosis: peakedness; leptokurtotic = “pointy” and platykurtotic = “flat”
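The measures above can be computed with Python's standard `statistics` module. This is a minimal sketch with a made-up sample, and a few conventions are assumptions worth checking before comparing against other software:

- `statistics.variance` and `statistics.stdev` use the sample (n − 1) denominator.
- `statistics.quantiles` uses its default "exclusive" method, and other quartile conventions give slightly different IQRs.
- Skewness and excess kurtosis here are simple moment-based estimates without small-sample correction.

The helper names `central_moment` and `describe` are hypothetical.

```python
import statistics


def central_moment(data, k):
    """k-th central moment, dividing by n (population form)."""
    mean = statistics.fmean(data)
    return sum((x - mean) ** k for x in data) / len(data)


def describe(data):
    """Summarise the central tendency, dispersion and shape of a numeric sample."""
    mean = statistics.fmean(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    m2 = central_moment(data, 2)
    return {
        # central tendency
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # dispersion
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "deviations": [x - mean for x in data],
        "variance": statistics.variance(data),  # sample variance: divides by n - 1
        "sd": statistics.stdev(data),           # square root of the sample variance
        # shape (simple moment-based estimates, no small-sample correction)
        "skewness": central_moment(data, 3) / m2 ** 1.5,
        "excess_kurtosis": central_moment(data, 4) / m2 ** 2 - 3,
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in summary.items():
    print(name, value)
```

For this sample the mean is 5, the median 4.5, the mode 4 and the sample variance 32/7 (about 4.57). The skewness is positive (about 0.66) because of the long tail towards 9.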

**INFERENTIAL STATISTICS**

- used for hypothesis testing
- The null hypothesis (H0) typically states that there is no difference between the variables or groups, or that there is no association between the variables (one does not affect the value of the other)
- The alternative hypothesis (H1) is that there is an association or difference

**STATISTICAL TESTS**

Use

- statistical tests are used to answer the question: “If the null hypothesis is true, how likely is it that I would observe the data that I have collected?” (usually expressed as a p-value)
- a two-tailed test is used to determine whether the two values differ (in either direction)
- a one-tailed test is used to determine if one value is greater or smaller than the other
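A permutation test answers the question above directly, without distributional assumptions. If the null hypothesis is true, the group labels are arbitrary, so we can count how often relabelling the data gives a difference at least as large as the one observed. This is a minimal exact sketch in Python for small samples, and it also shows the one-tailed versus two-tailed distinction. The function name `permutation_test` and the data are illustrative.

```python
from itertools import combinations
from statistics import fmean


def permutation_test(a, b):
    """Exact permutation test for a difference in means; returns (one-tailed p, two-tailed p).

    Under H0 every way of splitting the pooled data into groups of the original
    sizes is equally likely, so p is the share of splits at least as extreme.
    """
    pooled = list(a) + list(b)
    observed = fmean(a) - fmean(b)
    greater = extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = fmean(group_a) - fmean(group_b)
        total += 1
        if diff >= observed - 1e-12:            # one-tailed: is a greater than b?
            greater += 1
        if abs(diff) >= abs(observed) - 1e-12:  # two-tailed: do a and b differ at all?
            extreme += 1
    return greater / total, extreme / total


one_tailed, two_tailed = permutation_test([5, 6, 7], [1, 2, 3])
print(one_tailed, two_tailed)  # 0.05 0.1
```

With only three values per group there are 20 possible splits, so the smallest achievable one-tailed p-value is 1/20 = 0.05. The two-tailed p-value doubles here because the mirror-image split is equally extreme. Enumerating every split quickly becomes impractical for larger samples, where random resampling is typically used instead.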

Types

- either parametric or non-parametric
- parametric methods make assumptions about the distribution of the data (e.g. that it is normally distributed); non-parametric methods make fewer assumptions
- parametric methods are more powerful when their assumptions are met, so they should be used where possible

Parametric Tests

- Normal distribution (z) tests (n > 60; mean, standard deviations, p value, alpha value, beta value)
- Student’s t-test (n < 60) – can be paired (the same subjects measured twice, or matched pairs) or unpaired (independent samples); the t statistic compares only two groups or conditions
- Analysis of variance (ANOVA): tests for differences between the means of 2 or more groups
- Pearson correlation coefficient (Pearson’s *r*): tests for a linear association between two variables and indicates its strength
- Regression or multiple regression: tests whether one or more independent variables can predict a dependent variable
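The test statistics behind several of these methods are short enough to write out. This Python sketch covers the unpaired Student's t (pooled variance, so it assumes equal variances), the paired t, Pearson's *r* and simple linear regression with one predictor. The function names and data are illustrative. Only the statistics are computed: p-values need the t distribution, which the standard library lacks, so in practice a library such as SciPy (e.g. `scipy.stats.ttest_ind`, `ttest_rel`, `pearsonr`, `linregress`) would normally be used.

```python
import math
from statistics import fmean, stdev, variance


def unpaired_t(a, b):
    """Student's t for two independent samples, assuming equal variances (pooled)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (fmean(a) - fmean(b)) / standard_error, n1 + n2 - 2  # (t, degrees of freedom)


def paired_t(before, after):
    """Paired t: a one-sample t-test on the within-subject differences."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    return fmean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1  # (t, degrees of freedom)


def pearson_r(x, y):
    """Pearson's r: covariance scaled by both standard deviations (range -1 to 1)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x with a single predictor."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


print(unpaired_t([5, 6, 7], [1, 2, 3]))
print(paired_t([10, 12, 14], [11, 14, 17]))
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(simple_regression([1, 2, 3, 4], [3, 5, 7, 9]))
```

The paired version works only on the differences within each subject, which is why it suits repeated measures designs. When the two groups' variances clearly differ, Welch's t-test is a common alternative to the pooled form shown here.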

Non-parametric tests

- Mann-Whitney U test: equivalent to unpaired Student’s t-test
- Wilcoxon rank sum test: equivalent to unpaired t-test (gives the same result as the Mann-Whitney U test)
- Wilcoxon signed rank test: equivalent to paired t-test
- Kruskal-Wallis: equivalent to one-way ANOVA
- Friedman’s: equivalent to repeated measures ANOVA
- Spearman’s rank-order correlation (*ρ*): equivalent to Pearson correlation coefficient but for ranked data
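Most of these non-parametric tests work on ranks rather than raw values. This minimal Python sketch covers ranking (tied values get their average rank), the Mann-Whitney U statistic and Spearman's ρ. The function names and data are illustrative, and only the test statistics are computed. P-values need exact tables or approximations, which libraries such as SciPy provide (e.g. `scipy.stats.mannwhitneyu`, `scipy.stats.spearmanr`).

```python
import math


def rank(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for a, U for b), computed from ranks of the pooled sample."""
    ranks = rank(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])  # Wilcoxon rank sum for group a
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


print(rank([10, 20, 20, 30]))                          # [1.0, 2.5, 2.5, 4.0]
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))            # (0.0, 9.0)
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

The relationship y = x² is monotonic but not linear, so it gives ρ = 1, whereas Pearson's r on the raw values would be slightly below 1. The Mann-Whitney U is derived directly from the Wilcoxon rank sum, which is why the two tests give the same result.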


## Critical Care Compendium

Chris is an Intensivist and ECMO specialist at the Alfred ICU in Melbourne. He is also a Clinical Adjunct Associate Professor at Monash University. He is a co-founder of the Australia and New Zealand Clinician Educator Network (ANZCEN) and is the Lead for the ANZCEN Clinician Educator Incubator programme. He is on the Board of Directors for the Intensive Care Foundation and is a First Part Examiner for the College of Intensive Care Medicine. He is an internationally recognised Clinician Educator with a passion for helping clinicians learn and for improving the clinical performance of individuals and collectives.

After finishing his medical degree at the University of Auckland, he continued post-graduate training in New Zealand as well as Australia’s Northern Territory, Perth and Melbourne. He has completed fellowship training in both intensive care medicine and emergency medicine, as well as post-graduate training in biochemistry, clinical toxicology, clinical epidemiology, and health professional education.

He is actively involved in using translational simulation to improve patient care and the design of processes and systems at Alfred Health. He coordinates the Alfred ICU’s education and simulation programmes and runs the unit’s education website, INTENSIVE. He created the ‘Critically Ill Airway’ course and teaches on numerous courses around the world. He is one of the founders of the FOAM movement (Free Open-Access Medical education) and is co-creator of litfl.com, the RAGE podcast, the Resuscitology course, and the SMACC conference.

His one great achievement is being the father of three amazing children.

On Twitter, he is @precordialthump.
