Quantitative Data Types and Tests • LITFL • CCC Research

OVERVIEW

Quantitative data is data which can be expressed numerically to indicate a quantity, amount, or measurement

not all numbers constitute quantitative data (e.g. tax file number!)
distinct from qualitative data

Quantitative data collection involves measurement of variables

A variable is a characteristic of a unit being observed that may assume more than one of a set of values to which a numerical measure or a category from a classification can be assigned (e.g. age, weight). In other words, variables are “data points” whose values vary between measurements.

Variables are dependent (outcome or response variable) or independent (predictor variable)

Independent variables – values do not depend on other variables; the cause or predictor in an experimental study (causality cannot be implied in observational studies)
Dependent variables – values depend on other variables; the effect or outcome in an experimental study

DATA TYPES

Quantitative variables can be continuous or discrete and have varying levels of measurement

Continuous and discrete variables

Continuous variables can take any value within a continuous range (e.g. height)
Discrete variables can only take certain values, such as whole numbers (e.g. number of cases of influenza; you can’t have half a case)
The distinction between discrete and continuous variables is often blurred:
- discrete variables can generally be treated as continuous variables for the purpose of statistical testing (e.g. average number of children per family = 2.4)
- continuous variables can be made discrete depending on how they are measured (e.g. a measuring tape that only measures height to the nearest cm)

Levels of measurement (lowest to highest)

Interval data – continuous or discrete variables that increase at constant intervals but do not start at a true zero (e.g. gauge pressure, or temperature on the Celsius scale: 20°C is not twice as hot as 10°C)
Ratio data – interval data that has a true zero (e.g. an absolute pressure of 200 kPa is twice as great as 100 kPa)
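The interval/ratio distinction can be checked with simple arithmetic. This is a minimal Python sketch, assuming the standard conversion K = °C + 273.15 (kelvin being a ratio scale with a true zero); the function names are illustrative only.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scale Celsius reading to the ratio-scale kelvin."""
    return celsius + 273.15


# Ratios on the Celsius scale depend on where zero was arbitrarily placed.
celsius_ratio = 20.0 / 10.0  # 2.0, but physically meaningless

# On the kelvin scale (true zero) the ratio is meaningful: about 1.035.
kelvin_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

# Differences are meaningful on both scales: 10 degrees either way.
difference_k = celsius_to_kelvin(20.0) - celsius_to_kelvin(10.0)

# Absolute pressure is a ratio scale: 200 kPa really is twice 100 kPa.
pressure_ratio = 200.0 / 100.0

print(f"Celsius ratio:  {celsius_ratio:.3f}")
print(f"Kelvin ratio:   {kelvin_ratio:.3f}")
print(f"Difference (K): {difference_k:.3f}")
print(f"Pressure ratio: {pressure_ratio:.3f}")
```

Equal intervals mean differences survive the change of scale, while ratios only make sense once the zero is a true zero.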

Categorical variables are qualitative data, not quantitative, even though they may be labelled as numbers

Categorical variables are made up of categories that identify separate entities (e.g. gender, colours)

DATA COLLECTION

Data collection involves either:

observational/correlational/cross-sectional research – observe what naturally occurs without intervention
- e.g. point prevalence studies, case-control studies, retrospective or prospective cohort studies
experimental research – observe the effects of an intervention
- e.g. controlled experiments, crossover studies, randomised controlled trials

Experimental research can use either:

independent design (aka between groups or between subjects) – different people are exposed to the intervention or not
repeated measures design (within subject) – same people are exposed to the intervention at different times

VARIATION

Variation in data from quantitative research can be systematic or unsystematic

systematic variation results from differential effect of the research process on one experimental group compared with another; if this is not due to the intended experimental intervention then there is bias and a loss of internal validity
- decreased by:
  - randomisation and blinding
  - counterbalancing order in repeated measures designs (e.g. to overcome practice effects and boredom effects)
unsystematic variation results from random factors and does not cause bias or a loss of internal validity
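The two safeguards listed above (randomisation, and counterbalancing in repeated measures designs) can be sketched in Python. This assumes simple randomisation into two equal-sized groups and a two-condition (A/B) crossover; the participant IDs, function names and the alternating AB/BA scheme are hypothetical illustrations, not a trial protocol. In practice, the order is often also randomised.

```python
import random


def randomise(participants, seed=None):
    """Simple randomisation: shuffle, then split into two (near) equal groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


def counterbalance(participants, conditions=("A", "B")):
    """Alternate condition order (AB, BA, AB, ...) so that practice and
    boredom effects fall equally on each condition."""
    orders = [tuple(conditions), tuple(reversed(conditions))]
    return {p: orders[i % 2] for i, p in enumerate(participants)}


# Hypothetical participant IDs P01..P08
ids = [f"P{i:02d}" for i in range(1, 9)]
groups = randomise(ids, seed=42)
orders = counterbalance(ids)

print(groups)
print(orders)
```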

Many statistical tests work by quantifying the systematic and unsystematic variation in the data and comparing the two (e.g. as a ratio of systematic to unsystematic variation)
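One familiar example of this comparison is the F ratio of a one-way ANOVA: between-group (systematic) variance divided by within-group (unsystematic) variance. The Python sketch below computes only the F statistic, not its p value. It assumes independent groups and at least one group with some scatter, since zero within-group variance would cause division by zero. The function name `f_ratio` is illustrative.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F ratio = mean square between / mean square within."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations
    # Systematic: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Unsystematic: scatter of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


print(f_ratio([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
print(f_ratio([[1, 2, 3], [1, 2, 3]]))
```

A large F means the differences between groups are large relative to the random noise within them. When all group means are equal, F is 0.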

CORRELATION AND CAUSATION

Correlation is the presence of an association between variables (e.g. between an independent variable and a dependent variable).

causation cannot be proven from observational studies

Causation is when the independent variable is the cause and the dependent variable is the effect

can only be determined from experimental studies, due to Hume’s requirements for inferring a cause and effect relationship:
- contiguity – cause and effect must be temporally associated
- sequentiality – cause must precede effect
- necessity – effect should never occur in the absence of the cause

note that Hume’s requirements are simplistic

sometimes effect is not evident until a large amount of time after exposure to the cause (e.g. adult melanoma and sun exposure as a child)
sometimes an effect can have more than one cause, and these other causes can confound the apparent relationship (e.g. hypotension from beta-blocker overdose versus calcium channel blocker overdose)
plausibility is also useful, i.e. there is a mechanism that explains the cause and effect relationship

Mill proposed an additional criterion for causation

exclusivity – all other causes must be ruled out

DATA ANALYSIS

Quantitative data analysis involves both:

graphical analysis – plotting data as graphs to visualise patterns in the distribution of data
statistical tests – fitting statistical models to the data

DESCRIPTIVE STATISTICS

Describe the distribution of data

central tendency
- mean
- median
- mode
dispersion
- range
- inter-quartile range
- deviation: the distance between an observed score and the mean for the variable under consideration
- variance: the average of the squared deviations (for a sample, the sum of squared deviations divided by n - 1)
- standard deviation: the square root of the variance
shape
- normal distribution: bell-shaped, with skew = 0 and excess kurtosis = 0
- skewness: lack of symmetry; positive skew = long tail at higher values, negative skew = long tail at lower values
- kurtosis: peakedness; leptokurtotic = “pointy” and platykurtotic = “flat”
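The measures listed above can be computed with Python's standard library. This is a minimal sketch under the following assumptions:
- variance and standard deviation use the sample (n - 1) denominator
- skewness and excess kurtosis use the simple moment-based definitions (other estimators exist and give slightly different values)
- the IQR uses the default method of `statistics.quantiles`, so other software may report a different IQR for small samples

The function name `describe` is illustrative.

```python
import statistics


def describe(data):
    """Central tendency, dispersion and shape for a list of numbers."""
    n = len(data)
    m = statistics.mean(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    deviations = [x - m for x in data]
    variance = sum(d ** 2 for d in deviations) / (n - 1)  # sample variance
    # Central moments for the moment-based shape statistics
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    return {
        "mean": m,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": variance,
        "sd": variance ** 0.5,
        "skewness": m3 / m2 ** 1.5,           # > 0: long tail at higher values
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this sample, the single large value (9) drags the mean above the median and gives a positive skew.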

INFERENTIAL STATISTICS

used for hypothesis testing
The null hypothesis (H0) takes the form “there is no difference between these variables or groups” or “there is no association between these variables (one does not affect the value of the other)”
The alternative hypothesis (H1) is that there is an association or difference

STATISTICAL TESTS

Use

statistical tests are used to answer the question: “If the null hypothesis is true, how likely is it that I would observe the data that I have collected?” (usually expressed as a p-value)
a two-tailed test is used to determine whether the two values differ, in either direction
a one-tailed test is used to determine whether one value is specifically greater (or specifically smaller) than the other
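The difference between one- and two-tailed p values can be shown for a test statistic that follows a standard normal distribution under H0 (a z test). This is a Python sketch assuming the z statistic has already been calculated. It uses `math.erfc`, which the standard library provides. The function name is hypothetical.

```python
import math


def z_test_p_value(z: float, alternative: str = "two-sided") -> float:
    """P value for a z statistic, assuming Z ~ N(0, 1) when H0 is true."""
    if alternative == "two-sided":
        # P(|Z| >= |z|): a difference in either direction counts
        return math.erfc(abs(z) / math.sqrt(2))
    if alternative == "greater":
        # One-tailed, upper: P(Z >= z)
        return 0.5 * math.erfc(z / math.sqrt(2))
    if alternative == "less":
        # One-tailed, lower: P(Z <= z)
        return 0.5 * math.erfc(-z / math.sqrt(2))
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


for alt in ("two-sided", "greater", "less"):
    print(alt, round(z_test_p_value(1.96, alt), 4))
```

For z = 1.96, this prints approximately 0.05 (two-sided), 0.025 (greater) and 0.975 (less). For a result in the hypothesised direction, the one-tailed p value is half the two-tailed one.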

Types

either parametric or non-parametric
parametric methods make assumptions about the distribution of the data; non-parametric methods do not
parametric methods are more powerful and should be used where possible, but their assumptions about the data must be met (e.g. normally distributed data)

Parametric Tests

Normal distribution (z) tests (n > 60): based on the mean and standard deviation; interpreted using the p value, alpha value and beta value
Student’s t-test (n < 60): can be paired (the same subjects measured under two conditions or at two time points) or unpaired (independent samples); the t statistic can only compare 2 groups or conditions
Analysis of variance (ANOVA): tests for differences between the means of 2 or more groups
Pearson correlation coefficient (Pearson’s r): tests for a linear association between two variables and indicates its strength
Regression or multiple regression: tests whether one or more independent variables can predict a dependent variable
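Three of the test statistics above can be written out directly. The Python sketch below returns the t statistics (with their degrees of freedom) and Pearson's r, but not p values, which would need the t distribution. It makes two assumptions:
- the unpaired version is Student's t with a pooled (equal-variance) standard deviation
- the paired version is a one-sample t test on the within-subject differences

The function names and example data are illustrative.

```python
import math
from statistics import mean, stdev


def unpaired_t(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2


def paired_t(before, after):
    """Paired t: one-sample t test on each subject's difference."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def pearson_r(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


t_u, df_u = unpaired_t([1, 2, 3], [4, 5, 6])
t_p, df_p = paired_t([1, 2, 3, 4], [2, 4, 5, 7])
r = pearson_r([1, 2, 3], [2, 4, 6])
print(t_u, df_u, t_p, df_p, r)
```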

Non-parametric tests

Mann-Whitney U test: equivalent to the unpaired Student’s t-test
Wilcoxon rank-sum test: equivalent to the unpaired t-test (and mathematically equivalent to the Mann-Whitney U test)
Wilcoxon signed-rank test: equivalent to the paired t-test
Kruskal-Wallis test: equivalent to one-way ANOVA
Friedman’s test: equivalent to repeated measures ANOVA
Spearman’s rank-order correlation (ρ): equivalent to the Pearson correlation coefficient, but for ranked data
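Most of these non-parametric tests work on ranks rather than raw values. As a sketch, the Python below ranks data (tied values share their average rank) and then computes two statistics:
- the Mann-Whitney U statistic, returned as the smaller of U for each group
- Spearman's ρ, computed as Pearson's r on the ranks

P values are not computed, and the helper names are hypothetical.

```python
import math


def rank(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U from the rank sum of group a in the pooled ranking."""
    ranks = rank(list(a) + list(b))
    r_a = sum(ranks[: len(a)])
    u_a = r_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


print(rank([10, 20, 20, 30]))
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 1000]))
```

The last example is monotonic but far from linear. Because only the ordering matters, ρ is still 1.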



Chris Nickson

Chris is an Intensivist and ECMO specialist at The Alfred ICU, where he is Deputy Director (Education). He is a Clinical Adjunct Associate Professor at Monash University, the Lead for the Clinician Educator Incubator programme, and a CICM First Part Examiner.

He is an internationally recognised Clinician Educator with a passion for helping clinicians learn and for improving the clinical performance of individuals and collectives. He was one of the founders of the FOAM movement (Free Open-Access Medical education) and has been recognised for his contributions to education with awards from ANZICS, ANZAHPE, and ACEM.

His one great achievement is being the father of three amazing children.

On Bluesky, he is @precordialthump.bsky.social and on the site that Elon has screwed up, he is @precordialthump.
