Quantitative Data Types and Tests
OVERVIEW
Quantitative data is data which can be expressed numerically to indicate a quantity, amount, or measurement
- not all numbers constitute quantitative data (e.g. tax file number!)
- distinct from qualitative data
Quantitative data collection involves measurement of variables
A variable is a characteristic of a unit being observed that may assume more than one of a set of values to which a numerical measure or a category from a classification can be assigned (e.g. age, weight, etc.). In other words, variables are “data points” that vary numerically between measurements.
Variables are dependent (outcome or response variable) or independent (predictor variable)
- Independent variables – values do not depend on other variables; the cause or predictor in an experimental study (causality cannot be implied in observational studies)
- Dependent variables – values depend on other variables; the effect or outcome in an experimental study
DATA TYPES
Quantitative variables can be continuous or discrete and have varying levels of measurement
Continuous and discrete variables
- Continuous variables can take any value within a continuous range (e.g. height)
- Discrete variables can only take certain values, such as whole numbers (e.g. number of cases of influenza; you can’t have half a case)
- the distinction between discrete and continuous variables is often blurred
- discrete variables can generally be treated as continuous variables for the purpose of statistical testing (e.g. average number of children per family = 2.4)
- continuous variables can be made discrete depending on how they are measured (e.g. a measuring tape that only measures height to the nearest cm)
Levels of measurement (lowest to highest)
- Interval data – continuous/ discrete variables that increase at constant intervals but do not start at a true zero (e.g. gauge pressure, or temperature on the Celsius scale – 20 °C is not twice as hot as 10 °C)
- Ratio data – interval data that has a true zero (e.g. an absolute pressure of 200 kPa is twice as great as 100 kPa)
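The interval versus ratio distinction can be checked with a little arithmetic. The sketch below is only illustrative: `celsius_to_kelvin` is a hypothetical helper introduced here, and the kelvin scale is used as an assumed example of a temperature scale with a true zero.

```python
# Hypothetical helper: convert Celsius (interval scale) to kelvin (ratio scale).
def celsius_to_kelvin(celsius: float) -> float:
    return celsius + 273.15

# Dividing Celsius values gives a number, but it has no physical meaning
# because 0 degrees Celsius is not "zero temperature".
naive_ratio = 20 / 10

# On a scale with a true zero, the ratio is meaningful.
true_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Absolute pressure already has a true zero, so its ratio can be read directly.
pressure_ratio = 200 / 100

print(naive_ratio)           # 2.0 (misleading)
print(round(true_ratio, 3))  # 1.035 (20 C is about 3.5% "hotter" than 10 C)
print(pressure_ratio)        # 2.0 (meaningful)
```

Differences (20 °C minus 10 °C) are still meaningful on an interval scale; only ratios require a true zero.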
Categorical variables are qualitative data, not quantitative, even though they may be labelled as numbers
- Categorical variables are made up of categories that identify separate entities (e.g. gender, colours, etc.)
DATA COLLECTION
Data collection involves either:
- observational/ correlational/ cross-sectional research – observe what naturally occurs without intervention
- e.g. point prevalence studies, case-control studies, retrospective or prospective cohort studies
- experimental research – observe the effects of an intervention
- e.g. controlled experiments, crossover studies, randomised controlled trials
Experimental research can use either:
- independent design (aka between groups or between subjects) – different people are exposed to the intervention or not
- repeated measures design (within subject) – same people are exposed to the intervention at different times
VARIATION
Variation in data from quantitative research can be systematic or unsystematic
- systematic variation results from a differential effect of the research process on one experimental group compared with another; if this is not due to the intended experimental intervention then there is bias and a loss of internal validity
- decreased by:
- randomisation and blinding
- counterbalancing order in repeated measures designs (e.g. to overcome practice effects and boredom effects)
- unsystematic variation results from random factors, and does not contribute to bias or threaten internal validity
Many statistical tests work by identifying systematic and unsystematic variation in data and making a comparison
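One familiar instance of this comparison is the F ratio from one-way ANOVA, which divides between-group variation (systematic, if the groups differ only by the intervention) by within-group variation (unsystematic). The sketch below is a minimal illustration using made-up numbers; `f_ratio`, `control` and `treated` are names assumed for the example.

```python
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F statistic: between-group / within-group mean squares."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Made-up data for illustration only
control = [4, 5, 6]
treated = [7, 8, 9]
f = f_ratio([control, treated])
print(f)  # 13.5
```

A large ratio means the variation between groups is large relative to the random scatter within groups.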
CORRELATION AND CAUSATION
Correlation is the presence of an association between two variables (e.g. a dependent variable and an independent variable)
- causation cannot be proven from observational studies
Causation is when the independent variable is the cause and the dependent variable is the effect
- can only be determined from experimental studies, due to Hume’s requirements for inferring a cause and effect relationship:
- contiguity – cause and effect must be temporally associated
- sequentiality – cause must precede effect
- necessity – effect should never occur in the absence of the cause
note that Hume’s requirements are simplistic
- sometimes effect is not evident until a large amount of time after exposure to the cause (e.g. adult melanoma and sun exposure as a child)
- sometimes effects can have more than one cause, i.e. confounders (e.g. hypotension from beta blocker overdose versus calcium channel blocker overdose)
- plausibility is also useful, i.e. there is a mechanism that explains the cause and effect relationship
Mill proposed an additional criterion for causation
- exclusivity – all other causes must be ruled out
DATA ANALYSIS
Quantitative data analysis involves both:
- graphical analysis – plotting data as graphs to visualise patterns in the distribution of data
- statistical tests – fitting statistical models to the data
DESCRIPTIVE STATISTICS
Describe the distribution of data
- central tendency
- mean
- median
- mode
- dispersion
- range
- inter-quartile range
- deviation: distance between an observed score and the mean for the variable under consideration
- variance: the mean of the squared deviations
- standard deviation: square root of the variance
- shape
- normal distribution: bell-shaped with skew = 0 and (excess) kurtosis = 0
- skewness: lack of symmetry; positive skew = long tail at higher values of the variable, negative skew = long tail at lower values of the variable
- kurtosis: peakedness; leptokurtic = “pointy” and platykurtic = “flat”
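The measures listed above can be computed with Python's standard `statistics` module. This is a sketch on made-up data; the sample variance uses the usual n − 1 denominator, and the skewness and excess kurtosis use the simple moment-based definitions, which is an assumption since several definitions are in use.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 9, 11]  # made-up observations
n = len(data)

# Central tendency
mean_value = st.mean(data)      # 6
median_value = st.median(data)  # 5
mode_value = st.mode(data)      # 4

# Dispersion
data_range = max(data) - min(data)    # 9
q1, q2, q3 = st.quantiles(data, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1
deviations = [x - mean_value for x in data]
variance = sum(d ** 2 for d in deviations) / (n - 1)  # sample variance
sd = variance ** 0.5

# Shape (moment-based, using population moments)
m2 = sum(d ** 2 for d in deviations) / n
m3 = sum(d ** 3 for d in deviations) / n
m4 = sum(d ** 4 for d in deviations) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution

print(mean_value, median_value, mode_value)
print(data_range, iqr, variance, round(sd, 3))
print(round(skewness, 3), round(excess_kurtosis, 3))
```

Here the skewness is positive (long tail towards higher values) and the excess kurtosis is negative (flatter than a normal distribution).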
INFERENTIAL STATISTICS
- used for hypothesis testing
- The null hypothesis (H0) is of the form “there is no difference between these variables or groups” or “there is no association between these variables” (i.e. one does not affect the value of the other)
- The alternative hypothesis (H1) is that there is an association or difference
STATISTICAL TESTS
Use
- statistical tests are used to answer the question: “If the null hypothesis is true, how likely is it that I would observe the data that I have collected?” (usually expressed as a p-value)
- a two-tailed test is used to determine if the two values are different (in either direction)
- a one-tailed test is used to determine if one value is greater or smaller than the other (in one pre-specified direction)
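The question "if the null hypothesis is true, how likely are these data?" can be made concrete with an exact permutation test, used here purely as an illustration and not as a method named in the text above. Under the null hypothesis the group labels are interchangeable, so every relabelling of the pooled data is equally likely. The function name `exact_permutation_test` and the data are assumptions for the example.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_test(group_a, group_b):
    """Return (observed difference in means, one-tailed p, two-tailed p)."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)

    # Difference in means for every possible relabelling of the pooled data
    diffs = []
    for idx in combinations(range(len(pooled)), n_a):
        in_a = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in in_a]
        diffs.append(mean(a) - mean(b))

    tol = 1e-12  # guards against floating-point rounding in the comparisons
    # One-tailed: relabellings where A exceeds B by at least the observed amount
    one_tailed = sum(d >= observed - tol for d in diffs) / len(diffs)
    # Two-tailed: relabellings at least as extreme in either direction
    two_tailed = sum(abs(d) >= abs(observed) - tol for d in diffs) / len(diffs)
    return observed, one_tailed, two_tailed

# Made-up data for illustration only
treated = [7, 8, 9]
control = [4, 5, 6]
observed, p_one, p_two = exact_permutation_test(treated, control)
print(observed, p_one, p_two)  # 3 0.05 0.1
```

Only 1 of the 20 possible relabellings gives a difference as large as the one observed, and 2 of 20 are as extreme in either direction, which is why the two-tailed p-value is double the one-tailed value here.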
Types
- either parametric or non-parametric
- parametric methods make assumptions about the distribution of data, non-parametric methods do not
- parametric methods are more powerful and should be used if possible, but require assumptions about the data to be met (e.g. normally distributed)
Parametric Tests
- Normal distribution (n > 60, mean, standard deviations, p value, alpha value, beta value)
- Student’s t-test (n < 60) – can be paired (the same subjects measured on two occasions or variables) or unpaired (independent samples); the t statistic can only be computed for 2 groups or variables
- Analysis of variance (ANOVA): tests for differences between the means of 2 or more groups
- Pearson correlation coefficient (Pearson’s r): tests for an association between two variables with an indication of strength
- Regression or multiple regression: tests whether one or more independent variables can predict a dependent variable
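As a rough sketch of how the paired and unpaired t statistics differ, the code below computes both by hand with the standard library. The unpaired version assumes equal variances (the classic pooled Student's form); the function names and data are made up for illustration. Converting t to a p-value needs the t distribution, which the standard library does not provide; SciPy's `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel` do this in practice.

```python
from statistics import mean, variance

def unpaired_t(a, b):
    """Student's t for two independent samples (pooled variance). Returns (t, df)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(a) - mean(b)) / standard_error, df

def paired_t(before, after):
    """Paired t: a one-sample t-test on the within-subject differences. Returns (t, df)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    standard_error = (variance(diffs) / n) ** 0.5
    return mean(diffs) / standard_error, n - 1

# Made-up data for illustration only
t_unpaired, df_unpaired = unpaired_t([7, 8, 9], [4, 5, 6])
t_paired, df_paired = paired_t([4, 5, 6], [7, 8, 10])

print(round(t_unpaired, 3), df_unpaired)  # 3.674 4
print(round(t_paired, 3), df_paired)      # 10.0 2
```

The paired design removes between-subject variation from the error term, which is why similar raw differences can give a much larger t statistic.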
Non-parametric tests
- Mann-Whitney U test: equivalent to unpaired Student’s t-test
- Wilcoxon rank sum test: equivalent to unpaired Student’s t-test (the same test as the Mann-Whitney U test)
- Wilcoxon signed-rank test: equivalent to paired t-test
- Kruskal-Wallis: equivalent to one-way ANOVA
- Friedman’s: equivalent to repeated measures ANOVA
- Spearman’s rank order (ρ): equivalent to Pearson correlation coefficient but for ranked data
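Most of these tests work by replacing the raw values with their ranks. The sketch below shows that idea for two of them: the Mann-Whitney U statistic, and Spearman's ρ computed as Pearson's r on the ranks. All function names and data are assumptions for illustration, and p-values are not computed; in practice a library such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.spearmanr`) would be used.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Mann-Whitney U statistic (the smaller of the two U values)."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    covariation = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    spread_x = sum((xi - mx) ** 2 for xi in x) ** 0.5
    spread_y = sum((yi - my) ** 2 for yi in y) ** 0.5
    return covariation / (spread_x * spread_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data for illustration only
u = mann_whitney_u([7, 8, 9], [4, 5, 6])  # complete separation gives U = 0
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]                    # monotonic but not linear
print(u)
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

Because ranks ignore the size of the gaps between values, the outlying value of 100 lowers Pearson's r but leaves Spearman's ρ at 1.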
Chris is an Intensivist and ECMO specialist at The Alfred ICU, where he is Deputy Director (Education). He is a Clinical Adjunct Associate Professor at Monash University, the Lead for the Clinician Educator Incubator programme, and a CICM First Part Examiner.
He is an internationally recognised Clinician Educator with a passion for helping clinicians learn and for improving the clinical performance of individuals and collectives. He was one of the founders of the FOAM movement (Free Open-Access Medical education) and has been recognised for his contributions to education with awards from ANZICS, ANZAHPE, and ACEM.
His one great achievement is being the father of three amazing children.
On Bluesky, he is @precordialthump.bsky.social and on the site that Elon has screwed up, he is @precordialthump.