Validity of Research and Measurements


In general terms, validity is “the quality of being true or correct”. It refers to the strength of results and how accurately they reflect the real world. ‘Validity’ can therefore have quite different meanings depending on the context.

  • Reliability is distinct from validity, in that it refers to the consistency or repeatability of results
  • Two types of validity are considered when critically appraising clinical research studies:
    • internal validity
    • external validity
  • Validity applies to an outcome or measurement, not the instrument used to obtain it, and is based on ‘validity evidence’


INTERNAL VALIDITY

  • The extent to which the design and conduct of the trial eliminate the possibility of bias, such that observed effects can be attributed to the independent variable
  • refers to the accuracy of a trial
  • a study that lacks internal validity should not be applied to any clinical setting
  • components:
    • power calculation
    • details of study context and intervention
    • avoidance of loss to follow-up
    • standardised treatment conditions
    • control groups
    • objectivity from blinding and data handling
  • Clinical research can be internally valid despite poor external validity
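
As a rough illustration of the power calculation listed among the components above, the sketch below estimates how many patients are needed per group to compare two event rates. It assumes a two-sided test, equal group sizes and the normal approximation for two proportions. The function name, the event rates and the default alpha and power are illustrative assumptions, not values from any particular trial.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients per group needed to detect p_control vs p_treatment.

    Normal-approximation formula for two independent proportions,
    two-sided test, equal allocation (all of these are assumptions).
    """
    if p_control == p_treatment:
        raise ValueError("event rates must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical example: 30% mortality in controls, 20% expected with treatment.
print(sample_size_per_group(0.30, 0.20))
# Asking for more power, or a smaller difference, increases the required size.
print(sample_size_per_group(0.30, 0.20, power=0.90))
print(sample_size_per_group(0.30, 0.25))
```

A trial that recruits far fewer patients than such an estimate suggests is at risk of missing a real effect, which is one reason the power calculation is checked during appraisal.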


EXTERNAL VALIDITY

  • The extent to which the results of a trial provide a correct basis for generalizations to other circumstances
  • Also called “generalizability” or “applicability”
  • Studies can only be applied to clinical settings that are the same as, or similar to, those used in the study
  • There are three components to external validity:
    • population validity – how well the study sample can be extrapolated to the population as a whole (based on randomized sampling)
    • ecological validity – the extent to which the study environment influences results (can the study be replicated in other contexts?)
    • internal/construct validity – verified relationships between dependent and independent variables
  • Research findings cannot have external validity without being internally valid


FACTORS THAT CAN AFFECT EXTERNAL VALIDITY (Rothwell, 2006)

Setting of the trial

  • healthcare system
  • country
  • recruitment from primary, secondary or tertiary care
  • selection of participating centers
  • selection of participating clinicians

Selection of patients

  • methods of pre-randomisation diagnosis and investigation
  • eligibility criteria
  • exclusion criteria
  • placebo run-in period
  • treatment run-in period
  • “enrichment” strategies
  • ratio of randomised patients to eligible non-randomised patients in participating centers
  • proportion of patients who decline randomisation

Characteristics of randomised patients

  • baseline clinical characteristics
  • racial group
  • uniformity of underlying pathology
  • stage in the natural history of disease
  • severity of disease
  • comorbidity
  • absolute risk of a poor outcome in the control group
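
The absolute risk in the control group matters because the absolute benefit of a treatment scales with baseline risk. The sketch below assumes, purely for illustration, that the relative risk reduction seen in a trial carries over unchanged to a lower-risk population (which may not hold in practice). The function name and all numbers are hypothetical.

```python
def absolute_benefit(control_risk, relative_risk_reduction):
    """Return (absolute risk reduction, number needed to treat).

    Assumes the relative risk reduction is constant across baseline risks,
    which is an illustrative assumption rather than an established fact.
    """
    arr = control_risk * relative_risk_reduction
    if arr <= 0:
        raise ValueError("no absolute benefit, so NNT is undefined")
    nnt = 1 / arr  # in practice this is rounded up to a whole patient
    return arr, nnt


# Hypothetical trial population: 20% of controls have a poor outcome.
trial_arr, trial_nnt = absolute_benefit(0.20, 0.25)
# Hypothetical routine-practice population: only 2% have a poor outcome.
clinic_arr, clinic_nnt = absolute_benefit(0.02, 0.25)

print(f"trial:  ARR {trial_arr:.3f}, NNT about {trial_nnt:.0f}")
print(f"clinic: ARR {clinic_arr:.3f}, NNT about {clinic_nnt:.0f}")
```

With the same relative effect, the lower-risk population needs roughly ten times as many patients treated to prevent one poor outcome, so a trial result from a high-risk cohort may not translate directly.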

Differences between trial protocol and routine practice

  • trial intervention
  • timing of treatment
  • appropriateness/relevance of control intervention
  • adequacy of non-trial treatment – both intended and actual
  • prohibition of certain non-trial treatments
  • therapeutic or diagnostic advances since trial was performed

Outcome measures and follow-up

  • clinical relevance of surrogate outcomes
  • clinical relevance, validity, and reproducibility of complex scales
  • effect of intervention on most relevant components of composite outcomes
  • identification of who measured outcome
  • use of patient-centred outcomes
  • frequency of follow-up
  • adequacy of length of follow-up

Adverse effects of treatment

  • completeness of reporting of relevant adverse effects
  • rate of discontinuation of treatment
  • selection of trial centers on the basis of skill or experience
  • exclusion of patients at risk of complications
  • exclusion of patients who experienced adverse events during a run-in period
  • intensity of trial safety procedures

MEASUREMENT VALIDITY (Downing & Yudkowsky, 2009)

Validity refers to the evidence presented to support or to refute the meaning or interpretation assigned to assessment data or results. It relates to whether a test, tool, instrument or device actually measures what it is intended to measure.

Traditionally, validity was viewed as a trinitarian concept based on:

  • Construct validity
    • degree to which the test measures what it is meant to be measuring
    • e.g. the ideal depression score would include different variants of depression and be able to distinguish depression from stress and anxiety
  • Criterion validity
    • the extent to which a measure is related to an outcome, with two components:
      • Concurrent validity – compares measurements with an outcome at the same time (e.g. a concurrent “gold standard” test result)
      • Predictive validity – compares measurements with an outcome at a later time (e.g. do high exam marks predict subsequent incomes?)
  • Content validity
    • the degree to which the content of an instrument is an adequate reflection of all the components of the construct
    • e.g. a schizophrenia score would need to include both positive and negative symptoms
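
Concurrent validity evidence is often summarised as the strength of the relationship between a new measure and a reference measure taken at the same time. The sketch below computes a Pearson correlation for that purpose. The scores are made-up numbers, the variable names are hypothetical, and correlation is only one possible summary (it shows association, not agreement).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data: a new score and a "gold standard" result for six patients,
# both recorded at the same visit (concurrent comparison).
new_score = [12, 15, 9, 20, 17, 11]
gold_standard = [13, 16, 10, 19, 18, 10]

print(f"concurrent correlation: {pearson_r(new_score, gold_standard):.2f}")
```

The same calculation applied to an outcome measured later (for example, a score today against an event at one year) would be a simple form of predictive validity evidence.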

According to current validity theory in psychometrics, validity is a unitary concept and thus construct validity is the only form of validity. For instance, in health professions education, validity evidence for assessments comes from five sources:

  • Content
    • relationship between test content and the construct of interest
    • theory; hypothesis about content
    • independent assessment of match between content sampled and domain of interest
    • solid, scientific, quantitative evidence
  • Response process
    • analysis of individual responses to stimuli
    • debriefing of examinees
    • process studies aimed at understanding what is measured and the soundness of intended score interpretations
    • quality assurance and quality control of assessment data
  • Internal structure (reliability)
    • data internal to assessments such as: reliability or reproducibility of scores; inter-item correlations; statistical characteristics of items; statistical analysis of item option function; factor studies of dimensionality; Differential Item Functioning (DIF) studies
  • Relationship to other variables
    • data external to assessments such as: correlations of assessment variable(s) to external, independent measures; hypothesis and theory driven investigations; correlational research based on previous studies, literature
      • a. Convergent and discriminant evidence: relationships between similar and different measures
      • b. Test-criterion evidence: relationships between test and criterion measure(s)
      • c. Validity generalization: can the validity evidence be generalized? Evidence that the validity studies may generalize to other settings.
  • Consequences
    • intended and unintended consequences of test use
    • differential consequences of test use
    • impact of assessment on students, instructors, schools, society
    • impact of assessments on curriculum; cost/benefit analysis with respect to tradeoff between instructional time and assessment time.
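
To make the internal structure source more concrete, the sketch below computes Cronbach's alpha, one commonly used coefficient for the internal consistency of item scores. Choosing alpha is an assumption made for illustration (the source lists reliability evidence in general), and the function name and the score table are hypothetical.

```python
from statistics import variance


def cronbach_alpha(score_rows):
    """Cronbach's alpha for a table with one row per examinee, one column per item."""
    n_items = len(score_rows[0])
    if n_items < 2 or len(score_rows) < 2:
        raise ValueError("need at least two items and two examinees")
    item_columns = list(zip(*score_rows))  # transpose rows into item columns
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in score_rows])
    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical scores: five examinees answering four items, each marked 0 to 5.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```

Values closer to 1 suggest the items rank examinees consistently. How high alpha needs to be depends on the stakes of the assessment.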


  • Note that, strictly speaking, we cannot comment on the validity of a test, tool, instrument, or device, only on the measurement that is obtained. This is because the same test used in a different context (different operator, different subjects, different circumstances, at a different time) may not be valid. In other words, validity evidence applies to the data generated by an instrument, not the instrument itself.
  • Validity can be equated with accuracy, and reliability with precision
  • Face validity is a term commonly used as an indicator of validity, but it is essentially worthless. It means at ‘face value’: the degree to which the measure subjectively looks like what it is intended to measure.
  • The higher the stakes of measurement (e.g. test result), the higher the need for validity evidence.
  • You can never have too much validity evidence, but the minimum required varies with purpose (e.g. high stakes fellowship exam versus one of many progress tests)
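
The analogy above between validity and accuracy, and between reliability and precision, can be sketched with repeated measurements of a known quantity. The readings, device names and true value below are invented for illustration: bias (mean error) stands in for accuracy and the standard deviation stands in for precision.

```python
from statistics import mean, stdev


def bias_and_spread(measurements, true_value):
    """Return (bias, spread): mean error versus the truth, and sample SD."""
    return mean(measurements) - true_value, stdev(measurements)


TRUE_VALUE = 100.0  # hypothetical known reference value

# Device A: centred on the truth but scattered (accurate, less precise).
device_a = [99, 101, 100, 102, 98]
# Device B: tightly clustered but offset (precise, not accurate).
device_b = [110.1, 109.9, 110.0, 110.2, 109.8]

for name, readings in (("A", device_a), ("B", device_b)):
    bias, spread = bias_and_spread(readings, TRUE_VALUE)
    print(f"device {name}: bias {bias:+.2f}, SD {spread:.2f}")
```

Device B shows that highly repeatable (reliable) readings can still be wrong, which is why reliability alone is not sufficient evidence of validity.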

References and Links

Journal articles and Textbooks

  • Downing SM, Yudkowsky R. Assessment in Health Professions Education. New York: Routledge; 2009.
  • Rothwell PM. Factors that can affect the external validity of randomised controlled trials. PLoS Clin Trials. 2006;1(1):e9.
  • Shankar-Hari M, Bertolini G, Brunkhorst FM, et al. Judging quality of current septic shock definitions and criteria. Crit Care. 2015;19(1):445.



