Validity of Research and Measurements • LITFL • CCC Research

OVERVIEW

In general terms, validity is “the quality of being true or correct”; it refers to the strength of results and how accurately they reflect the real world. Thus ‘validity’ can have quite different meanings depending on the context!

Reliability is distinct from validity, in that it refers to the consistency or repeatability of results
Two types of validity are considered when critically appraising clinical research studies:
- internal validity
- external validity
Validity applies to an outcome or measurement, not to the instrument used to obtain it, and is based on ‘validity evidence’

INTERNAL VALIDITY

The extent to which the design and conduct of the trial eliminate the possibility of bias, such that observed effects can be attributed to the independent variable
refers to the accuracy of a trial
a study that lacks internal validity should not be applied to any clinical setting
components:
- power calculation
- details of study context and intervention
- avoidance of loss to follow-up
- standardised treatment conditions
- control groups
- objectivity from blinding and data handling
Clinical research can be internally valid despite poor external validity

EXTERNAL VALIDITY

The extent to which the results of a trial provide a correct basis for generalizations to other circumstances
Also called “generalizability” or “applicability”
Study findings can only be applied to clinical settings that are the same as, or similar to, those used in the study
There are three components to external validity:
- population validity – how well findings from the study sample can be extrapolated to the population as a whole (this depends on random sampling)
- ecological validity – the extent to which the study environment influences results (can the study be replicated in other contexts?)
- internal/construct validity – verified relationships between dependent and independent variables
Research findings cannot have external validity without being internally valid

FACTORS THAT AFFECT EXTERNAL VALIDITY OF CLINICAL RESEARCH (Rothwell, 2006)

Setting of the trial

healthcare system
country
recruitment from primary, secondary or tertiary care
selection of participating centers
selection of participating clinicians

Selection of patients

methods of pre-randomisation diagnosis and investigation
eligibility criteria
exclusion criteria
placebo run-in period
treatment run-in period
“enrichment” strategies
ratio of randomised patients to eligible non-randomised patients in participating centers
proportion of patients who decline randomisation

Characteristics of randomised patients

baseline clinical characteristics
racial group
uniformity of underlying pathology
stage in the natural history of disease
severity of disease
comorbidity
absolute risk of a poor outcome in the control group
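
The last factor above, absolute risk in the control group, can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that the relative risk reduction seen in a trial carries over unchanged to another population; the function name `absolute_benefit` and all figures are hypothetical and not taken from Rothwell (2006).

```python
from math import ceil


def absolute_benefit(control_risk, relative_risk_reduction):
    """Return (absolute risk reduction, number needed to treat)."""
    if not 0 < control_risk <= 1 or not 0 < relative_risk_reduction <= 1:
        raise ValueError("both arguments must be in the interval (0, 1]")
    arr = control_risk * relative_risk_reduction
    # Round before taking the ceiling so floating-point noise cannot add a patient.
    nnt = ceil(round(1 / arr, 9))
    return arr, nnt


# Hypothetical figures, invented for illustration only.
RRR = 0.25  # assumed to be the same in both populations
trial_arr, trial_nnt = absolute_benefit(control_risk=0.20, relative_risk_reduction=RRR)
local_arr, local_nnt = absolute_benefit(control_risk=0.02, relative_risk_reduction=RRR)

print(f"trial population: ARR={trial_arr:.3f}, NNT={trial_nnt}")
print(f"local population: ARR={local_arr:.3f}, NNT={local_nnt}")
```

Under these assumptions the same relative effect yields a tenfold smaller absolute benefit in the lower-risk population, which is one reason a trial's control-group risk matters when judging applicability.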

Differences between trial protocol and routine practice

trial intervention
timing of treatment
appropriateness/relevance of control intervention
adequacy of non-trial treatment – both intended and actual
prohibition of certain non-trial treatments
therapeutic or diagnostic advances since trial was performed

Outcome measures and follow up

clinical relevance of surrogate outcomes
clinical relevance, validity, and reproducibility of complex scales
effect of intervention on most relevant components of composite outcomes
identification of who measured outcome
use of patient-centred outcomes
frequency of follow up
adequacy of length of follow-up

Adverse effects of treatment

completeness of reporting of relevant adverse effects
rate of discontinuation of treatment
selection of trial centers on the basis of skill or experience
exclusion of patients at risk of complications
exclusion of patients who experienced adverse events during a run-in period
intensity of trial safety procedures

MEASUREMENT VALIDITY (Downing & Yudkowsky, 2009)

Validity refers to the evidence presented to support or to refute the meaning or interpretation assigned to assessment data or results. It relates to whether a test, tool, instrument or device actually measures what it intends to measure.

Traditionally, validity was viewed as a trinitarian concept based on:

Construct validity
- degree to which the test measures what it is meant to be measuring
- e.g. the ideal depression score would include different variants of depression and be able to distinguish depression from stress and anxiety
Criterion validity
- the extent to which a measure is related to an outcome, with two components:
  - Concurrent validity – compares measurements with an outcome at the same time (e.g. a concurrent “gold standard” test result)
  - Predictive validity – compares measurements with an outcome at a later time (e.g. do high exam marks predict subsequent incomes?)
Content validity
- the degree to which the content of an instrument is an adequate reflection of all the components of the construct
- e.g. a schizophrenia score would need to include both positive and negative symptoms
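
Criterion validity, as described above, is often summarised as a correlation between the measure and the criterion. The following is a minimal sketch of that idea using a hand-written Pearson coefficient; the helper `pearson_r` and all data are hypothetical and invented for illustration, and a correlation coefficient is only one of several ways such evidence might be reported.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data for six patients, invented for illustration only.
new_score = [4, 7, 9, 12, 15, 18]      # measure under evaluation
gold_standard = [5, 8, 8, 13, 14, 19]  # reference test taken at the same time
later_outcome = [0, 1, 1, 2, 2, 3]     # outcome recorded at follow-up

concurrent_r = pearson_r(new_score, gold_standard)  # concurrent validity evidence
predictive_r = pearson_r(new_score, later_outcome)  # predictive validity evidence

print(f"concurrent r = {concurrent_r:.2f}, predictive r = {predictive_r:.2f}")
```

The only difference between the two calculations is when the criterion was measured: at the same time as the score (concurrent) or afterwards (predictive).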

According to current validity theory in psychometrics, validity is a unitary concept and thus construct validity is the only form of validity. For instance, in health professions education, validity evidence for assessments comes from:

Content
- relationship between test content and the construct of interest
- theory; hypothesis about content
- independent assessment of match between content sampled and domain of interest
- solid, scientific, quantitative evidence
Response process
- analysis of individual responses to stimuli
- debriefing of examinees
- process studies aimed at understanding what is measured and the soundness of intended score interpretations
- quality assurance and quality control of assessment data
Internal structure (reliability)
- data internal to assessments such as: reliability or reproducibility of scores; inter-item correlations; statistical characteristics of items; statistical analysis of item option function; factor studies of dimensionality; Differential Item Functioning (DIF) studies
Relationship to other variables
- data external to assessments such as: correlations of assessment variable(s) to external, independent measures; hypothesis and theory driven investigations; correlational research based on previous studies, literature
  - a. Convergent and discriminant evidence: relationships between similar and different measures
  - b. Test-criterion evidence: relationships between test and criterion measure(s)
  - c. Validity generalization: can the validity evidence be generalized? Evidence that the validity studies may generalize to other settings.
Consequences
- intended and unintended consequences of test use
- differential consequences of test use
- impact of assessment on students, instructors, schools, society
- impact of assessments on curriculum; cost/benefit analysis with respect to tradeoff between instructional time and assessment time.
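
The “Internal structure” source of evidence listed above includes the reliability of scores and inter-item correlations. One widely used internal-consistency statistic is Cronbach's alpha; the source does not name it, so it is shown here only as an illustrative choice. The function name `cronbach_alpha` and the item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of scores for the same n examinees."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs scores for the same >= 2 examinees")
    sum_item_variances = sum(variance(item) for item in items)
    # Total score per examinee: add that examinee's scores across all items.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = variance(totals)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical scores: three items answered by five examinees.
items = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 4, 5],
    [2, 2, 4, 5, 5],
]
alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together. In the framework above this is one piece of internal-structure evidence, not a verdict on validity by itself.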

Comments:

Note that, strictly speaking, we cannot comment on the validity of a test, tool, instrument, or device, only on the measurement that is obtained. This is because the same test used in a different context (different operator, different subjects, different circumstances, at a different time) may not be valid. In other words, validity evidence applies to the data generated by an instrument, not the instrument itself.
Validity can be equated with accuracy, and reliability with precision.
Face validity is a term commonly used as an indicator of validity – it is essentially worthless! It means at ‘face value’, in other words, the degree to which the measure subjectively looks like what it is intended to measure.
The higher the stakes of measurement (e.g. test result), the higher the need for validity evidence.
You can never have too much validity evidence, but the minimum required varies with purpose (e.g. high stakes fellowship exam versus one of many progress tests)
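
The analogy in the comments above (validity as accuracy, reliability as precision) can be made concrete with repeated readings of a known quantity. This is a minimal sketch under invented assumptions: the true value, both devices, and the helper `bias_and_spread` are hypothetical.

```python
from statistics import mean, stdev


def bias_and_spread(readings, true_value):
    """Return (mean error relative to the true value, sample standard deviation)."""
    return mean(readings) - true_value, stdev(readings)


TRUE_VALUE = 100.0  # hypothetical known reference value

# Hypothetical repeated readings, invented for illustration only.
device_a = [110.1, 109.9, 110.0, 110.2, 109.8]  # tightly clustered but consistently too high
device_b = [92.0, 108.0, 101.0, 95.0, 104.0]    # centred on the truth but widely scattered

bias_a, spread_a = bias_and_spread(device_a, TRUE_VALUE)
bias_b, spread_b = bias_and_spread(device_b, TRUE_VALUE)

print(f"device A: bias={bias_a:.1f}, spread={spread_a:.2f}")  # precise, not accurate
print(f"device B: bias={bias_b:.1f}, spread={spread_b:.2f}")  # accurate on average, not precise
```

Device A gives repeatable readings that are systematically wrong, while device B is right on average but inconsistent, which mirrors the distinction between reliability and validity.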

References and Links

Journal articles and Textbooks

Downing SM, Yudkowsky R. Assessment in health professions education. New York: Routledge; 2009.
Rothwell PM. Factors that can affect the external validity of randomised controlled trials. PLoS Clin Trials. 2006;1(1):e9.
Shankar-Hari M, Bertolini G, Brunkhorst FM, et al. Judging quality of current septic shock definitions and criteria. Crit Care. 2015;19(1):445.


Chris Nickson

Chris is an Intensivist and ECMO specialist at The Alfred ICU, where he is Deputy Director (Education). He is a Clinical Adjunct Associate Professor at Monash University, the Lead for the Clinician Educator Incubator programme, and a CICM First Part Examiner.

He is an internationally recognised Clinician Educator with a passion for helping clinicians learn and for improving the clinical performance of individuals and collectives. He was one of the founders of the FOAM movement (Free Open-Access Medical education) and has been recognised for his contributions to education with awards from ANZICS, ANZAHPE, and ACEM.

His one great achievement is being the father of three amazing children.

On Bluesky, he is @precordialthump.bsky.social and on the site that Elon has screwed up, he is @precordialthump.
