What Is A Cohort Study?

A cohort study is a particular form of longitudinal study that samples a cohort (a group of people who share a defining characteristic, typically those who experienced a common event in a selected period, such as birth or graduation), performing a cross-section at intervals through time. While a cohort study is a panel study, a panel study is not always a cohort study, as individuals in a panel study do not always share a common characteristic.

Cohort studies are one of the fundamental designs of epidemiology. They are used in research in medicine, nursing, psychology, social science, and any other field that relies on statistical evidence to answer ‘difficult to reach’ questions.

In medicine, for instance, while clinical trials are used primarily for assessing the safety of newly developed pharmaceuticals before they are approved for sale, epidemiological analysis of how risk factors affect the incidence of diseases is often used to identify the causes of diseases in the first place, and to help provide pre-clinical justification for the plausibility of protective factors (treatments).

Cohort studies differ from clinical trials in that no intervention, treatment, or exposure is administered to participants in a cohort design, and no control group is defined. Rather, cohort studies are largely about the life histories of segments of populations, and the individual people who constitute these segments. Exposures or protective factors are identified as preexisting characteristics of participants.

The study is controlled by including other common characteristics of the cohort in the statistical analysis. Both exposure/treatment and control variables are measured at baseline.

Participants are then followed over time to observe the incidence rate of the disease or outcome in question. Regression analysis can then be used to evaluate the extent to which the exposure or treatment variable contributes to the incidence of the disease, while accounting for other variables that may be at play.
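As a rough illustration of the follow-up step, the sketch below computes an unadjusted incidence rate per person-year for exposed and unexposed participants, and the ratio between the two. The records, the `incidence_rate` helper, and all numbers are hypothetical and invented for this example. A regression model would refine this crude comparison by accounting for other variables.

```python
# Hypothetical follow-up records: (exposed at baseline, years followed, developed outcome).
# All values are invented for illustration only.
records = [
    (True, 10.0, False),
    (True, 4.5, True),
    (True, 8.0, False),
    (True, 2.5, True),
    (False, 10.0, False),
    (False, 10.0, False),
    (False, 7.0, True),
    (False, 3.0, False),  # lost to follow-up after 3 years, still contributes person-time
]


def incidence_rate(rows, exposed):
    """Return new cases per person-year among rows with the given exposure status."""
    person_years = sum(years for is_exposed, years, _ in rows if is_exposed == exposed)
    cases = sum(1 for is_exposed, _, event in rows if is_exposed == exposed and event)
    return cases / person_years


rate_exposed = incidence_rate(records, exposed=True)
rate_unexposed = incidence_rate(records, exposed=False)
rate_ratio = rate_exposed / rate_unexposed  # crude (unadjusted) rate ratio

print(f"Exposed:   {1000 * rate_exposed:.1f} cases per 1,000 person-years")
print(f"Unexposed: {1000 * rate_unexposed:.1f} cases per 1,000 person-years")
print(f"Crude rate ratio: {rate_ratio:.2f}")
```

Counting person-years rather than people lets participants who leave the study early still contribute the time during which they were observed.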

Hierarchy Of Evidence

Double-blind randomized controlled trials (RCTs) are generally considered a superior methodology in the hierarchy of evidence for treatment, because they allow the most control over other variables that could affect the outcome, and the randomization and blinding processes reduce bias in the study design. This minimizes the chance that results will be influenced by confounding variables, particularly ones that are unknown.

However, educated hypotheses based on prior research and background knowledge are used to select variables to be included in the regression model for cohort studies, and statistical methods can be used to identify and account for potential confounders among these variables. Bias can also be mitigated in a cohort study when selecting participants for the cohort.
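One simple way to account for a single known confounder, offered here as an alternative illustration to a full regression model, is to stratify by it and pool the strata with the Mantel-Haenszel risk ratio. The sketch below uses hypothetical counts in which exposure is more common in the older age band, so the crude comparison suggests an effect that disappears after stratification. The stratum names and all numbers are assumptions made for this example.

```python
# Hypothetical counts per stratum of a suspected confounder (here, an age band).
# Each stratum: (exposed cases, exposed total, unexposed cases, unexposed total).
strata = {
    "younger": (10, 100, 40, 400),
    "older": (120, 400, 30, 100),
}


def crude_risk_ratio(strata):
    """Risk ratio from pooled counts, ignoring the stratifying variable."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel pooled risk ratio across strata."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator


crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)

print(f"Crude risk ratio:    {crude:.2f}")
print(f"Adjusted risk ratio: {adjusted:.2f}")
```

In this invented data set the risk within each age band is identical for exposed and unexposed participants, so the gap between the crude and adjusted ratios is entirely due to confounding by age.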

It is also important to note that RCTs may not be suitable in all cases, such as when the outcome is a negative health effect and the exposure is hypothesized to be a risk factor for the outcome. Ethical standards, and morality, would prevent deliberately exposing participants to risk factors in RCTs. The natural or incidental exposure to these risk factors (e.g. time spent in the sun), or self-administered exposure (e.g. smoking), can be measured without subjecting participants to risk factors outside of their individual lifestyles, habits, and choices.

A cohort is a group of people who share a common characteristic or experience within a defined period (e.g., are currently living, are exposed to a drug or vaccine or pollutant, or undergo a certain medical procedure). Thus a group of people who were born on a day or in a particular period, say 1948, form a birth cohort.

The comparison group may be the general population from which the cohort is drawn, or it may be another cohort of persons thought to have had little or no exposure to the substance under investigation, but otherwise similar. Alternatively, subgroups within the cohort may be compared with each other.

Cohort Study Applications

In medicine, a cohort study is often undertaken to obtain evidence to try to refute the existence of a suspected association between cause and effect; failure to refute a hypothesis often strengthens confidence in it. Crucially, the cohort is identified before the appearance of the disease under investigation.

Investigators follow a group of people who do not have the disease for a period of time and see who develops it (new incidence). The cohort therefore cannot be defined as a group of people who already have the disease.
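The requirement that the cohort be disease-free at the start can be sketched as follows. Participants who already have the disease at baseline are excluded from the at-risk group, and cumulative incidence is the share of the remaining participants who develop the disease during follow-up. The participant records below are hypothetical.

```python
# Hypothetical participants: (id, has disease at baseline, develops disease during follow-up).
participants = [
    ("p1", False, True),
    ("p2", False, False),
    ("p3", True, True),   # prevalent case: already has the disease, so not at risk
    ("p4", False, False),
    ("p5", False, True),
    ("p6", False, False),
]

# Only people free of the disease at baseline can become new (incident) cases.
at_risk = [p for p in participants if not p[1]]
new_cases = [p for p in at_risk if p[2]]

cumulative_incidence = len(new_cases) / len(at_risk)

print(f"At risk at baseline: {len(at_risk)}")
print(f"New cases during follow-up: {len(new_cases)}")
print(f"Cumulative incidence: {cumulative_incidence:.2f}")
```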

Prospective (longitudinal) cohort studies of exposure and disease strongly aid in studying causal associations, though establishing true causality usually requires corroboration from further experimental trials.

The advantage of prospective cohort study data is that it can help determine risk factors for contracting a new disease: individuals are observed longitudinally and data are collected at regular intervals, so recall error is reduced. However, cohort studies are expensive to conduct, are sensitive to attrition, and require a long follow-up time to generate useful data.

Nevertheless, the results that are obtained from long-term cohort studies are of substantially superior quality to those obtained from retrospective/cross-sectional studies. Prospective cohort studies are considered to yield the most reliable results in observational epidemiology. They enable a wide range of exposure-disease associations to be studied.

Some cohort studies track groups of children from their birth, and record a wide range of information (exposures) about them. The value of a cohort study depends on the researchers’ capacity to stay in touch with all members of the cohort. Some studies have continued for decades.

In a cohort study, the population under investigation consists of individuals who are at risk of developing a specific disease or health outcome.

Cohort Study Examples

An example of an epidemiological question that can be answered using a cohort study is whether exposure to X (say, smoking) is associated with outcome Y (say, lung cancer). The British Doctors Study, which commenced in 1951, was a cohort that included both smokers (the exposed group) and non-smokers (the unexposed group). The study continued through 2001. By 1956, the study provided convincing proof of the association of smoking with the incidence of lung cancer.

In a cohort study, the groups are matched in terms of many other variables, such as economic status and general health status, so that the variable being assessed, the independent variable (in this case, smoking), can be isolated as the cause of the dependent variable (in this case, lung cancer). In this example, a statistically significant increase in the incidence of lung cancer in the smoking group as compared to the non-smoking group is evidence in favor of the hypothesis.
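A comparison of this kind is often summarized as a risk ratio with a confidence interval. The sketch below uses the common log-based approximation for a 95% interval. The counts are hypothetical and are not taken from the British Doctors Study, and the `risk_ratio_ci` helper is a name introduced here for illustration.

```python
import math


def risk_ratio_ci(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Return (risk ratio, lower bound, upper bound) using the log-based approximation."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Approximate standard error of log(RR) for cumulative-incidence data.
    se_log_rr = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 30 cases among 1,000 exposed, 10 cases among 1,000 unexposed.
rr, lower, upper = risk_ratio_ci(30, 1000, 10, 1000)

print(f"Risk ratio: {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that lies entirely above 1 corresponds to a statistically significant increase in incidence in the exposed group at the 5% level, under the assumptions of this approximation.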

However, rare outcomes, such as lung cancer, are generally not studied with the use of a cohort study, but are rather studied with the use of a case-control study.

Shorter term studies are commonly used in medical research as a form of clinical trial, or means to test a particular hypothesis of clinical importance. Such studies typically follow two groups of patients for a period of time and compare an endpoint or outcome measure between the two groups.

Two examples of cohort studies that have been going on for more than 50 years are the Framingham Heart Study and the National Child Development Study (NCDS), the most widely researched of the British birth cohort studies. Key findings of NCDS and a detailed profile of the study appear in the International Journal of Epidemiology.

The Dunedin Longitudinal Study, started in 1975, has been studying the thousand people born in Dunedin, New Zealand in 1972-73. The subjects are interviewed regularly, with Phase 45 starting in 2017.

The largest cohort study in women is the Nurses’ Health Study. Started in 1976, it is tracking over 120,000 nurses and has been analyzed for many different conditions and outcomes.

The largest cohort study in Africa is the Birth to Twenty Study, which began in 1990 and tracks a cohort of over 3,000 children born in the weeks following Nelson Mandela’s release from prison.

Other famous examples are the Grant Study tracking a number of Harvard graduates from ca. 1950, the Whitehall Study tracking 10,308 British civil servants, and the Caerphilly Heart Disease Study, which since 1979 has studied a representative sample of 2,512 men, drawn from the Welsh town of Caerphilly.
