ehrql / Git / [e988c2] /docs/how-to/examples.md

Models:

philipB/

ehrql

Downloads: 1

[e988c2]: / docs / how-to / examples.md

History

Download this file

950 lines (705 with data), 31.3 kB

How to use this page

You can either read this page from start to end
to get an idea of the kinds of queries you can make with ehrQL.

Or you can use the navigation bar at the top-right of this page,
to see a list of the examples,
and then jump to a specific example of interest.

The examples are organised firstly by the table which they pull data from -
for a more complete guide to the tables, refer to the
Table Schemas section of the
ehrQL documentation.

Understanding these examples

The populations defined with `define_population()`

In each of these examples,
we specify that the population is all patients
via dataset.define_population(patients.exists_for_patient()).

In practice,
you will likely want to adapt an example to filter to a specific population of interest.
Refer to the define_population() documentation.

Some examples using `codelist_from_csv()`

⚠️ Some examples refer to CSV codelists using the
codelist_from_csv function,
but are incomplete.
To actually use these code example,
you will need to correctly complete the function call.
The codelists are not provided as a part of these examples.

For example, instead of:

asthma_codelist = codelist_from_csv("XXX", column="YYY")

you will need a line more like:

asthma_codelist = codelist_from_csv("your-asthma-codelist.csv", column="code")

which provides the filename your-asthma-codelist.csv
and the name of the CSV column with codes.

Using codelists with category columns

Some codelists will have a category column that groups individual codes into categories. For example, this codelist for ethnicity has 2 category columns, which represent categories at both 6 and 16 levels. To make use of these categories, you can use codelist_from_csv() as follows:

ethnicity_codelist = codelist_from_csv("ethnicity_codelist_with_categories", column="snomedcode", category_column="Grouping_6")

If you include an argument for category_column, the codelist returned will be a dictionary mapping individual codes to their respective categories. Without the category_column argument, the codelist returned will be a list of codes.

You can see an example of how to access these categories within your dataset definition below.

Patients

Examples for the patients table.

Finding patient demographics

Finding each patient's sex

from ehrql import create_dataset
from ehrql.tables.core import patients

dataset = create_dataset()
dataset.sex = patients.sex
dataset.define_population(patients.exists_for_patient())

The possible values are "female", "male", "intersex", and "unknown".

Finding each patient's date of birth

from ehrql import create_dataset
from ehrql.tables.core import patients

dataset = create_dataset()
dataset.date_of_birth = patients.date_of_birth
dataset.define_population(patients.exists_for_patient())

Finding each patient's age

from ehrql import create_dataset
from ehrql.tables.core import patients

dataset = create_dataset()
dataset.age = patients.age_on("2023-01-01")
dataset.define_population(patients.exists_for_patient())

Alternatively, using a native Python date:

from datetime import date
from ehrql import create_dataset
from ehrql.tables.core import patients

dataset = create_dataset()
dataset.age = patients.age_on(date(2023, 1, 1))
dataset.define_population(patients.exists_for_patient())

Or using an index_date variable:

from ehrql import create_dataset
from ehrql.tables.core import patients

index_date = "2023-01-01"
dataset = create_dataset()
dataset.age = patients.age_on(index_date)
dataset.define_population(patients.exists_for_patient())

Assigning each patient an age band

from ehrql import create_dataset, case, when
from ehrql.tables.core import patients

dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.age_band = case(
        when(age < 20).then("0-19"),
        when(age < 40).then("20-39"),
        when(age < 60).then("40-59"),
        when(age < 80).then("60-79"),
        when(age >= 80).then("80+"),
        otherwise="missing",
)
dataset.define_population(patients.exists_for_patient())

Finding each patient's date of death in their primary care record

from ehrql import create_dataset
from ehrql.tables.core import patients

dataset = create_dataset()
dataset.date_of_death = patients.date_of_death
dataset.define_population(patients.exists_for_patient())

:notepad_spiral: This value comes from the patient's EHR record. You can find more information about the accuracy of this value in the reference schema.

ONS Deaths

Examples for the ons_deaths table.

Finding patient demographics

Finding each patient's date, underlying_cause_of_death, and first noted additional medical condition noted on the death certificate from ONS records

from ehrql import create_dataset
from ehrql.tables.core import ons_deaths, patients

dataset = create_dataset()
dataset.date_of_death = ons_deaths.date
dataset.underlying_cause_of_death = ons_deaths.underlying_cause_of_death
dataset.cause_of_death = ons_deaths.cause_of_death_01
dataset.define_population(patients.exists_for_patient())

:notepad_spiral: There are currently multiple cause of death fields. We aim to resolve these to a single feature in the future.

Finding patients with a particular cause of death

The ons_deaths table has multiple "cause of death" fields. Using the
cause_of_death_is_in()
method we can match a codelist against all of these at once.

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import ons_deaths, patients

dataset = create_dataset()

cause_of_death_X_codelist = codelist_from_csv("XXX", column="YYY")

dataset.died_with_X = ons_deaths.cause_of_death_is_in(cause_of_death_X_codelist)
dataset.define_population(patients.exists_for_patient())

Addresses

Examples for the TPP addresses table.

Finding each patient's IMD rank

from ehrql import create_dataset
from ehrql.tables.tpp import addresses, patients

dataset = create_dataset()
dataset.imd = addresses.for_patient_on("2023-01-01").imd_rounded
dataset.define_population(patients.exists_for_patient())

The original IMD ranking is rounded to the nearest 100.
The rounded IMD ranking ranges from 0 to 32,800.

See this code comment about how we choose one address if a patient has multiple registered addresses on the given date.

Calculating each patient's IMD quintile and/or decile

from ehrql import create_dataset
from ehrql.tables.tpp import addresses, patients

dataset = create_dataset()

patient_address = addresses.for_patient_on("2023-01-01")
dataset.imd_quintile = patient_address.imd_quintile
dataset.imd_decile = patient_address.imd_decile
dataset.define_population(patients.exists_for_patient())

Finding each patient's rural/urban classification

from ehrql import create_dataset
from ehrql.tables.tpp import addresses, patients

dataset = create_dataset()
dataset.rural_urban = addresses.for_patient_on("2023-01-01").rural_urban_classification
dataset.define_population(patients.exists_for_patient())

The meaning of this value is as follows:

1 - Urban major conurbation
2 - Urban minor conurbation
3 - Urban city and town
4 - Urban city and town in a sparse setting
5 - Rural town and fringe
6 - Rural town and fringe in a sparse setting
7 - Rural village and dispersed
8 - Rural village and dispersed in a sparse setting

Finding each patient's MSOA

from ehrql import create_dataset
from ehrql.tables.tpp import addresses, patients

dataset = create_dataset()
dataset.msoa_code = addresses.for_patient_on("2023-01-01").msoa_code
dataset.define_population(patients.exists_for_patient())

Finding multiple attributes of each patient's address

from ehrql import create_dataset
from ehrql.tables.tpp import addresses, patients

dataset = create_dataset()
address = addresses.for_patient_on("2023-01-01")
dataset.imd_rounded = address.imd_rounded
dataset.rural_urban_classification = address.rural_urban_classification
dataset.msoa_code = address.msoa_code
dataset.define_population(patients.exists_for_patient())

Practice Registrations

Examples for the practice_registrations table.

Finding each patient's practice's pseudonymised identifier

from ehrql import create_dataset
from ehrql.tables.tpp import practice_registrations, patients

dataset = create_dataset()
dataset.practice = practice_registrations.for_patient_on("2023-01-01").practice_pseudo_id
dataset.define_population(patients.exists_for_patient())

Finding each patient's practice's STP

from ehrql import create_dataset
from ehrql.tables.tpp import practice_registrations, patients

dataset = create_dataset()
dataset.stp = practice_registrations.for_patient_on("2023-01-01").practice_stp
dataset.define_population(patients.exists_for_patient())

Finding each patient's practice's region

from ehrql import create_dataset
from ehrql.tables.tpp import practice_registrations, patients

dataset = create_dataset()
dataset.region = practice_registrations.for_patient_on("2023-01-01").practice_nuts1_region_name
dataset.define_population(patients.exists_for_patient())

Finding multiple attributes of each patient's practice

from ehrql import create_dataset
from ehrql.tables.tpp import practice_registrations, patients

dataset = create_dataset()
registration = practice_registrations.for_patient_on("2023-01-01")
dataset.practice = registration.practice_pseudo_id
dataset.stp = registration.practice_stp
dataset.region = registration.practice_nuts1_region_name
dataset.define_population(patients.exists_for_patient())

Excluding patients based on study dates

The following example ensures that the dataset only includes patients registered at a
single practice for the entire duration of the study, plus at least 3 months prior to the
study start.

from ehrql import create_dataset, codelist_from_csv, months
from ehrql.tables.tpp import patients, practice_registrations

study_start_date = "2022-01-01"
study_end_date = "2022-12-31"

dataset = create_dataset()

# find registrations that exist for the full study period, and at least 3 months
# prior
registrations = (
    practice_registrations.where(
        practice_registrations.start_date.is_on_or_before(study_start_date - months(3))
    )
    .except_where(
        practice_registrations.end_date.is_on_or_before(study_end_date)
    )
)

dataset.define_population(registrations.exists_for_patient())

Clinical Events

Examples for the clinical_events table.

Finding patient demographics

Finding each patient's ethnicity

Ethnicity can be defined using a codelist. There are a lot of individual codes that can used to indicate a patients' fine-grained ethnicity. To make analysis more manageable, ethnicity is therefore commonly grouped into higher level categories. Above, we described how you can import codelists that have a category column. You can use a codelist with a category column to map clinical event codes for ethnicity to higher level categories as in this example:

from ehrql import create_dataset
from ehrql.tables.core import clinical_events, patients
from ehrql import codelist_from_csv

dataset = create_dataset()

ethnicity_codelist = codelist_from_csv(
    "ethnicity_codelist_with_categories",
    column="snomedcode",
    category_column="Grouping_6",
)

dataset.latest_ethnicity_code = (
    clinical_events.where(clinical_events.snomedct_code.is_in(ethnicity_codelist))
    .where(clinical_events.date.is_on_or_before("2023-01-01"))
    .sort_by(clinical_events.date)
    .last_for_patient()
    .snomedct_code
)
dataset.latest_ethnicity_group = dataset.latest_ethnicity_code.to_category(
    ethnicity_codelist
)
dataset.define_population(patients.exists_for_patient())

Does each patient have an event matching some criteria?

Does each patient have a clinical event matching a code in a codelist?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.has_had_asthma_diagnosis = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).exists_for_patient()
dataset.define_population(patients.exists_for_patient())

Does each patient have a clinical event matching a code in a codelist in a time period?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.has_recent_asthma_diagnosis = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).where(
        clinical_events.date.is_on_or_between("2022-07-01", "2023-01-01")
).exists_for_patient()
dataset.define_population(patients.exists_for_patient())

What is the first/last event matching some criteria?

The first_for_patient() and last_for_patient() methods can only be used on a sorted frame.
Frames can be sorted by calling the sort_by() method with the column to sort the frame by.

What is the earliest/latest clinical event matching some criteria?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.first_asthma_diagnosis_date = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).where(
        clinical_events.date.is_on_or_after("2022-07-01")
).sort_by(
        clinical_events.date
).first_for_patient().date
dataset.define_population(patients.exists_for_patient())

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.last_asthma_diagnosis_date = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).where(
        clinical_events.date.is_on_or_after("2022-07-01")
).sort_by(
        clinical_events.date
).last_for_patient().date
dataset.define_population(patients.exists_for_patient())

What is the clinical event, matching some criteria, with the least/greatest value?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

hba1c_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()

hba1c_events = clinical_events.where(
        clinical_events.snomedct_code.is_in(hba1c_codelist)
).where(
        clinical_events.date.is_on_or_after("2022-07-01")
)

earliest_min_hba1c_event = hba1c_events.sort_by(
        clinical_events.numeric_value, clinical_events.date
).first_for_patient()

earliest_max_hba1c_event = hba1c_events.sort_by(
        # Note the leading minus sign to sort numeric_value in reverse order
        -clinical_events.numeric_value, clinical_events.date
).first_for_patient()

latest_min_hba1c_event = hba1c_events.sort_by(
        # Note the leading minus sign to sort numeric_value in reverse order
        -clinical_events.numeric_value, clinical_events.date
).last_for_patient()

latest_max_hba1c_event = hba1c_events.sort_by(
        clinical_events.numeric_value, clinical_events.date
).last_for_patient()

dataset.date_of_first_min_hba1c_observed = earliest_min_hba1c_event.date
dataset.date_of_first_max_hba1c_observed = earliest_max_hba1c_event.date
dataset.date_of_last_min_hba1c_observed = latest_min_hba1c_event.date
dataset.date_of_last_max_hba1c_observed = latest_max_hba1c_event.date

dataset.value_of_first_min_hba1c_observed = earliest_min_hba1c_event.numeric_value
dataset.value_of_first_max_hba1c_observed = earliest_max_hba1c_event.numeric_value
dataset.value_of_last_min_hba1c_observed = latest_min_hba1c_event.numeric_value
dataset.value_of_last_max_hba1c_observed = latest_max_hba1c_event.numeric_value

dataset.define_population(patients.exists_for_patient())

Getting properties of an event matching some criteria

What is the code of the first/last clinical event matching some criteria?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.first_asthma_diagnosis_code = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).where(
        clinical_events.date.is_on_or_after("2022-07-01")
).sort_by(
        clinical_events.date
).first_for_patient().snomedct_code
dataset.define_population(patients.exists_for_patient())

What is the date of the first/last clinical event matching some criteria?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.first_asthma_diagnosis_date = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).where(
        clinical_events.date.is_on_or_after("2022-07-01")
).sort_by(
        clinical_events.date
).first_for_patient().date
dataset.define_population(patients.exists_for_patient())

What is the code and date of the first/last clinical event matching some criteria?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
first_asthma_diagnosis = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).where(
        clinical_events.date.is_on_or_after("2022-07-01")
).sort_by(
        clinical_events.date
).first_for_patient()
dataset.first_asthma_diagnosis_code = first_asthma_diagnosis.snomedct_code
dataset.first_asthma_diagnosis_date = first_asthma_diagnosis.date
dataset.define_population(patients.exists_for_patient())

Performing arithmetic on numeric values of clinical events

Finding the mean observed value of clinical events matching some criteria

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

hba1c_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.mean_hba1c = clinical_events.where(
        clinical_events.snomedct_code.is_in(hba1c_codelist)
).where(
        clinical_events.date.is_on_or_after("2022-07-01")
).numeric_value.mean_for_patient()
dataset.define_population(patients.exists_for_patient())

Finding events within a date range

Finding events within a fixed date range

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.has_recent_asthma_diagnosis = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).where(
        clinical_events.date.is_on_or_between("2022-07-01", "2023-01-01")
).exists_for_patient()
dataset.define_population(patients.exists_for_patient())

Finding events within a date range plus a constant

from ehrql import create_dataset, codelist_from_csv, weeks
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

index_date = "2022-07-01"

dataset = create_dataset()
dataset.has_recent_asthma_diagnosis = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).where(
        clinical_events.date.is_on_or_between(index_date, index_date + weeks(2))
).exists_for_patient()
dataset.define_population(patients.exists_for_patient())

Finding events within a dynamic date range

from ehrql import create_dataset, codelist_from_csv, months
from ehrql.tables.core import clinical_events, patients

diabetes_codelist = codelist_from_csv("XXX", column="YYY")
hba1c_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
first_diabetes_code_date = clinical_events.where(
        clinical_events.snomedct_code.is_in(diabetes_codelist)
).sort_by(
        clinical_events.date
).first_for_patient().date

dataset.count_of_hba1c_tests_6mo_post_first_diabetes_code = clinical_events.where(
        clinical_events.snomedct_code.is_in(hba1c_codelist)
).where(
        clinical_events.date.is_on_or_between(first_diabetes_code_date, first_diabetes_code_date + months(6))
).count_for_patient()
dataset.define_population(patients.exists_for_patient())

Excluding events which have happened in the future

Data quality issues with many sources may result in events apparently happening in future dates (e.g. 9999-01-01), it is useful to filter these from your analysis.

from datetime import date
from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.has_recent_asthma_diagnosis = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).where(
        clinical_events.date > "2022-07-01"
).where(
        clinical_events.date < date.today()
).exists_for_patient()
dataset.define_population(patients.exists_for_patient())

Extracting parts of dates and date differences

Finding the year an event occurred

from datetime import date
from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.year_of_first = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).sort_by(
        clinical_events.date
).first_for_patient().date.year
dataset.define_population(patients.exists_for_patient())

Finding the number of weeks between two events

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import clinical_events, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")
asthma_review_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
first_asthma_diagnosis_date = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).sort_by(clinical_events.date).first_for_patient().date

first_asthma_review_date = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_review_codelist)
).where(
        clinical_events.date.is_on_or_after(first_asthma_diagnosis_date)
).sort_by(clinical_events.date).first_for_patient().date

dataset.weeks_between_diagnosis_and_review = (first_asthma_review_date - first_asthma_diagnosis_date).weeks
dataset.define_population(patients.exists_for_patient())

Admitted Patient Care Spells (APCS)

Examples for the TPP apcs table.

Does each patient have an event matching some criteria?

Does each patient have a hospitalisation event matching some criteria?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.tpp import apcs, patients

cardiac_diagnosis_codes = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.has_recent_cardiac_admission = apcs.where(
        apcs.primary_diagnosis.is_in(cardiac_diagnosis_codes)
).where(
        apcs.admission_date.is_on_or_between("2022-07-01", "2023-01-01")
).exists_for_patient()
dataset.define_population(patients.exists_for_patient())

Medications

Examples for the medications table.

Does each patient have an event matching some criteria?

Does each patient have a medication event matching some criteria?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import medications, patients

statin_medications = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.has_recent_statin_prescription = medications.where(
        medications.dmd_code.is_in(statin_medications)
).where(
        medications.date.is_on_or_between("2022-07-01", "2023-01-01")
).exists_for_patient()
dataset.define_population(patients.exists_for_patient())

How many events does each patient have matching some criteria?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import medications, patients

statin_medications = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.number_of_statin_prescriptions_in_last_year = medications.where(
        medications.dmd_code.is_in(statin_medications)
).where(
        medications.date.is_on_or_between("2022-01-01", "2023-01-01")
).count_for_patient()
dataset.define_population(patients.exists_for_patient())

What is the earliest/latest medication event matching some criteria?

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import medications, patients

statin_medications = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.first_statin_prescription_date = medications.where(
        medications.dmd_code.is_in(statin_medications)
).where(
        medications.date.is_on_or_after("2022-07-01")
).sort_by(
        medications.date
).first_for_patient().date
dataset.define_population(patients.exists_for_patient())

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import medications, patients

statin_medications = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
dataset.last_statin_prescription_date = medications.where(
        medications.dmd_code.is_in(statin_medications)
).where(
        medications.date.is_on_or_after("2022-07-01")
).sort_by(
        medications.date
).last_for_patient().date
dataset.define_population(patients.exists_for_patient())

Extracting parts of dates and date differences

Finding prescriptions made in particular months of the year

from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import medications, patients

amoxicillin_codelist = codelist_from_csv("XXX", column="YYY")

winter_months = [10,11,12,1,2,3]

dataset = create_dataset()
dataset.winter_amoxicillin_count = medications.where(
        medications.dmd_code.is_in(amoxicillin_codelist)
).where(
        medications.date.month.is_in(winter_months)
).count_for_patient()
dataset.define_population(patients.exists_for_patient())

Finding events occuring close in time to another event

Finding the code of the first medication after the first clinical event matching some criteria

from ehrql import create_dataset, codelist_from_csv, weeks
from ehrql.tables.core import clinical_events, medications, patients

asthma_codelist = codelist_from_csv("XXX", column="YYY")
inhaled_corticosteroid_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()
first_asthma_diagnosis_date = clinical_events.where(
        clinical_events.snomedct_code.is_in(asthma_codelist)
).where(
        clinical_events.date.is_on_or_after("2022-07-01")
).sort_by(
        clinical_events.date
).first_for_patient().date
dataset.first_asthma_diagnosis_date = first_asthma_diagnosis_date
dataset.count_ics_prescriptions_2wks_post_diagnosis = medications.where(
        medications.dmd_code.is_in(inhaled_corticosteroid_codelist)
).where(
        medications.date.is_on_or_between(first_asthma_diagnosis_date,first_asthma_diagnosis_date + weeks(2))
).count_for_patient()
dataset.define_population(patients.exists_for_patient())

Excluding medications for patients who have transferred between practices

Note that in these examples, the periods defined are illustrative only.

Excluding patients based on prescription date

from ehrql import case, create_dataset, codelist_from_csv, when, weeks
from ehrql.tables.tpp import medications, patients, practice_registrations

def meets_registrations_criteria(medication_date):
    # For this medication date, find whether a registration exists where
    # the start date and end dates are within a 12 weeks
    # prior/after to the prescription

    start_cutoff_date = medication_date - weeks(12)
    end_cutoff_date = medication_date + weeks(12)
    return (
        practice_registrations.where(
        practice_registrations.start_date.is_on_or_before(start_cutoff_date)
        )
        .except_where(
        practice_registrations.end_date.is_on_or_before(end_cutoff_date)
        )
        .exists_for_patient()
    )

medication_codelist = codelist_from_csv("XXX", column="YYY")

dataset = create_dataset()

# First relevant prescription per patient
first_prescription = (
    medications.where(
        medications.dmd_code.is_in(medication_codelist)
    )
    .sort_by(medications.date)
    .first_for_patient()
)

# Include only prescriptions that fall within accepatable registration dates
dataset.prescription_date = case(
    when(meets_registrations_criteria(first_prescription.date))
    .then(first_prescription.date)
)
dataset.define_population(patients.exists_for_patient())

Decision support values

Examples for the TPP decision support values table.

Finding the most recent decision support value

Finding each patient's EFI (electronic frailty index)

from ehrql import create_dataset
from ehrql.tables.tpp import decision_support_values

dataset = create_dataset()
latest_efi_record = (
  decision_support_values
    .electronic_frailty_index()
    .sort_by(decision_support_values.calculation_date)
    .last_for_patient()
)
dataset.latest_efi = latest_efi_record.numeric_value
dataset.latest_efi_date = latest_efi_record.calculation_date
dataset.define_population(decision_support_values.exists_for_patient())

950 lines (705 with data), 31.3 kB

How to use this page

Understanding these examples

The populations defined with define_population()

Some examples using codelist_from_csv()

Using codelists with category columns

Patients

Finding patient demographics

Finding each patient's sex

Finding each patient's date of birth

Finding each patient's age

Assigning each patient an age band

Finding each patient's date of death in their primary care record

ONS Deaths

Finding patient demographics

Finding each patient's date, underlying_cause_of_death, and first noted additional medical condition noted on the death certificate from ONS records

Finding patients with a particular cause of death

Addresses

Finding attributes related to each patient's address as of a given date

Finding each patient's IMD rank

Calculating each patient's IMD quintile and/or decile

Finding each patient's rural/urban classification

Finding each patient's MSOA

Finding multiple attributes of each patient's address

Practice Registrations

Finding attributes related to each patient's GP practice as of a given date

Finding each patient's practice's pseudonymised identifier

Finding each patient's practice's STP

Finding each patient's practice's region

Finding multiple attributes of each patient's practice

Excluding patients based on study dates

Clinical Events

Finding patient demographics

Finding each patient's ethnicity

Does each patient have an event matching some criteria?

Does each patient have a clinical event matching a code in a codelist?

Does each patient have a clinical event matching a code in a codelist in a time period?

What is the first/last event matching some criteria?

What is the earliest/latest clinical event matching some criteria?

What is the clinical event, matching some criteria, with the least/greatest value?

Getting properties of an event matching some criteria

What is the code of the first/last clinical event matching some criteria?

What is the date of the first/last clinical event matching some criteria?

What is the code and date of the first/last clinical event matching some criteria?

Performing arithmetic on numeric values of clinical events

Finding the mean observed value of clinical events matching some criteria

Finding events within a date range

Finding events within a fixed date range

Finding events within a date range plus a constant

Finding events within a dynamic date range

Excluding events which have happened in the future

Extracting parts of dates and date differences

Finding the year an event occurred

Finding the number of weeks between two events

Admitted Patient Care Spells (APCS)

Does each patient have an event matching some criteria?

Does each patient have a hospitalisation event matching some criteria?

Medications

Does each patient have an event matching some criteria?

Does each patient have a medication event matching some criteria?

How many events does each patient have matching some criteria?

What is the earliest/latest medication event matching some criteria?

Extracting parts of dates and date differences

Finding prescriptions made in particular months of the year

Finding events occuring close in time to another event

Finding the code of the first medication after the first clinical event matching some criteria

Excluding medications for patients who have transferred between practices

Excluding patients based on prescription date

Decision support values

Finding the most recent decision support value

Finding each patient's EFI (electronic frailty index)

The populations defined with `define_population()`

Some examples using `codelist_from_csv()`