Download this file

3865 lines (3161 with data), 132.1 kB

tpp schema

Available on backends: TPP

This schema defines the data (both primary care and externally linked) available in the
OpenSAFELY-TPP backend. For more information about this backend, see
"SystmOne Primary Care".

from ehrql.tables.tpp import (
    addresses,
    apcs,
    apcs_cost,
    appointments,
    clinical_events,
    clinical_events_ranges,
    covid_therapeutics,
    decision_support_values,
    ec,
    ec_cost,
    emergency_care_attendances,
    ethnicity_from_sus,
    household_memberships_2020,
    medications,
    occupation_on_covid_vaccine_record,
    ons_deaths,
    opa,
    opa_cost,
    opa_diag,
    opa_proc,
    open_prompt,
    parents,
    patients,
    practice_registrations,
    sgss_covid_all_tests,
    ukrr,
    vaccinations,
    wl_clockstops,
    wl_openpathways,
)

many rows per patient

addresses

Geographic characteristics of the home address a patient registers with a practice.
Each row in this table is one registration period per patient.
Occasionally, a patient has multiple active registrations on a given date.
The postcode from the address is mapped to an Output Area,
from which other larger geographic representations can be derived
(see various ONS publications for more detail).

Example ehrQL usage of addresses

Columns
address_id 🔗 integer

Unique address identifier.

start_date 🔗 date

Date patient moved to address.

end_date 🔗 date

Date patient moved out of address.

address_type 🔗 integer

Type of address:

  • 0 - Permanent
  • 1 - Temporary
  • 3 - Correspondence only

  • Possible values: 0, 1, 3

rural_urban_classification 🔗 integer

Rural urban classification:

  • 1 - Urban major conurbation
  • 2 - Urban minor conurbation
  • 3 - Urban city and town
  • 4 - Urban city and town in a sparse setting
  • 5 - Rural town and fringe
  • 6 - Rural town and fringe in a sparse setting
  • 7 - Rural village and dispersed
  • 8 - Rural village and dispersed in a sparse setting

  • Always >= 1 and <= 8

imd_rounded 🔗 integer

Index of Multiple Deprivation (IMD)
rank of each lower layer super output area (LSOA), rounded to the nearest 100, where
lower values represent more deprived areas. E.g. 1 is the most deprived LSOA in the country
and 32,844 is the least deprived (though in this field these are rounded to 0 and 32,800
respectively)

  • Always >= 0, <= 32800, and a multiple of 100
imd_quintile 🔗 string

Index of Multiple Deprivation (IMD) LSOA rank mapped to quintiles. NB this does not
return an integer 1-5, instead it is a string to make clear that 1 is the most, and 5 is the
least deprived. Possible values are:

  • 1 (most deprived)
  • 2
  • 3
  • 4
  • 5 (least deprived)
  • unknown

The number of lower layer super output areas (LSOAs) in 2011 was 32,844. As this is not divisible
by 5 the number of LSOAs in each quintile is not the same. We have used the same boundaries as
provided in the data available to download here with the first quintile containing
6,568 LSOAs, and the other 4 quintiles containing 6,569 LSOAs.

View definition
imd = addresses.imd_rounded
return case(
    # Although the lowest IMD rank is 1, we need to check >= 0 because
    # we're using the imd_rounded field rather than the actual imd
    when((imd >= 0) & (imd <= int(32844 * 1 / 5))).then("1 (most deprived)"),
    when(imd <= int(32844 * 2 / 5)).then("2"),
    when(imd <= int(32844 * 3 / 5)).then("3"),
    when(imd <= int(32844 * 4 / 5)).then("4"),
    when(imd <= int(32844 * 5 / 5)).then("5 (least deprived)"),
    otherwise="unknown",
)
imd_decile 🔗 string

Index of Multiple Deprivation (IMD) LSOA rank mapped to deciles. NB this does not
return an integer 1-10, instead it is a string to make clear that 1 is the most, and 10 is the
least deprived. Possible values are:

  • 1 (most deprived)
  • 2
  • ...
  • 9
  • 10 (least deprived)
  • unknown

The number of lower layer super output areas (LSOAs) in 2011 was 32,844. As this is not divisible
by 10 the number of LSOAs in each decile is not the same. We have used the same boundaries as
provided in the data available to download here with deciles 1, 2, 4, 6, 7 and 9
containing 3,284 LSOAs, and deciles 3, 5, 8 and 10 containing 3,285

View definition
imd = addresses.imd_rounded
return case(
    # Although the lowest IMD rank is 1, we need to check >= 0 because
    # we're using the imd_rounded field rather than the actual imd
    when((imd >= 0) & (imd <= int(32844 * 1 / 10))).then("1 (most deprived)"),
    when(imd <= int(32844 * 2 / 10)).then("2"),
    when(imd <= int(32844 * 3 / 10)).then("3"),
    when(imd <= int(32844 * 4 / 10)).then("4"),
    when(imd <= int(32844 * 5 / 10)).then("5"),
    when(imd <= int(32844 * 6 / 10)).then("6"),
    when(imd <= int(32844 * 7 / 10)).then("7"),
    when(imd <= int(32844 * 8 / 10)).then("8"),
    when(imd <= int(32844 * 9 / 10)).then("9"),
    when(imd <= int(32844 * 10 / 10)).then("10 (least deprived)"),
    otherwise="unknown",
)
msoa_code 🔗 string

Middle Layer Super Output Areas (MSOA) code.

  • Matches regular expression: E020[0-9]{5}
has_postcode 🔗 boolean

Indicating whether a valid postcode is recorded for the patient.

care_home_is_potential_match 🔗 boolean

Indicating whether the patient's address matched with a care home, using TPP's algorithm.

care_home_requires_nursing 🔗 boolean

Indicating whether the patient's address matched with a care home that required nursing.

care_home_does_not_require_nursing 🔗 boolean

Indicating whether the patient's address matched with a care home that did not require nursing.

Methods
for_patient_on(date) 🔗

Return each patient's registered address as it was on the supplied date.

Where there are multiple registered addresses we prefer any which have a known
postcode (though we never have access to this postcode) as this is used by TPP
to cross-reference other data associated with the address, such as the MSOA or
index of multiple deprevation.

Where there are multiple of these we prefer the most recently registered address
and then, if there are multiple of these, the one with the longest duration. If
there's stil an exact tie we choose arbitrarily based on the address ID.


View method definition

spanning_addrs = addresses.where(addresses.start_date <= date).except_where(
    addresses.end_date < date
)
ordered_addrs = spanning_addrs.sort_by(
    case(when(addresses.has_postcode).then(1), otherwise=0),
    addresses.start_date,
    addresses.end_date,
    addresses.address_id,
)
return ordered_addrs.last_for_patient()
</details>

many rows per patient

apcs

Admitted Patient Care Spells (APCS) data is provided via the NHS Secondary Uses Service.

This table gives core details of spells.

Each row is an in-hospital spell: a period of continuous care within a single trust.

Refer to the OpenSAFELY documentation on the APCS data source
and the GitHub issue discussing more of the background context.

Example ehrQL usage of apcs

Columns
apcs_ident 🔗 integer

Unique identifier for the spell used across the APCS tables.

  • Never NULL
admission_date 🔗 date

The admission date of the hospital provider spell.

discharge_date 🔗 date

The date of discharge from a hospital provider spell.

discharge_destination 🔗 string

The destination of a patient on completion of a hospital provider spell, or that the patient died or was a stillbirth. The code is a 2 character string (e.g. 19 = usual place of residence, 54 = NHS run care home). Refer to this documentation for a full list of codes.

discharge_method 🔗 string

The method of discharge from a hospital provider spell. Currently the code is a 1 character string (e.g. 1 = discharged on clinical advice, 4 = died). Refer to this documentation for a full list of codes.

spell_core_hrg_sus 🔗 string

The core Healthcare Resource Group (HRG) code for the spell according to the derivations made by NHS Digital prior to import to the National Commissioning Data Repository (NCDR). HRGs are used to assign baseline tariff costs.

admission_method 🔗 string

Code identifying admission method. Refer to APCS data source documentation for details of codes.

primary_diagnosis 🔗 ICD-10 code

Code indicating primary diagnosis. This is not necessarily the primary reason for admission, and could represent an escalation/complication of initial reason for admission.

secondary_diagnosis 🔗 ICD-10 code

Code indicating secondary diagnosis. This is a single code giving the first listed secondary diagnosis, but there may other secondary diagnoses listed in the all_diagnoses field below.

all_diagnoses 🔗 Multiple ICD-10 codes

List of all diagnoses as ICD-10 codes.

Note that the codes are not quite in the standard ICD-10 format in that they
omit the dot character e.g. instead of I80.1 it will be written I801.

The codes are arranged in clusters separated by commas, with each cluster
separated by two pipe characters (||). These separators may or may not be
surrounded by spaces. For example:

||E119 ,J849 ,K869 ,M069 ,Z824 ,Z867 ||I801 ,I802 ,N179 ,N183

A hospital "spell" is made up of 1 or more "episodes". The || is the separator
between episodes. I.e. the example above is a spell of two episodes with 6
diagnosis codes recorded in the first episode, and 4 recorded in the second.

This field can be queried using the
contains method.
This uses simple substring matching to find a code anywhere inside the
field. For example, to match the code N17.1 (Acute renal failure with
acute cortical necrosis) you could use:

apcs.where(apcs.all_diagnoses.contains("N171"))

You can take advantage of the hierarchical structure of ICD-10 by searching
the just the prefix of a code. For example to match all N17 (Acute renal
failure) codes you could use:

apcs.where(apcs.all_diagnoses.contains("N17"))

Finally there is also
contains_any_of.
So if you were looking for any of a list of ICD10 codes called icd10_diagnosis_codes you
could do:

apcs.where(apcs.all_diagnoses.contains_any_of(icd10_diagnosis_list))
all_procedures 🔗 Multiple OPCS4 codes

List of all procedures as OPCS-4 codes.

Note that the codes are not quite in the standard OPCS-4 format in that they
omit the dot character e.g. instead of W23.2 it will be written W232.

The codes are arranged in clusters separated by commas, with each cluster
separated by two pipe characters (||). These separators may or may not be
surrounded by spaces. For example:

||E851,T124,X403||Y532,Z921

A hospital "spell" is made up of 1 or more "episodes". The || is the separator
between episodes. I.e. the example above is a spell of two episodes with 3
procedure codes recorded in the first episode, and 2 recorded in the second.

This field can be queried using the
contains method.
This uses simple substring matching to find a code anywhere inside the
field. For example, to match the code W23.2 (Secondary open reduction of
fracture of bone and extramedullary fixation HFQ) you could use:

apcs.where(apcs.all_procedures.contains("W232"))

You can take advantage of the hierarchical structure of OPCS-4 by searching
the just the prefix of a code. For example to match all X30 (Injection of therapeutic substance)
codes you could use:

apcs.where(apcs.all_procedures.contains("X30"))

Finally there is also
contains_any_of.
So if you were looking for any of a list of OPCS-4 codes called opcs4_procedure_codes you
could do:

apcs.where(apcs.all_procedures.contains_any_of(opcs4_procedure_list))
days_in_critical_care 🔗 integer

Number of days spent in critical care. This is counted in number of days (or part-days) not the number of nights as per normal "length of stay" calculations. Note the definition of critical care may vary between trusts.

patient_classification 🔗 string

Refer to APCS data source documentation for details.

many rows per patient

apcs_cost

Admitted Patient Care Spells (APCS) data is provided via the NHS Secondary Uses Service.

This table gives details of spell cost.

Each row is an in-hospital spell: a period of continuous care within a single trust.

Note that data only goes back a couple of years.

Columns
apcs_ident 🔗 integer

Unique identifier for the spell used across the APCS tables.

  • Never NULL
grand_total_payment_mff 🔗 float

The grand total payment for the activity (Net_SLA_Payment + Tariff_MFF_Payment) where SLA = service level agreement, i.e. all contractual payments which is national tariff for the type of activity plus any additional payments minus any applicable deductions. MFF = Market Forces Factor, a geography-based cost adjustment).

tariff_initial_amount 🔗 float

The base national tariff.

tariff_total_payment 🔗 float

The total payment according to the national tariff.

admission_date 🔗 date

The admission date of the hospital provider spell.

discharge_date 🔗 date

The date of discharge from a hospital provider spell.

many rows per patient

appointments

Appointments in primary care.

!!! warning
In TPP this data comes from the "Appointment" table. This table has not yet been
well characterised, so there are some issues around how to interpret findings
from it. The data contains records created when an appointment is made with a GP
practice, but may not capture absolutely all GP/patient interactions, for
example it's uncertain whether an ad-hoc call to a patient would be included.
There are also duplicate events in the table that we need to better understand.

As a consequence, if you try to use the appointment table, you will see warnings
when running your code locally, and failures when the GitHub action tests your
code. If you need access to the appointments data, please speak to your
OpenSAFELY co-pilot. We will be considering projects on a case by case basis
until it can enter the normal stable pool of data.

A **very important** caveat for this data: there are some circumstances where
historical appointment records will be incomplete, for example when a patient
moves from a practice using a different EHR provider, or when a practice changes
EHR provider. If your study could be negatively affected by such missing data,
it may be important to use the
[`practice_registrations.spanning_with_systmone()`](#practice_registrations.spanning_with_systmone)
method to identify patients which have a suitably continuous practice
registration during the study period.

Some further investigation of the appointments data in TPP can be found in this
King's fund report
.

And you can find out more about the associated database table in the short data report.
It shows:

  • Date ranges for booked_date, start_date, and seen_date
  • Row counts by month for booked_date and start_date
  • The distribution of lead times (start_date - booked_date)
  • Row counts for each value of status

To view it, you will need a login for OpenSAFELY Jobs and the Project Collaborator
or Project Developer role for the project. The
workspace shows when the code that comprises the report was run;
the code itself is in the appointments-short-data-report
repository on GitHub.

Appointments vs Consultations

"Consultation" is a very broad concept in SystmOne. It covers the things you might
expect, like a patient sitting down in front of a GP. But it also covers things like
some new pathology results arriving. There is no direct, explicit relationship
between an appointment and a consultation; but if an appointment results in any form
of recorded patient interaction then a corresponding consultation will be created.

The only way to link appointments and consultation is via the date they happened.
For instance, given some appointments you can find events which occurred on the same
day using:

clinical_events.where(clinical_events.date.is_in(appointments.date))

!!! tip
Querying this table is similar to using Cohort Extractor's
patients.with_gp_consultations function. However, that function filters by
the status of the appointment. To achieve a similar result with this table:

```py
appointments.where(
    appointments.status.is_in([
        "Arrived",
        "In Progress",
        "Finished",
        "Visit",
        "Waiting",
        "Patient Walked Out",
    ])
)
```
Columns
booked_date 🔗 date

The date the appointment was booked

start_date 🔗 date

The date the appointment was due to start

seen_date 🔗 date

The date the patient was seen

status 🔗 string

The status of the appointment

  • Possible values: Booked, Arrived, Did Not Attend, In Progress, Finished, Requested, Blocked, Visit, Waiting, Cancelled by Patient, Cancelled by Unit, Cancelled by Other Service, No Access Visit, Cancelled Due To Death, Patient Walked Out

many rows per patient

clinical_events

Each record corresponds to a single clinical or consultation event for a patient.

Each event is recorded twice: once with a CTv3 code, and again with the equivalent
SNOMED-CT code. Each record will have only one of the ctv3_code or snomedct_code
columns set and the other will be null. This allows you to query the table using
either a CTv3 codelist or SNOMED-CT codelist and all records using the other coding
system will be effectively ignored.

Note that event codes do not change in this table. If an event code in the coding
system becomes inactive, the event will still be coded to the inactive code.
As such, codelists should include all relevant inactive codes.

Detailed information on onward referrals is not currently available. A subset of
referrals are recorded in the clinical events table but this data will be incomplete.

Columns
date 🔗 date
snomedct_code 🔗 SNOMED-CT code
ctv3_code 🔗 CTV3 (Read v3) code
numeric_value 🔗 float
consultation_id 🔗 integer

ID of the consultation associated with this event

many rows per patient

clinical_events_ranges

Each record corresponds to a single clinical or consultation event for a patient,
as presented in clinical_events, but with additional fields regarding the event's
numeric_value.

!!! warning
Use of this table carries a severe performance penalty and should only be
done so if the additional fields it provides are neccesary for a study.

These additional fields are:

  • any comparators (if present) recorded with an event's numeric_value (e.g. '<9.5')
  • the lower bound of the reference range associated with an event's numeric_value
  • the upper bound of the reference range associated with an event's numeric_value
Columns
date 🔗 date
snomedct_code 🔗 SNOMED-CT code
ctv3_code 🔗 CTV3 (Read v3) code
numeric_value 🔗 float
lower_bound 🔗 float

The lower bound of the reference range associated with an event's
numeric_value

upper_bound 🔗 float

The upper bound of the reference range associated with an event's
numeric_value

comparator 🔗 string

If an event's numeric_value is returned with a comparator, e.g. as '<9.5',
then this column contains that comparator

  • Possible values: ~, =, >=, >, <, <=
consultation_id 🔗 integer

ID of the consultation associated with this event

many rows per patient

covid_therapeutics

The COVID Therapeutics dataset contains information on COVID treatments used in inpatient
and outpatient settings.

Metadata

  • Data provider NHS England
  • Participation / Coverage Inpatients and outpatients treated with antivirals/nMABs for COVID-19 in England
  • Provenance Data sourced largely from BlueTeq system (forms completed by clinicians)
  • Update frequency in OpenSAFELY Approximately weekly
  • Delay between event occurring and event appearing in OpenSAFELY Approximately 2-9 days
  • Collected information Treatment start date; therapeutic intervention; COVID indication, current status, risk group, region

Overview

Antivirals and neutralising monoclonal antibodies (nMABs) for COVID-19 can be
administered in inpatient setting or, for outpatients, in COVID Medicine Delivery
Units (CMDUs) specifically set up for this purpose. For patients considered for
these treatments, clinicians submit completed forms to NHS England. Each row
represents one completed form for one course of treatment. Data received by
OpenSAFELY currently covers patients who were approved for treatment. The patient
may or may not have actually received the treatment or completed the course (but we
assume that they usually do). They may have another form completed for another
treatment, either because it was decided to give them a different treatment, or for
some other reason. They may in theory also have another form completed some months
later for another instance of infection.

Treatment dates may be in the past or future at the point when the form is
submitted.

Note that this dataset may contain duplicate rows – full duplicates are removed
but there may remain some partial duplicates.

More Information

Columns
covid_indication 🔗 string

Treatment setting/indication.

  • Possible values: non_hospitalised, hospitalised_with, hospital_onset
current_status 🔗 string

Status of form/application.

  • Possible values: Approved, Treatment Complete, Treatment Not Started, Treatment Stopped
intervention 🔗 string

Intervention or therapeutic name. Expected to be one of:

  • Baricitinib
  • Casirivimab and imdevimab
  • Molnupiravir
  • Paxlovid
  • Remdesivir
  • sarilumab (sic)
  • Sotrovimab
  • Tocilizumab
received 🔗 date

Date form submitted.

region 🔗 string

NHS England region in which the CMDU submitting the form is located.

risk_cohort 🔗 string

High-risk group to which the patient was considered to belong. Derived from
tick-boxes. Multiple groups can be selected and will be comma separated,
e.g. liver disease,rare neurological conditions.

This series only contains data for events where the intervention was one of
Sotroviman, Molnupiravir, or Casirivimab & imdevimab.

The available groups as at the time of writing are listed below. However
note that the precise wording used has changed over time and so filtering by
a specific disease name may not be reliable.

  • Downs syndrome
  • HIV or AIDS
  • IMID
  • haematologic malignancy
  • haematological diseases
  • immune deficiencies
  • liver disease
  • primary immune deficiencies
  • rare neurological conditions
  • rare neurological diseases
  • renal disease
  • sickle cell disease
  • solid cancer
  • solid organ recipients
  • stem cell transplant recipients
treatment_start_date 🔗 date

Entered by the clinician and can represent either a future planned start
date or a past date at the time of form submission.

many rows per patient

decision_support_values

Returns values computed by decision support algorithms, for example the
Electronic Frailty Index (EFI)

Columns
calculation_date 🔗 date

Date of calculation for the decision support algorithm.

numeric_value 🔗 float

The value computed by the decision support algorithm

algorithm_description 🔗 string

The description of the decision support algorithm.

algorithm_version 🔗 string

The version of the decision support algorithm.

Methods
electronic_frailty_index() 🔗

Returns every calculated electronic frailty index v1 (EFI) for each patient.


View method definition

return decision_support_values.where(
    decision_support_values.algorithm_description == "UK Electronic Frailty Index (eFI)"
).where(decision_support_values.algorithm_version == "1.0")
</details>

many rows per patient

ec

Emergency care attendances data — the Emergency Care Data Set (ECDS) —
is provided via the NHS Secondary Uses Service.

This table gives core details of attendances.

Refer to the OpenSAFELY documentation on the ECDS data source
and the GitHub issue that discusses more of the background context.

Columns
ec_ident 🔗 integer

Unique identifier for the attendance used across the EC tables.

  • Never NULL
arrival_date 🔗 date

The date the patient self presented at the accident & emergency department, or arrived in an ambulance at the accident & emergency department.

sus_hrg_code 🔗 string

The core Healthcare Resource Group (HRG) code derived by sus+, used for tariff application.

  • Matches regular expression: [a-zA-Z]{2}[0-9]{2}[a-zA-Z]

many rows per patient

ec_cost

Emergency care attendances data is provided via the NHS Secondary Uses Service.

This table gives details of attendance costs.

Columns
ec_ident 🔗 integer

Unique identifier for the attendance used across the EC tables.

  • Never NULL
grand_total_payment_mff 🔗 float

The grand total payment for the activity (Net_SLA_Payment + Tariff_MFF_Payment) where SLA = service level agreement, i.e. all contractual payments which is national tariff for the type of activity plus any additional payments minus any applicable deductions. MFF = Market Forces Factor, a geography-based cost adjustment).

tariff_total_payment 🔗 float

The total payment according to the national tariff.

arrival_date 🔗 date

The date the patient self presented at the accident & emergency department, or arrived in an ambulance at the accident & emergency department.

ec_decision_to_admit_date 🔗 date

The date a decision to admit was made (if applicable).

ec_injury_date 🔗 date

The date the patient was injured (if applicable).

many rows per patient

emergency_care_attendances

Emergency care attendances data is provided via the NHS Secondary Uses Service.

This table gives details of attendances.

Note that there is a limited number of diagnoses allowed within this dataset,
and so will not match with the range of diagnoses allowed in other datasets
such as the primary care record.

Columns
id 🔗 integer

Unique identifier for the attendance used across the EC tables.

  • Never NULL
arrival_date 🔗 date

The date the patient self presented at the accident & emergency department, or arrived in an ambulance at the accident & emergency department.

discharge_destination 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the intended destination of the patient following discharge from the emergency care department.

diagnosis_01 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_02 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_03 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_04 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_05 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_06 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_07 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_08 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_09 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_10 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_11 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_12 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_13 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_14 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_15 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_16 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_17 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_18 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_19 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_20 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_21 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_22 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_23 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

diagnosis_24 🔗 SNOMED-CT code

The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.

one row per patient

ethnicity_from_sus

This finds the most frequently used national ethnicity code for each patient from
the various SUS (Secondary Uses Service) tables.

Specifically it uses ethnicity codes from the following tables:

APCS (In-patient hospital admissions)
EC (A&E attendances)
OPA (Out-patient hospital appointments)

Codes are as defined by "Ethnic Category Code 2001" — the 16+1 ethnic data
categories used in the 2001 census:
https://www.datadictionary.nhs.uk/data_elements/ethnic_category.html

Codes beginning Z ("Not stated") and 99 ("Not known") are excluded.

Where there is a tie for the most common code the lexically greatest code is used.

Columns
code 🔗 string

First character of recorded ethncity code (national code):
https://www.datadictionary.nhs.uk/data_elements/ethnic_category.html

  • Possible values: A, B, C, D, E, F, G, H, J, K, L, M, N, P, R, S

one row per patient

household_memberships_2020

Inferred household membership as of 2020-02-01, as determined by TPP using an as yet
undocumented algorithm.

Columns
household_pseudo_id 🔗 integer
household_size 🔗 integer

many rows per patient

medications

The medications table provides data about prescribed medications in primary care.

Prescribing data, including the contents of the medications table are standardised
across clinical information systems such as SystmOne (TPP). This is a requirement
for data transfer through the
Electronic Prescription Service
in which data passes from the prescriber to the pharmacy for dispensing.

Medications are coded using
dm+d codes.
The medications table is structured similarly to the clinical_events
table, and each row in the table is made up of a patient identifier, an event (dm+d)
code, and an event date. For this table, the event refers to the issue of a medication
(coded as a dm+d code), and the event date, the date the prescription was issued.

Factors to consider when using medications data

Depending on the specific area of research, you may wish to exclude medications
in particular periods. For example, in order to ensure medication data is stable
following a change of practice, you may want to exclude patients for a period after
the start of their practice registration . You may also want to
exclude medications for patients for a period prior to their leaving a practice.
Alternatively, for research looking at a specific period of
interest, you may simply want to ensure that all included patients were registered
at a single practice for a minimum time prior to the study period, and were
registered at the same practice for the duration of the study period.

Examples of using ehrQL to calculation such periods can be found in the documentation
on how to
use ehrQL to answer specific questions using the medications table

Columns
date 🔗 date
dmd_code 🔗 dm+d code
consultation_id 🔗 integer

ID of the consultation associated with this event

many rows per patient

occupation_on_covid_vaccine_record

This data is from the NHS England COVID-19 data store,
and reflects information collected at the point of vaccination
where recipients are asked by vaccination staff
whether they are in the category of health and care worker.

Refer to the OpenSAFELY database build report
to see when this data was last updated.

See the GitHub issue that discusses more of the background context.

Columns
is_healthcare_worker 🔗 boolean

one row per patient

ons_deaths

Registered deaths

Date and cause of death based on information recorded when deaths are
certified and registered in England and Wales from February 2019 onwards.
The data provider is the Office for National Statistics (ONS).
This table is updated approximately weekly in OpenSAFELY.

This table includes the underlying cause of death and up to 15 medical conditions
mentioned on the death certificate. These codes (cause_of_death_01 to
cause_of_death_15) are not ordered meaningfully.

More information about this table can be found in following documents provided by the ONS:

In the associated database table ONS_Deaths,
a small number of patients have multiple registered deaths.
This table contains the earliest registered death.
The ehrql.tables.raw.core.ons_deaths table contains all registered deaths.

!!! warning
There is also a lag in ONS death recording caused amongst other things by things
like autopsies and inquests delaying reporting on cause of death. This is
evident in the OpenSAFELY historical database coverage
report

Example ehrQL usage of ons_deaths

TPP specific information

!!! tip
Note that this version of the table, which includes a place of death field, is
only available in the tpp schema and not the core schema.

Columns
date 🔗 date

Patient's date of death.

underlying_cause_of_death 🔗 ICD-10 code

Patient's underlying cause of death.

cause_of_death_01 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_02 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_03 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_04 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_05 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_06 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_07 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_08 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_09 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_10 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_11 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_12 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_13 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_14 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

cause_of_death_15 🔗 ICD-10 code

Medical condition mentioned on the death certificate.

place 🔗 string

Patient's place of death.

  • Possible values: Care Home, Elsewhere, Home, Hospice, Hospital, Other communal establishment
Methods
cause_of_death_is_in(codelist) 🔗

Match codelist against the underlying_cause_of_death field and all 15
separate cause_of_death fields.

This method evaluates as True if any code in the codelist matches any of
these fields.


View method definition

columns = [
    "underlying_cause_of_death",
    *[f"cause_of_death_{i:02d}" for i in range(1, 16)],
]
conditions = [getattr(ons_deaths, column).is_in(codelist) for column in columns]
return functools.reduce(operator.or_, conditions)
</details>

many rows per patient

opa

Outpatient appointments data (OPA) is provided via the NHS Secondary Uses Service.

This table gives core details of outpatient appointments.

Refer to the GitHub issue that describes limitations
of the outpatient appointments data

and the GitHub issue that discusses more of the background context.

Columns
opa_ident 🔗 integer

Unique identifier for the appointment used across the OPA tables.

  • Never NULL
appointment_date 🔗 date

The date of an appointment.

attendance_status 🔗 string

Indicates whether or not an appointment for a care contact took place. If the appointment did not take place it also indicates whether or not advanced warning was given. Refer to the NHS Data Model and Dictionary entry for "attended or did not attend" for details on code meanings.

  • Possible values: 5, 6, 7, 2, 3, 4, 0
consultation_medium_used 🔗 string

Identifies the communication mechanism used to relay information between the care professional and the person who is the subject of the consultation, during a care activity. Refer to the NHS Data Model and Dictionary entry for "consultation mechanism" for details on code meanings. Note that the allowed codes as listed in TPP's data appear to be a subset of the codes listed in the NHS Data Model and Dictionary.

  • Possible values: 01, 02, 03, 04, 05, 09, 10, 11, 98
first_attendance 🔗 string

An indication of whether a patient is making a first attendance or contact; or a follow-up attendance or contact and whether the consultation medium used national code was face to face communication or telephone or telemedicine web camera. Refer to the NHS Data Model and Dictionary entry for "first attendance" for details on code meanings. Note that the allowed codes as listed in TPP's data contain an additional 9 code over the NHS Data Model and Dictionary entry.

  • Possible values: 1, 2, 3, 4, 5, 9
hrg_code 🔗 string

The Healthcare Resource Group (HRG) code assigned to the activity, used to assign baseline tariff costs.

  • Matches regular expression: [a-zA-Z]{2}[0-9]{2}[a-zA-Z]
treatment_function_code 🔗 string

The treatment function under which the patient is treated. It may be the same as the main specialty code or a different treatment function which will be the care professional's treatment interest.

many rows per patient

opa_cost

Outpatient appointments data is provided via the NHS Secondary Uses Service.

This table gives details of outpatient appointment costs.

Note that data only goes back a couple of years.

Columns
opa_ident 🔗 integer

Unique identifier for the appointment used across the OPA tables.

  • Never NULL
tariff_opp 🔗 float

The base national tariff where the procedure tariff is applicable.

grand_total_payment_mff 🔗 float

The grand total payment for the activity (Net_SLA_Payment + Tariff_MFF_Payment) where SLA = service level agreement, i.e. all contractual payments which is national tariff for the type of activity plus any additional payments minus any applicable deductions. MFF = Market Forces Factor, a geography-based cost adjustment).

tariff_total_payment 🔗 float

The total payment according to the national tariff.

appointment_date 🔗 date

The date of an appointment.

referral_request_received_date 🔗 date

The date the referral request was received by the health care provider.

many rows per patient

opa_diag

Outpatient appointments data is provided via the NHS Secondary Uses Service.

This table gives details of outpatient appointment diagnoses.

Note that diagnoses are often absent from outpatient records.

Columns
opa_ident 🔗 integer

Unique identifier for the appointment used across the OPA tables.

  • Never NULL
primary_diagnosis_code 🔗 ICD-10 code

The international classification of diseases (ICD) code used to identify the primary patient diagnosis. Note that this will typically not be completed.

primary_diagnosis_code_read 🔗 CTV3 (Read v3) code

The Read coded clinical terms code to identify the primary patient diagnosis. Note that this will typically not be completed.

secondary_diagnosis_code_1 🔗 ICD-10 code

The international classification of diseases (ICD) code used to identify the secondary patient diagnosis. Note that this will typically not be completed.

secondary_diagnosis_code_1_read 🔗 CTV3 (Read v3) code

The Read coded clinical terms used to identify the secondary patient diagnosis. Note that this will typically not be completed.

appointment_date 🔗 date

The date of an appointment.

referral_request_received_date 🔗 date

The date the referral request was received by the health care provider.

many rows per patient

opa_proc

Outpatient appointments data is provided via the NHS Secondary Uses Service.

This table gives details of outpatient procedures.
Typically, procedures will only be recorded where they attract a specified payment.
The majority of appointments will have no procedure recorded.

Columns
opa_ident 🔗 integer

Unique identifier for the appointment used across the OPA tables.

  • Never NULL
primary_procedure_code 🔗 OPCS-4 code

The OPCS classification of interventions and procedures code which is used to identify the primary patient procedure carried out.

primary_procedure_code_read 🔗 CTV3 (Read v3) code

The Read coded clinical terms code which is used to identify the primary patient procedure carried out.

procedure_code_2 🔗 OPCS-4 code

TODO

procedure_code_2_read 🔗 CTV3 (Read v3) code

The Read coded clinical terms for a procedure other than the primary procedure.

appointment_date 🔗 date

The date of an appointment.

referral_request_received_date 🔗 date

The date the referral request was received by the health care provider.

many rows per patient

open_prompt

This table contains responses to questions from the OpenPROMPT project.

You can find out more about this table in the associated short data report. To view
it, you will need a login for Level 4. The
workspace shows when the code that comprises the report was run;
the code itself is in the airmid-short-data-report repository on
GitHub.

Columns
ctv3_code 🔗 CTV3 (Read v3) code

The response to the question, as a CTV3 code. Alternatively, if the question does not admit a CTV3 code as the response, then the question, as a CTV3 code.

  • Never NULL
snomedct_code 🔗 SNOMED-CT code

The response to the question, as a SNOMED CT code. Alternatively, if the question does not admit a SNOMED CT code as the response, then the question, as a SNOMED CT code.

creation_date 🔗 date

The date the survey was administered

  • Never NULL
consultation_date 🔗 date

The response to the question, as a date, if the question admits a date as the response. Alternatively, the date the survey was administered.

  • Never NULL
consultation_id 🔗 integer

The ID of the survey

  • Never NULL
numeric_value 🔗 float

The response to the question, as a number

one row per patient

parents

Provides the internal pseudonymous ID of the patient's mother, if this is recorded
in the SystmOne database.

We remove any records which have an "end date" specified: this is indicative of an
incorrect record having been amended in the database. We also remove any obviously
unsuitable records, specifically those where the mother is recorded as male (noting
that this is sex assigned at birth), or where the mother's date of birth is not
before the child's. Finally we remove any cases where more than one valid record
exists and we don't know which is correct.

It is not currently clear whether these records are intended to capture birth
mothers or those with parental responsibility.

At the time of writing (2024-09-23) the underlying Relationship table contains
approximately 3.8 million rows, specifying 2.7 million distinct
relationships (relations can be expressed as both parent-to-child and
child-to-parent, hence the high rate of duplicates).

  • Removing male parents discards about 120,000 of these.
  • Removing relationships with end dates discards a further 175,000.
  • Removing those where the parent is younger than the child discards a futher
    8,000.
  • Finally, removing ambiguous records (i.e. multiple conflicting valid entries)
    discards another 4,000.

This leaves a total of about 2.5 million patients with a valid mother_id
record.

Columns
mother_id 🔗 integer

The patient_id of the patient's mother

one row per patient

patients

Patients in primary care.

Representativeness

You can find out more about the representativeness of these data in the
OpenSAFELY-TPP backend in:

The OpenSAFELY Collaborative, Colm D. Andrews, Anna Schultze, Helen J. Curtis, William J. Hulme, John Tazare, Stephen J. W. Evans, et al. 2022.
"OpenSAFELY: Representativeness of Electronic Health Record Platform OpenSAFELY-TPP Data Compared to the Population of England."
Wellcome Open Res 2022, 7:191.
https://doi.org/10.12688/wellcomeopenres.18010.1

Orphan records

If a practice becomes aware that a patient has moved house,
then the practice deducts, or removes, the patient's records from their register.
If the patient doesn't register with a new practice within a given amount of time
(normally from four to eight weeks),
then the patient's records are permanently deducted and are orphan records.
There are roughly 1.6 million orphan records.

Recording of death in primary care

Dates of death appear in two places in the data made available via OpenSAFELY: the
primary care record, and the death certificate data supplied by the ONS.

ONS death data are considered the gold standard for identifying patient death in
England because they are based on the MCCDs (Medical Certificate of Cause of Death)
which the last attending doctor has a statutory duty to complete.

While there is generally a lag between the death being recorded in ONS data and it
appearing in the primary care record, the coverage of recorded death is almost
complete and the date of death is usually reliable when it appears. There is also a
lag in ONS death recording (see ons_deaths below for more detail).
You can find out more about the accuracy of date of death recording in primary care
in:

Gallagher, A. M., Dedman, D., Padmanabhan, S., Leufkens, H. G. M. & de Vries, F 2019. The accuracy of date of death recording in the Clinical
Practice Research Datalink GOLD database in England compared with the Office for National Statistics death registrations.
Pharmacoepidemiol. Drug Saf. 28, 563–569.
https://doi.org/10.1002/pds.4747

By contrast, cause of death is often not accurate in the primary care record so we
don't make it available to query here.

Example ehrQL usage of patients

Columns
date_of_birth 🔗 date

Patient's date of birth.

  • Always the first day of a month
  • Never NULL
sex 🔗 string

Patient's sex.

  • Possible values: female, male, intersex, unknown
  • Never NULL
date_of_death 🔗 date

Patient's date of death.

Methods
age_on(date) 🔗

Patient's age as an integer, in whole elapsed calendar years, as it would be on
the given date.

This method takes no account of whether the patient is alive on the given date.
In particular, it may return negative values if the given date is before the
patient's date of birth.


View method definition

return (date - patients.date_of_birth).years
</details>
is_alive_on(date) 🔗

Whether a patient is alive on the given date, based on the date of death
recorded in their primary care record. NB this is only based on the primary
care record. Please see the section above about the accuracy of death data.

If the date provided is before a person was born, then this helper function will
actually return True, despite the person not being alive yet. For most research
this is likely the expected behaviour.


View method definition

return patients.date_of_death.is_after(date) | patients.date_of_death.is_null()
</details>
is_dead_on(date) 🔗

Whether a patient has a date of death in their primary care record before the given date.

A person is classed as dead if the date provided is after their death date.


View method definition

return patients.date_of_death.is_not_null() & patients.date_of_death.is_before(date)
</details>

many rows per patient

practice_registrations

Each record corresponds to a patient's registration with a practice.

Example ehrQL usage of practice_registrations

TPP specific information

See the TPP backend information
for details of which patients are included.

Columns
start_date 🔗 date

Date patient joined practice.

  • Never NULL
end_date 🔗 date

Date patient left practice.

practice_pseudo_id 🔗 integer

Pseudonymised practice identifier.

  • Never NULL
practice_stp 🔗 string

ONS code of practice's STP (Sustainability and Transformation Partnership).
STPs have been replaced by ICBs (Integrated Care Boards), and ICB codes will be available soon.

  • Matches regular expression: E540000[0-9]{2}
practice_nuts1_region_name 🔗 string

Name of the NUTS level 1 region of England to which the practice belongs.
For more information see:
https://www.ons.gov.uk/methodology/geography/ukgeographies/eurostat

  • Possible values: North East, North West, Yorkshire and The Humber, East Midlands, West Midlands, East, London, South East, South West
practice_systmone_go_live_date 🔗 date

Date on which the practice started using the SystmOne EHR platform.

Most patient records will have been transferred from the previous EHR
platform but records which are specific to SystmOne will not exist before
this date. In particular, the appointments table should
only be considered accurate for a given practice after this date.

Methods
for_patient_on(date) 🔗

Return each patient's practice registration as it was on the supplied date.

Where a patient is registered with multiple practices we prefer the most recent
registration and then, if there are multiple of these, the one with the longest
duration. If there's still an exact tie we choose arbitrarily based on the
practice ID.


View method definition

spanning_regs = practice_registrations.where(practice_registrations.start_date <= date).except_where(
    practice_registrations.end_date < date
)
ordered_regs = spanning_regs.sort_by(
    practice_registrations.start_date,
    practice_registrations.end_date,
    practice_registrations.practice_pseudo_id,
)
return ordered_regs.last_for_patient()
</details>
exists_for_patient_on(date) 🔗

Returns whether a person was registered with a practice on the supplied date.

NB. The implementation currently uses spanning(). It would also have been
valid to implement as
practice_registrations.for_patient_on(date).exists_for_patient(), but for
internal reasons that is less efficient.


View method definition

return practice_registrations.spanning(date, date).exists_for_patient()
</details>
spanning(start_date, end_date) 🔗

Filter registrations to just those spanning the entire period between
start_date and end_date.


View method definition

return practice_registrations.where(
    practice_registrations.start_date.is_on_or_before(start_date)
    & (practice_registrations.end_date.is_after(end_date) | practice_registrations.end_date.is_null())
)
</details>
spanning_with_systmone(start_date, end_date) 🔗

Filter registrations to just those spanning the entire period between
start_date and end_date and where the practice has been using the SystmOne
EHR platform throughout that period (see
systmone_go_live_date).


View method definition

return practice_registrations.spanning(start_date, end_date).where(
    practice_registrations.practice_systmone_go_live_date <= start_date
)
</details>

many rows per patient

sgss_covid_all_tests

COVID-19 tests results from SGSS (the Second Generation Surveillance System).

For background on this data see the NHS DARS catalogue entry.
And for more detail on SGSS in general see UKHSA_Laboratory_Reporting_Guidelines.pdf.

Columns
specimen_taken_date 🔗 date

Date on which specimen was collected.

  • Never NULL
is_positive 🔗 boolean

Whether the specimin tested positive for SARS-CoV-2.

  • Never NULL
lab_report_date 🔗 date

Date on which the labaratory reported the result.

  • Never NULL
was_symptomatic 🔗 boolean

Whether the patient reported symptoms of COVID-19 at the time the specimen
was collected. May be NULL if unknown.

sgtf_status 🔗 integer

Provides information on whether a PCR test result exhibited "S-Gene Target
Failure" which can be used as a proxy for the presence of certain Variants
of Concern.

Results are provided as number between 0 and 9. We know the meaning of
some of these numbers based on an email from PHE:

0: S gene detected

Detectable S gene (CH3>0)

Detectable y ORF1ab CT value (CH1) <=30 and >0

Detectable N gene CT value (CH2) <=30 and >0

1: Isolate with confirmed SGTF

Undetectable S gene; CT value (CH3) =0

Detectable ORF1ab gene; CT value (CH2) <=30 and >0

Detectable N gene; CT value (CH1) <=30 and >0

9: Cannot be classified

Null are where the target is not S Gene. I think LFTs are currently
also coming across as 9 so will need to review those to null as well as
clearly this is a PCR only variable.

However the values 2, 4 and 8 also occur in this column and we don't
currently have documentation on their meaning.

  • Always >= 0 and <= 9
variant 🔗 string

Where a specific SARS-CoV-2 variant was identified this column provides the details.

This appears to be effectively a free-text field with a large variety of
possible values. Some have an obvious meaning e.g. B.1.617.2,
VOC-21JAN-02, VUI-21FEB-04.

Others less so e.g. VOC-22JAN-O1_probable:V-21OCT-01_low-qc.

variant_detection_method 🔗 string

Where a specific SARS-CoV-2 variant was identified this provides the method
used to do so.

  • Possible values: Private Lab Sequencing, Reflex Assay, Sanger Provisional Result

many rows per patient

ukrr

The UK Renal Registry (UKRR) contains data on patients under secondary renal care
(advanced chronic kidney disease stages 4 and 5, dialysis, and kidney transplantation)

Columns
dataset 🔗 string

The cohort of patients.

Values are:

  • '2019_prevalence' - a prevalence cohort of patients alive and on RRT in December 2019
  • '2020_prevalence' - a prevalence cohort of patients alive and on RRT in December 2020
  • '2021_prevalence' - a prevalence cohort of patients alive and on RRT in December 2021
  • '2020_incidence' - an incidence cohort of patients who started RRT in 2020
  • '2020_ckd' - a snapshot prevalence cohort of patient with Stage 4 or 5 CKD who were reported to the UKRR to be under renal care in December 2020.

  • Possible values: 2019_prevalence, 2020_prevalence, 2021_prevalence, 2020_incidence, 2020_ckd

renal_centre 🔗 string

The code of the main renal centre a patient is registered with

rrt_start_date 🔗 date

The latest start date for renal replacement therapy

latest_creatinine 🔗 float

Most recent creatinine held by UKRR

latest_egfr 🔗 float

Most recent eGFR held by UKRR

treatment_modality_start 🔗 string

The treatment modality at rrt_start_date.

Values such as ICHD, HHD, HD, PD, Tx.

treatment_modality_prevalence 🔗 string

The treatment modality from the prevalence data

many rows per patient

vaccinations

This table contains information on administered vaccinations,
identified using either the target disease (e.g., Influenza),
or the vaccine product name (e.g., Optaflu).
For more information about this table see the
"Vaccinaton names in the OpenSAFELY-TPP database" report.

Vaccinations that were administered at work or in a pharmacy might not be
included in this table.

Columns
vaccination_id 🔗 integer

Vaccination identifier.

date 🔗 date

The date the vaccination was administered.

target_disease 🔗 string

Vaccine's target disease.

product_name 🔗 string

Vaccine's product name.

many rows per patient

wl_clockstops

National Waiting List Clock Stops

This dataset contains all completed referral-to-treatment (RTT) pathways with a "clock stop" date between May 2021 and May 2022.
Patients referred for non-emergency consultant-led treatment are on RTT pathways.
The "clock start" date is the date of the first referral that starts the pathway.
The "clock stop" date is when the patient either: receives treatment;
declines treatment;
enters a period of active monitoring;
no longer requires treatment;
or dies.
The time spent waiting is the difference in these two dates.

A patient may have multiple rows if they have multiple completed RTT pathways;
however, there is only one row per unique pathway.
Because referral identifiers aren't necessarily unique between hospitals,
unique RTT pathways can be identified using a combination of:

  • pseudo_organisation_code_patient_pathway_identifier_issuer
  • pseudo_patient_pathway_identifier
  • pseudo_referral_identifier
  • referral_to_treatment_period_start_date

For more information, see
"Consultant-led Referral to Treatment Waiting Times Rules and Guidance".

Columns
activity_treatment_function_code 🔗 string

The treatment function

  • Matches regular expression: [a-zA-Z0-9]{3}
priority_type_code 🔗 string

The priority type.

Note that a small number of rows contain values which are not in the list
below. These are converted to NULL in this representation of the data. If
you need to access the original values, please see the corresponding raw
table
.

  • Possible values: routine, urgent, two week wait
pseudo_organisation_code_patient_pathway_identifier_issuer 🔗 string
pseudo_patient_pathway_identifier 🔗 string
pseudo_referral_identifier 🔗 string
referral_request_received_date 🔗 date

The date the referral was received, for the referral that started the original pathway

referral_to_treatment_period_end_date 🔗 date

Clock stop for the completed pathway

referral_to_treatment_period_start_date 🔗 date

Clock start for the completed pathway

source_of_referral_for_outpatients 🔗 string
waiting_list_type 🔗 string

The waiting list type on completion of the pathway.

Note that a small number of rows contain values which are not in the list
below. These are converted to NULL in this representation of the data. If
you need to access the original values, please see the corresponding raw
table
.

  • Possible values: ORTT, IRTT, PTLO, PTLI, RTTO, RTTI
week_ending_date 🔗 date

The Sunday of the week that the pathway relates to

many rows per patient

wl_openpathways

National Waiting List Open Pathways

This dataset contains all people on open (incomplete) RTT or not current RTT (non-RTT) pathways as of May 2022.
It is a snapshot of everyone still awaiting treatment as of May 2022 (i.e., the clock hasn't stopped).
Patients referred for non-emergency consultant-led treatment are on RTT pathways,
while patients referred for non-consultant-led treatment are on non-RTT pathways.
For each pathway, there is one row for every week that the patient is still waiting.
Because referral identifiers aren't necessarily unique between hospitals,
unique RTT pathways can be identified using a combination of:

  • pseudo_organisation_code_patient_pathway_identifier_issuer
  • pseudo_patient_pathway_identifier
  • pseudo_referral_identifier
  • referral_to_treatment_period_start_date

For more information, see
"Consultant-led Referral to Treatment Waiting Times Rules and Guidance".

Columns
activity_treatment_function_code 🔗 string

The treatment function

  • Matches regular expression: [a-zA-Z0-9]{3}
current_pathway_period_start_date 🔗 date

Latest clock start for this pathway period

priority_type_code 🔗 string

The priority type.

Note that a small number of rows contain values which are not in the list
below. These are converted to NULL in this representation of the data. If
you need to access the original values, please see the corresponding raw
table
.

  • Possible values: routine, urgent, two week wait
pseudo_organisation_code_patient_pathway_identifier_issuer 🔗 string
pseudo_patient_pathway_identifier 🔗 string
pseudo_referral_identifier 🔗 string
referral_request_received_date 🔗 date

The date the referral was received, for the referral that started the original pathway

referral_to_treatment_period_end_date 🔗 date

If the pathway is open, then NULL

referral_to_treatment_period_start_date 🔗 date

Latest clock start for this pathway. If the pathway is not a current pathway, then NULL.

source_of_referral 🔗 string

National referral source code for the referral that created the original pathway

  • Matches regular expression: [a-zA-Z0-9]{2}
waiting_list_type 🔗 string

The waiting list type.

Note that a small number of rows contain values which are not in the list
below. These are converted to NULL in this representation of the data. If
you need to access the original values, please see the corresponding raw
table
.

  • Possible values: ORTT, IRTT, ONON, INON, PTLO, PTLI, RTTO, RTTI
week_ending_date 🔗 date

The Sunday of the week that the pathway relates to