This page describes how to install and use the OpenSAFELY VS Code extension to assist in
learning and writing ehrQL.
It uses a set of local dummy tables to allow you to inspect the contents of ehrQL tables, columns,
datasets and queries. This means you can use known data (the contents of your dummy tables) to
check if your ehrQL queries are extracting data as you expect.
e.g. given a dummy patient table
patient_id | date_of_birth |
---|---|
1 | 1990-01-01 |
2 | 1980-01-01 |
show(patients.age_on("2020-01-01"))
would display
patient_id | value |
---|---|
1 | 30 |
2 | 40 |
You can check if the extension is already installed by opening an ehrQL dataset definition
file in VS Code. With the file open, click on the the dropdown next to the Run button. If the extension is installed, the first (default) option in the dropdown menu will be "OpenSAFELY: Debug ehrQL dataset".
If you are creating a new repo from the research template, the extension will already be
installed when you start up a codespace.
If you have an existing repo that does not have the extension installed, follow the
instructions to update your codespace.
Click on the Extensions icon in the left hand menu bar in VS Code, or go to
File > Preferences > Extensions.
This will display a list of installed and available
extensions. Search for "opensafely" to find the OpenSAFELY extension.
In a Codespace, updates to the extension will be installed automatically the next time
your codespace starts up. Locally, you will see a notification on the Extensions icon when
you have extensions with updates available, and you will need to manually click on the
"Restart extension" button link to install the updated extension.
The extension requires a folder containing dummy tables. By default, this is expected
to be called dummy_tables
, and located at the top level of your study repo folder. The
location can be configured in the extension settings.
ehrQL provides some example data for the core ehrQL tables, which you can fetch by
running
opensafely exec ehrql:v1 dump-example-data
This will create a folder called example-data
, which you can rename to dummy_tables
for
use by the extension.
Alternatively you can supply your own dummy tables or use your dataset definition to
generate dummy tables for you.
To generate dummy tables from a dataset definition, dataset_definition.py
, and
save them to a folder called dummy_tables
:
opensafely exec ehrql:v1 create-dummy-tables dataset_definition.py dummy_tables
We can show the contents of an ehrQL dataset, table, column or query by using the show()
.
Import the function:
from ehrql import show
Show the contents of an ehrQL element:
show(<element>)
Click on the Run button, or Ctrl+Shift+P and select the "OpenSAFELY: Debug ehrQL dataset"
command.
The following dataset definition filters patients to only those over 18, and shows the
age
variable and the corresponding date of birth value from the patients
table (with an optional label), and the final dataset output.
from ehrql import create_dataset, show
from ehrql.tables.core import patients
age = patients.age_on("2022-01-01")
show(age, patients.date_of_birth, label="Age")
dataset = create_dataset()
dataset.define_population(age >= 18)
dataset.age = age
show(dataset)
Running the extension opens an adjacent panel to display the contents of the show()
calls.
As we saw in the example above, show()
can be called with multiple ehrQL elements; in this
case, age
, and date_of_birth
from the patients
table. These both contain one row per
patient, and are shown in a single table.
We can show()
any number of one-row-per-patient ehrQL series in a single output table, e.g.:
show(patients.sex, clinical_events.count_for_patient())
Or multiple many-rows-per-patient ehrQL series, as long as they come from the same table.
show(clinical_events.date, clinical_events.numeric_value)
Attempting to use show()
with a combination of one-row-per-patient and many-rows-per-patient
series with raise an error.
e.g. The following is invalid:
show(patients.sex, clinical_events.date)
Instead, show these series separately:
show(patients.sex)
show(clinical_events.date)
The tables will be displayed one after another in the output panel.
If you write some invalid ehrQL in your dataset definition, you will see an error message
printed to the display panel:
show(
medications.where(medications.date >= "2016-01-01").sort_by(medications.dat).first_for_patient()
)
Error loading file 'dataset_definition.py':
Traceback (most recent call last):
File "/home/becky/datalab/ehrql/dataset_definition.py", line 21, in <module>
medications.where(medications.date >= "2016-01-01").sort_by(medications.dat).first_for_patient()
^^^^^^^^^^^^^^^
AttributeError: 'medications' object has no attribute 'dat'
❔ Can you work out what this is telling us?
Refer to the catalogue of errors for details of common error messages and what they mean.