These tests specify the expected behaviour of ehrQL, and by extension, the query model.
They are also used to generate documentation.
`toc.py` defines the Table of Contents for the documentation generated by tests. A new spec test
subdirectory or file must be defined there in order for it to appear in the docs.
Within the spec directory, each test directory is considered a chapter in the docs, each file
is considered to be a section, and each test is a paragraph.
e.g.
tests
├── spec
│ ├── aggregate_frame <-- chapter
│ │ ├── __init__.py <-- contains chapter title
│ │ ├── test_count_for_patient.py <-- section
│ │ └── test_exists_for_patient.py <-- section
Chapter titles are specified with a `title` attribute in the test directory's `__init__.py`.
Section titles are specified with a `title` attribute in each test file.
Paragraph titles are extracted from the test names, or from a `title` attribute in the test function.
E.g. `test_count_for_patient.py` contains 2 tests:
- test_count_for_patient_on_event_frame()
- test_count_for_patient_on_patient_frame()
Assuming this is included in `toc.py`, the following structure will be generated in the docs:
1 Aggregating event and patient frames  # from `title` in aggregate_frame/__init__.py
  1.1 Counting the rows for each patient  # from `title` in aggregate_frame/test_count_for_patient.py
    1.1.1 Count for patient on event frame  # from test name
    1.1.2 Count for patient on patient frame  # from test name
Optional additional text can be included under a Chapter, Section or Paragraph title in the
documentation as follows:
- Chapter: `text` attribute in the test directory's `__init__.py`.
- Section: `text` attribute in a test file.
Spec tests follow a standardised structure in order to allow documentation generation. Each
spec test uses the `spec_test` fixture, which has the following components:
spec_test(
    table_data,  # a dict (see below) defining the tables in the test database
    e.where(e.b1).i1.sum_for_patient(),  # the ehrQL code being tested
    {  # expected results
        1: (101 + 102),
        2: 201,
    },
)
Each test must define a dict named `table_data`, with keys `e` (event-level data, with multiple rows
per patient) and/or `p` (patient-level data, with one row per patient).
This sets up the test data. The first column is always patient ID and must be of integer type;
it can be given a heading in the test for readability if desired, but it does not require one.
Subsequent columns must take one of a discrete set of names, which indicate the type of the column
in the test database.
See `tests/spec/tables.py` for available column names and types.
E.g. the following table data dict sets up:
- an event table with an integer column `i1` and a boolean column `b1`, with 3 rows for patient 1
and 2 rows for patient 2
- a patient table with a date column `d1`, with rows for 3 patients, 2 of whom also appear in the
event table.
table_data = {
    e: """
          |  i1 | b1
        --+-----+-----
        1 | 101 | T
        1 | 102 | T
        1 | 103 | F
        2 | 201 | T
        2 | 202 |
    """,
    p: """
          | d1
        --+-----------
        1 | 1990-01-02
        2 | 1990-01-02
        3 |
    """
}
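To make the table format concrete, here is a minimal sketch of how one of these table strings could be read into rows. It is not the parser the test suite uses. The `parse_table` helper, the `patient_id` key, the choice of converter by the first letter of the column name, and the treatment of an empty cell as NULL are all assumptions made for illustration; `tests/spec/tables.py` remains the authority on column names and types.

```python
from datetime import date

# Assumed converters, keyed by the first letter of the column name.
CONVERTERS = {
    "i": int,
    "b": lambda value: {"T": True, "F": False}[value],
    "d": date.fromisoformat,
}


def parse_table(text):
    """Parse a pipe-delimited test table into a list of row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[2:]  # lines[1] is the "--+---" separator
    # The first header cell (patient ID) may be blank, so skip it.
    columns = [cell.strip() for cell in header.split("|")][1:]
    parsed = []
    for line in rows:
        cells = [cell.strip() for cell in line.split("|")]
        row = {"patient_id": int(cells[0])}
        for name, cell in zip(columns, cells[1:]):
            # Assumption: an empty cell represents NULL.
            row[name] = CONVERTERS[name[0]](cell) if cell else None
        parsed.append(row)
    return parsed


events = parse_table(
    """
      |  i1 | b1
    --+-----+-----
    1 | 101 | T
    1 | 102 | T
    1 | 103 | F
    2 | 201 | T
    2 | 202 |
    """
)
print(events[0])
print(events[-1])
```

Under these assumptions, the last row of the event table has `b1` set to `None`, because its final cell is empty.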
Define the ehrQL code being tested, using the table data. This code should always
return a Series.
e.g. in the table data above, to get the sum of the `i1` column where `b1` is True:
e.where(e.b1).i1.sum_for_patient()
Describe the expected results as a dict, with patient IDs as keys.
e.g. in the example above, the expected sums would be:
{
1: (101 + 102),
2: 201,
}
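The expected values can be checked by hand with plain Python, without ehrQL. The sketch below is only an emulation of the filter-then-sum logic for the example event table; it assumes that rows where `b1` is empty (NULL) are excluded by the filter in the same way as rows where `b1` is False.

```python
from collections import defaultdict

# Rows of the example event table as (patient_id, i1, b1); None is an empty cell.
event_rows = [
    (1, 101, True),
    (1, 102, True),
    (1, 103, False),
    (2, 201, True),
    (2, 202, None),
]

sums = defaultdict(int)
for patient_id, i1, b1 in event_rows:
    # Keep only rows where b1 is True; False and NULL rows are skipped.
    if b1 is True:
        sums[patient_id] += i1

expected = dict(sums)
print(expected)
```

Patient 1 contributes 101 + 102 and patient 2 contributes only 201, which matches the dict above.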