--- a
+++ b/tests/spec/README.md
@@ -0,0 +1,128 @@
+# Spec tests
+
+These tests specify the expected behaviour of ehrQL, and by extension, the query model.
+
+They are also used to generate documentation.
+
+
+## Table of Contents
+
+`toc.py` defines the Table of Contents for the documentation generated by tests. A new spec test
+subdirectory or file must be defined here in order for it to appear in the docs.
+
+## Folder structure
+
+Within the spec directory, each test directory is considered a chapter in the docs, each file
+is considered to be a section, and each test is a paragraph.
+
+e.g.
+```
+tests
+ ├── spec
+ │   ├── aggregate_frame                  <-- chapter
+ │   │   ├── __init__.py                  <-- contains chapter title
+ │   │   ├── test_count_for_patient.py    <-- section
+ │   │   └── test_exists_for_patient.py   <-- section
+```
+
+### Titles
+Chapter titles are specified with a `title` attribute in the test directory's `__init__.py`.
+Section titles are specified with a `title` attribute in each test file.
+Paragraph titles are extracted from the test names, or from a `title` attribute in the test
+function.
+
+E.g. `test_count_for_patient.py` contains 2 tests:
+- `test_count_for_patient_on_event_frame()`
+- `test_count_for_patient_on_patient_frame()`
+
+Assuming this is included in `toc.py`, the following structure will be generated in the docs:
+
+> 1. Aggregating event and patient frames # from `title` in `aggregate_frame/__init__.py`
+> 1.1 Counting the rows for each patient # from `title` in `aggregate_frame/test_count_for_patient.py`
+> 1.1.1 Count for patient on event frame # from test name
+> 1.1.2 Count for patient on patient frame # from test name
+
+
+### Additional text
+Optional additional text can be included under a Chapter, Section or Paragraph title in the
+documentation as follows:
+
+- Chapter: add a `text` attribute in the test directory's `__init__.py`.
+- Section: add a `text` attribute in a test file.
+- Paragraph: add a docstring to the test function.
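As a rough illustration of the title and text conventions described above, the sketch below shows what a test file might declare and how a paragraph title could be derived from a test name. The `paragraph_title` helper and the wording of `text` are hypothetical; the real documentation generator may derive titles differently.

```python
# Hypothetical contents of a spec test file such as test_count_for_patient.py.
title = "Counting the rows for each patient"  # section title
text = "Optional introductory text shown under the section title."  # illustrative wording


def test_count_for_patient_on_event_frame():
    """Optional additional text shown under the paragraph title."""
    # Test body and the spec_test fixture argument are omitted in this sketch.


def paragraph_title(test_function):
    # Hypothetical helper, not the real generator: prefer an explicit `title`
    # attribute on the function, otherwise derive the title from the test name.
    explicit = getattr(test_function, "title", None)
    if explicit is not None:
        return explicit
    words = test_function.__name__.removeprefix("test_").replace("_", " ")
    return words.capitalize()


print(paragraph_title(test_count_for_patient_on_event_frame))
```

Under these assumptions the derived title matches the "Count for patient on event frame" entry in the generated structure.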
+
+
+## Test structure
+
+Spec tests follow a standardised structure in order to allow documentation generation. Each
+spec test uses the `spec_test` fixture, which has the following components:
+
+```
+spec_test(
+    table_data,  # a dict (see below) defining the tables in the test database
+    e.where(e.b1).i1.sum_for_patient(),  # the ehrQL code being tested
+    {  # expected results
+        1: (101 + 102),
+        2: 201,
+    },
+)
+```
+
+### Defining table data
+Each test must define a dict named `table_data`, with keys `e` (event level data, with multiple rows
+per patient) and/or `p` (patient level data, with one row per patient).
+
+This sets up the test data. The first column is always patient ID and must be of integer type;
+it can be given a heading in the test for readability if desired, but it does not require one.
+Subsequent columns must take one of a discrete set of names, which indicate the type of the column
+in the test database.
+
+See `tests/spec/tables.py` for available column names and types.
+
+E.g. The following table data dict sets up:
+- an event table with an integer column `i1` and a boolean column `b1`, with 3 rows for patient 1,
+  and 2 rows for patient 2
+- a patient table with a date column `d1` with rows for 3 patients, 2 of which also appear in the
+  event table.
+```
+table_data = {
+    e: """
+          |  i1 | b1
+        --+-----+-----
+        1 | 101 | T
+        1 | 102 | T
+        1 | 103 | F
+        2 | 201 | T
+        2 | 202 |
+    """,
+    p: """
+          | d1
+        --+-----------
+        1 | 1990-01-02
+        2 | 1990-01-02
+        3 |
+    """
+}
+```
+
+### Defining the ehrQL code to return a series
+
+Define the ehrQL code that's being tested, using the table data. This should always be
+expected to return a Series.
+
+e.g. in the table data above, to get the sum of the `i1` column where `b1` is True:
+
+```e.where(e.b1).i1.sum_for_patient()```
+
+### Expected results
+
+Describe the expected results as a dict, with patient IDs as keys.
+
+e.g.
in the example above, the expected sums would be:
+
+```
+{
+    1: (101 + 102),
+    2: 201,
+}
+```
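To see where these expected values come from, the arithmetic can be reproduced in plain Python without ehrQL. This is only a sketch: `parse_rows` and `sum_i1_where_b1` are hypothetical names, and the real test framework parses the table strings and evaluates queries in its own way. Rows where `b1` is `F` or empty are left out of the sum.

```python
from collections import defaultdict

# Event table copied from the example; the first column is the patient ID.
EVENT_TABLE = """
      |  i1 | b1
    --+-----+-----
    1 | 101 | T
    1 | 102 | T
    1 | 103 | F
    2 | 201 | T
    2 | 202 |
"""


def parse_rows(table):
    # Hypothetical parser: skip the header and separator lines, then split
    # each remaining line on "|" into (patient_id, i1, b1) strings.
    lines = [line.strip() for line in table.strip().splitlines()]
    return [tuple(cell.strip() for cell in line.split("|")) for line in lines[2:]]


def sum_i1_where_b1(rows):
    # Sum i1 per patient, keeping only rows where b1 is true ("T").
    totals = defaultdict(int)
    for patient_id, i1, b1 in rows:
        if b1 == "T":
            totals[int(patient_id)] += int(i1)
    return dict(totals)


print(sum_i1_where_b1(parse_rows(EVENT_TABLE)))
```

Patient 1 contributes 101 and 102 (the row with 103 has `b1` false), and patient 2 contributes only 201 because the row with 202 has no value for `b1`.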