Spec tests

These tests specify the expected behaviour of ehrQL, and by extension, the query model.

They are also used to generate documentation.

toc.py defines the table of contents for the documentation generated from these tests. A new spec
test subdirectory or file must be listed in toc.py in order for it to appear in the docs.

Folder structure

Within the spec directory, each test directory is considered a chapter in the docs, each file
is considered to be a section, and each test is a paragraph.

e.g.

tests
    ├── spec
    │   ├── aggregate_frame                       <-- chapter
    │   │   ├── __init__.py                       <-- contains chapter title
    │   │   ├── test_count_for_patient.py         <-- section
    │   │   └── test_exists_for_patient.py        <-- section

Titles

Chapter titles are specified with a title attribute in the test directory's __init__.py.
Section titles are specified with a title attribute in each test file.
Paragraph titles are extracted from the test names, or from a title attribute in the test
function.

E.g. test_count_for_patient.py contains 2 tests:
- test_count_for_patient_on_event_frame()
- test_count_for_patient_on_patient_frame()

Assuming this is included in toc.py, the following structure will be generated in the docs:

1 Aggregating event and patient frames # from title in aggregate_frame/__init__.py
1.1 Counting the rows for each patient # from title in aggregate_frame/test_count_for_patient.py
1.1.1 Count for patient on event frame # from test name
1.1.2 Count for patient on patient frame # from test name
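
As an illustration of how a paragraph title can be read off a test name, here is a minimal sketch. The helper title_from_test_name is hypothetical and is not the function the documentation generator actually uses; it only reproduces the mapping shown in the example above (drop the test_ prefix, replace underscores with spaces, capitalise the first letter).

```python
def title_from_test_name(name: str) -> str:
    """Turn a test function name into a human-readable paragraph title.

    Hypothetical helper for illustration only.
    """
    # Drop the conventional "test_" prefix, if present.
    if name.startswith("test_"):
        name = name[len("test_"):]
    # Replace underscores with spaces and capitalise only the first letter.
    words = name.replace("_", " ")
    return words[:1].upper() + words[1:]


print(title_from_test_name("test_count_for_patient_on_event_frame"))
# prints: Count for patient on event frame
```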

Additional text

Optional additional text can be included under a Chapter, Section or Paragraph title in the
documentation as follows:

Chapter: add a text attribute in the test directory's __init__.py.
Section: add a text attribute in a test file.
Paragraph: add a docstring to the test function.
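
The sketch below shows where this metadata lives and how it could be read back with standard Python introspection. It is an assumption-laden illustration: the section object stands in for a real test module, the text and docstring wording is invented, and the describe helper is hypothetical rather than part of the actual documentation generator.

```python
import inspect
from types import SimpleNamespace


def test_count_for_patient_on_event_frame():
    """Counts the rows in an event frame for each patient."""
    # A real spec test would call the spec_test fixture here.


# Stand-in for a test file (section): module-level title and text attributes,
# plus the test function defined above.
section = SimpleNamespace(
    title="Counting the rows for each patient",
    text="Hypothetical introductory text for this section.",
    test_count_for_patient_on_event_frame=test_count_for_patient_on_event_frame,
)


def describe(module, test_name):
    """Collect the section title, optional section text and paragraph text."""
    test = getattr(module, test_name)
    return {
        "section_title": module.title,
        # text is optional, so fall back to None when it is absent
        "section_text": getattr(module, "text", None),
        # the paragraph text comes from the test function's docstring
        "paragraph_text": inspect.getdoc(test),
    }


info = describe(section, "test_count_for_patient_on_event_frame")
```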

Test structure

Spec tests follow a standardised structure in order to allow documentation generation. Each
spec test calls the spec_test fixture with the following arguments:

spec_test(
    table_data,                          # a dict (see below) defining the tables in the test database
    e.where(e.b1).i1.sum_for_patient(),   # the ehrQL code being tested
    {                                    # expected results
        1: (101 + 102),
        2: 201,
    },
)

Defining table data

Each test must define a dict named table_data, with keys e (event level data, with multiple rows
per patient) and/or p (patient level data, with one row per patient).

This sets up the test data. The first column is always patient ID and must be of integer type;
it can be given a heading in the test for readability if desired, but it does not require one.
Subsequent columns must take one of a discrete set of names, which indicate the type of the column
in the test database.

See tests/spec/tables.py for available column names and types.

E.g. The following table data dict sets up:
- an event table with an integer column i1 and a boolean column b1, with 3 rows for patient 1,
and 2 rows for patient 2
- a patient table with a date column d1 with rows for 3 patients, 2 of which also appear in the
event table.

table_data = {
    e: """
          |  i1 |  b1
        --+-----+-----
        1 | 101 |  T
        1 | 102 |  T
        1 | 103 |  F
        2 | 201 |  T
        2 | 202 |
    """,
    p: """
          |  d1
        --+-----------
        1 | 1990-01-02
        2 | 1990-01-02
        3 |
    """
}
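
To make the layout of these table strings concrete, the following sketch parses one into a list of row dicts. The parse_table function is hypothetical and is not the parser the test suite uses. It keeps every value as a string, whereas the real tests give each column the type listed in tests/spec/tables.py, and it names the unlabelled first column patient_id purely for illustration.

```python
def parse_table(text):
    """Parse a spec-style text table into a list of dicts (illustrative only)."""
    # Ignore blank lines and surrounding indentation.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # First line: column headings. Second line: the "--+-----" separator.
    header, _separator, *rows = lines
    names = [cell.strip() for cell in header.split("|")]
    # The patient ID column may have no heading, so give it a default name.
    names[0] = names[0] or "patient_id"
    parsed = []
    for row in rows:
        # An empty cell represents a missing (NULL) value.
        cells = [cell.strip() or None for cell in row.split("|")]
        parsed.append(dict(zip(names, cells)))
    return parsed


event_rows = parse_table(
    """
          |  i1 |  b1
        --+-----+-----
        1 | 101 |  T
        1 | 102 |  T
        1 | 103 |  F
        2 | 201 |  T
        2 | 202 |
    """
)
```

Here event_rows holds five dicts, three for patient 1 and two for patient 2, and the last row has None for b1.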

Defining the ehrQL code to return a series

Define the ehrQL code that's being tested, using the table data. This code must always
return a Series.

e.g. in the table data above, to get the sum of the i1 column where b1 is True:

e.where(e.b1).i1.sum_for_patient()

Expected results

Describe the expected results as a dict, with patient IDs as keys.

e.g. in the example above, the expected sums would be:

{
    1: (101 + 102),
    2: 201,
}
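
The expected results can be sanity-checked by hand. The sketch below recomputes them in plain Python, without ehrQL, from the event rows in the example table. It assumes that filtering on b1 keeps only rows where b1 is True, so rows where b1 is False or missing are dropped, which is consistent with the expected value of 201 for patient 2.

```python
from collections import defaultdict

# (patient_id, i1, b1) rows copied from the example event table;
# None marks the missing b1 value in the last row.
rows = [
    (1, 101, True),
    (1, 102, True),
    (1, 103, False),
    (2, 201, True),
    (2, 202, None),
]

sums = defaultdict(int)
for patient_id, i1, b1 in rows:
    if b1:  # keep only rows where b1 is True (assumption stated above)
        sums[patient_id] += i1

expected = {
    1: (101 + 102),
    2: 201,
}
assert dict(sums) == expected
```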