The `just` command provides a list of available recipes:

```
just list
```
Running any of the `just` commands that need it will set up a local environment and install dependencies.
Tests are divided into the following categories.
Each category lives in its own directory (for example `tests/unit`) and has its own `just` command to run it (for example `just test-unit`).
To run all tests as they're run in CI (with code coverage):

```
just test-all
```
To run just one category of tests:

```
just test-<category>
```

For example:

```
just test-unit
```
Additional arguments can be passed to any test commands, for example:

```
just test-unit --verbose --durations=10
```
For maximum flexibility, the `test` command can be used to run individual test files or tests, or to do other clever things with `pytest`. It just delegates to `pytest`.

For example:

```
just test tests/integration/backends/test_tpp.py
```
There are further notes on using `pytest` in the wiki:
https://github.com/opensafely-core/ehrql/wiki/Tips-for-using-pytest
The generative tests use Hypothesis to generate variable definitions (in
the query model) and test data. They then execute the resulting dataset
definitions using all the available query engines, and check that the
results are the same.
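In outline, each generative test draws a dataset definition and some data, runs them through every engine, and asserts the results agree. A minimal sketch, assuming hypothetical strategy and helper names (the real versions live in this repo's generative test suite and may be named differently):

```python
from hypothesis import given, settings

# Hypothetical imports: stand-ins for the real strategies and helpers.
from generative.strategies import dataset_strategy, data_strategy
from generative.helpers import run_with_engine

QUERY_ENGINES = ["in_memory", "sqlite"]

@settings(max_examples=100)
@given(dataset=dataset_strategy(), data=data_strategy())
def test_engines_agree(dataset, data):
    # Execute the same generated dataset definition against every
    # engine and check that the results all match.
    results = [run_with_engine(engine, dataset, data) for engine in QUERY_ENGINES]
    first, *rest = results
    for other in rest:
        assert other == first
```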
To get the most out of our generative tests we want them to run for a
long time and to explore different parts of the query space each time
they're run. But neither of these qualities is desirable in CI.
For this reason, the CI configuration generates only a small number of
examples, of limited query depth, and in a deterministic fashion. They
thus function more as a test that the generative test machinery is
working correctly than as a serious attempt to do generative testing.
When running locally, although randomisation is enabled by default,
the number of examples is still very small (to keep test runtimes
reasonable). To do some "proper" generative testing you can run the command:

```
just test-generative
```

which increases the example count by setting `GENTEST_EXAMPLES`.
It would be worth running these locally when adding new query model
operations or making significant changes to a query engine. You may even want to crank the settings up further, e.g.

```
GENTEST_EXAMPLES=10000 GENTEST_MAX_DEPTH=20 just test-generative
```
You can control which query engines the tests exercise using the environment variable `GENTEST_QUERY_ENGINES`. For instance, if you have made a change to the basic SQL-building logic in `BaseSQLQueryEngine` and you want to rapidly test this with a large number of examples, you could compare just the in-memory and SQLite engines using:

```
GENTEST_QUERY_ENGINES='in_memory sqlite' GENTEST_EXAMPLES=10000 just test-generative
```
In addition to whatever you do locally, a scheduled GitHub Actions workflow runs the generative tests overnight with settings as high as we can get away with, and alerts us in Slack if it finds a failure.
You can get Hypothesis to dump statistics at the end of the run with `--hypothesis-show-statistics`, or (more usefully) dump some of our own statistics about the generated data and queries by setting `GENTEST_DEBUG=t`.
When debugging a failure you'll probably want to reproduce it.
Hypothesis often struggles to shrink the examples it finds, and even
small examples can appear overwhelmingly verbose due to the repetitive
nature of query model reprs. To help with this there is some tooling,
and a process to follow:
1. Copy the `dataset` and `data` arguments from the example that Hypothesis displays and paste them into a new file. (Don't worry about stripping indentation or trailing commas here.)

2. Run the command:

   ```
   just gentest-example-simplify PATH_TO_FILE.py
   ```

   This should transform the copied code into a valid test example and pull out some of the repeated elements into variables (see the sketch after this list).

3. Run the command:

   ```
   just gentest-example-run PATH_TO_FILE.py
   ```

   This should execute the example and confirm that the test fails in the expected way.

4. To further simplify the example you can copy a repeated element, assign it to a variable, and then re-run `gentest-example-simplify` on the file. This will replace occurrences of that element with a reference to the variable.
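For illustration only, a simplified example file might end up looking something like this; the import path and constructor arguments below are assumptions rather than the real query model API, and real examples will differ:

```python
# Hypothetical sketch of a simplified gentest example file.
from ehrql.query_model.nodes import SelectColumn, SelectPatientTable

# gentest-example-simplify pulls repeated elements out into variables:
patients = SelectPatientTable("patients")

dataset = SelectColumn(patients, "date_of_birth")
data = [
    {"type": "patients", "patient_id": 1, "date_of_birth": "2000-01-01"},
]
```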
Hypothesis can generate query graphs that are very deeply nested; after 100 draws in a test example, Hypothesis will return the example as invalid. To avoid this, the variable strategies check for a maximum depth and return a terminal node if the maximum depth is exceeded (a `SelectColumn` node for a series strategy, and a `SelectTable` or `SelectPatientTable` for a table strategy). The max depth defaults to 15 and can be overridden with the environment variable `GENTEST_MAX_DEPTH`.
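The depth-limiting pattern looks roughly like this (a self-contained toy sketch using stand-in node types, not the real strategies):

```python
import hypothesis.strategies as st

MAX_DEPTH = 15  # mirrors the GENTEST_MAX_DEPTH default

# Toy node types: integers stand in for terminal nodes like
# SelectColumn, tuples stand in for compound query model nodes.
terminal = st.integers()

def node(depth=0):
    # Once the maximum depth is reached, stop recursing and return a
    # terminal node rather than letting Hypothesis hit its draw limit.
    if depth >= MAX_DEPTH:
        return terminal
    return st.one_of(
        terminal,
        st.tuples(st.just("compound"), node(depth + 1)),
    )
```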
See the generative tests documentation for more details.
Please think carefully about how to test code that you are adding or changing.
We think that test maintainability is a big risk for this system, so we're trying to be very deliberate about the kind of tests that we write.
You should follow these guidelines, and raise them with the rest of the team for discussion if you think they are problematic.
Contrary to practice in some quarters, we allow disk access by unit tests because it doesn't seem to cause any significant slow-down in those tests at the moment.
We'll keep this under review.
Logging is very verbose and is turned off by default in tests. To turn it on, set the environment variable `LOG_SQL=1` and pass the `-s` option to turn off log capture in pytest:

```
LOG_SQL=1 just test-all -s
```
The files for test categories that target individual modules (for example unit and integration tests) are organized into roughly the same directory structure as the `ehrql` package itself.
Generally a module `ehrql.foo` will have a corresponding test file like `tests/unit/test_foo.py`.
However we do not stick slavishly to this: where appropriate we may collapse tests for submodules like `ehrql.foo.{bar,bam}` into a single test file like `tests/unit/test_foo.py`, or break tests for a module like `ehrql.foo` into multiple test files like `tests/unit/foo/test_{one,another}_aspect.py`.
Test categories that run against ehrQL as a whole or against multiple components (for example spec and acceptance tests) have their own internal structure.
Our approach to code coverage is to fail the build with less than 100% coverage, but be reasonably liberal about allowing lines to be marked as not requiring coverage.
If you make a change that results in a newly un-covered line, you should make a good attempt to test it and expect to have to justify any omissions to PR reviewers;
but for genuinely hard or impossible to hit cases it's okay to mark them as `no cover`.
Any `no cover` pragmas should include a note explaining why the code can't be hit.
Common cases are configured in `pyproject.toml` with their own explanations.
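For example, a marked line with its justification might look like this (illustrative code, using the standard coverage pragma syntax):

```python
def double_if_int(value):
    if isinstance(value, int):
        return value * 2
    # Unreachable: callers validate their arguments before calling us,
    # so this guard exists only for defence in depth.
    raise AssertionError(f"unhandled type: {type(value)}")  # pragma: no cover
```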
For tests that need to run against a database, we run the database in a Docker container.
Each run of the tests starts the database if it's not already running and then leaves it running at the end to speed up future runs.
(Each test cleans out the schema to avoid pollution.)
There is a `just` command to remove the database containers:

```
just remove-database-containers
```
Set the environment variable `LOG_SQL=1` (or anything non-empty) to get all SQL queries logged to the console. To get SQL queries in test runs, also use `-s` to turn off log capture in pytest.
ehrQL is responsible for enforcing certain security boundaries within the OpenSAFELY platform. These are narrowly defined and the sections of the code which handle them are small and well-contained, so the vast majority of changes to ehrQL will not go anywhere near them. Nevertheless, it's important that anyone writing or reviewing ehrQL code be aware of these so they know to be alert for changes which could possibly have an impact.
ehrQL only accesses sensitive patient data while running inside a secure OpenSAFELY job processing pipeline, which means that all the most critical security properties are already enforced. This leaves ehrQL itself with two responsibilities:
* Ensure that each ehrQL job can only access the data to which it is supposed to have access.
* Ensure that patient data is only written to the locations at which it is supposed to be written.
These considerations affect three areas of the codebase.
Users interact with ehrQL by writing Python code which builds a graph of query model objects describing the data to be selected. To ensure that the user's Python code has no direct access to data or ability to modify the environment, we run it in a separate, highly restricted process and retrieve a JSON-serialised specification of the query graph. This is all handled by the `loaders` module.

Functions which need to evaluate user-supplied code should always use the methods provided in the `loaders` module and never import user code directly. (Given the fiddly process needed to import user code, it is unlikely anyone would attempt to do this without looking for the pre-defined loader methods in any case.)
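Schematically, the pattern looks something like the sketch below. This is a simplification of what the `loaders` module actually does, and the `serialize-definition` subcommand name is an assumption rather than a reference to the real entry point:

```python
import json
import subprocess
import sys

def load_query_graph(definition_path):
    # Run the user's definition file in a separate, restricted process
    # (sketched here as a plain subprocess) and read a JSON-serialised
    # query graph back, so user code never executes in-process.
    result = subprocess.run(
        [sys.executable, "-m", "ehrql", "serialize-definition", definition_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```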
The `Backend` class provides a `modify_dataset()` hook which allows the backend to add additional restrictions to the user's query to control what data it returns. Any changes to the query processing workflow must ensure that this hook continues to be called. This is currently enforced by an integration test which ensures that the `TPPBackend.modify_dataset()` hook continues to behave as expected.
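In outline, the hook has this shape (an illustration only, not the real `Backend` API):

```python
class ExampleBackend:
    def modify_dataset(self, dataset):
        # Add backend-specific restrictions to the user's query before
        # execution, e.g. excluding patients whose data must not be
        # returned; returning the dataset unchanged is the trivial case.
        return dataset
```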
The logs which ehrQL produces are treated as being at a different privacy level from the outputs it writes to disk. It is therefore important that the logs themselves never contain individual patient data. Fortunately this property is relatively easy to maintain because so few parts of the codebase deal directly with patient data. Data is retrieved by the `execute_query_with_results()` method on the base query engine. This returns an iterator of rows of data. Functions which consume rows of this iterator must avoid logging any values obtained from the rows. (Functions which merely wrap the iterator without consuming it will never have a reference to any patient data and so are not at risk in the same way.)

The key parts of the codebase which deal with individual rows of data are the `file_formats` module, which handles writing files to disk, and the `sqlalchemy_exec_utils` module, which handles batch fetching of results. Careful attention must be paid to any log calls in these modules to ensure that we are not logging individual patient data.

Note that "logs" here includes everything written to stdout/stderr, including calls to `print`, not just lines written using Python's logging mechanisms.
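The safe pattern when consuming the results iterator is to log aggregates and progress, never the row values themselves. A minimal sketch:

```python
import logging

log = logging.getLogger(__name__)

def write_rows(results, write):
    # `results` is an iterator of rows of patient data; `write`
    # persists a single row to the output file.
    count = 0
    for row in results:
        write(row)  # fine: the row goes to the output file...
        count += 1
    log.info("Wrote %s rows", count)  # ...but only the count is logged
```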
Starting with version 4.0, Bash is licensed under GPLv3.
Because of this, macOS still ships with version 3.2, which is incompatible with some scripts in this repository.
We recommend using Homebrew to install a more recent version, ensuring that the new version is on your `$PATH`, and restarting your Terminal/shell session if necessary.

```
brew install bash
```
We previously used mypy and type annotations to perform correctness checking of the code base.
However, we made the decision to remove this stack after finding it was not a good fit for large parts of the code base.
This does not mean we've abandoned type annotations entirely.
The `query_model` module still makes heavy use of them and implements its own runtime checking to enforce them.
And developers should feel free to use them wherever this aids clarity vs a docstring or a comment.
Dataclasses have also retained their annotations to avoid initialising all fields with `None`.
The documentation in this repository forms part of the main OpenSAFELY documentation.
It can also be built as a standalone documentation site with MkDocs to preview content changes, by running:

```
just docs-serve
```

⚠️ The documentation will look slightly different from OpenSAFELY's. Relative links to sections of the main documentation outside of the `/ehrql` sections will not work (although a scheduled GitHub Action runs overnight to check them). We may be able to improve this later, depending on the behaviour of the MkDocs plugin that we use: see https://github.com/opensafely-core/ehrql/issues/1126
These are handled in the main OpenSAFELY documentation repository.
If you need to redirect URLs (this should be fairly infrequent), make any changes to the `_redirects` file in the main documentation repository, and test them in a preview there.
ehrQL documentation is located in the `docs/` directory. Local configuration is specified in the `mkdocs.yml` located at the repo root.

The `docs/` directory contains some files which are generated from the ehrql code and from other documentation files. Specifically these are files at:
The process for generating these files is described below.
When the main OpenSAFELY documentation is built, it imports the ehrql `docs/` directory and builds it within the main documentation site. This assumes that all generated documentation has been updated already (see below for a description of the pre-commit hooks and GitHub Actions checks that ensure this happens).
The most likely use case for including content from the `opensafely/documentation` repository is the `glossary.md` from the parent documentation.
To do so, use a slightly different snippet syntax:

```
!!! parent_snippet:'includes/glossary.md'
```
Some ehrQL documentation is generated from code in this repo.
See the spec tests docs for further information on writing tests that
contribute to the ehrQL docs.
An intermediate step generates the markdown files that are included in the documentation, which are located in `docs/includes/generated_docs`.

To generate these files, run:

```
just generate-docs
```
If any of the files have changed you should include them in your PR.
To verify the documentation is up to date, run:

```
just docs-check-generated-docs-are-current
```

This will display a diff if there are any uncommitted changes to the generated markdown files. It also runs as a CI step, and will fail if there are changes that need to be committed to the repo.
All of the example tests can be run with:

```
just test-docs-examples
```
Dataset and measures definitions may be included:

* inline in Markdown files in `docs/`, labelled as code blocks with the `ehrql` syntax,
* or as Python `.py` files in `docs/`.
For testing examples, `codelist_from_csv()` is currently patched out to work without any CSV file, and codelist codes are not validated.
The function signatures of `codelist_from_csv()` calls in examples are checked.
This may be improved in future to make the testing more rigorous; see #1694.
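The patching could look roughly like this (a hypothetical sketch; the real fixture and patch target may differ):

```python
from unittest import mock

def fake_codelist_from_csv(filename, column, *args, **kwargs):
    # Return an empty stand-in codelist without reading any CSV file
    # and without validating any codes.
    return []

# Hypothetical patch target: wherever the examples resolve the name from.
with mock.patch("ehrql.codelist_from_csv", fake_codelist_from_csv):
    ...  # run the documentation example here
```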
Examples in the documentation Markdown source will be tested as part of the test suite if you place complete examples in a code block with the `ehrql` syntax label (opening the fence with ```` ```ehrql ````). This will still highlight the code as if it were Python.

⚠️ The `ehrql` syntax label is for inline and complete ehrQL blocks only.

We use the SuperFences extension for extracting Markdown fences. Refer to the SuperFences documentation for more details of the fence format.
Python files in the `docs/` directory are assumed to be working ehrQL dataset or measures definitions. They are also tested in the test suite. If included in the documentation using the snippet syntax, they must be used with a `python` syntax label. (If they were labelled as `ehrql`, the snippet line itself would be extracted from the Markdown, and treated as a dataset definition.)
Merges to the main branch in this repo trigger a deployment of the main OpenSAFELY documentation via a GitHub Action.
This uses the `DOC_WRITE_TOKEN`, which is a GitHub fine-grained personal access token with access to the opensafely/documentation repository and the repository permissions "Read access to metadata" and "Read and Write access to actions".
For notes on how ehrQL is currently deployed in the production OpenSAFELY environment (including how to deploy an updated version), please refer to the backend-server playbook.