a b/docs/how-to/dummy-measures-data.md
# How to use dummy data in an ehrQL measures definition

Refer to the [Measures reference documentation](../reference/language.md#measures) for more
information on how to use measures.

As with [dataset definitions](dummy-data.md), there are three ways to use dummy
data with a measures definition in ehrQL.

1. [Let ehrQL generate dummy measures from your measures definition](#let-ehrql-generate-dummy-measures-from-your-measures-definition)

1. [Supply your own dummy measures](#supply-your-own-dummy-measures)

1. [Supply your own dummy tables](#supply-your-own-dummy-tables)

## Let ehrQL generate dummy measures from your measures definition

You do not need to add anything to the measures definition itself in order to generate dummy
measures in this way. ehrQL will use the measures definition to set up dummy data and generate
matching patients.

By default, ten patients will be generated for the dummy measures output. If you need to increase this number, you can configure it in the measures definition with:

```
measures.configure_dummy_data(population_size=1000)
```

:warning: Increasing the population size will increase the time required to generate the
measures.

## Supply your own dummy measures

You can provide a dummy measures file in the following formats.

|Format        |File extension|
|--------------|--------------|
|CSV           |.csv          |
|Compressed CSV|.csv.gz       |
|Arrow         |.arrow        |

:warning: Your file must have the relevant file extension shown in the table
above.

For example, take this simple measures definition:

```ehrql
from ehrql import create_measures, years
from ehrql.measures import INTERVAL
from ehrql.tables.core import patients, clinical_events

# Events recorded within the interval currently being measured
events_in_interval = clinical_events.where(clinical_events.date.is_during(INTERVAL))
# True for patients with at least one such event
had_event = events_in_interval.exists_for_patient()
# Two consecutive one-year intervals, the first starting on 2020-01-01
intervals = years(2).starting_on("2020-01-01")
measures = create_measures()

measures.define_measure(
    "had_event_by_sex",
    numerator=had_event,
    denominator=patients.exists_for_patient(),
    group_by={"sex": patients.sex},
    intervals=intervals,
)
```

And these dummy measures, in a CSV file named `dummy_measures.csv`:

|measure|interval_start|interval_end|ratio|numerator|denominator|sex|
|-------|--------------|------------|-----|---------|-----------|---|
|had_event_by_sex|2020-01-01|2020-12-31|0.25|2|8|female|
|had_event_by_sex|2020-01-01|2020-12-31|0.5|3|6|male|
|had_event_by_sex|2021-01-01|2021-12-31|0.1|1|10|female|
|had_event_by_sex|2021-01-01|2021-12-31|0.0|0|2|male|

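
One way to produce such a file is with a short script. The following is a minimal sketch, not part of ehrQL: it uses only Python's standard `csv` module to write the rows from the table above to `dummy_measures.csv`, deriving `ratio` from the numerator and denominator. The function name `write_dummy_measures` and the `ROWS` constant are hypothetical names chosen for illustration.

```python
import csv

# Column order copied from the example dummy measures table
COLUMNS = [
    "measure", "interval_start", "interval_end",
    "ratio", "numerator", "denominator", "sex",
]

# (interval_start, interval_end, numerator, denominator, sex) from the example table
ROWS = [
    ("2020-01-01", "2020-12-31", 2, 8, "female"),
    ("2020-01-01", "2020-12-31", 3, 6, "male"),
    ("2021-01-01", "2021-12-31", 1, 10, "female"),
    ("2021-01-01", "2021-12-31", 0, 2, "male"),
]


def write_dummy_measures(path, measure_name="had_event_by_sex"):
    """Write the example rows to `path`, with ratio = numerator / denominator."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for start, end, numerator, denominator, sex in ROWS:
            # Every example denominator is non-zero, so the division is safe here
            ratio = numerator / denominator
            writer.writerow(
                [measure_name, start, end, ratio, numerator, denominator, sex]
            )


if __name__ == "__main__":
    write_dummy_measures("dummy_measures.csv")
```

Deriving `ratio` in code rather than typing it by hand keeps the three numeric columns consistent with each other if you later change the counts.
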

Run the measures definition with the dummy measures output file:

```
opensafely exec ehrql:v1 generate-measures measures_definition.py --dummy-data-file dummy_measures.csv
```

Now, instead of generated dummy measures output, you'll see the data from the dummy data file that you provided.

![A screenshot of VS Code, showing the terminal after the `opensafely exec` command was run](opensafely_exec_dummy_measures_data_file.png)

### Dummy measures errors

ehrQL will check the column names, types and categorical values in your dummy measures output file. If errors are found, they will be shown in the terminal output.

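
If you want to catch obvious mistakes before running the command, a small local pre-check can help. The sketch below is an assumption-laden illustration and is not how ehrQL itself validates the file: the function name `check_dummy_measures` is hypothetical, `EXPECTED_COLUMNS` is simply copied from the example file above, and the permitted `sex` values passed in the usage comment are just the two that appear in that example.

```python
import csv
from datetime import date

# Assumption: the columns of the example dummy measures file
EXPECTED_COLUMNS = [
    "measure", "interval_start", "interval_end",
    "ratio", "numerator", "denominator", "sex",
]


def check_dummy_measures(path, allowed_values):
    """Return a list of problems found in the CSV file at `path`.

    `allowed_values` maps a group-by column name to the set of values
    you expect to see in it, for example {"sex": {"female", "male"}}.
    """
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            return [f"unexpected columns: {reader.fieldnames}"]
        # Data rows start on line 2, after the header
        for line_no, row in enumerate(reader, start=2):
            try:
                start = date.fromisoformat(row["interval_start"])
                end = date.fromisoformat(row["interval_end"])
                int(row["numerator"])
                int(row["denominator"])
            except ValueError as exc:
                problems.append(f"line {line_no}: {exc}")
                continue
            if start > end:
                problems.append(f"line {line_no}: interval starts after it ends")
            for column, allowed in allowed_values.items():
                if row[column] not in allowed:
                    problems.append(
                        f"line {line_no}: unexpected {column} value {row[column]!r}"
                    )
    return problems


# Example usage (assumes dummy_measures.csv exists in the working directory):
# for problem in check_dummy_measures("dummy_measures.csv", {"sex": {"female", "male"}}):
#     print(problem)
```

An empty list means this pre-check found nothing to report; ehrQL's own checks still apply when you run the measures definition.
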
## Supply your own dummy tables

A measures definition uses the same underlying data tables as a dataset definition. As such,
you can use [the same process](dummy-data.md#supply-your-own-dummy-dataset) to supply dummy data tables for a measures definition.