# How to use dummy data in an ehrQL measures definition

Refer to the [Measures reference documentation](../reference/language.md#measures) for more
information on how to use measures.

As with [dataset definitions](dummy-data.md), there are three ways to use dummy
data with a measures definition in ehrQL:

1. [Let ehrQL generate dummy measures from your measures definition](#let-ehrql-generate-dummy-measures-from-your-measures-definition)

1. [Supply your own dummy measures](#supply-your-own-dummy-measures)

1. [Supply your own dummy tables](#supply-your-own-dummy-tables)


## Let ehrQL generate dummy measures from your measures definition

You do not need to add anything to the measures definition itself in order to generate dummy
measures in this way. ehrQL will use the measures definition to set up dummy data and generate
matching patients.

By default, ten patients are generated for the dummy measures output. If you need to increase this number, you can configure it in the measures definition with:

```
measures.configure_dummy_data(population_size=1000)
```

:warning: Increasing the population size will increase the time required to generate the
measures.


## Supply your own dummy measures

You can provide a dummy measures file in any of the following formats:

|Format        |File extension|
|--------------|--------------|
|CSV           |.csv          |
|Compressed CSV|.csv.gz       |
|Arrow         |.arrow        |

:warning: Your file must have the relevant file extension shown in the table
above.

For example, take this simple measures definition:

```ehrql
from ehrql import create_measures, years
from ehrql.measures import INTERVAL
from ehrql.tables.core import patients, clinical_events

events_in_interval = clinical_events.where(clinical_events.date.is_during(INTERVAL))
had_event = events_in_interval.exists_for_patient()
intervals = years(2).starting_on("2020-01-01")
measures = create_measures()

measures.define_measure(
    "had_event_by_sex",
    numerator=had_event,
    denominator=patients.exists_for_patient(),
    group_by={"sex": patients.sex},
    intervals=intervals,
)
```

And these dummy measures, in a CSV file named `dummy_measures.csv`:

|measure|interval_start|interval_end|ratio|numerator|denominator|sex|
|-------|--------------|------------|-----|---------|-----------|---|
|had_event_by_sex|2020-01-01|2020-12-31|0.25|2|8|female|
|had_event_by_sex|2020-01-01|2020-12-31|0.5|3|6|male|
|had_event_by_sex|2021-01-01|2021-12-31|0.1|1|10|female|
|had_event_by_sex|2021-01-01|2021-12-31|0.0|0|2|male|
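
One way to produce a file like this is with a short script rather than by hand. The following is a minimal sketch using only the Python standard library. The helper names `build_rows` and `write_dummy_measures` are hypothetical and not part of ehrQL; the column names and counts are copied from the table above, and `ratio` is derived as `numerator / denominator` so the three columns stay consistent.

```python
import csv

# Column names copied from the dummy measures table above.
COLUMNS = [
    "measure",
    "interval_start",
    "interval_end",
    "ratio",
    "numerator",
    "denominator",
    "sex",
]

# (interval_start, interval_end, numerator, denominator, sex), from the table above.
COUNTS = [
    ("2020-01-01", "2020-12-31", 2, 8, "female"),
    ("2020-01-01", "2020-12-31", 3, 6, "male"),
    ("2021-01-01", "2021-12-31", 1, 10, "female"),
    ("2021-01-01", "2021-12-31", 0, 2, "male"),
]


def build_rows(counts, measure="had_event_by_sex"):
    """Return one dict per CSV row, deriving ratio from the counts."""
    rows = []
    for start, end, numerator, denominator, sex in counts:
        rows.append(
            {
                "measure": measure,
                "interval_start": start,
                "interval_end": end,
                "ratio": numerator / denominator,
                "numerator": numerator,
                "denominator": denominator,
                "sex": sex,
            }
        )
    return rows


def write_dummy_measures(path, rows):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_dummy_measures("dummy_measures.csv", build_rows(COUNTS))
```

Deriving the ratio in code avoids typing a value that disagrees with the counts next to it.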


Run the measures definition with the dummy measures output file:

```
opensafely exec ehrql:v1 generate-measures measures_definition.py --dummy-data-file dummy_measures.csv
```

Now, instead of generated dummy measures, the output contains the data from the dummy measures file that you provided.



### Dummy measures errors

ehrQL will check the column names, types and categorical values in your dummy measures output file. If errors are found, they will be shown in the terminal output.
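
To catch obvious mistakes before running the command, a local pre-check along the following lines may help. This is only an illustrative sketch and does not reproduce ehrQL's own validation. The function name `check_dummy_measures` is hypothetical, the expected columns are taken from the example table above, and the set of allowed `sex` values is an assumption that should be confirmed against the table schema reference.

```python
import csv

# Columns from the example dummy measures table for "had_event_by_sex".
EXPECTED_COLUMNS = [
    "measure",
    "interval_start",
    "interval_end",
    "ratio",
    "numerator",
    "denominator",
    "sex",
]

# Assumption: categories of patients.sex; confirm against the schema reference.
ALLOWED_SEX = {"female", "male", "intersex", "unknown"}


def check_dummy_measures(path, allowed_sex=ALLOWED_SEX):
    """Return a list of human-readable problems found in the CSV at path."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        found = reader.fieldnames or []
        # Compare as sets: this sketch does not check column order.
        if set(found) != set(EXPECTED_COLUMNS):
            problems.append(
                f"columns: expected {sorted(EXPECTED_COLUMNS)}, found {sorted(found)}"
            )
            return problems
        # Data rows start on line 2, after the header.
        for line_no, row in enumerate(reader, start=2):
            if row["sex"] not in allowed_sex:
                problems.append(f"line {line_no}: unexpected sex {row['sex']!r}")
            for column in ("numerator", "denominator"):
                if not row[column].isdigit():
                    problems.append(
                        f"line {line_no}: {column} is not a whole number: {row[column]!r}"
                    )
    return problems
```

An empty list means this sketch found nothing to report; ehrQL may still raise errors that these checks do not cover.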


## Supply your own dummy tables

A measures definition uses the same underlying data tables as a dataset definition. As such,
you can use [the same process](dummy-data.md#supply-your-own-dummy-dataset) to supply dummy data tables for a measures definition.
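
As an illustration, dummy tables for the measures definition above could be written with a short script. This sketch assumes the layout is a directory containing one CSV file per table, named after the table, with a `patient_id` column in each; confirm the layout, the column names and how to pass the directory to ehrQL on the linked page. The helper name `write_dummy_tables`, the directory name and the rows themselves are made up for illustration.

```python
import csv
from pathlib import Path

# Assumed layout: one CSV per table, named after the table.
# Column names and rows are illustrative, not real data.
DUMMY_TABLES = {
    "patients": {
        "columns": ["patient_id", "date_of_birth", "sex"],
        "rows": [
            [1, "1980-05-01", "female"],
            [2, "1975-11-01", "male"],
        ],
    },
    "clinical_events": {
        "columns": ["patient_id", "date", "snomedct_code"],
        "rows": [
            [1, "2020-03-14", "60621009"],
            [2, "2021-07-02", "60621009"],
        ],
    },
}


def write_dummy_tables(directory, tables=DUMMY_TABLES):
    """Write one CSV file per table into directory and return the paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, table in tables.items():
        path = directory / f"{name}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(table["columns"])
            writer.writerows(table["rows"])
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in write_dummy_tables("dummy_tables"):
        print(path)
```

Only the columns that the measures definition reads (`clinical_events.date` and `patients.sex`) matter for this example; the other columns are included to show the shape of a table file.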