A measure definition file must define a collection of measures called measures.
measures = create_measures()
Add measures to the collection using define_measure:
measures.define_measure(
    name="adult_proportion",
    numerator=patients.age_on(INTERVAL.start_date) >= 18,
    denominator=patients.exists_for_patient(),
)
To create a collection of measures use the create_measures function.
Add a measure to the collection of measures to be generated.
name
The name of the measure, as a string. Only used to identify the measure in the
output. Must contain only alphanumeric and underscore characters and must
start with a letter.
numerator
The numerator definition, which must be a patient series but can be either
boolean or integer.
denominator
The denominator definition, which must be a patient series but can be either
boolean or integer.
group_by
Optional groupings to break down the results by. If supplied, must be a
dictionary of the form:
{
    "group_name": group_definition,
    ...
}
Each group_name becomes a column in the output. It must contain only
alphanumeric and underscore characters and must start with a letter. It also
must not clash with any reserved column names like "numerator" or "ratio".
Each group_definition must be a categorical patient series (i.e. a patient
series which takes only a fixed set of values).
intervals
A list of start/end date pairs over which to evaluate the measures. These can be
most conveniently generated using the starting_on()/ending_on() methods on
years, months, and weeks, e.g.
intervals = months(12).starting_on("2020-01-01")
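As a rough illustration of what such a list of start/end date pairs might look like, the plain-Python sketch below builds twelve consecutive monthly intervals starting on 2020-01-01. It does not use ehrQL: the helpers add_months and monthly_intervals are hypothetical, and the sketch assumes that intervals are consecutive calendar months with inclusive end dates.

```python
from datetime import date, timedelta


def add_months(d, n):
    # Hypothetical helper: shift a date by n calendar months.
    # Assumes d.day <= 28 so that the day exists in every month.
    month_index = d.year * 12 + (d.month - 1) + n
    return date(month_index // 12, month_index % 12 + 1, d.day)


def monthly_intervals(start, count):
    # Build `count` consecutive (start, end) pairs, one per month.
    # Each end date is the day before the next interval starts (inclusive end).
    pairs = []
    for i in range(count):
        interval_start = add_months(start, i)
        interval_end = add_months(start, i + 1) - timedelta(days=1)
        pairs.append((interval_start, interval_end))
    return pairs


intervals = monthly_intervals(date(2020, 1, 1), 12)
for interval_start, interval_end in intervals[:3]:
    print(interval_start, interval_end)
```

Under these assumptions the first pair runs from 2020-01-01 to 2020-01-31 and the last from 2020-12-01 to 2020-12-31.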
The numerator, denominator and intervals arguments can be omitted if
default values for them have been set using define_defaults().
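The naming rules given above for name and for each group_name can be checked with a small regular expression. This is only a sketch of the stated rule, not ehrQL's own validation code: the function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the reserved set contains only the two column names mentioned above, so it is likely incomplete.

```python
import re

# Starts with a letter, then any number of letters, digits or underscores.
# Assumption: "alphanumeric" means ASCII letters and digits only.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Only the reserved column names mentioned in the text; the real list
# is assumed to be longer.
RESERVED_EXAMPLES = {"numerator", "ratio"}


def is_valid_measure_name(name):
    # fullmatch ensures the whole string follows the pattern.
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_group_name(name):
    # Group names follow the same rule and must avoid reserved column names.
    return is_valid_measure_name(name) and name not in RESERVED_EXAMPLES


for candidate in ["adult_proportion", "2020_adults", "adult-proportion", "ratio"]:
    print(candidate, is_valid_measure_name(candidate), is_valid_group_name(candidate))
```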
Define default values for a collection of measures. Useful to reduce
repetition when defining several measures which share common arguments.
Example usage:
measures.define_defaults(
    intervals=months(6).starting_on("2020-01-01"),
)
Note that you can only define a single set of defaults and attempting to call
this method more than once is an error.
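The behaviour described above can be pictured with a toy model. The class below is a hypothetical stand-in written in plain Python, not the ehrQL implementation: it assumes that arguments passed explicitly to a measure take precedence over the defaults, and it uses RuntimeError and ValueError purely for illustration, since the text only says that a second call "is an error".

```python
class ToyMeasures:
    """Hypothetical stand-in for a measures collection; not the ehrQL class."""

    REQUIRED = ("numerator", "denominator", "intervals")

    def __init__(self):
        self._defaults = None
        self.measures = []

    def define_defaults(self, **defaults):
        # Only a single set of defaults is allowed.
        if self._defaults is not None:
            raise RuntimeError("defaults have already been defined")
        self._defaults = defaults

    def define_measure(self, name, **kwargs):
        # Assumption: explicit arguments override the shared defaults.
        merged = {**(self._defaults or {}), **kwargs}
        missing = [key for key in self.REQUIRED if key not in merged]
        if missing:
            raise ValueError(f"missing arguments for {name}: {missing}")
        self.measures.append({"name": name, **merged})


toy = ToyMeasures()
toy.define_defaults(intervals="six months from 2020-01-01")
# Neither measure repeats the intervals argument.
toy.define_measure("adult_proportion", numerator="is_adult", denominator="exists")
toy.define_measure("child_proportion", numerator="is_child", denominator="exists")

try:
    toy.define_defaults(intervals="another set")
    second_call_failed = False
except RuntimeError:
    second_call_failed = True
```

The strings stand in for real ehrQL definitions; the point is only that both measures pick up the shared intervals value and that the second define_defaults call is rejected.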
Configure the dummy data to be generated.
population_size
Maximum number of patients to generate.
Note that you may get fewer patients than this if the generator runs out of time
(see timeout below).
legacy
Use legacy dummy data.
timeout
Maximum time in seconds to spend generating dummy data.
additional_population_constraint
An additional ehrQL query that can be used to constrain the population that will
be selected for dummy data. This is incompatible with legacy mode.
For example, if you wanted to ensure that two dates appear in a particular order in your
dummy data, you could add
additional_population_constraint = dataset.first_date < dataset.second_date.
You can also combine constraints with & as normal in ehrQL. For example,
additional_population_constraint = patients.sex.is_in(['male', 'female']) & (patients.age_on(some_date) < 80)
would give you dummy data consisting only of men and women who were under the
age of 80 on some particular date.
Example usage:
measures.configure_dummy_data(population_size=10000)
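The interaction between population_size and timeout described above can be pictured as a time-bounded loop. This is a plain-Python sketch under that assumption, not the actual dummy data generator; the function generate_dummy_patients and the shape of the generated records are hypothetical.

```python
import time


def generate_dummy_patients(population_size, timeout):
    # Stop at whichever limit is reached first: the requested number
    # of patients or the time budget (in seconds).
    deadline = time.monotonic() + timeout
    generated = []
    while len(generated) < population_size and time.monotonic() < deadline:
        generated.append({"patient_id": len(generated) + 1})
    return generated


full = generate_dummy_patients(population_size=100, timeout=60)
empty = generate_dummy_patients(population_size=100, timeout=0)
print(len(full), len(empty))
```

With a generous time budget the loop reaches population_size; with no time budget it returns fewer patients than requested.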
Configure disclosure control.
By default, numerators and denominators are subject to disclosure control.
First, values less than or equal to seven are replaced with zero (suppressed);
then, values are rounded to the nearest five.
To disable disclosure control:
measures.configure_disclosure_control(enabled=False)
For more information about disclosure control in OpenSAFELY, please see the
"Updated disclosure control guidance" page.
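The two default steps described above (suppress, then round) can be written as a short function. This is a sketch of the stated rule for non-negative integer counts, not the ehrQL implementation, and the function name apply_disclosure_control is hypothetical.

```python
def apply_disclosure_control(value):
    # Step 1: values less than or equal to seven are suppressed to zero.
    if value <= 7:
        return 0
    # Step 2: round to the nearest multiple of five.
    # For integers there are no ties, so adding 2 and flooring is enough.
    return 5 * ((value + 2) // 5)


raw_counts = [0, 7, 8, 12, 13, 100]
controlled = [apply_disclosure_control(count) for count in raw_counts]
print(controlled)  # [0, 0, 10, 10, 15, 100]
```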
This is a placeholder value to be used when defining numerator, denominator and
group_by columns in a measure. This allows these definitions to be written once and
then be automatically evaluated over multiple different intervals. Can be used just
like any pair of dates in ehrQL.
Example usage:
clinical_events.date.is_during(INTERVAL)
Placeholder for the start date (inclusive) of the interval. Can be used like any other
date.
Example usage:
clinical_events.date.is_on_or_after(INTERVAL.start_date)
Placeholder for the end date (inclusive) of the interval. Can be used like any other
date.
Example usage:
clinical_events.date.is_on_or_before(INTERVAL.end_date)
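To show what "written once, evaluated over multiple intervals" means in practice, the plain-Python sketch below applies a single definition to two intervals in turn. It does not use ehrQL: the event dates are made up for illustration, the function is_during is a hypothetical stand-in, and both interval endpoints are treated as inclusive, as described above.

```python
from datetime import date

# Made-up event dates for a single patient, for illustration only.
event_dates = [
    date(2020, 1, 1),
    date(2020, 1, 31),
    date(2020, 2, 1),
    date(2020, 3, 15),
]


def is_during(event_date, interval):
    # Both the start date and the end date are inclusive.
    start_date, end_date = interval
    return start_date <= event_date <= end_date


intervals = [
    (date(2020, 1, 1), date(2020, 1, 31)),
    (date(2020, 2, 1), date(2020, 2, 29)),
]

# The same definition is evaluated once per interval.
counts = [
    sum(is_during(event_date, interval) for event_date in event_dates)
    for interval in intervals
]
print(counts)  # [2, 1]
```

Events falling exactly on a start or end date are counted, and the event on 2020-03-15 falls outside both intervals.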