
create_dataset()

A dataset defines the patients you want to include in your population and the
variables you want to extract for them.

A dataset definition file must define a dataset called dataset:

from ehrql import create_dataset

dataset = create_dataset()

Add variables to the dataset as attributes:

from ehrql.tables.core import patients

dataset.age = patients.age_on("2020-01-01")


class Dataset()

To create a dataset, use the create_dataset() function.

define_population(population_condition)

Define the condition that patients must meet to be included in the Dataset, in
the form of a boolean patient series.

Example usage:

dataset.define_population(patients.date_of_birth < "1990-01-01")

For more detail, see the how-to guide on defining populations.
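The population condition works as a per-patient filter: a patient is included only if the boolean condition is true for them. The plain-Python sketch below illustrates that idea on a hypothetical list of patient records (PATIENTS, select_population and born_before_1990 are invented names). It is an analogy only, not how ehrQL evaluates queries internally.

```python
from datetime import date

# Hypothetical patient records standing in for the patients table.
PATIENTS = [
    {"patient_id": 1, "date_of_birth": date(1985, 6, 15)},
    {"patient_id": 2, "date_of_birth": date(1992, 3, 1)},
    {"patient_id": 3, "date_of_birth": date(1990, 1, 1)},
]


def select_population(patients, condition):
    """Keep only the patients for whom the boolean condition is True."""
    return [p for p in patients if condition(p)]


# Plain-Python analogue of: patients.date_of_birth < "1990-01-01"
born_before_1990 = lambda p: p["date_of_birth"] < date(1990, 1, 1)

population = select_population(PATIENTS, born_before_1990)
print([p["patient_id"] for p in population])  # [1]
```

Patient 3, born on 1990-01-01 itself, is excluded because < is a strict comparison.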

add_column(column_name, ehrql_query)

Add a column to the dataset.

column_name

The name of the new column, as a string.

ehrql_query

An ehrQL query that returns one row per patient.

Example usage:

dataset.add_column("age", patients.age_on("2020-01-01"))

Using .add_column is equivalent to attribute assignment with = when adding a single column, but because the column name is passed as a string, it can also be used to add multiple columns programmatically, for example by iterating over a dictionary. For more details, see the guide "How to assign multiple columns to a dataset programmatically".
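The equivalence between attribute assignment and add_column, and the reason add_column is convenient inside a loop, can be illustrated with a small stand-in class. ToyDataset and the reference_dates dictionary are hypothetical; in real ehrQL code the column values would be ehrQL queries such as patients.age_on(...), not the plain strings used here.

```python
class ToyDataset:
    """Hypothetical stand-in for an ehrQL Dataset that only records columns."""

    def __init__(self):
        # Bypass our own __setattr__ so the internal store is not a column.
        object.__setattr__(self, "columns", {})

    def __setattr__(self, name, value):
        # Attribute assignment (dataset.age = ...) registers a column.
        self.add_column(name, value)

    def add_column(self, column_name, query):
        self.columns[column_name] = query


dataset = ToyDataset()
dataset.age = "age_on(2020-01-01)"  # same effect as add_column("age", ...)

# add_column takes the name as a string, so names can be built in a loop.
reference_dates = {"age_2021": "2021-01-01", "age_2022": "2022-01-01"}
for name, ref_date in reference_dates.items():
    dataset.add_column(name, f"age_on({ref_date})")

print(list(dataset.columns))  # ['age', 'age_2021', 'age_2022']
```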

configure_dummy_data(population_size=10, legacy=False, timeout=60, additional_population_constraint=None)

Configure the dummy data to be generated.

population_size

Maximum number of patients to generate.

Note that you may get fewer patients than this if the generator runs out of time
(see timeout below).

legacy

Use legacy dummy data.

timeout

Maximum time in seconds to spend generating dummy data.
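The interaction between population_size and timeout can be pictured as a loop that keeps producing candidate patients until it either has enough of them or runs out of time. The sketch below illustrates only that stopping rule; generate_dummy_patients, make_candidate and the accept predicate (standing in for whatever conditions generated patients must satisfy) are hypothetical names, and ehrQL's real generator is not implemented this way.

```python
import random
import time


def generate_dummy_patients(make_candidate, accept, population_size=10, timeout=60.0):
    """Return up to population_size accepted candidates, stopping after timeout seconds."""
    deadline = time.monotonic() + timeout
    patients = []
    while len(patients) < population_size and time.monotonic() < deadline:
        candidate = make_candidate()
        if accept(candidate):
            patients.append(candidate)
    return patients


rng = random.Random(0)
make_candidate = lambda: {"age": rng.randint(0, 110)}

# Easy-to-satisfy condition: the size limit is reached first.
print(len(generate_dummy_patients(make_candidate, lambda p: True, population_size=5)))  # 5

# Impossible condition: the timeout is reached first, so fewer patients come back.
print(len(generate_dummy_patients(make_candidate, lambda p: p["age"] > 200, timeout=0.1)))  # 0
```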

additional_population_constraint

An additional ehrQL query that can be used to constrain the population that will
be selected for dummy data. This is incompatible with legacy mode.

For example, if you wanted to ensure that two dates appear in a particular order in your
dummy data, you could add additional_population_constraint = dataset.first_date < dataset.second_date.

You can also combine constraints with &, as usual in ehrQL. For example,
additional_population_constraint = patients.sex.is_in(['male', 'female']) & (patients.age_on(some_date) < 80) would give you dummy data consisting only of men
and women who were under the age of 80 on some_date.
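Combining conditions with & means a patient must satisfy both parts. The plain-Python sketch below mimics that behaviour with a hypothetical Condition class that overloads &, and a hypothetical age_on helper; neither is part of ehrQL, and some_date is fixed to 2020-01-01 purely for illustration.

```python
from datetime import date


class Condition:
    """Hypothetical per-patient boolean condition that supports & like ehrQL series."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __and__(self, other):
        # The combined condition holds only when both conditions hold.
        return Condition(lambda p: self.predicate(p) and other.predicate(p))

    def __call__(self, patient):
        return self.predicate(patient)


def age_on(patient, on_date):
    """Hypothetical helper: age in whole years on on_date."""
    dob = patient["date_of_birth"]
    # Subtract one if the birthday has not yet occurred in on_date's year.
    return on_date.year - dob.year - ((on_date.month, on_date.day) < (dob.month, dob.day))


some_date = date(2020, 1, 1)
sex_is_male_or_female = Condition(lambda p: p["sex"] in ("male", "female"))
under_80 = Condition(lambda p: age_on(p, some_date) < 80)
constraint = sex_is_male_or_female & under_80

candidates = [
    {"sex": "female", "date_of_birth": date(1950, 5, 1)},   # aged 69: kept
    {"sex": "male", "date_of_birth": date(1930, 5, 1)},     # aged 89: too old
    {"sex": "unknown", "date_of_birth": date(1980, 5, 1)},  # sex not in list
]
print([constraint(p) for p in candidates])  # [True, False, False]
```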

Example usage:

dataset.configure_dummy_data(population_size=10000)