b/docs/how-to/define-population.md

## Basic population definition

You specify the patients you want to include in your dataset using the
`define_population()` method. For example, to include all patients born
in 1950 you would write something like this:
```ehrql
from ehrql import create_dataset
from ehrql.tables.core import patients

dataset = create_dataset()
dataset.define_population(patients.date_of_birth.year == 1950)
```

## Combining multiple inclusion criteria

You can combine multiple inclusion criteria using the [logical operators](../reference/language.md#BoolPatientSeries.and):

* `&` (and)
* `|` (or)
* `~` (not)

For example, to include just men born in 1950 you could use the `&`
operator. This says that patients must match the date of birth criterion
**and** the sex criterion.
```python
dataset.define_population(
    (patients.date_of_birth.year == 1950) & (patients.sex == "male")
)
```

Similarly, you could include just women born in 1960 using:
```python
dataset.define_population(
    (patients.date_of_birth.year == 1960) & (patients.sex == "female")
)
```

To combine these populations and include both men born in 1950
and women born in 1960 you could use the `|` operator. This says that
patients must match **either** the first condition **or** the second:
```python
dataset.define_population(
    ((patients.date_of_birth.year == 1950) & (patients.sex == "male"))
    | ((patients.date_of_birth.year == 1960) & (patients.sex == "female"))
)
```
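
To make the `&` and `|` semantics concrete, here is a minimal plain-Python sketch of the same selection logic. It is not ehrQL: the `example_patients` list, its field names, and the `in_population` helper are hypothetical, and Python's `and`/`or` stand in for ehrQL's `&`/`|`.

```python
# Hypothetical patient records, for illustration only. In ehrQL these
# values would come from the `patients` table, not a Python list.
example_patients = [
    {"id": 1, "birth_year": 1950, "sex": "male"},
    {"id": 2, "birth_year": 1950, "sex": "female"},
    {"id": 3, "birth_year": 1960, "sex": "female"},
    {"id": 4, "birth_year": 1960, "sex": "male"},
]


def in_population(patient):
    """Return True for men born in 1950 or women born in 1960."""
    men_1950 = (patient["birth_year"] == 1950) and (patient["sex"] == "male")
    women_1960 = (patient["birth_year"] == 1960) and (patient["sex"] == "female")
    return men_1950 or women_1960


selected_ids = [p["id"] for p in example_patients if in_population(p)]
print(selected_ids)  # [1, 3]
```

Patients 2 and 4 each match one half of a criterion (the right year or the right sex) but not a whole one, so neither is selected.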

!!! note "What's with all the parentheses?"

    ehrQL requires more parentheses around logical operators than you
    may be used to from other languages. This is a side-effect of the
    way the Python language (in which ehrQL is implemented) happens to
    work, but it is good practice in any case to be explicit about how
    you expect logical operations to be grouped together. If you miss
    out any required parentheses ehrQL should give you an error message
    explaining how to fix your code.
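
The grouping issue can be seen with Python's standard `ast` module, which shows how Python parses an expression before any ehrQL code runs. In this sketch, `year` and `sex` are placeholder names rather than ehrQL objects, the helper functions are hypothetical, and `ast.unparse` assumes Python 3.9 or later.

```python
import ast


def top_level_node(expression):
    """Return the name of the outermost AST node Python builds for `expression`."""
    return type(ast.parse(expression, mode="eval").body).__name__


def first_comparator(expression):
    """Return the source of the first right-hand operand in a comparison chain."""
    node = ast.parse(expression, mode="eval").body
    return ast.unparse(node.comparators[0])


with_parens = '(year == 1950) & (sex == "male")'
without_parens = 'year == 1950 & sex == "male"'

# With parentheses, `&` is the outermost operation, joining two comparisons.
print(top_level_node(with_parens))  # BinOp

# Without them, `&` binds more tightly than `==`, so the outermost node is a
# chained comparison whose middle operand is `1950 & sex`.
print(top_level_node(without_parens))  # Compare
print(first_comparator(without_parens))  # 1950 & sex
```

Without parentheses, Python groups `1950 & sex` first and then chains the two `==` comparisons, which is not the intended condition.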

## Excluding patients from your dataset

To exclude patients matching a certain condition from your dataset you
can use the "and not" pattern. That is, you can write your population
definition in the form:

    inclusion_criteria & ~exclusion_criteria

For example, to include patients born in 1950 and exclude patients
who died before 2020 you could write:
```python
dataset.define_population(
    (patients.date_of_birth.year == 1950) & ~(patients.date_of_death.year < 2020)
)
```
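
One way to picture the "and not" pattern is as a set difference. The sketch below uses plain Python sets of hypothetical patient IDs rather than ehrQL; it only illustrates which patients remain once the exclusion is applied.

```python
# Hypothetical patient IDs, for illustration only (not ehrQL).
born_in_1950 = {101, 102, 103, 104}
died_before_2020 = {102, 104, 999}

# "Inclusion and not exclusion": keep IDs that satisfy the inclusion
# criterion and do not satisfy the exclusion criterion.
population = {pid for pid in born_in_1950 if pid not in died_before_2020}

# The same result expressed as a set difference.
assert population == born_in_1950 - died_before_2020

print(sorted(population))  # [101, 103]
```

ID 999 matches the exclusion criterion but was never included, so it has no effect on the result.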