a b/docs/how-to/define-population.md
## Basic population definition

You specify the patients you want to include in your dataset using the
`define_population()` method. For example, to include all patients born
in 1950 you would write something like this:
```ehrql
from ehrql import create_dataset
from ehrql.tables.core import patients

dataset = create_dataset()
dataset.define_population(patients.date_of_birth.year == 1950)
```

## Combining multiple inclusion criteria

You can combine multiple inclusion criteria using the [logical operators](../reference/language.md#BoolPatientSeries.and):

 * `&` (and)
 * `|` (or)
 * `~` (not)

For example, to include just men born in 1950 you could use the `&`
operator. This says that patients must match the date of birth criterion
**and** the sex criterion.
```python
dataset.define_population(
    (patients.date_of_birth.year == 1950) & (patients.sex == "male")
)
```

Similarly, you could include just women born in 1960 using:
```python
dataset.define_population(
    (patients.date_of_birth.year == 1960) & (patients.sex == "female")
)
```
38
39
To combine these populations together and include both men born in 1950
40
and women born in 1960 you could use the `|` operator. This says that
41
patients must match **either** the first condition **or** the second:
42
```python
dataset.define_population(
    ((patients.date_of_birth.year == 1950) & (patients.sex == "male"))
    | ((patients.date_of_birth.year == 1960) & (patients.sex == "female"))
)
```
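To see which patients a combined condition like this selects, it can help to evaluate the same logic by hand on a few records. The following is a plain-Python sketch, not ehrQL: the `Patient` class, the `in_population()` helper and the four example records are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical stand-in for one row of patient data (not an ehrQL type)
    patient_id: int
    birth_year: int
    sex: str


def in_population(p: Patient) -> bool:
    # Same grouping as the ehrQL example: (A and B) or (C and D)
    men_born_1950 = (p.birth_year == 1950) and (p.sex == "male")
    women_born_1960 = (p.birth_year == 1960) and (p.sex == "female")
    return men_born_1950 or women_born_1960


# Made-up records covering each combination of year and sex
example_patients = [
    Patient(1, 1950, "male"),
    Patient(2, 1950, "female"),
    Patient(3, 1960, "female"),
    Patient(4, 1960, "male"),
]

selected_ids = [p.patient_id for p in example_patients if in_population(p)]
print(selected_ids)  # [1, 3]
```

Only patients 1 and 3 satisfy one of the two grouped conditions, so only they are selected.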

!!! note "What's with all the parentheses?"

    ehrQL requires more parentheses around logical operators than you
    may be used to from other languages. This is a side-effect of the
    way the Python language (in which ehrQL is implemented) happens to
    work, but it is good practice in any case to be explicit about how
    you expect logical operations to be grouped together. If you miss
    out any required parentheses ehrQL should give you an error message
    explaining how to fix your code.
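The underlying reason is that in Python `&` binds more tightly than `==`. The sketch below uses only the standard-library `ast` module to show how Python groups an expression with and without the parentheses. The helper name `top_level_operation` and the variable names `year` and `sex` are illustrative assumptions, and nothing here calls ehrQL itself.

```python
import ast


def top_level_operation(expression: str) -> str:
    """Return the name of the outermost AST node Python builds for an expression."""
    return type(ast.parse(expression, mode="eval").body).__name__


unparenthesised = 'year == 1950 & sex == "male"'
parenthesised = '(year == 1950) & (sex == "male")'

# Without parentheses, Python evaluates `1950 & sex` first and then treats
# the whole thing as a chained comparison: year == (1950 & sex) == "male"
print(top_level_operation(unparenthesised))  # Compare

# With parentheses, the outermost operation is the `&` joining two comparisons
print(top_level_operation(parenthesised))  # BinOp
```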

## Excluding patients from your dataset

To exclude patients matching a certain condition from your dataset you
can use the "and not" pattern. That is, you can write your population
definition in the form:

    inclusion_criteria & ~exclusion_criteria

For example, to include patients born in 1950 and exclude patients
who died before 2020 you could write:
```python
dataset.define_population(
    (patients.date_of_birth.year == 1950) & ~(patients.date_of_death.year < 2020)
)
```
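One way to picture the "and not" pattern is as a set difference: start with everyone who matches the inclusion criteria, then remove anyone who also matches the exclusion criteria. The following plain-Python sketch uses made-up patient IDs and is not ehrQL. It models simple true/false membership only, so it does not show how ehrQL treats missing values.

```python
# Hypothetical patient IDs matching each criterion (illustrative only)
inclusion = {1, 2, 3, 4}  # e.g. born in 1950
exclusion = {2, 4, 9}     # e.g. died before 2020

# "inclusion and not exclusion" keeps IDs in the first set but not the second
population = {pid for pid in inclusion if pid not in exclusion}

print(sorted(population))  # [1, 3]
```

Patient 9 matches the exclusion criteria but was never included, so removing them has no effect on the result.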