Switch to unified view

a b/docs/reference/language.md
1
## Dataset
2
3
---8<-- 'includes/generated_docs/language__dataset.md'
4
5
---
6
7
8
## Frames
9
10
Frames are the starting point for building any query in ehrQL. You can
11
think of a Frame as being like a table in a database, in that it
12
contains multiple rows and multiple columns. But a Frame can have
13
operations applied to it like filtering or sorting to produce a new
14
Frame.
15
16
You don't need to define any Frames yourself. Instead you import them
17
from the various [schemas](schemas.md) available in `ehrql.tables` e.g.
18
```py
19
from ehrql.tables.core import patients
20
```
21
22
Frames have columns which you can access as attributes on the Frame e.g.
23
```py
24
dob = patients.date_of_birth
25
```
26
27
The [schema](schemas.md) documentation contains the full list of
28
available columns for each Frame. For example, see
29
[`ehrql.tables.core.patients`](schemas/core.md/#patients).
30
31
Accessing a column attribute on a Frame produces a [Series](#series),
32
which are documented elsewhere below.
33
34
Some Frames contain at most one row per patient, we call these
35
PatientFrames; others can contain multiple rows per patient, we call
36
these EventFrames.
37
38
---8<-- 'includes/generated_docs/language__frames.md'
39
40
---
41
42
43
## Series
44
45
A Series represents a column of values of a certain type. Some Series
46
contain at most one value per patient, we call these PatientSeries;
47
others can contain multiple values per patient, we call these
48
EventSeries. Values can be NULL (i.e. missing) but a Series can never
49
mix values of different types.
50
51
---8<-- 'includes/generated_docs/language__series.md'
52
53
---
54
55
56
## Date Arithmetic
57
58
ehrQL supports adding and subtracting durations from dates e.g.
59
`date_of_admission + days(10)` or `date_of_discharge - weeks(6)`. It
60
also supports finding the difference between dates e.g.
61
```py
62
days_in_hospital = (date_of_discharge - date_of_admission).days
63
```
64
65
When working with dates you should generally prefer using days or weeks
66
as units, rather than months or years, unless there's a specific reason
67
you care about the calendar. Adding or subtracting days or weeks to a
68
date is a simple, unambiguous process; adding years or months introduces
69
ambiguities and complexities as neither unit is a consistent length (see
70
[section below](#ambiguous-dates) for details). So, for example, unless
71
there is specific epidemiological significance to whether an event
72
happened exactly three calendar months ago, it is better to say "90
73
days" rather than "3 months".
74
75
Additionally, some dates in the patient data (e.g. dates of birth) have
76
been rounded to the first of the month for privacy purposes. It's
77
pointless applying precise calendar arithmetic to such dates.
78
79
**Tip**: it can be clearer to readers to write time periods as a product
80
rather than just a number e.g. it's easier to see that `days(5 * 365)`
81
is approximately five years than it is with `days(1825)`.
82
83
84
#### Ambiguous Dates
85
86
Adding years or months to a date sometimes has a clear answer e.g. 1
87
January 2001 is exactly one calendar year later than 1 January 2000, and
88
1 February is exactly one calendar month later. But not all cases are
89
clear, for instance: which day is exactly one year after 29 February
90
2000? There is no 29 February 2001, so should it be 28 February or 1
91
March? And which day is exactly one calendar month after 31 August?
92
There is no 31 September, so should it be 30 September or 1 October?
93
Different databases give different answers here.
94
95
ehrQL takes the approach that we consistently round *up* to the next
96
date. That is, whenever the naïve calculation takes us to a date which
97
doesn't exist (like 29 February 2001 or 31 September) we always take the
98
next *largest* date (1 March 2001, or 1 October).
99
100
As different databases take different approaches here, ehrQL has to go
101
to some lengths to ensure that we get the same results on every database
102
we support. This introduces complexity into the generated SQL which may
103
have performance costs in some circumstances. This is another reason to
104
prefer unambiguous units of days and weeks where possible.
105
106
---8<-- 'includes/generated_docs/language__date_arithmetic.md'
107
108
---
109
110
111
## Codelists
112
113
---8<-- 'includes/generated_docs/language__codelists.md'
114
115
---
116
117
118
## Functions
119
120
---8<-- 'includes/generated_docs/language__functions.md'
121
122
---
123
124
125
## Measures
126
127
Measures are used for calculating the ratio of one quantity to another
128
as it varies over time, and broken down by different demographic (or
129
other) groupings.
130
131
---8<-- 'includes/generated_docs/language__measures.md'