|
a |
|
b/docs/reference/language.md |
|
|
1 |
## Dataset |
|
|
2 |
|
|
|
3 |
---8<-- 'includes/generated_docs/language__dataset.md' |
|
|
4 |
|
|
|
5 |
--- |
|
|
6 |
|
|
|
7 |
|
|
|
8 |
## Frames |
|
|
9 |
|
|
|
10 |
Frames are the starting point for building any query in ehrQL. You can |
|
|
11 |
think of a Frame as being like a table in a database, in that it |
|
|
12 |
contains multiple rows and multiple columns. But a Frame can have |
|
|
13 |
operations applied to it like filtering or sorting to produce a new |
|
|
14 |
Frame. |
|
|
15 |
|
|
|
16 |
You don't need to define any Frames yourself. Instead you import them |
|
|
17 |
from the various [schemas](schemas.md) available in `ehrql.tables` e.g. |
|
|
18 |
```py |
|
|
19 |
from ehrql.tables.core import patients |
|
|
20 |
``` |
|
|
21 |
|
|
|
22 |
Frames have columns which you can access as attributes on the Frame e.g. |
|
|
23 |
```py |
|
|
24 |
dob = patients.date_of_birth |
|
|
25 |
``` |
|
|
26 |
|
|
|
27 |
The [schema](schemas.md) documentation contains the full list of |
|
|
28 |
available columns for each Frame. For example, see |
|
|
29 |
[`ehrql.tables.core.patients`](schemas/core.md/#patients). |
|
|
30 |
|
|
|
31 |
Accessing a column attribute on a Frame produces a [Series](#series), |
|
|
32 |
which are documented elsewhere below. |
|
|
33 |
|
|
|
34 |
Some Frames contain at most one row per patient, we call these |
|
|
35 |
PatientFrames; others can contain multiple rows per patient, we call |
|
|
36 |
these EventFrames. |
|
|
37 |
|
|
|
38 |
---8<-- 'includes/generated_docs/language__frames.md' |
|
|
39 |
|
|
|
40 |
--- |
|
|
41 |
|
|
|
42 |
|
|
|
43 |
## Series |
|
|
44 |
|
|
|
45 |
A Series represents a column of values of a certain type. Some Series |
|
|
46 |
contain at most one value per patient, we call these PatientSeries; |
|
|
47 |
others can contain multiple values per patient, we call these |
|
|
48 |
EventSeries. Values can be NULL (i.e. missing) but a Series can never |
|
|
49 |
mix values of different types. |
|
|
50 |
|
|
|
51 |
---8<-- 'includes/generated_docs/language__series.md' |
|
|
52 |
|
|
|
53 |
--- |
|
|
54 |
|
|
|
55 |
|
|
|
56 |
## Date Arithmetic |
|
|
57 |
|
|
|
58 |
ehrQL supports adding and subtracting durations from dates e.g. |
|
|
59 |
`date_of_admission + days(10)` or `date_of_discharge - weeks(6)`. It |
|
|
60 |
also supports finding the difference between dates e.g. |
|
|
61 |
```py |
|
|
62 |
days_in_hospital = (date_of_discharge - date_of_admission).days |
|
|
63 |
``` |
|
|
64 |
|
|
|
65 |
When working with dates you should generally prefer using days or weeks |
|
|
66 |
as units, rather than months or years, unless there's a specific reason |
|
|
67 |
you care about the calendar. Adding or subtracting days or weeks to a |
|
|
68 |
date is a simple, unambiguous process; adding years or months introduces |
|
|
69 |
ambiguities and complexities as neither unit is a consistent length (see |
|
|
70 |
[section below](#ambiguous-dates) for details). So, for example, unless |
|
|
71 |
there is specific epidemiological significance to whether an event |
|
|
72 |
happened exactly three calendar months ago, it is better to say "90 |
|
|
73 |
days" rather than "3 months". |
|
|
74 |
|
|
|
75 |
Additionally, some dates in the patient data (e.g. dates of birth) have |
|
|
76 |
been rounded to the first of the month for privacy purposes. It's |
|
|
77 |
pointless applying precise calendar arithmetic to such dates. |
|
|
78 |
|
|
|
79 |
**Tip**: it can be clearer to readers to write time periods as a product |
|
|
80 |
rather than just a number e.g. it's easier to see that `days(5 * 365)` |
|
|
81 |
is approximately five years than it is with `days(1825)`. |
|
|
82 |
|
|
|
83 |
|
|
|
84 |
#### Ambiguous Dates |
|
|
85 |
|
|
|
86 |
Adding years or months to a date sometimes has a clear answer e.g. 1 |
|
|
87 |
January 2001 is exactly one calendar year later than 1 January 2000, and |
|
|
88 |
1 February is exactly one calendar month later. But not all cases are |
|
|
89 |
clear, for instance: which day is exactly one year after 29 February |
|
|
90 |
2000? There is no 29 February 2001, so should it be 28 February or 1 |
|
|
91 |
March? And which day is exactly one calendar month after 31 August? |
|
|
92 |
There is no 31 September, so should it be 30 September or 1 October? |
|
|
93 |
Different databases give different answers here. |
|
|
94 |
|
|
|
95 |
ehrQL takes the approach that we consistently round *up* to the next |
|
|
96 |
date. That is, whenever the naïve calculation takes us to a date which |
|
|
97 |
doesn't exist (like 29 February 2001 or 31 September) we always take the |
|
|
98 |
next *largest* date (1 March 2001, or 1 October). |
|
|
99 |
|
|
|
100 |
As different databases take different approaches here, ehrQL has to go |
|
|
101 |
to some lengths to ensure that we get the same results on every database |
|
|
102 |
we support. This introduces complexity into the generated SQL which may |
|
|
103 |
have performance costs in some circumstances. This is another reason to |
|
|
104 |
prefer unambiguous units of days and weeks where possible. |
|
|
105 |
|
|
|
106 |
---8<-- 'includes/generated_docs/language__date_arithmetic.md' |
|
|
107 |
|
|
|
108 |
--- |
|
|
109 |
|
|
|
110 |
|
|
|
111 |
## Codelists |
|
|
112 |
|
|
|
113 |
---8<-- 'includes/generated_docs/language__codelists.md' |
|
|
114 |
|
|
|
115 |
--- |
|
|
116 |
|
|
|
117 |
|
|
|
118 |
## Functions |
|
|
119 |
|
|
|
120 |
---8<-- 'includes/generated_docs/language__functions.md' |
|
|
121 |
|
|
|
122 |
--- |
|
|
123 |
|
|
|
124 |
|
|
|
125 |
## Measures |
|
|
126 |
|
|
|
127 |
Measures are used for calculating the ratio of one quantity to another |
|
|
128 |
as it varies over time, and broken down by different demographic (or |
|
|
129 |
other) groupings. |
|
|
130 |
|
|
|
131 |
---8<-- 'includes/generated_docs/language__measures.md' |