ehrql / Git / Diff of /docs/reference/language.md

Models:

philipB/

ehrql

Downloads: 1

Diff of /docs/reference/language.md [000000] .. [e988c2]

Switch to unified view

 b/docs/reference/language.md
+## Dataset
+---8<-- 'includes/generated_docs/language__dataset.md'
+---
+## Frames
+Frames are the starting point for building any query in ehrQL. You can
+think of a Frame as being like a table in a database, in that it
+contains multiple rows and multiple columns. But a Frame can have
+operations applied to it like filtering or sorting to produce a new
+Frame.
+You don't need to define any Frames yourself. Instead you import them
+from the various [schemas](schemas.md) available in `ehrql.tables` e.g.
+```py
+from ehrql.tables.core import patients
+```
+Frames have columns which you can access as attributes on the Frame e.g.
+```py
+dob = patients.date_of_birth
+```
+The [schema](schemas.md) documentation contains the full list of
+available columns for each Frame. For example, see
+[`ehrql.tables.core.patients`](schemas/core.md/#patients).
+Accessing a column attribute on a Frame produces a [Series](#series),
+which are documented elsewhere below.
+Some Frames contain at most one row per patient, we call these
+PatientFrames; others can contain multiple rows per patient, we call
+these EventFrames.
+---8<-- 'includes/generated_docs/language__frames.md'
+---
+## Series
+A Series represents a column of values of a certain type. Some Series
+contain at most one value per patient, we call these PatientSeries;
+others can contain multiple values per patient, we call these
+EventSeries. Values can be NULL (i.e. missing) but a Series can never
+mix values of different types.
+---8<-- 'includes/generated_docs/language__series.md'
+---
+## Date Arithmetic
+ehrQL supports adding and subtracting durations from dates e.g.
+`date_of_admission + days(10)` or `date_of_discharge - weeks(6)`. It
+also supports finding the difference between dates e.g.
+```py
+days_in_hospital = (date_of_discharge - date_of_admission).days
+```
+When working with dates you should generally prefer using days or weeks
+as units, rather than months or years, unless there's a specific reason
+you care about the calendar. Adding or subtracting days or weeks to a
+date is a simple, unambiguous process; adding years or months introduces
+ambiguities and complexities as neither unit is a consistent length (see
+[section below](#ambiguous-dates) for details). So, for example, unless
+there is specific epidemiological significance to whether an event
+happened exactly three calendar months ago, it is better to say "90
+days" rather than "3 months".
+Additionally, some dates in the patient data (e.g. dates of birth) have
+been rounded to the first of the month for privacy purposes. It's
+pointless applying precise calendar arithmetic to such dates.
+**Tip**: it can be clearer to readers to write time periods as a product
+rather than just a number e.g. it's easier to see that `days(5 * 365)`
+is approximately five years than it is with `days(1825)`.
+#### Ambiguous Dates
+Adding years or months to a date sometimes has a clear answer e.g. 1
+January 2001 is exactly one calendar year later than 1 January 2000, and
+February is exactly one calendar month later. But not all cases are
+clear, for instance: which day is exactly one year after 29 February
+? There is no 29 February 2001, so should it be 28 February or 1
+March? And which day is exactly one calendar month after 31 August?
+There is no 31 September, so should it be 30 September or 1 October?
+Different databases give different answers here.
+ehrQL takes the approach that we consistently round *up* to the next
+date. That is, whenever the naïve calculation takes us to a date which
+doesn't exist (like 29 February 2001 or 31 September) we always take the
+next *largest* date (1 March 2001, or 1 October).
+As different databases take different approaches here, ehrQL has to go
+to some lengths to ensure that we get the same results on every database
+we support. This introduces complexity into the generated SQL which may
+have performance costs in some circumstances. This is another reason to
+prefer unambiguous units of days and weeks where possible.
+---8<-- 'includes/generated_docs/language__date_arithmetic.md'
+---
+## Codelists
+---8<-- 'includes/generated_docs/language__codelists.md'
+---
+## Functions
+---8<-- 'includes/generated_docs/language__functions.md'
+---
+## Measures
+Measures are used for calculating the ratio of one quantity to another
+as it varies over time, and broken down by different demographic (or
+other) groupings.
+---8<-- 'includes/generated_docs/language__measures.md'