# Detecting dates

We now know how to match a terminology and qualify detected entities, which covers most use cases for a typical medical NLP project.
In this tutorial, we'll see how to use EDS-NLP to detect and normalise date mentions using `eds.dates`.

Date detection has many applications, in particular for dating medical events.
The `eds.consultation_dates` component, for instance,
combines the date detection capabilities with a few simple patterns to detect the date of the consultation, when mentioned in clinical reports.

## Dates in clinical notes

Consider the following example:

=== "French"

    ```
    Le patient est admis le 21 janvier pour une douleur dans le cou.
    Il se plaint d'une douleur chronique qui a débuté il y a trois ans.
    ```

=== "English"

    ```
    The patient is admitted on January 21st for neck pain.
    He complains about chronic pain that started three years ago.
    ```

Clinical notes contain many different types of dates. To name a few examples:

| Type     | Description                          | Examples                                         |
| -------- | ------------------------------------ | ------------------------------------------------ |
| Absolute | Explicit date                        | `2022-03-03`                                     |
| Partial  | Date missing the day, month or year  | `le 3 janvier/on January 3rd`, `en 2021/in 2021` |
| Relative | Date given relative to another date  | `hier/yesterday`, `le mois dernier/last month`   |
| Duration | Length of time                       | `pendant trois mois/for three months`            |

!!! warning

    The English example is only there to illustrate the issue.
    EDS-NLP remains a **French-language** medical NLP library.

## Extracting dates

The following snippet adds the `eds.dates` component to the pipeline:

```python
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.dates())  # (1)

text = (
    "Le patient est admis le 21 janvier pour une douleur dans le cou.\n"
    "Il se plaint d'une douleur chronique qui a débuté il y a trois ans."
)

# Detecting dates becomes trivial
doc = nlp(text)

# Likewise, accessing detected dates is hassle-free
dates = doc.spans["dates"]  # (2)
```

1. The date detection component is declared with `eds.dates`.
2. Dates are saved in the `#!python doc.spans["dates"]` key.

After this, accessing dates and their normalisation becomes trivial:

```python
# ↑ Omitted code above ↑

dates  # (1)
# Out: [21 janvier, il y a trois ans]
```

1. `dates` is a list of spaCy `Span` objects.

## Normalisation

We can review each date and get its normalisation:

| `date.text`        | `date._.date`                               |
| ------------------ | ------------------------------------------- |
| `21 janvier`       | `#!python {"day": 21, "month": 1}`          |
| `il y a trois ans` | `#!python {"direction": "past", "year": 3}` |

Dates detected by the pipeline component are parsed into a dictionary-like object.
It includes all the information that is actually contained in the text, and nothing more.

To get a more usable representation, you may call the `to_datetime()` method.
If there is enough information, the date is returned
as a `datetime.datetime` or `datetime.timedelta` object. If some information is missing,
the method returns `None`.
In that case, you can set the `infer_from_context` parameter to `True`,
and optionally provide a `note_datetime`, to let the method fill in the missing information.

!!! note "Date normalisation"

    Since dates can be missing some information (e.g. `en août`), we refrain from
    outputting a `datetime` object in that case. Doing so would amount to guessing,
    and we chose to let you decide how you want to handle incomplete dates.

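To make this trade-off concrete, here is a minimal standard-library sketch of what completing a partial date from context could look like. It does not use EDS-NLP: `complete_partial_date` is a hypothetical helper, and the rule it applies (borrow a missing year from the note date, fall back on caller-supplied defaults for the day and month) is an assumption made for illustration, not a description of how `to_datetime()` is implemented.

```python
from datetime import datetime
from typing import Optional


def complete_partial_date(
    fields: dict,
    note_datetime: Optional[datetime] = None,
    default_day: Optional[int] = None,
    default_month: Optional[int] = None,
) -> Optional[datetime]:
    """Hypothetical helper: build a datetime from partial fields, or return None."""
    # Take the year from the text if present, else from the note date (assumption).
    year = fields.get("year")
    if year is None and note_datetime is not None:
        year = note_datetime.year
    month = fields.get("month", default_month)
    day = fields.get("day", default_day)

    # Refuse to guess: without a complete (year, month, day) triple, return None.
    if year is None or month is None or day is None:
        return None
    return datetime(year, month, day)


note = datetime(2022, 3, 3)

# "21 janvier": no year in the text, and no context to borrow it from.
print(complete_partial_date({"day": 21, "month": 1}))
# Same mention, with the year borrowed from the note date.
print(complete_partial_date({"day": 21, "month": 1}, note_datetime=note))
# "en août": still missing the day unless the caller picks a default.
print(complete_partial_date({"month": 8}, note_datetime=note))
print(complete_partial_date({"month": 8}, note_datetime=note, default_day=15))
```

The point of the sketch is the early `return None`: every completed field is an explicit decision by the caller rather than a silent guess.
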
## What next?

The `eds.dates` pipe component's role is merely to detect and normalise dates.
It is the user's responsibility to use this information in a downstream application.

For instance, you could use this pipeline to date medical entities. Let's do that.

### A medical event tagger

Our pipeline will detect entities and dates separately,
and we will post-process the output `Doc` object to determine
whether a given entity can be linked to a date.

```python
import edsnlp, edsnlp.pipes as eds
from datetime import datetime

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.dates())
nlp.add_pipe(
    eds.matcher(
        regex=dict(admission=["admissions?", "admise?", "prise? en charge"]),
        attr="LOWER",
    )
)

text = (
    "Le patient est admis le 12 avril pour une douleur "
    "survenue il y a trois jours. "
    "Il avait été pris en charge l'année dernière. "
    "Il a été diagnostiqué en mai 1995."
)

doc = nlp(text)
```

At this point, the document is ready to be post-processed: its `ents` and `#!python spans["dates"]` are populated:

```python
# ↑ Omitted code above ↑

doc.ents
# Out: (admis, pris en charge)

doc.spans["dates"]
# Out: [12 avril, il y a trois jours, l'année dernière, mai 1995]

note_datetime = datetime(year=1999, month=8, day=27)

for i, date in enumerate(doc.spans["dates"]):
    print(
        i,
        " - ",
        date,
        " - ",
        date._.date.to_datetime(
            note_datetime=note_datetime, infer_from_context=False, tz=None
        ),
    )
# Out: 0  -  12 avril  -  None
# Out: 1  -  il y a trois jours  -  1999-08-24 00:00:00
# Out: 2  -  l'année dernière  -  1998-08-27 00:00:00
# Out: 3  -  mai 1995  -  None


for i, date in enumerate(doc.spans["dates"]):
    print(
        i,
        " - ",
        date,
        " - ",
        date._.date.to_datetime(
            note_datetime=note_datetime,
            infer_from_context=True,
            tz=None,
            default_day=15,
        ),
    )
# Out: 0  -  12 avril  -  1999-04-12 00:00:00
# Out: 1  -  il y a trois jours  -  1999-08-24 00:00:00
# Out: 2  -  l'année dernière  -  1998-08-27 00:00:00
# Out: 3  -  mai 1995  -  1995-05-15 00:00:00
```

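The two relative results printed above can be checked by hand with the standard library. The sketch below reproduces them from `note_datetime`. It assumes that "il y a trois jours" is a plain three-day offset and that "l'année dernière" shifts the calendar year by one; this matches the printed values but is not necessarily how EDS-NLP computes them.

```python
from datetime import datetime, timedelta

note_datetime = datetime(year=1999, month=8, day=27)

# "il y a trois jours": subtract a fixed three-day offset from the note date.
three_days_ago = note_datetime - timedelta(days=3)

# "l'année dernière": timedelta has no notion of years, so shift the year field.
# (This would raise ValueError for a 29 February; ignored in this sketch.)
last_year = note_datetime.replace(year=note_datetime.year - 1)

print(three_days_ago)  # 1999-08-24 00:00:00
print(last_year)  # 1998-08-27 00:00:00
```

Relative dates can be resolved without `infer_from_context` because the note date alone is enough to anchor them, which is why both loops print the same value for these two mentions.
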
As a first heuristic, let's consider that an entity can be linked to a date if the two are in the same
sentence. When several dates are present, we'll select the closest one.

```python title="utils.py"
from spacy.tokens import Span
from typing import List, Optional


def candidate_dates(ent: Span) -> List[Span]:
    """Return every date in the same sentence as the entity"""
    return [date for date in ent.doc.spans["dates"] if date.sent == ent.sent]


def get_event_date(ent: Span) -> Optional[Span]:
    """Link an entity to the closest date in the sentence, if any"""

    dates = candidate_dates(ent)  # (1)

    if not dates:
        return None

    dates = sorted(
        dates,
        key=lambda d: min(abs(d.start - ent.end), abs(ent.start - d.end)),
    )

    return dates[0]  # (2)
```

1. Get all dates present in the same sentence.
2. Sort the dates by token distance to the entity, and keep the first item.

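To see what the sorting key does, the following sketch replays it on plain token offsets rather than spaCy objects. `FakeSpan` and `token_gap` are hypothetical names introduced for this illustration: `FakeSpan` only carries `start` and `end` token indices (end exclusive, as in spaCy), and the offsets are written by hand rather than produced by a tokenizer.

```python
from typing import NamedTuple


class FakeSpan(NamedTuple):
    """Hypothetical stand-in for a spaCy Span: token offsets only."""

    text: str
    start: int  # index of the first token
    end: int  # index after the last token (exclusive)


def token_gap(date: FakeSpan, ent: FakeSpan) -> int:
    """Same key as in get_event_date: smallest gap between the two spans."""
    return min(abs(date.start - ent.end), abs(ent.start - date.end))


# Hand-written offsets for "Le patient est admis le 12 avril pour une douleur
# survenue il y a trois jours."
ent = FakeSpan("admis", 3, 4)
dates = [
    FakeSpan("il y a trois jours", 11, 16),
    FakeSpan("12 avril", 5, 7),
]

closest = min(dates, key=lambda d: token_gap(d, ent))
print(closest.text, token_gap(closest, ent))
```

Here `min` with a key picks the same item as sorting and taking the first element, without building the sorted list.
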
We can apply this simple function:

```python
import edsnlp, edsnlp.pipes as eds
from datetime import datetime

from utils import get_event_date  # defined in utils.py above

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.dates())
nlp.add_pipe(
    eds.matcher(
        regex=dict(admission=["admissions?", "admise?", "prise? en charge"]),
        attr="LOWER",
    )
)

text = (
    "Le patient est admis le 12 avril pour une douleur "
    "survenue il y a trois jours. "
    "Il avait été pris en charge l'année dernière."
)

doc = nlp(text)
now = datetime.now()

for ent in doc.ents:
    if ent.label_ != "admission":
        continue
    date = get_event_date(ent)
    if date is None:  # no date in the same sentence as the entity
        continue
    when = date._.date.to_datetime(now).strftime("%d/%m/%Y")
    duration = date._.date.to_duration(now)
    print(f"{ent.text:<20}{date.text:<20}{when:<15}{duration}")
# Out: admis               12 avril            12/04/2023     21 weeks 4 days 6 hours 3 minutes 26 seconds
# Out: pris en charge      l'année dernière    10/09/2022     -1 year
```

Each entity is now linked to a date. The normalised values depend on when the code is run, since the snippet uses `datetime.now()` as the note date. For instance:

| `ent`          | `get_event_date(ent)` | `get_event_date(ent)._.date.to_datetime()` |
|----------------|-----------------------|--------------------------------------------|
| admis          | 12 avril              | `2020-04-12T00:00:00+02:00`                |
| pris en charge | l'année dernière      | `-1 year`                                  |