# Qualifying entities

In the previous tutorial, we saw how to match a terminology against a text. Using the `#!python doc.ents` attribute, we can check whether a document mentions a concept of interest to build a cohort or describe patients.

## The issue

However, consider the classical example where we look for the `diabetes` concept:

=== "French"

    ```
    Le patient n'est pas diabétique.
    Le patient est peut-être diabétique.
    Le père du patient est diabétique.
    ```

=== "English"

    ```
    The patient is not diabetic.
    The patient could be diabetic.
    The patient's father is diabetic.
    ```

None of these sentences should be used to build a cohort: the detected entity is negated, speculative, or does not concern the patient themselves. That's why we need to **qualify the matched entities**.

!!! warning

    We show an English example just to explain the issue.
    EDS-NLP remains a **French-language** medical NLP library.
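
To make the problem concrete, the sketch below runs a naive keyword search over the three French sentences. It uses only Python's standard `re` module, not EDS-NLP, and the `naive_match` helper is a hypothetical name introduced for illustration. Every sentence is flagged, although none of them describes a diabetic patient.

```python
import re

# The three French sentences from the example above.
sentences = [
    "Le patient n'est pas diabétique.",
    "Le patient est peut-être diabétique.",
    "Le père du patient est diabétique.",
]


# Hypothetical helper (not part of EDS-NLP): a plain keyword search
# that ignores negation, speculation and family context.
def naive_match(sentence: str) -> bool:
    return re.search(r"diabétique", sentence, flags=re.IGNORECASE) is not None


flagged = [s for s in sentences if naive_match(s)]
print(len(flagged))  # 3: every sentence matches, yet none should count
```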

## The solution

We can use EDS-NLP's qualifier pipes to achieve this. Let's add specific components to our pipeline to detect these three modalities.

### Adding qualifiers

Adding qualifier pipes is straightforward:

```python hl_lines="25-27"
import edsnlp, edsnlp.pipes as eds

text = (
    "Motif de prise en charge : probable pneumopathie à COVID19, "
    "sans difficultés respiratoires\n"
    "Le père du patient est asthmatique."
)

regex = dict(
    covid=r"(coronavirus|covid[-\s]?19)",
    respiratoire=r"respiratoires?",
)
terms = dict(respiratoire="asthmatique")

nlp = edsnlp.blank("eds")
nlp.add_pipe(
    eds.matcher(
        regex=regex,
        terms=terms,
        attr="LOWER",
    ),
)

nlp.add_pipe(eds.sentences())  # (1)
nlp.add_pipe(eds.negation())  # Negation component
nlp.add_pipe(eds.hypothesis())  # Speculation pipe
nlp.add_pipe(eds.family())  # Family context detection
```

1. Qualifier pipes need sentence boundaries to be set (see the [specific documentation](../pipes/qualifiers/index.md) for details).

This code is complete, and should run as is.

### Reading the results

Let's output the results as a pandas DataFrame for better readability:

```python hl_lines="2 34-45"
import edsnlp, edsnlp.pipes as eds
import pandas as pd

text = (
    "Motif de prise en charge : probable pneumopathie à COVID19, "
    "sans difficultés respiratoires\n"
    "Le père du patient est asthmatique."
)

regex = dict(
    covid=r"(coronavirus|covid[-\s]?19)",
    respiratoire=r"respiratoires?",
)
terms = dict(respiratoire="asthmatique")

nlp = edsnlp.blank("eds")
nlp.add_pipe(
    eds.matcher(
        regex=regex,
        terms=terms,
        attr="LOWER",
    ),
)

nlp.add_pipe(eds.sentences())

nlp.add_pipe(eds.negation())  # Negation component
nlp.add_pipe(eds.hypothesis())  # Speculation pipe
nlp.add_pipe(eds.family())  # Family context detection

doc = nlp(text)

# Extraction as a pandas DataFrame
entities = []
for ent in doc.ents:
    d = dict(
        lexical_variant=ent.text,
        label=ent.label_,
        negation=ent._.negation,
        hypothesis=ent._.hypothesis,
        family=ent._.family,
    )
    entities.append(d)

df = pd.DataFrame.from_records(entities)
```

This code is complete, and should run as is.

We get the following result:

| lexical_variant | label        | negation | hypothesis | family |
|:----------------|:-------------|----------|------------|--------|
| COVID19         | covid        | False    | True       | False  |
| respiratoires   | respiratoire | True     | False      | False  |
| asthmatique     | respiratoire | False    | False      | True   |

## Conclusion

Qualifier pipes limit the number of false positives by detecting linguistic modulations such as negation or speculation.
Go to the [full documentation](/pipes/qualifiers) for a complete presentation of the different pipes,
their configuration options and validation performance.
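
As a minimal sketch of how these flags might be used downstream, the snippet below rebuilds the result table by hand (so that it runs without EDS-NLP) and keeps only the entities that are neither negated, hypothetical, nor attributed to a family member. The `qualified` and `affirmed` names are illustrative assumptions, and the filtering rule is one possible choice rather than a recommendation from the library.

```python
import pandas as pd

# Rows copied from the result table above, so the example is self-contained.
df = pd.DataFrame.from_records(
    [
        dict(lexical_variant="COVID19", label="covid",
             negation=False, hypothesis=True, family=False),
        dict(lexical_variant="respiratoires", label="respiratoire",
             negation=True, hypothesis=False, family=False),
        dict(lexical_variant="asthmatique", label="respiratoire",
             negation=False, hypothesis=False, family=True),
    ]
)

# An entity is "qualified" when at least one of the three flags is set.
qualified = df["negation"] | df["hypothesis"] | df["family"]

# Keep only entities that are affirmed and concern the patient.
affirmed = df[~qualified]

print(len(affirmed))  # 0: every entity in this example is qualified
```

On this text, no entity survives the filter: the COVID mention is speculative, the respiratory difficulties are negated, and the asthma concerns the patient's father.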