# OMOP Connector

We provide a connector between OMOP-formatted dataframes and spaCy documents.

## OMOP-style dataframes

Consider a corpus of just one document:

```
Le patient est admis pour une pneumopathie au coronavirus.
On lui prescrit du paracétamol.
```

And its OMOP-style representation, split into two tables, `note` and `note_nlp` (here with selected columns):

`note`:

| note_id | note_text                                     | note_datetime |
| ------: | :-------------------------------------------- | :------------ |
|       0 | Le patient est admis pour une pneumopathie... | 2021-10-23    |

`note_nlp`:

| note_nlp_id | note_id | start_char | end_char | note_nlp_source_value | lexical_variant |
| ----------: | ------: | ---------: | -------: | :-------------------- | :-------------- |
|           0 |       0 |         46 |       57 | disease               | coronavirus     |
|           1 |       0 |         78 |       89 | drug                  | paracétamol     |

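
For a quick local test, the same two tables can be built by hand with pandas. This is only a sketch under a few assumptions: the two sentences of the document are joined by a single newline, `note_datetime` is stored as a pandas timestamp, and the offsets are computed with `str.find` rather than typed in, so that they match `note_text` by construction. The `entities` list is a hypothetical intermediate used only for the illustration.

```python
import pandas as pd

# The example document; the two sentences are assumed to be joined by "\n".
note_text = (
    "Le patient est admis pour une pneumopathie au coronavirus.\n"
    "On lui prescrit du paracétamol."
)

# `note`: one row per document.
note = pd.DataFrame(
    {
        "note_id": [0],
        "note_text": [note_text],
        "note_datetime": [pd.Timestamp("2021-10-23")],
    }
)

# (label, lexical variant) pairs taken from the example table.
entities = [("disease", "coronavirus"), ("drug", "paracétamol")]

# `note_nlp`: one row per entity, with character offsets into `note_text`.
rows = []
for note_nlp_id, (label, variant) in enumerate(entities):
    start = note_text.find(variant)  # offset of the first occurrence
    rows.append(
        {
            "note_nlp_id": note_nlp_id,
            "note_id": 0,
            "start_char": start,
            "end_char": start + len(variant),  # end offset is exclusive
            "note_nlp_source_value": label,
            "lexical_variant": variant,
        }
    )
note_nlp = pd.DataFrame(rows)
```

Computing the offsets from the text keeps `start_char` and `end_char` consistent with `note_text`, whatever separator the document actually uses.
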
## Using the connector

The following snippet expects the tables `note` and `note_nlp` to be already defined (e.g., through PySpark's `toPandas()` method).

```{ .python .no-check }
import spacy
from edsnlp.connectors.omop import OmopConnector

# Instantiate a spaCy pipeline
nlp = spacy.blank("eds")

# Instantiate the connector
connector = OmopConnector(nlp)

# Convert OMOP tables (note and note_nlp) to a list of documents
docs = connector.omop2docs(note, note_nlp)
doc = docs[0]

doc.ents
# Out: (coronavirus, paracétamol)

doc.ents[0].label_
# Out: 'disease'

doc.text == note.loc[0].note_text
# Out: True
```

The object `docs` is now a list of spaCy documents that reflect the information contained in the OMOP-formatted dataframes.
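
Since entities are located through `start_char` and `end_char`, misaligned offsets are a plausible cause of missing or unexpected entities. The sketch below is a hypothetical sanity check, not part of the connector's API: `find_misaligned` is an illustrative helper that works on plain dictionaries (with pandas, `note_nlp.to_dict("records")` yields such rows) and flags the rows whose offsets do not slice out their `lexical_variant`. It assumes the two sentences of the example document are joined by a single newline.

```python
def find_misaligned(note_text, rows):
    """Return the rows whose offsets do not slice out their lexical variant."""
    return [
        row
        for row in rows
        if note_text[row["start_char"]:row["end_char"]] != row["lexical_variant"]
    ]


note_text = (
    "Le patient est admis pour une pneumopathie au coronavirus.\n"
    "On lui prescrit du paracétamol."
)

rows = [
    # Consistent row, taken from the example table
    {"note_nlp_id": 0, "start_char": 46, "end_char": 57, "lexical_variant": "coronavirus"},
    # Deliberately shifted by one character to show what gets flagged
    {"note_nlp_id": 99, "start_char": 45, "end_char": 56, "lexical_variant": "coronavirus"},
]

bad = find_misaligned(note_text, rows)
print([row["note_nlp_id"] for row in bad])  # [99]
```

Running such a check before the conversion makes it easier to tell a data problem from a pipeline problem.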