a b/docs/utilities/connectors/omop.md
# OMOP Connector

We provide a connector between OMOP-formatted dataframes and spaCy documents.

## OMOP-style dataframes

Consider a corpus of just one document:

```
Le patient est admis pour une pneumopathie au coronavirus.
On lui prescrit du paracétamol.
```

Its OMOP-style representation is split into two tables, `note` and `note_nlp` (only selected columns are shown here):

`note`:

| note_id | note_text                                     | note_datetime |
| ------: | :-------------------------------------------- | :------------ |
|       0 | Le patient est admis pour une pneumopathie... | 2021-10-23    |

`note_nlp`:

| note_nlp_id | note_id | start_char | end_char | note_nlp_source_value | lexical_variant |
| ----------: | ------: | ---------: | -------: | :-------------------- | :-------------- |
|           0 |       0 |         46 |       57 | disease               | coronavirus     |
|           1 |       0 |         77 |       88 | drug                  | paracétamol     |

## Using the connector

The following snippet expects the tables `note` and `note_nlp` to be already defined (e.g. through PySpark's `toPandas()` method).

```{ .python .no-check }
import spacy
from edsnlp.connectors.omop import OmopConnector

# Instantiate a spaCy pipeline
nlp = spacy.blank("eds")

# Instantiate the connector
connector = OmopConnector(nlp)

# Convert OMOP tables (note and note_nlp) to a list of documents
docs = connector.omop2docs(note, note_nlp)
doc = docs[0]

doc.ents
# Out: [coronavirus, paracétamol]

doc.ents[0].label_
# Out: 'disease'

doc.text == note.loc[0].note_text
# Out: True
```

The object `docs` now contains a list of documents that reflects the information contained in the OMOP-formatted dataframes.