# Pandas

??? abstract "TLDR"

    ```{ .python .no-check }
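    import edsnlp

    # Create a stream of documents from the rows of `df` (an existing DataFrame)
    stream = edsnlp.data.from_pandas(df, converter="omop")
    # Apply the pipeline `nlp` (an existing edsnlp pipeline) to every document
    stream = stream.map_pipeline(nlp)
    # Write the processed documents back to a DataFrame
    res = stream.to_pandas(converter="omop")
    # or equivalently
    res = edsnlp.data.to_pandas(stream, converter="omop")
    ```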

We provide methods to read and write documents (raw or annotated) from and to Pandas DataFrames.

As an example, imagine that we have the following OMOP dataframe, which we'll name `note_df`:

| note_id | note_text                                     | note_datetime |
|--------:|:----------------------------------------------|:--------------|
|       0 | Le patient est admis pour une pneumopathie... | 2021-10-23    |
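To follow along, here is a minimal sketch that builds this example `note_df` with pandas. The single row is copied from the table above, including its truncated text, and `note_datetime` is kept as a plain string, as displayed; converting it with `pd.to_datetime` is left to you.

```python
import pandas as pd

# One row per note, with the columns shown in the table above.
# The text is truncated exactly as in the table: replace it with real notes.
note_df = pd.DataFrame(
    {
        "note_id": [0],
        "note_text": ["Le patient est admis pour une pneumopathie..."],
        "note_datetime": ["2021-10-23"],
    }
)
```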

## Reading from a Pandas DataFrame {: #edsnlp.data.pandas.from_pandas }

::: edsnlp.data.pandas.from_pandas
    options:
        heading_level: 3
        show_source: false
        show_toc: false
        show_bases: false


## Writing to a Pandas DataFrame {: #edsnlp.data.pandas.to_pandas }

::: edsnlp.data.pandas.to_pandas
    options:
        heading_level: 3
        show_source: false
        show_toc: false
        show_bases: false


## Importing entities from a Pandas DataFrame

If you have a dataframe with entities (e.g., `note_nlp` in OMOP), you must first join it with the dataframe containing the raw text (e.g., `note` in OMOP) to obtain a single dataframe in which each note's entities sit next to its raw text. For instance, consider the following `note_nlp` dataframe, which we will name `note_nlp_df`:

| note_nlp_id | note_id | start_char | end_char | note_nlp_source_value | lexical_variant |
|------------:|--------:|-----------:|---------:|:----------------------|:----------------|
|           0 |       0 |         46 |       57 | disease               | coronavirus     |
|           1 |       0 |         77 |       88 | drug                  | paracétamol     |
|         ... |     ... |        ... |      ... | ...                   | ...             |
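
```{ .python .no-check }
import pandas as pd

# Group the entities by note, turn each group into a list of dicts
# (one dict per entity), and attach that list to its note as a new
# "entities" column. Notes without any entity get NaN in this column.
df = (
    note_df
    .set_index("note_id")
    .join(
        note_nlp_df
        .set_index("note_id")
        .groupby(level=0)
        .apply(pd.DataFrame.to_dict, orient="records")
        .rename("entities")
    )
).reset_index()
```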

This gives the following `df`, where the entities of each note are gathered in an `entities` column:

| note_id | note_text     | note_datetime |                                     entities |
|--------:|---------------|---------------|---------------------------------------------:|
|       0 | Le patient... | 2021-10-23    | `[{"note_nlp_id": 0, "start_char": 46, ...]` |
|     ... | ...           | ...           |                                          ... |
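Because the join is a left join, notes that have no matching row in `note_nlp_df` end up with `NaN` rather than an empty list in `entities`. The following self-contained sketch runs the same join on toy data and shows one possible way of normalizing those values to empty lists. The entity rows are copied from the `note_nlp_df` table above; note 1 is a hypothetical note without entities, and the note texts are placeholders, so the character offsets are not meant to match them. Using empty lists is a choice made here for consistency, not a requirement stated on this page.

```python
import pandas as pd

# Toy OMOP-like inputs: note 1 is a hypothetical note with no entity
note_df = pd.DataFrame(
    {
        "note_id": [0, 1],
        "note_text": ["Le patient est admis pour une pneumopathie...", "Texte sans entité."],
        "note_datetime": ["2021-10-23", "2021-10-24"],
    }
)
note_nlp_df = pd.DataFrame(
    {
        "note_nlp_id": [0, 1],
        "note_id": [0, 0],
        "start_char": [46, 77],
        "end_char": [57, 88],
        "note_nlp_source_value": ["disease", "drug"],
        "lexical_variant": ["coronavirus", "paracétamol"],
    }
)

# Same join as above: one list of entity dicts per note.
# `note_id` becomes the index before grouping, so it is not repeated in each dict.
df = (
    note_df
    .set_index("note_id")
    .join(
        note_nlp_df
        .set_index("note_id")
        .groupby(level=0)
        .apply(pd.DataFrame.to_dict, orient="records")
        .rename("entities")
    )
).reset_index()

# Notes without entities got NaN from the left join: replace it with an empty list
df["entities"] = df["entities"].apply(lambda ents: ents if isinstance(ents, list) else [])
```

After this step, every value of `entities` is a list, so downstream code can iterate over it without special-casing missing values.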