# Pandas

??? abstract "TLDR"

    ```{ .python .no-check }
    import edsnlp

    stream = edsnlp.data.from_pandas(df, converter="omop")
    stream = stream.map_pipeline(nlp)
    res = stream.to_pandas(converter="omop")
    # or equivalently
    edsnlp.data.to_pandas(stream, converter="omop")
    ```

We provide methods to read and write documents (raw or annotated) from and to Pandas DataFrames.

As an example, imagine that we have the following OMOP dataframe, which we will name `note_df`:

| note_id | note_text                                     | note_datetime |
|--------:|:----------------------------------------------|:--------------|
|       0 | Le patient est admis pour une pneumopathie... | 2021-10-23    |
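To make the example concrete, here is a minimal sketch of how such a dataframe could be built by hand. It only assumes that `pandas` is installed. The values are copied from the table above, and parsing `note_datetime` with `pd.to_datetime` is an illustrative choice rather than a documented requirement.

```python
import pandas as pd

# Toy OMOP-style `note` table with the single row shown above.
note_df = pd.DataFrame(
    {
        "note_id": [0],
        "note_text": ["Le patient est admis pour une pneumopathie..."],
        # Assumption: dates are stored as datetimes rather than plain strings.
        "note_datetime": pd.to_datetime(["2021-10-23"]),
    }
)
```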

## Reading from a Pandas DataFrame {: #edsnlp.data.pandas.from_pandas }

::: edsnlp.data.pandas.from_pandas
    options:
        heading_level: 3
        show_source: false
        show_toc: false
        show_bases: false

## Writing to a Pandas DataFrame {: #edsnlp.data.pandas.to_pandas }

::: edsnlp.data.pandas.to_pandas
    options:
        heading_level: 3
        show_source: false
        show_toc: false
        show_bases: false

## Importing entities from a Pandas DataFrame

If you have a dataframe with entities (e.g., `note_nlp` in OMOP), you must join it with the dataframe containing the raw text (e.g., `note` in OMOP) to obtain a single dataframe with the entities next to the raw text. For instance, take the following `note_nlp` dataframe, which we will name `note_nlp_df`:

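| note_nlp_id | note_id | start_char | end_char | note_nlp_source_value | lexical_variant |
|------------:|--------:|-----------:|---------:|:----------------------|:----------------|
|           0 |       0 |         46 |       57 | disease               | coronavirus     |
|           1 |       0 |         77 |       88 | drug                  | paracétamol     |
|         ... |     ... |        ... |      ... | ...                   | ...             |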
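
```{ .python .no-check }
import pandas as pd

df = (
    note_df
    .set_index("note_id")
    .join(
        # Group the entities of each note into a list of dicts
        note_nlp_df
        .set_index("note_id")
        .groupby(level=0)
        .apply(pd.DataFrame.to_dict, orient="records")
        .rename("entities")
    )
).reset_index()
```
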
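| note_id | note_text     | note_datetime | entities                                      |
|--------:|:--------------|:--------------|:----------------------------------------------|
|       0 | Le patient... | 2021-10-23    | `[{"note_nlp_id": 0, "start_char": 46, ...]`  |
|     ... | ...           | ...           | ...                                           |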
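The join above can be tried end to end on toy data. The sketch below only assumes `pandas`. The second note (`note_id` 1) and its text are hypothetical additions, included to show what happens to a note that has no entities: the left join leaves `NaN` in the `entities` column. Replacing that `NaN` with an empty list is an illustrative choice, not something the library is known to require.

```python
import pandas as pd

# Toy data following the column layout above; note 1 is a hypothetical
# extra note with no entities.
note_df = pd.DataFrame(
    {
        "note_id": [0, 1],
        "note_text": ["Le patient...", "Pas d'anomalie..."],
        "note_datetime": pd.to_datetime(["2021-10-23", "2021-10-24"]),
    }
)
note_nlp_df = pd.DataFrame(
    {
        "note_nlp_id": [0, 1],
        "note_id": [0, 0],
        "start_char": [46, 77],
        "end_char": [57, 88],
        "note_nlp_source_value": ["disease", "drug"],
        "lexical_variant": ["coronavirus", "paracétamol"],
    }
)

# One row per note, holding the list of its entities as dicts.
entities = (
    note_nlp_df
    .set_index("note_id")
    .groupby(level=0)
    .apply(pd.DataFrame.to_dict, orient="records")
    .rename("entities")
)

# Left join: every note is kept, with or without entities.
df = note_df.set_index("note_id").join(entities).reset_index()

# Assumption: downstream code prefers an empty list over NaN.
df["entities"] = df["entities"].apply(
    lambda value: value if isinstance(value, list) else []
)
```

Because `note_id` is moved to the index before grouping, it does not appear inside the entity dicts, which matches the `entities` column shown above.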