Diff of /docs/data/pandas.md [000000] .. [cad161]

--- a
+++ b/docs/data/pandas.md
@@ -0,0 +1,70 @@
+# Pandas
+
+??? abstract "TLDR"
+
+    ```{ .python .no-check }
+    import edsnlp
+
+    stream = edsnlp.data.from_pandas(df, converter="omop")
+    stream = stream.map_pipeline(nlp)
+    res = stream.to_pandas(converter="omop")
+    # or equivalently
+    edsnlp.data.to_pandas(stream, converter="omop")
+    ```
+
+We provide methods to read and write documents (raw or annotated) from and to Pandas DataFrames.
+
+As an example, imagine that we have the following OMOP dataframe, which we'll name `note_df`:
+
+| note_id | note_text                                     | note_datetime |
+|--------:|:----------------------------------------------|:--------------|
+|       0 | Le patient est admis pour une pneumopathie... | 2021-10-23    |
+
+## Reading from a Pandas DataFrame {: #edsnlp.data.pandas.from_pandas }
+
+::: edsnlp.data.pandas.from_pandas
+    options:
+        heading_level: 3
+        show_source: false
+        show_toc: false
+        show_bases: false
+
+
+## Writing to a Pandas DataFrame {: #edsnlp.data.pandas.to_pandas }
+
+::: edsnlp.data.pandas.to_pandas
+    options:
+        heading_level: 3
+        show_source: false
+        show_toc: false
+        show_bases: false
+
+
+## Importing entities from a Pandas DataFrame
+
+If you have a dataframe with entities (e.g., `note_nlp` in OMOP), you must join it with the dataframe containing the raw text (e.g., `note` in OMOP) to obtain a single dataframe with the entities next to the raw text. For instance, consider the following `note_nlp` dataframe, which we will name `note_nlp_df`:
+
+| note_nlp_id | note_id | start_char | end_char | note_nlp_source_value | lexical_variant |
+|------------:|--------:|-----------:|---------:|:----------------------|:----------------|
+|           0 |       0 |         46 |       57 | disease               | coronavirus     |
+|           1 |       0 |         77 |       88 | drug                  | paracétamol     |
+|         ... |     ... |        ... |      ... | ...                   | ...             |
+
+```{ .python .no-check }
+import pandas as pd
+
+df = (
+    note_df
+    .set_index("note_id")
+    .join(
+        # Collect the entities of each note into a list of dicts (one dict per entity)
+        note_nlp_df
+        .set_index("note_id")
+        .groupby(level=0)
+        .apply(pd.DataFrame.to_dict, orient="records")
+        .rename("entities")
+    )
+).reset_index()
+```
+
+| note_id | note_text     | note_datetime |                                           entities |
+|--------:|---------------|---------------|---------------------------------------------------:|
+|       0 | Le patient... | 2021-10-23    | `[{"note_nlp_id": 0, "start_char": 46, ...}, ...]` |
+|     ... | ...           | ...           |                                                ... |
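
The join above can be tried on toy data with plain pandas. The sketch below rebuilds `note_df` and `note_nlp_df` from the example rows and adds a second note with no entities; that second note, its text, and the final step replacing missing values with empty lists are assumptions made for illustration, not something the page prescribes.

```python
import pandas as pd

# Toy stand-ins for the OMOP `note` and `note_nlp` tables.
# The second note (note_id=1) is hypothetical and has no entities.
note_df = pd.DataFrame(
    {
        "note_id": [0, 1],
        "note_text": [
            "Le patient est admis pour une pneumopathie...",
            "Pas d'antécédent notable.",
        ],
        "note_datetime": ["2021-10-23", "2021-10-24"],
    }
)
note_nlp_df = pd.DataFrame(
    {
        "note_nlp_id": [0, 1],
        "note_id": [0, 0],
        "start_char": [46, 77],
        "end_char": [57, 88],
        "note_nlp_source_value": ["disease", "drug"],
        "lexical_variant": ["coronavirus", "paracétamol"],
    }
)

# One list of entity dicts per note_id, as in the snippet above.
entities = (
    note_nlp_df.set_index("note_id")
    .groupby(level=0)
    .apply(pd.DataFrame.to_dict, orient="records")
    .rename("entities")
)

# Left join: notes without any entity end up with NaN in `entities`.
df = note_df.set_index("note_id").join(entities).reset_index()

# Assumption: an empty list is a more convenient "no entities" marker than NaN.
df["entities"] = df["entities"].apply(lambda v: v if isinstance(v, list) else [])
```

Because `note_id` is moved to the index before grouping, it does not appear inside the entity dicts; each dict only carries the remaining `note_nlp` columns.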