# Data connectors

We provide various connectors to read and write data from and to different formats.

Reading from a given path or object takes the following form:

```{ .python .no-check }
import edsnlp

docs = edsnlp.data.read_{format}(  # or .from_{format} for objects
    # Path to the file or directory
    "path/to/file",
    # How to convert JSON-like samples to Doc objects
    converter=...,  # a predefined schema name or a custom function
)
```
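
As a rough mental model of what a reader does, here is a minimal, library-free sketch: each JSON-like sample is parsed and handed to a converter that builds a document. It does not use EDS-NLP at all. `SimpleDoc`, `dict_to_doc` and `read_jsonl_lines` are hypothetical stand-ins for spaCy `Doc` objects and the real readers, and the `note_id`, `note_text` and `entities` field names are assumptions loosely inspired by an OMOP-style layout, not the exact schema.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class SimpleDoc:
    # Hypothetical stand-in for a spaCy Doc: text plus (start, end, label) entities
    text: str
    note_id: str = ""
    ents: List[Tuple[int, int, str]] = field(default_factory=list)


def dict_to_doc(sample: dict) -> SimpleDoc:
    # Hypothetical converter; the field names are assumptions, not the exact OMOP schema
    ents = [
        (ent["start_char"], ent["end_char"], ent["label"])
        for ent in sample.get("entities", [])
    ]
    return SimpleDoc(text=sample["note_text"], note_id=str(sample["note_id"]), ents=ents)


def read_jsonl_lines(
    lines: Iterable[str], converter: Callable[[dict], SimpleDoc]
) -> Iterator[SimpleDoc]:
    # Lazily parse one JSON object per non-empty line and apply the converter
    for line in lines:
        if line.strip():
            yield converter(json.loads(line))


lines = [
    '{"note_id": 1, "note_text": "Patient with diabetes.", '
    '"entities": [{"start_char": 13, "end_char": 21, "label": "condition"}]}',
]
docs = list(read_jsonl_lines(lines, converter=dict_to_doc))
start, end, label = docs[0].ents[0]
print(docs[0].text[start:end])  # prints: diabetes
```

Because the reader is a generator, documents are only converted as they are consumed, which is one reasonable design for large collections.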

Writing to a given path or object takes the following form:

```{ .python .no-check }
import edsnlp

edsnlp.data.write_{format}(  # or .to_{format} for objects
    # Path to the file or directory
    "path/to/file",
    # Iterable of Doc objects
    docs,
    # How to convert Doc objects to JSON-like samples
    converter=...,  # a predefined schema name or a custom function
)
```
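
Writing is the mirror image of reading. The library-free sketch below converts each document back to a JSON-like dict and writes it as one JSON Lines record to a text stream. `SimpleDoc`, `doc_to_dict` and `write_jsonl` are hypothetical names, not EDS-NLP's API, and the `note_id`, `note_text` and `entities` field names are assumptions chosen for illustration.

```python
import io
import json
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, TextIO, Tuple


@dataclass
class SimpleDoc:
    # Hypothetical stand-in for a spaCy Doc: text plus (start, end, label) entities
    text: str
    note_id: str = ""
    ents: List[Tuple[int, int, str]] = field(default_factory=list)


def doc_to_dict(doc: SimpleDoc) -> dict:
    # Hypothetical converter from a document back to a JSON-like sample
    return {
        "note_id": doc.note_id,
        "note_text": doc.text,
        "entities": [
            {"start_char": start, "end_char": end, "label": label}
            for start, end, label in doc.ents
        ],
    }


def write_jsonl(
    stream: TextIO,
    docs: Iterable[SimpleDoc],
    converter: Callable[[SimpleDoc], dict],
) -> int:
    # Convert each document and write it as one JSON object per line;
    # returns the number of documents written
    count = 0
    for doc in docs:
        stream.write(json.dumps(converter(doc), ensure_ascii=False) + "\n")
        count += 1
    return count


buffer = io.StringIO()
docs = [SimpleDoc("Patient with diabetes.", "1", [(13, 21, "condition")])]
n_written = write_jsonl(buffer, docs, converter=doc_to_dict)
```

Passing the stream rather than opening a file inside the function keeps the sketch testable with an in-memory buffer; a real writer would typically accept a path instead.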

The overall process is illustrated in the following diagram:

At the moment, we support the following data sources:

| Source                        | Description                |
|:------------------------------|:---------------------------|
| [JSON](./json)                | `.json` and `.jsonl` files |
| [Standoff & BRAT](./standoff) | `.ann` and `.txt` files    |
| [Pandas](./pandas)            | Pandas DataFrame objects   |
| [Polars](./polars)            | Polars DataFrame objects   |
| [Spark](./spark)              | Spark DataFrame objects    |
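
The JSON source accepts both `.json` and `.jsonl` files, which differ in layout: JSON Lines stores one JSON value per line. A hypothetical helper that normalizes both into a list of records might look like the sketch below. The name `parse_json_records` is illustrative, and treating a `.json` file as holding either a single object or a list of objects is an assumption, not necessarily how EDS-NLP expects its files to be laid out.

```python
import json
from typing import List


def parse_json_records(content: str, suffix: str) -> List[dict]:
    # Hypothetical helper normalizing both file layouts into a list of records
    if suffix == ".jsonl":
        # JSON Lines: one JSON object per non-empty line
        return [json.loads(line) for line in content.splitlines() if line.strip()]
    if suffix == ".json":
        # Assumption: a .json file holds either one object or a list of objects
        data = json.loads(content)
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported suffix: {suffix!r}")


from_jsonl = parse_json_records('{"note_id": 1}\n{"note_id": 2}\n', ".jsonl")
from_json = parse_json_records('{"note_id": 3}', ".json")
```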

We also support the following schemas:

| Schema                                                                     | Snippet                |
|:---------------------------------------------------------------------------|------------------------|
| [Custom](./converters/#custom)                                             | `converter=custom_fn`  |
| [OMOP](./converters/#omop)                                                 | `converter="omop"`     |
| [Standoff](./converters/#standoff)                                         | `converter="standoff"` |
| [Ents](./converters/#edsnlp.data.converters.EntsDoc2DictConverter)         | `converter="ents"`     |
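
The `converter` argument accepts either the name of a predefined schema or a custom function. One way such an argument could be resolved is sketched below. `PREDEFINED`, `resolve_converter` and `custom_fn` are hypothetical, the toy `"omop"` entry only illustrates the lookup (it is not EDS-NLP's actual OMOP converter), and to stay library-free the converters map a sample to a simplified dict rather than to a `Doc`.

```python
from typing import Callable, Dict, Union

# A converter maps one JSON-like sample to another representation
Converter = Callable[[dict], dict]

# Hypothetical registry of predefined schemas; this toy "omop" entry only
# illustrates the lookup and is NOT EDS-NLP's actual OMOP converter
PREDEFINED: Dict[str, Converter] = {
    "omop": lambda sample: {"text": sample["note_text"], "id": sample["note_id"]},
}


def resolve_converter(converter: Union[str, Converter]) -> Converter:
    # Any callable is used as-is; a string selects a predefined schema
    if callable(converter):
        return converter
    if converter in PREDEFINED:
        return PREDEFINED[converter]
    raise ValueError(f"Unknown schema: {converter!r}")


def custom_fn(sample: dict) -> dict:
    # Hypothetical custom converter for an in-house layout
    return {"text": sample["body"], "id": sample["uid"]}


omop_converter = resolve_converter("omop")
custom_converter = resolve_converter(custom_fn)
```

Accepting any callable means an in-house format can be handled without registering it first, while string names keep the common cases short.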