# Data connectors

We provide connectors to read data from, and write data to, various formats.

Reading from a given path or object takes the following form:

```{ .python .no-check }
import edsnlp

docs = edsnlp.data.read_{format}(  # or .from_{format} for objects
    # Path to the file or directory
    "path/to/file",
    # How to convert JSON-like samples to Doc objects:
    # a predefined schema name (e.g. "omop") or a custom function
    converter="omop",
)
```
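
The `converter` argument is what turns each raw sample into a document. As a rough illustration, the sketch below implements a reading converter in plain Python so that it runs without edsnlp or spaCy installed. `MiniDoc` is a hypothetical stand-in for a spaCy `Doc`, and the sample keys (`note_id`, `note_text`, and `entities` with `start`, `end`, `label`) are illustrative assumptions, not the exact layout of edsnlp's `"omop"` schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class MiniDoc:
    """Hypothetical stand-in for a spaCy Doc: text, an id and entity spans."""
    text: str
    note_id: Any = None
    # Each entity is a (start_char, end_char, label) triple
    ents: List[Tuple[int, int, str]] = field(default_factory=list)


def sample_to_doc(sample: Dict[str, Any]) -> MiniDoc:
    """Hypothetical reading converter: one JSON-like sample -> one document."""
    doc = MiniDoc(text=sample["note_text"], note_id=sample.get("note_id"))
    for ent in sample.get("entities", []):
        start, end = ent["start"], ent["end"]
        # Character offsets must point inside the text
        if not 0 <= start < end <= len(doc.text):
            raise ValueError(f"Invalid entity offsets: {start}-{end}")
        doc.ents.append((start, end, ent["label"]))
    return doc


sample = {
    "note_id": 1,
    "note_text": "Le patient a une fracture.",
    "entities": [{"start": 17, "end": 25, "label": "injury"}],
}
doc = sample_to_doc(sample)
start, end, label = doc.ents[0]
print(doc.text[start:end], label)  # fracture injury
```

The shape is the point here: one sample in, one document out. A custom converter passed to edsnlp would presumably follow the same shape but return a real spaCy `Doc`.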

Writing to a given path or object takes the following form:

```{ .python .no-check }
import edsnlp

edsnlp.data.write_{format}(  # or .to_{format} for objects
    # Iterable of Doc objects
    docs,
    # Path to the file or directory
    "path/to/file",
    # How to convert Doc objects to JSON-like samples:
    # a predefined schema name (e.g. "omop") or a custom function
    converter="omop",
)
```
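
Writing applies the same idea in reverse: the converter turns each document back into a JSON-like sample, which the connector then serializes. The plain-Python sketch below assumes a hypothetical `MiniDoc` stand-in for a spaCy `Doc` and an illustrative output layout; `doc_to_sample` and `to_jsonl` are hypothetical helpers, not edsnlp functions. `to_jsonl` emits one JSON object per line, which is what the `.jsonl` format means.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class MiniDoc:
    """Hypothetical stand-in for a spaCy Doc."""
    text: str
    note_id: Any = None
    # Each entity is a (start_char, end_char, label) triple
    ents: List[Tuple[int, int, str]] = field(default_factory=list)


def doc_to_sample(doc: MiniDoc) -> Dict[str, Any]:
    """Hypothetical writing converter: one document -> one JSON-like sample."""
    return {
        "note_id": doc.note_id,
        "note_text": doc.text,
        "entities": [
            # Also store the covered text, so the output is easy to check by eye
            {"start": s, "end": e, "label": label, "text": doc.text[s:e]}
            for s, e, label in doc.ents
        ],
    }


def to_jsonl(docs: Iterable[MiniDoc]) -> str:
    """Serialize one converted sample per line, as in a .jsonl file."""
    return "\n".join(
        json.dumps(doc_to_sample(d), ensure_ascii=False) for d in docs
    )


docs = [MiniDoc("Le patient a une fracture.", note_id=1, ents=[(17, 25, "injury")])]
print(to_jsonl(docs))
```

Storing the covered `text` next to the offsets is redundant, but it makes exported annotations much easier to review.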

The overall process is illustrated in the following diagram:

![Data connectors overview](./overview.png)

At the moment, we support the following data sources:

| Source                        | Description                |
|:------------------------------|:---------------------------|
| [JSON](./json)                | `.json` and `.jsonl` files |
| [Standoff & BRAT](./standoff) | `.ann` and `.txt` files    |
| [Pandas](./pandas)            | Pandas DataFrame objects   |
| [Polars](./polars)            | Polars DataFrame objects   |
| [Spark](./spark)              | Spark DataFrame objects    |

and the following schemas:

| Schema                                                                     | Snippet                |
|:---------------------------------------------------------------------------|------------------------|
| [Custom](./converters/#custom)                                             | `converter=custom_fn`  |
| [OMOP](./converters/#omop)                                                 | `converter="omop"`     |
| [Standoff](./converters/#standoff)                                         | `converter="standoff"` |
| [Ents](./converters/#edsnlp.data.converters.EntsDoc2DictConverter)         | `converter="ents"`     |
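
Both the Standoff & BRAT source and the `standoff` schema deal with BRAT-style annotations. In a `.ann` file, each text-bound annotation sits on its own line as `ID<TAB>LABEL START END<TAB>TEXT`, where the character offsets index into the matching `.txt` file. The sketch below parses only contiguous text-bound (`T`) lines with the standard library, skipping attributes, relations, notes and discontinuous spans (written `START END;START END`). `parse_ann` and `Entity` are hypothetical helpers for illustration, not part of edsnlp.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_ann(ann: str) -> List[Entity]:
    """Parse contiguous text-bound annotations (T lines) from BRAT .ann content."""
    entities = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # skip attributes (A), relations (R), notes (#), blank lines...
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        if ";" in offsets:
            continue  # discontinuous spans are not handled in this sketch
        start, end = map(int, offsets.split())
        entities.append(Entity(ann_id, label, start, end, text))
    return entities


txt = "Le patient a une fracture."
ann = "T1\tinjury 17 25\tfracture\nA1\tNegation T1\n"
for ent in parse_ann(ann):
    # Offsets index directly into the .txt content
    assert txt[ent.start:ent.end] == ent.text
    print(ent)
```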