# CoNLL

??? abstract "TLDR"

    ```{ .python .no-check }
    import edsnlp

    stream = edsnlp.data.read_conll(path)
    stream = stream.map_pipeline(nlp)
    ```

EDS-NLP's CoNLL reader lets you load CoNLL-formatted files into your project.

There are many CoNLL formats, each corresponding to a different shared task. One of the most common is CoNLL-U, which is used for dependency parsing. In a CoNLL file, each line corresponds to a token and holds several columns of information about it, such as its index, form, lemma, POS tag, and dependency relation.

EDS-NLP lets you specify the column names with the `columns` parameter if they differ from the default CoNLL-U format. If `columns` is unset, the reader looks for a comment containing `# global.columns` to infer the column names. If no such comment is found, the columns default to:

```
ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
```
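The precedence described above (an explicit `columns` value, then a `# global.columns` comment, then the CoNLL-U default) can be sketched in plain Python. This is only an illustration under stated assumptions, not EDS-NLP's implementation: `resolve_columns` and `DEFAULT_COLUMNS` are hypothetical names, and the comment is assumed to take the form `# global.columns = ID FORM ...` with space-separated names.

```python
from typing import Iterable, List, Optional

# Default CoNLL-U column names, as listed above.
DEFAULT_COLUMNS = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]


def resolve_columns(
    lines: Iterable[str],
    columns: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: choose the column names for a CoNLL file."""
    # 1. An explicit `columns` argument takes priority.
    if columns is not None:
        return list(columns)
    # 2. Otherwise, look for a `# global.columns = ...` comment line.
    for line in lines:
        if line.startswith("#") and "global.columns" in line:
            _, _, names = line.partition("=")
            found = names.split()
            if found:
                return found
    # 3. Fall back to the default CoNLL-U columns.
    return list(DEFAULT_COLUMNS)


header = ["# global.columns = ID FORM UPOS HEAD DEPREL"]
print(resolve_columns(header))                          # names from the comment
print(resolve_columns(header, columns=["ID", "FORM"]))  # explicit names win
print(resolve_columns([]))                              # default columns
```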

A typical CoNLL file looks like this:

```{ title="sample.conllu" }
1   euh euh INTJ    _   _   5   discourse   _   SpaceAfter=No
2   ,   ,   PUNCT   _   _   1   punct   _   _
3   il  lui PRON    _   Gender=Masc|Number=Sing|Person=3|PronType=Prs   5   expl:subj   _   _
...
```
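To make the column layout concrete, here is a minimal standard-library sketch that turns token lines like the ones above into dictionaries keyed by column name. It is not how EDS-NLP parses files: `parse_token_line` and `parse_feats` are hypothetical helpers, and fields are assumed to be separated by whitespace as rendered above (actual CoNLL-U files separate fields with tabs, and a field may then contain spaces).

```python
from typing import Dict, Sequence

COLUMNS = ("ID", "FORM", "LEMMA", "UPOS", "XPOS",
           "FEATS", "HEAD", "DEPREL", "DEPS", "MISC")


def parse_token_line(line: str, columns: Sequence[str] = COLUMNS) -> Dict[str, str]:
    """Hypothetical helper: map one token line to {column name: value}."""
    # Assumption: no field contains whitespace, so a plain split() is enough.
    values = line.split()
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
    return dict(zip(columns, values))


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a FEATS value such as 'Gender=Masc|Number=Sing' into a dict."""
    if feats == "_":  # "_" marks an empty field
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


sample = [
    "1   euh euh INTJ    _   _   5   discourse   _   SpaceAfter=No",
    "3   il  lui PRON    _   Gender=Masc|Number=Sing|Person=3|PronType=Prs   5   expl:subj   _   _",
]
tokens = [parse_token_line(line) for line in sample]

# Values stay strings; HEAD holds the ID of the token's syntactic head.
print(tokens[0]["FORM"], tokens[0]["UPOS"], tokens[0]["HEAD"])
print(parse_feats(tokens[1]["FEATS"]))
```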

## Reading CoNLL files {: #edsnlp.data.conll.read_conll }

::: edsnlp.data.conll.read_conll
    options:
        heading_level: 3
        show_source: false
        show_toc: false
        show_bases: false