Diff of /docs/pipes/index.md [000000] .. [cad161]

# Pipes overview

EDS-NLP provides easy-to-use pipeline components (aka pipes).

## Available components

<!-- --8<-- [start:components] -->

=== "Core"

    See the [Core components overview](/pipes/core/) for more information.

    --8<-- "docs/pipes/core/index.md:components"

=== "Qualifiers"

    See the [Qualifiers overview](/pipes/qualifiers/) for more information.

    --8<-- "docs/pipes/qualifiers/index.md:components"

=== "Miscellaneous"

    See the [Miscellaneous components overview](/pipes/misc/) for more information.

    --8<-- "docs/pipes/misc/index.md:components"

=== "NER"

    See the [NER overview](/pipes/ner/) for more information.

    --8<-- "docs/pipes/ner/index.md:components"

=== "Trainable"

    See the [Trainable components overview](/pipes/trainable/overview/) for more information.

    --8<-- "docs/pipes/trainable/index.md:components"

<!-- --8<-- [end:components] -->

You can add them to your pipeline by calling `add_pipe`, for instance:

```python
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.tnm())
```

## Basic architecture

Most components provided by EDS-NLP aim to qualify pre-extracted entities. A typical pipeline is therefore built in three steps:

1. Add a normaliser (see `eds.normalizer`)
2. Add an entity recognition component (e.g. the simple but powerful `eds.matcher` component)
3. Add zero or more entity qualification components, such as `eds.negation`, `eds.family` or `eds.hypothesis`. These qualifiers typically help detect false positives.

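The three steps above can also be pictured without the library. The following is a deliberately simplified, pure-Python illustration of the control flow (normalise, extract, qualify). The `Entity` class, the `normalize`, `match_terms` and `qualify_negation` helpers and the list of negation cues are all hypothetical: they are neither EDS-NLP's API nor its actual algorithms.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    start: int  # character offset (inclusive)
    end: int  # character offset (exclusive)
    label: str
    qualifiers: dict = field(default_factory=dict)


def normalize(text: str) -> str:
    """Step 1: toy normaliser that only lowercases the text."""
    return text.lower()


def match_terms(text: str, terms: dict) -> list:
    """Step 2: toy entity recognition by exact, word-bounded term lookup."""
    ents = []
    for label, variants in terms.items():
        for variant in variants:
            for m in re.finditer(r"\b" + re.escape(variant) + r"\b", text):
                ents.append(Entity(m.start(), m.end(), label))
    return sorted(ents, key=lambda e: e.start)


def qualify_negation(text: str, ents: list, cues=("pas de", "sans", "aucun")) -> list:
    """Step 3: toy qualifier flagging entities preceded by a cue in the same sentence."""
    for ent in ents:
        # Only look at the text between the previous full stop and the entity.
        sentence_start = text.rfind(".", 0, ent.start) + 1
        context = text[sentence_start:ent.start]
        ent.qualifiers["negation"] = any(cue in context for cue in cues)
    return ents


text = "Pas de fièvre. Le patient présente une toux."
norm = normalize(text)  # same length as `text` for this input, so offsets line up
ents = match_terms(norm, {"symptom": ["fièvre", "toux"]})
ents = qualify_negation(norm, ents)

for ent in ents:
    print(text[ent.start:ent.end], ent.label, ent.qualifiers)
```

Here the qualifier marks "fièvre" as negated and leaves "toux" untouched, which is how a qualification step can discard a false positive produced by the extraction step.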
## Extraction components

Extraction components (matchers, the date detector or NER components, for instance) write their results directly to the `doc.ents` and `doc.spans` attributes.

Some components, such as the `eds.sections` matcher, do not write their output to `doc.ents` by default. The reason is that `doc.ents` cannot contain overlapping entities, so we [filter spans][edsnlp.utils.filter.filter_spans] and keep the largest one by default. Since sections usually cover large spans of text, storing them in `doc.ents` would remove every other overlapping entity.

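To see why a large section span would crowd out smaller entities, here is a minimal pure-Python sketch of a "keep the largest, drop whatever overlaps" strategy. It works on hypothetical `(start, end, label)` tuples and only approximates the idea; it is not the code of `edsnlp.utils.filter.filter_spans`, whose tie-breaking rules and options may differ.

```python
def keep_largest_non_overlapping(spans):
    """Greedy filter: visit spans from longest to shortest and keep a span
    only if it does not overlap one that was already kept."""
    kept = []
    # Longest first; ties broken by earliest start (an arbitrary choice here).
    for start, end, label in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):
        overlaps = any(start < k_end and k_start < end for k_start, k_end, _ in kept)
        if not overlaps:
            kept.append((start, end, label))
    return sorted(kept)


entities = [(12, 14, "drug"), (30, 33, "disease")]
section = (0, 50, "section")  # a section covering the whole passage

print(keep_largest_non_overlapping(entities))
print(keep_largest_non_overlapping(entities + [section]))
```

With the section included, it is the only span that survives, since both smaller entities overlap it. Keeping sections out of `doc.ents` avoids this outcome.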
## Entity tagging

Most components also declare [extensions](https://spacy.io/usage/processing-components#custom-components-attributes) on the `Doc`, `Span` and/or `Token` objects.

These extensions are especially useful for qualifier components, but can also be used by other components to persist relevant information. For instance, the `eds.dates` component declares a `span._.date` extension to store a normalised version of each detected date.
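
As an illustration of the mechanism, the sketch below registers a custom attribute on spaCy's `Span` class and reads it back. The extension name `demo_date`, the hand-built document and the use of a plain `datetime.date` are assumptions made for the example; the object that `eds.dates` actually stores in `span._.date` may be of a different type.

```python
import datetime

from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

# Register the extension once; force=True avoids an error if the snippet is re-run.
Span.set_extension("demo_date", default=None, force=True)

# Build a tiny document by hand so that no language model is required.
words = ["Consultation", "le", "3", "mars", "2021", "."]
doc = Doc(Vocab(), words=words)

span = doc[2:5]  # tokens "3", "mars", "2021"
span._.demo_date = datetime.date(2021, 3, 3)

# The value is stored on the document, keyed by the span's offsets,
# so a fresh slice over the same tokens sees it too.
print(span.text, "->", doc[2:5]._.demo_date)
```

Spans that were never assigned a value fall back to the declared default (`None` in this sketch).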