# Pipes overview
|
EDS-NLP provides easy-to-use pipeline components (aka pipes).
|
## Available components
|
<!-- --8<-- [start:components] -->
|
=== "Core"
|
    See the [Core components overview](/pipes/core/) for more information.
|
    --8<-- "docs/pipes/core/index.md:components"
|
=== "Qualifiers"
|
    See the [Qualifiers overview](/pipes/qualifiers/) for more information.
|
    --8<-- "docs/pipes/qualifiers/index.md:components"
|
=== "Miscellaneous"
|
    See the [Miscellaneous components overview](/pipes/misc/) for more information.
|
    --8<-- "docs/pipes/misc/index.md:components"
|
=== "NER"
|
    See the [NER overview](/pipes/ner/) for more information.
|
    --8<-- "docs/pipes/ner/index.md:components"
|
=== "Trainable"
|
    See the [Trainable components overview](/pipes/trainable/overview/) for more information.
|
    --8<-- "docs/pipes/trainable/index.md:components"
|
<!-- --8<-- [end:components] -->
|
You can add them to your pipeline by calling `nlp.add_pipe`, for instance:
|
```python
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.normalizer())  # normalise the text
nlp.add_pipe(eds.sentences())  # split the document into sentences
nlp.add_pipe(eds.tnm())  # detect TNM staging mentions
```
|
## Basic architecture
|
Most components provided by EDS-NLP aim to qualify pre-extracted entities. The basic usage of the library is therefore to:
|
1. Add a normaliser (see `eds.normalizer`)
2. Add an entity recognition component (e.g. the simple but powerful `eds.matcher` component)
3. Add zero or more entity qualification components, such as `eds.negation`, `eds.family` or `eds.hypothesis`. These qualifiers typically help detect false positives.
|
|
## Extraction components
|
Extraction components (matchers, the date detector or NER components, for instance) store their results directly in the `doc.ents` and `doc.spans` attributes.
|
Some components, such as the `eds.sections` matcher, do not write their output to `doc.ents` by default. Since `doc.ents` cannot contain overlapping entities, we [filter spans][edsnlp.utils.filter.filter_spans] and keep the largest one by default. Sections usually cover large stretches of text, so storing them in `doc.ents` would remove every other overlapping entity.
|
|
## Entity tagging
|
Most components also declare [extensions](https://spacy.io/usage/processing-components#custom-components-attributes) on the `Doc`, `Span` and/or `Token` objects.
|
|
These extensions are especially useful for qualifier components, but can also be used by other components to persist relevant information. For instance, the `eds.dates` component declares a `span._.date` extension to store a normalised version of each detected date.