Most pipes provided by EDS-NLP aim to qualify pre-extracted entities. To wit, the basic usage of the library:

1. Normalise the text (see `eds.normalizer`).
2. Extract entities (for instance with `eds.matcher`).
3. Qualify the extracted entities with components such as `eds.negation`, `eds.family` or `eds.hypothesis`. These qualifiers typically help detect false positives.

Since the basic usage of EDS-NLP components is to qualify entities, most pipes can function in two modes:

1. Annotation of the extracted entities: only pre-extracted entities (found in `doc.ents`) are processed.
2. Full-text annotation: this mode is enabled by setting the `on_ents_only` parameter to `False`.

The possibility to do full-text annotation implies that one could use the pipes the other way around, e.g. by detecting all negations once and for all in an ETL phase and reusing the results afterwards. However, this is not the intended use of the library, which aims to help researchers downstream as a standalone application.

Depending on their purpose (entity extraction, qualification, etc.), EDS-NLP pipes write their results to `Doc.ents`, `Doc.spans` or a custom attribute.

Extraction pipes (matchers, the date detector or NER pipes, for instance) write their results directly to the `Doc.ents` attribute.

Note that spaCy prohibits overlapping entities within the `Doc.ents` attribute. To circumvent this limitation, we [filter spans][edsnlp.utils.filter.filter_spans] and keep all discarded entities within the `discarded` key of the `Doc.spans` attribute.
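
To make the filtering step concrete, here is a simplified, pure-Python illustration of splitting overlapping candidates into kept and discarded spans. It is not the library's implementation: spans are modelled as hypothetical `(start, end, label)` tuples, and the "longest span first, earliest start on ties" rule is an assumption that mirrors the behaviour documented for spaCy's own `spacy.util.filter_spans`.

```python
from typing import List, Tuple

# (start, end, label), with an exclusive end, standing in for token offsets.
SpanTuple = Tuple[int, int, str]


def filter_overlapping(
    spans: List[SpanTuple],
) -> Tuple[List[SpanTuple], List[SpanTuple]]:
    """Return (kept, discarded), where kept spans never overlap."""
    # Longest spans first; ties are broken in favour of the earliest start.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[SpanTuple] = []
    discarded: List[SpanTuple] = []
    taken = set()  # token indices already covered by a kept span
    for start, end, label in ordered:
        if any(i in taken for i in range(start, end)):
            discarded.append((start, end, label))
        else:
            kept.append((start, end, label))
            taken.update(range(start, end))
    return sorted(kept), sorted(discarded)


candidates = [(0, 2, "disease"), (1, 4, "disease"), (5, 6, "drug"), (3, 4, "date")]
ents, discarded = filter_overlapping(candidates)
print(ents)       # candidates that would fit in Doc.ents
print(discarded)  # candidates that would go to Doc.spans["discarded"]
```

In the terms of this page, `ents` plays the role of `Doc.ents` and `discarded` the role of the `discarded` key of `Doc.spans`.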

Some pipes write their output to the `Doc.spans` dictionary. We enforce the following doctrine:

- Entities extracted by some pipes (typically the `eds.matcher` component) are stashed in the `Doc.ents` attribute.
- The output of other pipes (e.g. the `eds.sections` or `eds.dates` component) is stashed in a specific key within the `Doc.spans` attribute.

Moreover, most pipes declare spaCy extensions on the `Doc`, `Span` and/or `Token` objects.

These extensions are especially useful for qualifier pipes, but can also be used by other pipes to persist relevant information. For instance, the `eds.dates` pipeline component:

1. populates `#!python Doc.spans["dates"]`;
2. persists the relevant information for each detected span in `#!python Span._.date`.
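
The persistence pattern described above (a dedicated key in `Doc.spans` plus a custom extension on `Span`) can be reproduced with plain spaCy. The sketch below is not how `eds.dates` is implemented: the extension name `parsed_date`, the example text and the hard-coded date are all hypothetical, chosen only to show the mechanism.

```python
import datetime

import spacy
from spacy.tokens import Span

# Hypothetical extension name, to avoid clashing with the one declared by eds.dates.
if not Span.has_extension("parsed_date"):
    Span.set_extension("parsed_date", default=None)

nlp = spacy.blank("en")
text = "The patient was admitted on 2021-03-04 for a fracture."
doc = nlp(text)

# Build a span from character offsets, so the code does not depend on tokenisation.
start = text.index("2021-03-04")
span = doc.char_span(start, start + len("2021-03-04"), label="date")

# Stash the span under a dedicated key instead of Doc.ents...
doc.spans["dates"] = [span]
# ...and persist the structured value in the custom extension.
span._.parsed_date = datetime.date(2021, 3, 4)

for date_span in doc.spans["dates"]:
    print(date_span.text, date_span._.parsed_date)
```

Because the span lives in `doc.spans["dates"]`, `doc.ents` stays empty here, so downstream components that only read `Doc.ents` are unaffected.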