EDS-NLP is a collaborative NLP framework that aims at extracting information from French clinical notes.
At its core, it is a collection of components or pipes, either rule-based functions or
deep learning modules. These components are organized into a novel, efficient and modular pipeline system, built for hybrid and multitask models. We use spaCy to represent documents and their annotations, and PyTorch as a deep-learning backend for trainable components.
EDS-NLP is versatile and can be used on any textual document. The rule-based components are fully compatible with spaCy's pipelines, and vice versa. This library is a product of collaborative effort, and we encourage further contributions to enhance its capabilities.
Check out our interactive demo!
You can install EDS-NLP via pip. We recommend pinning the library version in your projects, or using a strict package manager like Poetry.
pip install edsnlp==0.17.0
or, if you want to use the trainable components (which rely on PyTorch):
pip install "edsnlp[ml]==0.17.0"
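Since pinning is recommended, it can help to check at runtime that the environment matches the pin. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` and the `EXPECTED` constant are illustrative assumptions and are not part of EDS-NLP.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Hypothetical pin, mirroring the install command above.
EXPECTED = "0.17.0"

found = installed_version("edsnlp")
if found is None:
    print("edsnlp is not installed in this environment")
elif found != EXPECTED:
    print(f"edsnlp {found} is installed, but {EXPECTED} is pinned")
else:
    print(f"edsnlp {found} matches the pin")
```

A check like this could run at the start of a script or in a test suite, so that a silently upgraded dependency is noticed early.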
Once you've installed the library, let's begin with a very simple example that extracts mentions of COVID19 in a text, and detects whether they are negated.
import edsnlp, edsnlp.pipes as eds
nlp = edsnlp.blank("eds") # (1)
terms = dict(
covid=["covid", "coronavirus"], # (2)
)
# Sentencizer component, needed for negation detection
nlp.add_pipe(eds.sentences()) # (3)
# Matcher component
nlp.add_pipe(eds.matcher(terms=terms)) # (4)
# Negation detection
nlp.add_pipe(eds.negation())
# Process your text in one call !
doc = nlp("Le patient n'est pas atteint de covid")
doc.ents # (5)
# Out: (covid,)
doc.ents[0]._.negation # (6)
# Out: True
Pipes are added via the nlp.add_pipe method. spaCy stores extracted entities in the Doc.ents attribute. The eds.negation component adds a negation custom attribute.

This example is complete, it should run as-is.
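To go one step further, the entities and their negation status can be gathered into plain Python tuples. The sketch below is only illustrative: `summarize_entities` is a hypothetical helper, not part of EDS-NLP, and it relies solely on the attributes used in the example (`doc.ents`, `ent._.negation`) plus spaCy's standard `text` and `label_` span attributes. A stand-in object replaces the real document so the snippet runs without the library installed; the label value in the stand-in is an assumption.

```python
from types import SimpleNamespace


def summarize_entities(doc):
    """Return (text, label, negated) for each entity of a processed document."""
    return [(ent.text, ent.label_, bool(ent._.negation)) for ent in doc.ents]


# Stand-in mimicking the attributes read above (the label is an assumed value).
# With EDS-NLP installed, pass the `doc` returned by `nlp(...)` instead.
fake_ent = SimpleNamespace(
    text="covid",
    label_="covid",
    _=SimpleNamespace(negation=True),
)
fake_doc = SimpleNamespace(ents=(fake_ent,))

print(summarize_entities(fake_doc))
# [('covid', 'covid', True)]
```

Returning plain tuples keeps downstream code (tables, exports, tests) independent of spaCy objects.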
To learn more about EDS-NLP, we have prepared a series of tutorials that should cover the main features of the library.
--8<-- "docs/tutorials/index.md:tutorials"
--8<-- "docs/pipes/index.md:components"
The performance of an extraction pipeline may depend on the population and documents that are considered.
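One way to act on this caveat is to measure precision and recall on a small sample annotated from your own documents. The following is a minimal sketch in plain Python: the span tuples and the `precision_recall` helper are hypothetical and do not come from EDS-NLP, and exact span matching is only one of several possible matching criteria.

```python
def precision_recall(predicted, gold):
    """Compute exact-match precision and recall between two collections of spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical spans, encoded as (document id, start, end, label).
gold = {("doc1", 32, 37, "covid"), ("doc2", 10, 21, "covid")}
predicted = {("doc1", 32, 37, "covid"), ("doc2", 0, 5, "covid")}

print(precision_recall(predicted, gold))
# (0.5, 0.5)
```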
We welcome contributions! Fork the project and propose a pull request.
Take a look at the dedicated page for details.
If you use EDS-NLP, please cite us as below.
@misc{edsnlp,
author = {Wajsburt, Perceval and Petit-Jean, Thomas and Dura, Basile and Cohen, Ariel and Jean, Charline and Bey, Romain},
doi = {10.5281/zenodo.6424993},
title = {EDS-NLP: efficient information extraction from French clinical notes},
url = {https://aphp.github.io/edsnlp}
}