[Documentation](https://aphp.github.io/edsnlp/latest/)
[PyPI](https://pypi.org/project/edsnlp/)
[Demo](https://aphp.github.io/edsnlp/demo/)
[Coverage](https://raw.githubusercontent.com/aphp/edsnlp/coverage/coverage.txt)
[DOI](https://zenodo.org/badge/latestdoi/467585436)

EDS-NLP
=======

EDS-NLP is a collaborative NLP framework aimed primarily at extracting information from French clinical notes.
At its core, it is a collection of components, or pipes, which are either rule-based functions or
deep learning modules. These components are organized into a novel, efficient and modular pipeline system, built for hybrid and multitask models. We use [spaCy](https://spacy.io) to represent documents and their annotations, and [PyTorch](https://pytorch.org/) as a deep-learning backend for trainable components.

EDS-NLP is versatile and can be used on any textual document. The rule-based components are fully compatible with spaCy's components, and vice versa. This library is the product of a collaborative effort, and we encourage further contributions to enhance its capabilities.

Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/)!

## Features

- [Rule-based components](https://aphp.github.io/edsnlp/latest/pipes/) for French clinical notes
- [Trainable components](https://aphp.github.io/edsnlp/latest/pipes/trainable): NER, span classification
- Support for multitask deep-learning models with [weight sharing](https://aphp.github.io/edsnlp/latest/concepts/torch-component/#sharing-subcomponents)
- [Fast inference](https://aphp.github.io/edsnlp/latest/concepts/inference/), with multi-GPU support out of the box
- Easy to use, with a spaCy-like API
- Compatible with rule-based spaCy components
- Support for various I/O formats such as [BRAT](https://aphp.github.io/edsnlp/latest/data/standoff/), [JSON](https://aphp.github.io/edsnlp/latest/data/json/), [Parquet](https://aphp.github.io/edsnlp/latest/data/parquet/), [Pandas](https://aphp.github.io/edsnlp/latest/data/pandas/) and [Spark](https://aphp.github.io/edsnlp/latest/data/spark/)

## Quick start

### Installation

You can install EDS-NLP via `pip`. We recommend pinning the library version in your projects, or using a strict package manager like [Poetry](https://python-poetry.org/).

```shell
pip install edsnlp==0.17.0
```

or, if you want to use the trainable components (which rely on PyTorch):

```shell
pip install "edsnlp[ml]==0.17.0"
```
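
Since the commands above pin an exact version, it can be useful to confirm at runtime that the installed release matches the pin. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_version` and the `PINNED` constant are illustrative assumptions and are not part of EDS-NLP.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: this mirrors the version pinned in the install commands.
PINNED = "0.17.0"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("edsnlp")
if found is None:
    print("edsnlp is not installed")
elif found != PINNED:
    print(f"edsnlp {found} is installed, but {PINNED} is pinned")
else:
    print(f"edsnlp {found} matches the pin")
```
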
### A first pipeline

Once you've installed the library, let's begin with a very simple example that extracts mentions of COVID-19 in a text and detects whether they are negated.

```python
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")

terms = dict(
    covid=["covid", "coronavirus"],
)

# Split the documents into sentences; this is needed for negation detection
nlp.add_pipe(eds.sentences())
# Matcher component
nlp.add_pipe(eds.matcher(terms=terms))
# Negation detection (we also support the spaCy-like API!)
nlp.add_pipe("eds.negation")

# Process your text in one call!
doc = nlp("Le patient n'est pas atteint de covid")

doc.ents
# Out: (covid,)

doc.ents[0]._.negation
# Out: True
```
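
To give an intuition for why sentence splitting comes before negation detection, here is a deliberately simplified, standard-library sketch of the idea: find a term, then check whether a negation cue precedes it within the same sentence. This is not how EDS-NLP implements `eds.sentences`, `eds.matcher` or `eds.negation`; the function names, the cue list and the naive sentence splitter are all assumptions made for illustration.

```python
import re

# Assumption: a tiny, hand-picked list of French negation cues.
NEGATION_CUES = ("pas", "aucun", "sans")
TERMS = ("covid", "coronavirus")


def split_sentences(text):
    """Naive splitter: cut after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def find_negated_terms(text):
    """Return (term, is_negated) pairs, scoping each cue to its sentence."""
    results = []
    for sentence in split_sentences(text):
        words = re.findall(r"\w+", sentence.lower())
        for i, word in enumerate(words):
            if word in TERMS:
                # A cue only counts if it appears earlier in the same sentence.
                negated = any(w in NEGATION_CUES for w in words[:i])
                results.append((word, negated))
    return results


print(find_negated_terms("Le patient n'est pas fumeur. Il est atteint de covid."))
# Out: [('covid', False)]
print(find_negated_terms("Le patient n'est pas atteint de covid"))
# Out: [('covid', True)]
```

Without the sentence boundary, the `pas` in the first sentence of the first example would wrongly negate `covid` in the second sentence, which is why the splitter runs first in this sketch.
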
## Documentation & Tutorials

Go to the [documentation](https://aphp.github.io/edsnlp) for more information.

## Disclaimer

The performance of an extraction pipeline may depend on the population and documents that are considered.

## Contributing to EDS-NLP

We welcome contributions! Fork the project and propose a pull request.
Take a look at the [dedicated page](https://aphp.github.io/edsnlp/latest/contributing/) for details.

## Citation

If you use EDS-NLP, please cite us as below.

```bibtex
@misc{edsnlp,
  author = {Wajsburt, Perceval and Petit-Jean, Thomas and Dura, Basile and Cohen, Ariel and Jean, Charline and Bey, Romain},
  doi    = {10.5281/zenodo.6424993},
  title  = {EDS-NLP: efficient information extraction from French clinical notes},
  url    = {https://aphp.github.io/edsnlp}
}
```

## Acknowledgement

We would like to thank [Assistance Publique – Hôpitaux de Paris](https://www.aphp.fr/), [AP-HP Foundation](https://fondationrechercheaphp.fr/) and [Inria](https://www.inria.fr) for funding this project.