A natural language medical domain parsing library. This library:
See the full documentation.
The API reference is also available.
Install the library using a Python package manager such as pip:
pip3 install zensols.mednlp
To use the cui2vec functionality, the embeddings must be manually
downloaded. Start with these commands:
mkdir -p ~/.cache/zensols/mednlp
wget -O ~/.cache/zensols/mednlp/cui2vec.zip 'https://figshare.com/ndownloader/files/10959626?private_link=00d69861786cd0156d81'
If the download fails, or the downloaded file is not a zip file (for example,
it contains the text of an HTML error message), download the file manually by
browsing to it, then move it to ~/.cache/zensols/mednlp/cui2vec.zip.
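The check described above (zip file versus HTML error page) can be scripted. The following is a minimal sketch using only the Python standard library; the function name `check_download` is hypothetical and is not part of zensols.mednlp.

```python
import zipfile
from pathlib import Path

# Location used by the download commands above.
CUI2VEC_ZIP = Path.home() / ".cache" / "zensols" / "mednlp" / "cui2vec.zip"


def check_download(path: Path) -> str:
    """Return 'missing', 'not-a-zip', or 'ok' for the file at ``path``."""
    if not path.is_file():
        return "missing"
    # is_zipfile looks for the zip end-of-central-directory record, so a
    # saved HTML error page is reported as not being a zip archive.
    if not zipfile.is_zipfile(path):
        return "not-a-zip"
    return "ok"


if __name__ == "__main__":
    print(check_download(CUI2VEC_ZIP))
```

A result other than `ok` suggests falling back to the manual download.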
To parse text, create features, and extract clinical concept identifiers:
>>> from zensols.mednlp import ApplicationFactory
>>> doc_parser = ApplicationFactory.get_doc_parser()
>>> doc = doc_parser('John was diagnosed with kidney failure')
>>> for tok in doc.tokens: print(tok.norm, tok.pos_, tok.tag_, tok.cui_, tok.detected_name_)
John PROPN NNP -<N>- -<N>-
was AUX VBD -<N>- -<N>-
diagnosed VERB VBN -<N>- -<N>-
with ADP IN -<N>- -<N>-
kidney NOUN NN C0035078 kidney~failure
failure NOUN NN C0035078 kidney~failure
>>> print(doc.entities)
(<John>, <kidney failure>)
See the full example, and for other
functionality, see the examples.
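As an illustration of working with the token attributes shown in the session above, the sketch below groups token text by concept identifier. It assumes tokens without a concept carry the literal string `-<N>-` in `cui_` as printed above (a `None` value is also skipped). The helper `concepts_by_cui` is hypothetical, and the stand-in tokens only mirror the printed output; with the library installed, `doc.tokens` would be passed instead.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, List

# Placeholder printed in the session above for tokens without a concept
# (assumed here to be the attribute's actual value).
NONE_MARKER = "-<N>-"


def concepts_by_cui(tokens: Iterable) -> Dict[str, List[str]]:
    """Map each CUI to the normalized token text carrying it, in order."""
    grouped: Dict[str, List[str]] = {}
    for tok in tokens:
        if tok.cui_ is None or tok.cui_ == NONE_MARKER:
            continue
        grouped.setdefault(tok.cui_, []).append(tok.norm)
    return grouped


# Stand-in tokens copied from the example output, not produced by the parser.
sample = [
    SimpleNamespace(norm="with", cui_="-<N>-"),
    SimpleNamespace(norm="kidney", cui_="C0035078"),
    SimpleNamespace(norm="failure", cui_="C0035078"),
]
print(concepts_by_cui(sample))  # {'C0035078': ['kidney', 'failure']}
```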
By default, this library uses the small MedCAT model used for
tutorials, which is not
sufficient for any serious project. To get the UMLS trained model, the MedCAT
UMLS request form must be filled out (see the MedCAT repository).
After you obtain access and download the new model, add the following to
~/.mednlprc:
[medcat_status_resource]
url = file:///location/to/the/downloaded/file/umls_sm_wstatus_2021_oct.zip
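Before starting the parser, it can help to confirm that the `url` entry resolves to the path you expect. The sketch below assumes the resource file is plain INI that Python's `configparser` can read; the library's own loader may apply additional rules such as interpolation. The function `model_path_from_config` is hypothetical and not part of zensols.mednlp.

```python
import configparser
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def model_path_from_config(text: str) -> Path:
    """Return the local path named by ``url`` in [medcat_status_resource]."""
    # interpolation=None keeps any '%' characters in the value literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    url = parser["medcat_status_resource"]["url"]
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"expected a file:// URL, got: {url}")
    return Path(url2pathname(parsed.path))


if __name__ == "__main__":
    rc_file = Path.home() / ".mednlprc"
    if rc_file.is_file():
        model = model_path_from_config(rc_file.read_text())
        print(model, "exists" if model.is_file() else "not found")
```

Any stray character at the end of the value, such as a trailing quote, becomes part of the file name, so a "not found" result is worth checking against the exact text of the entry.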
This API utilizes the following frameworks:
If you use this project in your research please use the following BibTeX entry:
@inproceedings{landes-etal-2023-deepzensols,
title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
author = "Landes, Paul and
Di Eugenio, Barbara and
Caragea, Cornelia",
editor = "Tan, Liling and
Milajevs, Dmitrijs and
Chauhan, Geeticka and
Gwinnup, Jeremy and
Rippeth, Elijah",
booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
month = dec,
year = "2023",
address = "Singapore, Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.nlposs-1.16",
pages = "141--146"
}
Please star the project and let me know how and where you use this API.
Contributions as pull requests, feedback, and any other input are welcome.
An extensive changelog is available here.
Copyright (c) 2021 - 2025 Paul Landes