# Using medaCy: Tutorials and Workflows
This directory contains common workflows for using medaCy.

## Table of contents
1. [How medaCy Works](#how-medacy-works)
2. [Building a medaCy Pipeline](/guide/walkthrough)
3. [Pre-trained Models](#utilizing-pre-trained-ner-models)
4. [Distributing Trained Models](#sharing-your-medacy-models)
5. [Interaction with spaCy](#how-medacy-uses-spacy)

### How medaCy Works
MedaCy combines the text-processing power of spaCy with state-of-the-art research tools and techniques in medical text mining.
MedaCy consists of a set of fast pipelines, each specialized for learning specific types of medical entities and relations. A pipeline consists
of a stackable and interchangeable set of PipelineComponents: bite-sized code blocks that each overlay a feature onto the text being processed.

#### Pipeline Components
Custom PipelineComponents and Pipelines can be developed by implementing the [BaseOverlayer](medacy/pipeline_components/base/base_component.py) and [BasePipeline](medacy/pipelines/base/base_pipeline.py) classes, respectively. Alternatively, use the components already implemented in medaCy. Some more powerful components require outside software; one example is the MetaMapOverlayer, which interfaces with [MetaMap](https://metamap.nlm.nih.gov/)
to overlay rich medical concept information onto text. Components are chained or stacked in pipelines and can themselves depend on the outputs of previous components to function. In the underlying implementation, a medaCy PipelineComponent is a wrapper over a spaCy component that adds a number of utilities for facilitating the training, use, and distribution of medical-domain text processing models.
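
The stacking and dependency behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use medaCy or spaCy: `ToyDocument`, `ToyComponent`, `ToyPipeline`, `ToyTokenOverlayer`, and `ToyUnitOverlayer` are hypothetical names invented for this illustration, and the real BaseOverlayer and BasePipeline interfaces may look quite different.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a processed document (not a spaCy Doc)."""
    text: str
    features: Dict[str, list] = field(default_factory=dict)


class ToyComponent:
    """Hypothetical base class: each component overlays one named feature."""
    name: str = "base"
    dependencies: FrozenSet[str] = frozenset()

    def __call__(self, doc: ToyDocument) -> ToyDocument:
        raise NotImplementedError


class ToyTokenOverlayer(ToyComponent):
    """Overlays a whitespace tokenization onto the document."""
    name = "tokens"

    def __call__(self, doc: ToyDocument) -> ToyDocument:
        doc.features["tokens"] = doc.text.split()
        return doc


class ToyUnitOverlayer(ToyComponent):
    """Overlays unit-like tokens; depends on the output of the token overlayer."""
    name = "units"
    dependencies = frozenset({"tokens"})
    UNITS = frozenset({"mg", "ml", "capsule", "tablet"})

    def __call__(self, doc: ToyDocument) -> ToyDocument:
        doc.features["units"] = [
            tok for tok in doc.features["tokens"]
            if tok.lower().strip(".,") in self.UNITS
        ]
        return doc


class ToyPipeline:
    """Runs components in the order they were added."""

    def __init__(self) -> None:
        self.components: List[ToyComponent] = []

    def add_component(self, component: ToyComponent) -> "ToyPipeline":
        # Refuse a component whose dependencies have not been stacked yet.
        provided = {c.name for c in self.components}
        missing = component.dependencies - provided
        if missing:
            raise ValueError(f"{component.name} requires {sorted(missing)}")
        self.components.append(component)
        return self

    def __call__(self, text: str) -> ToyDocument:
        doc = ToyDocument(text)
        for component in self.components:
            doc = component(doc)
        return doc


pipeline = ToyPipeline().add_component(ToyTokenOverlayer()).add_component(ToyUnitOverlayer())
doc = pipeline("The patient was prescribed 1 capsule of Advil for 5 days.")
print(doc.features["units"])  # ['capsule']
```

Adding `ToyUnitOverlayer` before `ToyTokenOverlayer` raises a `ValueError` in this sketch, which mirrors the idea that a component can only build on features that earlier components have already overlaid.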

### Utilizing Pre-trained NER models
To run a pre-trained medaCy model over your own data, install the package associated with the model by following the links below. Models officially supported by medaCy all start with the prefix *medacy_model*.

For example, assuming you have medaCy installed, run:

`pip install git+https://github.com/NLPatVCU/medaCy_model_clinical_notes.git`

Then the code snippet

```python
import medacy_model_clinical_notes

# Load the packaged clinical notes model and run it over raw text.
model = medacy_model_clinical_notes.load()
model.predict("The patient was prescribed 1 capsule of Advil for 5 days.")
```

will output:
```python
[
    ('Drug', 40, 45, 'Advil'),
    ('Dosage', 27, 28, '1'),
    ('Form', 29, 36, 'capsule'),
    ('Duration', 46, 56, 'for 5 days')
]
```
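
As a rough sketch of how such output might be consumed, the snippet below takes the tuples from the example output verbatim and treats each one as `(label, start, end, matched text)`, with `start` and `end` being character offsets into the input and `end` exclusive. That reading is an assumption inferred from the values shown here, not a documented guarantee of `predict`. The snippet does not need medaCy installed.

```python
from collections import defaultdict

text = "The patient was prescribed 1 capsule of Advil for 5 days."

# Tuples copied from the example output: (label, start, end, matched text).
predictions = [
    ("Drug", 40, 45, "Advil"),
    ("Dosage", 27, 28, "1"),
    ("Form", 29, 36, "capsule"),
    ("Duration", 46, 56, "for 5 days"),
]

# Sanity check: slicing the input with each span should give back the matched text.
for label, start, end, matched in predictions:
    assert text[start:end] == matched, (label, text[start:end], matched)

# Order the entities by where they appear in the sentence.
in_order = sorted(predictions, key=lambda p: p[1])

# Group the matched text by entity label.
by_label = defaultdict(list)
for label, _start, _end, matched in in_order:
    by_label[label].append(matched)

print([p[0] for p in in_order])  # ['Dosage', 'Form', 'Drug', 'Duration']
print(by_label["Drug"])          # ['Advil']
```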

*NOTE: If you are predicting over many files at once, it is advisable to use the bulk prediction functionality.*

#### List of medaCy pre-trained models
| Application | Dataset Trained Over | Entities |
| :---------: | :----------------: |:-------------:|
| [Clinical Notes](/guide/models/clinical_notes_model.md) | [N2C2 2018](https://n2c2.dbmi.hms.harvard.edu/) | Drug, Form, Route, ADE, Reason, Frequency, Duration, Dosage, Strength |
| [EPA Systematic Reviews](/guide/models/epa_systematic_review_model.md) | [TAC SRIE 2018](https://tac.nist.gov/2018/SRIE/) | Species, Celline, Dosage, Group, etc. |
| [Nanomedicine Drug Labels](/guide/models/nanomedicine_drug_labels.md) | [END](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5644562/) | Nanoparticle, Company, Adverse Reaction, Active Ingredient, Surface Coating, etc. |

### Sharing your medaCy models
MedaCy models can be packaged and shared with anyone (or no one!) with ease. See [this example](/guide/walkthrough/model_utilization.md) for details.

### How medaCy uses spaCy
[SpaCy](https://github.com/explosion/spaCy) is an open-source Python package built with Cython that allows for lightning-fast text processing. MedaCy combines spaCy's memory-efficient text processing architecture with tools, ideas, and principles from both machine learning and medical computational linguistics to provide a unified framework for researchers and practitioners alike to advance medical text mining.