|
a |
|
b/tutorials/1_Ontology.ipynb |
|
|
1 |
{ |
|
|
2 |
"cells": [ |
|
|
3 |
{ |
|
|
4 |
"cell_type": "markdown", |
|
|
5 |
"metadata": {}, |
|
|
6 |
"source": [ |
|
|
7 |
"# FEMR Ontology support\n", |
|
|
8 |
"\n", |
|
|
9 |
"FEMR provides support for querying ontologies using the OMOP Vocabulary. \n", |
|
|
10 |
"\n", |
|
|
11 |
"This enables easier definition of labeling functions as well as better feature generation." |
|
|
12 |
] |
|
|
13 |
}, |
|
|
14 |
{ |
|
|
15 |
"cell_type": "markdown", |
|
|
16 |
"metadata": {}, |
|
|
17 |
"source": [ |
|
|
18 |
"# Downloading the OMOP Vocabulary\n", |
|
|
19 |
"\n", |
|
|
20 |
"The OMOP Vocabulary can be downloaded for free from the [OHDSI ATHENA website.](https://athena.ohdsi.org/)" |
|
|
21 |
] |
|
|
22 |
}, |
|
|
23 |
{ |
|
|
24 |
"cell_type": "markdown", |
|
|
25 |
"metadata": {}, |
|
|
26 |
"source": [ |
|
|
27 |
"# Processing the OMOP Vocabulary\n", |
|
|
28 |
"\n", |
|
|
29 |
"femr.ontology.Ontology allows you to process, and then use the OMOP Vocabulary, optionally combining it with [code metadata from MEDS](https://github.com/Medical-Event-Data-Standard/meds/blob/e93f63a2f9642123c49a31ecffcdb84d877dc54a/src/meds/__init__.py#L94).\n", |
|
|
30 |
"\n", |
|
|
31 |
"```python \n", |
|
|
32 |
"ontology = femr.ontology.Ontology(path_to_athena, code_metadata)\n", |
|
|
33 |
"```" |
|
|
34 |
] |
|
|
35 |
}, |
|
|
36 |
{ |
|
|
37 |
"cell_type": "markdown", |
|
|
38 |
"metadata": {}, |
|
|
39 |
"source": [ |
|
|
40 |
"# Working with an Ontology object\n", |
|
|
41 |
"\n", |
|
|
42 |
"The following code samples illustrate the main ways to use a vocabulary object" |
|
|
43 |
] |
|
|
44 |
}, |
|
|
45 |
{ |
|
|
46 |
"cell_type": "code", |
|
|
47 |
"execution_count": 1, |
|
|
48 |
"metadata": {}, |
|
|
49 |
"outputs": [ |
|
|
50 |
{ |
|
|
51 |
"name": "stderr", |
|
|
52 |
"output_type": "stream", |
|
|
53 |
"text": [ |
|
|
54 |
"/home/esteinberg/miniconda3/envs/debug_document_femr/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", |
|
|
55 |
" from .autonotebook import tqdm as notebook_tqdm\n" |
|
|
56 |
] |
|
|
57 |
}, |
|
|
58 |
{ |
|
|
59 |
"name": "stdout", |
|
|
60 |
"output_type": "stream", |
|
|
61 |
"text": [ |
|
|
62 |
"Loaded ontology\n" |
|
|
63 |
] |
|
|
64 |
} |
|
|
65 |
], |
|
|
66 |
"source": [ |
|
|
67 |
"import pickle\n", |
|
|
68 |
"\n", |
|
|
69 |
"# You can load / save ontology objects with pickle\n", |
|
|
70 |
"\n", |
|
|
71 |
"with open('input/meds/ontology.pkl', 'rb') as f:\n", |
|
|
72 |
" ontology = pickle.load(f)\n", |
|
|
73 |
"\n", |
|
|
74 |
"print(\"Loaded ontology\")" |
|
|
75 |
] |
|
|
76 |
}, |
|
|
77 |
{ |
|
|
78 |
"cell_type": "code", |
|
|
79 |
"execution_count": 2, |
|
|
80 |
"metadata": {}, |
|
|
81 |
"outputs": [ |
|
|
82 |
{ |
|
|
83 |
"name": "stderr", |
|
|
84 |
"output_type": "stream", |
|
|
85 |
"text": [ |
|
|
86 |
"Generating train split: 200 examples [00:00, 34972.93 examples/s]\n", |
|
|
87 |
"Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:00<00:00, 3282.29 examples/s]\n" |
|
|
88 |
] |
|
|
89 |
} |
|
|
90 |
], |
|
|
91 |
"source": [ |
|
|
92 |
"# Ontology datasets downloaded by Athena tend to be very large as they contain many codes, including several that are no longer used.\n", |
|
|
93 |
"# We therefore provide a function to prune ontologies to a particular dataset of interest.\n", |
|
|
94 |
"# This makes it much cheaper to store and use an ontology object, both in terms of disk space and RAM\n", |
|
|
95 |
"\n", |
|
|
96 |
"import datasets\n", |
|
|
97 |
"dataset = datasets.Dataset.from_parquet(\"input/meds/data/*\")\n", |
|
|
98 |
"\n", |
|
|
99 |
"ontology.prune_to_dataset(dataset)" |
|
|
100 |
] |
|
|
101 |
}, |
|
|
102 |
{ |
|
|
103 |
"cell_type": "code", |
|
|
104 |
"execution_count": 3, |
|
|
105 |
"metadata": {}, |
|
|
106 |
"outputs": [ |
|
|
107 |
{ |
|
|
108 |
"name": "stdout", |
|
|
109 |
"output_type": "stream", |
|
|
110 |
"text": [ |
|
|
111 |
"Description DRUGS FOR PEPTIC ULCER AND GASTRO-OESOPHAGEAL REFLUX DISEASE (GORD)\n", |
|
|
112 |
"Parents {'ATC/A02'}\n", |
|
|
113 |
"Children {'ATC/A02BX'}\n", |
|
|
114 |
"All children {'RxNorm/2344', 'ATC/A02BX', 'RxNorm/4501', 'ATC/A02BX71', 'ATC/A02B', 'RxNorm/7815', 'RxNorm/7019', 'ATC/A02BX77', 'RxNorm/2353', 'RxNorm/8705', 'RxNorm/38574', 'RxNorm/2620', 'RxNorm/2018', 'RxNorm/8704', 'RxNorm/8730', 'RxNorm/6852', 'RxNorm/2017', 'RxNorm/2403'}\n", |
|
|
115 |
"All parents {'ATC/A', 'ATC/A02', 'ATC/A02B'}\n" |
|
|
116 |
] |
|
|
117 |
} |
|
|
118 |
], |
|
|
119 |
"source": [ |
|
|
120 |
"# First, we can query the description for a particular code\n", |
|
|
121 |
"print(\"Description\", ontology.get_description(\"ATC/A02B\"))\n", |
|
|
122 |
"\n", |
|
|
123 |
"# Second, we can search for the parents of a particular code\n", |
|
|
124 |
"print(\"Parents\", ontology.get_parents(\"ATC/A02B\"))\n", |
|
|
125 |
"\n", |
|
|
126 |
"# Finally, we can search for the children of a particular code\n", |
|
|
127 |
"print(\"Children\", ontology.get_children(\"ATC/A02B\"))\n", |
|
|
128 |
"\n", |
|
|
129 |
"# For the sake of convience, we also support the recursive versions of querying parents and children\n", |
|
|
130 |
"print(\"All children\", ontology.get_all_children(\"ATC/A02B\"))\n", |
|
|
131 |
"print(\"All parents\", ontology.get_all_parents(\"ATC/A02B\"))" |
|
|
132 |
] |
|
|
133 |
} |
|
|
134 |
], |
|
|
135 |
"metadata": { |
|
|
136 |
"kernelspec": { |
|
|
137 |
"display_name": "Python 3 (ipykernel)", |
|
|
138 |
"language": "python", |
|
|
139 |
"name": "python3" |
|
|
140 |
}, |
|
|
141 |
"language_info": { |
|
|
142 |
"codemirror_mode": { |
|
|
143 |
"name": "ipython", |
|
|
144 |
"version": 3 |
|
|
145 |
}, |
|
|
146 |
"file_extension": ".py", |
|
|
147 |
"mimetype": "text/x-python", |
|
|
148 |
"name": "python", |
|
|
149 |
"nbconvert_exporter": "python", |
|
|
150 |
"pygments_lexer": "ipython3", |
|
|
151 |
"version": "3.10.14" |
|
|
152 |
} |
|
|
153 |
}, |
|
|
154 |
"nbformat": 4, |
|
|
155 |
"nbformat_minor": 4 |
|
|
156 |
} |