|
a |
|
b/nbs/index.ipynb |
|
|
1 |
{ |
|
|
2 |
"cells": [ |
|
|
3 |
{ |
|
|
4 |
"cell_type": "markdown", |
|
|
5 |
"metadata": {}, |
|
|
6 |
"source": [ |
|
|
7 |
"# eligibility_criteria_parser\n", |
|
|
8 |
"\n", |
|
|
9 |
"> Repository with experiments on the usability of prompt learning for parsing eligibility criteria in clinical trials" |
|
|
10 |
] |
|
|
11 |
}, |
|
|
12 |
{ |
|
|
13 |
"cell_type": "markdown", |
|
|
14 |
"metadata": {}, |
|
|
15 |
"source": [ |
|
|
16 |
"## Install\n", |
|
|
17 |
"\n", |
|
|
18 |
"In order to install the module issue the following commands" |
|
|
19 |
] |
|
|
20 |
}, |
|
|
21 |
{ |
|
|
22 |
"cell_type": "markdown", |
|
|
23 |
"metadata": {}, |
|
|
24 |
"source": [ |
|
|
25 |
"```sh\n", |
|
|
26 |
"bash$ git clone https://github.com/megaduks/criteria_parser.git\n", |
|
|
27 |
"\n", |
|
|
28 |
"bash$ cd criteria_parser\n", |
|
|
29 |
"\n", |
|
|
30 |
"bash$ pip install -r requirements.txt\n", |
|
|
31 |
"\n", |
|
|
32 |
"bash$ pip install -e '.[dev]'\n", |
|
|
33 |
"```" |
|
|
34 |
] |
|
|
35 |
}, |
|
|
36 |
{ |
|
|
37 |
"cell_type": "markdown", |
|
|
38 |
"metadata": {}, |
|
|
39 |
"source": [ |
|
|
40 |
"The next step is to run `dvc` to download the data" |
|
|
41 |
] |
|
|
42 |
}, |
|
|
43 |
{ |
|
|
44 |
"cell_type": "markdown", |
|
|
45 |
"metadata": {}, |
|
|
46 |
"source": [ |
|
|
47 |
"```bash\n", |
|
|
48 |
"bash$ dvc pull\n", |
|
|
49 |
"```" |
|
|
50 |
] |
|
|
51 |
}, |
|
|
52 |
{ |
|
|
53 |
"cell_type": "markdown", |
|
|
54 |
"metadata": {}, |
|
|
55 |
"source": [ |
|
|
56 |
"## How to use" |
|
|
57 |
] |
|
|
58 |
}, |
|
|
59 |
{ |
|
|
60 |
"cell_type": "markdown", |
|
|
61 |
"metadata": {}, |
|
|
62 |
"source": [ |
|
|
63 |
"The function `load_chia()` downloads the entire dataset as a dataframe" |
|
|
64 |
] |
|
|
65 |
}, |
|
|
66 |
{ |
|
|
67 |
"cell_type": "code", |
|
|
68 |
"execution_count": null, |
|
|
69 |
"metadata": {}, |
|
|
70 |
"outputs": [], |
|
|
71 |
"source": [ |
|
|
72 |
"from eligibility_criteria_parser.core import *\n", |
|
|
73 |
"\n", |
|
|
74 |
"df = load_chia()" |
|
|
75 |
] |
|
|
76 |
}, |
|
|
77 |
{ |
|
|
78 |
"cell_type": "code", |
|
|
79 |
"execution_count": null, |
|
|
80 |
"metadata": {}, |
|
|
81 |
"outputs": [ |
|
|
82 |
{ |
|
|
83 |
"data": { |
|
|
84 |
"text/html": [ |
|
|
85 |
"<div>\n", |
|
|
86 |
"<style scoped>\n", |
|
|
87 |
" .dataframe tbody tr th:only-of-type {\n", |
|
|
88 |
" vertical-align: middle;\n", |
|
|
89 |
" }\n", |
|
|
90 |
"\n", |
|
|
91 |
" .dataframe tbody tr th {\n", |
|
|
92 |
" vertical-align: top;\n", |
|
|
93 |
" }\n", |
|
|
94 |
"\n", |
|
|
95 |
" .dataframe thead th {\n", |
|
|
96 |
" text-align: right;\n", |
|
|
97 |
" }\n", |
|
|
98 |
"</style>\n", |
|
|
99 |
"<table border=\"1\" class=\"dataframe\">\n", |
|
|
100 |
" <thead>\n", |
|
|
101 |
" <tr style=\"text-align: right;\">\n", |
|
|
102 |
" <th></th>\n", |
|
|
103 |
" <th>ct_no</th>\n", |
|
|
104 |
" <th>criteria</th>\n", |
|
|
105 |
" <th>mode</th>\n", |
|
|
106 |
" <th>drugs</th>\n", |
|
|
107 |
" <th>persons</th>\n", |
|
|
108 |
" <th>procedures</th>\n", |
|
|
109 |
" <th>conditions</th>\n", |
|
|
110 |
" <th>devices</th>\n", |
|
|
111 |
" <th>visits</th>\n", |
|
|
112 |
" <th>scopes</th>\n", |
|
|
113 |
" <th>observations</th>\n", |
|
|
114 |
" <th>measurements</th>\n", |
|
|
115 |
" </tr>\n", |
|
|
116 |
" </thead>\n", |
|
|
117 |
" <tbody>\n", |
|
|
118 |
" <tr>\n", |
|
|
119 |
" <th>0</th>\n", |
|
|
120 |
" <td>NCT03124329</td>\n", |
|
|
121 |
" <td>Male and female individuals between ages of 18...</td>\n", |
|
|
122 |
" <td>inclusion</td>\n", |
|
|
123 |
" <td>None</td>\n", |
|
|
124 |
" <td>[ages]</td>\n", |
|
|
125 |
" <td>None</td>\n", |
|
|
126 |
" <td>[gingival recession defects, recession defects]</td>\n", |
|
|
127 |
" <td>None</td>\n", |
|
|
128 |
" <td>None</td>\n", |
|
|
129 |
" <td>None</td>\n", |
|
|
130 |
" <td>[cervical restorations extending to the CEJ]</td>\n", |
|
|
131 |
" <td>[recession, keratinized gingiva, Miller]</td>\n", |
|
|
132 |
" </tr>\n", |
|
|
133 |
" <tr>\n", |
|
|
134 |
" <th>1</th>\n", |
|
|
135 |
" <td>NCT02796378</td>\n", |
|
|
136 |
" <td>Elevated blood-cholesterol</td>\n", |
|
|
137 |
" <td>inclusion</td>\n", |
|
|
138 |
" <td>None</td>\n", |
|
|
139 |
" <td>None</td>\n", |
|
|
140 |
" <td>None</td>\n", |
|
|
141 |
" <td>None</td>\n", |
|
|
142 |
" <td>None</td>\n", |
|
|
143 |
" <td>None</td>\n", |
|
|
144 |
" <td>None</td>\n", |
|
|
145 |
" <td>None</td>\n", |
|
|
146 |
" <td>[blood-cholesterol]</td>\n", |
|
|
147 |
" </tr>\n", |
|
|
148 |
" <tr>\n", |
|
|
149 |
" <th>2</th>\n", |
|
|
150 |
" <td>NCT03216967</td>\n", |
|
|
151 |
" <td>Adult patients Kidney transplant recipients Pa...</td>\n", |
|
|
152 |
" <td>inclusion</td>\n", |
|
|
153 |
" <td>[calcineurin inhibitor, mycophenolic acid]</td>\n", |
|
|
154 |
" <td>[Adult]</td>\n", |
|
|
155 |
" <td>None</td>\n", |
|
|
156 |
" <td>None</td>\n", |
|
|
157 |
" <td>None</td>\n", |
|
|
158 |
" <td>None</td>\n", |
|
|
159 |
" <td>None</td>\n", |
|
|
160 |
" <td>None</td>\n", |
|
|
161 |
" <td>[Viremia, pregnancy test, blood ß-HCG dosage]</td>\n", |
|
|
162 |
" </tr>\n", |
|
|
163 |
" <tr>\n", |
|
|
164 |
" <th>3</th>\n", |
|
|
165 |
" <td>NCT02200978</td>\n", |
|
|
166 |
" <td>Patients less than 16 years old with newly dia...</td>\n", |
|
|
167 |
" <td>inclusion</td>\n", |
|
|
168 |
" <td>None</td>\n", |
|
|
169 |
" <td>[old]</td>\n", |
|
|
170 |
" <td>None</td>\n", |
|
|
171 |
" <td>[acute promyelocytic leukemia]</td>\n", |
|
|
172 |
" <td>None</td>\n", |
|
|
173 |
" <td>None</td>\n", |
|
|
174 |
" <td>None</td>\n", |
|
|
175 |
" <td>None</td>\n", |
|
|
176 |
" <td>[PML-RARa]</td>\n", |
|
|
177 |
" </tr>\n", |
|
|
178 |
" <tr>\n", |
|
|
179 |
" <th>4</th>\n", |
|
|
180 |
" <td>NCT01314898</td>\n", |
|
|
181 |
" <td>Male and/or female healthy volunteers, age 18 ...</td>\n", |
|
|
182 |
" <td>inclusion</td>\n", |
|
|
183 |
" <td>None</td>\n", |
|
|
184 |
" <td>[Male, female, age, Females]</td>\n", |
|
|
185 |
" <td>None</td>\n", |
|
|
186 |
" <td>[healthy, childbearing potential]</td>\n", |
|
|
187 |
" <td>None</td>\n", |
|
|
188 |
" <td>None</td>\n", |
|
|
189 |
" <td>None</td>\n", |
|
|
190 |
" <td>None</td>\n", |
|
|
191 |
" <td>[Body Mass Index (BMI), total body weight]</td>\n", |
|
|
192 |
" </tr>\n", |
|
|
193 |
" </tbody>\n", |
|
|
194 |
"</table>\n", |
|
|
195 |
"</div>" |
|
|
196 |
], |
|
|
197 |
"text/plain": [ |
|
|
198 |
" ct_no criteria mode \\\n", |
|
|
199 |
"0 NCT03124329 Male and female individuals between ages of 18... inclusion \n", |
|
|
200 |
"1 NCT02796378 Elevated blood-cholesterol inclusion \n", |
|
|
201 |
"2 NCT03216967 Adult patients Kidney transplant recipients Pa... inclusion \n", |
|
|
202 |
"3 NCT02200978 Patients less than 16 years old with newly dia... inclusion \n", |
|
|
203 |
"4 NCT01314898 Male and/or female healthy volunteers, age 18 ... inclusion \n", |
|
|
204 |
"\n", |
|
|
205 |
" drugs persons \\\n", |
|
|
206 |
"0 None [ages] \n", |
|
|
207 |
"1 None None \n", |
|
|
208 |
"2 [calcineurin inhibitor, mycophenolic acid] [Adult] \n", |
|
|
209 |
"3 None [old] \n", |
|
|
210 |
"4 None [Male, female, age, Females] \n", |
|
|
211 |
"\n", |
|
|
212 |
" procedures conditions devices visits \\\n", |
|
|
213 |
"0 None [gingival recession defects, recession defects] None None \n", |
|
|
214 |
"1 None None None None \n", |
|
|
215 |
"2 None None None None \n", |
|
|
216 |
"3 None [acute promyelocytic leukemia] None None \n", |
|
|
217 |
"4 None [healthy, childbearing potential] None None \n", |
|
|
218 |
"\n", |
|
|
219 |
" scopes observations \\\n", |
|
|
220 |
"0 None [cervical restorations extending to the CEJ] \n", |
|
|
221 |
"1 None None \n", |
|
|
222 |
"2 None None \n", |
|
|
223 |
"3 None None \n", |
|
|
224 |
"4 None None \n", |
|
|
225 |
"\n", |
|
|
226 |
" measurements \n", |
|
|
227 |
"0 [recession, keratinized gingiva, Miller] \n", |
|
|
228 |
"1 [blood-cholesterol] \n", |
|
|
229 |
"2 [Viremia, pregnancy test, blood ß-HCG dosage] \n", |
|
|
230 |
"3 [PML-RARa] \n", |
|
|
231 |
"4 [Body Mass Index (BMI), total body weight] " |
|
|
232 |
] |
|
|
233 |
}, |
|
|
234 |
"execution_count": null, |
|
|
235 |
"metadata": {}, |
|
|
236 |
"output_type": "execute_result" |
|
|
237 |
} |
|
|
238 |
], |
|
|
239 |
"source": [ |
|
|
240 |
"df.head()" |
|
|
241 |
] |
|
|
242 |
}, |
|
|
243 |
{ |
|
|
244 |
"cell_type": "markdown", |
|
|
245 |
"metadata": {}, |
|
|
246 |
"source": [ |
|
|
247 |
"The dataset consists of 2000 clinical trial criteria annotated with 10 different entities " |
|
|
248 |
] |
|
|
249 |
}, |
|
|
250 |
{ |
|
|
251 |
"cell_type": "code", |
|
|
252 |
"execution_count": null, |
|
|
253 |
"metadata": {}, |
|
|
254 |
"outputs": [ |
|
|
255 |
{ |
|
|
256 |
"data": { |
|
|
257 |
"text/plain": [ |
|
|
258 |
"(2000, 12)" |
|
|
259 |
] |
|
|
260 |
}, |
|
|
261 |
"execution_count": null, |
|
|
262 |
"metadata": {}, |
|
|
263 |
"output_type": "execute_result" |
|
|
264 |
} |
|
|
265 |
], |
|
|
266 |
"source": [ |
|
|
267 |
"df.shape" |
|
|
268 |
] |
|
|
269 |
}, |
|
|
270 |
{ |
|
|
271 |
"cell_type": "markdown", |
|
|
272 |
"metadata": {}, |
|
|
273 |
"source": [ |
|
|
274 |
"To extract a particular entity use `get_annotations()` function. This function accepts the name of the annotated entity, the number of examples to be downloaded, and the flag to allow for random/ordered retrieval of examples. \n", |
|
|
275 |
"\n", |
|
|
276 |
"The result is a list of tuples, each tuple contains the clinical trial ID, the text of the criterion, and the annotated entities." |
|
|
277 |
] |
|
|
278 |
}, |
|
|
279 |
{ |
|
|
280 |
"cell_type": "code", |
|
|
281 |
"execution_count": null, |
|
|
282 |
"metadata": {}, |
|
|
283 |
"outputs": [ |
|
|
284 |
{ |
|
|
285 |
"data": { |
|
|
286 |
"text/plain": [ |
|
|
287 |
"[('NCT03216967',\n", |
|
|
288 |
" 'Adult patients Kidney transplant recipients Patients treated by a calcineurin inhibitor and mycophenolic acid Viremia >= 3 log UI/ml Patients who have given written informed consent Negative pregnancy test (blood ß-HCG dosage)',\n", |
|
|
289 |
" ['calcineurin inhibitor', 'mycophenolic acid']),\n", |
|
|
290 |
" ('NCT00730301',\n", |
|
|
291 |
" 'Patient diagnosed by HRCT Core Lab with eligible heterogeneous disease distribution and at least one complete oblique fissure. Age from 40 to 75 years BMI < 32 kg/m2 FEV1 < 40% of predicted value, FEV1/FVC < 70% TLC > 120% predicted, RV > 150% predicted. Stable with < 20 mg prednisone (or equivalent) qd PaCO2 < 50mm Hg PaO2 > 45 mm Hg on room air 6-min walk of > 50m (without rehabilitation) or > 100m (with rehabilitation) Nonsmoking for 4 months prior to initial interview and throughout screening The patient agrees to all protocol required follow-up intervals. The patient has no child bearing potential The patient is willing and able to complete protocol required baseline assessments and procedures ',\n", |
|
|
292 |
" ['prednisone']),\n", |
|
|
293 |
" ('NCT02715466',\n", |
|
|
294 |
" 'Male or female patients = 18 and = 85 years of age Women of child bearing potential must test negative on standard pregnancy test (urine or serum) Patients with body weight = 55 kg and = 140 kg and body mass index (BMI) = 18 kg/m2 Patients diagnosed severe sepsis / septic shock at admission on Intensive Care Unit who can be enrolled within 90 min after admission OR patients diagnosed severe sepsis / septic shock during Intensive Care Unit stay who can be enrolled within 90 min after diagnosis Patients where antibiotic therapy has already been started (prior to randomization) Patient who are fluid responsive. Fluid responsiveness is defined as increase of > 10% in mean arterial pressure (MAP) after passive leg raising (PLR) Signed informed consent by patient, legal representative or authorized person or deferred consent',\n", |
|
|
295 |
" ['antibiotic therapy']),\n", |
|
|
296 |
" ('NCT02735902',\n", |
|
|
297 |
" 'The patient or his/her representative must have given free and informed consent and signed the consent The patient must be insured or beneficiary of a health insurance plan The patient is available for 12 months of follow-up The patient underwent a successful transcutaneous implant procedure for an aortic valve within the past 24 hours The patient was receiving anti-vitamin K (AVK) treatment before percutaneous implantation of the aortic valve',\n", |
|
|
298 |
" ['anti-vitamin K', 'AVK']),\n", |
|
|
299 |
" ('NCT00989261',\n", |
|
|
300 |
" '1. Males and females age ≥18 years in second relapse or refractory. 2. Males and females age ≥60 years in first relapse or refractory. 3. Must have baseline bone marrow sample taken. 4. Morphologically documented primary AML or AML secondary to myelodysplastic syndrome (MDS with ≥20% bone marrow or peripheral blasts), as defined by the World Health Organization (WHO) criteria, confirmed by pathology review at treating institution. 5. Able to swallow the liquid study drug. 6. ECOG performance status of 0 to 2 7. In the absence of rapidly progressing disease, the interval from prior treatment to time of AC220 administration will be at least 2 weeks for cytotoxic agents or at least 5 half-lives for noncytotoxic agents. The use of chemotherapeutic or antileukemic agents other than hydroxyurea is not permitted during the study with the possible exception of intrathecal (IT) therapy at the discretion of the Investigator and with the agreement of the Sponsor. 8. Persistent chronic clinically significant non-hematological toxicities from prior treatment must be ≤Grade 1. 9. Prior therapy with FLT3 inhibitors is permitted, except previous treatment with AC220. 10. Serum creatinine ≤1.5 × ULN and glomerular filtration rate (GFR) > 30 mL/min 11. Serum potassium, magnesium, and calcium levels should be at least within institutional normal limits. 12. Total serum bilirubin ≤1.5 × ULN 13. Serum aspartate transaminase (AST) and/or alanine transaminase (ALT) ≤2.5 × ULN 14. Females of childbearing potential must have a negative pregnancy test (urine β-hCG). 15. Females of childbearing potential and sexually mature males must agree to use a medically accepted method of contraception throughout the study. 16. Written informed consent must be provided. ',\n", |
|
|
301 |
" ['FLT3 inhibitors', 'AC220'])]" |
|
|
302 |
] |
|
|
303 |
}, |
|
|
304 |
"execution_count": null, |
|
|
305 |
"metadata": {}, |
|
|
306 |
"output_type": "execute_result" |
|
|
307 |
} |
|
|
308 |
], |
|
|
309 |
"source": [ |
|
|
310 |
"examples = get_annotations(\"drugs\", n=5, random=False)\n", |
|
|
311 |
"examples" |
|
|
312 |
] |
|
|
313 |
}, |
|
|
314 |
{ |
|
|
315 |
"cell_type": "markdown", |
|
|
316 |
"metadata": {}, |
|
|
317 |
"source": [ |
|
|
318 |
"In order to use this data for prompting, the IDs, criteria, and annotations have to be separated into lists." |
|
|
319 |
] |
|
|
320 |
}, |
|
|
321 |
{ |
|
|
322 |
"cell_type": "code", |
|
|
323 |
"execution_count": null, |
|
|
324 |
"metadata": {}, |
|
|
325 |
"outputs": [ |
|
|
326 |
{ |
|
|
327 |
"name": "stdout", |
|
|
328 |
"output_type": "stream", |
|
|
329 |
"text": [ |
|
|
330 |
"['NCT03216967', 'NCT00730301', 'NCT02715466']\n", |
|
|
331 |
"['Adult patients Kidney transplant recipients Patients treated by a calcineurin inhibitor and mycophenolic acid Viremia >= 3 log UI/ml Patients who have given written informed consent Negative pregnancy test (blood ß-HCG dosage)', 'Patient diagnosed by HRCT Core Lab with eligible heterogeneous disease distribution and at least one complete oblique fissure. Age from 40 to 75 years BMI < 32 kg/m2 FEV1 < 40% of predicted value, FEV1/FVC < 70% TLC > 120% predicted, RV > 150% predicted. Stable with < 20 mg prednisone (or equivalent) qd PaCO2 < 50mm Hg PaO2 > 45 mm Hg on room air 6-min walk of > 50m (without rehabilitation) or > 100m (with rehabilitation) Nonsmoking for 4 months prior to initial interview and throughout screening The patient agrees to all protocol required follow-up intervals. The patient has no child bearing potential The patient is willing and able to complete protocol required baseline assessments and procedures ', 'Male or female patients = 18 and = 85 years of age Women of child bearing potential must test negative on standard pregnancy test (urine or serum) Patients with body weight = 55 kg and = 140 kg and body mass index (BMI) = 18 kg/m2 Patients diagnosed severe sepsis / septic shock at admission on Intensive Care Unit who can be enrolled within 90 min after admission OR patients diagnosed severe sepsis / septic shock during Intensive Care Unit stay who can be enrolled within 90 min after diagnosis Patients where antibiotic therapy has already been started (prior to randomization) Patient who are fluid responsive. Fluid responsiveness is defined as increase of > 10% in mean arterial pressure (MAP) after passive leg raising (PLR) Signed informed consent by patient, legal representative or authorized person or deferred consent']\n", |
|
|
332 |
"[['calcineurin inhibitor', 'mycophenolic acid'], ['prednisone'], ['antibiotic therapy']]\n" |
|
|
333 |
] |
|
|
334 |
} |
|
|
335 |
], |
|
|
336 |
"source": [ |
|
|
337 |
"ids, criteria, ents_true = map(list, zip(*examples))\n", |
|
|
338 |
"\n", |
|
|
339 |
"print(ids[:3])\n", |
|
|
340 |
"print(criteria[:3])\n", |
|
|
341 |
"print(ents_true[:3])" |
|
|
342 |
] |
|
|
343 |
}, |
|
|
344 |
{ |
|
|
345 |
"cell_type": "markdown", |
|
|
346 |
"metadata": {}, |
|
|
347 |
"source": [ |
|
|
348 |
"The last step is to prepare two utility functions:\n", |
|
|
349 |
"- prompting function: creates a prompt for a given example\n", |
|
|
350 |
"- deprompting function: reads the answer from the language model and extracts predicted entities\n", |
|
|
351 |
"\n", |
|
|
352 |
"Below is an example of a simple prompting function. This function constructs a specific template with `n_shots` examples and attaches the `criterion` for which the language model has to generate the response" |
|
|
353 |
] |
|
|
354 |
}, |
|
|
355 |
{ |
|
|
356 |
"cell_type": "code", |
|
|
357 |
"execution_count": null, |
|
|
358 |
"metadata": {}, |
|
|
359 |
"outputs": [], |
|
|
360 |
"source": [ |
|
|
361 |
"from typing import List, Tuple\n", |
|
|
362 |
"\n", |
|
|
363 |
"def simple_prompt(criterion: str, examples: List[Tuple[id, str,str]], entity: str, n_shots: int) -> str:\n", |
|
|
364 |
" \n", |
|
|
365 |
" TEXT = \"\"\n", |
|
|
366 |
" for ids, c, e in examples[:n_shots]:\n", |
|
|
367 |
" TEXT += f\"\"\"[text]: {c} \\n###\\n[{entity}]: {e} \\n###\\n\"\"\"\n", |
|
|
368 |
" \n", |
|
|
369 |
" return f\"\"\"{TEXT}[text]: {criterion} \\n###\\n[{entity}]:\"\"\"" |
|
|
370 |
] |
|
|
371 |
}, |
|
|
372 |
{ |
|
|
373 |
"cell_type": "markdown", |
|
|
374 |
"metadata": {}, |
|
|
375 |
"source": [ |
|
|
376 |
"As can be seen from the signature, the function accepts the following input:\n", |
|
|
377 |
"- `criterion`: the input example\n", |
|
|
378 |
"- `examples`: list of tuples (clinical trial id, criterion, true entities) that can be used to generate a few shot examples\n", |
|
|
379 |
"- `entity`: the name of the entity\n", |
|
|
380 |
"- `num_shots`: number of examples to be included in the prompt\n", |
|
|
381 |
"\n", |
|
|
382 |
"The `examples` input has exactly the same structure as the output of the `get_annotations()` function.\n", |
|
|
383 |
"\n", |
|
|
384 |
"Let's test the prompt generated by the function" |
|
|
385 |
] |
|
|
386 |
}, |
|
|
387 |
{ |
|
|
388 |
"cell_type": "code", |
|
|
389 |
"execution_count": null, |
|
|
390 |
"metadata": {}, |
|
|
391 |
"outputs": [ |
|
|
392 |
{ |
|
|
393 |
"name": "stdout", |
|
|
394 |
"output_type": "stream", |
|
|
395 |
"text": [ |
|
|
396 |
"criterion: 1. Males and females age ≥18 years in second relapse or refractory. 2. Males and females age ≥60 years in first relapse or refractory. 3. Must have baseline bone marrow sample taken. 4. Morphologically documented primary AML or AML secondary to myelodysplastic syndrome (MDS with ≥20% bone marrow or peripheral blasts), as defined by the World Health Organization (WHO) criteria, confirmed by pathology review at treating institution. 5. Able to swallow the liquid study drug. 6. ECOG performance status of 0 to 2 7. In the absence of rapidly progressing disease, the interval from prior treatment to time of AC220 administration will be at least 2 weeks for cytotoxic agents or at least 5 half-lives for noncytotoxic agents. The use of chemotherapeutic or antileukemic agents other than hydroxyurea is not permitted during the study with the possible exception of intrathecal (IT) therapy at the discretion of the Investigator and with the agreement of the Sponsor. 8. Persistent chronic clinically significant non-hematological toxicities from prior treatment must be ≤Grade 1. 9. Prior therapy with FLT3 inhibitors is permitted, except previous treatment with AC220. 10. Serum creatinine ≤1.5 × ULN and glomerular filtration rate (GFR) > 30 mL/min 11. Serum potassium, magnesium, and calcium levels should be at least within institutional normal limits. 12. Total serum bilirubin ≤1.5 × ULN 13. Serum aspartate transaminase (AST) and/or alanine transaminase (ALT) ≤2.5 × ULN 14. Females of childbearing potential must have a negative pregnancy test (urine β-hCG). 15. Females of childbearing potential and sexually mature males must agree to use a medically accepted method of contraception throughout the study. 16. Written informed consent must be provided. \n", |
|
|
397 |
"\n", |
|
|
398 |
" annotated drugs: ['FLT3 inhibitors', 'AC220']\n" |
|
|
399 |
] |
|
|
400 |
} |
|
|
401 |
], |
|
|
402 |
"source": [ |
|
|
403 |
"ct_id, criterion, e_true = examples[-1]\n", |
|
|
404 |
"\n", |
|
|
405 |
"print(f\"criterion: {criterion} \\n\\n annotated drugs: {e_true}\")" |
|
|
406 |
] |
|
|
407 |
}, |
|
|
408 |
{ |
|
|
409 |
"cell_type": "code", |
|
|
410 |
"execution_count": null, |
|
|
411 |
"metadata": {}, |
|
|
412 |
"outputs": [ |
|
|
413 |
{ |
|
|
414 |
"name": "stdout", |
|
|
415 |
"output_type": "stream", |
|
|
416 |
"text": [ |
|
|
417 |
"[text]: Adult patients Kidney transplant recipients Patients treated by a calcineurin inhibitor and mycophenolic acid Viremia >= 3 log UI/ml Patients who have given written informed consent Negative pregnancy test (blood ß-HCG dosage) \n", |
|
|
418 |
"###\n", |
|
|
419 |
"[drugs]: ['calcineurin inhibitor', 'mycophenolic acid'] \n", |
|
|
420 |
"###\n", |
|
|
421 |
"[text]: Patient diagnosed by HRCT Core Lab with eligible heterogeneous disease distribution and at least one complete oblique fissure. Age from 40 to 75 years BMI < 32 kg/m2 FEV1 < 40% of predicted value, FEV1/FVC < 70% TLC > 120% predicted, RV > 150% predicted. Stable with < 20 mg prednisone (or equivalent) qd PaCO2 < 50mm Hg PaO2 > 45 mm Hg on room air 6-min walk of > 50m (without rehabilitation) or > 100m (with rehabilitation) Nonsmoking for 4 months prior to initial interview and throughout screening The patient agrees to all protocol required follow-up intervals. The patient has no child bearing potential The patient is willing and able to complete protocol required baseline assessments and procedures \n", |
|
|
422 |
"###\n", |
|
|
423 |
"[drugs]: ['prednisone'] \n", |
|
|
424 |
"###\n", |
|
|
425 |
"[text]: Male or female patients = 18 and = 85 years of age Women of child bearing potential must test negative on standard pregnancy test (urine or serum) Patients with body weight = 55 kg and = 140 kg and body mass index (BMI) = 18 kg/m2 Patients diagnosed severe sepsis / septic shock at admission on Intensive Care Unit who can be enrolled within 90 min after admission OR patients diagnosed severe sepsis / septic shock during Intensive Care Unit stay who can be enrolled within 90 min after diagnosis Patients where antibiotic therapy has already been started (prior to randomization) Patient who are fluid responsive. Fluid responsiveness is defined as increase of > 10% in mean arterial pressure (MAP) after passive leg raising (PLR) Signed informed consent by patient, legal representative or authorized person or deferred consent \n", |
|
|
426 |
"###\n", |
|
|
427 |
"[drugs]: ['antibiotic therapy'] \n", |
|
|
428 |
"###\n", |
|
|
429 |
"[text]: 1. Males and females age ≥18 years in second relapse or refractory. 2. Males and females age ≥60 years in first relapse or refractory. 3. Must have baseline bone marrow sample taken. 4. Morphologically documented primary AML or AML secondary to myelodysplastic syndrome (MDS with ≥20% bone marrow or peripheral blasts), as defined by the World Health Organization (WHO) criteria, confirmed by pathology review at treating institution. 5. Able to swallow the liquid study drug. 6. ECOG performance status of 0 to 2 7. In the absence of rapidly progressing disease, the interval from prior treatment to time of AC220 administration will be at least 2 weeks for cytotoxic agents or at least 5 half-lives for noncytotoxic agents. The use of chemotherapeutic or antileukemic agents other than hydroxyurea is not permitted during the study with the possible exception of intrathecal (IT) therapy at the discretion of the Investigator and with the agreement of the Sponsor. 8. Persistent chronic clinically significant non-hematological toxicities from prior treatment must be ≤Grade 1. 9. Prior therapy with FLT3 inhibitors is permitted, except previous treatment with AC220. 10. Serum creatinine ≤1.5 × ULN and glomerular filtration rate (GFR) > 30 mL/min 11. Serum potassium, magnesium, and calcium levels should be at least within institutional normal limits. 12. Total serum bilirubin ≤1.5 × ULN 13. Serum aspartate transaminase (AST) and/or alanine transaminase (ALT) ≤2.5 × ULN 14. Females of childbearing potential must have a negative pregnancy test (urine β-hCG). 15. Females of childbearing potential and sexually mature males must agree to use a medically accepted method of contraception throughout the study. 16. Written informed consent must be provided. \n", |
|
|
430 |
"###\n", |
|
|
431 |
"[drugs]:\n" |
|
|
432 |
] |
|
|
433 |
} |
|
|
434 |
], |
|
|
435 |
"source": [ |
|
|
436 |
"prompt = simple_prompt(criterion=criterion, examples=examples, entity=\"drugs\", n_shots=3)\n", |
|
|
437 |
"\n", |
|
|
438 |
"print(prompt)" |
|
|
439 |
] |
|
|
440 |
}, |
|
|
441 |
{ |
|
|
442 |
"cell_type": "markdown", |
|
|
443 |
"metadata": {}, |
|
|
444 |
"source": [ |
|
|
445 |
"Similarly, a deprompting function has to be created to parse the answer from the language model and extract only the part relevant to the predicted entities. Below is an example of a simple deprompting function. The output of the language model **does not contain the input prompt**. The function simply removes all punctuation and all mentions of the entity name, and returns a list of unique terms generated by the language model." |
|
|
446 |
] |
|
|
447 |
}, |
|
|
448 |
{ |
|
|
449 |
"cell_type": "code", |
|
|
450 |
"execution_count": null, |
|
|
451 |
"metadata": {}, |
|
|
452 |
"outputs": [], |
|
|
453 |
"source": [ |
|
|
454 |
"def simple_deprompt(model_output: str, entity: str) -> List[str]:\n", |
|
|
455 |
" return list(\n", |
|
|
456 |
" set(\n", |
|
|
457 |
" model_output.translate(str.maketrans(\"\", \"\", string.punctuation))\n", |
|
|
458 |
" .replace(f\"{entity}\", \"\")\n", |
|
|
459 |
" .split()\n", |
|
|
460 |
" )\n", |
|
|
461 |
" )" |
|
|
462 |
] |
|
|
463 |
}, |
|
|
464 |
{ |
|
|
465 |
"cell_type": "markdown", |
|
|
466 |
"metadata": {}, |
|
|
467 |
"source": [ |
|
|
468 |
"The prediction is performed by the `fit_prompt` function which expects the following parameters:\n", |
|
|
469 |
"- `examples`: list of examples for which to perform prompting\n", |
|
|
470 |
"- `entity`: name of the entity\n", |
|
|
471 |
"- `model`: an object representing the BioGPT model\n", |
|
|
472 |
"- `prompt_fun`: a handle to the prompting funciton\n", |
|
|
473 |
"- `deprompt_fun`: a handle to the deprompting function\n", |
|
|
474 |
"\n", |
|
|
475 |
"Assuming we have correctly initialized the BioGPT model under the `model` variable, the invocation of the function is:" |
|
|
476 |
] |
|
|
477 |
}, |
|
|
478 |
{ |
|
|
479 |
"cell_type": "code", |
|
|
480 |
"execution_count": null, |
|
|
481 |
"metadata": {}, |
|
|
482 |
"outputs": [], |
|
|
483 |
"source": [ |
|
|
484 |
"# from fairseq.models.transformer_lm import TransformerLanguageModel\n", |
|
|
485 |
"\n", |
|
|
486 |
"# model = TransformerLanguageModel.from_pretrained(\n", |
|
|
487 |
"# \"biogpt/checkpoints/Pre-trained-BioGPT\", \n", |
|
|
488 |
"# \"checkpoint.pt\", \n", |
|
|
489 |
"# \"biogpt/BioGPT/data\",\n", |
|
|
490 |
"# tokenizer='moses', \n", |
|
|
491 |
"# bpe='fastbpe', \n", |
|
|
492 |
"# bpe_codes=\"biogpt/BioGPT/data/bpecodes\",\n", |
|
|
493 |
"# min_len=100,\n", |
|
|
494 |
"# max_len_b=2048,\n", |
|
|
495 |
"# cuda=True,\n", |
|
|
496 |
"# verbose=False,\n", |
|
|
497 |
"# )\n", |
|
|
498 |
"\n", |
|
|
499 |
"model = None # here the model should be initialized as commented out\n", |
|
|
500 |
"\n", |
|
|
501 |
"ents_pred = fit_prompt(examples, \"drugs\", model, simple_prompt, simple_deprompt)" |
|
|
502 |
] |
|
|
503 |
}, |
|
|
504 |
{ |
|
|
505 |
"cell_type": "markdown", |
|
|
506 |
"metadata": {}, |
|
|
507 |
"source": [ |
|
|
508 |
"Finally, the results can be computed using a single function `prompt_score()` which accepts two lists: true entities and the entities predicted from the language model. Both arguments are lists of lists of strings. The true entities are returned from the `get_annotations()` function, and the predicted entities are the results of the `fit_prompt()` function.\n", |
|
|
509 |
"\n", |
|
|
510 |
"The results of the function is a dictionary with keys representing each mode of Jaccard coefficient (*strict, left, right, relaxed*), each value is a tuple with four numbers:\n", |
|
|
511 |
"- mean jaccard score of entity matches\n", |
|
|
512 |
"- standard deviation of jaccard scores of entity matches\n", |
|
|
513 |
"- mean percentage coverage of entities\n", |
|
|
514 |
"- standard deviation of percentage coverages" |
|
|
515 |
] |
|
|
516 |
}, |
|
|
517 |
{ |
|
|
518 |
"cell_type": "code", |
|
|
519 |
"execution_count": null, |
|
|
520 |
"metadata": {}, |
|
|
521 |
"outputs": [], |
|
|
522 |
"source": [] |
|
|
523 |
} |
|
|
524 |
], |
|
|
525 |
"metadata": { |
|
|
526 |
"kernelspec": { |
|
|
527 |
"display_name": "Python 3 (ipykernel)", |
|
|
528 |
"language": "python", |
|
|
529 |
"name": "python3" |
|
|
530 |
} |
|
|
531 |
}, |
|
|
532 |
"nbformat": 4, |
|
|
533 |
"nbformat_minor": 4 |
|
|
534 |
} |
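
The last markdown cell of the notebook describes the output of `prompt_score()`, but the final code cell is empty. As a rough illustration of the underlying idea only, the sketch below computes a plain set-based Jaccard coefficient between true and predicted entity lists and reports its mean and standard deviation over examples. The names `jaccard` and `mean_jaccard` are hypothetical and not part of `eligibility_criteria_parser`. The sketch matches entities exactly (ignoring case), does not reproduce the *strict, left, right, relaxed* modes or the coverage statistics, and its choice to score two empty lists as 1.0 is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def jaccard(true_ents: List[str], pred_ents: List[str]) -> float:
    """Jaccard coefficient of two entity lists, compared as lowercase sets."""
    t = {e.lower() for e in true_ents}
    p = {e.lower() for e in pred_ents}
    if not t and not p:
        # Assumption: nothing to find and nothing predicted counts as agreement.
        return 1.0
    return len(t & p) / len(t | p)


def mean_jaccard(
    ents_true: List[List[str]], ents_pred: List[List[str]]
) -> Tuple[float, float]:
    """Mean and population standard deviation of per-example Jaccard scores."""
    scores = [jaccard(t, p) for t, p in zip(ents_true, ents_pred)]
    return mean(scores), pstdev(scores)


# Toy data in the same shape as the lists used in the notebook
# (one list of entity strings per criterion); the predictions are made up.
ents_true = [["calcineurin inhibitor", "mycophenolic acid"], ["prednisone"]]
ents_pred = [["mycophenolic acid"], ["prednisone"]]

avg, std = mean_jaccard(ents_true, ents_pred)
print(f"mean Jaccard: {avg:.2f}, std: {std:.2f}")
```

In the toy data, the first example shares one of two distinct entities (score 0.5) and the second matches exactly (score 1.0).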