Diff of /ehragent/prompts_mimic.py [000000] .. [6cf5c7]

Switch to unified view

a b/ehragent/prompts_mimic.py
1
CodeHeader = """from tools import tabtools, calculator
2
Calculate = calculator.WolframAlphaCalculator
3
LoadDB = tabtools.db_loader
4
FilterDB = tabtools.data_filter
5
GetValue = tabtools.get_value
6
SQLInterpreter = tabtools.sql_interpreter
7
Calendar = tabtools.date_calculator
8
"""
9
10
RetrKnowledge = """Read the following data descriptions, generate the background knowledge as the context information that could be helpful for answering the question.
11
(1) Tables are linked by identifiers which usually have the suffix 'ID'. For example, SUBJECT_ID refers to a unique patient, HADM_ID refers to a unique admission to the hospital, and ICUSTAY_ID refers to a unique admission to an intensive care unit.
12
(2) Charted events such as notes, laboratory tests, and fluid balance are stored in a series of 'events' tables. For example the outputevents table contains all measurements related to output for a given patient, while the labevents table contains laboratory test results for a patient.
13
(3) Tables prefixed with 'd_' are dictionary tables and provide definitions for identifiers. For example, every row of chartevents is associated with a single ITEMID which represents the concept measured, but it does not contain the actual name of the measurement. By joining chartevents and d_items on ITEMID, it is possible to identify the concept represented by a given ITEMID.
14
(4) For the databases, four of them are used to define and track patient stays: admissions, patients, icustays, and transfers. Another four tables are dictionaries for cross-referencing codes against their respective definitions: d_icd_diagnoses, d_icd_procedures, d_items, and d_labitems. The remaining tables, including chartevents, cost, inputevents_cv, labevents, microbiologyevents, outputevents, prescriptions, procedures_icd, contain data associated with patient care, such as physiological measurements, caregiver observations, and billing information.
15
For different tables, they contain the following information:
16
(1) admissions: ROW_ID, SUBJECT_ID, HADM_ID, ADMITTIME, DISCHTIME, ADMISSION_TYPE, ADMISSION_LOCATION, DISCHARGE_LOCATION, INSURANCE, LANGUAGE, MARITAL_STATUS, ETHNICITY, AGE
17
(2) chartevents: ROW_ID, SUBJECT_ID, HADM_ID, ICUSTAY_ID, ITEMID, CHARTTIME, VALUENUM, VALUEUOM
18
(3) cost: ROW_ID, SUBJECT_ID, HADM_ID, EVENT_TYPE, EVENT_ID, CHARGETIME, COST
19
(4) d_icd_diagnoses: ROW_ID, ICD9_CODE, SHORT_TITLE, LONG_TITLE
20
(5) d_icd_procedures: ROW_ID, ICD9_CODE, SHORT_TITLE, LONG_TITLE
21
(6) d_items: ROW_ID, ITEMID, LABEL, LINKSTO
22
(7) d_labitems: ROW_ID, ITEMID, LABEL
23
(8) dianoses_icd: ROW_ID, SUBJECT_ID, HADM_ID, ICD9_CODE, CHARTTIME
24
(9) icustays: ROW_ID, SUBJECT_ID, HADM_ID, ICUSTAY_ID, FIRST_CAREUNIT, LAST_CAREUNIT, FIRST_WARDID, LAST_WARDID, INTIME, OUTTIME
25
(10) inputevents_cv: ROW_ID, SUBJECT_ID, HADM_ID, ICUSTAY_ID, CHARTTIME, ITEMID, AMOUNT
26
(11) labevents: ROW_ID, SUBJECT_ID, HADM_ID, ITEMID, CHARTTIME, VALUENUM, VALUEUOM
27
(12) microbiologyevents: RROW_ID, SUBJECT_ID, HADM_ID, CHARTTIME, SPEC_TYPE_DESC, ORG_NAME
28
(13) outputevents: ROW_ID, SUBJECT_ID, HADM_ID, ICUSTAY_ID, CHARTTIME, ITEMID, VALUE
29
(14) patients: ROW_ID, SUBJECT_ID, GENDER, DOB, DOD
30
(15) prescriptions: ROW_ID, SUBJECT_ID, HADM_ID, STARTDATE, ENDDATE, DRUG, DOSE_VAL_RX, DOSE_UNIT_RX, ROUTE
31
(16) procedures_icd: ROW_ID, SUBJECT_ID, HADM_ID, ICD9_CODE, CHARTTIME
32
(17) transfers: ROW_ID, SUBJECT_ID, HADM_ID, ICUSTAY_ID, EVENTTYPE, CAREUNIT, WARDID, INTIME, OUTTIME
33
34
Question: What is the maximum total hospital cost that involves a diagnosis named comp-oth vasc dev/graft since 1 year ago?
35
Knowledge: 
36
- As comp-oth vasc dev/graft is a diagnose, the corresponding ICD9_CODE can be found in the d_icd_diagnoses database.
37
- The ICD9_CODE can be used to find the corresponding HADM_ID in the diagnoses_icd database.
38
- The HADM_ID can be used to find the corresponding COST in the cost database.
39
40
Question: had any tpn w/lipids been given to patient 2238 in their last hospital visit?
41
Knowledge: 
42
- We can find the visiting information of patient 2238 in the admissions database.
43
- As tpn w/lipids is an item, we can find the corresponding information in the d_items database.
44
- As admissions only contains the visiting information of patients, we need to find the corresponding ICUSTAY_ID in the icustays database.
45
- We will check the inputevents_cv database to see if there is any record of tpn w/lipids given to patient 2238 in their last hospital visit. 
46
47
Question: what was the name of the procedure that was given two or more times to patient 58730?
48
Knowledge: 
49
- We can find the visiting information of patient 58730 in the admissions database.
50
- As procedures are stored in the procedures_icd database, we can find the corresponding ICD9_CODE in the procedures_icd database.
51
- As we only need to find the name of the procedure, we can find the corresponding SHORT_TITLE as the name in the d_icd_procedures database.
52
53
Question: {question}
54
Knowledge:
55
"""
56
57
SYSTEM_PROMPT = """You are a helpful AI assistant. Solve tasks using your coding and language skills.
58
In the following cases, suggest python code (in a python coding block) or shell script (in a sh
59
coding block) for the user to execute.
60
1. When you need to collect info, use the code to output the info you need, for example, browse or
61
search the web, download/read a file, print the content of a webpage or a file, get the current
62
date/time. After sufficient info is printed and the task is ready to be solved based on your
63
language skill, you can solve the task by yourself.
64
2. When you need to perform some task with code, use the code to perform the task and output the
65
result. Finish the task smartly.
66
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be
67
clear which step uses code, and which step uses your language skill.
68
When using code, you must indicate the script type in the code block. The user cannot provide any
69
other feedback or perform any other action beyond executing the code you suggest. The user can't
70
modify your code. So do not suggest incomplete code which requires users to modify. Don't use a
71
code block if it's not intended to be executed by the user.
72
If you want the user to save the code in a file before executing it, put # filename: <filename>
73
inside the code block as the first line. Don't include multiple code blocks in one response. Do not
74
ask users to copy and paste the result. Instead, use 'print' function for the output when relevant.
75
Check the execution result returned by the user.
76
If the result indicates there is an error, fix the error and output the code again. Suggest the
77
full code instead of partial code or code changes. If the error can't be fixed or if the task is
78
not solved even after the code is executed successfully, analyze the problem, revisit your
79
assumption, collect additional info you need, and think of a different approach to try.
80
When you find an answer, verify the answer carefully. Include verifiable evidence in your response
81
if possible.
82
Reply "TERMINATE" in the end when everything is done."""
83
84
EHRAgent_Message_Prompt = """Assume you have knowledge of several tables:
85
(1) Tables are linked by identifiers which usually have the suffix 'ID'. For example, SUBJECT_ID refers to a unique patient, HADM_ID refers to a unique admission to the hospital, and ICUSTAY_ID refers to a unique admission to an intensive care unit.
86
(2) Charted events such as notes, laboratory tests, and fluid balance are stored in a series of 'events' tables. For example the outputevents table contains all measurements related to output for a given patient, while the labevents table contains laboratory test results for a patient.
87
(3) Tables prefixed with 'd_' are dictionary tables and provide definitions for identifiers. For example, every row of chartevents is associated with a single ITEMID which represents the concept measured, but it does not contain the actual name of the measurement. By joining chartevents and d_items on ITEMID, it is possible to identify the concept represented by a given ITEMID.
88
(4) For the databases, four of them are used to define and track patient stays: admissions, patients, icustays, and transfers. Another four tables are dictionaries for cross-referencing codes against their respective definitions: d_icd_diagnoses, d_icd_procedures, d_items, and d_labitems. The remaining tables, including chartevents, cost, inputevents_cv, labevents, microbiologyevents, outputevents, prescriptions, procedures_icd, contain data associated with patient care, such as physiological measurements, caregiver observations, and billing information.
89
Write a python code to solve the given question. You can use the following functions:
90
(1) Calculate(FORMULA), which calculates the FORMULA and returns the result.
91
(2) LoadDB(DBNAME) which loads the database DBNAME and returns the database. The DBNAME can be one of the following: admissions, chartevents, cost, d_icd_diagnoses, d_icd_procedures, d_items, d_labitems, diagnoses_icd, icustays, inputevents_cv, labevents, microbiologyevents, outputevents,patients, prescriptions, procedures_icd, transfers.
92
(3) FilterDB(DATABASE, CONDITIONS), which filters the DATABASE according to the CONDITIONS and returns the filtered database. The CONDITIONS is a string composed of multiple conditions, each of which consists of the column_name, the relation and the value (e.g., COST<10). The CONDITIONS is one single string (e.g., "admissions, SUBJECT_ID=24971").
93
(4) GetValue(DATABASE, ARGUMENT), which returns a string containing all the values of the column in the DATABASE (if multiple values, separated by ", "). When there is no additional operations on the values, the ARGUMENT is the column_name in demand. If the values need to be returned with certain operations, the ARGUMENT is composed of the column_name and the operation (like COST, sum). Please do not contain " or ' in the argument.
94
(5) SQLInterpreter(SQL), which interprets the query SQL and returns the result.
95
(6) Calendar(DURATION), which returns the date after the duration of time.
96
Use the variable 'answer' to store the answer of the code. Here are some examples:
97
{examples}
98
(END OF EXAMPLES)
99
Knowledge:
100
{knowledge}
101
Question: {question}
102
Solution: """
103
104
DEFAULT_USER_PROXY_AGENT_DESCRIPTIONS = {
105
    "ALWAYS": "An attentive HUMAN user who can answer questions about the task, and can perform tasks such as running Python code or inputting command line commands at a Linux terminal and reporting back the execution results.",
106
    "TERMINATE": "A user that can run Python code or input command line commands at a Linux terminal and report back the execution results.",
107
    "NEVER": "A user that can run Python code or input command line commands at a Linux terminal and report back the execution results.",
108
}
109
110
CodeDebugger = """Given a question:
111
{question}
112
The user have written code with the following functions:
113
(1) Calculate(FORMULA), which calculates the FORMULA and returns the result.
114
(2) LoadDB(DBNAME) which loads the database DBNAME and returns the database. The DBNAME can be one of the following: admissions, chartevents, cost, d_icd_diagnoses, d_icd_procedures, d_items, d_labitems, diagnoses_icd, icustays, inputevents_cv, labevents, microbiologyevents, outputevents,patients, prescriptions, procedures_icd, transfers.
115
(3) FilterDB(DATABASE, CONDITIONS), which filters the DATABASE according to the CONDITIONS. The CONDITIONS is a string composed of multiple conditions, each of which consists of the column_name, the relation and the value (e.g., COST<10). The CONDITIONS is one single string (e.g., "admissions, SUBJECT_ID=24971").
116
(4) GetValue(DATABASE, ARGUMENT), which returns the values of the column in the DATABASE. When there is no additional operations on the values, the ARGUMENT is the column_name in demand. If the values need to be returned with certain operations, the ARGUMENT is composed of the column_name and the operation (like COST, sum). Please do not contain " or ' in the argument.
117
(5) SQLInterpreter(SQL), which interprets the query SQL and returns the result.
118
(6) Calendar(DURATION), which returns the date after the duration of time.
119
120
The code is as follows:
121
{code}
122
123
The execution result is:
124
{error_info}
125
126
Please check the code and point out the most possible reason to the error.
127
"""
128
129
EHRAgent_4Shots_Knowledge = """Question: What is the maximum total hospital cost that involves a diagnosis named comp-oth vasc dev/graft since 1 year ago?
130
Knowledge:
131
- As comp-oth vasc dev/graft is a diagnose, the corresponding ICD9_CODE can be found in the d_icd_diagnoses database.
132
- The ICD9_CODE can be used to find the corresponding HADM_ID in the diagnoses_icd database.
133
- The HADM_ID can be used to find the corresponding COST in the cost database.
134
Solution: date = Calendar('-1 year')
135
# As comp-oth vasc dev/graft is a diagnose, the corresponding ICD9_CODE can be found in the d_icd_diagnoses database.
136
diagnosis_db = LoadDB('d_icd_diagnoses')
137
filtered_diagnosis_db = FilterDB(diagnosis_db, 'SHORT_TITLE=comp-oth vasc dev/graft')
138
icd_code = GetValue(filtered_diagnosis_db, 'ICD9_CODE')
139
# The ICD9_CODE can be used to find the corresponding HADM_ID in the diagnoses_icd database.
140
diagnoses_icd_db = LoadDB('diagnoses_icd')
141
filtered_diagnoses_icd_db = FilterDB(diagnoses_icd_db, 'ICD9_CODE={}'.format(icd_code))
142
hadm_id_list = GetValue(filtered_diagnoses_icd_db, 'HADM_ID, list')
143
# The HADM_ID can be used to find the corresponding COST in the cost database.
144
max_cost = 0
145
for hadm_id in hadm_id_list:
146
    cost_db = LoadDB('cost')
147
    filtered_cost_db = FilterDB(cost_db, 'HADM_ID={}'.format(hadm_id))
148
    cost = GetValue(filtered_cost_db, 'COST, sum')
149
    if cost > max_cost:
150
        max_cost = cost
151
answer = max_cost
152
153
Question: had any tpn w/lipids been given to patient 2238 in their last hospital visit?
154
Knowledge:
155
- We can find the visiting information of patient 2238 in the admissions database.
156
- As tpn w/lipids is an item, we can find the corresponding information in the d_items database.
157
- As admissions only contains the visiting information of patients, we need to find the corresponding ICUSTAY_ID in the icustays database.
158
- We will check the inputevents_cv database to see if there is any record of tpn w/lipids given to patient 2238 in their last hospital visit. 
159
Solution: # We can find the visiting information of patient 2238 in the admissions database.
160
patient_db = LoadDB('admissions')
161
filtered_patient_db = FilterDB(patient_db, 'SUBJECT_ID=2238||min(DISCHTIME)')
162
hadm_id = GetValue(filtered_patient_db, 'HADM_ID')
163
# As tpn w/lipids is an item, we can find the corresponding information in the d_items database.
164
d_items_db = LoadDB('d_items')
165
filtered_d_items_db = FilterDB(d_items_db, 'LABEL=tpn w/lipids')
166
item_id = GetValue(filtered_d_items_db, 'ITEMID')
167
# As admissions only contains the visiting information of patients, we need to find the corresponding ICUSTAY_ID in the icustays database.
168
icustays_db = LoadDB('icustays')
169
filtered_icustays_db = FilterDB(icustays_db, 'HADM_ID={}'.format(hadm_id))
170
icustay_id = GetValue(filtered_icustays_db, 'ICUSTAY_ID')
171
# We will check the inputevents_cv database to see if there is any record of tpn w/lipids given to patient 2238 in their last hospital visit. 
172
inputevents_cv_db = LoadDB('inputevents_cv')
173
filtered_inputevents_cv_db = FilterDB(inputevents_cv_db, 'HADM_ID={}||ICUSTAY_ID={}||ITEMID={}'.format(hadm_id, icustay_id, item_id))
174
if len(filtered_inputevents_cv_db) > 0:
175
    answer = 1
176
else:
177
    answer = 0
178
179
Question: what was the name of the procedure that was given two or more times to patient 58730?
180
Knowledge:
181
- We can find the visiting information of patient 58730 in the admissions database.
182
- As procedures are stored in the procedures_icd database, we can find the corresponding ICD9_CODE in the procedures_icd database.
183
- As we only need to find the name of the procedure, we can find the corresponding SHORT_TITLE as the name in the d_icd_procedures database.
184
Solution: answer = SQLInterpreter('select d_icd_procedures.short_title from d_icd_procedures where d_icd_procedures.icd9_code in ( select t1.icd9_code from ( select procedures_icd.icd9_code, count( procedures_icd.charttime ) as c1 from procedures_icd where procedures_icd.hadm_id in ( select admissions.hadm_id from admissions where admissions.subject_id = 58730 ) group by procedures_icd.icd9_code ) as t1 where t1.c1 >= 2 )')
185
186
Question: calculate the length of stay of the first stay of patient 27392 in the icu.
187
Knowledge:
188
- We can find the visiting information of patient 27392 in the admissions database.
189
- As we only need to find the length of stay, we can find the corresponding INTIME and OUTTIME in the icustays database.
190
Solution: from datetime import datetime
191
patient_db = LoadDB('admissions')
192
filtered_patient_db = FilterDB(patient_db, 'SUBJECT_ID=27392||min(ADMITTIME)')
193
hadm_id = GetValue(filtered_patient_db, 'HADM_ID')
194
icustays_db = LoadDB('icustays')
195
filtered_icustays_db = FilterDB(icustays_db, 'HADM_ID={}'.format(hadm_id))
196
intime = GetValue(filtered_icustays_db, 'INTIME')
197
outtime = GetValue(filtered_icustays_db, 'OUTTIME')
198
intime = datetime.strptime(intime, '%Y-%m-%d %H:%M:%S')
199
outtime = datetime.strptime(outtime, '%Y-%m-%d %H:%M:%S')
200
length_of_stay = outtime - intime
201
if length_of_stay.seconds // 3600 > 12:
202
    answer = length_of_stay.days + 1
203
else:
204
    answer = length_of_stay.days
205
"""