# AItrika



[Apache 2.0 License](https://opensource.org/licenses/Apache-2.0)



Enhance your knowledge in medical research.

AItrika (formerly **PubGPT**) is a tool that extracts relevant information from medical papers in an easy way:

- Abstract
- Full text (when available)
- Genes
- Diseases
- Mutations
- Associations between genes and diseases
- MeSH terms
- Other terms
- Results
- Bibliography

And so on!

## 🚀 Run the demo app

You can try AItrika with the Streamlit app by running:

```
streamlit run app.py
```

Or you can use it as a script by running:

```
python main.py
```

## 📦 Install

To install everything, you need `uv`.

First of all, install `uv` with the command:

```
pip install uv
```

After that, create a virtual environment with the command:

```
uv venv venv_name
```

Activate the virtual environment:

```
source venv_name/bin/activate
```

And install the dependencies:

```
uv pip install -r requirements.in
```

## 🔑 Set LLM API Keys

To set the API keys, insert your keys into the `env.example` file and rename it to `.env`.

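Once the `.env` file is in place, the keys have to reach the process environment before an LLM is created. The sketch below is a standard-library-only illustration under stated assumptions: `load_env_file` and `require_key` are hypothetical helper names that are not part of AItrika, and the file is assumed to contain plain `KEY=value` lines. `GROQ_API_KEY` is the variable name used in the Usage section.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Load KEY=value lines from a file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Values already set in the environment take precedence
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def require_key(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `load_env_file()` followed by `require_key("GROQ_API_KEY")` at the top of a script surfaces a missing key before any request is made.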
## 🔍 Usage

You can easily get information about a paper by passing a PubMed ID:

```python
from aitrika.engine.aitrika import OnlineAItrika

# pubmed_id is the PubMed ID of the paper you want to analyze
aitrika_engine = OnlineAItrika(pubmed_id=pubmed_id)
title = aitrika_engine.get_title()
print(title)
```

Or you can parse a local PDF:

```python
from aitrika.engine.aitrika import LocalAItrika

# pdf_path is the path to the PDF file on disk
aitrika_engine = LocalAItrika(pdf_path=pdf_path)
title = aitrika_engine.get_title()
print(title)
```

```
Breast cancer genes: beyond BRCA1 and BRCA2.
```

You can get other information, like the associations between genes and diseases:

```python
associations = aitrika_engine.get_associations()
```

```
[
    {
        "gene": "BRIP1",
        "disease": "Breast Neoplasms"
    },
    {
        "gene": "PTEN",
        "disease": "Breast Neoplasms"
    },
    {
        "gene": "CHEK2",
        "disease": "Breast Neoplasms"
    },
    ...
]
```

Or you can get a nicely formatted DataFrame:

```python
associations = aitrika_engine.get_associations(dataframe=True)
```

```
    gene           disease
0  BRIP1  Breast Neoplasms
1   PTEN  Breast Neoplasms
2  CHEK2  Breast Neoplasms
...
```
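The list form of the associations is plain Python data, so it can be post-processed without extra dependencies. The sketch below assumes each record is a dictionary with `gene` and `disease` keys, as in the example output; `genes_by_disease` is a hypothetical helper, not an AItrika function, and the sample records are copied from that output.

```python
from collections import defaultdict

# Sample records copied from the example output in this README
associations = [
    {"gene": "BRIP1", "disease": "Breast Neoplasms"},
    {"gene": "PTEN", "disease": "Breast Neoplasms"},
    {"gene": "CHEK2", "disease": "Breast Neoplasms"},
]


def genes_by_disease(records):
    """Map each disease to the sorted list of genes associated with it."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["disease"]].add(record["gene"])
    return {disease: sorted(genes) for disease, genes in grouped.items()}


print(genes_by_disease(associations))
# {'Breast Neoplasms': ['BRIP1', 'CHEK2', 'PTEN']}
```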
With the power of RAG, you can query your document:

```python
import os

# generate_documents and GroqLLM come from the aitrika package;
# abstract holds the abstract text of the paper

# Prepare the documents
documents = generate_documents(content=abstract)

# Set the LLM
llm = GroqLLM(documents=documents, api_key=os.getenv("GROQ_API_KEY"))

# Query your document
query = "Is BRCA1 associated with breast cancer?"
print(llm.query(query=query))
```

```
The provided text suggests that BRCA1 is associated with breast cancer, as it is listed among the high-penetrance genes identified in family linkage studies as responsible for inherited syndromes of breast cancer.
```

Or you can extract other information:

```python
results = aitrika_engine.extract_results(llm=llm)
print(results)
```

```
** RESULTS **

- High-penetrance genes - BRCA1, BRCA2, PTEN, TP53 - responsible for inherited syndromes
- Moderate-penetrance genes - CHEK2, ATM, BRIP1, PALB2, RAD51C - associated with moderate BC risk
- Low-penetrance alleles - common alleles - associated with slightly increased or decreased risk of BC
- Current clinical practice - high-penetrance genes - widely used
- Future prospect - all familial breast cancer genes - to be included in genetic test
- Research need - clinical management - of moderate and low-risk variants
```

## 🚀 Run the API

To run the AItrika API, follow these steps:

1. Ensure you have set up your environment and installed all dependencies as described in the Install section.

2. Run the API server using the following command:

```bash
python api.py
```

The API will start running on http://0.0.0.0:8000. You can now make requests to the various endpoints:

- `/associations`: Get associations from a PubMed article
- `/abstract`: Get the abstract of a PubMed article
- `/query`: Query a PubMed article
- `/results`: Get results from a PubMed article
- `/participants`: Get the number of participants from a PubMed article
- `/outcomes`: Get outcomes from a PubMed article

You can use tools like curl, Postman, or any HTTP client to interact with the API. For example:

```bash
curl -X POST "http://localhost:8000/abstract" -H "Content-Type: application/json" -d '{"pubmed_id": 12345678}'
```
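The same request can be sent from Python with only the standard library. This is a minimal sketch that mirrors the curl call: `build_request` and `call_api` are hypothetical helper names, the `{"pubmed_id": ...}` body is confirmed only for `/abstract` (other endpoints may expect additional fields), and the response is returned as raw text because its schema is not described here. With the server running, `call_api("/abstract", 12345678)` would be the equivalent of the curl command.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl example


def build_request(endpoint, pubmed_id):
    """Build a JSON POST request for an AItrika endpoint such as "/abstract"."""
    payload = json.dumps({"pubmed_id": pubmed_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(endpoint, pubmed_id, timeout=30):
    """Send the request and return the raw response body as text."""
    request = build_request(endpoint, pubmed_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```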

The API documentation is automatically generated and saved to `docs/api-reference/openapi.json`.
You can use this file with tools like Swagger UI for a more interactive API exploration experience.
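
The generated file can also be inspected programmatically. The sketch below assumes the file follows the standard OpenAPI layout, where a top-level `paths` object maps each route to its HTTP methods; `list_operations` and `load_spec` are hypothetical helpers, not part of AItrika.

```python
import json
from pathlib import Path

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec):
    """Return sorted (METHOD, path) pairs from an OpenAPI document given as a dict."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            # Path items can also hold non-operation keys such as "parameters"
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path))
    return sorted(operations)


def load_spec(path="docs/api-reference/openapi.json"):
    """Read the generated OpenAPI file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

After the API has generated the file, `for method, path in list_operations(load_spec()): print(method, path)` prints one line per endpoint.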

## Support the Project

If you find this project useful, please consider supporting it:

- 🌟 Star the project on GitHub
- 🐛 Report bugs or suggest new features
- 🤝 Contribute with pull requests
- ☕️ [Buy me a coffee](https://www.buymeacoffee.com/dsupertramp) or consider sponsoring the project.

### Commercial / Business use

If you're using this project in a business or commercial context, please [contact me](mailto:salvatoredanilopalumbo@gmail.com).

I'm available for consulting, custom development, or commercial licensing.

Your support helps keep this project active and continuously improving. Thank you!

## License

AItrika is licensed under the Apache 2.0 License. See the LICENSE file for more details.

## Star History

[Star History Chart](https://star-history.com/#dSupertramp/AItrika&Date)