|
a/README.md |
|
b/README.md |
1 |
# AItrika |
1 |
# AItrika |
2 |
|
2 |
|
3 |
 |
|
|
4 |
|
|
|
5 |
[](https://opensource.org/licenses/Apache-2.0) |
3 |
[](https://opensource.org/licenses/Apache-2.0)
|
6 |
 |
4 |

|
7 |
 |
5 |

|
8 |
 |
6 |
 |
9 |
|
7 |
|
10 |
 |
8 |
 |
11 |
|
9 |
|
12 |
Enhance your knowledge in medical research. |
10 |
Enhance your knowledge in medical research. |
13 |
|
11 |
|
14 |
AItrika (formerly **PubGPT**) is a tool that makes it easy to extract relevant information from medical papers: |
12 |
AItrika (formerly **PubGPT**) is a tool that makes it easy to extract relevant information from medical papers: |
15 |
|
13 |
|
16 |
- Abstract |
14 |
- Abstract
|
17 |
- Full text (when available) |
15 |
- Full text (when available)
|
18 |
- Genes |
16 |
- Genes
|
19 |
- Diseases |
17 |
- Diseases
|
20 |
- Mutations |
18 |
- Mutations
|
21 |
- Associations between genes and diseases |
19 |
- Associations between genes and diseases
|
22 |
- MeSH terms |
20 |
- MeSH terms
|
23 |
- Other terms |
21 |
- Other terms
|
24 |
- Results |
22 |
- Results
|
25 |
- Bibliography |
23 |
- Bibliography |
26 |
|
24 |
|
27 |
And so on! |
25 |
And so on! |
28 |
|
26 |
|
29 |
## 🚀 Run the demo app |
27 |
## 🚀 Run the demo app |
30 |
|
28 |
|
31 |
You can try AItrika with the Streamlit app by running: |
29 |
You can try AItrika with the Streamlit app by running: |
32 |
|
30 |
|
33 |
``` |
31 |
```
|
34 |
streamlit run app.py |
32 |
streamlit run app.py
|
35 |
``` |
33 |
``` |
36 |
|
34 |
|
37 |
Or you can use it as a script by running: |
35 |
Or you can use it as a script by running: |
38 |
|
36 |
|
39 |
``` |
37 |
```
|
40 |
python main.py |
38 |
python main.py
|
41 |
``` |
39 |
``` |
42 |
|
40 |
|
43 |
## 📦 Install |
41 |
## 📦 Install |
44 |
|
42 |
|
45 |
To install everything, you need `uv`. |
43 |
To install everything, you need `uv`. |
46 |
|
44 |
|
47 |
First of all, install `uv` with the command: |
45 |
First of all, install `uv` with the command: |
48 |
|
46 |
|
49 |
``` |
47 |
```
|
50 |
pip install uv |
48 |
pip install uv
|
51 |
``` |
49 |
``` |
52 |
|
50 |
|
53 |
After that, create a virtual environment with the command: |
51 |
After that, create a virtual environment with the command: |
54 |
|
52 |
|
55 |
``` |
53 |
```
|
56 |
uv venv venv_name |
54 |
uv venv venv_name
|
57 |
``` |
55 |
``` |
58 |
|
56 |
|
59 |
Activate the virtual env: |
57 |
Activate the virtual env: |
60 |
|
58 |
|
61 |
``` |
59 |
```
|
62 |
source venv_name/bin/activate |
60 |
source venv_name/bin/activate
|
63 |
``` |
61 |
``` |
64 |
|
62 |
|
65 |
And install dependencies: |
63 |
And install dependencies: |
66 |
|
64 |
|
67 |
``` |
65 |
```
|
68 |
uv pip install -r requirements.in |
66 |
uv pip install -r requirements.in
|
69 |
``` |
67 |
``` |
70 |
|
68 |
|
71 |
## 🔑 Set LLM API Keys |
69 |
## 🔑 Set LLM API Keys |
72 |
|
70 |
|
73 |
In order to set API keys, insert your keys into the `env.example` file and rename it to `.env`. |
71 |
In order to set API keys, insert your keys into the `env.example` file and rename it to `.env`. |
74 |
|
72 |
|
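Once the keys are in `.env`, they still have to reach the process environment before calls such as `os.getenv("GROQ_API_KEY")` can see them. The following is a minimal standard-library sketch of that step, not the project's own loader: the helper name `load_env_file` is hypothetical, and it assumes the file contains plain `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env-style file and export them to os.environ.

    Hypothetical helper for illustration; assumes one KEY=VALUE pair per line.
    """
    loaded = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Trim whitespace and optional surrounding quotes from the value.
        key, value = key.strip(), value.strip().strip("'\"")
        os.environ[key] = value
        loaded[key] = value
    return loaded
```

Calling `load_env_file(".env")` at the top of a script would make the keys available to the rest of the code through `os.getenv`.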
75 |
## 🔍 Usage |
73 |
## 🔍 Usage |
76 |
|
74 |
|
77 |
You can get information about a paper by passing its PubMed ID: |
75 |
You can get information about a paper by passing its PubMed ID: |
78 |
|
76 |
|
79 |
```python |
77 |
```python
|
80 |
from aitrika.engine.aitrika import OnlineAItrika |
78 |
from aitrika.engine.aitrika import OnlineAItrika
|
81 |
aitrika_engine = OnlineAItrika(pubmed_id=pubmed_id) |
79 |
aitrika_engine = OnlineAItrika(pubmed_id=pubmed_id)
|
82 |
title = aitrika_engine.get_title() |
80 |
title = aitrika_engine.get_title()
|
83 |
print(title) |
81 |
print(title)
|
84 |
``` |
82 |
``` |
85 |
|
83 |
|
86 |
Or you can parse a local PDF: |
84 |
Or you can parse a local PDF: |
87 |
|
85 |
|
88 |
```python |
86 |
```python
|
89 |
from aitrika.engine.aitrika import LocalAItrika |
87 |
from aitrika.engine.aitrika import LocalAItrika
|
90 |
aitrika_engine = LocalAItrika(pdf_path=pdf_path) |
88 |
aitrika_engine = LocalAItrika(pdf_path=pdf_path)
|
91 |
title = aitrika_engine.get_title() |
89 |
title = aitrika_engine.get_title()
|
92 |
print(title) |
90 |
print(title)
|
93 |
``` |
91 |
``` |
94 |
|
92 |
|
95 |
``` |
93 |
```
|
96 |
Breast cancer genes: beyond BRCA1 and BRCA2. |
94 |
Breast cancer genes: beyond BRCA1 and BRCA2.
|
97 |
``` |
95 |
``` |
98 |
|
96 |
|
99 |
You can also get other information, such as the associations between genes and diseases: |
97 |
You can also get other information, such as the associations between genes and diseases: |
100 |
|
98 |
|
101 |
```python |
99 |
```python
|
102 |
associations = aitrika_engine.get_associations() |
100 |
associations = aitrika_engine.get_associations()
|
103 |
``` |
101 |
``` |
104 |
|
102 |
|
105 |
``` |
103 |
```
|
106 |
[ |
104 |
[
|
107 |
{ |
105 |
{
|
108 |
"gene": "BRIP1", |
106 |
"gene": "BRIP1",
|
109 |
"disease": "Breast Neoplasms" |
107 |
"disease": "Breast Neoplasms"
|
110 |
}, |
108 |
},
|
111 |
{ |
109 |
{
|
112 |
"gene": "PTEN", |
110 |
"gene": "PTEN",
|
113 |
"disease": "Breast Neoplasms" |
111 |
"disease": "Breast Neoplasms"
|
114 |
}, |
112 |
},
|
115 |
{ |
113 |
{
|
116 |
"gene": "CHEK2", |
114 |
"gene": "CHEK2",
|
117 |
"disease": "Breast Neoplasms" |
115 |
"disease": "Breast Neoplasms"
|
118 |
}, |
116 |
},
|
119 |
] |
117 |
]
|
120 |
... |
118 |
...
|
121 |
``` |
119 |
``` |
122 |
|
120 |
|
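The list above can be post-processed with plain Python. As a small sketch, assuming `get_associations()` returns a list of dictionaries shaped like the example output, the genes can be grouped by disease; the helper name `genes_by_disease` is hypothetical and the sample data is copied from the output above.

```python
from collections import defaultdict

# Sample data copied from the example output above.
associations = [
    {"gene": "BRIP1", "disease": "Breast Neoplasms"},
    {"gene": "PTEN", "disease": "Breast Neoplasms"},
    {"gene": "CHEK2", "disease": "Breast Neoplasms"},
]


def genes_by_disease(items):
    """Group gene names under the disease they are associated with."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["disease"]].append(item["gene"])
    return dict(grouped)


print(genes_by_disease(associations))
```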
123 |
Or you can get the associations as a nicely formatted DataFrame: |
121 |
Or you can get the associations as a nicely formatted DataFrame: |
124 |
|
122 |
|
125 |
```python |
123 |
```python
|
126 |
associations = aitrika_engine.get_associations(dataframe=True) |
124 |
associations = aitrika_engine.get_associations(dataframe=True)
|
127 |
``` |
125 |
``` |
128 |
|
126 |
|
129 |
``` |
127 |
```
|
130 |
gene disease |
128 |
gene disease
|
131 |
0 BRIP1 Breast Neoplasms |
129 |
0 BRIP1 Breast Neoplasms
|
132 |
1 PTEN Breast Neoplasms |
130 |
1 PTEN Breast Neoplasms
|
133 |
2 CHEK2 Breast Neoplasms |
131 |
2 CHEK2 Breast Neoplasms
|
134 |
... |
132 |
...
|
135 |
``` |
133 |
``` |
136 |
|
134 |
|
137 |
With the power of RAG, you can query your document: |
135 |
With the power of RAG, you can query your document: |
138 |
|
136 |
|
139 |
```python |
137 |
```python
|
140 |
## Prepare the documents |
138 |
## Prepare the documents
|
141 |
documents = generate_documents(content=abstract) |
139 |
documents = generate_documents(content=abstract) |
142 |
|
140 |
|
143 |
## Set the LLM |
141 |
## Set the LLM
|
144 |
llm = GroqLLM(documents=documents, api_key=os.getenv("GROQ_API_KEY")) |
142 |
llm = GroqLLM(documents=documents, api_key=os.getenv("GROQ_API_KEY")) |
145 |
|
143 |
|
146 |
## Query your document |
144 |
## Query your document
|
147 |
query = "Is BRCA1 associated with breast cancer?" |
145 |
query = "Is BRCA1 associated with breast cancer?"
|
148 |
print(llm.query(query=query)) |
146 |
print(llm.query(query=query))
|
149 |
``` |
147 |
``` |
150 |
|
148 |
|
151 |
``` |
149 |
```
|
152 |
The provided text suggests that BRCA1 is associated with breast cancer, as it is listed among the high-penetrance genes identified in family linkage studies as responsible for inherited syndromes of breast cancer. |
150 |
The provided text suggests that BRCA1 is associated with breast cancer, as it is listed among the high-penetrance genes identified in family linkage studies as responsible for inherited syndromes of breast cancer.
|
153 |
``` |
151 |
``` |
154 |
|
152 |
|
155 |
You can also extract other information, such as the paper's results: |
153 |
You can also extract other information, such as the paper's results: |
156 |
|
154 |
|
157 |
```python |
155 |
```python
|
158 |
results = aitrika_engine.extract_results(llm=llm) |
156 |
results = aitrika_engine.extract_results(llm=llm)
|
159 |
print(results) |
157 |
print(results)
|
160 |
``` |
158 |
``` |
161 |
|
159 |
|
162 |
``` |
160 |
```
|
163 |
** RESULTS ** |
161 |
** RESULTS ** |
164 |
|
162 |
|
165 |
- High-penetrance genes - BRCA1, BRCA2, PTEN, TP53 - responsible for inherited syndromes |
163 |
- High-penetrance genes - BRCA1, BRCA2, PTEN, TP53 - responsible for inherited syndromes
|
166 |
- Moderate-penetrance genes - CHEK2, ATM, BRIP1, PALB2, RAD51C - associated with moderate BC risk |
164 |
- Moderate-penetrance genes - CHEK2, ATM, BRIP1, PALB2, RAD51C - associated with moderate BC risk
|
167 |
- Low-penetrance alleles - common alleles - associated with slightly increased or decreased risk of BC |
165 |
- Low-penetrance alleles - common alleles - associated with slightly increased or decreased risk of BC
|
168 |
- Current clinical practice - high-penetrance genes - widely used |
166 |
- Current clinical practice - high-penetrance genes - widely used
|
169 |
- Future prospect - all familial breast cancer genes - to be included in genetic test |
167 |
- Future prospect - all familial breast cancer genes - to be included in genetic test
|
170 |
- Research need - clinical management - of moderate and low-risk variants |
168 |
- Research need - clinical management - of moderate and low-risk variants
|
171 |
``` |
169 |
``` |
172 |
|
170 |
|
173 |
## 🚀 Run the API |
171 |
## 🚀 Run the API |
174 |
|
172 |
|
175 |
To run the AItrika API, follow these steps: |
173 |
To run the AItrika API, follow these steps: |
176 |
|
174 |
|
177 |
1. Ensure you have set up your environment and installed all dependencies as described in the Install section. |
175 |
1. Ensure you have set up your environment and installed all dependencies as described in the Install section. |
178 |
|
176 |
|
179 |
2. Run the API server using the following command: |
177 |
2. Run the API server using the following command: |
180 |
|
178 |
|
181 |
```bash |
179 |
```bash
|
182 |
python api.py |
180 |
python api.py
|
183 |
``` |
181 |
``` |
184 |
|
182 |
|
185 |
The API will start running on http://0.0.0.0:8000. You can now make requests to the various endpoints: |
183 |
The API will start running on http://0.0.0.0:8000. You can now make requests to the various endpoints: |
186 |
|
184 |
|
187 |
- /associations: Get associations from a PubMed article |
185 |
- /associations: Get associations from a PubMed article
|
188 |
- /abstract: Get abstract of a PubMed article |
186 |
- /abstract: Get abstract of a PubMed article
|
189 |
- /query: Query a PubMed article |
187 |
- /query: Query a PubMed article
|
190 |
- /results: Get results from a PubMed article |
188 |
- /results: Get results from a PubMed article
|
191 |
- /participants: Get number of participants from a PubMed article |
189 |
- /participants: Get number of participants from a PubMed article
|
192 |
- /outcomes: Get outcomes from a PubMed article |
190 |
- /outcomes: Get outcomes from a PubMed article |
193 |
|
191 |
|
194 |
You can use tools like curl, Postman, or any HTTP client to interact with the API. For example: |
192 |
You can use tools like curl, Postman, or any HTTP client to interact with the API. For example: |
195 |
|
193 |
|
196 |
```bash |
194 |
```bash
|
197 |
curl -X POST "http://localhost:8000/abstract" -H "Content-Type: application/json" -d '{"pubmed_id": 12345678}' |
195 |
curl -X POST "http://localhost:8000/abstract" -H "Content-Type: application/json" -d '{"pubmed_id": 12345678}'
|
198 |
``` |
196 |
``` |
199 |
|
197 |
|
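The same request can be sent from Python using only the standard library. This is a sketch under the assumptions of the curl example (server on `localhost:8000`, JSON body with a `pubmed_id` field); the function names are hypothetical, and since the response format is not described here, the body is returned as raw text.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # same host and port as the curl example


def build_abstract_request(pubmed_id: int, base_url: str = API_URL) -> urllib.request.Request:
    """Build the POST request for the /abstract endpoint without sending it."""
    payload = json.dumps({"pubmed_id": pubmed_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/abstract",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_abstract(pubmed_id: int) -> str:
    """Send the request and return the raw response body (needs a running server)."""
    request = build_abstract_request(pubmed_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

With the API server running, `print(fetch_abstract(12345678))` would print whatever the endpoint returns for that PubMed ID.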
200 |
The API documentation is automatically generated and saved to <code>docs/api-reference/openapi.json</code>. |
198 |
The API documentation is automatically generated and saved to <code>docs/api-reference/openapi.json</code>.
|
201 |
You can use this file with tools like Swagger UI for a more interactive API exploration experience. |
199 |
You can use this file with tools like Swagger UI for a more interactive API exploration experience. |
202 |
|
200 |
|
203 |
## Support the Project |
201 |
## Support the Project |
204 |
|
202 |
|
205 |
If you find this project useful, please consider supporting it: |
203 |
If you find this project useful, please consider supporting it: |
206 |
|
204 |
|
207 |
- 🌟 Star the project on GitHub |
205 |
- 🌟 Star the project on GitHub
|
208 |
- 🐛 Report bugs or suggest new features |
206 |
- 🐛 Report bugs or suggest new features
|
209 |
- 🤝 Contribute with pull requests |
207 |
- 🤝 Contribute with pull requests
|
210 |
- ☕️ [Buy me a coffee](https://www.buymeacoffee.com/dsupertramp) or consider becoming a sponsor. |
208 |
- ☕️ [Buy me a coffee](https://www.buymeacoffee.com/dsupertramp) or consider becoming a sponsor. |
211 |
|
209 |
|
212 |
### Commercial / Business use |
210 |
### Commercial / Business use |
213 |
|
211 |
|
214 |
If you're using this project in a business or commercial context, please [contact me](mailto:salvatoredanilopalumbo@gmail.com). |
212 |
If you're using this project in a business or commercial context, please [contact me](mailto:salvatoredanilopalumbo@gmail.com). |
215 |
|
213 |
|
216 |
I'm available for consulting, custom development, or commercial licensing. |
214 |
I'm available for consulting, custom development, or commercial licensing. |
217 |
|
215 |
|
218 |
Your support helps keep this project active and continuously improving. Thank you! |
216 |
Your support helps keep this project active and continuously improving. Thank you! |
219 |
|
217 |
|
220 |
## License |
218 |
## License |
221 |
|
219 |
|
222 |
AItrika is licensed under the Apache 2.0 License. See the LICENSE file for more details. |
220 |
AItrika is licensed under the Apache 2.0 License. See the LICENSE file for more details. |
223 |
|
221 |
|
224 |
## Star History |
222 |
## Star History |
225 |
|
223 |
|
226 |
[](https://star-history.com/#dSupertramp/AItrika&Date) |
224 |
[](https://star-history.com/#dSupertramp/AItrika&Date)
|