# AItrika

![AItrika](images/logo.png)

[![License](https://img.shields.io/badge/License-Apache%202.0-orange.svg)](https://opensource.org/licenses/Apache-2.0)
![GitHub forks](https://img.shields.io/github/forks/dSupertramp/AItrika)
![GitHub commit activity (branch)](https://img.shields.io/github/commit-activity/t/dSupertramp/AItrika/main)
![GitHub last commit (branch)](https://img.shields.io/github/last-commit/dSupertramp/AItrika/main)

![Static Badge](https://img.shields.io/badge/medical-content?logo=syringe&logoColor=cyan&color=cyan)

Enhance your knowledge in medical research.

AItrika (formerly **PubGPT**) is a tool that extracts a wide range of relevant information from medical papers:

- Abstract
- Full text (when available)
- Genes
- Diseases
- Mutations
- Associations between genes and diseases
- MeSH terms
- Other terms
- Results
- Bibliography

And so on!

## 🚀 Run the demo app

You can try AItrika with the Streamlit app by running:

```
streamlit run app.py
```

Or you can use it as a script by running:

```
python main.py
```

## 📦 Install

To install everything, you need `uv`.

First, install `uv` with the command:

```
pip install uv
```

After that, create a virtual environment with the command:

```
uv venv venv_name
```

Activate the virtual environment:

```
source venv_name/bin/activate
```

And install the dependencies:

```
uv pip install -r requirements.in
```

## 🔑 Set LLM API Keys

To set your API keys, insert them into the `env.example` file and rename it to `.env`.

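
As a rough sketch of what the `.env` file is for, the snippet below reads `KEY=value` lines and exports them as environment variables, so that a later `os.getenv("GROQ_API_KEY")` call can find them. The `load_env_file` helper is hypothetical and not part of AItrika, which may load the file differently; only the `GROQ_API_KEY` name comes from this README.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: parse KEY=value lines and export them to os.environ."""
    loaded: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a key/value separator
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and optional quotes around the value
        value = value.strip().strip("\"'")
        loaded[key] = value
        # Variables already set in the environment take precedence
        os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env_file(".env")` before creating the LLM object would make the keys visible to `os.getenv`.
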
## 🔍 Usage

You can easily get information about a paper by passing its PubMed ID:

```python
from aitrika.engine.aitrika import OnlineAItrika

# pubmed_id is the PubMed ID of the paper to analyze
aitrika_engine = OnlineAItrika(pubmed_id=pubmed_id)
title = aitrika_engine.get_title()
print(title)
```

Or you can parse a local PDF:

```python
from aitrika.engine.aitrika import LocalAItrika

# pdf_path is the path to the PDF file on disk
aitrika_engine = LocalAItrika(pdf_path=pdf_path)
title = aitrika_engine.get_title()
print(title)
```

```
Breast cancer genes: beyond BRCA1 and BRCA2.
```

You can get other information, such as the associations between genes and diseases:

```python
associations = aitrika_engine.get_associations()
```

```
[
  {
    "gene": "BRIP1",
    "disease": "Breast Neoplasms"
  },
  {
    "gene": "PTEN",
    "disease": "Breast Neoplasms"
  },
  {
    "gene": "CHEK2",
    "disease": "Breast Neoplasms"
  },
  ...
]
```

Or you can get a nicely formatted DataFrame:

```python
associations = aitrika_engine.get_associations(dataframe=True)
```

```
      gene                          disease
0    BRIP1                 Breast Neoplasms
1     PTEN                 Breast Neoplasms
2    CHEK2                 Breast Neoplasms
...
```

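
As a small illustration of working with the list form of the output, the sketch below groups genes by disease using only the standard library. The `associations` literal is copied from the sample output above, and `group_genes_by_disease` is a hypothetical helper, not part of AItrika.

```python
from collections import defaultdict


def group_genes_by_disease(associations: list[dict[str, str]]) -> dict[str, list[str]]:
    """Hypothetical helper: map each disease to the genes associated with it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in associations:
        grouped[item["disease"]].append(item["gene"])
    return dict(grouped)


# Sample records in the same shape as the list output shown above
associations = [
    {"gene": "BRIP1", "disease": "Breast Neoplasms"},
    {"gene": "PTEN", "disease": "Breast Neoplasms"},
    {"gene": "CHEK2", "disease": "Breast Neoplasms"},
]

print(group_genes_by_disease(associations))
# {'Breast Neoplasms': ['BRIP1', 'PTEN', 'CHEK2']}
```
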
With the power of RAG, you can query your document:

```python
import os

# generate_documents and GroqLLM are provided by the aitrika package
# Prepare the documents (abstract holds the paper's abstract text)
documents = generate_documents(content=abstract)

# Set the LLM
llm = GroqLLM(documents=documents, api_key=os.getenv("GROQ_API_KEY"))

# Query your document
query = "Is BRCA1 associated with breast cancer?"
print(llm.query(query=query))
```

```
The provided text suggests that BRCA1 is associated with breast cancer, as it is listed among the high-penetrance genes identified in family linkage studies as responsible for inherited syndromes of breast cancer.
```

Or you can extract other information:

```python
results = aitrika_engine.extract_results(llm=llm)
print(results)
```

```
** RESULTS **

- High-penetrance genes - BRCA1, BRCA2, PTEN, TP53 - responsible for inherited syndromes
- Moderate-penetrance genes - CHEK2, ATM, BRIP1, PALB2, RAD51C - associated with moderate BC risk
- Low-penetrance alleles - common alleles - associated with slightly increased or decreased risk of BC
- Current clinical practice - high-penetrance genes - widely used
- Future prospect - all familial breast cancer genes - to be included in genetic test
- Research need - clinical management - of moderate and low-risk variants
```

## 🚀 Run the API

To run the AItrika API, follow these steps:

1. Ensure you have set up your environment and installed all dependencies as described in the Install section.

2. Run the API server using the following command:

```bash
python api.py
```

The API will start running on http://0.0.0.0:8000. You can now make requests to the following endpoints:

- `/associations`: Get associations from a PubMed article
- `/abstract`: Get the abstract of a PubMed article
- `/query`: Query a PubMed article
- `/results`: Get results from a PubMed article
- `/participants`: Get the number of participants from a PubMed article
- `/outcomes`: Get outcomes from a PubMed article

You can use tools like curl, Postman, or any HTTP client to interact with the API. For example:

```bash
curl -X POST "http://localhost:8000/abstract" -H "Content-Type: application/json" -d '{"pubmed_id": 12345678}'
```

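
The same request can be sketched in Python with the standard library. This assumes the server is running locally on port 8000 and that `/abstract` accepts the JSON body shown in the `curl` example. `build_request` and `post_json` are hypothetical helper names, and since the response format is not documented here, the raw response text is returned as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address used in the curl example


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an API endpoint (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server started via `python api.py`, calling `post_json("/abstract", {"pubmed_id": 12345678})` sends the same request as the `curl` command.
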
The API documentation is automatically generated and saved to `docs/api-reference/openapi.json`.
You can use this file with tools like Swagger UI for a more interactive API exploration experience.

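
As a quick alternative to a full UI, the generated file can also be inspected directly. The sketch below lists each path and its HTTP methods, relying only on the standard OpenAPI layout (a top-level `paths` object keyed by route). `list_endpoints` and `load_spec` are hypothetical helpers, and the default file path is the one mentioned above.

```python
import json
from pathlib import Path

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs found in an OpenAPI document (hypothetical helper)."""
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-operation keys such as "parameters"
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    # Sort by path first, then by method, for stable output
    return sorted(endpoints, key=lambda pair: (pair[1], pair[0]))


def load_spec(path: str = "docs/api-reference/openapi.json") -> dict:
    """Read the OpenAPI JSON file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Once the file has been generated, `list_endpoints(load_spec())` returns the available routes as a sorted list.
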
## Support the Project

If you find this project useful, please consider supporting it:

- 🌟 Star the project on GitHub
- 🐛 Report bugs or suggest new features
- 🤝 Contribute with pull requests
- ☕️ [Buy me a coffee](https://www.buymeacoffee.com/dsupertramp) or consider becoming a sponsor.

### Commercial / Business use

If you're using this project in a business or commercial context, please [contact me](mailto:salvatoredanilopalumbo@gmail.com).

I'm available for consulting, custom development, or commercial licensing.

Your support helps keep this project active and continuously improving. Thank you!

## License

AItrika is licensed under the Apache 2.0 License. See the LICENSE file for more details.

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=dSupertramp/AItrika&type=Date)](https://star-history.com/#dSupertramp/AItrika&Date)