--- a
+++ b/README.md
@@ -0,0 +1,226 @@
+# AItrika
+
+
+
+[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
+
+
+
+
+
+
+Enhance your knowledge in medical research.
+
+AItrika (formerly **PubGPT**) is a tool that makes it easy to extract relevant information from medical papers:
+
+- Abstract
+- Full text (when available)
+- Genes
+- Diseases
+- Mutations
+- Associations between genes and diseases
+- MeSH terms
+- Other terms
+- Results
+- Bibliography
+
+And so on!
+
+## 🚀 Run the demo app
+
+You can try AItrika with the Streamlit app by running:
+
+```
+streamlit run app.py
+```
+
+Or you can use it as a script by running:
+
+```
+python main.py
+```
+
+## 📦 Install
+
+To install everything, you need `uv`.
+
+First of all, install `uv` with the command:
+
+```
+pip install uv
+```
+
+After that, create a virtual environment with the command:
+
+```
+uv venv venv_name
+```
+
+Activate the virtual environment:
+
+```
+source venv_name/bin/activate
+```
+
+And install dependencies:
+
+```
+uv pip install -r requirements.in
+```
+
+## 🔑 Set LLM API Keys
+
+To set the API keys, insert your keys into the `env.example` file and rename it to `.env`.
+
+## 🔍 Usage
+
+You can easily get information about a paper by passing a PubMed ID:
+
+```python
+from aitrika.engine.aitrika import OnlineAItrika
+aitrika_engine = OnlineAItrika(pubmed_id=pubmed_id)  # pubmed_id: PubMed ID of the paper
+title = aitrika_engine.get_title()
+print(title)
+```
+
+Or you can parse a local PDF:
+
+```python
+from aitrika.engine.aitrika import LocalAItrika
+aitrika_engine = LocalAItrika(pdf_path=pdf_path)  # pdf_path: path to a local PDF file
+title = aitrika_engine.get_title()
+print(title)
+```
+
+```
+Breast cancer genes: beyond BRCA1 and BRCA2.
+```
+
+You can get other information, such as the associations between genes and diseases:
+
+```python
+associations = aitrika_engine.get_associations()
+```
+
+```
+[
+  {
+    "gene": "BRIP1",
+    "disease": "Breast Neoplasms"
+  },
+  {
+    "gene": "PTEN",
+    "disease": "Breast Neoplasms"
+  },
+  {
+    "gene": "CHEK2",
+    "disease": "Breast Neoplasms"
+  },
+  ...
+]
+```
+
+Or you can get a nicely formatted DataFrame:
+
+```python
+associations = aitrika_engine.associations(dataframe=True)
+```
+
+```
+    gene           disease
+0  BRIP1  Breast Neoplasms
+1   PTEN  Breast Neoplasms
+2  CHEK2  Breast Neoplasms
+...
+```
+
+With the power of RAG, you can query your document:
+
+```python
+# Prepare the documents
+documents = generate_documents(content=abstract)  # abstract: text of the paper's abstract
+
+# Set the LLM (os.getenv requires "import os")
+llm = GroqLLM(documents=documents, api_key=os.getenv("GROQ_API_KEY"))
+
+# Query your document
+query = "Is BRCA1 associated with breast cancer?"
+print(llm.query(query=query))
+```
+
+```
+The provided text suggests that BRCA1 is associated with breast cancer, as it is listed among the high-penetrance genes identified in family linkage studies as responsible for inherited syndromes of breast cancer.
+```
+
+Or you can extract other information:
+
+```python
+results = aitrika_engine.extract_results(llm=llm)
+print(results)
+```
+
+```
+** RESULTS **
+
+- High-penetrance genes - BRCA1, BRCA2, PTEN, TP53 - responsible for inherited syndromes
+- Moderate-penetrance genes - CHEK2, ATM, BRIP1, PALB2, RAD51C - associated with moderate BC risk
+- Low-penetrance alleles - common alleles - associated with slightly increased or decreased risk of BC
+- Current clinical practice - high-penetrance genes - widely used
+- Future prospect - all familial breast cancer genes - to be included in genetic test
+- Research need - clinical management - of moderate and low-risk variants
+```
+
+## 🚀 Run the API
+
+To run the AItrika API, follow these steps:
+
+1. Ensure you have set up your environment and installed all dependencies as described in the Install section.
+
+2. Run the API server using the following command:
+
+```bash
+python api.py
+```
+
+The API will start running on http://0.0.0.0:8000. You can now make requests to the various endpoints:
+
+- `/associations`: Get associations from a PubMed article
+- `/abstract`: Get the abstract of a PubMed article
+- `/query`: Query a PubMed article
+- `/results`: Get results from a PubMed article
+- `/participants`: Get the number of participants from a PubMed article
+- `/outcomes`: Get outcomes from a PubMed article
+
+You can use tools like curl, Postman, or any HTTP client to interact with the API. For example:
+
+```bash
+curl -X POST "http://localhost:8000/abstract" -H "Content-Type: application/json" -d '{"pubmed_id": 12345678}'
+```
+
+The API documentation is automatically generated and saved to `docs/api-reference/openapi.json`.
+You can use this file with tools like Swagger UI for a more interactive API exploration experience.
+
+## Support the Project
+
+If you find this project useful, please consider supporting it:
+
+- 🌟 Star the project on GitHub
+- 🐛 Report bugs or suggest new features
+- 🤝 Contribute with pull requests
+- ☕️ [Buy me a coffee](https://www.buymeacoffee.com/dsupertramp) or consider sponsoring the project.
+
+### Commercial / Business use
+
+If you're using this project in a business or commercial context, please [contact me](mailto:salvatoredanilopalumbo@gmail.com).
+
+I'm available for consulting, custom development, or commercial licensing.
+
+Your support helps keep this project active and continuously improving. Thank you!
+
+## License
+
+AItrika is licensed under the Apache 2.0 License. See the LICENSE file for more details.
+
+## Star History
+
+[Star History Chart](https://star-history.com/#dSupertramp/AItrika&Date)
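
The curl call to `/abstract` shown in the API section can also be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_abstract_request` and `fetch_abstract` are hypothetical (they are not part of AItrika), and because the README does not document the response schema, the body is returned as raw text rather than parsed into assumed fields.

```python
import json
import urllib.request


def build_abstract_request(
    pubmed_id: int, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    # JSON body mirrors the curl payload: {"pubmed_id": <id>}
    payload = json.dumps({"pubmed_id": pubmed_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/abstract",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_abstract(pubmed_id: int, base_url: str = "http://localhost:8000") -> str:
    """Send the request and return the raw response body as text.

    Assumption: the server started with `python api.py` is reachable at
    base_url. The response format is not specified in the README, so no
    fields are parsed here.
    """
    request = build_abstract_request(pubmed_id, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

With the server running locally, `print(fetch_abstract(12345678))` sends the same request as the curl example. Keeping request construction separate from sending makes it possible to inspect the URL, method, and payload without a running server.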