# IntelliGenes

This is the command-line (CLI) implementation of _IntelliGenes_. A GUI version is available in the `intelligenes-gui` branch.

_IntelliGenes_ is a portable, Python-based pipeline for interpreting the rapidly growing volume of genomics datasets. It serves as a comprehensive toolkit that applies state-of-the-art algorithms to users' own cohorts, both to discover disease-associated biomarkers and to predict patients' disease status. _IntelliGenes_ integrates demographic and genomic data, so that investigations can consider both at once. With _IntelliGenes_, we also introduce I-Genes Scores, a novel metric that quantifies how relevant each biomarker is to a disease prediction model.

_IntelliGenes_ can be installed from our GitHub repository using the terminal. Follow these steps to install _IntelliGenes_ and its dependencies:
```
# Clone IntelliGenes’ GitHub Repository
git clone https://github.com/drzeeshanahmed/intelligenes.git

# Navigate to IntelliGenes
cd intelligenes/

# Install IntelliGenes
pip install .
```
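After installation, a quick way to confirm that the `igenes`, `igenes_select`, and `igenes_predict` commands used in this README are on your `PATH` is a small check like the one below. This is a minimal Bash sketch, not part of _IntelliGenes_; the helper name `check_igenes_commands` is hypothetical, and it only assumes that `pip install .` registers those three commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with IntelliGenes): report whether each
# IntelliGenes command named in this README can be found on PATH.
check_igenes_commands() {
  local missing=0 cmd
  for cmd in igenes igenes_select igenes_predict; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd"
    else
      echo "missing: $cmd"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing commands (0 means all were found).
  return "$missing"
}

check_igenes_commands || echo "Some commands are missing; check that pip's script directory is on your PATH."
```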
_IntelliGenes_ offers a range of tools to help users understand their multi-genomic datasets, and it is designed as an easy-to-understand pipeline for users at all levels of computational experience. _IntelliGenes_ has three functions, each available as a command:
```
# Discover Biomarkers
igenes_select -i data/cigt_file.csv -o results/

# Disease Prediction & I-Genes Scores
igenes_predict -i data/cigt_file.csv -f features_file.csv -o results/

# IntelliGenes (Discovering Biomarkers & Predicting Disease)
igenes -i data/cigt_file.csv -o results/
```
These are sample commands. An example CIGT file is provided in `tests/`.

Each command also accepts flags that let users tailor _IntelliGenes_ to their exact needs. To list the available flags, run:
```
# IntelliGenes Selection Help
igenes_select --help

# IntelliGenes Prediction Help
igenes_predict --help

# IntelliGenes Help
igenes --help
```
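Because `igenes_predict` takes both a CIGT file and a features file (presumably one produced by `igenes_select`), it can help to wrap the three commands in a small function that checks its inputs before calling them. The Bash sketch below is hypothetical: `run_igenes` is not part of _IntelliGenes_, it uses only the `-i`, `-f`, and `-o` flags shown above, and it creates the output directory with `mkdir -p` in case the tools expect it to exist. The `--help` output remains the reference for the full set of flags.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the three IntelliGenes commands shown above.
# Usage: run_igenes select|predict|all CIGT_FILE OUTPUT_DIR [FEATURES_FILE]
run_igenes() {
  local mode="$1" cigt="$2" outdir="$3" features="${4:-}"

  # Reject unknown modes before touching the filesystem.
  case "$mode" in
    select|predict|all) ;;
    *)
      echo "usage: run_igenes select|predict|all CIGT_FILE OUTPUT_DIR [FEATURES_FILE]" >&2
      return 2
      ;;
  esac

  # Fail early if the CIGT input is missing.
  if [ ! -f "$cigt" ]; then
    echo "error: CIGT file not found: $cigt" >&2
    return 1
  fi

  # Create the output directory up front (harmless if it already exists).
  mkdir -p "$outdir" || return 1

  case "$mode" in
    select)
      igenes_select -i "$cigt" -o "$outdir"
      ;;
    predict)
      # Prediction also needs a features file, e.g. one written by igenes_select.
      if [ ! -f "$features" ]; then
        echo "error: 'predict' needs an existing features file" >&2
        return 1
      fi
      igenes_predict -i "$cigt" -f "$features" -o "$outdir"
      ;;
    all)
      igenes -i "$cigt" -o "$outdir"
      ;;
  esac
}

# Example calls (file names are placeholders):
#   run_igenes select  data/cigt_file.csv results/
#   run_igenes predict data/cigt_file.csv results/ features_file.csv
#   run_igenes all     data/cigt_file.csv results/
```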
_IntelliGenes_ requires a dataset in CIGT format as input. Example CIGT datasets can be found in our GitHub repository. A CIGT dataset integrates demographic and transcriptomic data:
  - Each column contains a demographic variable or a transcriptomic biomarker, and each row corresponds to an individual patient, identified by a patient identifier.
  - Demographics such as ‘Age’, ‘Race’, and ‘Sex’ should be encoded as integers (following EHR standards). These demographics are optional, as _IntelliGenes_ also works with genomic/transcriptomic data alone.
  - A ‘Type’ column is required; it denotes each patient’s status as an integer (0 or 1).
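To make the layout concrete, the Bash sketch below writes a tiny synthetic CIGT-style file and checks it against the rules listed above: a `Type` column containing only 0 or 1, and integer values in any `Age`, `Race`, or `Sex` column that is present. It is an illustration, not a validator shipped with _IntelliGenes_: the `ID` column name, the gene columns `GENE_A` and `GENE_B`, the column order, the integer codes, and all values are made up, and the check assumes a plain comma-separated file without quoted fields. The example CIGT files on GitHub remain the reference for the exact format.

```shell
#!/usr/bin/env bash
# Synthetic, made-up example in a CIGT-like layout: one row per patient,
# demographic and transcriptomic columns, plus the required 'Type' column.
# 'ID', 'GENE_A', 'GENE_B' and every value below are hypothetical.
cat > example_cigt.csv <<'EOF'
ID,Age,Sex,GENE_A,GENE_B,Type
P001,54,1,3.21,0.87,1
P002,61,2,1.05,2.40,0
P003,47,1,2.88,1.12,1
EOF

# check_cigt FILE: require a 'Type' column whose values are 0 or 1, and
# integer values in any 'Age', 'Race' or 'Sex' column that is present.
# Assumes a simple comma-separated file with no quoted fields.
check_cigt() {
  awk -F',' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "Type") col = i
        if ($i == "Age" || $i == "Race" || $i == "Sex") demo[i] = $i
      }
      if (!col) { print "error: no Type column"; bad = 1; exit }
      next
    }
    $col != "0" && $col != "1" { print "error: row " NR ": Type is " $col; bad = 1 }
    {
      for (i in demo)
        if ($i !~ /^-?[0-9]+$/) { print "error: row " NR ": " demo[i] " is not an integer"; bad = 1 }
    }
    END {
      if (bad) exit 1
      print "ok: " (NR - 1) " patient rows"
    }
  ' "$1"
}

check_cigt example_cigt.csv   # prints: ok: 3 patient rows
```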

More information is available in **Supplementary Material 2: _IntelliGenes_: Installation, configuration, and user’s guidelines**.

If you use _IntelliGenes_, please cite:

Degroat, W., Mendhe, D., Bhurasi, A., Abdelhalim, H., Saman, Z., & Ahmed, Z. (2023). IntelliGenes: A novel machine learning pipeline for biomarker discovery and predictive analysis using multi-genomic profiles. _Bioinformatics_ (Oxford), 39(12), btad755. PMID: 38096588. doi:10.1093/bioinformatics/btad755