--- a
+++ b/paper/NAR/readme.md
@@ -0,0 +1,36 @@
+[DOI: 10.1093/nar/gkac010](https://doi.org/10.1093/nar/gkac010)
+[DOI: 10.5281/zenodo.3999156](https://doi.org/10.5281/zenodo.3999156)
+[Zenodo latest DOI](https://zenodo.org/badge/latestdoi/283439278)
+[Example code](https://github.com/shenwanxiang/bidd-aggmap/tree/master/paper/example)
+
+
+### Code for the paper published in NAR:
+[AggMapNet: enhanced and explainable low-sample omics deep learning with feature-aggregated multi-channel networks](https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkac010/6517966)
+
+**Datasets**
+
+| **Project** | **Data type** | **Dataset** | **\# of samples** | **\# of features** | **Data & Code Path** |
+| ----------- | ------------- | ----------- | ----------------- | ------------------ | -------------------- |
+| Proof-of-Concept | Image data | **MNIST**\[1\] | 70K images in 10 classes: 60K training set, 10K test set. | 784 pixels | paper/00\_mnist/correlation |
+| Proof-of-Concept | Image data | **FMNIST**\[2\] | 70K images in 10 classes: 60K training set, 10K test set. | 784 pixels | paper/01\_fmnist/correlation |
+| Example to run | Breast Cancer Diagnostic | **WDBC**\[3\] | 569 samples: 357 labeled benign and 212 labeled malignant. | 30 real-valued features of the cell nucleus | paper/00\_example\_breast\_cancer |
+| Pan Cancer | Transcriptomics | **TCGA-T**\[4\] | 10446 samples in total, covering 33 cancer types from the Pan-Cancer Atlas. The number of samples per class ranges from 45 to 1212, with an average of 317; 15 tumor types have fewer than 200 samples. | 10381 normalized level-3 RNA-Seq gene expression values | paper/02\_transcriptome/CNN |
+| Pan Cancer | Transcriptomics | **TCGA-S & TCGA-G**\[5\] | 18 subset datasets, each a binary task on a different cancer and different stages or grades. The number of samples per task ranges from 179 to 1134, with an average of 486. | 17970 “O” genes with Z-score-transformed RNA-Seq gene expression data | paper/02\_transcriptome/ML |
+| COVID-19 | Proteomics | **Cov-D**\[6\] | 362 samples (211 SARS-CoV-2 positives and 151 negatives) from 3 different labs. | 88 MALDI-MS signal peaks from nasal swabs | paper/03\_COVID-19 |
+| COVID-19 | Proteomics & Metabolomics | **Cov-S**\[7\] | 41 patients: 31 in the training set (18 non-severe and 13 severe) and an independent cohort of 10 patients (6 non-severe and 4 severe). | 1486 markers from the sera samples, including 649 proteins and 847 metabolites | paper/03\_COV19\_Severe |
+
+
+
+### **References**
+* [1] LeCun, Y. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/ (**1998**).
+* [2] Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (**2017**).
+* [3] Dua, D. & Graff, C. UCI Machine Learning Repository, Wisconsin Diagnostic Breast Cancer (WDBC) Data Set. http://archive.ics.uci.edu/ml (**2019**).
+* [4] Lyu, B. & Haque, A. Deep learning based tumor type classification using gene expression data. In Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 89-96 (**2018**).
+* [5] Smith, A. M. et al. Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data. BMC Bioinformatics 21, 1-18 (**2020**).
+* [6] Nachtigall, F. M., Pereira, A., Trofymchuk, O. S. & Santos, L. S. Detection of SARS-CoV-2 in nasal swabs using MALDI-MS. Nature Biotechnology 38, 1168-1173 (**2020**).
+* [7] Shen, B. et al. Proteomic and metabolomic characterization of COVID-19 patient sera. Cell 182, 59-72.e15 (**2020**).
+* [8] Wirbel, J. et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nature Medicine 25, 679-689 (**2019**).
+* [9] Yachida, S. et al. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nature Medicine 25, 968-976 (**2019**).
+
+
+