Code for the paper published in Nucleic Acids Research (NAR):

AggMapNet: enhanced and explainable low-sample omics deep learning with feature-aggregated multi-channel networks

DataSet

| Project | Data type | Dataset | # of samples | # of features | Data & code path |
|---|---|---|---|---|---|
| Proof-of-Concept | Image data | MNIST [1] | 70K images in 10 classes: 60K training set, 10K test set | 784 pixels | paper/00_mnist/correlation |
| Proof-of-Concept | Image data | FMNIST [2] | 70K images in 10 classes: 60K training set, 10K test set | 784 pixels | paper/01_fmnist/correlation |
| Example to run | Breast Cancer Diagnostic | WDBC [3] | 569 samples, labeled as 357 benign and 212 malignant | 30 real-valued features of the cell nucleus | paper/00_example_breast_cancer |
| Pan Cancer | Transcriptomics | TCGA-T [4] | 10446 samples covering 33 cancer types from the Pan-Cancer Atlas; the number of samples per class ranges from 45 to 1212, with an average of 317, and 15 tumor types have fewer than 200 samples | 10381 features (normalized level-3 RNA-Seq gene expression data) | paper/02_transcriptome/CNN |
| Pan Cancer | Transcriptomics | TCGA-S & TCGA-G [5] | 18 subset datasets, each a binary task on a different cancer and different stages or grades; the number of samples per task ranges from 179 to 1134, with an average of 486 | 17970 “O” genes with Z-score transformed RNA-Seq gene expression data | paper/02_transcriptome/ML |
| COVID-19 | Proteomics | Cov-D [6] | 363 samples (211 SARS-CoV-2 positives and 151 negatives) from 3 different labs | 88 MALDI-MS signal peaks from nasal swabs | paper/03_COVID-19 |
| COVID-19 | Proteomics & Metabolomics | Cov-S [7] | 41 patients: 31 in the training set (18 non-severe and 13 severe) and an independent cohort of 10 patients (6 non-severe and 4 severe) | 1486 markers from serum samples, including 649 proteins and 847 metabolites | paper/03_COV19_Severe |
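
Several of the sample counts listed above are given as a total plus a breakdown into parts, which makes them easy to cross-check. The following is a minimal sketch and not part of the repository: the `DATASETS` dictionary and the `check_parts` helper are hypothetical names introduced here for illustration, and the numbers are copied by hand from four of the rows (MNIST, FMNIST, WDBC, Cov-S).

```python
# Hypothetical sanity check, not taken from the AggMapNet code base.
# Counts are copied by hand from the dataset listing above.
DATASETS = {
    "MNIST":  {"total": 70_000, "parts": {"train": 60_000, "test": 10_000}},
    "FMNIST": {"total": 70_000, "parts": {"train": 60_000, "test": 10_000}},
    "WDBC":   {"total": 569, "parts": {"benign": 357, "malignant": 212}},
    "Cov-S":  {"total": 41, "parts": {"train": 31, "independent": 10}},
}


def check_parts(datasets):
    """Return the names of datasets whose parts do not sum to the stated total."""
    return [
        name
        for name, entry in datasets.items()
        if sum(entry["parts"].values()) != entry["total"]
    ]


if __name__ == "__main__":
    mismatches = check_parts(DATASETS)
    print(mismatches if mismatches else "all listed counts are consistent")
```

A check like this can be extended with further rows or with counts read from the actual data files once they are downloaded.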

References

  • [1] LeCun, Y. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/ (1998).
  • [2] Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017).
  • [3] Dua, D. & Graff, C. UCI machine learning repository, Wisconsin Diagnostic Breast Cancer (WDBC) Data Set. URL http://archive.ics.uci.edu/ml 37 (2019).
  • [4] Lyu, B. & Haque, A. Deep learning based tumor type classification using gene expression data. In Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. 89-96 (2018).
  • [5] Smith, A. M. et al. Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data. BMC Bioinformatics 21, 1-18 (2020).
  • [6] Nachtigall, F. M., Pereira, A., Trofymchuk, O. S. & Santos, L. S. Detection of SARS-CoV-2 in nasal swabs using MALDI-MS. Nature Biotechnology 38, 1168-1173 (2020).
  • [7] Shen, B. et al. Proteomic and metabolomic characterization of COVID-19 patient sera. Cell 182, 59-72.e15 (2020).
  • [8] Wirbel, J. et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nature Medicine 25, 679-689 (2019).
  • [9] Yachida, S. et al. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nature Medicine 25, 968-976 (2019).