--- a
+++ b/3-Metabolome and Proteome/README.md
@@ -0,0 +1,50 @@
+## 1. Metabolome
+
+Raw data from the mass spectrometer were imported into the commercial software Progenesis QI (version 2.2, hereinafter referred to as QI; https://www.nonlinear.com/progenesis/qi/) for peak picking, to obtain metabolite information such as mass-to-charge ratio (m/z), retention time and ion peak area. The QI workflow consists of the following steps: peak alignment, peak picking, and peak identification.
+
+Metabolites were identified in QI by searching against the HMDB (v5.0), METLIN (v3.7.1) and KEGG (v96.0) databases.
+
+Peak data were pre-processed using metaX (https://www.bioconductor.org/packages/3.2/bioc/html/metaX.html). The steps were:
+
+- Filtering out low-quality ions (ions with more than 50% missing values in the QC samples were removed first, then ions with more than 80% missing values in the actual samples)
+- Imputing missing values with the k-nearest neighbor (KNN) method
+- Normalizing the data with the probabilistic quotient normalization (PQN) method
+- Applying QC-RLSC (quality control-based robust LOESS signal correction) to alleviate the effects of peak area attenuation
+- Filtering out ions whose relative standard deviation (RSD) across all QC samples exceeds 30% (such ions fluctuate greatly during the experiment and are excluded from downstream statistical analysis)
+
+Taking the analysis of the positive ion mode as an example:
+
+```R
+library(metaX)
+para <- new("metaXpara")
+pfile <- "m_pos.csv" ## Output from QI, raw peak file with metabolite information
+sfile <- "s_pos.list" ## Output from QI, sample list file
+idres <- "i_pos.csv" ## Output from QI, ion intensity file
+para@outdir <- "metaX_result_pos" ## Output directory
+para@prefix <- "pos" ## Prefix for output files
+para@sampleListFile <- sfile
+para@ratioPairs <- "COPD:Healthy" ## Group comparison
+para <- importDataFromQI(para, file=pfile)
+plsdaPara <- new("plsDAPara")
+plsdaPara@scale <- "pareto"
+plsdaPara@cpu <- 4
+plsdaPara@kfold <- 3
+#plsdaPara@do = FALSE
+res <- doQCRLSC(para, cpu=1)
+missValueImputeMethod(para) <- "KNN"
+p <- metaXpipe(para, plsdaPara=plsdaPara, missValueRatioQC=0.5, missValueRatioSample=0.8, cvFilter=0.3, idres=idres, qcsc=0, scale="pareto", remveOutlier=FALSE, nor.method="pqn", t=1, nor.order = 1, pclean = FALSE, doROC=FALSE)
+save(p, file="pos.rda")
+sessionInfo()
+```
+
+The processed metabolome data are uploaded as metabolome.txt.
+
+The detailed information for each metabolite, including KEGG/HMDB/METLIN/PubChem/ChEBI IDs, SMILES structure, class and pathway, is uploaded as compound_information.txt.
+
+## 2. Sputum and serum proteome
+
+A panel of 280 proteins was measured using a custom Quantibody Human Antibody Array (test procedure no. SOP-TF-QAH-001, SOP-TF-QAH-003 microarray) from RayBiotech (https://www.raybiotech.com/inflammation-protein-arrays/).
+
+The processed sputum and serum proteome data are uploaded as sputum_proteome.txt and serum_proteome.txt.
+
+The detailed information on the 280 proteins is uploaded as protein_information.txt.
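
For readers who want to see what the metabolome filtering thresholds in section 1 mean in practice, the following is a minimal base-R sketch of three of the listed steps: the missing-value filter (50% in QC, 80% in samples), PQN, and the 30% QC RSD filter. It is an illustration only, not the metaX implementation. The helper names (`filter_missing`, `pqn_normalize`, `filter_rsd`), the toy intensity matrix, and the `is_qc` flag are all hypothetical, and the sketch assumes an ion-by-sample matrix with missing values stored as `NA`. KNN imputation and QC-RLSC are left out, and the PQN version omits the total-area pre-normalization that some implementations apply first.

```r
## Hypothetical helpers for illustration; these are not metaX internals.

## Drop ions (rows) with too many missing values, judged first on the
## QC samples and then on the study samples.
filter_missing <- function(x, is_qc, max_na_qc = 0.5, max_na_sample = 0.8) {
  na_qc <- rowMeans(is.na(x[, is_qc, drop = FALSE]))
  x <- x[na_qc <= max_na_qc, , drop = FALSE]
  na_sample <- rowMeans(is.na(x[, !is_qc, drop = FALSE]))
  x[na_sample <= max_na_sample, , drop = FALSE]
}

## Simplified probabilistic quotient normalization: divide each sample by
## the median of its quotients against a per-ion median reference.
pqn_normalize <- function(x) {
  ref <- apply(x, 1, median, na.rm = TRUE)       # reference value per ion
  quotients <- x / ref                           # row i is divided by ref[i]
  dilution <- apply(quotients, 2, median, na.rm = TRUE)
  sweep(x, 2, dilution, "/")
}

## Keep ions whose relative standard deviation across QC samples is at
## most max_rsd.
filter_rsd <- function(x, is_qc, max_rsd = 0.3) {
  qc <- x[, is_qc, drop = FALSE]
  rsd <- apply(qc, 1, sd, na.rm = TRUE) / rowMeans(qc, na.rm = TRUE)
  x[!is.na(rsd) & rsd <= max_rsd, , drop = FALSE]
}

## Toy data: 5 ions measured in 3 QC samples and 3 study samples.
x <- rbind(
  ion1 = c(100, 102, 98, 90, 110, 105),   # stable, complete
  ion2 = c(NA, NA, 50, 40, 60, 55),       # 2/3 missing in QC
  ion3 = c(200, 400, 50, 210, 190, 205),  # unstable in QC
  ion4 = c(10, 11, 10, NA, NA, NA),       # all missing in study samples
  ion5 = c(50, 51, 49, 45, NA, 52)        # stable, one missing value
)
colnames(x) <- c("QC1", "QC2", "QC3", "S1", "S2", "S3")
is_qc <- grepl("^QC", colnames(x))

kept <- filter_missing(x, is_qc)      # removes ion2 and ion4
normed <- pqn_normalize(kept)
final <- filter_rsd(normed, is_qc)    # removes ion3
print(rownames(final))                # ion1 and ion5 remain
```

In the toy matrix, `ion2` and `ion4` fail the missing-value thresholds and `ion3` fails the QC RSD threshold, so only `ion1` and `ion5` would go forward to statistical analysis.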