## 1. Metabolome
The raw mass spectrometry data were imported into the commercial software Progenesis QI (version 2.2, hereinafter referred to as QI; https://www.nonlinear.com/progenesis/qi/) for peak picking, which yields metabolite information such as mass-to-charge ratio (m/z), retention time and ion peak area. The QI workflow consists of three steps: peak alignment, peak picking and peak identification.

Metabolites were identified in QI by searching against the HMDB (v5.0), METLIN (v3.7.1) and KEGG (v96.0) databases.

Peak data were pre-processed with metaX (https://www.bioconductor.org/packages/3.2/bioc/html/metaX.html) in the following steps:
- Filtering out low-quality ions: first removing ions with more than 50% missing values in the QC samples, then removing ions with more than 80% missing values in the actual samples
- Imputing missing values with the k-nearest neighbor (KNN) method
- Normalizing the data with the probabilistic quotient normalization (PQN) method
- Applying QC-RLSC (quality control-based robust LOESS signal correction) to alleviate the effects of peak area attenuation
- Filtering out ions with a relative standard deviation (RSD) > 30% across all QC samples (these ions fluctuate greatly during the experiment and are excluded from downstream statistical analysis)

Taking the analysis of positive ion mode as an example:
```R
library(metaX)

## Input files exported from QI
para  <- new("metaXpara")
pfile <- "m_pos.csv"  ## Output from QI, raw peak file with metabolite information
sfile <- "s_pos.list" ## Output from QI, sample list file
idres <- "i_pos.csv"  ## Output from QI, ion intensity file

## Output location, sample list and comparison group
para@outdir <- "metaX_result_pos"
para@prefix <- "pos"
para@sampleListFile <- sfile
para@ratioPairs <- "COPD:Healthy"
para <- importDataFromQI(para, file = pfile)

## PLS-DA settings
plsdaPara <- new("plsDAPara")
plsdaPara@scale <- "pareto"
plsdaPara@cpu <- 4
plsdaPara@kfold <- 3
#plsdaPara@do <- FALSE

## QC-RLSC signal correction; the result is stored in `res`
res <- doQCRLSC(para, cpu = 1)

## KNN imputation, then the pre-processing and statistics pipeline
## (the argument name `remveOutlier` is kept as spelled in the original call)
missValueImputeMethod(para) <- "KNN"
p <- metaXpipe(para, plsdaPara = plsdaPara,
               missValueRatioQC = 0.5, missValueRatioSample = 0.8,
               cvFilter = 0.3, idres = idres, qcsc = 0, scale = "pareto",
               remveOutlier = FALSE, nor.method = "pqn", t = 1,
               nor.order = 1, pclean = FALSE, doROC = FALSE)
save(p, file = "pos.rda")
sessionInfo()
```
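To make the thresholds in the call above more concrete, the following base-R sketch applies the missing-value filter, a simplified PQN and the QC RSD filter to a small made-up intensity matrix. It is only an illustration and not metaX's implementation: the function names (`filter_missing`, `pqn_normalize`, `filter_qc_rsd`), the toy values and the use of the per-ion QC median as the PQN reference are assumptions made for this example, and KNN imputation and QC-RLSC are left out.

```R
# Toy intensity matrix: rows = ions, columns = samples (all values are made up).
x <- rbind(
  ion1 = c(100, 110, 105,  90, 120,  95, 130, 100),
  ion2 = c( NA,  NA, 200, 210, 190, 205, 220, 215),
  ion3 = c( 50, 150, 100,  80,  85,  90,  75,  95),
  ion4 = c(300, 310, 305,  NA,  NA,  NA,  NA,  NA)
)
colnames(x) <- c("QC1", "QC2", "QC3", "S1", "S2", "S3", "S4", "S5")
is_qc <- grepl("^QC", colnames(x))  # assumed naming convention for QC samples

# Keep ions with at most `max_qc` missing values in QC samples and
# at most `max_sample` missing values in actual samples.
filter_missing <- function(x, is_qc, max_qc = 0.5, max_sample = 0.8) {
  miss_qc     <- rowMeans(is.na(x[, is_qc, drop = FALSE]))
  miss_sample <- rowMeans(is.na(x[, !is_qc, drop = FALSE]))
  x[miss_qc <= max_qc & miss_sample <= max_sample, , drop = FALSE]
}

# Simplified PQN: divide each sample by the median of its ion-wise
# quotients to a reference spectrum (here the per-ion QC median, an assumption).
pqn_normalize <- function(x, is_qc) {
  ref      <- apply(x[, is_qc, drop = FALSE], 1, median, na.rm = TRUE)
  dilution <- apply(x / ref, 2, median, na.rm = TRUE)
  sweep(x, 2, dilution, "/")
}

# Keep ions whose RSD (sd / mean) across QC samples is at most `max_rsd`.
filter_qc_rsd <- function(x, is_qc, max_rsd = 0.3) {
  qc  <- x[, is_qc, drop = FALSE]
  rsd <- apply(qc, 1, sd, na.rm = TRUE) / rowMeans(qc, na.rm = TRUE)
  x[!is.na(rsd) & rsd <= max_rsd, , drop = FALSE]
}

kept   <- filter_missing(x, is_qc)      # drops ions with too many missing values
normed <- pqn_normalize(kept, is_qc)    # per-sample dilution correction
final  <- filter_qc_rsd(normed, is_qc)  # drops ions that are unstable in QC samples
print(rownames(kept))
print(rownames(final))
```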
The processed metabolome data are uploaded as metabolome.txt.

Detailed information for each metabolite, including KEGG/HMDB/METLIN/PubChem/ChEBI IDs, SMILES structure, class and pathway, is uploaded as compound_information.txt.

## 2. Sputum and serum proteome
A panel of 280 proteins was measured using a custom Quantibody Human Antibody Array (test procedure no. SOP-TF-QAH-001, SOP-TF-QAH-003 microarray) from RayBiotech (https://www.raybiotech.com/inflammation-protein-arrays/).

The processed sputum and serum proteome data are uploaded as sputum_proteome.txt and serum_proteome.txt.

Detailed information on the 280 proteins is uploaded as protein_information.txt.