## 1. Metabolome

Raw data from the mass spectrometer were imported into the commercial software Progenesis QI (version 2.2, hereinafter referred to as QI; https://www.nonlinear.com/progenesis/qi/) for peak picking, which yields the mass-to-charge ratio (m/z), retention time, and ion peak area of each metabolite. The QI workflow consists of three steps: peak alignment, peak picking, and peak identification.

Metabolite identification was performed in Progenesis QI by searching against the HMDB (v5.0), METLIN (v3.7.1), and KEGG (v96.0) databases.

Peak data were pre-processed using metaX (https://www.bioconductor.org/packages/3.2/bioc/html/metaX.html). The steps were:

- Filtering out low-quality ions: ions with more than 50% missing values in the QC samples were removed first, then ions with more than 80% missing values in the actual samples
- Filling the remaining missing values using the k-nearest neighbor (KNN) method
- Normalizing the data using the probabilistic quotient normalization (PQN) method
- Applying the QC-RLSC (quality control-based robust LOESS signal correction) method to alleviate the effects of peak area attenuation
- Filtering out ions with a relative standard deviation (RSD) > 30% across all QC samples (such ions fluctuate greatly during the experiment and are excluded from downstream statistical analysis)
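
As a rough illustration of what the filtering and normalization steps above compute, the following base-R sketch applies them to a toy ion-by-sample matrix. It is not the metaX implementation: the function names (`filter_missing`, `pqn_normalize`, `filter_rsd`), the toy data, and the choice of the per-ion median across all samples as the PQN reference are assumptions made for illustration, and KNN imputation and QC-RLSC are omitted.

```R
# Minimal sketch on an ion x sample matrix (rows = ions, columns = samples).
# All names and values here are hypothetical; this is not metaX code.

# Keep ions whose missing-value fraction is <= max_qc in the QC samples
# and <= max_sample in the actual samples.
filter_missing <- function(x, is_qc, max_qc = 0.5, max_sample = 0.8) {
  miss_qc <- rowMeans(is.na(x[, is_qc, drop = FALSE]))
  miss_sample <- rowMeans(is.na(x[, !is_qc, drop = FALSE]))
  x[miss_qc <= max_qc & miss_sample <= max_sample, , drop = FALSE]
}

# Simplified PQN: divide each sample by the median of its quotients against
# a reference profile (assumed here: per-ion median across all samples).
pqn_normalize <- function(x) {
  ref <- apply(x, 1, median, na.rm = TRUE)
  factors <- apply(x / ref, 2, median, na.rm = TRUE)
  sweep(x, 2, factors, "/")
}

# Keep ions whose RSD (sd / mean) across the QC samples is <= max_rsd.
filter_rsd <- function(x, is_qc, max_rsd = 0.3) {
  qc <- x[, is_qc, drop = FALSE]
  rsd <- apply(qc, 1, sd, na.rm = TRUE) / rowMeans(qc, na.rm = TRUE)
  x[which(rsd <= max_rsd), , drop = FALSE]
}

# Toy data: 4 QC injections followed by 5 actual samples
x <- rbind(
  ion1 = c(100, 102,  98, 100, 110,  90, 105,  95, 100),
  ion2 = c( NA,  NA,  NA,  80,  75,  85,  80,  78,  82),  # 75% missing in QC
  ion3 = c( 60,  61,  59,  60,  NA,  NA,  NA,  NA,  NA),  # 100% missing in samples
  ion4 = c( 50, 150, 100, 200, 120, 130, 110, 125, 115),  # unstable in QC
  ion5 = c(200, 204, 196, 200, 220, 180, 210, 190, 200)
)
colnames(x) <- c(paste0("QC", 1:4), paste0("S", 1:5))
is_qc <- grepl("^QC", colnames(x))

kept <- filter_missing(x, is_qc)  # drops ion2 and ion3
kept <- pqn_normalize(kept)
kept <- filter_rsd(kept, is_qc)   # drops ion4
print(rownames(kept))             # ion1 and ion5 remain
```

The two missing-value thresholds and the RSD cutoff mirror the `missValueRatioQC=0.5`, `missValueRatioSample=0.8`, and `cvFilter=0.3` values used in the metaX call in this section.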

Taking the analysis of the positive ion mode as an example:

```R
library(metaX)

## Create the metaX parameter object
para <- new("metaXpara")

## Input files exported from QI
pfile <- "m_pos.csv"  ## Output from QI, raw peak file with metabolite information
sfile <- "s_pos.list" ## Output from QI, sample list file
idres <- "i_pos.csv"  ## Output from QI, ion intensity file

## Output directory, file prefix, sample list and group comparison
para@outdir <- "metaX_result_pos"
para@prefix <- "pos"
para@sampleListFile <- sfile
para@ratioPairs <- "COPD:Healthy"
para <- importDataFromQI(para, file = pfile)

## PLS-DA settings
plsdaPara <- new("plsDAPara")
plsdaPara@scale <- "pareto"
plsdaPara@cpu <- 4
plsdaPara@kfold <- 3
#plsdaPara@do <- FALSE

## QC-RLSC signal correction
res <- doQCRLSC(para, cpu = 1)

## Impute missing values with KNN
missValueImputeMethod(para) <- "KNN"

## Run the pipeline (argument names are kept exactly as in the original script)
p <- metaXpipe(para, plsdaPara = plsdaPara,
               missValueRatioQC = 0.5, missValueRatioSample = 0.8,
               cvFilter = 0.3, idres = idres, qcsc = 0, scale = "pareto",
               remveOutlier = FALSE, nor.method = "pqn", t = 1,
               nor.order = 1, pclean = FALSE, doROC = FALSE)

## Save the result and record the package versions used
save(p, file = "pos.rda")
sessionInfo()
```

The processed metabolome data are uploaded as metabolome.txt.

Detailed information for each metabolite, including KEGG/HMDB/METLIN/PubChem/ChEBI IDs, SMILES structure, class, and pathway, is uploaded as compound_information.txt.

## 2. Sputum and serum proteome

A panel of 280 proteins was measured using a custom Quantibody Human Antibody Array (test procedure no. SOP-TF-QAH-001, SOP-TF-QAH-003 microarray) from RayBiotech (https://www.raybiotech.com/inflammation-protein-arrays/).

The processed sputum and serum proteome data are uploaded as sputum_proteome.txt and serum_proteome.txt.

Detailed information on the 280 proteins is uploaded as protein_information.txt.