<!-- README.md is generated from README.Rmd. Please edit that file -->

# INDEED

## Overview

This R package implements the INDEED algorithm from Zuo *et al.*'s Methods
paper, INDEED: Integrated differential expression and differential
network analysis of omic data for biomarker discovery ([PMID:
27592383](https://www.ncbi.nlm.nih.gov/pubmed/?term=27592383%5Buid%5D)).

The package returns a list of data frames containing, for each
biomolecule, information such as its p-value, node degree, and activity
score. A higher activity score indicates that the biomolecule has more
neighbors in the differential network and that their p-values are more
statistically significant. The package also generates a network display
to aid biomarker selection.
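
To make the idea of an activity score concrete, here is a minimal sketch in base R. It assumes a simple form: each node's two-sided z-score plus the z-scores of its neighbors in the differential network. That form, the function name `toy_activity_score`, and the toy inputs are illustrative assumptions rather than the package's implementation; the paper gives the exact definition.

```r
# Toy inputs (hypothetical values, not from the INDEED test dataset)
p_values <- c(A = 0.001, B = 0.20, C = 0.04, D = 0.60)
edges <- data.frame(Node1 = c("A", "A", "C"),
                    Node2 = c("B", "C", "D"),
                    stringsAsFactors = FALSE)

# Assumed form of the score: own z-score plus the z-scores of all neighbors
toy_activity_score <- function(p_values, edges, p_floor = 1e-6) {
  # Two-sided z-score per node; p-values are floored so p = 0 does not give Inf
  z <- qnorm(1 - pmax(p_values, p_floor) / 2)
  nodes <- names(p_values)
  degree <- sapply(nodes, function(n) sum(edges$Node1 == n | edges$Node2 == n))
  score <- sapply(nodes, function(n) {
    neighbors <- c(edges$Node2[edges$Node1 == n], edges$Node1[edges$Node2 == n])
    z[[n]] + sum(z[neighbors])
  })
  out <- data.frame(ID = nodes,
                    P_value = unname(p_values),
                    Node_Degree = unname(degree),
                    Activity_Score = unname(score),
                    stringsAsFactors = FALSE)
  # Highest score first, mirroring how a ranked result table is usually read
  out[order(-out$Activity_Score), ]
}

toy_scores <- toy_activity_score(p_values, edges)
print(toy_scores)
```

In this toy network, node A ranks first because it has a small p-value and two neighbors, while node D ranks last because it has a large p-value and a single neighbor.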

## Comparison with competing methods

A comparison between INDEED and two competing methods ([DNAPATH](https://cran.r-project.org/web/packages/dnapath/index.html) and [JDINAC](https://github.com/jijiadong/JDINAC)) has been performed in a [simulation study](https://github.com/Hurricaner1989/INDEED-simulation). Metrics such as precision-recall AUC, precision, recall, and run time are computed under $n < p$, $n = p$, and $n > p$ conditions. We encourage users interested in a more thorough comparison to build on our simulation study and run the comparison themselves.

## Installation

You can install INDEED from GitHub with:

``` r
# install.packages("devtools")
devtools::install_github("ressomlab/INDEED")
```

## Load package

Load the package.

``` r
# load INDEED
library(INDEED)
#> Loading required package: glasso
```

## Examples

A test dataset is provided to help users get familiar with the INDEED R
package. It contains the expression levels of 39 metabolites from 120
subjects (CIRR: 60; HCC: 60), with the CIRR group labeled as group 0 and
the HCC group labeled as group 1.

``` r
# Data matrix contains the expression levels of 39 metabolites from 120 subjects
# (6 metabolites and 10 subjects are shown)
head(Met_GU[, 1:10])
#>            X1         X2         X3        X4          X5         X6
#> 1 -1.17784288 -0.6524507  0.1130101 0.3273883 -0.81597223 0.91690985
#> 2 -0.74465547 -0.8403552  1.2275791 1.4884276  0.95811649 0.22175791
#> 3  1.02005243  1.6526556  0.4660893 1.4657142  1.15495800 0.66656520
#> 4  0.40435337  0.4216086  0.3728297 0.4413724  0.41055731 0.39239917
#> 5  1.27026847  1.5406950 -0.1213972 1.0226981 -1.41568157 0.02338627
#> 6  0.04855234  0.6102747  1.0018852 0.8012087  0.03375084 0.29277059
#>            X7          X8         X9        X10
#> 1 -0.10606357 -0.14868927 -0.7536426  1.9331369
#> 2  0.57873922 -0.04059911 -0.3448051 -0.3943420
#> 3 -0.02235966 -0.25240024  0.6314481 -0.2927764
#> 4  0.34483591  0.64974659  0.3820917  0.3832617
#> 5  2.19089662 -0.80789325  0.1743634  1.2832645
#> 6  0.20963886  0.25854132  0.8692107 -0.5259235
# Group label for each subject (40 subjects are shown)
Met_Group_GU[1:40]
#>   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21
#> 1  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   1   1
#>   X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38 X39 X40
#> 1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
# Metabolite KEGG IDs (10 metabolites are shown)
Met_name_GU[1:10]
#>  [1] "C00009" "C00022" "C00025" "C00049" "C00064" "C00065" "C00086" "C00097"
#>  [9] "C00124" "C00148"
```
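
Before running INDEED on your own data, it can help to confirm that the inputs have the same layout as the test dataset. The sketch below uses simulated stand-ins, assuming (from the printed output above) that rows are metabolites, columns are subjects, and the group labels form a single row with one 0/1 entry per subject. The names `toy_data`, `toy_group`, `toy_id`, and `check_inputs` are hypothetical and not part of the package.

```r
set.seed(1)
n_met <- 39
n_sub <- 120
subject_names <- paste0("X", seq_len(n_sub))

# Simulated stand-ins (random values, not real expression data)
toy_data <- as.data.frame(matrix(rnorm(n_met * n_sub), nrow = n_met,
                                 dimnames = list(NULL, subject_names)))
toy_group <- as.data.frame(matrix(rep(c(0, 1), each = 60), nrow = 1,
                                  dimnames = list(NULL, subject_names)))
toy_id <- sprintf("M%02d", seq_len(n_met))  # hypothetical IDs, not KEGG IDs

# Hypothetical helper: stops with an error if the inputs are inconsistent
check_inputs <- function(data, class_label, id) {
  labels <- unlist(class_label)
  stopifnot(
    ncol(data) == length(labels),  # one group label per subject
    nrow(data) == length(id),      # one ID per metabolite
    all(labels %in% c(0, 1)),      # two groups coded as 0 and 1
    !any(is.na(data))              # no missing expression values
  )
  table(group = labels)
}

group_sizes <- check_inputs(toy_data, toy_group, toy_id)
print(group_sizes)
```

The returned table gives the number of subjects per group, which is a quick way to spot mislabeled or missing subjects.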

The following example obtains the differential network using partial
correlation analysis.

``` r
# Set a seed so the results are reproducible
set.seed(100)
# Compute rho values to run graphical lasso
pre_data <- select_rho_partial(data = Met_GU, class_label = Met_Group_GU,
                               id = Met_name_GU, error_curve = TRUE)
```

<!-- -->

From the error curve figure, users can choose the rho value based on the
minimum rule (red vertical line), the one standard error rule (blue
horizontal line), or their own preferred value. INDEED provides the
option to adjust for multiple testing in edge detection (fdr = TRUE),
which generally leads to a sparser network. In this example, the network
obtained with fdr = TRUE is too sparse, so we set fdr = FALSE for
demonstration. When working on a new dataset, it is a good idea to start
with fdr = TRUE and relax it to fdr = FALSE only if the network is too
sparse.
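
The two selection rules can be illustrated on a simulated error curve. This is a minimal sketch in base R: the per-fold errors are made up, the grid of rho values is arbitrary, and the way INDEED computes its own error curve may differ. It only shows how the minimum rule and the one standard error rule pick a value once a curve is available.

```r
set.seed(100)
rho_grid <- seq(0.05, 0.50, by = 0.05)
n_folds <- 5

# Simulated per-fold errors: U-shaped in rho plus noise (not produced by INDEED)
fold_err <- sapply(rho_grid, function(r) {
  10 * (r - 0.2)^2 + 1 + rnorm(n_folds, sd = 0.1)
})
mean_err <- colMeans(fold_err)                    # mean error per rho
se_err <- apply(fold_err, 2, sd) / sqrt(n_folds)  # standard error per rho

# Minimum rule: the rho with the lowest mean error
i_min <- which.min(mean_err)
rho_min <- rho_grid[i_min]

# One standard error rule: the largest rho (sparsest network) whose mean
# error is within one standard error of the minimum
threshold <- mean_err[i_min] + se_err[i_min]
rho_1se <- max(rho_grid[mean_err <= threshold])

cat("minimum rule:", rho_min, "\n")
cat("one standard error rule:", rho_1se, "\n")
```

The one standard error rule never selects a smaller rho than the minimum rule, so it gives a network that is at least as sparse.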

``` r
# Choose optimal rho values to compute activity scores and build the differential network
result <- partial_cor(data_list = pre_data, rho_group1 = "min", rho_group2 = "min",
                      p_val = pvalue_M_GU, permutation = 1000,
                      permutation_thres = 0.05, fdr = FALSE)
```
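
To see why multiple testing correction tends to produce a sparser network, the sketch below compares the number of edges kept with and without adjustment on simulated edge p-values. It uses the Benjamini-Hochberg method from base R's `p.adjust`; whether INDEED uses this particular adjustment is an assumption here, and the p-values are simulated.

```r
set.seed(100)
# Simulated edge p-values: 20 edges with a real signal among 500 candidates
p_edge <- c(rbeta(20, 1, 50), runif(480))
alpha <- 0.05

# Edges kept without correction
n_raw <- sum(p_edge < alpha)

# Edges kept after Benjamini-Hochberg adjustment (assumed method)
p_bh <- p.adjust(p_edge, method = "BH")
n_bh <- sum(p_bh < alpha)

cat("edges without correction:", n_raw, "\n")
cat("edges with BH correction:", n_bh, "\n")
```

Adjusted p-values are never smaller than the raw ones, so the corrected network can only lose edges relative to the uncorrected one.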

Inspect the results and show the network display, which users can
interact with.

``` r
# Show result
head(result$activity_score)
#>   Node     ID P_value Node_Degree Activity_Score
#> 1   12 C00183   0.000           3            8.2
#> 2   15 C00188   0.487           3            8.1
#> 3    5 C00064   0.015           3            7.7
#> 4   18 C00247   0.889           6            6.4
#> 5    8 C00097   0.578           6            6.0
#> 6   16 C00189   0.016           5            6.0
head(result$diff_network)
#>   Node1 Node2 Binary    Weight
#> 1     1    19      1  2.290368
#> 2     2     8     -1 -2.290368
#> 3     2    31      1  2.226212
#> 4     2    33     -1 -4.635348
#> 5     3    22      1  2.033520
#> 6     4    33     -1 -2.457263
# Show network
network_display(result = result, nodesize = 'Node_Degree', nodecolor = 'Activity_Score',
                edgewidth = FALSE, layout = 'nice')
```

<!-- Network display image was generated from somewhere else -->

<!-- -->

The following example obtains the differential network using correlation
analysis. When partial correlation analysis returns a network that is
too sparse even with multiple testing correction turned off (fdr =
FALSE), it is better to try correlation analysis.

``` r
# Set a seed so the results are reproducible
set.seed(100)
# Compute activity scores and build the differential network using Pearson correlation
result <- non_partial_cor(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU,
                          method = "pearson", p_val = pvalue_M_GU, permutation = 1000,
                          permutation_thres = 0.05, fdr = FALSE)
```
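
The `permutation` and `permutation_thres` arguments suggest that edges are tested by permuting group labels. As a rough illustration of that idea (not the package's internal procedure), the sketch below tests whether the Pearson correlation between one pair of simulated biomolecules differs between two groups. The data, the helper `cor_diff`, and the p-value formula are assumptions made for this example.

```r
set.seed(100)
n <- 60  # subjects per group, as in the test dataset

# Simulated pair of biomolecules (toy data): uncorrelated in group 0,
# strongly correlated in group 1
x1 <- rnorm(n)
y1 <- 0.8 * x1 + rnorm(n, sd = 0.6)
xy <- rbind(cbind(rnorm(n), rnorm(n)), cbind(x1, y1))
label <- rep(c(0, 1), each = n)

# Difference in Pearson correlation between group 1 and group 0
cor_diff <- function(xy, label) {
  cor(xy[label == 1, 1], xy[label == 1, 2]) -
    cor(xy[label == 0, 1], xy[label == 0, 2])
}

observed <- cor_diff(xy, label)

# Null distribution: recompute the difference after shuffling group labels
n_perm <- 1000
perm <- replicate(n_perm, cor_diff(xy, sample(label)))

# Two-sided permutation p-value with the usual +1 correction
p_perm <- (sum(abs(perm) >= abs(observed)) + 1) / (n_perm + 1)
is_edge <- p_perm < 0.05  # keep the edge if it passes the threshold

cat("observed difference:", round(observed, 3), "\n")
cat("permutation p-value:", signif(p_perm, 3), "\n")
```

A pair whose p-value falls below the threshold would be kept as an edge in the differential network, and the sign of the observed difference indicates whether the connection is stronger or weaker in group 1.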

Inspect the results and show the network display, which users can
interact with. Here, edgewidth = TRUE makes the edge width reflect the
significance level of the differential connection (the z-score of the
edge), with different colors for positive and negative changes.

``` r
# Show result
head(result$activity_score)
#>   Node     ID P_value Node_Degree Activity_Score
#> 1   22 C00581   0.074           5           11.9
#> 2   15 C00188   0.487           5           10.0
#> 3   31 C02497   0.537           5            8.6
#> 4   21 C00383   0.001           5            8.5
#> 5   13 C00186   0.388           5            7.8
#> 6   16 C00189   0.016           3            7.2
head(result$diff_network)
#>   Node1 Node2 Binary    Weight
#> 1     1    19      1  2.512144
#> 2     4    18      1  2.326348
#> 3     5    13     -1 -2.197286
#> 4     5    22     -1 -2.033520
#> 5     5    34      1  4.635348
#> 6     6    31     -1 -2.257129
# Show network
network_display(result = result, nodesize = 'Node_Degree', nodecolor = 'Activity_Score',
                edgewidth = TRUE, layout = 'nice')
```

<!-- Network display image was generated from somewhere else -->

<!-- -->