Diff of /README.md [000000] .. [3bfed4]

<!-- README.md is generated from README.Rmd. Please edit that file -->

# INDEED

## Overview

This R package implements the INDEED algorithm from Zuo *et al.*’s Methods
paper, "INDEED: Integrated differential expression and differential
network analysis of omic data for biomarker discovery"
([PMID: 27592383](https://www.ncbi.nlm.nih.gov/pubmed/?term=27592383%5Buid%5D)).

The package generates a list of data frames containing information such
as the p-value, node degree, and activity score for each biomolecule. A
higher activity score indicates that the corresponding biomolecule has
more neighbors connected in the differential network and that their
p-values are more statistically significant. The package also generates
a network display to aid users’ biomarker selection.

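To make the idea of an activity score concrete, here is a minimal sketch on made-up data. It assumes one plausible scoring scheme (a node's own absolute z-score plus those of its direct neighbors); this is an illustration of the description above, not necessarily the exact formula the package uses, and `toy_activity_score` is a hypothetical helper.

``` r
# Toy inputs (made up for illustration; not produced by INDEED)
p_values <- c(A = 0.001, B = 0.20, C = 0.04, D = 0.60)
edges <- data.frame(
  Node1 = c("A", "A", "C"),
  Node2 = c("B", "C", "D"),
  stringsAsFactors = FALSE
)

# Assumption: turn two-sided p-values into absolute z-scores
z_abs <- qnorm(1 - p_values / 2)

# Assumption: score = the node's own |z| plus the |z| of its direct neighbors
toy_activity_score <- function(node) {
  neighbors <- unique(c(edges$Node2[edges$Node1 == node],
                        edges$Node1[edges$Node2 == node]))
  sum(z_abs[c(node, neighbors)])
}

scores <- sapply(names(p_values), toy_activity_score)
round(sort(scores, decreasing = TRUE), 2)
```

Under these assumptions, node A ranks first because it combines a small p-value with two neighbors, while D, with one neighbor and a large p-value, ranks last.
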

## Comparison with competing methods

A comparison between INDEED and two competing methods ([DNAPATH](https://cran.r-project.org/web/packages/dnapath/index.html) and [JDINAC](https://github.com/jijiadong/JDINAC)) has been performed in a [simulation study](https://github.com/Hurricaner1989/INDEED-simulation). Metrics such as precision-recall AUC, precision, recall, and run time are computed under $n < p$, $n = p$, and $n > p$ conditions. We encourage users who are interested in a more thorough comparison to build on our simulation study and run the comparison themselves.

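As a reminder of what precision and recall mean for edge detection, the sketch below scores a set of predicted edges against a known set of true edges. Both edge sets are hypothetical and written as simple "node1-node2" strings; the simulation study linked above may compute these metrics differently.

``` r
# Hypothetical edge sets, written as "node1-node2" strings
true_edges      <- c("1-2", "1-3", "2-4", "3-5")
predicted_edges <- c("1-2", "2-4", "2-5")

# Edges that are both predicted and true
true_positives <- length(intersect(predicted_edges, true_edges))

precision <- true_positives / length(predicted_edges)  # share of predicted edges that are true
recall    <- true_positives / length(true_edges)       # share of true edges that were found

c(precision = precision, recall = recall)
```
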

## Installation

You can install INDEED from GitHub with:

``` r
# install.packages("devtools")
devtools::install_github("ressomlab/INDEED")
```

## Load package

Load the package.

``` r
# load INDEED
library(INDEED)
#> Loading required package: glasso
```

## Examples

A test dataset is provided to help users get familiar with the INDEED R
package. It contains the expression levels of 39 metabolites from 120
subjects (CIRR: 60; HCC: 60), with the CIRR group labeled as group 0 and
the HCC group labeled as group 1.

``` r
# Data matrix contains the expression levels of 39 metabolites from 120 subjects
# (6 metabolites and 10 subjects are shown)
head(Met_GU[, 1:10])
#>            X1         X2         X3        X4          X5         X6
#> 1 -1.17784288 -0.6524507  0.1130101 0.3273883 -0.81597223 0.91690985
#> 2 -0.74465547 -0.8403552  1.2275791 1.4884276  0.95811649 0.22175791
#> 3  1.02005243  1.6526556  0.4660893 1.4657142  1.15495800 0.66656520
#> 4  0.40435337  0.4216086  0.3728297 0.4413724  0.41055731 0.39239917
#> 5  1.27026847  1.5406950 -0.1213972 1.0226981 -1.41568157 0.02338627
#> 6  0.04855234  0.6102747  1.0018852 0.8012087  0.03375084 0.29277059
#>            X7          X8         X9        X10
#> 1 -0.10606357 -0.14868927 -0.7536426  1.9331369
#> 2  0.57873922 -0.04059911 -0.3448051 -0.3943420
#> 3 -0.02235966 -0.25240024  0.6314481 -0.2927764
#> 4  0.34483591  0.64974659  0.3820917  0.3832617
#> 5  2.19089662 -0.80789325  0.1743634  1.2832645
#> 6  0.20963886  0.25854132  0.8692107 -0.5259235
# Group label for each subject (40 subjects are shown)
Met_Group_GU[1:40]
#>   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21
#> 1  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   1   1
#>   X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38 X39 X40
#> 1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
# Metabolite KEGG IDs (10 metabolites are shown)
Met_name_GU[1:10]
#>  [1] "C00009" "C00022" "C00025" "C00049" "C00064" "C00065" "C00086" "C00097"
#>  [9] "C00124" "C00148"
```

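The sketch below shows how data in this layout can be split by group before any per-group analysis. It uses small simulated stand-ins (`toy_data`, `toy_group`) rather than the package data, and assumes, as the printed output suggests, that rows are metabolites and columns are subjects.

``` r
set.seed(1)

# Toy stand-ins for the data matrix and group labels (4 metabolites x 6 subjects)
toy_data <- matrix(rnorm(24), nrow = 4,
                   dimnames = list(paste0("M", 1:4), paste0("X", 1:6)))
toy_group <- c(0, 0, 0, 1, 1, 1)

# Select the subject columns that belong to each group
group0 <- toy_data[, toy_group == 0, drop = FALSE]
group1 <- toy_data[, toy_group == 1, drop = FALSE]

dim(group0)
dim(group1)
```
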

The following example obtains the differential network using partial
correlation analysis.

``` r
# Set the seed for reproducibility
set.seed(100)
# Compute rho values to run graphical lasso
pre_data <- select_rho_partial(
  data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU,
  error_curve = TRUE
)
```

![](figure/rho-selection-1.png)<!-- -->

From the error curve figure, users can choose the rho value based on the
minimum rule (red vertical line), the one standard error rule (blue
horizontal line), or their own preferred value. INDEED gives users the
option to correct for multiple testing in edge detection (fdr = TRUE),
which generally leads to a sparser network. In this example, the network
is too sparse with that correction, so we set fdr = FALSE for
demonstration. When working on a new dataset, it is a good idea to start
with fdr = TRUE and relax it to fdr = FALSE only if the network is too
sparse.

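The two selection rules can be sketched on a hypothetical error curve. The numbers below are invented, and the one standard error rule is implemented under the common convention of taking the largest rho (the sparsest model) whose error lies within one standard error of the minimum; `select_rho_partial` may implement the rules differently.

``` r
# Hypothetical cross-validation results (not produced by INDEED)
rho      <- c(0.05, 0.10, 0.20, 0.30, 0.40, 0.50)
cv_error <- c(12.0, 10.5, 10.0, 10.3, 11.2, 12.5)
cv_se    <- c(0.6, 0.5, 0.5, 0.5, 0.6, 0.7)

# Minimum rule: the rho with the lowest cross-validation error
i_min   <- which.min(cv_error)
rho_min <- rho[i_min]

# One standard error rule (assumed convention): the largest rho whose error
# stays within one standard error of the minimum error
threshold <- cv_error[i_min] + cv_se[i_min]
rho_1se   <- max(rho[cv_error <= threshold])

c(min = rho_min, one_se = rho_1se)
```

In this toy curve the one standard error rule picks a larger rho than the minimum rule, which corresponds to a sparser network.
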

``` r
# Choose optimal rho values to compute activity scores and build the differential network
result <- partial_cor(
  data_list = pre_data, rho_group1 = "min", rho_group2 = "min",
  p_val = pvalue_M_GU, permutation = 1000, permutation_thres = 0.05,
  fdr = FALSE
)
```

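To see why a multiple testing correction tends to produce a sparser network, the sketch below applies a Benjamini-Hochberg adjustment to a few made-up edge p-values with base R's `p.adjust`. The choice of Benjamini-Hochberg is an assumption for illustration; the source does not state which correction the fdr option applies.

``` r
# Hypothetical raw p-values for seven candidate edges
edge_p <- c(0.001, 0.004, 0.012, 0.030, 0.041, 0.200, 0.650)

# Benjamini-Hochberg adjustment (one common FDR method; an assumption here)
edge_p_adj <- p.adjust(edge_p, method = "BH")

n_raw <- sum(edge_p < 0.05)      # edges kept without correction
n_adj <- sum(edge_p_adj < 0.05)  # edges kept with correction

c(without_correction = n_raw, with_correction = n_adj)
```

Fewer edges pass the 0.05 threshold after adjustment, which matches the advice to relax the correction only when the resulting network is too sparse to be useful.
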

Show the results and the network display, which users can interact with.

``` r
# Show result
head(result$activity_score)
#>   Node     ID P_value Node_Degree Activity_Score
#> 1   12 C00183   0.000           3            8.2
#> 2   15 C00188   0.487           3            8.1
#> 3    5 C00064   0.015           3            7.7
#> 4   18 C00247   0.889           6            6.4
#> 5    8 C00097   0.578           6            6.0
#> 6   16 C00189   0.016           5            6.0
head(result$diff_network)
#>   Node1 Node2 Binary    Weight
#> 1     1    19      1  2.290368
#> 2     2     8     -1 -2.290368
#> 3     2    31      1  2.226212
#> 4     2    33     -1 -4.635348
#> 5     3    22      1  2.033520
#> 6     4    33     -1 -2.457263
# Show network
network_display(result = result, nodesize = 'Node_Degree', nodecolor = 'Activity_Score', edgewidth = FALSE, layout = 'nice')
```

<!-- Network display image was generated from somewhere else -->

![](figure/network_display_partial.png)<!-- -->

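The edge list in `diff_network` can be summarized with base R. The sketch below re-enters only the six rows printed above, so the counts cover that excerpt rather than the full network; reading `Binary` as the direction of change is an interpretation of the output, not a documented definition.

``` r
# The six diff_network rows printed above, re-entered by hand
diff_network <- data.frame(
  Node1  = c(1, 2, 2, 2, 3, 4),
  Node2  = c(19, 8, 31, 33, 22, 33),
  Binary = c(1, -1, 1, -1, 1, -1)
)

# Degree within this excerpt = number of edges touching each node
degree <- table(c(diff_network$Node1, diff_network$Node2))
degree["2"]   # node 2 appears in three of the listed edges
degree["33"]  # node 33 appears in two of the listed edges

# Count edges by sign of the Binary column
direction <- table(diff_network$Binary)
direction
```
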

The following example obtains the differential network using correlation
analysis. When partial correlation analysis returns a network that is
too sparse even with the multiple testing correction turned off (fdr =
FALSE), it is better to try correlation analysis.

``` r
# Set the seed for reproducibility
set.seed(100)
# Compute activity scores and build the differential network using Pearson correlation
result <- non_partial_cor(
  data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU,
  method = "pearson", p_val = pvalue_M_GU, permutation = 1000,
  permutation_thres = 0.05, fdr = FALSE
)
```

Show the results and the network display, which users can interact with.
Here, edgewidth = TRUE sets the edge width according to the significance
level of the differential connection (the z-score of the edge), with
different colors for positive and negative changes.

``` r
# Show result
head(result$activity_score)
#>   Node     ID P_value Node_Degree Activity_Score
#> 1   22 C00581   0.074           5           11.9
#> 2   15 C00188   0.487           5           10.0
#> 3   31 C02497   0.537           5            8.6
#> 4   21 C00383   0.001           5            8.5
#> 5   13 C00186   0.388           5            7.8
#> 6   16 C00189   0.016           3            7.2
head(result$diff_network)
#>   Node1 Node2 Binary    Weight
#> 1     1    19      1  2.512144
#> 2     4    18      1  2.326348
#> 3     5    13     -1 -2.197286
#> 4     5    22     -1 -2.033520
#> 5     5    34      1  4.635348
#> 6     6    31     -1 -2.257129
# Show network
network_display(result = result, nodesize = 'Node_Degree', nodecolor = 'Activity_Score', edgewidth = TRUE, layout = 'nice')
```

<!-- Network display image was generated from somewhere else -->

![](figure/network_display_correlation.png)<!-- -->

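
The permutation and permutation_thres arguments used in the examples point to a permutation test on differential connections. As a rough illustration of that idea, the sketch below tests whether the Pearson correlation between two simulated variables differs between two groups by shuffling the group labels. The data and the `cor_diff` helper are hypothetical, and this is a generic permutation test, not the package's internal procedure.

``` r
set.seed(100)

# Simulated data (not from INDEED): x and y are correlated in group 1 only
n <- 40
group <- rep(c(0, 1), each = n)
x <- rnorm(2 * n)
y <- ifelse(group == 1, 0.8 * x, 0) + rnorm(2 * n, sd = 0.5)

# Difference in Pearson correlation between the two groups
cor_diff <- function(x, y, g) {
  cor(x[g == 1], y[g == 1]) - cor(x[g == 0], y[g == 0])
}

observed <- cor_diff(x, y, group)

# Null distribution: shuffle the group labels and recompute the difference
perm <- replicate(1000, cor_diff(x, y, sample(group)))

# Two-sided permutation p-value, compared against a 0.05 threshold
p_perm <- mean(abs(perm) >= abs(observed))
p_perm < 0.05
```
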