|
a |
|
b/README.md |
|
|
1 |
# Hierarchical Shrinkage Stan Models for Biomarker Selection |
|
|
2 |
|
|
|
3 |
[](https://cran.r-project.org/package=hsstan) |
|
|
4 |
[](https://cran.r-project.org/package=hsstan) |
|
|
5 |
|
|
|
6 |
The **hsstan** package provides linear and logistic regression models penalized |
|
|
7 |
with hierarchical shrinkage priors for selection of biomarkers. Models are |
|
|
8 |
fitted with [Stan](https://mc-stan.org), which allows to perform full Bayesian |
|
|
9 |
inference ([Carpenter et al. (2017)](https://doi.org/10.18637/jss.v076.i01)). |
|
|
10 |
|
|
|
11 |
It implements the horseshoe and regularized horseshoe priors ([Piironen and |
|
|
12 |
Vehtari (2017)](https://doi.org/10.1214/17-EJS1337SI)), and the projection |
|
|
13 |
predictive selection approach to recover a sparse set of predictive biomarkers |
|
|
14 |
([Piironen, Paasiniemi and Vehtari (2020)](https://doi.org/10.1214/20-EJS1711)). |
|
|
15 |
|
|
|
16 |
The approach is particularly suited to selection from high-dimensional panels |
|
|
17 |
of biomarkers, such as those that can be measured by MSMS or similar technologies. |
|
|
18 |
|
|
|
19 |
### Example |
|
|
20 |
|
|
|
21 |
```r |
|
|
22 |
library(hsstan) |
|
|
23 |
data(diabetes) |
|
|
24 |
|
|
|
25 |
## if possible, allow using as many cores as cross-validation folds |
|
|
26 |
options(mc.cores=10) |
|
|
27 |
|
|
|
28 |
## baseline model with only clinical covariates |
|
|
29 |
hs.base <- hsstan(diabetes, Y ~ age + sex) |
|
|
30 |
|
|
|
31 |
## model with additional predictors |
|
|
32 |
hs.biom <- hsstan(diabetes, Y ~ age + sex, penalized=colnames(diabetes)[3:10]) |
|
|
33 |
print(hs.biom) |
|
|
34 |
# mean sd 2.5% 97.5% n_eff Rhat |
|
|
35 |
# (Intercept) 0.00 0.03 -0.07 0.07 4483 1 |
|
|
36 |
# age 0.00 0.04 -0.07 0.08 4706 1 |
|
|
37 |
# sex -0.15 0.04 -0.22 -0.08 5148 1 |
|
|
38 |
# bmi 0.33 0.04 0.25 0.41 4228 1 |
|
|
39 |
# map 0.20 0.04 0.12 0.28 3571 1 |
|
|
40 |
# tc -0.45 0.25 -0.94 0.04 3713 1 |
|
|
41 |
# ldl 0.28 0.20 -0.12 0.68 3674 1 |
|
|
42 |
# hdl 0.01 0.12 -0.23 0.25 3761 1 |
|
|
43 |
# tch 0.07 0.08 -0.06 0.25 4358 1 |
|
|
44 |
# ltg 0.43 0.11 0.22 0.64 3690 1 |
|
|
45 |
# glu 0.02 0.03 -0.03 0.10 3034 1 |
|
|
46 |
|
|
|
47 |
## behaviour of the sampler |
|
|
48 |
sampler.stats(hs.base) |
|
|
49 |
# accept.stat stepsize divergences treedepth gradients warmup sample |
|
|
50 |
# chain:1 0.9497 0.5723 0 3 6320 0.09 0.08 |
|
|
51 |
# chain:2 0.9357 0.6480 0 3 5938 0.09 0.08 |
|
|
52 |
# chain:3 0.9455 0.6014 0 3 6112 0.09 0.08 |
|
|
53 |
# chain:4 0.9488 0.5932 0 3 6238 0.09 0.08 |
|
|
54 |
# all 0.9449 0.6037 0 3 24608 0.36 0.32 |
|
|
55 |
|
|
|
56 |
sampler.stats(hs.biom) |
|
|
57 |
# accept.stat stepsize divergences treedepth gradients warmup sample |
|
|
58 |
# chain:1 0.9821 0.0191 0 8 233656 5.04 4.28 |
|
|
59 |
# chain:2 0.9891 0.0158 1 8 255994 5.88 4.72 |
|
|
60 |
# chain:3 0.9908 0.0143 0 9 274328 5.77 5.14 |
|
|
61 |
# chain:4 0.9933 0.0121 0 9 344984 5.98 6.70 |
|
|
62 |
# all 0.9888 0.0153 1 9 1108962 22.67 20.84 |
|
|
63 |
|
|
|
64 |
## approximate leave-one-out cross-validation with Pareto smoothed |
|
|
65 |
## importance sampling |
|
|
66 |
loo(hs.base) |
|
|
67 |
# Computed from 4000 by 442 log-likelihood matrix |
|
|
68 |
# Estimate SE |
|
|
69 |
# elpd_loo -622.4 11.4 |
|
|
70 |
# p_loo 3.4 0.2 |
|
|
71 |
# looic 1244.9 22.7 |
|
|
72 |
# ------ |
|
|
73 |
# Monte Carlo SE of elpd_loo is 0.0. |
|
|
74 |
# |
|
|
75 |
# All Pareto k estimates are good (k < 0.5). |
|
|
76 |
|
|
|
77 |
loo(hs.biom) |
|
|
78 |
# Computed from 4000 by 442 log-likelihood matrix |
|
|
79 |
# Estimate SE |
|
|
80 |
# elpd_loo -476.5 13.7 |
|
|
81 |
# p_loo 9.8 0.7 |
|
|
82 |
# looic 953.0 27.5 |
|
|
83 |
# ------ |
|
|
84 |
# Monte Carlo SE of elpd_loo is 0.1. |
|
|
85 |
# |
|
|
86 |
# All Pareto k estimates are good (k < 0.5). |
|
|
87 |
|
|
|
88 |
## run 10-folds cross-validation |
|
|
89 |
set.seed(1) |
|
|
90 |
folds <- caret::createFolds(diabetes$Y, k=10, list=FALSE) |
|
|
91 |
cv.base <- kfold(hs.base, folds=folds) |
|
|
92 |
cv.biom <- kfold(hs.biom, folds=folds) |
|
|
93 |
|
|
|
94 |
## cross-validated performance |
|
|
95 |
round(posterior_performance(cv.base), 2) |
|
|
96 |
# mean sd 2.5% 97.5% |
|
|
97 |
# r2 0.02 0.00 0.01 0.03 |
|
|
98 |
# llk -623.14 1.67 -626.61 -620.13 |
|
|
99 |
# attr(,"type") |
|
|
100 |
# [1] "cross-validated" |
|
|
101 |
|
|
|
102 |
round(posterior_performance(cv.biom), 2) |
|
|
103 |
# mean sd 2.5% 97.5% |
|
|
104 |
# r2 0.48 0.01 0.47 0.50 |
|
|
105 |
# llk -482.86 3.76 -490.45 -476.56 |
|
|
106 |
# attr(,"type") |
|
|
107 |
# [1] "cross-validated" |
|
|
108 |
|
|
|
109 |
## projection predictive selection |
|
|
110 |
sel.biom <- projsel(hs.biom) |
|
|
111 |
print(sel.biom, digits=4) |
|
|
112 |
# var kl rel.kl.null rel.kl elpd delta.elpd |
|
|
113 |
# 1 Intercept only 0.352283 0.00000 NA -627.3 -155.84260 |
|
|
114 |
# 2 Initial submodel 0.333156 0.05429 0.0000 -619.8 -148.39729 |
|
|
115 |
# 3 bmi 0.138629 0.60648 0.5839 -533.1 -61.69199 |
|
|
116 |
# 4 ltg 0.058441 0.83411 0.8246 -492.5 -21.09681 |
|
|
117 |
# 5 map 0.035970 0.89789 0.8920 -482.7 -11.25515 |
|
|
118 |
# 6 hdl 0.010304 0.97075 0.9691 -473.9 -2.41192 |
|
|
119 |
# 7 tc 0.005292 0.98498 0.9841 -472.2 -0.72490 |
|
|
120 |
# 8 ldl 0.002444 0.99306 0.9927 -471.8 -0.38292 |
|
|
121 |
# 9 tch 0.001105 0.99686 0.9967 -471.5 -0.07819 |
|
|
122 |
# 10 glu 0.000000 1.00000 1.0000 -471.4 0.00000 |
|
|
123 |
``` |
|
|
124 |
|
|
|
125 |
### References |
|
|
126 |
|
|
|
127 |
* [M. Colombo][mcol], A. Asadi Shehni, I. Thoma et al., |
|
|
128 |
Quantitative levels of serum N-glycans in type 1 diabetes and their |
|
|
129 |
association with kidney disease, |
|
|
130 |
[_Glycobiology_ (2021) 31 (5): 613-623](https://doi.org/10.1093/glycob/cwaa106). |
|
|
131 |
|
|
|
132 |
* [M. Colombo][mcol], S.J. McGurnaghan, L.A.K. Blackbourn et al., |
|
|
133 |
Comparison of serum and urinary biomarker panels with albumin creatinin |
|
|
134 |
ratio in the prediction of renal function decline in type 1 diabetes, |
|
|
135 |
[_Diabetologia_ (2020) 63 (4): 788-798](https://doi.org/10.1007/s00125-019-05081-8). |
|
|
136 |
|
|
|
137 |
* [M. Colombo][mcol], E. Valo, S.J. McGurnaghan et al., |
|
|
138 |
Biomarkers associated with progression of renal disease in type 1 diabetes, |
|
|
139 |
[_Diabetologia_ (2019) 62 (9): 1616-1627](https://doi.org/10.1007/s00125-019-4915-0). |
|
|
140 |
|
|
|
141 |
[mcol]: https://github.com/mcol |