a b/README.md
1
# Hierarchical Shrinkage Stan Models for Biomarker Selection
2
3
[![CRAN\_Status\_Badge](https://www.r-pkg.org/badges/version/hsstan)](https://cran.r-project.org/package=hsstan)
4
[![CRAN\_Downloads\_Badge](https://cranlogs.r-pkg.org/badges/hsstan)](https://cran.r-project.org/package=hsstan)
5
6
The **hsstan** package provides linear and logistic regression models penalized
7
with hierarchical shrinkage priors for selection of biomarkers. Models are
8
fitted with [Stan](https://mc-stan.org), which allows to perform full Bayesian
9
inference ([Carpenter et al. (2017)](https://doi.org/10.18637/jss.v076.i01)).
10
11
It implements the horseshoe and regularized horseshoe priors ([Piironen and
12
Vehtari (2017)](https://doi.org/10.1214/17-EJS1337SI)), and the projection
13
predictive selection approach to recover a sparse set of predictive biomarkers
14
([Piironen, Paasiniemi and Vehtari (2020)](https://doi.org/10.1214/20-EJS1711)).
15
16
The approach is particularly suited to selection from high-dimensional panels
17
of biomarkers, such as those that can be measured by MSMS or similar technologies.
18
19
### Example
20
21
```r
22
library(hsstan)
23
data(diabetes)
24
25
## if possible, allow using as many cores as cross-validation folds
26
options(mc.cores=10)
27
28
## baseline model with only clinical covariates
29
hs.base <- hsstan(diabetes, Y ~ age + sex)
30
31
## model with additional predictors
32
hs.biom <- hsstan(diabetes, Y ~ age + sex, penalized=colnames(diabetes)[3:10])
33
print(hs.biom)
34
#              mean   sd  2.5% 97.5% n_eff Rhat
35
# (Intercept)  0.00 0.03 -0.07  0.07  4483    1
36
# age          0.00 0.04 -0.07  0.08  4706    1
37
# sex         -0.15 0.04 -0.22 -0.08  5148    1
38
# bmi          0.33 0.04  0.25  0.41  4228    1
39
# map          0.20 0.04  0.12  0.28  3571    1
40
# tc          -0.45 0.25 -0.94  0.04  3713    1
41
# ldl          0.28 0.20 -0.12  0.68  3674    1
42
# hdl          0.01 0.12 -0.23  0.25  3761    1
43
# tch          0.07 0.08 -0.06  0.25  4358    1
44
# ltg          0.43 0.11  0.22  0.64  3690    1
45
# glu          0.02 0.03 -0.03  0.10  3034    1
46
47
## behaviour of the sampler
48
sampler.stats(hs.base)
49
#         accept.stat stepsize divergences treedepth gradients warmup sample
50
# chain:1      0.9497   0.5723           0         3      6320   0.09   0.08
51
# chain:2      0.9357   0.6480           0         3      5938   0.09   0.08
52
# chain:3      0.9455   0.6014           0         3      6112   0.09   0.08
53
# chain:4      0.9488   0.5932           0         3      6238   0.09   0.08
54
# all          0.9449   0.6037           0         3     24608   0.36   0.32
55
56
sampler.stats(hs.biom)
57
#         accept.stat stepsize divergences treedepth gradients warmup sample
58
# chain:1      0.9821   0.0191           0         8    233656   5.04   4.28
59
# chain:2      0.9891   0.0158           1         8    255994   5.88   4.72
60
# chain:3      0.9908   0.0143           0         9    274328   5.77   5.14
61
# chain:4      0.9933   0.0121           0         9    344984   5.98   6.70
62
# all          0.9888   0.0153           1         9   1108962  22.67  20.84
63
64
## approximate leave-one-out cross-validation with Pareto smoothed
65
## importance sampling
66
loo(hs.base)
67
# Computed from 4000 by 442 log-likelihood matrix
68
#          Estimate   SE
69
# elpd_loo   -622.4 11.4
70
# p_loo         3.4  0.2
71
# looic      1244.9 22.7
72
# ------
73
# Monte Carlo SE of elpd_loo is 0.0.
74
#
75
# All Pareto k estimates are good (k < 0.5).
76
77
loo(hs.biom)
78
# Computed from 4000 by 442 log-likelihood matrix
79
#          Estimate   SE
80
# elpd_loo   -476.5 13.7
81
# p_loo         9.8  0.7
82
# looic       953.0 27.5
83
# ------
84
# Monte Carlo SE of elpd_loo is 0.1.
85
#
86
# All Pareto k estimates are good (k < 0.5).
87
88
## run 10-folds cross-validation
89
set.seed(1)
90
folds <- caret::createFolds(diabetes$Y, k=10, list=FALSE)
91
cv.base <- kfold(hs.base, folds=folds)
92
cv.biom <- kfold(hs.biom, folds=folds)
93
94
## cross-validated performance
95
round(posterior_performance(cv.base), 2)
96
#        mean   sd    2.5%   97.5%
97
# r2     0.02 0.00    0.01    0.03
98
# llk -623.14 1.67 -626.61 -620.13
99
# attr(,"type")
100
# [1] "cross-validated"
101
102
round(posterior_performance(cv.biom), 2)
103
#        mean   sd    2.5%   97.5%
104
# r2     0.48 0.01    0.47    0.50
105
# llk -482.86 3.76 -490.45 -476.56
106
# attr(,"type")
107
# [1] "cross-validated"
108
109
## projection predictive selection
110
sel.biom <- projsel(hs.biom)
111
print(sel.biom, digits=4)
112
#                 var       kl rel.kl.null rel.kl   elpd delta.elpd
113
# 1    Intercept only 0.352283     0.00000     NA -627.3 -155.84260
114
# 2  Initial submodel 0.333156     0.05429 0.0000 -619.8 -148.39729
115
# 3               bmi 0.138629     0.60648 0.5839 -533.1  -61.69199
116
# 4               ltg 0.058441     0.83411 0.8246 -492.5  -21.09681
117
# 5               map 0.035970     0.89789 0.8920 -482.7  -11.25515
118
# 6               hdl 0.010304     0.97075 0.9691 -473.9   -2.41192
119
# 7                tc 0.005292     0.98498 0.9841 -472.2   -0.72490
120
# 8               ldl 0.002444     0.99306 0.9927 -471.8   -0.38292
121
# 9               tch 0.001105     0.99686 0.9967 -471.5   -0.07819
122
# 10              glu 0.000000     1.00000 1.0000 -471.4    0.00000
123
```
124
125
### References
126
127
* [M. Colombo][mcol], A. Asadi Shehni, I. Thoma et al.,
128
  Quantitative levels of serum N-glycans in type 1 diabetes and their
129
  association with kidney disease,
130
  [_Glycobiology_ (2021) 31 (5): 613-623](https://doi.org/10.1093/glycob/cwaa106).
131
132
* [M. Colombo][mcol], S.J. McGurnaghan, L.A.K. Blackbourn et al.,
133
  Comparison of serum and urinary biomarker panels with albumin creatinin
134
  ratio in the prediction of renal function decline in type 1 diabetes,
135
  [_Diabetologia_ (2020) 63 (4): 788-798](https://doi.org/10.1007/s00125-019-05081-8).
136
137
* [M. Colombo][mcol], E. Valo, S.J. McGurnaghan et al.,
138
  Biomarkers associated with progression of renal disease in type 1 diabetes,
139
  [_Diabetologia_ (2019) 62 (9): 1616-1627](https://doi.org/10.1007/s00125-019-4915-0).
140
141
[mcol]: https://github.com/mcol