Diff of /manual.txt [000000] .. [81de4e]

DESCRIPTION
Package: voomDDA
Title: Voom Based Diagonal Discriminant Analysis for RNA-Seq Data Classification
Version: 1.0.0
Author: Gokmen Zararsiz, Dincer Goksuluk, Selcuk Korkmaz
Description: Some functions for sample classification in RNA-Seq data
Maintainer: Gokmen Zararsiz <gokmenzararsiz@erciyes.edu.tr>
Depends: R (>= 3.1.0), pamr, limma
Suggests: MLSeq
License: GPL-2
...

####

weighted.stats{voomDDA}

Calculation of Weighted Statistics

Description
This function calculates several weighted statistics that are required by the voomDDA and voomNSC classifiers.

Usage
weighted.stats(x = x, w = w, conditions = conditions)

Arguments
x: a data matrix with p rows and n columns whose weighted statistics are to be computed. Rows correspond to genes and columns to samples.
w: a weight matrix with p rows and n columns.
conditions: a numeric or factor vector giving the outcome (experimental condition) of each sample.

Details
The voom function in the limma package takes RNA-Seq read counts as input, applies the voom transformation, and produces both expression values and weights as p x n matrices. weighted.stats calculates a number of statistics that are required by the other functions in this package, which classify RNA-Seq data using voom precision weights.
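
As a hedged illustration of the inputs described above, the sketch below simulates a small count matrix (made-up data, not from the package), runs limma's voom() on it, and extracts the log-cpm values (E) and precision weights (weights). Both come back as p x n matrices, which is the shape weighted.stats expects for x and w. The sketch assumes limma is installed and skips the voom step otherwise.

```r
# Simulated read counts: p = 200 genes (rows), n = 10 samples (columns).
# The data are hypothetical and serve only to show the shapes involved.
set.seed(1)
p <- 200
n <- 10
counts <- matrix(rpois(p * n, lambda = 50), nrow = p, ncol = n,
                 dimnames = list(paste0("gene", 1:p), paste0("sample", 1:n)))
conditions <- factor(rep(c("A", "B"), each = n / 2))

if (requireNamespace("limma", quietly = TRUE)) {
  # voom() returns an EList: E holds log-cpm values, weights the precision weights
  v <- limma::voom(counts, design = model.matrix(~ conditions))
  x <- v$E
  w <- v$weights
  # Both matrices are p x n, matching the x and w arguments of weighted.stats
  print(dim(x))
  print(dim(w))
}
```

The resulting x, w and conditions could then be passed to weighted.stats(x, w, conditions).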

Value
n: number of samples
p: number of genes
nclass: number of classes
se.scale: scale factors for the within-class standard errors, defined as sqrt(1/n.class - 1/n)
weightedMean: overall weighted mean of each gene
weightedMean.C: weighted mean of each gene in each class
WeightedSD.C: weighted standard deviation of each gene in each class
weightedSD.pooled: pooled weighted standard deviation of each gene
delta: a matrix containing the relative differences in gene expression (also called t-scores) for each class
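
The quantities listed above can be sketched in base R. This is only an illustration under assumed textbook-style formulas (weighted mean as sum(w * x) / sum(w), and a pooled variance normalized by the mean weight and n - nclass); the estimators actually used by weighted.stats may differ. The helper name weighted_stats_sketch is hypothetical.

```r
# Hypothetical helper: NOT the package's weighted.stats(), only a sketch of
# the listed quantities under assumed formulas.
weighted_stats_sketch <- function(x, w, conditions) {
  conditions <- as.factor(conditions)
  n <- ncol(x)                      # number of samples
  p <- nrow(x)                      # number of genes
  classes <- levels(conditions)
  nclass <- length(classes)
  n.class <- as.vector(table(conditions))

  # Scale factors for the within-class standard errors, as defined above
  se.scale <- sqrt(1 / n.class - 1 / n)

  # Overall and class-wise weighted means: sum(w * x) / sum(w) per gene
  weightedMean <- rowSums(w * x) / rowSums(w)
  weightedMean.C <- sapply(classes, function(k) {
    idx <- conditions == k
    rowSums(w[, idx, drop = FALSE] * x[, idx, drop = FALSE]) /
      rowSums(w[, idx, drop = FALSE])
  })

  # Assumed pooled weighted variance: weighted squared deviations from the
  # class means, normalized by the mean weight and n - nclass
  dev2 <- (x - weightedMean.C[, as.integer(conditions), drop = FALSE])^2
  weightedSD.pooled <- sqrt(rowSums(w * dev2) / (rowMeans(w) * (n - nclass)))

  # Relative differences of the class means from the overall mean
  delta <- sweep(weightedMean.C - weightedMean, 2, se.scale, "/") /
    weightedSD.pooled

  list(n = n, p = p, nclass = nclass, se.scale = se.scale,
       weightedMean = weightedMean, weightedMean.C = weightedMean.C,
       weightedSD.pooled = weightedSD.pooled, delta = delta)
}

# Iris measurements as a 4 x 150 "genes x samples" matrix with unit weights
x <- t(as.matrix(iris[, 1:4]))
w <- matrix(1, nrow = nrow(x), ncol = ncol(x))
res <- weighted_stats_sketch(x, w, iris$Species)
```

With unit weights the weighted means reduce to ordinary row means, which gives a quick sanity check of the sketch.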

Authors
Gokmen Zararsiz (gokmenzararsiz@erciyes.edu.tr), Dincer Goksuluk (dincer.goksuluk@hacettepe.edu.tr), Selcuk Korkmaz (selcuk.korkmaz@hacettepe.edu.tr)

References
Law CW, Chen Y, Shi W, et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, doi:10.1186/gb-2014-15-2-r29
Tibshirani R, Hastie T, Narasimhan B, et al. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS 99: 6567-72.

Examples
# weighted statistics will be calculated for Fisher's iris data
x = as.matrix(iris[,1:4])

# generate some data for the weight matrix
# (abs() keeps the simulated weights positive, as precision weights are)
w = matrix(abs(rnorm(150*4)) + 1, ncol=4)

# iris data outcome
conditions = iris[,5]

# calculate weighted statistics
# (both matrices are transposed so that rows are variables and columns are samples)
weighted.stats(t(x), t(w), conditions)

####

voomDDA.train{voomDDA}

Train a Voom Based Diagonal Discriminant Analysis for RNA-Seq Classification

Description
A function that applies voom based diagonal discriminant analysis to train a classifier for RNA-Seq data.

Usage
voomDDA.train(counts, conditions, normalization = "TMM", pooled.var = TRUE)

Arguments
counts: numeric p x n matrix or data frame of read counts. Rows correspond to p genes (transcripts, exons, etc.), while columns correspond to n biological samples.
conditions: factor or numeric vector of class labels representing experimental conditions
normalization: normalization applied to the count data to adjust for sample-specific differences before classification. TMM: trimmed mean of M values. quantile: quantile normalization (passed as "quan" in the examples). none: no normalization is applied.
pooled.var: logical flag. If TRUE (the default), the covariance matrices are assumed to be constant across classes and the linear voomDLDA classifier is used. Otherwise (pooled.var = FALSE), the covariance matrices may vary across classes and the quadratic voomDQDA classifier is used.

Details
voomDDA is an RNA-Seq classifier that takes read counts as input, applies the voom transformation, and incorporates the voom precision weights and log-cpm values in an extension of diagonal discriminant analysis for prediction.
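
To make the pooled.var distinction concrete, here is a minimal base R sketch of the standard, unweighted diagonal discriminant rules (DLDA with a pooled variance per feature, DQDA with class-specific variances). It is not the package's implementation: voomDDA additionally incorporates voom precision weights, and the helper name dda_sketch is hypothetical. Note that, unlike the counts argument, this sketch takes samples in rows.

```r
# Hypothetical helper illustrating plain (unweighted) diagonal discriminant
# analysis; samples are rows and features are columns here.
dda_sketch <- function(train, cl, test, pooled.var = TRUE) {
  cl <- as.factor(cl)
  classes <- levels(cl)
  nk <- as.vector(table(cl))
  # Class means and class-specific variances (features x classes)
  mu <- sapply(classes, function(k) colMeans(train[cl == k, , drop = FALSE]))
  v <- sapply(classes, function(k) apply(train[cl == k, , drop = FALSE], 2, var))
  if (pooled.var) {
    # DLDA: one pooled variance per feature, shared by all classes
    pooled <- as.vector(v %*% (nk - 1)) / (sum(nk) - length(classes))
    v <- matrix(pooled, nrow = length(pooled), ncol = length(classes))
  }
  prior <- nk / sum(nk)
  # Discriminant score per sample (row) and class (column); smallest wins
  scores <- sapply(seq_along(classes), function(k) {
    centered <- sweep(test, 2, mu[, k])
    rowSums(sweep(centered^2, 2, v[, k], "/")) + sum(log(v[, k])) -
      2 * log(prior[k])
  })
  factor(classes[apply(scores, 1, which.min)], levels = classes)
}

# Resubstitution predictions on the iris data with both rules
x <- as.matrix(iris[, 1:4])
pred.dlda <- dda_sketch(x, iris$Species, x, pooled.var = TRUE)
pred.dqda <- dda_sketch(x, iris$Species, x, pooled.var = FALSE)
mean(pred.dlda == iris$Species)
mean(pred.dqda == iris$Species)
```

With pooled variances the log-variance term is the same for every class, so the rule is linear in the features; with class-specific variances it becomes quadratic.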

Value
classNames: names of the experimental conditions
nclass: number of classes
normalization: normalization method used in the training process
PooledVar: TRUE - voom based diagonal linear discriminant analysis (voomDLDA). FALSE - voom based diagonal quadratic discriminant analysis (voomDQDA)
weightedStats: the statistics returned by weighted.stats()

Authors
Gokmen Zararsiz (gokmenzararsiz@erciyes.edu.tr), Dincer Goksuluk (dincer.goksuluk@hacettepe.edu.tr), Selcuk Korkmaz (selcuk.korkmaz@hacettepe.edu.tr)

References
Dudoit S, Fridlyand J, Speed TP (2000). Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. (Statistics, UC Berkeley, June 2000, Tech Report #576)
Law CW, Chen Y, Shi W, et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, doi:10.1186/gb-2014-15-2-r29

Examples
# use cervical data in MLSeq package
library(MLSeq)
data(cervical)

# create cervical conditions, train and test sets
set.seed(12345)
ratio = 0.7
conditions = factor(rep(c("N","T"), c(29,29)))
ind = sample(58, ceiling(58*ratio), FALSE)
train = cervical[,ind]
test = cervical[,-ind]
tr.cond = conditions[ind]
ts.cond = conditions[-ind]

# train a voomDLDA classifier using quantile normalization
fit = voomDDA.train(counts = train, conditions = tr.cond, normalization = "quan", pooled.var = TRUE)

# train a voomDQDA classifier using TMM normalization
fit2 = voomDDA.train(counts = train, conditions = tr.cond, normalization = "TMM", pooled.var = FALSE)

####

predict.voomDDA{voomDDA}

Extract Predictions From voomDDA.train() Objects

Description
This function predicts the class labels of test data for a given voomDDA model.

Usage
predict.voomDDA(object, newdata)

Arguments
object: a fitted model object returned by voomDDA.train()
newdata: new test read count data to be predicted

Details
predict.voomDDA() predicts the class labels of test data based on the trained voomDDA model.

Value
a vector of predicted classes for the test data

Authors
Gokmen Zararsiz (gokmenzararsiz@erciyes.edu.tr), Dincer Goksuluk (dincer.goksuluk@hacettepe.edu.tr), Selcuk Korkmaz (selcuk.korkmaz@hacettepe.edu.tr)

References
Dudoit S, Fridlyand J, Speed TP (2000). Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. (Statistics, UC Berkeley, June 2000, Tech Report #576)
Law CW, Chen Y, Shi W, et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, doi:10.1186/gb-2014-15-2-r29

Examples
# use cervical data in MLSeq package
library(MLSeq)
data(cervical)

# create cervical conditions, train and test sets
set.seed(12345)
ratio = 0.7
conditions = factor(rep(c("N","T"), c(29,29)))
ind = sample(58, ceiling(58*ratio), FALSE)
train = cervical[,ind]
test = cervical[,-ind]
tr.cond = conditions[ind]
ts.cond = conditions[-ind]

# train a voomDLDA classifier using quantile normalization
fit = voomDDA.train(counts = train, conditions = tr.cond, normalization = "quan", pooled.var = TRUE)

# train a voomDQDA classifier using TMM normalization
fit2 = voomDDA.train(counts = train, conditions = tr.cond, normalization = "TMM", pooled.var = FALSE)

# predict the class labels of the test data and create a confusion matrix
pred = predict.voomDDA(fit, test)
table(ts.cond, pred)

pred2 = predict.voomDDA(fit2, test)
table(ts.cond, pred2)
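
A confusion matrix such as the ones above can be summarized numerically. The sketch below uses small made-up label vectors (true.labels and pred.labels are hypothetical stand-ins for ts.cond and pred) so that it runs without the package or the cervical data.

```r
# Hypothetical true and predicted labels standing in for ts.cond and pred
true.labels <- factor(c("N", "N", "N", "T", "T", "T", "T", "N"),
                      levels = c("N", "T"))
pred.labels <- factor(c("N", "N", "T", "T", "T", "N", "T", "N"),
                      levels = c("N", "T"))

# Rows are true classes, columns are predicted classes
cm <- table(true.labels, pred.labels)

# Overall accuracy: correctly classified samples over all samples
accuracy <- sum(diag(cm)) / sum(cm)

# Per-class recall: correct predictions divided by the true class size
recall <- diag(cm) / rowSums(cm)
```

Fixing the factor levels on both vectors keeps the table square even when one class is never predicted.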

####

voomNSC.train{voomDDA}

Train a Voom Based Nearest Shrunken Centroids Classifier for RNA-Seq Classification

Description
A function that applies the voom based nearest shrunken centroids method to train a classifier for RNA-Seq data.

Usage
voomNSC.train(counts, conditions, n.threshold = 30, offset.percent = 50, remove.zeros = TRUE, normalization = "TMM")

Arguments
counts: numeric p x n matrix or data frame of read counts. Rows correspond to p genes (transcripts, exons, etc.), while columns correspond to n biological samples.
conditions: factor or numeric vector of class labels representing experimental conditions
n.threshold: number of threshold values desired (default 30)
offset.percent: fudge factor added to the denominator of each t-statistic, expressed as a percentile of the gene standard deviation values. This small positive quantity penalizes genes with expression values near zero, which can otherwise produce very large ratios. This factor is especially important for Affy data. The default (50) is the median of the standard deviations of the genes.
remove.zeros: remove threshold values yielding zero genes? Default TRUE
normalization: normalization applied to the count data to adjust for sample-specific differences before classification. TMM: trimmed mean of M values. quantile: quantile normalization. none: no normalization is applied.

Details
voomNSC is an RNA-Seq classifier that takes read counts as input, applies the voom transformation, and incorporates the voom precision weights and log-cpm values in an extension of the nearest shrunken centroids classifier for prediction.
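
The shrinkage step of nearest shrunken centroids (Tibshirani et al.) is soft-thresholding of the relative differences. The base R sketch below applies it to a small made-up delta matrix over a grid of thresholds and counts the surviving genes. The helper soft_threshold and the data are hypothetical, and the evenly spaced grid is an assumption for illustration rather than the grid voomNSC.train necessarily builds.

```r
# Soft-thresholding: shrink each score towards zero by `threshold`,
# setting scores whose absolute value is at most `threshold` to zero
soft_threshold <- function(delta, threshold) {
  sign(delta) * pmax(abs(delta) - threshold, 0)
}

# Hypothetical relative differences for 5 genes (rows) and 2 classes (columns)
delta <- matrix(c( 2.5, -2.5,
                   0.4, -0.4,
                  -1.2,  1.2,
                   0.1, -0.1,
                   3.0, -3.0), ncol = 2, byrow = TRUE)

# A grid of candidate thresholds from 0 up to the largest absolute score
thresholds <- seq(0, max(abs(delta)), length.out = 4)

# Number of genes with at least one non-zero shrunken score per threshold
nonzero <- sapply(thresholds, function(th) {
  sum(rowSums(soft_threshold(delta, th) != 0) > 0)
})
```

Larger thresholds zero out more genes, which is why the number of surviving genes is reported for each threshold value.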

Value
call: the calling sequence used
weightedMean: a vector containing the overall weighted unshrunken centroids
weightedMean.C: a matrix containing the weighted unshrunken centroids for each class
delta: a matrix containing the relative differences in gene expression (also called t-scores) for each class
errors: number of training errors for each threshold value
nonzero: number of genes that survived the thresholding for each threshold value
normalization: normalization method used in the training process
offset: offset.percent used in the training process
opt.threshold: optimal threshold value, i.e. the one that gives the minimum training error with the lowest number of genes
prior: prior probabilities used in the training process (proportions of the class frequencies)
prob: an array of predicted class probabilities of dimension n by nclass by n.threshold, where n is the number of samples, nclass is the number of classes, and n.threshold is the number of thresholds tried
weightedSD.pooled: a vector of pooled weighted standard deviations for each gene
se.scale: scale factors for the within-class standard errors, defined as sqrt(1/n.class - 1/n)
SelectedGenes: names of the genes that survived the thresholding for each threshold value
SelectedGenesIndex: indices of the genes that survived the thresholding for each threshold value
threshold: a vector of the threshold values tried in the shrinkage
conditions: the class labels (conditions) supplied for training
pred.conditions: a matrix containing the predicted class labels for each threshold value
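
The rule described for opt.threshold (minimum training error, then the lowest number of genes) and the class-proportion priors can be sketched as follows. The threshold, errors and nonzero vectors are made-up numbers standing in for the corresponding components of a fitted object; this shows one way to apply the rule, not necessarily the package's code.

```r
# Hypothetical training summaries for five candidate thresholds
threshold <- c(0.0, 0.5, 1.0, 1.5, 2.0)
errors    <- c(2,   1,   1,   1,   4)     # training errors per threshold
nonzero   <- c(500, 120, 40,  12,  3)     # surviving genes per threshold

# Among thresholds with the minimum training error, keep the sparsest model
best <- which(errors == min(errors))
opt.index <- best[which.min(nonzero[best])]
opt.threshold <- threshold[opt.index]

# Priors taken as the proportions of the class frequencies
conditions <- factor(c("N", "N", "N", "T", "T"))
prior <- as.vector(table(conditions)) / length(conditions)
```

Here three thresholds tie on training error, and the one keeping only 12 genes is selected.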

Authors
Trevor Hastie, Robert Tibshirani, Balasubramanian Narasimhan, and Gilbert Chu originally wrote pamr.train() in the CRAN package pamr, which was modified for RNA-Seq classification by Gokmen Zararsiz (gokmenzararsiz@erciyes.edu.tr), Dincer Goksuluk (dincer.goksuluk@hacettepe.edu.tr), Selcuk Korkmaz (selcuk.korkmaz@hacettepe.edu.tr)

References
Tibshirani R, Hastie T, Narasimhan B, et al. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS 99: 6567-72.
Dudoit S, Fridlyand J, Speed TP (2000). Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. (Statistics, UC Berkeley, June 2000, Tech Report #576)
Law CW, Chen Y, Shi W, et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, doi:10.1186/gb-2014-15-2-r29

Examples
# use cervical data in MLSeq package
library(MLSeq)
data(cervical)

# create cervical conditions, train and test sets
set.seed(12345)
ratio = 0.7
conditions = factor(rep(c("N","T"), c(29,29)))
ind = sample(58, ceiling(58*ratio), FALSE)
train = cervical[,ind]
test = cervical[,-ind]
tr.cond = conditions[ind]
ts.cond = conditions[-ind]

# train a voomNSC classifier using TMM normalization
fit = voomNSC.train(counts = train, conditions = tr.cond, normalization = "TMM")

####

predict.voomNSC{voomDDA}

Extract Predictions From voomNSC.train() Objects

Description
This function predicts the class labels of test data for a given voomNSC model.

Usage
predict.voomNSC(fit, newdata, threshold = fit$opt.threshold, prior = fit$prior)

Arguments
fit: a fitted model object returned by voomNSC.train()
newdata: new test read count data to be predicted
threshold: threshold value to be used in the prediction process. The default is fit$opt.threshold, the value that gives the minimum training error with the lowest number of genes.
prior: prior probabilities to be used in the prediction process. The default is fit$prior.

Details
predict.voomNSC() predicts the class labels of test data based on the trained voomNSC model.

Value
a vector of predicted classes for the test data

Authors
Trevor Hastie, Robert Tibshirani, Balasubramanian Narasimhan, and Gilbert Chu originally wrote pamr.train() in the CRAN package pamr, which was modified for RNA-Seq classification by Gokmen Zararsiz (gokmenzararsiz@erciyes.edu.tr), Dincer Goksuluk (dincer.goksuluk@hacettepe.edu.tr), Selcuk Korkmaz (selcuk.korkmaz@hacettepe.edu.tr)

References
Tibshirani R, Hastie T, Narasimhan B, et al. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS 99: 6567-72.
Dudoit S, Fridlyand J, Speed TP (2000). Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. (Statistics, UC Berkeley, June 2000, Tech Report #576)
Law CW, Chen Y, Shi W, et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, doi:10.1186/gb-2014-15-2-r29

Examples
# use cervical data in MLSeq package
library(MLSeq)
data(cervical)

# create cervical conditions, train and test sets
set.seed(12345)
ratio = 0.7
conditions = factor(rep(c("N","T"), c(29,29)))
ind = sample(58, ceiling(58*ratio), FALSE)
train = cervical[,ind]
test = cervical[,-ind]
tr.cond = conditions[ind]
ts.cond = conditions[-ind]

# train a voomNSC classifier using TMM normalization
fit = voomNSC.train(counts = train, conditions = tr.cond, normalization = "TMM")

# predict the class labels of the test data with the optimal threshold value and create a confusion matrix
pred = predict.voomNSC(fit, test)
table(ts.cond, pred)

# use another threshold value
pred2 = predict.voomNSC(fit, test, threshold = 1.34)
table(ts.cond, pred2)

####