# Bioinformatics 2018 - Distinguishing prognostic and predictive biomarkers: An information theoretic approach

Information theoretic predictive biomarker ranking

**Date:** 02/02/2018

**Paper:** Distinguishing prognostic and predictive biomarkers: An information theoretic approach

**Authors:** Konstantinos Sechidis, Konstantinos Papangelou, Paul D. Metcalfe, David Svensson, James Weatherall and Gavin Brown

**Platform:** R Version 3.3.1

**Required packages:** MASS, infotheo

**Maintainer:** Konstantinos Sechidis konstantinos.sechidis@manchester.ac.uk

**Description:** Deriving rankings that capture the predictive biomarker strength through univariate (INFO) or higher-order (INFO+) methods

**Functions:**

```INFOplus.Output_Categorical.Covariates_Categorical(data,labels,treatment,top_k)$ranking```

This function returns the predictive ranking. Its input arguments are:

**data:** A matrix containing the covariates (biomarkers). Each column corresponds to a covariate and each row to an example (patient). For this function, the covariates are categorical (nominal).

**labels:** A vector containing the output (target) label of each patient. For this function, the labels are categorical (nominal).

**treatment:** A vector describing the treatment allocation of each patient (i.e. T=0 for the control group, T=1 for the experimental treatment).

**top_k:** The number of top-ranked predictive biomarkers to return.
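
The argument descriptions above imply a specific data layout: one row per patient in `data`, and one entry per patient in `labels` and `treatment`. The minimal R sketch below builds a toy input of that shape and validates it with `check_ranking_inputs`, a hypothetical helper that is not part of the repository. Requiring `treatment` to be coded 0/1 and `top_k` to lie between 1 and the number of covariates are assumptions drawn from the descriptions above, not checks the repository is known to perform.

```r
# Hypothetical helper (not part of the repository): checks that the inputs
# follow the layout described above before calling a ranking function.
check_ranking_inputs <- function(data, labels, treatment, top_k) {
  data <- as.matrix(data)
  n <- nrow(data)                 # rows = patients
  p <- ncol(data)                 # columns = covariates (biomarkers)
  stopifnot(
    length(labels) == n,          # one output label per patient
    length(treatment) == n,       # one treatment allocation per patient
    all(treatment %in% c(0, 1)),  # assumed coding: 0 = control, 1 = experimental
    top_k >= 1, top_k <= p        # cannot return more covariates than exist
  )
  invisible(list(n = n, p = p))
}

# Toy categorical input: 100 patients, 4 binary covariates
set.seed(1)
toy_data      <- matrix(sample(0:1, 100 * 4, replace = TRUE), nrow = 100, ncol = 4)
toy_labels    <- sample(0:1, 100, replace = TRUE)
toy_treatment <- sample(0:1, 100, replace = TRUE)

dims <- check_ranking_inputs(toy_data, toy_labels, toy_treatment, top_k = 2)
print(dims$n)  # 100
print(dims$p)  # 4
```

Calling the helper with `top_k = 10` on this 4-covariate toy matrix stops with an error, which is cheaper to catch before running a ranking.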

Furthermore, we provide functions for other data types:

```INFOplus.Output_Categorical.Covariates_Continuous```: For categorical output targets and covariates that are either all continuous or mixed (continuous and categorical). By default, continuous covariates are discretised using Scott's rule.

```INFOplus.Output_Survival.Covariates_Categorical```: For survival (time-to-event) output targets and categorical covariates.

```INFOplus.Output_Survival.Covariates_Continuous```: For survival (time-to-event) output targets and continuous or mixed (continuous and categorical) covariates.
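
The continuous and mixed variants discretise continuous covariates with Scott's rule by default. As a minimal sketch of that step (not the repository's own code, whose binning details may differ), base R's `grDevices::nclass.scott()` returns the number of bins implied by Scott's bin width h = 3.5 · sd · n^(-1/3), and `cut()` then maps each value to an equal-width bin index. The helpers `discretise_scott` and `discretise_mixed` are hypothetical, and marking which columns are continuous via `is_continuous` is an assumption for illustration.

```r
# Hypothetical helper: discretise one continuous covariate into equal-width
# bins whose count is chosen by Scott's rule.
discretise_scott <- function(x) {
  k <- grDevices::nclass.scott(x)        # Scott's-rule number of bins
  if (k < 2) return(rep(1L, length(x)))  # cut() needs at least 2 intervals
  cut(x, breaks = k, labels = FALSE)     # integer bin index in 1..k
}

# Hypothetical helper: discretise only the continuous columns of a mixed matrix,
# leaving categorical columns untouched.
discretise_mixed <- function(data, is_continuous) {
  for (j in which(is_continuous)) {
    data[, j] <- discretise_scott(data[, j])
  }
  data
}

set.seed(42)
n <- 500
mixed  <- cbind(rnorm(n), sample(0:2, n, replace = TRUE), runif(n))
binned <- discretise_mixed(mixed, is_continuous = c(TRUE, FALSE, TRUE))
table(binned[, 1])  # bin counts for the first (continuous) covariate
```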
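
Finally, we provide the same functions for deriving the univariate INFO ranking, using the prefix ```INFO.``` instead of ```INFOplus.``` (e.g. ```INFO.Output_Categorical.Covariates_Categorical```). As in the example below, the INFO function is called without ```top_k```, and its ranking can be truncated with ```[1:top_k]```.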
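
To illustrate what a univariate information-theoretic ranking computes, the sketch below scores each covariate X by the plug-in estimate of the conditional mutual information I(X;T|Y) between the covariate and the treatment given the outcome, then sorts covariates by that score. Using I(X;T|Y) here is an assumption for illustration; check the paper and `InformationTheory-PredictiveRankings.R` for the exact INFO criterion. The helpers `entropy_emp`, `cmi_emp` and `rank_univariate` are hypothetical and use base R only; the `infotheo` package's `condinformation()` offers a comparable estimator.

```r
# Plug-in (empirical) joint entropy, in nats, of one or more discrete variables
entropy_emp <- function(...) {
  key <- paste(..., sep = "|")      # one string per patient encoding the joint value
  p <- table(key) / length(key)     # empirical joint distribution (observed cells only)
  -sum(p * log(p))
}

# I(X;T|Y) = H(X,Y) + H(T,Y) - H(X,T,Y) - H(Y)
cmi_emp <- function(x, t, y) {
  entropy_emp(x, y) + entropy_emp(t, y) - entropy_emp(x, t, y) - entropy_emp(y)
}

# Score every column of `data`; return covariate indices, highest score first
rank_univariate <- function(data, labels, treatment) {
  scores <- apply(data, 2, cmi_emp, t = treatment, y = labels)
  list(ranking = order(scores, decreasing = TRUE), scores = scores)
}

# Toy trial: 5 binary covariates; only covariate 1 modifies the treatment effect
set.seed(7)
n <- 2000
covariates <- matrix(rbinom(n * 5, 1, 0.5), nrow = n, ncol = 5)
treatment  <- rbinom(n, 1, 0.5)
p_response <- ifelse(treatment == 1 & covariates[, 1] == 1, 0.9, 0.3)
labels     <- rbinom(n, 1, p_response)

res <- rank_univariate(covariates, labels, treatment)
res$ranking  # covariate 1 is expected to come first
```

Taking `res$ranking[1:top_k]` mirrors how the example below truncates the INFO ranking.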

**Example:**

We provide a source file (```Functions-GenerateData.R```) that generates the synthetic scenarios presented in the paper. The following example shows how to derive the predictive rankings using our code.

```r
## Load libraries
library(MASS)     # To generate synthetic data by sampling from a multivariate normal
library(infotheo) # Information theoretic library

## Load sources
source("Functions-GenerateData.R")               # Function to generate synthetic data
source("InformationTheory-PredictiveRankings.R") # Functions to derive predictive rankings

###################################
##### Generate synthetic data #####
###################################
model <- 3           # Which model to use (1, 2, 3, 4, 5, 6, 7) - details in the paper
theta_pred <- 1      # Strength of the predictive part
num_features <- 20   # Number of covariates
sample_size <- 2000  # Number of examples

dataset <- Generate.Data(sample_size, num_features, theta_pred, model)

# The methods will return the top-k biomarkers
top_k <- 5

#######################################################
# Ranking the biomarkers on their predictive strength #
#######################################################
# INFO, which captures first-order interactions: returns the full ranking,
# and [1:top_k] keeps the top_k = 5 biomarkers
INFO.Output_Categorical.Covariates_Categorical(dataset$data, dataset$labels, dataset$treatment)$ranking[1:top_k]

# INFO+, which captures second-order interactions (returns the top_k = 5 biomarkers)
INFOplus.Output_Categorical.Covariates_Categorical(dataset$data, dataset$labels, dataset$treatment, top_k)$ranking
```