# Bioinformatics 2018 - Distinguishing prognostic and predictive biomarkers: An information theoretic approach

Information theoretic predictive biomarker ranking

**Date:** 02/02/2018

**Paper:** Distinguishing prognostic and predictive biomarkers: An information theoretic approach

**Authors:** Konstantinos Sechidis, Konstantinos Papangelou, Paul D. Metcalfe, David Svensson, James Weatherall and Gavin Brown

**Platform:** R version 3.3.1

**Required packages:** MASS, infotheo

**Maintainer:** Konstantinos Sechidis (konstantinos.sechidis@manchester.ac.uk)

**Description:** Deriving rankings that capture predictive biomarker strength through univariate (INFO) or higher-order (INFO+) methods.
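This README does not state the exact scores behind INFO and INFO+. As background only: a standard information-theoretic measure of a treatment-by-biomarker interaction is the interaction information I(X;Y|T) - I(X;Y). When the treatment T is randomised independently of the biomarker X, so that I(X;T) = 0, this quantity equals the conditional mutual information I(X;T|Y). The sketch below computes a plug-in estimate of I(X;T|Y) for discrete vectors via the identity I(X;T|Y) = H(X,Y) + H(T,Y) - H(X,T,Y) - H(Y). The helpers `plugin_entropy` and `cond_mutual_info` are hypothetical, and it is an assumption (not confirmed here) that the repository's scores are of this type.

```r
# Plug-in (maximum-likelihood) entropy, in nats, of the joint distribution of
# one or more discrete vectors of equal length. Hypothetical helper.
plugin_entropy <- function(...) {
  # Each row of the supplied vectors is one joint symbol;
  # drop = TRUE keeps only the combinations that actually occur.
  counts <- table(interaction(..., drop = TRUE))
  p <- as.numeric(counts) / sum(counts)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Plug-in estimate of the conditional mutual information I(X;T|Y), in nats,
# via I(X;T|Y) = H(X,Y) + H(T,Y) - H(X,T,Y) - H(Y). Hypothetical helper.
cond_mutual_info <- function(x, t, y) {
  stopifnot(length(x) == length(t), length(t) == length(y))
  plugin_entropy(x, y) + plugin_entropy(t, y) -
    plugin_entropy(x, t, y) - plugin_entropy(y)
}

# Toy check with balanced binary data (100 patients)
x <- rep(c(0, 0, 1, 1), 25)        # biomarker
trt <- rep(c(0, 1, 0, 1), 25)      # randomised treatment, independent of x
y_pred <- as.integer(xor(x, trt))  # x affects the outcome only via its interaction with trt
y_prog <- x                        # x affects the outcome regardless of trt
cond_mutual_info(x, trt, y_pred)   # log(2), about 0.693: purely predictive
cond_mutual_info(x, trt, y_prog)   # essentially 0 (floating-point rounding): purely prognostic
```

The toy data separate the two notions: a purely prognostic biomarker carries no information about treatment once the outcome is known, while a purely predictive one does.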
**Functions:**

`INFOplus.Output_Categorical.Covariates_Categorical(data, labels, treatment, top_k)$ranking`

The `ranking` element of this function's return value holds the predictive ranking. Its input arguments are:

**data:** A matrix containing the covariates (biomarkers). Columns correspond to covariates and rows to examples (patients). For this function the covariates are categorical (nominal).

**labels:** A vector containing the output (target) label for each patient; here it takes categorical (nominal) values.

**treatment:** A vector describing the treatment allocation (i.e. T=0 for the control group, T=1 for the experimental treatment).

**top_k:** The number of top-ranked predictive biomarkers to return.
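The argument descriptions above imply a few shape constraints. A minimal sketch of an input check one might run before calling a ranking function; `check_ranking_inputs` is a hypothetical helper, not part of the repository, and requiring `treatment` to be coded strictly as 0/1 is an assumption based on the T=0/T=1 description.

```r
# Hypothetical helper: validate inputs against the argument descriptions above.
check_ranking_inputs <- function(data, labels, treatment, top_k) {
  data <- as.matrix(data)
  stopifnot(
    nrow(data) == length(labels),     # one output label per patient (row)
    nrow(data) == length(treatment),  # one treatment allocation per patient
    all(treatment %in% c(0, 1)),      # assumed coding: T = 0 control, T = 1 experimental
    length(top_k) == 1,
    top_k >= 1, top_k <= ncol(data)   # cannot return more biomarkers than covariates
  )
  invisible(TRUE)
}

# Simulated categorical inputs: 200 patients, 10 covariates
set.seed(1)
data <- matrix(sample(0:2, 200 * 10, replace = TRUE), nrow = 200)
labels <- sample(0:1, 200, replace = TRUE)
treatment <- rbinom(200, size = 1, prob = 0.5)
check_ranking_inputs(data, labels, treatment, top_k = 5)
```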
Furthermore, we provide functions for other data types:

`INFOplus.Output_Categorical.Covariates_Continuous`: The covariates can be either all continuous or mixed (continuous and categorical). By default, continuous covariates are discretised using Scott's rule.

`INFOplus.Output_Survival.Covariates_Categorical`: For survival (time-to-event) output targets and categorical covariates.

`INFOplus.Output_Survival.Covariates_Continuous`: For survival (time-to-event) output targets and continuous or mixed (continuous and categorical) covariates.
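Scott's rule chooses a histogram bin width of roughly h = 3.5 * sd(x) * n^(-1/3), where n is the sample size. A minimal sketch of discretising continuous covariates this way, assuming equal-width bins over each covariate's range. Base R's `grDevices::nclass.scott` returns the number of bins implied by the rule; `discretise_scott` and `discretise_columns` are hypothetical helpers, and the repository's own discretisation may differ in detail.

```r
# Hypothetical helper: discretise one continuous covariate into equal-width
# bins, with the number of bins chosen by Scott's rule.
discretise_scott <- function(x) {
  n_bins <- grDevices::nclass.scott(x)  # uses h = 3.5 * sd(x) * n^(-1/3)
  if (n_bins < 2) {
    return(rep(1L, length(x)))          # cut() needs at least 2 intervals
  }
  # labels = FALSE returns integer bin codes 1..n_bins
  cut(x, breaks = n_bins, labels = FALSE)
}

# Hypothetical helper: apply the rule column by column (one column per covariate)
discretise_columns <- function(data) {
  apply(as.matrix(data), 2, discretise_scott)
}

x <- seq(0, 1, length.out = 1000)
table(discretise_scott(x))
```

In this sketch, mixed data would need the categorical columns left untouched; only the continuous ones should be passed through `discretise_scott`.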
Finally, we provide the same set of functions for deriving the univariate INFO ranking (e.g. `INFO.Output_Categorical.Covariates_Categorical`, used in the example below).
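Because the univariate INFO ranking scores each covariate on its own, its final step reduces to sorting one score per column. A minimal sketch of that step with a placeholder score; `rank_univariate` is a hypothetical helper, and the `score_fn` passed to it stands in for whatever per-biomarker score the repository actually computes.

```r
# Hypothetical helper: rank covariates (columns) by a per-column score,
# highest score first, and return the indices of the top_k columns.
rank_univariate <- function(data, score_fn, top_k = ncol(data)) {
  data <- as.matrix(data)
  scores <- apply(data, 2, score_fn)           # one score per covariate
  ranking <- order(scores, decreasing = TRUE)  # column indices, strongest first
  ranking[seq_len(min(top_k, length(ranking)))]
}

# Toy example with variance as a stand-in score
toy <- cbind(c(1, 1, 1, 2), c(1, 5, 9, 13), c(1, 2, 3, 4))
rank_univariate(toy, var)             # 2 3 1
rank_univariate(toy, var, top_k = 1)  # 2
```

A higher-order method such as INFO+ cannot be reduced to a single sort like this, since a covariate's contribution may depend on which other covariates are already in the ranking.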
**Example:**

We provide source code (`Functions-GenerateData.R`) to generate the synthetic scenarios presented in the paper. The following example shows how to derive the predictive rankings with our code.
```r
## Load libraries
library(MASS)     # To generate synthetic data by sampling a multivariate normal
library(infotheo) # Information theoretic library
## Load sources
source("Functions-GenerateData.R")               # Function to generate synthetic data
source("InformationTheory-PredictiveRankings.R") # Functions to derive predictive rankings
###################################
##### Generate synthetic data #####
###################################
model <- 3           # Which model to use (1, 2, 3, 4, 5, 6, 7) - details in the paper
theta_pred <- 1      # Strength of predictive part
num_features <- 20   # Number of covariates
sample_size <- 2000  # Number of examples
dataset <- Generate.Data(sample_size, num_features, theta_pred, model)
# The methods will return the top-k biomarkers
top_k <- 5
#######################################################
# Ranking the biomarkers on their predictive strength #
#######################################################
# INFO, which captures first-order interactions (keep the top_k = 5 biomarkers)
INFO.Output_Categorical.Covariates_Categorical(dataset$data, dataset$labels, dataset$treatment)$ranking[1:top_k] # ranking truncated to the first top_k entries
# INFO+, which captures second-order interactions (returns the top_k = 5 biomarkers)
INFOplus.Output_Categorical.Covariates_Categorical(dataset$data, dataset$labels, dataset$treatment, top_k)$ranking # this function returns the ranking
```