Diff of /README.md [000000] .. [e26484]

--- a
+++ b/README.md
@@ -0,0 +1,165 @@
+# OmicsFold
+
+![Maturity level-Prototype](https://img.shields.io/badge/Maturity%20Level-Prototype-red)
+
+![](omicsfold_id.png)
+
+### Multi-omics data normalisation, model fitting, and visualisation.
+
+## Overview
+
+This is a utility R package containing custom code and scripts developed to
+establish a working approach for integration of multi-omics data.
+
+The package provides a unified toolkit for the analysis and integration of
+high-throughput multi-omics data. It relies upon the
+[`mixOmics`](http://mixomics.org/) toolkit to provide implementations of many of
+the underlying projection to latent structures (PLS) methods used to analyse
+high-dimensional data. It also includes custom implementations of data
+pre-processing, normalisation, collation, model validation, visualisation, and
+output functions.
+
+The scripts, originally separate, have been collected into a formal package that
+should be installable and usable within an analyst's R environment without
+further configuration. The package is fully documented at the function level.
+
+## Getting Started
+
+This package requires R v3.6 or above. It is largely built upon the
+`mixOmics` integration framework. The dependencies come from several different
+sources, so an installation script is provided to make satisfying them as
+simple as possible. `mixOmics` installs its own dependencies as well. Note that
+we install `mixOmics` from the GitHub repository, as this version is more up to
+date than the one on Bioconductor and includes a number of fixes needed to
+avoid bugs.
+
+Notable dependencies that will be installed if they are not already present:
+
+- mixOmics
+- WGCNA
+- ggplot2
+- dplyr & magrittr
+- reshape2
+
+See the [`DESCRIPTION`](OmicsFold/DESCRIPTION) file for a complete
+dependency list.
+
+### Installation
+
+Because the dependencies come from several different repositories, an
+installation script is provided.  To run it, open an R session in your
+preferred environment, ensure your working directory is the `OmicsFold`
+directory, then issue the following commands:
+
+```R
+source('install.R')
+install.omicsfold()
+```
+
+This should install all the dependencies and then finally the OmicsFold package
+itself.  If there are any issues due to versions changing or changes in which
+repository maintains the active version of a package, you may have to update the
+script.
+
+If you are having issues installing OmicsFold in a conda environment, please try
+the following steps: 
+
+First, create the conda environment:
+```Shell
+conda create --name OmicsFold 
+source activate OmicsFold
+conda install r=3.6.0
+conda install -c conda-forge boost-cpp
+```
+
+Second, launch R in the conda environment (or in a local instance of R, if you are installing there directly) and manually install the following packages:
+```R
+if (!requireNamespace("BiocManager", quietly = TRUE))
+    install.packages("BiocManager")
+BiocManager::install("metagenomeSeq")
+BiocManager::install("org.Mm.eg.db")
+install.packages("XML", repos = "http://www.omegahat.net/R")
+source("http://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/GeneAnnotation/installAnRichment.R")
+installAnRichment()
+source('install.R')
+install.omicsfold()
+```
+
+For installation using [Nextflow](https://www.nextflow.io/docs/latest/getstarted.html), please see https://github.com/AstraZeneca/Omicsfold/tree/master/OmicsFold/nextflow_pipeline
+
+### Usage
+
+Load the `OmicsFold` and `mixOmics` packages in R and you're ready to
+go.  Some functions also require `dplyr` to be loaded, so it's a good idea to
+load it anyway. Certain plotting functions may also require `ggplot2` to be loaded.
+
+```R
+library(OmicsFold)
+library(mixOmics)
+library(dplyr)
+library(ggplot2)  # optional, required by certain plotting functions
+```
+
+### Data Normalisation
+
+A number of normalisation functions have been provided.  Each has documentation
+which can be read in the usual way in R.  For example, the help for the function
+`normalise.tss` can be viewed by calling `?normalise.tss`.  A brief description
+of the usage of each function can be read in the [Getting Started with
+Normalisation](docs/getting-started-normalisation.md) document, which also
+includes example code for a few key functions.
+
+- `low.count.removal()`
+- `normalise.tss()`
+- `normalise.css()`
+- `normalise.logit()`
+- `normalise.logit.empirical()`
+- `normalise.clr()`
+- `normalise.clr.within.features()`
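+
+As a rough illustration of what two of these transformations compute, the
+base-R sketch below applies total sum scaling (TSS) and a centred log-ratio
+(CLR) to a toy matrix.  It does not call the OmicsFold functions, whose
+arguments are not described in this README.  The samples-in-rows layout and
+the pseudocount of 1 are assumptions made for the example.
+
+```R
+# Toy count matrix: rows are samples, columns are features (assumed layout).
+counts <- matrix(c(10, 20, 70,
+                    5,  5, 90),
+                 nrow = 2, byrow = TRUE,
+                 dimnames = list(c("s1", "s2"), c("f1", "f2", "f3")))
+
+# Total sum scaling: divide each sample's counts by that sample's total.
+tss <- sweep(counts, 1, rowSums(counts), "/")
+
+# Centred log-ratio: log of each value minus the mean log within the sample.
+# The pseudocount of 1 (an assumption) avoids taking log(0).
+log_counts <- log(counts + 1)
+clr <- sweep(log_counts, 1, rowMeans(log_counts), "-")
+
+rowSums(tss)  # every sample sums to 1 after TSS
+rowSums(clr)  # every sample sums to (numerically) 0 after CLR
+```
+
+For the behaviour of the packaged versions, including how they handle zeros,
+refer to `?normalise.tss` and `?normalise.clr`.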
+
+### Analysis of mixOmics Output
+
+Once a `mixOmics` model has been fitted, OmicsFold can be used to perform a
+number of visualisation and data extraction functions.  Below is a brief list of
+the functionality provided.  While these are well documented in the R help
+system, descriptions of how to use each function can also be found in the
+[Getting Started with Model Analysis](docs/getting-started-model-analysis.md)
+document.
+
+- **Model variance analysis** - functions are provided to extract the percentage
+  contributions of each component to the model variance and the centroids of
+  variance across the blocks of a DIABLO model.
+- **Feature analysis for sPLS-DA models** - feature loadings on the fitted
+  single-omics model can be exported as a sorted table, and feature stability
+  across many sparse model fits can also be exported.  As there may be many
+  components to export stability for, another function combines these into a
+  single table, and a plotting function visualises the stability of the
+  selected features.
+- **Feature analysis for DIABLO models** - as with the single-omics models
+  above, multi-omics models can also have feature loadings and stability
+  exported. Correlations between features of different blocks can be exported
+  as a matrix and also converted to a CSV file suitable for importing into
+  Cytoscape, where they can form a network graph.
+- **Model predictivity** - we provide a function to plot the predictivity of a
+  model from a confusion matrix.
+- **Utility functions** - a function is provided to truncate long feature
+  names passed to plots so that they display cleanly.
+- **BlockRank** - implements a novel approach to analysing feature importance 
+  between blocks of data.
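+
+All of the functions above start from a fitted `mixOmics` model.  The sketch
+below shows one way to obtain one, using only `mixOmics` calls and the `srbct`
+example data that ships with that package.  The `ncomp` and `keepX` values are
+arbitrary choices for illustration.  The OmicsFold functions are not called
+here because this README does not list their names.
+
+```R
+library(mixOmics)
+
+# Example data shipped with mixOmics: gene expression and tumour class.
+data(srbct)
+X <- srbct$gene
+Y <- srbct$class
+
+# Fit a sparse PLS-DA model; ncomp and keepX are illustrative values only.
+model <- splsda(X, Y, ncomp = 2, keepX = c(20, 20))
+
+# mixOmics' own accessor for the features selected on component 1.
+selected <- selectVar(model, comp = 1)
+head(selected$value)
+```
+
+A model fitted this way is the kind of object the feature analysis functions
+described above are intended to work with.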
+
+
+
+## Other Information
+
+To contact the maintainers or project director, please refer to the
+[`AUTHORS`](AUTHORS.md) file.  If you are thinking of contributing to OmicsFold,
+all the information you will need is in the [`CONTRIBUTING`](CONTRIBUTING.md)
+file.
+
+OmicsFold is licensed under the [Apache-2.0 software
+licence](https://www.apache.org/licenses/LICENSE-2.0) as documented in the
+[`LICENCE`](LICENCE.md) file.  Separately installed dependencies of OmicsFold
+may be licensed under different licence agreements.  If you plan to create
+derivative works from OmicsFold or use OmicsFold for commercial or profitable
+enterprises, please ensure you adhere to all the expectations of these
+dependencies and seek legal advice if you are unsure.