Diff of /README.md [000000] .. [e26484]

# Omicsfold

![Maturity level-Prototype](https://img.shields.io/badge/Maturity%20Level-Prototype-red)

![](omicsfold_id.png)

### Multi-omics data normalisation, model fitting, and visualisation.

## Overview

This is a utility R package containing custom code and scripts developed to
establish a working approach for integration of multi-omics data.

The package provides a unified toolkit for the analysis and integration of
multi-omics high-throughput data. It relies upon the
[`mixOmics`](http://mixomics.org/) toolkit to provide implementations of many of
the underlying projection to latent structures (PLS) methods used to analyse
high-dimensional data. In addition, it includes custom implementations
of data pre-processing, normalisation, collation, model validation,
visualisation, and output functions.

The originally individual scripts have been collected into a formal package that
should be installable and usable within an analyst's R environment without
further configuration. The package is fully documented at the function level.

## Getting Started

This package and the associated analysis require R v3.6 or above. The package is
largely built upon the `mixOmics` integration framework. The dependencies come
from several different sources, so an installation script is provided to make
satisfying them as simple as possible. `mixOmics` installs its own dependencies
as well. Note that we install `mixOmics` from the GitHub repository, as this
version is more up to date than the one on Bioconductor and includes a number of
fixes that are needed to avoid bugs.

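Before installing, it can be worth confirming that the running R session meets the stated minimum version. The snippet below is a minimal sketch using base R only; the variable names are illustrative and are not part of OmicsFold.

```R
# Minimum R version stated in this README.
required_version <- "3.6.0"

# getRversion() returns the running R version as a comparable object.
r_is_recent_enough <- getRversion() >= required_version

if (r_is_recent_enough) {
  message("R ", as.character(getRversion()), " meets the minimum requirement.")
} else {
  warning("R ", as.character(getRversion()), " is older than ", required_version, ".")
}
```
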
Notable dependencies that will be installed if they are not already present:

- mixOmics
- WGCNA
- ggplot2
- dplyr & magrittr
- reshape2

See the [`DESCRIPTION`](OmicsFold/DESCRIPTION) file for a complete
dependency list.

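To see which of the notable dependencies listed above are already present in your library, something like the following could be used. This is only a sketch in base R: it checks for an installed package directory without loading anything, and the variable names are hypothetical.

```R
# Package names taken from the list of notable dependencies above.
notable_deps <- c("mixOmics", "WGCNA", "ggplot2", "dplyr", "magrittr", "reshape2")

# system.file(package = p) returns "" when package p is not installed.
installed <- vapply(
  notable_deps,
  function(p) nzchar(system.file(package = p)),
  logical(1)
)

# Names of the packages that the installation script would still need to fetch.
missing_deps <- names(installed)[!installed]

if (length(missing_deps) > 0) {
  message("Not yet installed: ", paste(missing_deps, collapse = ", "))
} else {
  message("All notable dependencies are already installed.")
}
```
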
### Installation

Because the dependencies are numerous and come from several different sources,
an installation script is provided. To run it, open an R session in your
preferred environment, ensure your working directory is the `OmicsFold`
directory, then issue the following commands:

```R
# Load the installation helper from the current working directory.
source('install.R')
# Install the dependencies, then the OmicsFold package itself.
install.omicsfold()
```

This should install all the dependencies and, finally, the OmicsFold package
itself. If there are any issues due to changed package versions or changes in
which repository maintains the active version of a package, you may have to
update the script.

If you are having issues installing OmicsFold in a conda environment, please try
the following steps:

First, create the conda environment:
```Shell
# Create an empty environment and activate it.
conda create --name OmicsFold
# On conda 4.4 or later, `conda activate OmicsFold` is the preferred form.
source activate OmicsFold
# Install R and the Boost C++ libraries.
conda install r=3.6.0
conda install -c conda-forge boost-cpp
```

Second, launch R in the conda environment (or in a local R installation, if you
are not using conda) and manually install the following packages:
```R
# Install BiocManager if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Bioconductor dependencies.
BiocManager::install("metagenomeSeq")
BiocManager::install("org.Mm.eg.db")
# XML from the Omegahat repository.
install.packages("XML", repos = "http://www.omegahat.net/R")
# Fetch and run the anRichment installer script.
source("http://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/GeneAnnotation/installAnRichment.R")
installAnRichment()
# Finally, run the OmicsFold installation script.
source('install.R')
install.omicsfold()
```

For installation using [Nextflow](https://www.nextflow.io/docs/latest/getstarted.html), please see
<https://github.com/AstraZeneca/Omicsfold/tree/master/OmicsFold/nextflow_pipeline>.

### Usage

Load the `OmicsFold` and `mixOmics` packages in R and you're ready to
go. Some functions also require `dplyr` to be loaded, so it's a good idea to
load it anyway. Certain plotting functions may also require `ggplot2` to be
loaded.

```R
library(OmicsFold)
library(mixOmics)
library(dplyr)
library(ggplot2)  # optional; needed by certain plotting functions
```

### Data Normalisation

A number of normalisation functions are provided. Each has documentation
which can be read in the usual way in R. For example, the help for the function
`normalise.tss` can be viewed by calling `?normalise.tss`. A brief description
of the usage of each function can be found in the [Getting Started with
Normalisation](docs/getting-started-normalisation.md) document, which also
includes example code for a few key functions.

- `low.count.removal()`
- `normalise.tss()`
- `normalise.css()`
- `normalise.logit()`
- `normalise.logit.empirical()`
- `normalise.clr()`
- `normalise.clr.within.features()`

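For readers unfamiliar with these transformations, the sketch below shows in base R what two of them conventionally compute, assuming that `tss` stands for total sum scaling and `clr` for centred log-ratio. It is not the OmicsFold implementation and does not call the package. The toy matrix, the samples-in-rows layout, and the pseudocount are all assumptions made for illustration.

```R
# Toy count matrix: 3 samples (rows) x 4 features (columns).
counts <- matrix(
  c(10, 20, 30, 40,
     5,  5,  0, 10,
     1,  2,  3,  4),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:3), paste0("feature", 1:4))
)

# Total sum scaling: divide each sample (row) by its total,
# so that every row sums to 1.
tss <- sweep(counts, 1, rowSums(counts), "/")

# Centred log-ratio: subtract each sample's mean log value from its
# log values, i.e. take the log of each value relative to the row's
# geometric mean. The pseudocount (an assumption) avoids log(0).
pseudocount <- 1
log_counts <- log(counts + pseudocount)
clr <- sweep(log_counts, 1, rowMeans(log_counts), "-")

print(round(tss, 3))
print(round(clr, 3))
```

After these transformations each row of `tss` sums to one and each row of `clr` sums to zero, which is a quick sanity check when comparing against the package's own output.
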
### Analysis of mixOmics Output

Once a `mixOmics` model has been fitted, OmicsFold can be used to perform a
number of visualisation and data extraction tasks. Below is a brief list of
the functionality provided. While these functions are well documented in the R
help system, descriptions of how to use each one can also be found in the
[Getting Started with Model Analysis](docs/getting-started-model-analysis.md)
document.

- **Model variance analysis** - functions are provided to extract the percentage
  contributions of each component to the model variance and the centroids of
  variance across the blocks of a DIABLO model.
- **Feature analysis for sPLS-DA models** - feature loadings on the fitted
  single-omics model can be exported as a sorted table, and feature stability
  across many sparse model fits can also be exported. As there may be many
  components to export stability for, another function combines these
  into a single table, and a plotting function visualises the stability of the
  selected features.
- **Feature analysis for DIABLO models** - as with the single-omics models
  above, multi-omics models can also have feature loadings and
  stability exported. Correlations between features of different
  blocks can be exported as a matrix and also converted to a CSV
  file suitable for importing into Cytoscape, where they can form a network
  graph.
- **Model predictivity** - a function is provided to plot the predictivity of a
  model from a confusion matrix.
- **Utility functions** - a function is provided to truncate long feature names
  passed to plots so that they display cleanly.
- **BlockRank** - implements a novel approach to analysing feature importance
  between blocks of data.

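To illustrate the idea behind the Cytoscape export mentioned in the list above, the following base R sketch flattens a small cross-block correlation matrix into an edge list and writes it as CSV. It does not use the OmicsFold export functions. The feature names, the column names, and the 0.5 cut-off are all made up for the example.

```R
# Toy cross-block correlation matrix: rows are features from one block,
# columns are features from another (all names are hypothetical).
cor_matrix <- matrix(
  c( 0.9, -0.2,
     0.1,  0.8,
    -0.7,  0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene_A", "gene_B", "gene_C"),
                  c("protein_X", "protein_Y"))
)

# Flatten to one row per feature pair, i.e. an edge list.
edges <- as.data.frame(as.table(cor_matrix), stringsAsFactors = FALSE)
names(edges) <- c("source", "target", "correlation")

# Keep only the stronger associations; the cut-off is an arbitrary choice.
threshold <- 0.5
edges <- edges[abs(edges$correlation) >= threshold, ]

# Write a CSV with source, target, and edge-weight columns.
out_file <- file.path(tempdir(), "network_edges.csv")
write.csv(edges, out_file, row.names = FALSE)
print(edges)
```

A file in this shape, with one column for the source node, one for the target node, and one for the edge weight, is the kind of table that Cytoscape can import as a network.
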
## Other Information

To contact the maintainers or project director, please refer to the
[`AUTHORS`](AUTHORS.md) file. If you are thinking of contributing to OmicsFold,
all the information you will need is in the [`CONTRIBUTING`](CONTRIBUTING.md)
file.

OmicsFold is licensed under the [Apache-2.0 software
licence](https://www.apache.org/licenses/LICENSE-2.0) as documented in the
[`LICENCE`](LICENCE.md) file. Separately installed dependencies of OmicsFold
may be licensed under different licence agreements. If you plan to create
derivative works from OmicsFold or use OmicsFold for commercial or profitable
enterprises, please ensure you adhere to all the expectations of these
dependencies and seek legal advice if you are unsure.