# OmicsFold

### Multi-omics data normalisation, model fitting, and visualisation.

## Overview

This is a utility R package containing custom code and scripts developed to
establish a working approach for the integration of multi-omics data.

The package provides a unified toolkit for the analysis and integration of
multi-omic high-throughput data. It relies on the
[`mixOmics`](http://mixomics.org/) toolkit to provide implementations of many of
the underlying projection to latent structures (PLS) methods used to analyse
high-dimensional data. In addition, it includes custom implementations
of data pre-processing, normalisation, collation, model validation,
visualisation, and output functions.

The scripts, originally separate, have been collected into a formal package that
should be installable and usable within an analyst's R environment without
further configuration. The package is fully documented at the function level.

## Getting Started

This package requires R v3.6 or above. It is largely built upon the
`mixOmics` integration framework. The dependencies come from several different
sources, so an installation script is provided to make satisfying them as
simple as possible. `mixOmics` installs its own dependencies as well. Note that
we install `mixOmics` from the GitHub repository, as this version is more up to
date than the one on Bioconductor and includes a number of bug fixes that are
needed.

Notable dependencies that will be installed if they are not already present:

- mixOmics
- WGCNA
- ggplot2
- dplyr & magrittr
- reshape2

See the [`DESCRIPTION`](OmicsFold/DESCRIPTION) file for a complete
dependency list.
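
As a quick pre-flight check, the following sketch reports whether the running R meets the stated minimum version and which of the notable dependencies listed above are already installed. It uses only base R; the variable names are illustrative and are not part of OmicsFold.

```R
# Packages named in the list above; see DESCRIPTION for the full set.
notable <- c("mixOmics", "WGCNA", "ggplot2", "dplyr", "magrittr", "reshape2")

# TRUE when the running R meets the stated minimum version.
r_ok <- getRversion() >= "3.6.0"

# requireNamespace() reports availability without attaching the package.
installed <- vapply(notable, requireNamespace, logical(1), quietly = TRUE)

# Names of the notable packages that still need installing.
missing_pkgs <- names(installed)[!installed]
```

Anything left in `missing_pkgs` should be picked up by the installation script described in this README.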

### Installation

Because the dependencies are numerous and come from several different sources,
an installation script is provided. To run it, open an R session in your
preferred environment, make sure your working directory is the `OmicsFold`
directory, then issue the following commands:

```R
source('install.R')
install.omicsfold()
```

This should install all the dependencies and then the OmicsFold package
itself. If you run into issues because package versions have changed, or
because the repository that maintains the active version of a package has
changed, you may have to update the script.

If you are having issues installing OmicsFold in a conda environment, please try
the following steps.

First, create the conda environment:

```Shell
conda create --name OmicsFold
source activate OmicsFold
conda install r=3.6.0
conda install -c conda-forge boost-cpp
```

Second, launch R in the conda environment (or in a local R installation, if you are not using conda) and install the following packages manually:

```R
# Install BiocManager if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Bioconductor dependencies.
BiocManager::install("metagenomeSeq")
BiocManager::install("org.Mm.eg.db")

# Dependencies hosted outside CRAN and Bioconductor.
install.packages("XML", repos = "http://www.omegahat.net/R")
source("http://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/GeneAnnotation/installAnRichment.R")
installAnRichment()

# Finally, install the remaining dependencies and OmicsFold itself.
source('install.R')
install.omicsfold()
```

For installation using [Nextflow](https://www.nextflow.io/docs/latest/getstarted.html), please see https://github.com/AstraZeneca/Omicsfold/tree/master/OmicsFold/nextflow_pipeline

### Usage

Load the `OmicsFold` and `mixOmics` packages in R and you're ready to
go. Some functions also require `dplyr` to be loaded, so it's a good idea to
load it anyway. Certain plotting functions may also require `ggplot2` to be
loaded.

```R
library(OmicsFold)
library(mixOmics)
library(dplyr)
library(ggplot2)  # optional; needed by certain plotting functions
```

### Data Normalisation

A number of normalisation functions are provided. Each has documentation
that can be read in the usual way in R. For example, the help for the function
`normalise.tss` can be viewed by calling `?normalise.tss`. A brief description
of the usage of each function can be found in the [Getting Started with
Normalisation](docs/getting-started-normalisation.md) document, which also
includes example code for a few key functions.

- `low.count.removal()`
- `normalise.tss()`
- `normalise.css()`
- `normalise.logit()`
- `normalise.logit.empirical()`
- `normalise.clr()`
- `normalise.clr.within.features()`
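
To give a feel for what two of the transforms in the list above typically do, the sketch below implements total sum scaling (the usual reading of TSS) and a centred log-ratio (CLR) transform in base R. It is not OmicsFold's implementation: the function names `tss_sketch` and `clr_sketch`, the samples-in-rows orientation, and the offset of 1 used to avoid `log(0)` are all assumptions made for illustration. For real analyses, use the package functions and consult their help pages.

```R
# Illustrative only: base-R versions of two common transforms.
# Assumes samples are in rows and features are in columns.
tss_sketch <- function(counts) {
  # Divide every count by its sample (row) total so each row sums to 1.
  sweep(counts, 1, rowSums(counts), "/")
}

clr_sketch <- function(counts, offset = 1) {
  # Add an offset so that zero counts have a finite logarithm,
  # then centre each sample's log values on their mean.
  logged <- log(counts + offset)
  sweep(logged, 1, rowMeans(logged), "-")
}

# A tiny made-up count table: two samples, three features.
counts <- matrix(c(10, 0, 30,
                    5, 5, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("f1", "f2", "f3")))

tss <- tss_sketch(counts)
clr <- clr_sketch(counts)
```

After TSS each sample's values sum to one, and after CLR each sample's values sum to zero. A sample with a total count of zero would produce `NaN` values under this TSS sketch, so such samples should be removed beforehand.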

### Analysis of mixOmics Output

Once a `mixOmics` model has been fitted, OmicsFold can be used to perform a
number of visualisation and data extraction functions. Below is a brief list of
the functionality provided. While these are well documented in the R help
system, descriptions of how to use each function can also be found in the
[Getting Started with Model Analysis](docs/getting-started-model-analysis.md)
document.

- **Model variance analysis** - functions are provided to extract the percentage
  contributions of each component to the model variance and the centroids of
  variance across the blocks of a DIABLO model.
- **Feature analysis for sPLS-DA models** - feature loadings on the fitted
  single-omics model can be exported as a sorted table, and feature stability
  across many sparse model fits can also be exported. As there may be many
  components to export stability for, another function combines these
  into a single table, and a plotting function visualises the stability of the
  selected features.
- **Feature analysis for DIABLO models** - as with the single-omics models
  above, multi-omics models can also have feature loadings and
  stability exported. Associated correlations between features of different
  blocks can be exported as a matrix and then converted to a CSV
  file suitable for importing into Cytoscape, where they can form a network
  graph.
- **Model predictivity** - a function is provided to plot the predictivity of a
  model from a confusion matrix.
- **Utility functions** - a function is provided to truncate long feature names
  passed to plots so that they display cleanly.
- **BlockRank** - implements a novel approach to analysing feature importance
  between blocks of data.
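
The matrix-to-CSV step mentioned for DIABLO models above can be pictured with the base-R sketch below. It does not use OmicsFold's export functions: plain Pearson correlation on random data stands in for the association measure the package exports, and the block names, feature names, column names, and the 0.5 cut-off are all hypothetical.

```R
# Illustrative only: random data standing in for two omics blocks,
# each with the same five samples in rows.
set.seed(1)
block_a <- matrix(rnorm(20), nrow = 5,
                  dimnames = list(NULL, paste0("gene", 1:4)))
block_b <- matrix(rnorm(15), nrow = 5,
                  dimnames = list(NULL, paste0("metab", 1:3)))

# Features of block A in rows, features of block B in columns.
cor_matrix <- cor(block_a, block_b)

# Flatten the matrix into one row per feature pair (an edge list).
edges <- as.data.frame(as.table(cor_matrix), stringsAsFactors = FALSE)
names(edges) <- c("source", "target", "correlation")

# Keep only the stronger associations; the cut-off is arbitrary.
edges <- edges[abs(edges$correlation) >= 0.5, ]

# Write a CSV with one edge per row.
out_file <- file.path(tempdir(), "block_correlations.csv")
write.csv(edges, out_file, row.names = FALSE)
```

A table with one edge per row and explicit source and target columns is a layout that Cytoscape's table import can map onto a network, with the correlation available as an edge attribute.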

## Other Information

To contact the maintainers or project director, please refer to the
[`AUTHORS`](AUTHORS.md) file. If you are thinking of contributing to OmicsFold,
all the information you will need is in the [`CONTRIBUTING`](CONTRIBUTING.md)
file.

OmicsFold is licensed under the [Apache-2.0 software
licence](https://www.apache.org/licenses/LICENSE-2.0) as documented in the
[`LICENCE`](LICENCE.md) file. Separately installed dependencies of OmicsFold
may be licensed under different licence agreements. If you plan to create
derivative works from OmicsFold or use OmicsFold for commercial or profitable
enterprises, please ensure you adhere to all the expectations of these
dependencies and seek legal advice if you are unsure.