# DeepG <img src="man/figures/logo_small.png" align="left" vspace="-1800px"/>

**deepG: toolbox for deep neural networks optimized for genomic datasets** <!---
% <p><img alt="DeepG logo" height="70px" src="man/figures/logo_small.png" align="left" hspace="-1000px" vspace="-180px"></p>
-->

The goal of the package is to speed up the development of bioinformatics tools for sequence classification, homology detection and other bioinformatics tasks. It is developed for both biologists and advanced AI researchers. DeepG is a collaborative effort from the McHardy Lab at the *Helmholtz Centre for Infection Research*, the Chair of Statistical Learning and Data Science at the *Ludwig Maximilian University of Munich* and the Huttenhower Lab at *Harvard T.H. Chan School of Public Health*.

[DOI](https://zenodo.org/badge/latestdoi/387820006)

## Overview

The package offers functions to create, train and evaluate neural networks, as well as functions for data processing.

- **Data processing**
  - Create data generators to handle large collections of files.
  - Encode fasta/fastq files in different ways (one-hot encoding, coverage or quality score encoding).
  - Handle ambiguous nucleotides with different options.
- **Deep learning architectures**
  - Create network architectures with a single function call.
  - Use custom loss and metric functions.
- **Model training**
  - Automatically create the model/data pipeline.
- **Visualizing training progress**
  - Visualize training progress and metrics in TensorBoard.
- **Model evaluation**
  - Evaluate trained models.
- **Model interpretability**
  - Use Integrated Gradients to visualize the relationship between a model’s predictions and its input.

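
To illustrate what one-hot encoding with a policy for ambiguous nucleotides means, here is a minimal base-R sketch that turns a DNA string into a position-by-nucleotide matrix. The function `one_hot_dna()` and its `ambiguous` argument are hypothetical and not part of the deepG API; deepG's own encoding functions and their options may work differently (see <https://deepg.de>).

``` r
# Hypothetical helper for illustration only; not part of the deepG API.
one_hot_dna <- function(sequence, ambiguous = c("zero", "uniform")) {
  ambiguous <- match.arg(ambiguous)
  vocabulary <- c("A", "C", "G", "T")
  bases <- strsplit(toupper(sequence), "")[[1]]
  # One row per sequence position, one column per nucleotide
  encoding <- matrix(0, nrow = length(bases), ncol = length(vocabulary),
                     dimnames = list(NULL, vocabulary))
  idx <- match(bases, vocabulary)
  known <- !is.na(idx)
  # Set a single 1 in the column of each known nucleotide
  encoding[cbind(which(known), idx[known])] <- 1
  # Characters outside A/C/G/T (e.g. N) stay all-zero rows,
  # or are spread evenly over the four nucleotides
  if (ambiguous == "uniform") {
    encoding[!known, ] <- 1 / length(vocabulary)
  }
  encoding
}

one_hot_dna("ACGTN")
one_hot_dna("ACGTN", ambiguous = "uniform")
```

With `ambiguous = "zero"`, the trailing `N` in `"ACGTN"` becomes an all-zero row; with `"uniform"`, it becomes a row of `0.25` values. Zero rows leave ambiguous positions uninformative, while uniform rows keep every row summing to 1.
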
|
|
## Installation

Install TensorFlow: first the R `tensorflow` package, then the TensorFlow Python package via `install_tensorflow()`:

``` r
# R interface to TensorFlow
install.packages("tensorflow")
# Download and install the TensorFlow Python package used as backend
tensorflow::install_tensorflow()
```

Afterwards, install the latest version of deepG from GitHub:

``` r
# devtools is needed to install packages from GitHub
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("GenomeNet/deepG")
```

## Usage

See the package website at <https://deepg.de> for documentation and example code.

<!-- ## Examples -->

<!-- ## Datasets -->
<!-- The library comes with multiple datasets for testing: -->
<!-- - The set `data(parenthesis)` contains 100k characters of the parenthesis synthetic language generated from a very simple counting language with a parenthesis and letter alphabet Σ = {( ) 0 1 2 3 4 }. The language is constrained to match parentheses, and nesting is limited to at most 4 levels deep. Each opening parenthesis increases and each closing parenthesis decreases the nesting level, respectively. Numbers are generated randomly, but are constrained to indicate the nesting level at their position. -->
<!-- - The set `data(crispr_full)` contains all CRISPR loci found in NCBI representative genomes, with neighboring nucleotides upstream and downstream. -->
<!-- - The set `data(crispr_sample)` contains a subset of `data(crispr_full)`. -->
<!-- - The set `data(ecoli)` contains the *E. coli* genome, see [the genome sequence of Escherichia coli K-12](https://science.sciencemag.org/content/277/5331/1453.long). -->
<!-- - The set `data(ecoli_small)` contains a subset of `data(ecoli)`. -->

<!---
## Installation and Usage
Please see our [Wiki](https://github.com/hiddengenome/deepG/wiki) for further installation instructions. It also covers usage instructions for multi-GPU machines.
- [Installation on desktop machine](https://github.com/hiddengenome/deepG/wiki/Installation-of-deepG-on-desktop)
- [Installation on GPU server](https://github.com/hiddengenome/deepG/wiki/Installation-of-deepG-on-GPU-server)
- [Installation AWS](https://github.com/hiddengenome/deepG/wiki/Installation-AWS)
- [GPU Usage](https://github.com/hiddengenome/deepG/wiki/manage-GPU-usage)
- [Tensorboard Integration](https://github.com/hiddengenome/deepG/wiki/Tensorboard-integration)
To get started, see the help files (`?deepG`); for questions, use the [FAQ](https://github.com/hiddengenome/deepG/wiki/FAQ).
-->