---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/"
)
```

# DeepG <img src="man/figures/logo_small.png"  align="left" vspace="-1800px"/>

**deepG: R toolbox for deep neural networks optimized for genomic datasets**
<!---
% <p><img alt="DeepG logo" height="70px" src="man/figures/logo_small.png" align="left" hspace="-1000px" vspace="-180px"></p>
-->


The goal of the package is to speed up the development of bioinformatics tools for sequence classification, homology detection and related tasks. It is aimed at both biologists and advanced AI researchers. deepG is a collaborative effort of the McHardy Lab at the *Helmholtz Centre for Infection Research*, the Chair of Statistical Learning and Data Science (Prof. Dr. Bernd Bischl) at the *Ludwig Maximilian University of Munich* and the Huttenhower Lab at the *Harvard T.H. Chan School of Public Health*.

[![DOI](https://zenodo.org/badge/387820006.svg)](https://zenodo.org/badge/latestdoi/387820006)

## Overview

The package offers functions for data processing and for creating, training and evaluating neural networks.

+ **Data processing**
  + Different options to encode FASTA/FASTQ files (one-hot encoding, coverage or quality score encoding).
  + Different options to handle ambiguous nucleotides.
  + Create data generators to handle large collections of files.
+ **Deep learning architectures**
  + Create network architectures with a single function call.
  + Custom loss and metric functions available.
+ **Model training**
  + Automatically create a model/data pipeline.
+ **Visualizing training progress**
  + Visualize training progress and metrics in TensorBoard.
+ **Model evaluation**
  + Evaluate trained models.
+ **Model interpretability**
  + Use Integrated Gradients to visualize how a model's predictions relate to its input.
    

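To illustrate what one-hot encoding of a nucleotide sequence means, here is a minimal base R sketch. The helper `one_hot_encode()` is hypothetical and written only for this illustration; it is not the deepG implementation. Mapping ambiguous nucleotides such as `N` to an all-zero row is just one possible choice.

```{r, eval=FALSE}
# Hypothetical helper for illustration only, not part of the deepG API.
# Returns a matrix with one row per position and one column per nucleotide.
one_hot_encode <- function(sequence, vocabulary = c("A", "C", "G", "T")) {
  chars <- strsplit(toupper(sequence), "")[[1]]
  # match() gives NA for characters outside the vocabulary (e.g. "N")
  idx <- match(chars, vocabulary)
  encoded <- matrix(0, nrow = length(chars), ncol = length(vocabulary),
                    dimnames = list(NULL, vocabulary))
  known <- !is.na(idx)
  # Set exactly one entry per known position; ambiguous positions stay all zero
  encoded[cbind(which(known), idx[known])] <- 1
  encoded
}

x <- one_hot_encode("ACGTN")
dim(x)      # 5 positions x 4 nucleotides
rowSums(x)  # the last row sums to 0 because "N" is ambiguous
```

Other strategies for ambiguous positions, for example spreading the weight equally over all four columns, would only change the rows where `known` is `FALSE`.
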
## Installation

Install the TensorFlow Python package

```{r, eval=FALSE, message=FALSE}
install.packages("tensorflow")
tensorflow::install_tensorflow()
```

and afterwards install the latest version of deepG from GitHub

```{r, eval=FALSE, message=FALSE}
devtools::install_github("GenomeNet/deepG")
```
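
As an optional sanity check after the installation steps, the following sketch reports whether both packages can be found by R. It relies only on base R; the variable name `installed` is chosen for this example.

```{r, eval=FALSE}
# Check that both packages are installed without attaching them.
pkgs <- c("tensorflow", "deepG")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

for (pkg in pkgs) {
  if (installed[[pkg]]) {
    message(pkg, " ", as.character(utils::packageVersion(pkg)))
  } else {
    message(pkg, " is not installed")
  }
}
# If tensorflow is installed, tensorflow::tf_config() additionally reports
# which Python installation and TensorFlow build R is using.
```
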

```{r, echo=FALSE, warning=FALSE, message=FALSE}
devtools::load_all(path = "~/deepG")
```

## Usage 

See the package website at <https://deepg.de> for documentation and example code.

 <!-- ## Examples  -->

<!-- ## Datasets -->

<!-- The library comes with mutiple different datasets for testing: -->

<!-- - The set `data(parenthesis)` contains 100k characters of the parenthesis synthetic language generated from a very simple counting language with a parenthesis and letter alphabet Σ = {( ) 0 1 2 3 4 }. The language is constrained to match parentheses, and nesting is limited to at most 4 levels deep. Each opening parenthesis increases and each closing parenthesis decreases the nesting level, respectively. Numbers are generated randomly, but are constrained to indicate the nesting level at their position. -->
<!-- - The set `data(crispr_full)` containing all CRISPR loci found in NCBI representative genomes with neighbor nucleotides up and downstream. -->
<!-- - The set `data(crispr_sample)` containing a subset of `data(crispr_full)`. -->
<!-- - The set `data(ecoli)` contains the *E. coli* genome, see [the genome sequence of Escherichia coli K-12](https://science.sciencemag.org/content/277/5331/1453.long). -->
<!-- - The set `data(ecoli_small)` contains a subset of `data(ecoli)`. -->

<!---
## Installation and Usage

Please see our [Wiki](https://github.com/hiddengenome/deepG/wiki) for further installation instructions. It covers also usage instructions for multi-GPU machines.

- [Installation on desktop machine](https://github.com/hiddengenome/deepG/wiki/Installation-of-deepG-on-desktop)
- [Installation on GPU server](https://github.com/hiddengenome/deepG/wiki/Installation-of-deepG-on-GPU-server)
- [Installation AWS](https://github.com/hiddengenome/deepG/wiki/Installation-AWS)
- [GPU Usage](https://github.com/hiddengenome/deepG/wiki/manage-GPU-usage)
- [Tensorboard Integration](https://github.com/hiddengenome/deepG/wiki/Tensorboard-integration)

See the help files `?deepG` to get started and for questions use the [FAQ](https://github.com/hiddengenome/deepG/wiki/FAQ).
-->