Starting in DeepVariant v1.1, make_examples has an optional parameter called
--runtime_by_region, which enables make_examples to output the runtime for
each region, broken down into the stages of processing each region: "get
reads", "find candidates", "make pileup images", and "write outputs". These
stats, along with the number of reads, candidates, and examples processed for
each region, are output into a tab-separated value (TSV) file, or one TSV file
per task if the examples are sharded.
This TSV file can then be visualized using the runtime_by_region_vis script,
which creates a visual report.
Example reports for typical runs:
When using the one-step run_deepvariant script, supply a --logging_dir
directory and set --runtime_report. For example, when following the quick
start, including --runtime_report --logging_dir=/output/logs would produce the
following runtime profiling output file and visual report:
/output/logs/make_examples_runtime_by_region/make_examples_runtime-00000-of-00001.tsv
/output/logs/make_examples_runtime_by_region_report.html
If you are using make_examples itself without the run_deepvariant script, supply
make_examples with --runtime_by_region set to a filename (.tsv). If the examples
are sharded, the runtime profile should be sharded into the same number of
shards, e.g. when using examples@64, the runtimes could be runtimes@64.tsv.
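To make the sharded naming concrete, here is a minimal Python sketch that expands a spec such as runtimes@64.tsv into per-shard filenames. The helper name expand_sharded_spec is hypothetical and not part of DeepVariant. The NNNNN-of-NNNNN pattern is assumed from the output filenames shown on this page.

```python
def expand_sharded_spec(spec):
    """Expand a spec like 'runtimes@64.tsv' into a list of shard filenames.

    Hypothetical helper: the five-digit 'NNNNN-of-NNNNN' pattern is an
    assumption based on filenames such as
    make_examples_runtime-00000-of-00001.tsv.
    """
    prefix, _, rest = spec.partition("@")
    if not rest:
        return [spec]  # No '@' in the spec, so it names a single file.
    count_text, dot, extension = rest.partition(".")
    num_shards = int(count_text)
    return [
        f"{prefix}-{shard:05d}-of-{num_shards:05d}{dot}{extension}"
        for shard in range(num_shards)
    ]


shard_files = expand_sharded_spec("runtimes@64.tsv")
print(shard_files[0])
print(shard_files[-1])
```

Shard numbering in this sketch starts at zero, so 64 shards run from 00000 to 00063.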
Then use the runtime_by_region_vis script to create a visual report of the
make_examples runtime by region data. Continuing from the quick start, it looks
like this:
BIN_VERSION="1.6.1" # Only available in v1.1+.
# INPUT_DIR and OUTPUT_DIR are the shell variables set in the quick start.
docker run \
  -v "${INPUT_DIR}":"/input" \
  -v "${OUTPUT_DIR}":"/output" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/runtime_by_region_vis \
  --input=/output/logs/make_examples_runtime_by_region/make_examples_runtime-00000-of-00001.tsv \
  --output=/output/logs/make_examples_runtime_by_region/make_examples_runtime_vis.html \
  --title="Quick start runtime profiling"
make_examples, the first stage of DeepVariant, starts from a BAM file and
corresponding reference FASTA file. One 1000 bp region at a time (set by
--partition_size), make_examples will:
1. Get reads: Query the BAM files for all the reads in the region. Optionally
   (--realign_reads) do a local assembly of the reads and realign the reads
   to the resulting haplotype graph.
2. Find candidates: Catalogue all the putative alternate alleles for all those
   reads, and compare the accumulated evidence for each alt against thresholds
   (see --vsc_min* parameters for the thresholds).
3. Make pileup images: Represent the reads as a pileup image tensor. When
   --alt_aligned_pileup is enabled, those alignments are included in this
   step.
4. Write outputs: Write the examples composed of candidates and pileup images
   out to a TFRecord file.
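As an illustration of the "one region at a time" loop, the following Python sketch splits a contig into fixed-size regions. It is not DeepVariant's implementation: partition_contig is a hypothetical helper, and the 1-based inclusive coordinates are assumed from region names such as chr1:1-1000 in the runtime file.

```python
def partition_contig(contig, length, partition_size=1000):
    """Yield 1-based, inclusive region strings that cover a contig.

    Hypothetical helper for illustration; partition_size mirrors the
    default of --partition_size mentioned in the text.
    """
    for start in range(1, length + 1, partition_size):
        end = min(start + partition_size - 1, length)
        yield f"{contig}:{start}-{end}"


# A made-up 2500 bp contig, split into regions of at most 1000 bp.
regions = list(partition_contig("chr1", 2500))
print(regions)
```

The last region is shorter than the others whenever the contig length is not a multiple of the partition size.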
This is the top of one example file (tab-delimited but shown nicely spaced here
for readability):
region              get reads  find candidates  make pileup images  write outputs  num reads  num candidates  num examples
chr1:1-1000         0.025      0.016            0.0                 0.0            0          0               0
chr1:64001-65000    0.032      0.044            0.027               0.001          25         1               1
chr1:128001-129000  0.059      0.047            0.081               0.01           59         2               2
chr1:192001-193000  0.058      0.045            0.261               0.078          56         13              13
...
The runtimes for the four stages are all captured in seconds.
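If you want to inspect the numbers without the visual report, the TSV is simple to read directly. The sketch below sums the seconds spent in each stage using Python's csv module. It assumes the column headers are exactly those in the example file above, and stage_totals is a hypothetical helper name.

```python
import csv
import io

# Stage column names, assumed to match the header of the example file.
STAGES = ["get reads", "find candidates", "make pileup images", "write outputs"]

# The four example rows from above, joined with tabs as in the real file.
sample_tsv = "\n".join(
    "\t".join(row)
    for row in [
        ["region", *STAGES, "num reads", "num candidates", "num examples"],
        ["chr1:1-1000", "0.025", "0.016", "0.0", "0.0", "0", "0", "0"],
        ["chr1:64001-65000", "0.032", "0.044", "0.027", "0.001", "25", "1", "1"],
        ["chr1:128001-129000", "0.059", "0.047", "0.081", "0.01", "59", "2", "2"],
        ["chr1:192001-193000", "0.058", "0.045", "0.261", "0.078", "56", "13", "13"],
    ]
)


def stage_totals(tsv_file):
    """Return total seconds per stage from an open runtime-by-region TSV."""
    totals = dict.fromkeys(STAGES, 0.0)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        for stage in STAGES:
            totals[stage] += float(row[stage])
    return totals


totals = stage_totals(io.StringIO(sample_tsv))
for stage, seconds in totals.items():
    print(f"{stage}: {seconds:.3f} s")
```

For a file on disk, pass an open file handle instead of the StringIO object, for example one obtained with open(path, newline="").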
All the charts are built from the same types of data as shown above. Here "task"
will refer to the runtimes captured when producing one shard of examples. For
example, if there are 64 shards, task 1 is the task that produced
examples-00001-of-00064.tfrecord.gz.
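In the example file above, consecutive rows are 64 regions apart (chr1:1-1000, then chr1:64001-65000). That spacing is consistent with regions being dealt to tasks in round-robin order, although this page does not state so. Under that assumption, this Python sketch lists the first region starts a given task would see. region_starts_for_task is a hypothetical helper, not a DeepVariant function.

```python
def region_starts_for_task(task, num_shards, partition_size=1000, count=3):
    """Return the first few 1-based region starts handled by one task.

    Assumes round-robin assignment: region index i goes to task
    i % num_shards. This is inferred from the example file, not confirmed.
    """
    return [
        (task + i * num_shards) * partition_size + 1
        for i in range(count)
    ]


print(region_starts_for_task(task=0, num_shards=64))
print(region_starts_for_task(task=1, num_shards=64))
```

With 64 shards, task 0 yields starts matching the first three example rows, which is the only evidence here for the round-robin assumption.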
make pileup images stage will show more
write outputs