|
a |
|
b/softwares_config/htseq_2.0.1.config |
|
|
1 |
usage: htseq-count [-h] [--version] [-f {sam,bam,auto}] [-r {pos,name}] |
|
|
2 |
[--max-reads-in-buffer MAX_BUFFER_SIZE] |
|
|
3 |
[-s {yes,no,reverse}] [-a MINAQUAL] [-t FEATURE_TYPE] |
|
|
4 |
[-i IDATTR] [--additional-attr ADDITIONAL_ATTRIBUTES] |
|
|
5 |
[--add-chromosome-info] |
|
|
6 |
[-m {union,intersection-strict,intersection-nonempty}] |
|
|
7 |
[--nonunique {none,all,fraction,random}] |
|
|
8 |
[--secondary-alignments {score,ignore}] |
|
|
9 |
[--supplementary-alignments {score,ignore}] [-o SAMOUTS] |
|
|
10 |
[-p {SAM,BAM,sam,bam}] [-d OUTPUT_DELIMITER] |
|
|
11 |
[-c OUTPUT_FILENAME] [--counts-output-sparse] |
|
|
12 |
[--append-output] [-n NPROCESSES] |
|
|
13 |
[--feature-query FEATURE_QUERY] [-q] [--with-header] |
|
|
14 |
samfilenames [samfilenames ...] featuresfilename |
|
|
15 |
|
|
|
16 |
This script takes one or more alignment files in SAM/BAM format and a feature |
|
|
17 |
file in GFF format and calculates for each feature the number of reads mapping |
|
|
18 |
to it. See http://htseq.readthedocs.io/en/master/count.html for details. |
|
|
19 |
|
|
|
20 |
positional arguments: |
|
|
21 |
samfilenames Path to the SAM/BAM files containing the mapped reads. |
|
|
22 |
If '-' is selected, read from standard input |
|
|
23 |
featuresfilename Path to the GTF file containing the features |
|
|
24 |
|
|
|
25 |
optional arguments: |
|
|
26 |
-h, --help show this help message and exit |
|
|
27 |
--version Show software version and exit |
|
|
28 |
-f {sam,bam,auto}, --format {sam,bam,auto} |
|
|
29 |
Type of <alignment_file> data. DEPRECATED: file format |
|
|
30 |
is detected automatically. This option is ignored. |
|
|
31 |
-r {pos,name}, --order {pos,name} |
|
|
32 |
'pos' or 'name'. Sorting order of <alignment_file> |
|
|
33 |
(default: name). Paired-end sequencing data must be |
|
|
34 |
sorted either by position or by read name, and the |
|
|
35 |
sorting order must be specified. Ignored for single- |
|
|
36 |
end data. |
|
|
37 |
--max-reads-in-buffer MAX_BUFFER_SIZE |
|
|
38 |
When <alignment_file> is paired end sorted by |
|
|
39 |
position, allow only so many reads to stay in memory |
|
|
40 |
until the mates are found (raising this number will |
|
|
41 |
use more memory). Has no effect for single end or |
|
|
42 |
paired end sorted by name |
|
|
43 |
-s {yes,no,reverse}, --stranded {yes,no,reverse} |
|
|
44 |
Whether the data is from a strand-specific assay. |
|
|
45 |
Specify 'yes', 'no', or 'reverse' (default: yes). |
|
|
46 |
'reverse' means 'yes' with reversed strand |
|
|
47 |
interpretation |
|
|
48 |
-a MINAQUAL, --minaqual MINAQUAL |
|
|
49 |
Skip all reads with MAPQ alignment quality lower than |
|
|
50 |
the given minimum value (default: 10). MAPQ is the 5th |
|
|
51 |
column of a SAM/BAM file and its usage depends on the |
|
|
52 |
software used to map the reads. |
|
|
53 |
-t FEATURE_TYPE, --type FEATURE_TYPE |
|
|
54 |
Feature type (3rd column in GTF file) to be used, all |
|
|
55 |
features of other type are ignored (default, suitable |
|
|
56 |
for Ensembl GTF files: exon) |
|
|
57 |
-i IDATTR, --idattr IDATTR |
|
|
58 |
GTF attribute to be used as feature ID (default, |
|
|
59 |
suitable for Ensembl GTF files: gene_id). All features |
|
|
60 |
of the right type (see -t option) within the same GTF |
|
|
61 |
attribute will be added together. The typical way of |
|
|
62 |
using this option is to count all exonic reads from |
|
|
63 |
each gene and add the exons but other uses are |
|
|
64 |
possible as well. You can call this option multiple |
|
|
65 |
times: in that case, the combination of all attributes |
|
|
66 |
separated by colons (:) will be used as a unique |
|
|
67 |
identifier, e.g. for exons you might use -i gene_id -i |
|
|
68 |
exon_number. |
|
|
69 |
--additional-attr ADDITIONAL_ATTRIBUTES |
|
|
70 |
Additional feature attributes (default: none, suitable |
|
|
71 |
for Ensembl GTF files: gene_name). Use multiple times |
|
|
72 |
for more than one additional attribute. These |
|
|
73 |
attributes are only used as annotations in the output, |
|
|
74 |
while the determination of how the counts are added |
|
|
75 |
together is done based on option -i. |
|
|
76 |
--add-chromosome-info |
|
|
77 |
Store information about the chromosome of each feature |
|
|
78 |
as an additional attribute (e.g. column in the TSV |
|
|
79 |
output file). |
|
|
80 |
-m {union,intersection-strict,intersection-nonempty}, --mode {union,intersection-strict,intersection-nonempty} |
|
|
81 |
Mode to handle reads overlapping more than one feature |
|
|
82 |
(choices: union, intersection-strict, intersection- |
|
|
83 |
nonempty; default: union) |
|
|
84 |
--nonunique {none,all,fraction,random} |
|
|
85 |
Whether and how to score reads that are not uniquely |
|
|
86 |
aligned or ambiguously assigned to features (choices: |
|
|
87 |
none, all, fraction, random; default: none) |
|
|
88 |
--secondary-alignments {score,ignore} |
|
|
89 |
Whether to score secondary alignments (0x100 flag) |
|
|
90 |
--supplementary-alignments {score,ignore} |
|
|
91 |
Whether to score supplementary alignments (0x800 flag) |
|
|
92 |
-o SAMOUTS, --samout SAMOUTS |
|
|
93 |
Write out all SAM alignment records into SAM/BAM files |
|
|
94 |
(one per input file needed), annotating each line with |
|
|
95 |
its feature assignment (as an optional field with tag |
|
|
96 |
'XF'). See the -p option to use BAM instead of SAM. |
|
|
97 |
-p {SAM,BAM,sam,bam}, --samout-format {SAM,BAM,sam,bam} |
|
|
98 |
Format to use with the --samout option. |
|
|
99 |
-d OUTPUT_DELIMITER, --delimiter OUTPUT_DELIMITER |
|
|
100 |
Column delimiter in output (default: TAB). |
|
|
101 |
-c OUTPUT_FILENAME, --counts_output OUTPUT_FILENAME |
|
|
102 |
Filename to output the counts to instead of stdout. |
|
|
103 |
--counts-output-sparse |
|
|
104 |
Store the counts as a sparse matrix (mtx, h5ad, loom). |
|
|
105 |
--append-output Append counts output to an existing file instead of |
|
|
106 |
creating a new one. This option is useful if you have |
|
|
107 |
already created a TSV/CSV/similar file with a header |
|
|
108 |
for your samples (with additional columns for the |
|
|
109 |
feature name and any additional attributes) and want to |
|
|
110 |
fill in the rest of the file. |
|
|
111 |
-n NPROCESSES, --nprocesses NPROCESSES |
|
|
112 |
Number of parallel CPU processes to use (default: 1). |
|
|
113 |
This option is useful to process several input files |
|
|
114 |
at once. Each file will use only 1 CPU. It is |
|
|
115 |
possible, of course, to split a very large input |
|
|
116 |
SAM/BAM file into smaller chunks upstream to make use |
|
|
117 |
of this option. |
|
|
118 |
--feature-query FEATURE_QUERY |
|
|
119 |
Restrict to features described in this expression. |
|
|
120 |
Currently supports a single kind of expression: |
|
|
121 |
attribute == "one attr" to restrict the GFF to a |
|
|
122 |
single gene or transcript, e.g. --feature-query |
|
|
123 |
'gene_name == "ACTB"' - notice the single quotes |
|
|
124 |
around the argument of this option and the double |
|
|
125 |
quotes around the gene name. Broader queries might |
|
|
126 |
become available in the future. |
|
|
127 |
-q, --quiet Suppress progress report |
|
|
128 |
--with-header Whether to add a column header to the output TSV file |
|
|
129 |
indicating which column corresponds to which input BAM |
|
|
130 |
file. Only used if output to console or tsv or csv |
|
|
131 |
file. Defaults to False. |
|
|
132 |
|
|
|
133 |
Written by Simon Anders (sanders@fs.tum.de), European Molecular Biology |
|
|
134 |
Laboratory (EMBL), Givanna Putri (g.putri@unsw.edu.au) and Fabio Zanini |
|
|
135 |
(fabio.zanini@unsw.edu.au), UNSW Sydney. (c) 2010-2021. Released under the |
|
|
136 |
terms of the GNU General Public License v3. Please cite the following paper if |
|
|
137 |
you use this script: G. Putri et al. Analysing high-throughput sequencing data |
|
|
138 |
in Python with HTSeq 2.0. Bioinformatics (2022). |
|
|
139 |
https://doi.org/10.1093/bioinformatics/btac166. Part of the 'HTSeq' framework, |
|
|
140 |
version 2.0.1. |
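
The options listed above can be assembled into a command line programmatically. The following is a minimal sketch, not part of HTSeq itself: the helper name `build_htseq_count_command` and the file names (`sample1.bam`, `sample2.bam`, `genes.gtf`, `counts.tsv`) are hypothetical, and the defaults mirror the ones stated in the help text. It only builds the argument list and prints a shell-quoted version; it does not run `htseq-count`.

```python
import shlex


def build_htseq_count_command(
    alignment_files,
    features_file,
    order="name",          # -r default per the help text
    stranded="yes",        # -s default per the help text
    min_aqual=10,          # -a default per the help text
    feature_type="exon",   # -t default per the help text
    id_attrs=("gene_id",), # -i may be given multiple times
    mode="union",          # -m default per the help text
    nonunique="none",      # --nonunique default per the help text
    counts_output=None,
    feature_query=None,
    with_header=False,
):
    """Return an argv list for htseq-count (hypothetical helper; nothing is executed)."""
    if not alignment_files:
        raise ValueError("at least one SAM/BAM file is required")

    cmd = [
        "htseq-count",
        "-r", order,
        "-s", stranded,
        "-a", str(min_aqual),
        "-t", feature_type,
    ]
    # Repeating -i combines the attributes into one identifier.
    for attr in id_attrs:
        cmd += ["-i", attr]
    cmd += ["-m", mode, "--nonunique", nonunique]
    if counts_output is not None:
        cmd += ["-c", counts_output]
    if feature_query is not None:
        # Passed as a single argv element, so no manual shell quoting is needed.
        cmd += ["--feature-query", feature_query]
    if with_header:
        cmd.append("--with-header")

    # Positional arguments: alignment files first, features file last.
    cmd += list(alignment_files)
    cmd.append(features_file)
    return cmd


command = build_htseq_count_command(
    ["sample1.bam", "sample2.bam"],   # hypothetical input files
    "genes.gtf",                      # hypothetical annotation file
    order="pos",
    stranded="reverse",
    counts_output="counts.tsv",       # hypothetical output file
    feature_query='gene_name == "ACTB"',
    with_header=True,
)
# shlex.join adds the single quotes the help text asks for around the query.
print(shlex.join(command))
```

When `htseq-count` is installed, the list could be handed to `subprocess.run(command, check=True)`. Passing a list rather than a shell string avoids the quoting pitfalls that the `--feature-query` description warns about.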