.. _oddt::
.. highlight:: python

********************************
Welcome to ODDT's documentation!
********************************

.. contents::
    :depth: 5

Installation
============

Requirements
````````````

* Python 3.6+
* OpenBabel (3.0+) and/or RDKit (2018.03+)
* NumPy (1.12+)
* SciPy (0.19+)
* scikit-learn (0.18+)
* joblib (0.10+)
* pandas (0.19.2+)
* scikit-image (0.12.3+) (optional, only for surface generation)

.. note:: All installation methods assume that at least one of the toolkits is already installed. For a detailed installation procedure, visit the toolkit's website (OpenBabel, RDKit).

ODDT is most conveniently installed with pip. All required Python modules are installed automatically, but a toolkit, either OpenBabel (``pip install openbabel``) or RDKit, must be installed manually.

.. code-block:: bash

    pip install oddt

To install the cutting-edge version (the master branch from GitHub), also using pip:

.. code-block:: bash

    pip install git+https://github.com/oddt/oddt.git@master

Finally, you can install ODDT straight from the source:

.. code-block:: bash

    wget https://github.com/oddt/oddt/archive/0.8.tar.gz
    tar zxvf 0.8.tar.gz
    cd oddt-0.8/
    python setup.py install
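
Because ODDT needs at least one toolkit, it can help to check which ones Python can locate before installing. The sketch below uses only the standard library. The helper name ``available_toolkits`` is hypothetical, and the import names ``openbabel`` and ``rdkit`` are assumed to match the installed packages.

.. code-block:: python

    import importlib.util


    def available_toolkits(names=("openbabel", "rdkit")):
        """Map each candidate module name to True if Python can locate it."""
        return {name: importlib.util.find_spec(name) is not None for name in names}


    # Report which of the assumed toolkit modules are importable.
    for name, found in available_toolkits().items():
        print(name, "found" if found else "missing")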

Common installation problems
````````````````````````````


Usage Instructions
==================

Toolkits
--------

You can use any supported toolkit through a common API (for reference see `Pybel <https://open-babel.readthedocs.org/en/latest/UseTheLibrary/Python_Pybel.html>`_ or `Cinfony <https://code.google.com/p/cinfony/>`_). All methods and software based on Pybel/Cinfony should be drop-in compatible with ODDT toolkits. In contrast to its predecessors, which aimed for a minimalistic API, ODDT introduces extended methods and additional handles. These extensions let you use each toolkit to its full potential, and some features may be backported from one toolkit to another to fill in missing functionality.
To name a few:

* coordinates are returned as Numpy arrays
* ``atoms`` and ``residues`` of the Molecule class are lazy, i.e. they do not return a list of pointers but an object which allows indexing and iterating through atoms/residues
* Bond object (similar to Atom)
* `atom_dict`_, `ring_dict`_, `res_dict`_ - comprehensive Numpy arrays containing common information about a given entity, particularly useful for high-performance computing, e.g. interactions, scoring etc.
* lazy Molecule (asynchronous), which is not converted to an object while reading, but is passed as a string and parsed when the underlying object is first accessed
* pickling introduced for the Pybel Molecule (internally saved as a mol2 string)

Molecules
---------

Atom, residues, bonds iteration
```````````````````````````````

One of the most common operations is iterating through a molecule's atoms:

.. code-block:: python

    import oddt

    mol = oddt.toolkit.readstring('smi', 'c1ccccc1')
    for atom in mol:
        print(atom.idx)

.. note:: ``mol.atoms`` returns an object (:class:`~oddt.toolkit.AtomStack`) which can be accessed via indices or iterated.

Iterating over residues is also very convenient, especially for proteins:

.. code-block:: python

    for res in mol.residues:
        print(res.name)

Additionally, residues can fetch the atoms belonging to them:

.. code-block:: python

    for res in mol.residues:
        for atom in res:
            print(atom.idx)

Bonds are also iterable, similar to residues:

.. code-block:: python

    for bond in mol.bonds:
        print(bond.order)
        for atom in bond:
            print(atom.idx)
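
The note earlier in this section describes ``mol.atoms`` as an object that supports indexing and iteration rather than a plain list. The general pattern can be sketched in plain Python as follows. ``LazyStack``, ``make_atom`` and the ``fetch`` callback are hypothetical names used for illustration only; this is not ODDT's implementation.

.. code-block:: python

    class LazyStack:
        """Sequence-like view that builds items only when they are requested."""

        def __init__(self, size, fetch):
            self._size = size    # number of items available
            self._fetch = fetch  # callable: 0-based index -> item

        def __len__(self):
            return self._size

        def __getitem__(self, index):
            if isinstance(index, slice):
                return [self._fetch(i) for i in range(*index.indices(self._size))]
            if index < 0:
                index += self._size
            if not 0 <= index < self._size:
                raise IndexError("index out of range")
            return self._fetch(index)

        def __iter__(self):
            return (self._fetch(i) for i in range(self._size))


    created = []  # records which items were actually built


    def make_atom(i):
        """Stand-in for wrapping a toolkit atom; stores both index conventions."""
        created.append(i)
        return {"idx0": i, "idx1": i + 1}


    atoms = LazyStack(6, make_atom)
    print(atoms[2])      # builds a single item
    print(len(created))  # number of items built so far

Nothing is materialized when the stack is created, which keeps access cheap for large proteins where only a few atoms are inspected.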

Reading molecules
`````````````````

Reading molecules is mostly identical to `Pybel <https://open-babel.readthedocs.org/en/latest/UseTheLibrary/Python_Pybel.html>`_.

Reading from file:

.. code-block:: python

    for mol in oddt.toolkit.readfile('smi', 'test.smi'):
        print(mol.title)

Reading from string:

.. code-block:: python

    mol = oddt.toolkit.readstring('smi', 'c1ccccc1 benzene')
    print(mol.title)

.. note:: You can force molecules to be read asynchronously, as so-called “lazy molecules”. The current default is not to produce lazy molecules, because of OpenBabel’s memory leaks in OBConverter. The main advantage of lazy molecules shows in multiprocessing, where the conversion is spread across all jobs.

Reading molecules from a file in an asynchronous manner:

.. code-block:: python

    for mol in oddt.toolkit.readfile('smi', 'test.smi', lazy=True):
        pass

This example executes almost instantaneously, since no molecules are evaluated.
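
The idea behind lazy molecules can be sketched in plain Python: keep the source text and defer parsing until the parsed form is first needed. ``LazyRecord`` and ``parse_smiles_line`` are hypothetical names for illustration only, and the toy parser merely splits a line into two fields; this is not how ODDT itself is implemented.

.. code-block:: python

    def parse_smiles_line(line):
        """Toy parser: split a 'SMILES title' line into its two fields."""
        smiles, _, title = line.strip().partition(" ")
        return {"smiles": smiles, "title": title}


    class LazyRecord:
        """Hold the raw text and parse it on first access only."""

        def __init__(self, source):
            self._source = source
            self._parsed = None  # filled in lazily

        @property
        def parsed(self):
            if self._parsed is None:
                self._parsed = parse_smiles_line(self._source)
            return self._parsed

        @property
        def title(self):
            return self.parsed["title"]


    lines = ["c1ccccc1 benzene", "CCO ethanol"]
    records = [LazyRecord(line) for line in lines]  # cheap: nothing parsed yet
    print(records[1].title)                         # parsing happens here

Because such a record is only a string until it is used, it is cheap to hand to worker processes, which is the multiprocessing advantage mentioned in the note above.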

Numpy Dictionaries - store your molecule as a uniform structure
````````````````````````````````````````````````````````````````

The most important and handy property of a Molecule in ODDT is its set of Numpy dictionaries, which contain most properties of the supplied molecule. Some of them are straightforward; others, e.g. atom features, require some calculation. Dictionaries are provided for the major entities of a molecule: atoms, bonds, residues and rings. They were primarily designed for interaction calculations, although they are applicable to any other calculation. The main benefit is Numpy broadcasting and subsetting.

Each dictionary is defined as a Numpy structured data type (dtype).
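
As a rough illustration of such a structure, the sketch below builds a small Numpy structured array by hand. The dtype is a hypothetical, shortened selection of the ``atom_dict`` fields documented in the next section, and the three atoms are made up.

.. code-block:: python

    import numpy as np

    # Hypothetical, shortened dtype modelled on a few atom_dict fields.
    atom_dtype = np.dtype([
        ("coords", np.float32, 3),
        ("charge", np.float32),
        ("atomicnum", np.int8),
        ("atomtype", "S4"),
        ("isacceptor", bool),
    ])

    # Three made-up atoms; the values are for illustration only.
    atoms = np.zeros(3, dtype=atom_dtype)
    atoms["coords"] = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [2.4, 0.5, 0.0]]
    atoms["atomicnum"] = [6, 8, 7]
    atoms["atomtype"] = [b"C.3", b"O.3", b"N.3"]
    atoms["isacceptor"] = [False, True, True]

    print(atoms["coords"].shape)                    # one coordinate row per atom
    print(atoms[atoms["isacceptor"]]["atomicnum"])  # subset rows with a boolean field

Every field is a column that can be read for all atoms at once, and a boolean field doubles as a mask for selecting rows.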

atom_dict
---------

Atom basic information

* '*coords*', type: ``float32``, shape: (3) - atom coordinates
* '*charge*', type: ``float32`` - atom's charge
* '*atomicnum*', type: ``int8`` - atomic number
* '*atomtype*', type: ``a4`` - Sybyl atom type
* '*hybridization*', type: ``int8`` - atom's hybridization
* '*neighbors*', type: ``float32``, shape: (4,3) - coordinates of non-H neighbors, used for angle calculations (a maximum of 4 neighbors should be enough)

Residue information for current atom

* '*resid*', type: ``int16`` - residue ID
* '*resnum*', type: ``int16`` - residue number
* '*resname*', type: ``a3`` - residue name (3 letters)
* '*isbackbone*', type: ``bool`` - is atom part of backbone

Atom properties

* '*isacceptor*', type: ``bool`` - is atom H-bond acceptor
* '*isdonor*', type: ``bool`` - is atom H-bond donor
* '*isdonorh*', type: ``bool`` - is atom H-bond donor hydrogen
* '*ismetal*', type: ``bool`` - is atom a metal
* '*ishydrophobe*', type: ``bool`` - is atom hydrophobic
* '*isaromatic*', type: ``bool`` - is atom aromatic
* '*isminus*', type: ``bool`` - is atom negatively charged/chargeable
* '*isplus*', type: ``bool`` - is atom positively charged/chargeable
* '*ishalogen*', type: ``bool`` - is atom a halogen

Secondary structure

* '*isalpha*', type: ``bool`` - is atom a part of alpha helix
* '*isbeta*', type: ``bool`` - is atom a part of beta strand


ring_dict
---------

* '*centroid*', type: ``float32``, shape: (3) - coordinates of ring's centroid
* '*vector*', type: ``float32``, shape: (3) - normal vector for ring
* '*isalpha*', type: ``bool`` - is ring a part of alpha helix
* '*isbeta*', type: ``bool`` - is ring a part of beta strand

res_dict
--------

* '*id*', type: ``int16`` - residue ID
* '*resnum*', type: ``int16`` - residue number
* '*resname*', type: ``a3`` - residue name (3 letters)
* '*N*', type: ``float32``, shape: (3) - coordinates of backbone N atom
* '*CA*', type: ``float32``, shape: (3) - coordinates of backbone CA atom
* '*C*', type: ``float32``, shape: (3) - coordinates of backbone C atom
* '*isalpha*', type: ``bool`` - is residue a part of alpha helix
* '*isbeta*', type: ``bool`` - is residue a part of beta strand


.. note:: All aforementioned dictionaries are generated “on demand” and cached for the molecule, so they can be shared between calculations. Caching brings a substantial performance gain, since in some applications generating the dictionaries is the most time-consuming task.

Get all acceptor atoms:

.. code-block:: python

    # the boolean 'isacceptor' column is used as a mask to select acceptor rows
    mol.atom_dict[mol.atom_dict['isacceptor']]
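
Subsetting and broadcasting combine naturally, for example when measuring every acceptor-donor distance between two molecules. The sketch below uses hand-made structured arrays as stand-ins for two ``atom_dict`` arrays; the coordinates and the ``cutoff`` value are arbitrary assumptions, not ODDT defaults.

.. code-block:: python

    import numpy as np

    # Hypothetical stand-ins for the atom_dict arrays of two molecules.
    dtype = np.dtype([("coords", np.float32, 3), ("isacceptor", bool), ("isdonor", bool)])

    mol_a = np.zeros(3, dtype=dtype)
    mol_a["coords"] = [[0, 0, 0], [5, 0, 0], [0, 9, 0]]
    mol_a["isacceptor"] = [True, True, False]

    mol_b = np.zeros(2, dtype=dtype)
    mol_b["coords"] = [[0, 0, 3], [5, 4, 0]]
    mol_b["isdonor"] = [True, True]

    acceptors = mol_a[mol_a["isacceptor"]]  # subsetting with a boolean mask
    donors = mol_b[mol_b["isdonor"]]

    # Broadcasting: (n_acc, 1, 3) - (1, n_don, 3) -> (n_acc, n_don, 3)
    diff = acceptors["coords"][:, None, :] - donors["coords"][None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    cutoff = 3.5  # arbitrary distance threshold for this example
    acc_idx, don_idx = np.nonzero(dist <= cutoff)
    print(dist.shape, list(zip(acc_idx.tolist(), don_idx.tolist())))

The full distance matrix is computed without an explicit Python loop, which is where most of the speed-up comes from.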

Interaction Fingerprints
````````````````````````

This module calculates interactions between two molecules and stores them in a fingerprint.

The most common usage
---------------------

First, load the files:

.. code-block:: python

    protein = next(oddt.toolkit.readfile('pdb', 'protein.pdb'))
    protein.protein = True
    ligand = next(oddt.toolkit.readfile('sdf', 'ligand.sdf'))

.. note:: You have to mark the molecule read from the file as a protein, otherwise you won't be able to access e.g. 'resname', 'resid' etc. This can be done as shown above.

A file with more than one molecule:

.. code-block:: python

  mols = list(oddt.toolkit.readfile('sdf', 'ligands.sdf'))

Once the files are loaded, you can check interactions between the molecules. Let's find out which amino acids form hydrogen bonds:

.. code-block:: python

  from oddt.interactions import hbonds

  protein_atoms, ligand_atoms, strict = hbonds(protein, ligand)
  print(protein_atoms['resname'])

Or check hydrophobic contacts between the molecules:

.. code-block:: python

  from oddt.interactions import hydrophobic_contacts

  protein_atoms, ligand_atoms = hydrophobic_contacts(protein, ligand)
  print(protein_atoms, ligand_atoms)
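
Several interacting atoms often belong to the same residue, so the per-atom output can be collapsed into a list of distinct residues. The sketch below uses a made-up structured array in place of a real ``protein_atoms`` result; only the field names ``resnum`` and ``resname`` are taken from ``atom_dict``.

.. code-block:: python

    import numpy as np

    # Hypothetical rows standing in for the protein atoms returned by an interaction function.
    dtype = np.dtype([("resnum", np.int16), ("resname", "S3")])
    protein_atoms = np.array(
        [(57, b"HIS"), (102, b"ASP"), (57, b"HIS"), (195, b"SER")],
        dtype=dtype,
    )

    # Collapse per-atom rows into a sorted list of distinct (number, name) residues.
    residues = sorted({
        (int(num), name.decode())
        for num, name in zip(protein_atoms["resnum"], protein_atoms["resname"])
    })
    print(residues)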

Instead of checking interactions one by one, you can use the fingerprints module:

.. code-block:: python

  from oddt.fingerprints import InteractionFingerprint, SimpleInteractionFingerprint

  IFP = InteractionFingerprint(ligand, protein)
  SIFP = SimpleInteractionFingerprint(ligand, protein)

Very often we are looking for similar molecules. This can be accomplished, for example, as follows:

.. code-block:: python

  from oddt.fingerprints import SimpleInteractionFingerprint, dice

  def find_similar(query, ligand, protein, score):
      """Return molecules from ``query`` whose fingerprint resembles the reference ligand's."""
      results = []
      reference = SimpleInteractionFingerprint(ligand, protein)
      for el in query:
          fp_query = SimpleInteractionFingerprint(el, protein)
          # similarity score for current query
          cur_score = dice(reference, fp_query)
          # score is the lowest required similarity
          if cur_score > score:
              results.append(el)
      return results
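
For readers unfamiliar with the Dice coefficient used above, here is a minimal pure-Python sketch for binary fingerprints. ``dice_similarity`` is a hypothetical helper written for this illustration; ODDT's own ``dice`` function may treat count-based fingerprints differently.

.. code-block:: python

    def dice_similarity(a, b):
        """Dice coefficient of two equal-length binary fingerprints.

        Computed as 2 * (bits set in both) / (bits set in a + bits set in b).
        """
        if len(a) != len(b):
            raise ValueError("fingerprints must have the same length")
        on_a = sum(1 for x in a if x)
        on_b = sum(1 for x in b if x)
        if on_a + on_b == 0:
            return 0.0  # convention chosen here for two empty fingerprints
        common = sum(1 for x, y in zip(a, b) if x and y)
        return 2.0 * common / (on_a + on_b)


    reference = [1, 0, 1, 1, 0, 0, 1, 0]
    candidate = [1, 0, 0, 1, 0, 1, 1, 0]
    print(dice_similarity(reference, candidate))

The score ranges from 0 (no shared bits) to 1 (identical bit patterns), so the threshold ``score`` in the function above should be chosen within that range.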

Molecular shape comparison
``````````````````````````

Three methods for molecular shape comparison are supported: USR and its two derivatives, USRCAT and Electroshape.

* USR (Ultrafast Shape Recognition) - function usr(molecule)
    Ballester PJ, Richards WG (2007). Ultrafast shape recognition to search
    compound databases for similar molecular shapes. Journal of
    computational chemistry, 28(10):1711-23.
    http://dx.doi.org/10.1002/jcc.20681

* USRCAT (USR with Credo Atom Types) - function usr_cat(molecule)
    Adrian M Schreyer, Tom Blundell (2012). USRCAT: real-time ultrafast
    shape recognition with pharmacophoric constraints. Journal of
    Cheminformatics, 2012 4:27.
    http://dx.doi.org/10.1186/1758-2946-4-27

* Electroshape - function electroshape(molecule)
    Armstrong, M. S. et al. ElectroShape: fast molecular similarity
    calculations incorporating shape, chirality and electrostatics.
    J Comput Aided Mol Des 24, 789-801 (2010).
    http://dx.doi.org/doi:10.1007/s10822-010-9374-0

    Aside from spatial coordinates, atoms' charges are also used
    as the fourth dimension to describe the shape of the molecule.

Each of these methods can be used to find the most similar molecules in a given set.

Loading files:

.. code-block:: python

    query = next(oddt.toolkit.readfile('sdf', 'query.sdf'))
    database = list(oddt.toolkit.readfile('sdf', 'database.sdf'))

Example code to find similar molecules:

.. code-block:: python

    from oddt.shape import usr, usr_similarity

    results = []
    query_shape = usr(query)
    for mol in database:
        mol_shape = usr(mol)
        similarity = usr_similarity(query_shape, mol_shape)
        if similarity > 0.7:
            results.append(mol)

To use another method, replace ``usr(mol)`` with ``usr_cat(mol)`` or ``electroshape(mol)``.
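
To give an idea of what a USR descriptor contains, the sketch below computes a 12-element descriptor with Numpy, following the scheme described by Ballester and Richards: distances from every atom to four reference points, each distribution summarized by three moments. ``usr_descriptor`` and ``usr_score`` are hypothetical names, the coordinates are random, and the scaling of the second and third moments is one possible choice that may differ from ODDT's ``usr``.

.. code-block:: python

    import numpy as np


    def usr_descriptor(coords):
        """12-element USR-style shape descriptor for an (n_atoms, 3) coordinate array."""
        coords = np.asarray(coords, dtype=float)
        ctd = coords.mean(axis=0)                     # molecular centroid
        d_ctd = np.linalg.norm(coords - ctd, axis=1)
        cst = coords[d_ctd.argmin()]                  # atom closest to the centroid
        fct = coords[d_ctd.argmax()]                  # atom farthest from the centroid
        d_fct = np.linalg.norm(coords - fct, axis=1)
        ftf = coords[d_fct.argmax()]                  # atom farthest from fct

        moments = []
        for ref in (ctd, cst, fct, ftf):
            d = np.linalg.norm(coords - ref, axis=1)
            mean = d.mean()
            # mean, standard deviation, cube root of the third central moment
            moments += [mean, d.std(), np.cbrt(((d - mean) ** 3).mean())]
        return np.array(moments)


    def usr_score(a, b):
        """Similarity in (0, 1]: inverse of one plus the mean absolute difference."""
        return 1.0 / (1.0 + np.abs(a - b).mean())


    rng = np.random.default_rng(0)
    mol_a = rng.normal(size=(20, 3))                      # made-up coordinates
    mol_b = mol_a + rng.normal(scale=0.05, size=(20, 3))  # slightly perturbed copy
    shape_a, shape_b = usr_descriptor(mol_a), usr_descriptor(mol_b)
    print(usr_score(shape_a, shape_a), usr_score(shape_a, shape_b))

Since the descriptor depends only on interatomic distances, it does not change when the molecule is translated or rotated, so no alignment step is needed before comparison.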

ODDT command line interface (CLI)
=================================

The ``oddt_cli`` command provides access to the Open Drug Discovery Toolkit from the terminal, without any programming knowledge.
It simply reproduces :class:`oddt.virtualscreening.virtualscreening`.
One can filter, dock and score ligands using methods implemented in or compatible with ODDT.
All positional arguments are treated as input ligands, whereas the output file must be given with the ``-O`` option (following the ``obabel`` convention).
Input and output formats are defined using ``-i`` and ``-o`` respectively.
If an output format is given but no output file is assigned, the molecules are printed to STDOUT.

To list all the available options, use the ``-h`` option:

.. code-block:: bash

    oddt_cli -h

Examples
--------

1. Docking ligands using Autodock Vina (the box is constructed from the crystal-structure ligand) with additional RFscore v2 rescoring:

.. code-block:: bash

    oddt_cli input_ligands.sdf --dock autodock_vina --receptor rec.mol2 --auto_ligand crystal_ligand.mol2 --score rfscore_v2 -O output_ligands.sdf


2. Filtering ligands using Lipinski RO5 and PAINS, then docking with Autodock Vina:

.. code-block:: bash

    oddt_cli input_ligands.sdf --filter ro5 --filter pains --dock autodock_vina --receptor rec.mol2 --auto_ligand crystal_ligand.mol2 -O output_ligands.sdf

3. Docking with Autodock Vina, with a precise box position and dimensions. The seed is fixed for reproducibility and exhaustiveness is increased:

.. code-block:: bash

    oddt_cli ampc/actives_final.mol2.gz --dock autodock_vina --receptor ampc/receptor.pdb --size '(8,8,8)' --center '(1,2,0.5)' --exhaustiveness 20 --seed 1 -O ampc_docked.sdf

4. Rescoring ligands using 3 versions of RFscore and a pre-trained scoring function (either a pickle from ODDT or any other SF implementing the :class:`oddt.scoring.scorer` API):

.. code-block:: bash

    oddt_cli docked_ligands.sdf --receptor rec.mol2 --score rfscore_v1 --score rfscore_v2 --score rfscore_v3 --score TrainedNN.pickle -O docked_ligands_rescored.sdf

Development and contributions guide
===================================

1. Indices

   All indices within the toolkit are 0-based, but for backward compatibility with OpenBabel there is the ``atom.idx`` property.
   If you develop using ODDT, you are encouraged to use 0-based indices and/or the ``atom.idx0`` and ``atom.idx1`` properties to make explicit which convention you adhere to.
   Otherwise you can run into bugs which are hard to catch when writing toolkit-independent code.
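
The kind of off-by-one bug this guideline is meant to prevent can be shown with plain Python. The atom records below are hypothetical dictionaries that carry both conventions under the names ``idx0`` and ``idx1``; they only mimic the properties mentioned above.

.. code-block:: python

    # Hypothetical atom records carrying both numbering conventions.
    atoms = [{"symbol": s, "idx0": i, "idx1": i + 1} for i, s in enumerate("CCO")]


    def atom_by_idx0(atoms, idx0):
        """Look up an atom by its 0-based index (matches Python sequence positions)."""
        return atoms[idx0]


    def atom_by_idx1(atoms, idx1):
        """Look up an atom by its 1-based index; convert before indexing the list."""
        if idx1 < 1:
            raise IndexError("1-based indices start at 1")
        return atoms[idx1 - 1]


    oxygen = atoms[-1]
    print(atom_by_idx0(atoms, oxygen["idx0"])["symbol"])
    print(atom_by_idx1(atoms, oxygen["idx1"])["symbol"])

Naming the convention in every lookup makes it obvious when a 1-based value is about to be used as a 0-based position.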

ODDT API documentation
======================

.. toctree:: rst/oddt.rst

References
==========

To be announced.

Documentation Indices and tables
================================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`