# Changelog

## v2.2.0 - 15.01.2025
- Fixed bug in the kekulization of molecules with radicals (thanks Olabisi-Aishat-Bello for reporting, thanks Robert Pollice for fixing)
- Fixed constraints for validity of molecules with changed C, P or S, to align with the validity definition of RDKit.

## v2.1.2 - 15.07.2024
- Fixed recursion bug for very long molecules (thanks haydn-jones)
- Added warning when dot-symbol (".") exists in peculiar cases (thanks vandrw)

## v2.1.1 - 14.07.2022
- Fixed index bug in attribution

## v2.1.0 - 17.05.2022

### Changed:
- Dropped support for Python 3.5-3.6; only current Python versions will be supported going forward.

### Added:
- Optional attribution to map the encoder/decoder output string back to the input string (issues #48, #79)

## v2.0.0 - 21.10.2021

### Changed:
- Improved SMILES parsing (by using adjacency lists internally), with tighter error handling
  (e.g. issues #62 and #60).
- Faster and improved kekulization algorithm (issue #55 fixed).
- Support for symbols that are constrained to 0 bonds (e.g., `[CH4]`) or >8 bonds
  (users can now specify custom bond constraints with over 8 bonds).
- New `strict=True` flag to `selfies.encoder`, which raises an error if the input
  SMILES violates the current bond constraints. It is `True` by default and can be set to `False` for a speed-up (if
  the SMILES are guaranteed to be correct).
- Added bond constraints for B (max. 3 bonds) to the default and preset constraints.
- Updated the syntax of SELFIES symbols to be cleaner and more readable.
    - Removed `expl` from atomic symbols, e.g., `[C@@Hexpl]` becomes `[C@@H]`
    - Cleaner branch symbols, e.g., `[BranchL_2]` becomes `[=BranchL]`
    - Cleaner ring symbols, e.g., `[Expl=RingL]` becomes `[=RingL]`
    - If you want to use the old symbols, use the `compatible=True` flag to `selfies.decoder`,
      e.g., `sf.decoder('[C][C][Expl=Ring1]',compatible=True)` (not recommended!)
- More logically consistent behaviour of `[Ring]` symbols.
- Standardized SELFIES alphabet, i.e., no two symbols stand for the same atom/ion (issue #58), e.g.,
  `[N+1]` and `[N+]` are equivalent now.
- Indexing symbols are now included in the alphabet returned by `selfies.get_semantic_robust_alphabet`.

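The symbol renaming listed above can be illustrated with a small sketch. The helper `modernize_symbol` below is hypothetical and is not the library's own conversion code; it only covers the three patterns quoted in the list. The mapping of the `_1` and `_3` branch suffixes is an assumption made by analogy with the documented `_2` case.

```python
import re

# Hypothetical helper, NOT part of the selfies API. Only the "_2" -> "="
# case is confirmed by the changelog; "_1" and "_3" are assumed by analogy.
_ASSUMED_BOND_PREFIX = {"1": "", "2": "=", "3": "#"}


def modernize_symbol(symbol: str) -> str:
    """Rewrite one pre-2.0 SELFIES symbol into the v2.0.0 spelling."""
    # Branch symbols, e.g. [Branch1_2] -> [=Branch1]
    match = re.fullmatch(r"\[Branch(\d)_([123])\]", symbol)
    if match:
        size, order = match.groups()
        return f"[{_ASSUMED_BOND_PREFIX[order]}Branch{size}]"

    # Ring symbols, e.g. [Expl=Ring1] -> [=Ring1]
    match = re.fullmatch(r"\[Expl([=#])Ring(\d)\]", symbol)
    if match:
        bond, size = match.groups()
        return f"[{bond}Ring{size}]"

    # Atomic symbols, e.g. [C@@Hexpl] -> [C@@H]
    if symbol.endswith("expl]"):
        return symbol[: -len("expl]")] + "]"

    # Anything else is left untouched.
    return symbol


old_symbols = ["[C]", "[C@@Hexpl]", "[Branch1_2]", "[Expl=Ring1]"]
new_symbols = [modernize_symbol(s) for s in old_symbols]
```

For real conversions, the `compatible=True` flag of `selfies.decoder` mentioned above is the documented route; this sketch only shows the shape of the renaming.
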
### Removed
- Removed `constraints` flag from `selfies.decoder`; please use `selfies.set_semantic_constraints()`
  and pass in `"hypervalent"` or `"octet_rule"` instead.
- Removed `print_error` flag in `selfies.encoder` and `selfies.decoder`,
  which now raise errors `selfies.EncoderError` and `selfies.DecoderError`, respectively.

### Bug Fixes
- Potential chirality inversion of atoms making ring bonds (e.g. ``[C@@H]12CCC2CCNC1``):
  fixed by inverting their chirality in ``selfies.encoder`` such that they are decoded with
  the original chirality preserved.
- Failure to represent mismatching stereochemical specifications at ring bonds
  (e.g. ``F/C=C/C/C=C\C``): fixed by adding new ring symbols (e.g. ``[-/RingL]``, ``[\/RingL]``, etc.).

---

## v1.0.4 - 23.04.2021
### Added:
* decoder option for relaxed hypervalence rules, `decoder(...,constraints='hypervalent')`
* decoder option for strict octet rules, `decoder(...,constraints='octet_rule')`
### Bug Fix:
* Fixed constraint for Phosphorus

---

## v1.0.3 - 13.01.2021
### Added:
* Support for aromatic Si and Al (not officially supported by Daylight SMILES, but RDKit supports them and examples exist in PubChem).

---

## v1.0.2 - 14.10.2020
### Added:
* Support for aromatic Te and triple bonds.
* Built-in conversion from SELFIES to one-hot encoding, and from one-hot encoding to SELFIES.

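As a rough illustration of what a SELFIES to one-hot round trip involves, here is a minimal pure-Python sketch. The function names and the vocabulary are hypothetical and chosen for this example only; they are not the library's built-in conversion functions, whose signatures are not given in this changelog.

```python
import re
from typing import List


def selfies_to_one_hot(selfies: str, vocab: List[str]) -> List[List[int]]:
    """Encode each bracketed symbol as a 0/1 row over the given vocabulary."""
    index = {symbol: i for i, symbol in enumerate(vocab)}
    rows = []
    # Every SELFIES symbol is a bracketed token such as [C] or [=O].
    for symbol in re.findall(r"\[[^\]]*\]", selfies):
        row = [0] * len(vocab)
        row[index[symbol]] = 1  # KeyError if the symbol is not in vocab
        rows.append(row)
    return rows


def one_hot_to_selfies(one_hot: List[List[int]], vocab: List[str]) -> str:
    """Invert selfies_to_one_hot by reading the position of the 1 in each row."""
    return "".join(vocab[row.index(1)] for row in one_hot)


# Hypothetical three-symbol vocabulary, for illustration only.
vocab = ["[C]", "[O]", "[=O]"]
encoded = selfies_to_one_hot("[C][C][=O]", vocab)
decoded = one_hot_to_selfies(encoded, vocab)
```

The sketch assumes every symbol of the input appears in `vocab`; a practical encoder would also need a padding symbol to bring strings to a common length.
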
### Changed:
* Added default semantic constraints for charged atoms (single positive/negative charge of `[C]`, `[N]`, `[O]`, `[S]`, `[P]`)
* Raised the bond capacity of `P` to 7 bonds (from 5 bonds).

### Bug Fixes:
* Fixed bug: `selfies.decoder` did not terminate for malformed SELFIES
  that are missing the closed bracket `']'`.

---

## v1.0.1 - 25.08.2020
### Changed:
* Made the code compatible with Python >= 3.5.
* More descriptive error messages.

### Bug Fixes:
* Minor bug fixes in the encoder for SMILES ending in branches (e.g. `C(Cl)(F)`),
  and SMILES with ring numbers between branches (e.g. `C(Cl)1(Br)CCCC1`)
* Minor bug fix with ring ordering in decoder (e.g. `C1CC2CCC12` vs `C1CC2CCC21`).

---

## v1.0.0 - 17.08.2020:
### Added:
* Added semantic handling of aromaticity / delocalization (by kekulizing SMILES with aromatic symbols before
  they are translated into SELFIES by `selfies.encoder`).
* Added semantic handling of charged species (e.g. `[CH+]1CCC1`).
* Added semantic handling of radical species (`[CH]1CCC1`) or any species with explicit hydrogens (e.g. `CC[CH2]`).
* Added semantic handling of isotopes (e.g. `[14CH2]=C` or `[235U]`).
* Improved semantic handling of explicit atom symbols in square brackets, e.g. Carbene (`[C]=C`).
* Improved semantic handling of chirality (e.g. `O=C[Co@@](F)(Cl)(Br)(I)S`).
* Improved semantic handling of double-bond configuration (e.g. `F/C=C/C=C/C`).
* Added new functions to the library, such as `selfies.len_selfies` and
  `selfies.split_selfies`.
* Added advanced-user functions to the library to customize the SELFIES semantic constraints, e.g.
  `selfies.set_semantic_constraints`. This makes it possible to encode, for instance, diborane, `[BH2]1[H][BH2][H]1`.
* Introduced new padding `[nop]` (no operation) symbol.

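To make the symbol-level helpers and the `[nop]` padding symbol more concrete, the following pure-Python sketch shows one way such operations could look. `split_symbols`, `count_symbols` and `pad_with_nop` are hypothetical stand-ins written for this example; they are not the implementations of `selfies.split_selfies` or `selfies.len_selfies`, and treating `.` as its own symbol is an assumption of the sketch.

```python
from typing import Iterator


def split_symbols(selfies: str) -> Iterator[str]:
    """Yield bracketed symbols (and '.' separators) from left to right."""
    i = 0
    while i < len(selfies):
        if selfies[i] == ".":
            yield "."
            i += 1
        elif selfies[i] == "[":
            # str.index raises ValueError if the closing bracket is missing.
            end = selfies.index("]", i)
            yield selfies[i:end + 1]
            i = end + 1
        else:
            raise ValueError(f"unexpected character {selfies[i]!r} at {i}")


def count_symbols(selfies: str) -> int:
    """Symbolic length: the number of tokens yielded by split_symbols."""
    return sum(1 for _ in split_symbols(selfies))


def pad_with_nop(selfies: str, length: int) -> str:
    """Right-pad with [nop] symbols until the string has `length` symbols."""
    missing = length - count_symbols(selfies)
    if missing < 0:
        raise ValueError("string is already longer than the target length")
    return selfies + "[nop]" * missing


symbols = list(split_symbols("[C][=O].[N]"))
padded = pad_with_nop("[C][O]", 4)
```

Padding to a fixed symbolic length is what makes batches of strings usable as equally sized model inputs, which is the typical reason for a no-operation symbol.
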
### Changed:
* Optimized the indexing alphabet (it is base-16 now).
* Optimized the behaviours of rings and branches to fix an issue with specific non-standard molecules that could not be translated.
* Changed behaviour of Ring/Branch, such that states `X9991-X9993` are not necessary anymore.
* Significantly improved the encoding and decoding algorithms; they are much faster now.

---

## v0.2.4 - 01.10.2019:
### Added:
* Function ``get_alphabet()``, which returns a list of 29 SELFIES symbols
  whose arbitrary combinations produce >99.99% valid molecules.

### Bug Fixes:
* Fixed a bug that occurred when three rings start at one node and two of
  them form a double ring.
* Enabled rings with sizes of up to 8000 SELFIES symbols.
* Bug fix for tiny ring to RDKit syntax conversion, spanning multiple
  branches.

We thank Kevin Ryan (LeanAndMean@github), Theophile Gaudin and Andrew Brereton
for suggestions and bug reports.

---

## v0.2.2 - 19.09.2019:

### Added:
* Enabled ``[C@],[C@H],[C@@],[C@@H],[H]`` to be used in a semantically
  constrained way.

We thank Andrew Brereton for suggestions and bug reports.

---

## v0.2.1 - 02.09.2019:

### Added:
* Decoder: added optional argument to restrict nitrogen to 3 bonds.
  Use ``decoder(...,N_restrict=False)`` to allow for more bonds;
  standard: ``N_restrict=True``.
* Decoder: added optional argument to make the ring function bi-local
  (i.e. it confirms the bond number at the target).
  Use ``decoder(...,bilocal_ring_function=False)`` to not allow the bi-local ring
  function; standard: ``bilocal_ring_function=True``. The bi-local ring
  function will allow validity of >99.99% of random molecules.
* Decoder: made double-bond rings conform to RDKit syntax.
* Decoder: added state X5 and X6 for having five and six bonds free.

### Bug Fixes:
* Decoder + Encoder: allowing for explicit brackets for organic atoms, for
  instance ``[I]``.
* Encoder: fixed issue with explicit single/double bonds for non-canonical
  SMILES input.
* Decoder: bug fix for ``[Branch*]`` in state X1.

We thank Benjamin Sanchez-Lengeling, Theophile Gaudin and Zhenpeng Yao
for suggestions and bug reports.

---

## v0.1.1 - 04.06.2019:
* initial release