Changelog
v2.2.0 - 15.01.2025
- Fixed bug in the kekulization of molecules with radicals (thanks Olabisi-Aishat-Bello for reporting, thanks Robert Pollice for fixing)
- Fixed constraints for validity of molecules with changed C, P or S, to align with validity-definition of RDKit.
v2.1.2 - 15.07.2024
- Fixed recursion bug for very long molecules (thanks haydn-jones)
- Added warning when dot-symbol (".") exists in peculiar cases (thanks vandrw)
v2.1.1 - 14.07.2022
- Fixed index bug in attribution
v2.1.0 - 17.05.2022
Changed:
- Dropped support for Python 3.5-3.6 and will continue to support only current Python versions.
Added:
- Optional attribution to map the encoder/decoder output string back to the input string (issues #48, #79).
v2.0.0 - 21.10.2021
Changed:
- Improved SMILES parsing (by using adjacency lists internally), with tighter error handling (e.g., issues #62 and #60).
- Faster and improved kekulization algorithm (issue #55 fixed).
- Support for symbols that are constrained to 0 bonds (e.g., [CH4]) or >8 bonds (users can now specify custom bond constraints with over 8 bonds).
- New strict=True flag to selfies.encoder, which raises an error if the input SMILES violates the current bond constraints. It is True by default and can be set to False for a speed-up (if the SMILES are guaranteed to be correct).
- Added bond constraints for B (max. 3 bonds) to the default and preset constraints.
- Updated the syntax of SELFIES symbols to be cleaner and more readable.
  - Removed expl from atomic symbols, e.g., [C@@Hexpl] becomes [C@@H].
  - Cleaner branch symbols, e.g., [BranchL_2] becomes [=BranchL].
  - Cleaner ring symbols, e.g., [Expl=RingL] becomes [=RingL].
  - If you want to use the old symbols, use the compatible=True flag to selfies.decoder, e.g., sf.decoder('[C][C][Expl=Ring1]',compatible=True) (not recommended!).
- More logically consistent behaviour of [Ring] symbols.
- Standardized SELFIES alphabet, i.e., no two symbols stand for the same atom/ion (issue #58), e.g., [N+1] and [N+] are equivalent now.
- Indexing symbols are now included in the alphabet returned by selfies.get_semantic_robust_alphabet.
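For strings stored in the pre-2.0 syntax, the three symbol renames listed above can be sketched as a small rewrite pass. This is only an illustration: modernize_symbol and modernize_selfies are hypothetical helpers, not part of selfies, and they cover just the patterns this changelog documents (the expl suffix, [BranchL_2] and [Expl=RingL]). Any other symbol is passed through unchanged.

```python
import re

# Hypothetical rewrite rules (not part of selfies). They cover only the
# three renames documented in this changelog entry.
_RULES = [
    (re.compile(r"^\[(.+)expl\]$"), r"[\1]"),              # [C@@Hexpl]   -> [C@@H]
    (re.compile(r"^\[Branch(\d+)_2\]$"), r"[=Branch\1]"),  # [Branch1_2]  -> [=Branch1]
    (re.compile(r"^\[Expl=Ring(\d+)\]$"), r"[=Ring\1]"),   # [Expl=Ring1] -> [=Ring1]
]


def modernize_symbol(symbol: str) -> str:
    """Rewrite one old-style symbol; return it unchanged if no rule applies."""
    for pattern, replacement in _RULES:
        if pattern.match(symbol):
            return pattern.sub(replacement, symbol)
    return symbol


def modernize_selfies(old: str) -> str:
    """Apply modernize_symbol to every bracketed symbol in the string."""
    symbols = re.findall(r"\[[^\]]*\]", old)
    return "".join(modernize_symbol(s) for s in symbols)


print(modernize_selfies("[C][C][Expl=Ring1]"))
```

For real data, the compatible=True flag of selfies.decoder remains the documented route; a sketch like this is useful mainly for seeing what the renames amount to.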
Removed:
- Removed constraints flag from selfies.decoder; please use selfies.set_semantic_constraints() and pass in "hypervalent" or "octet_rule" instead.
- Removed print_error flag in selfies.encoder and selfies.decoder, which now raise the errors selfies.EncoderError and selfies.DecoderError, respectively.
Bug Fixes:
- Potential chirality inversion of atoms making ring bonds (e.g., [C@@H]12CCC2CCNC1): fixed by inverting their chirality in selfies.encoder such that they are decoded with the original chirality preserved.
- Failure to represent mismatching stereochemical specifications at ring bonds (e.g., F/C=C/C/C=C\C): fixed by adding new ring symbols (e.g., [-/RingL], [\/RingL], etc.).
v1.0.4 - 23.04.2021
Added:
- Decoder option for relaxed hypervalence rules: decoder(...,constraints='hypervalent')
- Decoder option for strict octet rules: decoder(...,constraints='octet_rule')
Bug Fix:
- Fixed constraint for Phosphorus
v1.0.3 - 13.01.2021
Added:
- Support for aromatic Si and Al (not officially supported by Daylight SMILES, but RDKit supports it and examples exist in PubChem).
v1.0.2 - 14.10.2020
Added:
- Support for aromatic Te and triple bonds.
- Built-in conversion from SELFIES to one-hot encoding, and from one-hot encoding back to SELFIES.
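As a reminder of what a one-hot round trip involves, here is a minimal standard-library sketch. It does not use or reproduce the library's own conversion functions: to_one_hot, from_one_hot and the three-symbol vocabulary are hypothetical names chosen for illustration.

```python
def to_one_hot(symbols, vocab):
    """Return one row per symbol, with a single 1 at the symbol's vocabulary index."""
    index = {symbol: i for i, symbol in enumerate(vocab)}
    rows = []
    for symbol in symbols:
        row = [0] * len(vocab)
        row[index[symbol]] = 1  # raises KeyError if the symbol is not in vocab
        rows.append(row)
    return rows


def from_one_hot(rows, vocab):
    """Map each row back to its symbol and join the symbols into one string."""
    return "".join(vocab[row.index(1)] for row in rows)


# Hypothetical vocabulary, used only for this example.
vocab = ["[C]", "[N]", "[O]"]
encoded = to_one_hot(["[C]", "[O]"], vocab)
print(encoded)
print(from_one_hot(encoded, vocab))
```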
Changed:
- Added default semantic constraints for charged atoms (single positive/negative charge of [C], [N], [O], [S], [P]).
- Raised the bond capacity of P to 7 bonds (from 5 bonds).
Bug Fixes:
- Fixed bug: selfies.decoder did not terminate for malformed SELFIES that are missing the closing bracket ']'.
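The missing-bracket case above is a typical way for a hand-written symbol scanner to loop forever. The sketch below is not the library's code: split_symbols is a hypothetical helper that shows one way to guarantee termination by advancing the position on every step and raising as soon as no closing bracket is found. It assumes input made only of bracketed symbols and does not handle the dot symbol.

```python
def split_symbols(selfies: str):
    """Yield bracketed symbols; raise ValueError on malformed input."""
    pos = 0
    while pos < len(selfies):
        if selfies[pos] != "[":
            raise ValueError(f"expected '[' at index {pos}")
        end = selfies.find("]", pos)
        if end == -1:
            # Without this check a scanner could keep searching forever.
            raise ValueError(f"missing ']' for symbol starting at index {pos}")
        yield selfies[pos:end + 1]
        pos = end + 1  # always moves forward, so the loop terminates


print(list(split_symbols("[C][=O]")))
```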
v1.0.1 - 25.08.2020
Changed:
- Made the code compatible with Python >= 3.5.
- More descriptive error messages.
Bug Fixes:
- Minor bug fixes in the encoder for SMILES ending in branches (e.g., C(Cl)(F)), and SMILES with ring numbers between branches (e.g., C(Cl)1(Br)CCCC1).
- Minor bug fix with ring ordering in decoder (e.g., C1CC2CCC12 vs C1CC2CCC21).
v1.0.0 - 17.08.2020:
Added:
- Added semantic handling of aromaticity / delocalization (by kekulizing SMILES with aromatic symbols before they are translated into SELFIES by selfies.encoder).
- Added semantic handling of charged species (e.g., [CH+]1CCC1).
- Added semantic handling of radical species ([CH]1CCC1) or any species with explicit hydrogens (e.g., CC[CH2]).
- Added semantic handling of isotopes (e.g., [14CH2]=C or [235U]).
- Improved semantic handling of explicit atom symbols in square brackets, e.g., carbene ([C]=C).
- Improved semantic handling of chirality (e.g., O=C[Co@@](F)(Cl)(Br)(I)S).
- Improved semantic handling of double-bond configuration (e.g., F/C=C/C=C/C).
- Added new functions to the library, such as selfies.len_selfies and selfies.split_selfies.
- Added advanced-user functions to the library to customize the SELFIES semantic constraints, e.g., selfies.set_semantic_constraints. This allows encoding, for instance, diborane, [BH2]1[H][BH2][H]1.
- Introduced new padding [nop] (no operation) symbol.
Changed:
- Optimized the indexing alphabet (it is base-16 now).
- Optimized the behaviours of rings and branches to fix an issue with specific non-standard molecules that could not be translated.
- Changed behaviour of Ring/Branch, such that states X9991-X9993 are not necessary anymore.
- Significantly improved the encoding and decoding algorithms; they are much faster now.
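A base-16 indexing alphabet means that a run of index symbols can be read like the digits of a hexadecimal number. The sketch below shows only that positional arithmetic. The function name, the most-significant-digit-first order, and the plain integers standing in for index symbols are assumptions made for illustration, not details taken from the library.

```python
def index_from_digits(digits):
    """Combine base-16 digit values (most significant first) into one integer."""
    value = 0
    for digit in digits:
        if not 0 <= digit < 16:
            raise ValueError(f"not a base-16 digit: {digit}")
        value = value * 16 + digit
    return value


# Two digits cover 0..255 and three digits cover 0..4095.
print(index_from_digits([1, 0]))
print(index_from_digits([1, 2, 3]))
```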
v0.2.4 - 01.10.2019:
Added:
- Function get_alphabet(), which returns a list of 29 selfies symbols whose arbitrary combinations produce >99.99% valid molecules.
Bug Fixes:
- Fixed a bug that occurs when three rings start at one node and two of them form a double ring.
- Enabled rings with sizes of up to 8000 SELFIES symbols.
- Bug fix for the conversion of tiny rings spanning multiple branches to RDKit syntax.
We thank Kevin Ryan (LeanAndMean@github), Theophile Gaudin and Andrew Brereton
for suggestions and bug reports.
v0.2.2 - 19.09.2019:
Added:
- Enabled [C@], [C@H], [C@@], [C@@H], [H] to be used in a semantically constrained way.
We thank Andrew Brereton for suggestions and bug reports.
v0.2.1 - 02.09.2019:
Added:
- Decoder: added optional argument to restrict nitrogen to 3 bonds. Use decoder(...,N_restrict=False) to allow for more bonds; standard: N_restrict=True.
- Decoder: added optional argument to make the ring function bi-local (i.e., it confirms the bond number at the target). Use decoder(...,bilocal_ring_function=False) to disallow the bi-local ring function; standard: bilocal_ring_function=True. The bi-local ring function will allow validity of >99.99% of random molecules.
- Decoder: made double-bond ring syntax conform to RDKit.
- Decoder: added state X5 and X6 for having five and six bonds free.
Bug Fixes:
- Decoder + Encoder: allowing for explicit brackets for organic atoms, for instance [I].
- Encoder: fixed issue with explicit single/double bonds in non-canonical SMILES input.
- Decoder: bug fix for [Branch*] in state X1.
We thank Benjamin Sanchez-Lengeling, Theophile Gaudin and Zhenpeng Yao
for suggestions and bug reports.
v0.1.1 - 04.06.2019: