3.0.0
Version `3.0.0` of `deduce` includes many optimizations that allow more accurate de-identification, some already included in `2.1.0` - `2.5.0`. It also includes some structural optimizations. Version `3.0.0` should be backwards compatible, but some functionality is scheduled for removal in `3.1.0`. Those changes are listed below.
A custom config can now be passed as a `dict` or as a filename pointing to a `json` file. Both should be passed to `deduce` with the `config` keyword, e.g.:
deduce = Deduce(config='my_own_config.json')
deduce = Deduce(config={'redactor_open_char': '**', 'redactor_close_char': '**'})
The `config_file` keyword is no longer used; please use `config` instead.
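If existing code still passes `config_file`, the rename can be handled in one place. The helper below is a minimal sketch and not part of `deduce`: `migrate_config_kwargs` is a hypothetical name, and it only rewrites keyword arguments before they would be handed to `Deduce`.

```python
import warnings


def migrate_config_kwargs(**kwargs):
    """Return a copy of kwargs in which `config_file` is renamed to `config`.

    Hypothetical helper for illustration; it is not part of deduce.
    """
    migrated = dict(kwargs)
    if "config_file" in migrated:
        if "config" in migrated:
            raise ValueError("Pass either `config` or `config_file`, not both.")
        warnings.warn(
            "`config_file` is no longer used, use `config` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        migrated["config"] = migrated.pop("config_file")
    return migrated


# A filename and a dict both end up under the same `config` keyword.
kwargs_from_file = migrate_config_kwargs(config_file="my_own_config.json")
kwargs_from_dict = migrate_config_kwargs(
    config={"redactor_open_char": "**", "redactor_close_char": "**"}
)
```

The resulting dictionaries could then be unpacked into the constructor, e.g. `Deduce(**kwargs_from_file)`.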
For consistency, lookup structure names are now all in singular form:
Old name | New name |
---|---|
prefixes | prefix |
first_names | first_name |
interfixes | interfix |
interfix_surnames | interfix_surname |
surnames | surname |
streets | street |
placenames | placename |
hospitals | hospital |
healthcare_institutions | healthcare_institution |
Additionally, the `first_name_exceptions` and `surname_exceptions` lists are removed. The exception items are now removed from the original lists in a more structured way, so there is no need to explicitly filter exceptions in patterns, etc.
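Custom patterns that refer to the old plural names can be updated mechanically. The sketch below assumes that patterns reference lookup structures through a `"lookup"` key and that every name takes its singular form, as stated above. `LOOKUP_RENAMES` and `rename_lookups` are hypothetical names and the example pattern is illustrative only.

```python
# Old plural lookup names mapped to their new singular names.
LOOKUP_RENAMES = {
    "prefixes": "prefix",
    "first_names": "first_name",
    "interfixes": "interfix",
    "interfix_surnames": "interfix_surname",
    "surnames": "surname",
    "streets": "street",
    "placenames": "placename",
    "hospitals": "hospital",
    "healthcare_institutions": "healthcare_institution",
}


def rename_lookups(node):
    """Recursively rename old lookup names found under a "lookup" key."""
    if isinstance(node, dict):
        return {
            key: LOOKUP_RENAMES.get(value, value)
            if key == "lookup" and isinstance(value, str)
            else rename_lookups(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_lookups(item) for item in node]
    return node


# Illustrative pattern; the exact pattern format is an assumption.
old_pattern = [{"lookup": "first_names"}, {"lookup": "surnames"}]
new_pattern = rename_lookups(old_pattern)
```

Names that are not in the mapping are left untouched, so the helper can be applied to a whole config without affecting unrelated values.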
`annotator_type` field in config
In a config, each annotator should specify `annotator_type`, so `Deduce` knows which annotator to load. In `3.0.0` we simplified this a bit. In most cases, the `annotator_type` field should be set to the `module.Class` of the annotator that should be loaded, and `Deduce` will handle the rest (sometimes with a little bit of magic, so all arguments are presented with the right type). You should make the following changes:
annotator_type | Change |
---|---|
multi_token | docdeid.process.MultiTokenLookupAnnotator |
dd_token_pattern | This used to load docdeid.process.TokenPatternAnnotator, but this is now replaced by deduce.annotator.TokenPatternAnnotator. The latter is more powerful, but needs a different pattern. A docdeid.process.TokenPatternAnnotator can no longer be loaded through config, although adding it manually to Deduce.processors is always possible. |
token_pattern | deduce.annotator.TokenPatternAnnotator |
annotation_context | deduce.annotator.ContextAnnotator |
custom | Use module.Class directly, where module and class fields used to be specified in args . They should be removed there. |
regexp | docdeid.process.RegexpAnnotator |
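The table above can be applied to a single annotator entry roughly as follows. This is a sketch under assumptions: it treats an entry as a dict with an `annotator_type` key and an optional `args` dict, and `migrate_annotator` is a hypothetical helper, not something `deduce` provides.

```python
# Old `annotator_type` values that map directly onto a `module.Class` path.
ANNOTATOR_TYPE_RENAMES = {
    "multi_token": "docdeid.process.MultiTokenLookupAnnotator",
    "token_pattern": "deduce.annotator.TokenPatternAnnotator",
    "annotation_context": "deduce.annotator.ContextAnnotator",
    "regexp": "docdeid.process.RegexpAnnotator",
}


def migrate_annotator(annotator):
    """Return a copy of one annotator config entry with a 3.0.0-style type."""
    migrated = dict(annotator)
    args = dict(migrated.get("args", {}))
    old_type = migrated.get("annotator_type")

    if old_type == "custom":
        # `module` and `class` move out of `args` and into `annotator_type`.
        migrated["annotator_type"] = f"{args.pop('module')}.{args.pop('class')}"
    elif old_type == "dd_token_pattern":
        # The replacement annotator needs a different pattern; rewrite by hand.
        raise ValueError("dd_token_pattern entries must be migrated manually")
    elif old_type in ANNOTATOR_TYPE_RENAMES:
        migrated["annotator_type"] = ANNOTATOR_TYPE_RENAMES[old_type]

    if "args" in migrated:
        migrated["args"] = args
    return migrated


# Illustrative entry; the module and class names are made up.
old_entry = {
    "annotator_type": "custom",
    "args": {"module": "my_package.annotators", "class": "MyAnnotator", "tag": "custom"},
}
new_entry = migrate_annotator(old_entry)
```

Entries that already use a `module.Class` value pass through unchanged, and `dd_token_pattern` is rejected on purpose because its pattern has to be rewritten manually.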
2.0.0
Version `2.0.0` of `deduce` sees a major refactor that enables speedup, configuration, customization, and more. With it, the interface to apply `deduce` to text changes slightly. Updating your code to the new interface should not take more than a few minutes. The details are outlined below.
deduce
`deduce` is now called from `Deduce.deidentify`, which replaces the `annotate_text` and `deidentify_annotations` functions. Those functions will give a `DeprecationWarning` from version `2.0.0`, and will be deprecated from version `2.1.0`.
deprecated | new |
---|---|
|
|
The annotations and deidentified text are now available in the `Document` object. Intext annotations can still be useful for comparisons; they can be obtained by passing the document to a util function from the `docdeid` library (note that the format has changed).
deprecated | new |
---|---|
|
|
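The new calling convention described above can be sketched as a small wrapper. It assumes the returned `Document` exposes `annotations` and `deidentified_text` attributes. `deidentify_text` is a hypothetical helper, and the `deduce` argument stands for a `Deduce` instance created elsewhere.

```python
def deidentify_text(deduce, text):
    """Run de-identification and return the annotations and redacted text.

    `deduce` is assumed to be a `Deduce` instance, or any object with a
    compatible `deidentify` method. Attribute names are assumptions.
    """
    doc = deduce.deidentify(text)
    return doc.annotations, doc.deidentified_text
```

With `deduce` installed, this would be used as `annotations, clean_text = deidentify_text(Deduce(), text)`.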
The `patient_first_names`, `patient_initials`, `patient_surname` and `patient_given_name` keywords of `annotate_text` are replaced with a structured way to enter this information, in the `Person` class. This class can be passed to `deidentify()` as metadata. The use of a given name is deprecated; it can instead be added as a separate first name. The behaviour is still the same.
deprecated | new |
---|---|
|
|
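Folding the old keywords into `Person`-style fields, including the given name, might look like the sketch below. It assumes the old keywords were plain strings with first names separated by spaces, and that `Person` takes `first_names`, `initials` and `surname`. Both are assumptions, and `person_fields_from_legacy` is a hypothetical helper.

```python
def person_fields_from_legacy(
    patient_first_names="",
    patient_initials="",
    patient_surname="",
    patient_given_name="",
):
    """Translate the old `annotate_text` keywords into `Person`-style fields."""
    first_names = patient_first_names.split()
    # A given name is no longer a separate field; add it as an extra first name.
    if patient_given_name and patient_given_name not in first_names:
        first_names.append(patient_given_name)
    return {
        "first_names": first_names,
        "initials": patient_initials,
        "surname": patient_surname,
    }


# Illustrative values only.
fields = person_fields_from_legacy(
    patient_first_names="Jan Pieter",
    patient_initials="JP",
    patient_surname="Jansen",
    patient_given_name="Joop",
)
```

The resulting fields could then be used to build a `Person` and passed to `deidentify()` as metadata.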
Previously, the `annotate_text` function offered disabling specific categories by using the `dates`, `ages`, `names`, etc. keywords. This behaviour can be achieved by setting the `disabled` argument of the `Deduce.deidentify` method. Note that the identification logic of Deduce is now further split up into `Annotator` classes, allowing disabling/enabling specific components. You can read more about the specific annotators and other components in the tutorial here, and more information on enabling, disabling, replacing or modifying specific components here.
deprecated | new |
---|---|
|
|
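Code that used the old boolean keywords can derive a value for `disabled` from them. The sketch below assumes that the category names accepted by `disabled` match the old keyword names and that a set of names is accepted. Neither is confirmed here, and `disabled_from_legacy_flags` is a hypothetical helper.

```python
def disabled_from_legacy_flags(**flags):
    """Collect the categories whose old keyword was set to False."""
    return {category for category, enabled in flags.items() if not enabled}


# Old style: annotate_text(text, dates=False, ages=False, names=True)
disabled = disabled_from_legacy_flags(dates=False, ages=False, names=True)
```

The resulting set could then be passed as `deduce.deidentify(text, disabled=disabled)`.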