Download gold standard corpus here.
Download deidentification software package here.
Copy id.text from the directory of gold standard corpus to the directory of software package.
Make sure that you have perl installed.
Run the following command to generate output file id.phi
perl deid.pl id deid.config
Performance statistics can be derived by running the following command.
perl runStat.pl id.deid id.phi
For example, for the gold standard corpus, the performance should be the following.
==========================
Num of true positives = 1720
Num of false positives = 546
Num of false negatives = 59
Sensitivity/Recall = 0.967
PPV/Specificity = 0.748
==========================
Check software package page for more information about the deid
package and how to configure the tool. We provide a simple example for demonstration purpose here.
Download test.text
here to software package directory.
Open deid.config
in text editor, comment out Gold standard comparison = 1
and uncomment Gold standard comparison = 0
to use output mode.
Use the command of the format perl deid.pl input_file config
to generate output file input_file.phi. Note that input_file should have the extension text but the extension should be omitted in the command. For example, we have the input file as test.text here, but we use the following command where input file is notated as test
.
perl deid.pl test deid.config
Output files:
- test.phi
: contains the numerical spans of sensitive fields
- test.info
: contains the neumerical spans of sensitive fields AND corresponding tokens AND types
- test.res
: scrubbed text with sensitive fields removed