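In some cases, you are not interested in individual extractions, but rather in document-level aggregated variables. For instance, you may want to know whether a patient is diabetic without caring about the actual mentions of diabetes. Here, we propose a simple and generic rule that works by:

- extracting the entities of interest;
- qualifying those entities and discarding the negated, hypothetical, or family-related ones;
- requiring a minimum number of remaining entities in the document before aggregating them.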
Below is a simple implementation of this aggregation rule (this can be adapted for other comorbidity components and other qualification methods):
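```python
from operator import itemgetter

from spacy.tokens import Doc

# `doc` is assumed to be a document already processed by a pipeline
# that fills doc.spans["diabetes"] and sets the qualifiers used below.
MIN_NUMBER_ENTITIES = 2  # require at least 2 valid entities

if not Doc.has_extension("aggregated"):
    # The aggregated statuses are stored in the doc._.aggregated dictionary
    Doc.set_extension("aggregated", default={})

spans = doc.spans["diabetes"]  # entities extracted by the diabetes component
kept_spans = [
    (span, span._.status, span._.detailed_status)
    for span in spans
    if not any([span._.negation, span._.hypothesis, span._.family])
]  # discard negated, hypothetical, and family-related entities

if len(kept_spans) < MIN_NUMBER_ENTITIES:  # too few valid entities
    status = "ABSENT"
else:
    # keep the detailed status of the span with the highest status
    status = max(kept_spans, key=itemgetter(1))[2]

doc._.aggregated["diabetes"] = status
```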
The aggregated status is stored in the `doc._.aggregated` dictionary, under the key of the `diabetes` component.
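To experiment with the rule outside of a spaCy pipeline, the same logic can be sketched as a plain function. This is only an illustration: the `Mention` dataclass, the `aggregate_status` function, and the status labels in the example are hypothetical stand-ins, not part of the library. It assumes that `status` is a comparable value where a higher value means a more severe mention.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Mention:
    """Hypothetical stand-in for a qualified span."""

    status: int  # assumed: higher value means more severe
    detailed_status: str  # label associated with that status
    negation: bool = False
    hypothesis: bool = False
    family: bool = False


def aggregate_status(mentions, min_entities=2, absent_label="ABSENT"):
    """Return a single document-level status from a list of mentions."""
    # Discard negated, hypothetical, and family-related mentions
    kept = [
        m for m in mentions
        if not (m.negation or m.hypothesis or m.family)
    ]
    # Too few valid mentions: consider the condition absent
    if len(kept) < min_entities:
        return absent_label
    # Otherwise keep the detailed status of the most severe mention
    return max(kept, key=attrgetter("status")).detailed_status


# Example with made-up labels: the negated mention is ignored,
# two valid mentions remain, and the most severe one wins.
mentions = [
    Mention(1, "WITHOUT_COMPLICATION"),
    Mention(2, "WITH_COMPLICATION"),
    Mention(2, "WITH_COMPLICATION", negation=True),
]
result = aggregate_status(mentions)
```

Exposing the threshold as the `min_entities` parameter makes it easy to tune per component, since a single isolated mention may be too weak a signal for some conditions.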