# Automated Extraction of Medical Risk Factors For Life Insurance Underwriting

Life insurance underwriting considers an applicant’s medical risk factors, which are usually provided in free-text documents. New insurance-specific Natural Language Processing (NLP) models can automatically extract material medical history and risk factors from such documents. This joint Solution Accelerator with John Snow Labs makes it easy to put this into practice, enabling a faster, more consistent, and more scalable underwriting experience. This tutorial covers:
- The end-to-end solution architecture on Databricks, from data ingestion to dashboarding
- How to analyze free-text documents to extract medical history and risk factors using NLP
- Executable Python notebooks implementing the solution that you can start from today

We will extract the following medical risk factors from unstructured clinical notes using Spark NLP models and tools, and then analyze the results.<br>

- Basic Profile
  - ✅ Age
  - ✅ Gender
  - ✅ Weight
  - ✅ Height
  - ✅ Race/Ethnicity
  - ✅ Disability
- Personal History
  - ✅ Medical records (ICD-10-CM)
  - ✅ Prescription history (RxNorm)
  - ✅ Actions of prescriptions (Action Mapper)
  - ✅ Family health history (ICD-10-CM + Assertion)
- Lifestyle
  - ✅ Profession
  - ✅ Marital status
  - ✅ Housing
  - ✅ Smoking
  - ✅ Alcohol
  - ✅ Substance
- Diseases
  - ✅ Asthma and breathing problems
  - ✅ Heart disease, including heart attacks and angina
  - ✅ High cholesterol
  - ✅ High blood pressure
  - ✅ Hypertension
  - ✅ Cancer
  - ✅ Strokes, including mini-strokes and brain haemorrhage
  - ✅ Anxiety
  - ✅ Depression
  - ✅ Diabetes
  - ✅ Obesity
  - ✅ Epilepsy
  - ✅ Cerebral palsy and other neurological conditions
  - ✅ Kidney diseases
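
As a rough illustration of how extracted entities could be organized against the checklist above, here is a minimal plain-Python sketch. It is not the accelerator's implementation and does not call Spark NLP. The label names, the `CATEGORY_BY_LABEL` mapping, the `group_risk_factors` helper, and the sample entities are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical mapping from entity labels to the checklist categories above.
# These label names are illustrative only; they are not the labels emitted
# by any particular Spark NLP model.
CATEGORY_BY_LABEL = {
    "Age": "Basic Profile",
    "Gender": "Basic Profile",
    "Weight": "Basic Profile",
    "Smoking": "Lifestyle",
    "Alcohol": "Lifestyle",
    "Diabetes": "Diseases",
    "Hypertension": "Diseases",
}


def group_risk_factors(entities):
    """Group (label, text) pairs by checklist category, skipping unknown labels."""
    grouped = defaultdict(list)
    for label, text in entities:
        category = CATEGORY_BY_LABEL.get(label)
        if category is not None:
            grouped[category].append((label, text))
    return dict(grouped)


# Made-up entities standing in for the output of an extraction pipeline.
sample_entities = [
    ("Age", "54-year-old"),
    ("Smoking", "smokes one pack per day"),
    ("Diabetes", "type 2 diabetes mellitus"),
    ("Procedure", "appendectomy"),  # not on the checklist, so it is dropped
]

profile = group_risk_factors(sample_entities)
for category, items in profile.items():
    print(category, items)
```

Grouping by category in this way keeps the per-applicant summary aligned with the checklist, so a missing category is easy to spot during analysis.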

## License
Copyright [2023] the Notebook Authors. The source in this notebook is provided subject to the [Apache 2.0 License](https://spdx.org/licenses/Apache-2.0.html). All included or referenced third-party libraries are subject to the licenses set forth below.

|Library Name|Library License|Library License URL|Library Source URL|
| :-: | :-:| :-: | :-:|
|Pandas |BSD 3-Clause License| https://github.com/pandas-dev/pandas/blob/master/LICENSE | https://github.com/pandas-dev/pandas|
|NumPy |BSD 3-Clause License| https://github.com/numpy/numpy/blob/main/LICENSE.txt | https://github.com/numpy/numpy|
|Apache Spark |Apache License 2.0| https://github.com/apache/spark/blob/master/LICENSE | https://github.com/apache/spark/tree/master/python/pyspark|
|Matplotlib | | https://github.com/matplotlib/matplotlib/blob/master/LICENSE/LICENSE | https://github.com/matplotlib/matplotlib|
|Seaborn |BSD 3-Clause License | https://github.com/seaborn/seaborn/blob/master/LICENSE | https://github.com/seaborn/seaborn/|
|Spark NLP Display|Apache License 2.0|https://github.com/JohnSnowLabs/spark-nlp-display/blob/main/LICENSE|https://github.com/JohnSnowLabs/spark-nlp-display|
|Spark NLP |Apache License 2.0| https://github.com/JohnSnowLabs/spark-nlp/blob/master/LICENSE | https://github.com/JohnSnowLabs/spark-nlp|
|Spark NLP for Healthcare|[Proprietary license - John Snow Labs Inc.](https://www.johnsnowlabs.com/spark-nlp-health/) |NA|NA|

|Author|
|-|
|Databricks Inc.|
|John Snow Labs Inc.|

## Disclaimers
Databricks Inc. (“Databricks”) does not dispense medical, diagnosis, or treatment advice. This Solution Accelerator (“tool”) is for informational purposes only and may not be used as a substitute for professional medical advice, treatment, or diagnosis. This tool may not be used within Databricks to process Protected Health Information (“PHI”) as defined in the Health Insurance Portability and Accountability Act of 1996, unless you have executed with Databricks a contract that allows for processing PHI, an accompanying Business Associate Agreement (BAA), and are running this notebook within a HIPAA Account. Please note that if you run this notebook within Azure Databricks, your contract with Microsoft applies.

## Instruction
To run this accelerator, set up JSL Partner Connect ([AWS](https://docs.databricks.com/integrations/ml/john-snow-labs.html#connect-to-john-snow-labs-using-partner-connect), [Azure](https://learn.microsoft.com/en-us/azure/databricks/integrations/ml/john-snow-labs#--connect-to-john-snow-labs-using-partner-connect)) and navigate to the **My Subscriptions** tab. Make sure you have a valid subscription for the workspace you clone this repo into, then **install on cluster** as shown in the screenshot below, with the default options. You will receive an email from JSL when the installation completes.

<br>
<img src="https://raw.githubusercontent.com/databricks-industry-solutions/oncology/main/images/JSL_partner_connect_install.png" width=65%>

Once the JSL installation completes successfully, clone this repo into a Databricks workspace. Attach the `RUNME` notebook to any cluster and execute it via `Run-All`. This creates a multi-step job describing the accelerator pipeline and provides a link to it. Execute the multi-step job to see how the pipeline runs.