{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# **FEATURE ENGINEERING**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have too many columns with values True/ NaN. We will try to group them by categories. \n",
"
\n",
" **Diagnoses**\n",
"- Respiratory Disorders\n",
"- Heart and Cardiovascular Diseases\n",
"- Metabolic and Endocrine Disorders\n",
"- Neurological Disorders\n",
"- Orthopedic Injuries\n",
"- Mental Health Conditions\n",
"- Reproductive and Pregnancy-related\n",
"\n",
"**Medications**\n",
"- Pain Relievers and Analgesics\n",
"- Cardiovascular and Blood Pressure Medications\n",
"- Infection Medications\n",
"- Oral Medications\n",
"- Other Medications\n",
"\n",
"**Treatments and Care**\n",
"- Therapies and Regimes\n",
"- Diagnostic Procedures\n",
"- Surgerical Interventions\n",
"- Patient Care Management\n",
"\n"
]
},
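The grouping described above could be sketched as follows. This is a minimal illustration, not the notebook's actual mapping: the column names in `CATEGORY_COLUMNS`, the helper `group_flags`, and the toy frame are all hypothetical, and it assumes each source column holds only `True` or `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping: category name -> source columns that belong to it.
# The real notebook would list its own column names here.
CATEGORY_COLUMNS = {
    "Respiratory Disorders": [
        "Acute bronchitis (disorder)",
        "Viral sinusitis (disorder)",
    ],
    "Pain Relievers and Analgesics": [
        "Acetaminophen 325 MG Oral Tablet",
    ],
}


def group_flags(df, category_columns):
    """Collapse True/NaN columns into one boolean flag per category."""
    out = df.copy()
    for category, columns in category_columns.items():
        # Ignore names that are not in the frame, so the mapping can be broad.
        present = [c for c in columns if c in out.columns]
        # A row is flagged when at least one member column is not NaN.
        out[category] = out[present].notna().any(axis=1)
        # Drop the original sparse columns once they are summarized.
        out = out.drop(columns=present)
    return out


# Toy data standing in for the wide True/NaN table (values are made up).
toy = pd.DataFrame({
    "label": [0, 1, 0],
    "Acute bronchitis (disorder)": [True, np.nan, np.nan],
    "Viral sinusitis (disorder)": [np.nan, np.nan, True],
    "Acetaminophen 325 MG Oral Tablet": [np.nan, True, np.nan],
})
grouped = group_flags(toy, CATEGORY_COLUMNS)
print(grouped)
```

Using `notna().any(axis=1)` turns the absence of a record (`NaN`) into an explicit `False`, which most models handle more easily than a sparse `True`/`NaN` column. A count per category (`notna().sum(axis=1)`) would be an alternative if the number of matching records matters.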
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"from tabulate import tabulate"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | label | \n", "scc | \n", "race | \n", "marital | \n", "ethnic | \n", "gender | \n", "state | \n", "age | \n", "Pain severity - 0-10 verbal numeric rating [Score] - Reported | \n", "Influenza seasonal injectable preservative free | \n", "... | \n", "Parainfluenza virus 1 RNA [Presence] in Respiratory specimen by NAA with probe detection | \n", "Influenza virus B RNA [Presence] in Respiratory specimen by NAA with probe detection | \n", "Influenza virus A RNA [Presence] in Respiratory specimen by NAA with probe detection | \n", "Adenovirus A+B+C+D+E DNA [Presence] in Respiratory specimen by NAA with probe detection | \n", "SARS-CoV-2 RNA Pnl Resp NAA+probe | \n", "Hydroxychloroquine Sulfate 200 MG Oral Tablet | \n", "1 ML denosumab 60 MG/ML Prefilled Syringe | \n", "Fexofenadine hydrochloride 60 MG Oral Tablet | \n", "Leronlimab 700 MG Injection | \n", "Lenzilumab 200 MG IV | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "0 | \n", "101 | \n", "white | \n", "m | \n", "nonhispanic | \n", "m | \n", "massachusetts | \n", "50t70 | \n", "abnormal | \n", "True | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
1 | \n", "0 | \n", "110 | \n", "white | \n", "m | \n", "nonhispanic | \n", "m | \n", "massachusetts | \n", "50t70 | \n", "normal | \n", "True | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
2 | \n", "0 | \n", "127 | \n", "black | \n", "m | \n", "nonhispanic | \n", "m | \n", "massachusetts | \n", "50t70 | \n", "abnormal | \n", "True | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
3 | \n", "0 | \n", "129 | \n", "white | \n", "m | \n", "nonhispanic | \n", "m | \n", "massachusetts | \n", "50t70 | \n", "abnormal | \n", "True | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
4 | \n", "1 | \n", "69 | \n", "white | \n", "m | \n", "nonhispanic | \n", "m | \n", "massachusetts | \n", "50t70 | \n", "abnormal | \n", "True | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
5 rows × 783 columns
\n", "