{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1 align=\"center\">Agglomerative Hierarchical Clustering for Eligibility Criteria Sentences</h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Agglomerative hierarchical clustering in Python using the SciPy and scikit-learn packages.\n",
    "* We use the agglomerative hierarchical clustering algorithm, which works in a “bottom-up” manner, to cluster the established semantic feature matrix and generate clusters based on the similarity of the criteria sentences.\n",
    "***"
   ]
  },
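  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the bottom-up procedure on toy data, separate from the pipeline below. It uses SciPy's `linkage` and `fcluster`; the toy points, the variable names and the threshold of 1.0 are assumptions chosen only for illustration. Each point starts as its own cluster, the two closest clusters are merged repeatedly, and cutting the merge tree at a distance threshold gives the flat clusters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.cluster.hierarchy import linkage, fcluster\n",
    "\n",
    "# Hypothetical toy data: two tight groups of 2-D points.\n",
    "toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],\n",
    "                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])\n",
    "\n",
    "# Average linkage starts from singleton clusters and repeatedly merges the closest pair.\n",
    "Z = linkage(toy, method=\"average\", metric=\"euclidean\")\n",
    "\n",
    "# Cut the merge tree so that merges above the distance threshold are not applied.\n",
    "toy_labels = fcluster(Z, t=1.0, criterion=\"distance\")\n",
    "print(len(set(toy_labels)), toy_labels)"
   ]
  },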
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import json\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "from sklearn import metrics\n",
    "from scipy.cluster import hierarchy\n",
    "from scipy.cluster.hierarchy import dendrogram\n",
    "from sklearn.cluster import AgglomerativeClustering\n",
    "\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Load the feature data (based on UMLS semantic features)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv('./data/feature_matrix_data.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Shape of data:  (19185, 127)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>aapp</th>\n",
       "      <th>acab</th>\n",
       "      <th>acty</th>\n",
       "      <th>aggp</th>\n",
       "      <th>amas</th>\n",
       "      <th>amph</th>\n",
       "      <th>anab</th>\n",
       "      <th>anim</th>\n",
       "      <th>anst</th>\n",
       "      <th>antb</th>\n",
       "      <th>...</th>\n",
       "      <th>shro</th>\n",
       "      <th>socb</th>\n",
       "      <th>sosy</th>\n",
       "      <th>spco</th>\n",
       "      <th>tisu</th>\n",
       "      <th>tmco</th>\n",
       "      <th>topp</th>\n",
       "      <th>virs</th>\n",
       "      <th>vita</th>\n",
       "      <th>vtbt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.100000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.333333</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 127 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       aapp  acab  acty  aggp  amas  amph  anab  anim  anst  antb  ...  shro  \\\n",
       "1  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...     0   \n",
       "2  0.100000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...     0   \n",
       "3  0.333333   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...     0   \n",
       "4  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...     0   \n",
       "5  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...     0   \n",
       "\n",
       "   socb  sosy  spco  tisu  tmco  topp  virs  vita  vtbt  \n",
       "1   0.0   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0  \n",
       "2   0.0   0.0   0.0   0.0   0.0  0.05   0.0   0.0     0  \n",
       "3   0.0   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0  \n",
       "4   0.0   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0  \n",
       "5   0.0   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0  \n",
       "\n",
       "[5 rows x 127 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Shape of data: \", data.shape)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**To check how good our clustering is, we use the silhouette coefficient**"
   ]
  },
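  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a sample with mean distance `a` to the other members of its own cluster and mean distance `b` to the members of the nearest other cluster, the silhouette value is `(b - a) / max(a, b)`; the silhouette coefficient is the mean over all samples and lies between -1 and 1. The cell below is a small sketch that computes it by hand on hypothetical toy data (not the feature matrix) and prints it next to the value from `sklearn.metrics.silhouette_score` for comparison. The toy points and all variable names are assumptions made for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import pairwise_distances, silhouette_score\n",
    "\n",
    "# Hypothetical toy data: two clusters of two points each.\n",
    "toy = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])\n",
    "toy_labels = np.array([0, 0, 1, 1])\n",
    "D = pairwise_distances(toy, metric=\"euclidean\")\n",
    "\n",
    "scores = []\n",
    "for i in range(len(toy)):\n",
    "    same = toy_labels == toy_labels[i]\n",
    "    same[i] = False  # exclude the sample itself\n",
    "    a = D[i, same].mean()  # mean distance within its own cluster\n",
    "    # mean distance to the nearest other cluster\n",
    "    b = min(D[i, toy_labels == k].mean() for k in set(toy_labels) if k != toy_labels[i])\n",
    "    scores.append((b - a) / max(a, b))\n",
    "\n",
    "# The two printed values should agree.\n",
    "print(np.mean(scores), silhouette_score(toy, toy_labels, metric=\"euclidean\"))"
   ]
  },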
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.5 0.13493583340566784\n",
      "0.55 0.11100734584557431\n",
      "0.6 0.08518307406187034\n",
      "0.65 0.03891117895152641\n",
      "0.7 0.05687572605457083\n",
      "0.75 0.05831862225544753\n",
      "0.8 0.09151479032132039\n",
      "0.85 0.10091343110972859\n",
      "0.9 0.1251138667772942\n"
     ]
    }
   ],
   "source": [
    "# n_clusters: must be None when distance_threshold is set.\n",
    "# affinity: the distance metric, here Euclidean (scikit-learn 1.2 and later rename this parameter to `metric`).\n",
    "# linkage: \"average\" uses the average of the distances between all observations of the two clusters.\n",
    "# distance_threshold: the linkage distance at or above which clusters will not be merged.\n",
    "silhouette_scores = list()\n",
    "labels_ = list()\n",
    "for d in [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90]:\n",
    "    model = AgglomerativeClustering(n_clusters=None, affinity=\"euclidean\", linkage=\"average\", distance_threshold=d)\n",
    "    clustering = model.fit(data)\n",
    "    labels = clustering.labels_\n",
    "    labels_.append(labels)\n",
    "    ss = metrics.silhouette_score(data, labels, metric='euclidean')\n",
    "    silhouette_scores.append(ss)\n",
    "    print(d, ss)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot([0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90], silhouette_scores)\n",
    "plt.title(\"Silhouette coefficient\")\n",
    "plt.xlabel(\"distance threshold\")\n",
    "plt.ylabel(\"silhouette coefficient\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[808, 622, 445, 295, 118, 71, 60, 51, 44]\n"
     ]
    }
   ],
   "source": [
    "labels_count = [len(set(l)) for l in labels_]\n",
    "print(labels_count)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot([0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90], labels_count)\n",
    "plt.title(\"Number of clusters\")\n",
    "plt.xlabel(\"distance threshold\")\n",
    "plt.ylabel(\"number of clusters\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Now we train the final model with the selected threshold d = 0.65, which yields 295 clusters**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = AgglomerativeClustering(n_clusters=None, affinity=\"euclidean\", linkage=\"average\", distance_threshold=0.65)\n",
    "clustering = model.fit(data)\n",
    "labels = clustering.labels_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "295\n",
      "[ 28  28  28 165  25   1  35 144  28  14]\n"
     ]
    }
   ],
   "source": [
    "print(len(set(labels)))\n",
    "print(labels[:10])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.03891117895152641"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "metrics.silhouette_score(data, labels, metric='euclidean')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Add the cluster labels to the data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "data1 = data.copy()\n",
    "data1['cluster'] = labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>aapp</th>\n",
       "      <th>acab</th>\n",
       "      <th>acty</th>\n",
       "      <th>aggp</th>\n",
       "      <th>amas</th>\n",
       "      <th>amph</th>\n",
       "      <th>anab</th>\n",
       "      <th>anim</th>\n",
       "      <th>anst</th>\n",
       "      <th>antb</th>\n",
       "      <th>...</th>\n",
       "      <th>socb</th>\n",
       "      <th>sosy</th>\n",
       "      <th>spco</th>\n",
       "      <th>tisu</th>\n",
       "      <th>tmco</th>\n",
       "      <th>topp</th>\n",
       "      <th>virs</th>\n",
       "      <th>vita</th>\n",
       "      <th>vtbt</th>\n",
       "      <th>cluster</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.100000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.333333</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>165</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 128 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       aapp  acab  acty  aggp  amas  amph  anab  anim  anst  antb  ...  socb  \\\n",
       "1  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   \n",
       "2  0.100000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   \n",
       "3  0.333333   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   \n",
       "4  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   \n",
       "5  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   \n",
       "\n",
       "   sosy  spco  tisu  tmco  topp  virs  vita  vtbt  cluster  \n",
       "1   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0       28  \n",
       "2   0.0   0.0   0.0   0.0  0.05   0.0   0.0     0       28  \n",
       "3   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0       28  \n",
       "4   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0      165  \n",
       "5   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0       25  \n",
       "\n",
       "[5 rows x 128 columns]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data1.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Show the label distribution**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.countplot(x=\"cluster\", data=data1, palette=\"Greens_d\")\n",
    "plt.ylim(0,1000)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Check the clusters against the raw sentences**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "clustering_results = []\n",
    "with open(\"./data/criteria_sentences_preprocessed_metamap_filter(19185).json\", \"r\", encoding=\"utf-8\") as f:\n",
    "    criteria_info = json.load(f)\n",
    "    for criteria in criteria_info[\"criteria\"]:\n",
    "        no = int(criteria[\"No.\"])  # sentence ID, used as the row label in data1\n",
    "        criteria_sentence = criteria[\"criteria_sentence\"]\n",
    "        cluster_ = data1.loc[no, \"cluster\"]\n",
    "        clustering_results.append([cluster_, no, criteria_sentence])\n",
    "# Sort by cluster label so that sentences in the same cluster are adjacent.\n",
    "clustering_results_sort = sorted(clustering_results, key=lambda x: x[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
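      "Cluster: 22, raw ID: 63, raw sentence: (5) Patients currently participating in other drug clinical trials.\n",
      "Cluster: 22, raw ID: 297, raw sentence: 7. Participating in treatment under another clinical study\n",
      "Cluster: 22, raw ID: 905, raw sentence: ⑧ Patients currently participating in other clinical trials;\n",
      "Cluster: 22, raw ID: 1548, raw sentence: 2. Patients currently participating in other drug clinical trials;\n",
      "Cluster: 22, raw ID: 2250, raw sentence: 7. The patient is concurrently participating in another clinical trial study.\n",
      "Cluster: 22, raw ID: 3005, raw sentence: (5) The patient is participating in another trial.\n",
      "Cluster: 22, raw ID: 3034, raw sentence: (8) Patients currently participating in other clinical trials;\n",
      "Cluster: 22, raw ID: 3092, raw sentence: (3) Currently in another clinical trial\n",
      "Cluster: 22, raw ID: 3121, raw sentence: (8) Patients currently participating in other clinical trials;\n",
      "Cluster: 22, raw ID: 3679, raw sentence: 1. Patients currently participating in other clinical trials;\n",
      "Cluster: 22, raw ID: 5195, raw sentence: 8. Those currently participating in other clinical trials.\n",
      "Cluster: 22, raw ID: 5271, raw sentence: (5) Patients currently participating in other clinical experiments.\n",
      "Cluster: 22, raw ID: 6249, raw sentence: 8. Patients currently participating in other clinical trials;\n",
      "Cluster: 22, raw ID: 7731, raw sentence: H. Receiving other concurrent anti-tumor treatment or currently participating in other clinical trials;\n",
      "Cluster: 22, raw ID: 9178, raw sentence: 9) Those concurrently participating in other clinical trials.\n",
      "Cluster: 22, raw ID: 13307, raw sentence: b) The patient is concurrently participating in another clinical study\n",
      "Cluster: 22, raw ID: 13527, raw sentence: 14) Those concurrently participating in other clinical studies\n",
      "Cluster: 22, raw ID: 13918, raw sentence: 2. Currently participating in other clinical trials, or the tumor is receiving other invasive or non-invasive treatment\n",
      "Cluster: 22, raw ID: 15063, raw sentence: (8) Those who recently participated in other clinical trials.\n",
      "Cluster: 22, raw ID: 15272, raw sentence: 1) Diagnosed as requiring PCI;\n",
      "Cluster: 22, raw ID: 16973, raw sentence: k) Those currently participating in other clinical trials\n",
      "Cluster: 22, raw ID: 17864, raw sentence: (4) Currently in another clinical trial;\n",
      "Cluster: 22, raw ID: 18382, raw sentence: ⑤ Those currently participating in other clinical trials that would affect the evaluation of this study's results.\n",
      "Cluster: 22, raw ID: 18807, raw sentence: 6. Patients currently participating in other clinical trials;\n"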
     ]
    }
   ],
   "source": [
    "for i in clustering_results_sort:\n",
    "    if i[0] == 22:\n",
    "        print(\"Cluster: {}, raw ID: {}, raw sentence: {}\".format(i[0], i[1], i[2].strip()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"./data/hierarchical_cluster_results.csv\", \"w\", newline=\"\", encoding=\"utf-8\") as f:\n",
    "    csv_writer = csv.writer(f)\n",
    "    csv_writer.writerows(clustering_results_sort)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}