{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1 align=\"center\">Agglomerative Hierarchical Clustering for Eligibility Criteria Sentences</h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Agglomerative hierarchical clustering in Python using the SciPy and scikit-learn packages.\n",
    "* We use the hierarchical agglomerative clustering algorithm, which works in a “bottom-up” manner, to cluster the established semantic feature matrix and generate clusters based on the similarity of the criteria sentences.\n",
    "***"
   ]
  },
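  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bottom-up procedure can be sketched on a toy matrix. This is a minimal illustration under stated assumptions, not the notebook's pipeline: the array `toy` and the cut-off `0.5` are made up, and SciPy's `linkage` and `fcluster` are used here only to expose the individual merge steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.cluster.hierarchy import linkage, fcluster\n",
    "\n",
    "# Hypothetical toy feature matrix: two tight pairs of points, far apart from each other.\n",
    "toy = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])\n",
    "\n",
    "# Average linkage on Euclidean distances; each row of toy_Z records one merge\n",
    "# (the two merged cluster ids, the merge distance, the size of the new cluster).\n",
    "toy_Z = linkage(toy, method='average', metric='euclidean')\n",
    "\n",
    "# Cut the tree at distance 0.5: merges above that distance are not applied.\n",
    "toy_labels = fcluster(toy_Z, t=0.5, criterion='distance')\n",
    "print(toy_Z)\n",
    "print(toy_labels)"
   ]
  },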
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Load packages**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import json\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "from sklearn import metrics\n",
    "from scipy.cluster import hierarchy\n",
    "from scipy.cluster.hierarchy import dendrogram\n",
    "from sklearn.cluster import AgglomerativeClustering\n",
    "\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Load feature data (based on UMLS semantic features)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv('./data/feature_matrix_data.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Shape of data:  (19185, 127)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>aapp</th>\n",
       "      <th>acab</th>\n",
       "      <th>acty</th>\n",
       "      <th>aggp</th>\n",
       "      <th>amas</th>\n",
       "      <th>amph</th>\n",
       "      <th>anab</th>\n",
       "      <th>anim</th>\n",
       "      <th>anst</th>\n",
       "      <th>antb</th>\n",
       "      <th>...</th>\n",
       "      <th>shro</th>\n",
       "      <th>socb</th>\n",
       "      <th>sosy</th>\n",
       "      <th>spco</th>\n",
       "      <th>tisu</th>\n",
       "      <th>tmco</th>\n",
       "      <th>topp</th>\n",
       "      <th>virs</th>\n",
       "      <th>vita</th>\n",
       "      <th>vtbt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.100000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.333333</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 127 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       aapp  acab  acty  aggp  amas  amph  anab  anim  anst  antb  ...  shro  \\\n",
       "1  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...     0   \n",
       "2  0.100000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...     0   \n",
       "3  0.333333   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...     0   \n",
       "4  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...     0   \n",
       "5  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...     0   \n",
       "\n",
       "   socb  sosy  spco  tisu  tmco  topp  virs  vita  vtbt  \n",
       "1   0.0   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0  \n",
       "2   0.0   0.0   0.0   0.0   0.0  0.05   0.0   0.0     0  \n",
       "3   0.0   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0  \n",
       "4   0.0   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0  \n",
       "5   0.0   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0  \n",
       "\n",
       "[5 rows x 127 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "print(\"Shape of data: \", data.shape)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**To check how good our clustering is, we use the silhouette coefficient**"
   ]
  },
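  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a sample $i$, the silhouette value is $s(i) = (b_i - a_i) / \\max(a_i, b_i)$, where $a_i$ is the mean distance to the other members of its own cluster and $b_i$ is the mean distance to the members of the nearest other cluster. The silhouette coefficient is the mean of $s(i)$ over all samples: values near 1 suggest compact, well-separated clusters, and values near 0 suggest overlapping ones. The sketch below checks this definition against scikit-learn on a made-up four-point example; `pts` and `pts_labels` are illustrative assumptions, not the notebook's data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import pairwise_distances, silhouette_score\n",
    "\n",
    "# Hypothetical toy data: two clusters of two points each.\n",
    "pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])\n",
    "pts_labels = np.array([0, 0, 1, 1])\n",
    "\n",
    "dist = pairwise_distances(pts, metric='euclidean')\n",
    "sil_values = []\n",
    "for i in range(len(pts)):\n",
    "    own = pts_labels == pts_labels[i]\n",
    "    own[i] = False  # exclude the point itself\n",
    "    a_i = dist[i, own].mean()  # mean distance within the point's own cluster\n",
    "    # Mean distance to every other cluster; keep the nearest one.\n",
    "    # (Singleton clusters are not handled in this sketch.)\n",
    "    b_i = min(dist[i, pts_labels == k].mean()\n",
    "              for k in np.unique(pts_labels) if k != pts_labels[i])\n",
    "    sil_values.append((b_i - a_i) / max(a_i, b_i))\n",
    "\n",
    "manual_score = float(np.mean(sil_values))\n",
    "print(manual_score, silhouette_score(pts, pts_labels, metric='euclidean'))"
   ]
  },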
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.5 0.13493583340566784\n",
      "0.55 0.11100734584557431\n",
      "0.6 0.08518307406187034\n",
      "0.65 0.03891117895152641\n",
      "0.7 0.05687572605457083\n",
      "0.75 0.05831862225544753\n",
      "0.8 0.09151479032132039\n",
      "0.85 0.10091343110972859\n",
      "0.9 0.1251138667772942\n"
     ]
    }
    ],
   "source": [
    "# n_clusters: must be None when distance_threshold is set\n",
    "# affinity: distance metric used to compute the linkage (Euclidean here);\n",
    "#   scikit-learn 1.2 and later name this parameter `metric`\n",
    "# linkage: \"average\" uses the average of the distances between the observations of the two sets\n",
    "# distance_threshold: the linkage distance threshold above which clusters will not be merged\n",
    "silhouette_scores = list()\n",
    "labels_ = list()\n",
    "for d in [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90]:\n",
    "    model = AgglomerativeClustering(n_clusters=None, affinity=\"euclidean\", linkage=\"average\", distance_threshold=d)\n",
    "    clustering = model.fit(data)\n",
    "    labels = clustering.labels_\n",
    "    labels_.append(labels)\n",
    "    ss = metrics.silhouette_score(data, labels, metric='euclidean')\n",
    "    silhouette_scores.append(ss)\n",
    "    print(d, ss)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAEXCAYAAABYsbiOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAA380lEQVR4nO3dd3gU5drH8e+dQkLogYAICaEjTUoodsQCVix09NiOvbdjPedYjx6PCmIvrxUpYsWKYkWlhd4hhA5C6J20+/1jJrrGbLIL2czu5v5c117ZnbLzyxD23pln5nlEVTHGGGNKEuN1AGOMMeHLioQxxhi/rEgYY4zxy4qEMcYYv6xIGGOM8cuKhDHGGL+sSJjDJiLDRORrn9cqIi3c52+KyCPepQtvItJaROaIyG4RuUlEqorIpyKyU0TGF9+3pbzPvSLyWkVkNpWLFQkTEBE5XkR+dT+8tonILyLSDUBV31XV073O6EtEfhCRvxeb9nvxCiP/AL5X1RqqOhLoDzQA6qrqgED3rar+R1X/XtZyZRGRdHc/xR3ue5noYEXClElEagKfAc8CyUAj4EHgoJe5okQTYGGx18tUNd+jPJ6zAhVmVNUe9ij1AWQAO0qZfynws89rBVq4z98Engc+B3YD04DmPsseC8wAdro/j/WZtwo41ef1A8Aon9c9gV+BHcBcoJc7/VGgADgA7AGeA35yc+11pw1ylz0bmOO+x69Ax1J+z3bAN8A2YBNwrzs9ARgBbHAfI4AEn/VK3AbwXbGcY4BcIM99fUUJ+9ZfhoD2jTvvB+Bh4Bf33+RroJ47b427n/a4j2NK2A/dgUxgl5vhaZ95x/tsdy1wqTu9FvA2kAOsBu4HYnz+fn4BhgNbgUfcffqkm2cT8BJQ1ev/C5Xx4XkAe4T/A6jp/ud9CzgDqFNsfvEPsuJFYqv7wRIHvAuMdeclA9uBi915Q9zXdd35q/BTJHCOZrYCZ+IcEZ/mvk5x5/8A/L1Yzt9zua87A5uBHkAscIm7zYQS9kENYCNwO5Dovu7hznsImArUB1LcD8mHA9lG8Zz89cP+931bRoZg980KoBVQ1X39uDsv3d1PcaX8PUwBLnafVwd6us+b4BSdIUA8UBfo5M57G/jEzZwOLAOu8Pkd84Ebcf4OquIUjAk4fyM1gE+Bx7z+v1AZH3a6yZRJVXfhfENU4FUgR0QmiEiDAN/iI1Wdrs4plHeBTu70s4DlqvqOquar6hhgCXBOAO95EfCFqn6hqoWq+g3Ot9szA//NuAp4WVWnqWqBqr6FcwqtZwnLng38pqpPqeoBVd2tqtPcecOAh1R1s6rm4JyKu/gQtlGW0jL4CmTfvKGqy1R1P/Aef/ybBCIPaCEi9VR1j6pOdacPBSap6hhVzVPVrao6R0RigcHAPW7mVcBT/LGPADao6rPu38gBnP12q6puU9XdwH/c9zAVzIqECYiqLlbVS1W1MdAeOBLntEogfvN5vg/n2yfue6wutuxqnG/CZWkCDBCRHUUPnELWMMBMRe9xe7H3SHVzFZeK8+27JMV/j9U+7xHMNspSWgZfgewbf/8mgbgC5yhkiYjMEJGzy8hXD+fIovg+8v13XuvzPAVIAmb65P/KnW4qmDUQmaCp6hIReRO4+jDfagPOB5qvNJwPBHDaD5J85h3h83wt8I6qXukvZgDbXws8qqqPBrisv2+yRb9HUQN0mjst2G0cTobiy5W2b0pT5n5T1eXAEBGJAS4A3heRuu52u5ewyhaco48mwCJ3Whqw3s92twD7gXaq6ruM8YAdSZgyiUgbEbldRBq7r1NxzjtPLX3NMn0BtBKRoSISJyKDgLY4V1KB09g7WETiRSQD5/LQIqOAc0Skj4jEikiiiPQqyojT2Nms2PaKT3sVuEZEeoijmoicJSI1Ssj6GdBQRG4RkQQRqSEiPdx5Y4D7RSRFROoB/3LzBbuNspSWwVdZ+6Y0OUAhf913vxOR
i0QkRVULcRqocdd5FzhVRAa6/551RaSTqhbgnNJ61M3cBLiNP/bRn7jv+yowXETqu9tsJCJ9AshvypkVCROI3TgNr9NEZC9OcViA04B6yFR1K8559ttxGlb/AZytqlvcRf4JNMdpzH4QGO2z7lqgH3AvzgfbWuBO/vibfgboLyLbRWSkO+0B4C33FMZAVc0ErsS5+mk7kIXTiFpS1t04DcDn4JyqWQ6c7M5+BOec/zxgPjDLnUYw2yhLGRl8lytr35S2jX04V4f94u6nktpO+gILRWQPzn4erKr7VXUNTrvH7ThXX80BjnbXuRHnyDAb+Bnn3/L1UqLchbOvporILmAS0Lqs/Kb8iaoNOmSMMaZkdiRhjDHGLysSxhhj/LIiYYwxxi8rEsYYY/yKqvsk6tWrp+np6V7HMMaYiDJz5swtqlrizYpRVSTS09PJzMz0OoYxxkQUESne88Hv7HSTMcYYv6xIGGOM8cuKhDHGGL+sSBhjjPHLioQxxhi/rEgYY4zxy4qEMcYYv6xIAHkFhbzwQxZ7D+Z7HcUYY8KKFQlg/vqdPDlxKfd8OB/rOt0YY/5gRQLoklaH209vzYS5G3h7it8bD40xptKxIuG69qTmnNKmPo98vohZa7Z7HccYY8KCFQlXTIzw9MBOHFErkevfncXWPQe9jmSMMZ6zIuGjVlI8Lw7ryta9udw8dg4FhdY+YYyp3KxIFNO+US0e7teOn7O28MykZV7HMcYYT1mRKMGgbmkM6NqYkd9l8d2STV7HMcYYz1iR8OPh89pzVMOa3DpuLmu37fM6jjHGeMKKhB+J8bG8dFEXClW59t2ZHMgr8DqSMcZUOCsSpWhStxpPD+zEgvW7ePDTRV7HMcaYCmdFogyntW3Atb2aM2b6GsZnrvU6jjHGVCgrEgG4/bRWHNOsLvd/vIBFG3Z5HccYYyqMFYkAxMXGMHJIZ2onxXPtuzPZuT/P60jGGFMhrEgEKKVGAs8P7cL67fu5Y/xc6wjQGFMpWJEIQkZ6MveceRTfLNrEyz9lex3HGGNCzopEkC4/Lp2zOjTkia+WMGXFVq/jGGNMSFmRCJKI8N/+HUmvV40bx8xm064DXkcyxpiQsSJxCKonxPHyRV3Zl5vPDaNnkVdQ6HUkY4wJCSsSh6hlgxo8dkEHZqzazn+/XOJ1HGOMCQkrEoehX6dGXHJME177eSVfzN/odRxjTCW1eOMutu3NDcl7W5E4TPed1ZZOqbX5x/vzWJGzx+s4xphKZuueg1z+5gyue3dmSN7fisRhqhIXwwvDulAlLoZrR81kX26+15GMMZVEQaFyy7g5bN2by/1ntQ3JNqxIlIMja1flmcGdWL55D/d+ON9utDPGVIjnvsti8vItPHhuO9o3qhWSbViRKCcntEzhtlNb8fGcDYyatsbrOMaYKDd5eQ4jvl3GBV0aMbhbasi2Y0WiHF1/cgtObp3CQ58uZM7aHV7HMcZEqY0793Pz2Dm0rF+dR85rj4iEbFtWJMpRTIwwfFAnGtRM5LpRM0N2tYExpvLKKyjkhtGzOZhXwIsXdSWpSlxIt2dFopzVTqrCi8O6smVPLjePnU1BobVPGGPKzxNfLWHm6u08fmFHmqdUD/n2rEiEQIfGtXiwXzsmL9/CyG+Xex3HGBMlvlrwG69OXsklxzThnKOPrJBthrxIiEhfEVkqIlkicncJ808UkVkiki8i/X2mdxKRKSKyUETmicigUGctT4O7pdK/a2NGfrec75du9jqOMSbCrd66lzvHz+XoxrW496yjKmy7IS0SIhILPA+cAbQFhohI8Yt51wCXAqOLTd8H/E1V2wF9gREiUjuUecuTiPBwv/a0blCDW8fNYd32fV5HMsZEqAN5BVw7ahYxMcLzw7qQEBdbYdsO9ZFEdyBLVbNVNRcYC/TzXUBVV6nqPKCw2PRlqrrcfb4B2AykhDhvuapaJZaXLupKQYFy3buzOJhf4HUkY0wEevDThSzauIvhg46m
cZ2kCt12qItEI2Ctz+t17rSgiEh3oAqwooR5V4lIpohk5uTkHHLQUEmvV42nBh7NvHU7eejTRV7HMcZEmA9mrmPM9LVcf3JzerdpUOHbD/uGaxFpCLwDXKaqf+mTW1VfUdUMVc1ISQnPA43T2x3BNSc1591pa/hg5jqv4xhjIsSS33Zx38fz6dksmVtPbeVJhlAXifWA762Ajd1pARGRmsDnwH2qOrWcs1WoO05vRc9mydz38XyW/LbL6zjGmDC352A+1707ixqJ8Ywc0pm4WG++04d6qzOAliLSVESqAIOBCYGs6C7/EfC2qr4fwowVIi42hpFDOlMzMZ5rR81i14E8ryMZY8KUqnLXB/NYtWUvzw7pTP0aiZ5lCWmRUNV84AZgIrAYeE9VF4rIQyJyLoCIdBORdcAA4GURWeiuPhA4EbhUROa4j06hzBtq9Wsk8vywLqzZto87x8+1jgCNMSV6e8pqPp+3kTv7tKFns7qeZpFo+qDKyMjQzMxMr2OU6bXJ2Tzy+WLuPbMNV53Y3Os4xpgwMmftDga89Csntkzh1b9lEBMTun6ZiojITFXNKGle2DdcR6Mrjm/KGe2P4L9fLWVa9lav4xhjwsT2vblc/+4sGtRM5KmBR1dIgSiLFQkPiAhP9O9Ik+Qkbhgzm827DngdyRjjscJC5bb35pCz+yAvDOtC7aQqXkcCrEh4pkZiPC9e1JU9B/K5YfRs8gr+cnWvMaYSefHHFXy/NId/ntOWjo1rex3nd1YkPNT6iBo8dkEHpq/axv8mLvU6jjHGI7+u2MJTXy/l3KOP5KIeaV7H+RMrEh47r3MjLu7ZhFd+yuarBRu9jmOMqWCbdx3gpjFzaJZSnccu6BDSAYQOhRWJMHD/2UdxdGpt7hg/j+ycPV7HMcZUkPyCQm4YM5u9B/N5cVgXqiWEdgChQ2FFIgwkxMXywrAuxMcK146axb7cfK8jGWMqwJNfL2P6ym08dkEHWjao4XWcElmRCBONalflmcGdWbZ5N/d/tMButDMmyk1atImXflzB0B5pnNc56H5PK4wViTByYqsUbjmlFR/OXs/o6Wu8jmOMCZG12/Zx23tzaN+oJv86u/gQO+HFikSYubF3C3q1TuHBCYuYu3aH13GMMeXsYH4B1707CwVeGNqVxPiKG0DoUFiRCDMxMcLwgZ1IqZHAde/OYvveXK8jGWPK0cOfLWL++p08PbATaXUrdgChQ2FFIgzVqVaFFy/qQs7ug9wybg6FhdY+YUw0+GTOekZNXcPVJzbjtLYVP4DQobAiEaY6Nq7Nv89ty4/Lcnjxx78MyGeMiTBZm3dzz4fz6Z6ezB19WnsdJ2BWJMLY0O5pnN2xIcO/WcYca58wJmLtPZjPNaNmkVQllmeHdibeowGEDkXkJK2ERIRHz+9Ag5qJ3Dx2NnsO2v0TxkQaVeW+j+aTnbOHZwZ3pkFN7wYQOhRWJMJcrarxDB/UibXb9vHAhIVlr2CMCSujp6/h4zkbuPXUVhzXop7XcYJmRSICdG+azPUnt+D9mev4bN4Gr+MYYwI0f91OHpywiF6tU7j+5BZexzkkViQixE2ntKRzWm3u+XA+63fs9zqOMaYMO/flcd3omdSrXoXhAzuFxQBCh8KKRISIj43hmUGdUYVbx86hwC6LNSZsqSq3j5/LbzsP8NywLtSpFh4DCB0KKxIRJK1uEg/1a8f0Vdt44fssr+MYY/x45adsJi3exL1nHkWXtDpexzksViQizPmdG3Hu0Ucy4tvlzFqz3es4xphipmVv5YmJSzmrQ0MuPTbd6ziH7ZCKhIjUEZGO5R3GlE1EeOT89jSslcgtY+ew+0Ce15GMMa6c3Qe5ccxs0pKTePzC8BtA6FAEXCRE5AcRqSkiycAs4FUReTp00Yw/NRPjGTGoE+u27+PfdlmsMWGhoFC5eexsdh3I48WLulAjMd7rSOUimCOJWqq6C7gAeFtVewCnhiaWKUtGejI39m7Jh7PW88mc
9V7HMabSGzFpGb+u2MrD/drT5oiaXscpN8EUiTgRaQgMBD4LUR4ThBt7t6Brkzrc/9EC1m7b53UcYyqt75du5tnvshiUkcqAjFSv45SrYIrEQ8BEYIWqzhCRZsDy0MQygYiLjWHEoE4A3DpuDvkFhd4GMqYSWr9jP7eOm8NRDWvyYL92XscpdwEXCVUdr6odVfVa93W2ql4YumgmEKnJSTxyfnsyV2/n+e+tt1hjKlJufiHXvTuLggLlhWFdwn4AoUMRTMN1KxH5VkQWuK87isj9oYtmAtWvUyPO79yIkd8tZ+ZquyzWmIryny8WM3ftDv43oCNN61XzOk5IBHO66VXgHiAPQFXnAYNDEcoE76F+7TiyduLvV1cYY0Lrs3kbePPXVVxxfFP6tm/odZyQCaZIJKnq9GLTrO/qMFEjMZ4RgzqzcecB/vXxAq/jGBPVVuTs4a7359ElrTZ3n9HG6zghFUyR2CIizQEFEJH+wMaQpDKHpGuTOtzUuyUfz9nAx7PtslhjQmF/bgHXjZpFQnwszw3tElEDCB2KYH6764GXgTYish64BbgmFKHMobv+5OZ0S6/D/R/bZbHGlDdV5f6PF7Bs825GDOrEkbWreh0p5AIqEiISC1ynqqcCKUAbVT1eVVcHsG5fEVkqIlkicncJ808UkVkiku8enfjOu0RElruPSwL8nSq1uNgYhg/qhAjcPHa2XRZrTDlRVUZNXc0Hs9ZxU++WnNgqxetIFSKgIqGqBcDx7vO9qro7kPXc4vI8cAbQFhgiIm2LLbYGuBQYXWzdZODfQA+gO/BvEYns7hQrSOM6STx6fgdmrdnBs99Zb7HGHK5fV2xh4MtT+OcnCzmhZT1uOqWl15EqTFwQy84WkQnAeGBv0URV/bCUdboDWaqaDSAiY4F+wCKf9Ve584p/5e0DfKOq29z53wB9gTFBZK60zj36SH5Yuplnv1vO8S3r0S092etIxkScGau28fTXy5iSvZUGNRN4uF87BnZLJTZCBxA6FMEUiURgK9DbZ5oCpRWJRsBan9frcI4MAlHSuo2KLyQiVwFXAaSlpQX41pXDQ/3ak7lqO7eMncMXN59ArarR0eGYMaE2c/V2RkxaxuTlW6hXPYF/n9OWId3TovJmubIEXCRU9bJQBjlUqvoK8ApARkaGDdfmo3pCHM8M7kT/l6bwz48X8MzgTlHRdbExoTJ37Q6GT1rGD0tzqFutCvedeRQX9WxC1SqVrzgUCbhIiEhj4FngOHfSZOBmVV1XymrrAd/erhq70wKxHuhVbN0fAlzXuDqn1eHWU1vy5NfL6NU6hQu6NPY6kjFhZ8H6nYyYtJxJizdROymeu/q24W/HNKFaQjAnW6JTMHvgDZzG5QHu64vcaaeVss4MoKWINMX50B8MDA1wexOB//g0Vp+Oc8e3CdK1vVrw0/It/OuThXRtUocmdaOz+wBjgrXkt12M+GY5Xy38jZqJcdxxeisuOTY9asaCKA/B3CeRoqpvqGq++3gT53JYv1Q1H7gB5wN/MfCeqi4UkYdE5FwAEekmIutwis/LIrLQXXcb8DBOoZkBPFTUiG2CExsjDB/UiRiBm8fOIc8uizWVXNbm3Vw/ehZ9R0zml6wt3HxKSybf1Zsbere0AlGMqAZ2Gl9EvsU5cii6umgIcJmqnhKibEHLyMjQzMxMr2OErc/mbeCG0bO5sXcLbj+9tddxjKlw2Tl7GPntcj6Zu4Gk+FguO64pfz+hKbWTqngdzVMiMlNVM0qaF8zppstx2iSG41zV9CsQlo3ZpmRndzySH5fm8Pz3WZzQMoXuTe2yWFM5rN66l5HfZvHR7HUkxMVy9YnNuerEZiRXq9zFIRABH0lEAjuSKNveg/mcNXIyeQVql8WaqLd22z6e/z6L8TPXERcjXNyzCdf0ak696gleRwsrpR1JBDOexFsiUtvndR0Reb0c8pkKVC0hjmcGd2bTrgPc+9F8oulLgjFFNuzYz30fzaf3Uz/w
4ez1XNyzCZP/cTL3n93WCkSQgjnd1FFVdxS9UNXtItK5/COZUDs6tTa3nd6KJ75aSq9WKVE3Jq+pvDbtOsAL32cxZvpaFGVwtzSuO7k5DWtFf0d8oRJMkYgRkTqquh1+71vJLiKOUFef2JyfluXw7wkL6ZaeTHqUjqplKoec3Qd56ccVjJq6moJCZUBGKtef3JzGdZK8jhbxgvmQfwqYIiLjAQH6A4+GJJUJuaLLYvuOmMzNY2fz/rXHRn2/+Cb6bN1zkFd+yuatKavIK1Au6NyIG3u3JK2uFYfyEky3HG+LSCZ/9N10gaouKm0dE94a1qrK4xd04Np3ZzFi0jLu7BPdI2yZ6LFjXy6vTs7mjV9WcSCvgPM6NeLGU1pG7TjTXgqmW47mwApVXSQivYBTRWSDbzuFiTxndGjIoIxUXvhhBSe0TKFns7peRzLGr5378/i/n1fy+s8r2Zubz9kdj+TmU1rSon51r6NFrWBON30AZIhIC5wR6ibgdNNxZiiCmYrzr3PaMmPVNm4dN4cvbz6h0t9YZMLP7gN5vPHLKl6dnM3uA/mc2eEIbj6lFa2PqOF1tKgXTJEoVNV8EbkAeE5VnxWR2aEKZipO0WWxF7z4C/d+NJ/nh3ax3mJNWNh7MJ+3pqzilZ+y2bEvj9PbNuCWU1vR9siaXkerNIIpEnkiMgT4G3COO83uxIoSHRrX4vbTW/P4l0sYn7mOgd3ssljjnf25BbwzdRUv/ZjNtr259G5Tn1tPbUWHxrW8jlbpBFMkLgOuAR5V1ZVuz67vhCaW8cJVJzTjp2U5PPDpQro1TbZGQOOJg/kFnPvczyzfvIcTW6Vw66kt6ZxmIxd7JeBrHlV1karepKpj3NcrVfW/RfNF5INQBDQVJyZGeHpgJ6rExXDz2Nnk5ltvsabivTNlNcs37+GFYV14+/LuViA8Vp4Xxjcrx/cyHjmiViKPX9CReet28vQ3y7yOYyqZ7XtzGfntck5qlcKZHRp6HcdQvkXCOgGKEn3bH8GQ7mm8/NMKfl2xxes4phJ59rss9hzM594zj/I6inHZLbamRP88+yia1qvGbePmsn1vrtdxTCWwaste3pm6ikHdUu3S1jBSnkXCrpmMIklV4hg5uDNb9x7kng+tt1gTev/9agnxsTHcelorr6MYH0EVCRGpKiL+hjS7qxzymDDSvlEt/tGnDV8t/I1xM9Z6HcdEsRmrtvHlgt+45qTm1K+R6HUc4yOY8STOAeYAX7mvO4nIhKL5qvp1uacznrvi+Kac0LIeD366iBU5e7yOY6KQqvLI54s5omYiV55g17+Em2COJB4AugM7AFR1DtC03BOZsBITIzw54GgS4+2yWBMan87byNy1O7ijT2uqVon1Oo4pJpgikaeqO4tNsxPVlUCDmok80f9oFqzfxVNfL/U6jokiB/IK+O+XS2jbsCYXdG7kdRxTgmCKxEIRGQrEikhLEXkW+DVEuUyYOa1tA4b1SOPln7L5JcsuizXl461fV7F+x37uP+soYmLs2pdwFEyRuBFoBxzE6f11J3BzKEKZ8HT/WW1pUb86t703h212Waw5TNv25vLc91n0blOfY1vU8zqO8SOYInGWqt6nqt3cx/3AuaEKZsJP1SqxjBzcme1787jrg3l2Waw5LCO/Xc6+3ALuPdMGuwpnwRSJewKcZqJY2yNr8o++rflm0SYmzN3gdRwTobJz9jBq6moGd0ulRX27cS6cldkLrIicgTOwUCMRGekzqyaQH6pgJnxdflxTRk9fw9tTVtOvkzU2muA9/uUSEuNj7ca5CBDIkcQGIBM4AMz0eUwA+oQumglXMTHC0O5pzFy9naW/7fY6jokwU7O38vWiTVzbqzn1qid4HceUocwioapzVfUt4HlVfcvn8SHOAESmErqgS2OqxMYwZvoar6OYCFJYqPzni8U0rJXIFcfbbVaRIJg2icElTLu0nHKYCJNcrQp92x/Bh7PWsT+3wOs4JkJMmLuBeet2
cmef1iTG241zkaDMIiEiQ0TkU6CpiEzweXwPbAt9RBOuhvZIY9eBfL6Yv9HrKCYCHMgr4H8Tl9K+UU3Os7asiBHI8KW/AhuBesBTPtN3A/NCEcpEhh5Nk2lWrxpjpq/hwq6NvY5jwtzrv6xk/Y79PDngaLtxLoIE0iaxWlV/UNVjgFVAvKr+CCwGqoY4nwljIsKQ7mlkrt7Osk3WgG3827rnIC98v4JTj2rAMc3reh3HBCGYXmCvBN4HXnYnNQY+DmC9viKyVESyROTuEuYniMg4d/40EUl3p8eLyFsiMl9EFouI3ZMRhi7s6jRgj55mDdjGvxGTlrM/r4C7z7Ab5yJNMA3X1wPHAbsAVHU5UL+0FUQkFngeOANoCwwRkbbFFrsC2K6qLYDhwH/d6QOABFXtAHQFri4qICZ8JFerQh+3AftAnjVgm7/K2ryH0dPXMKxHGi3qV/c6jglSMEXioKr+3mGPiMRRdi+w3YEsVc121x0L9Cu2TD/gLff5+8ApIiLue1dzt1MVyMUtUCa8DO1uDdjGv8e/XExSfCw3n9LS6yjmEARTJH4UkXuBqiJyGjAe+LSMdRoBvkOarXOnlbiMqubjdBxYF6dg7MVpNF8DPKmqf7maSkSuEpFMEcnMyckJ4tcx5aVns2Sa1qtmp5zMX/y6YguTFm/mupNbUNdunItIwRSJu4EcYD5wNfAFcH8oQrm6AwXAkTiDG90uIn8ZtkpVX1HVDFXNSElJCWEc44/TgJ1qDdjmT4punGtUuyqXHZfudRxziAIuEqpaqKqvquoAVe3vPi/rdNN6INXndWN3WonLuKeWagFbgaHAV6qap6qbgV+AjEDzmop1od2BbYr5eM56FqzfxT/62o1zkSyYq5tWikh28UcZq80AWopIUxGpgnPX9oRiy0wALnGf9we+c4vPGqC3u+1qQE9gSaB5TcWqWz3BbcBebw3Yhv25zo1zHRvX4pyOR3odxxyGYE43ZQDd3McJwEhgVGkruG0MNwATce6reE9VF4rIQyJSNBbF/wF1RSQLuA3ntBY4V0VVF5GFOMXmDVW1m/fC2JDuqezcn2cN2IbXf1nJxp0HuO9MG3Eu0snhDBwjIjNVtWs55jksGRkZmpmZ6XWMSktVOfnJH0ipkcD4a471Oo7xSM7ug/T63/cc16Ier/zNzhBHAvezvMR/rEC65Sh6ky4+L2NwjiwCXt9Ev6I7sB/7cgnLN+2mZQMbTKYyGjFpGQfzC+3GuSgRzOmmp3wej+Hc4DYwFKFM5OrftTHxscKY6WvLXthEneWbdjNm+hou6tmEZil241w0CPhIQFVPDmUQEx3qVk+gT7sj+GDWOruqpRJ67MslVEuI4ya7cS5qBHN1Uy0RebroxjUReUpEaoUynIlMQ7unsXN/Hl8usAbsyuSXrC18t2QzN/ZuQXK1Kl7HMeUkmNNNr+N0Dz7QfewC3ghFKBPZejarS3rdJMZMs1NOlUVBofLI54tpXKcqfzsm3es4phwFUySaq+q/3X6YslX1QeAvd0AbExPjNGBPX7WNrM12B3Zl8OGsdSzeuIu7+raxU4xRJpgisV9Eji96ISLHAfvLP5KJBhe6Ddij7Wgi6u3PLeDJr5fSKbU2Z3ds6HUcU86CKRLXAM+LyCoRWQ08504z5i/qVU/gdLcB2+7Ajm6vTs5m066D3H/WUTgdOJtoEkzfTXNV9WigI9BBVTur6tzQRTORrqgB+6sFv3kdxYTI5t0HeOnHFZzR/ggy0pO9jmNCIJib6RKAC4F0IK7oG4OqPhSSZCbiHeM2YI+evobzOtvA99Fo+DfLyCso5K6+duNctArmdNMnOAME5eOM81D0MKZEMTHC4O5pTF9pDdjRaOlvuxk3Yy0X90wnvV41r+OYEAmmW43Gqto3ZElMVOrftTFPfb2UMdPX8s+zi49cayLZf75YTPWEOG46pYXXUUwIBXMk8auIdAhZEhOV6lVP4PS21oAdbX5alsOPy3K46ZSW
1E6yG+eiWZlFQkTmi8g84HhglogsFZF5PtONKdXQHmns2JfHxIXWgB0NCtwR59KSk7j4mCZexzEhFsjpprNDnsJEtWOa1aVJ3STenbaGfp2sATvSvT9zLUt+283zQ7uQEGc3zkW7QE437S7jYUypYmKEwd2KGrD3eB3HHIa9B/N56utldEmrzZkdjvA6jqkAgRSJmUCm+7P4w0b4MQHp37UxcTHCWBsDO6K98lM2m3cf5L6z2tqNc5VEmaebVLVpRQQx0S2lxh9diN/Rx7oQj0Sbdh3glZ+yOatjQ7o2qeN1HFNBAmm4buP+7FLSI/QRTbQY0j2N7daAHbGe+nopBYXKXX3sxrnKJJCG69uAq3BGpCviOzB273JNZKLWsc3rkpacxGhrwI44izfuYvzMdfz9+Kak1U3yOo6pQGUeSajqVe7TF4F+7gh13wM7gTtCmM1EGecO7FSmrdzGihxrwI4Uqs4lr7WqxnPDyTbiXGUTzM1096vqLre78N7AaziFw5iADeiaag3YEebHZTlMXr6Fm3q3pFZSvNdxTAULpkgU3S57FvCqqn4O2K2WJigpNRI4vV0D3p9pd2BHgvyCQv7zxWLS6yZxUU+7ca4yCqZIrBeRl4FBwBdur7DBrG8MYA3YkWT8zHUs27SHu89oQ5U4++9eGQXzrz4QmAj0UdUdQDJwZyhCmeh2XPN6pCUnMcZOOYW1Pe6Nc93S69Cnnd04V1kFM+jQPlX9UFWXu683qurXoYtmolVRA/bU7G1kWwN22HrlxxVs2WM3zlV2dvxoPFF0B7YdTYSnjTv388rkbM49+kg6pdb2Oo7xkBUJ44n6NRI5ra3TgH0w3xqww81TXy+jUOHOPq29jmI8ZkXCeOaPBuxNXkcxPhZu2MkHs9Zx2XHppCbbjXOVnRUJ45njW9QjNbkqY6bZKadwoao8+vlialeN57peNuKcsSJhPFTUhfiU7K3WgB0mvl+6mV9XbOWWU1tRq6rdOGesSBiPDchwuxCfsdbrKJWec+PcEprVq8bQHmlexzFhIuRFQkT6ukOeZonI3SXMTxCRce78aSKS7jOvo4hMEZGF7nCpiaHOaypW/RqJnHqUNWCHg7Ez1pK12blxLj7Wvj8aR0j/EkQkFngeOANoCwwRkbbFFrsC2K6qLYDhwH/ddeOAUcA1qtoO6AXkhTKv8cbQHmls25vL19aA7ZndB/IYMWkZ3Zsmc1rbBl7HMWEk1F8XugNZqpqtqrnAWKBfsWX6AW+5z98HThHnzp3TgXmqOhdAVbeqqn3VjELHt6hH4zpVGW0N2J556ccVbNmTy/1nHWU3zpk/CXWRaAT4nmxe504rcRlVzcfpgrwu0ApQEZkoIrNE5B8hzmo8EhMjDOluDdhe2bBjP69NXsl5nY6kY+PaXscxYSacTzzGAccDw9yf54vIKcUXEpGrRCRTRDJzcnIqOqMpJwPcO7DHWQN2hXty4lIUuLOvjThn/irURWI9kOrzurE7rcRl3HaIWsBWnKOOn1R1i6ruA74A/jJcqqq+oqoZqpqRkpISgl/BVIT6NZ0G7PHWgF2h5q/byYez13PF8U1pVLuq13FMGApk+NLDMQNoKSJNcYrBYGBosWUmAJcAU4D+wHeqqiIyEfiHiCQBucBJOA3bJkoN6ZHGVwt/45tFmzi745Fex4l4qsq+3AK27c1l+75cn595bN+by7Z9uUxdsZW61apwXa/mXsc1YSqkRUJV80XkBpwuxmOB11V1oYg8BGSq6gTg/4B3RCQL2IZTSFDV7SLyNE6hUeALd6AjE6VOaFGPRrWdBmwrEn91IK/gjw/7vXls25frfNiXUgRy8wtLfK/YGKFOUjzJ1arw2AUdqJFoN86ZkoX6SAJV/QLnVJHvtH/5PD8ADPCz7iicy2BNJeA0YKfy5NfLWLllL03rVfM6UsjkFRSyfZ/7Ye/7Ie9+uDs/8/5UBPbllnwaTgRqV42nTrUqJCdVoXGdqnRoVPP313/6
6T6vkRhHTIxdxWTKFvIiYUwwBmakMnzScsbOWMM9ZxzldZxysS83n4c/W8SijbvZ7n7g7z6Q73f5GolxJFerQp2kKqTUSKBVgxokV4v/y4d9nSTnZ62q8cTaB74JESsSJqw4Ddj1eT9zHbef1jrih8zcczCfy9+YQebqbRzXoh5N6yb5/bCvUy2e2lWrRPzvbKKLFQkTdoZ0T2Piwk18vei3iG6b2Lk/j0vfmM68dTsZOaRzRP8upvKyrywm7JzQMoVGtatG9Kh1O/blctFr01iwficvDOtiBcJELCsSJuzEug3Yv2RtZdWWvV7HCdrWPQcZ8uo0lm7azcsXd6VPuyO8jmTMIbMiYcLSgIxUYiOwC/HNuw8w+JWprNyyh9f+lkHvNtZZnolsViRMWGpQM5FT2tTn/Zlr/V7rH25+23mAwS9PZf2O/bxxaXdObGU9AJjIZ0XChK0hPdLYsieXbxaFfxfi67bvY+DLU9i8+yBvX96dY5rX9TqSMeXCioQJWydGSAP2mq37GPTyVHbsy2XU33uQkZ7sdSRjyo0VCRO2YmOEwd1S+TlrC6u3hmcDdnbOHga+PIW9ufmMvrInnVJrex3JmHJlRcKEtaIG7DHTw68Be/mm3Qx8eSp5BYWMvaon7RvV8jqSMeXOioQJa0fUSqR3GDZgL964i8GvTCVGYNzVPWlzRE2vIxkTElYkTNgb6jZgT1ocHg3Y89ftZMirU6kSF8O4q4+hRf0aXkcyJmSsSJiwF04N2LPWbGfoa1OpnhDHe1cfE9U91RoDViRMBIiNEQZ1S2Xycm8bsKev3MbFr00juVoVxl19DKnJSZ5lMaaiWJEwEWGgx3dg/5q1hUten06DWom8d/UxNtSnqTSsSJiIUNSAPT6z4huwf1yWw2VvziAtOYlxVx1Dg5qJFbp9Y7xkRcJEjKHdnQbsbyuwAfvbxZu48q1MmqdUZ8xVPUmpkVBh2zYmHFiRMBHjxFYpHFkrkdEV1ID91YKNXP3OTI5qWIPRV/YguVqVCtmuMeHEioSJGE4DdhqTl29hzdZ9Id3WhLkbuH70bI5Orc07f+9B7SQrEKZysiJhIsqgbqnECIydEbqjiQ9mruOWsbPp2qQOb13enZqJ8SHbljHhzoqEiShOA3YD3stcR15B+Tdgj52+hjven8uxzevx1mXdqZ5gI/yays2KhIk4Q3uksmXPQSaVcxfib09Zxd0fzuekVim8dkkGVavEluv7GxOJrEiYiHNSq/o0LOcG7NcmZ/OvTxZyWtsGvHxxVxLjrUAYA1YkTATyvQN77bbDb8B+/vssHvl8MWd1aMgLw7qQEGcFwpgiViRMRCqPBmxVZfg3y/jfxKWc1+lInhncifhY+y9hjC/7H2EiUsNaVendpv4hN2CrKk9MXMoz3y6nf9fGPDWwE3FWIIz5C/tfYSLWkO5p5Ow+GPQd2KrKI58v5sUfVjC0RxpPXNiR2BgJUUpjIpsVCROxTmqV4jZgB97pX2Gh8q9PFvJ/P6/k0mPTefS89sRYgTDGLysSJmLFxca4Ddg5ATVgFxYq9340n3emrubqE5vx73PaImIFwpjSWJEwEW1gRipC2Q3Y+QWF3DF+LmNnrOWm3i24+4w2ViCMCYAVCRPRjqxdlZNbl96AnVdQyC3j5vDh7PXcflorbju9tRUIYwJkRcJEvD8asDf/ZV5ufiE3jJ7FZ/M2cs8ZbbjxlJYeJDQmcoW8SIhIXxFZKiJZInJ3CfMTRGScO3+aiKQXm58mIntE5I5QZzWRqVdrpwG7+BjYB/IKuGbUTCYu3MS/z2nL1Sc19yihMZErpEVCRGKB54EzgLbAEBFpW2yxK4DtqtoCGA78t9j8p4EvQ5nTRLa42BgGZqTyk08D9v7cAq58O5Pvlmzm0fPbc9lxTT1OaUxkCvWRRHcgS1WzVTUXGAv0K7ZMP+At9/n7wCninjAWkfOAlcDCEOc0EW5gN6cBe9yMtezLzefyN2fwc9YWnujf
kWE9mngdz5iIFep+kBsBvhexrwN6+FtGVfNFZCdQV0QOAHcBpwF+TzWJyFXAVQBpaWnll9xElEa1q9KrdX3GZa5l2sqtzFy9neEDO3Fe50ZeRzMmooVzw/UDwHBV3VPaQqr6iqpmqGpGSkpKxSQzYWmo24A9e80Onh3SxQqEMeUg1EcS64FUn9eN3WklLbNOROKAWsBWnCOO/iLyBFAbKBSRA6r6XIgzmwjVq3UKF/VMo3eb+vRu08DrOMZEhVAXiRlASxFpilMMBgNDiy0zAbgEmAL0B75TVQVOKFpARB4A9liBMKWJi43hkfM6eB3DmKgS0iLhtjHcAEwEYoHXVXWhiDwEZKrqBOD/gHdEJAvYhlNIjDHGhAFxvrRHh4yMDM3MzPQ6hjHGRBQRmamqGSXNC+eGa2OMMR6zImGMMcYvKxLGGGP8siJhjDHGLysSxhhj/LIiYYwxxq+ougRWRHKA1YfxFvWALeUUpzxZruBYruBYruBEY64mqlpiv0ZRVSQOl4hk+rtW2EuWKziWKziWKziVLZedbjLGGOOXFQljjDF+WZH4s1e8DuCH5QqO5QqO5QpOpcplbRLGGGP8siMJY4wxflmRMMYY41elKBIi0ldElopIlojcXcL8S0UkR0TmuI+/+8y7RESWu49LwihXgc/0CRWZy11moIgsEpGFIjLaZ7pn+6uMXJ7tLxEZ7rPtZSKyw2eel39fpeXycn+licj3IjJbROaJyJk+8+5x11sqIn3CIZeIpIvIfp/99VJ55gowWxMR+dbN9YOINPaZd3h/Y6oa1Q+cwY5WAM2AKsBcoG2xZS4Fnith3WQg2/1Zx31ex+tc7rw9Hu6vlsDson0B1A+T/VViLq/3V7Hlb8QZfMvz/eUvl9f7C6cB9lr3eVtglc/zuUAC0NR9n9gwyJUOLAjF/goi23jgEvd5b+Cd8vobqwxHEt2BLFXNVtVcYCzQL8B1+wDfqOo2Vd0OfAP0DYNcoRRIriuB5919gqpudqd7vb/85QqlYP8dhwBj3Ode7y9/uUIpkFwK1HSf1wI2uM/7AWNV9aCqrgSy3PfzOleoBZKtLfCd+/x7n/mH/TdWGYpEI2Ctz+t17rTiLnQP1d4XkdQg163oXACJIpIpIlNF5LxyyhRorlZAKxH5xd1+3yDW9SIXeLu/AOeUAM434KL/zF7vL3+5wNv99QBwkYisA77AOcoJdF0vcgE0dU9D/SgiJ5RTpmCyzQUucJ+fD9QQkboBrluqylAkAvEpkK6qHXEq7Vse5ylSWq4m6tyCPxQYISLNKzBXHM6pnV4430BfFZHaFbh9f0rL5eX+KjIYeF9VCzzYdmlKyuXl/hoCvKmqjYEzgXdEJBw+q/zl2gikqWpn4DZgtIjULOV9QuEO4CQRmQ2cBKwHyuXvLBx2fKitB3y/gTd2p/1OVbeq6kH35WtA10DX9SgXqrre/ZkN/AB0rqhcON9GJqhqnnvYvwznw9nT/VVKLq/3V5HB/PmUjtf7y18ur/fXFcB77vanAIk4ndd5vb9KzOWe/trqTp+J037QqpxyBZRNVTeo6gVuobrPnbYjkHXLFKrGlnB54Hy7zMY5nC5q9GlXbJmGPs/PB6bqH40+K3EafOq4z5PDIFcdIMF9Xg9YTimNkiHI1Rd4y2f7a4G6YbC//OXydH+5y7UBVuHewBoOf1+l5PL67+tL4FL3+VE45/4FaMefG66zKb+G68PJlVKUA6dxeX15/TsGka0eEOM+fxR4qLz+xsrllwj3B86h4TKcCn+fO+0h4Fz3+WPAQnfnfw+08Vn3cpwGsizgsnDIBRwLzHenzweuqOBcAjwNLHK3PzhM9leJubzeX+7rB4DHS1jXs/3lL5fX+wunEfYXd/tzgNN91r3PXW8pcEY45AIudP+fzgFmAeeUZ64As/XHKebLcM46JJTX35h1y2GMMcavytAmYYwx5hBZkTDGGOOXFQljjDF+WZEwxhjjlxUJY4wxflmRMMYY
45cVCRPRROQBEblDRB4SkVNLWe48EWlbkdnKyuB26ZxRzttIF5EFQa7zpoj0L2F6LxH5rPzSmUhkRcJEBVX9l6pOKmWR83BuhvJS0BlEJC40UYwJjBUJE3FE5D53kJyfgdbutN+/DYvI4+IMPDRPRJ4UkWOBc4H/uYPCNBeRK0VkhojMFZEPRCTJ531GisivIpLt+w1bRO4SkfnuOo+705qLyFciMlNEJotIGz+Z/5LBnTVARKa7v88J7rKXisgEEfkO+FZEqonI6+5ys0Wkn7tcO3faHPd3bem+Z6yIvCrOwEtfi0hVd/lObq+u80TkIxGpU0LOviKyRERm8UevoqYyK+/bx+1hj1A+cDo5nA8k4fTtn4XTA+abOF0T1MXpsqGoN4Ha7s83gf4+71PX5/kjwI0+y43H+QLVFqcff4AzgF+BJPd1svvzW6Cl+7wH8F0p2Ytn+AF4yn1+JjDJfX4pTmeFRdv4D3BR0e+D0/VCNeBZYJg7vQpQFWcAnHygkzv9PZ915wEnuc8fAkb45sLpsG4tTqeI4q77mdf/5vbw9mGHsibSnAB8pKr7AOSvQ2vuBA4A/+eeT/d3Tr29iDyC86FbHZjoM+9jVS0EFolIA3faqcAbRdtV1W0iUh2nn6PxIlK0bkKQv8+H7s+ZOB/wRb5R1W3u89OBc0XkDvd1IpAGTAHuE2eoyg9VdbmbY6WqzvF9XxGphVMwf3Snv4VTDH21cdddDiAio4Crgvx9TJSxImGiiqrmi0h34BScb8c34AznWNybwHmqOldELsUZg6LIQZ/ngn8xwA5V7XQYkYu2VcCf/z/uLZbhQlVdWmzdxSIyDTgL+EJErsbpLdQ3fwHOEYYxh8TaJEyk+Qk4T0SqikgN4Bzfme63+1qq+gVwK3C0O2s3UMNn0RrARhGJB4YFsN1vgMt82i6SVXUXsFJEBrjTRESOLuU9imcI1ETgRnEPE0Sks/uzGZCtqiOBT4CO/t5AVXcC231GTbsY+LHYYktwjjqK2kuGHEJWE2WsSJiIoqqzgHE43TV/CcwotkgN4DMRmQf8jDNSGDjjAt/pNvw2B/4JTMPp+nlJANv9CpgAZIrIHJx2EHAKzBUiMhenu+jSxpEuniFQDwPxwDwRWei+BhgILHDztAfeLuN9LsFpOJ8HdMJpl/idqh7AOb30udtwXRFjhJswZ12FG2OM8cuOJIwxxvhlDdfGlDMRuQ8YUGzyeFV91Is8xhwOO91kjDHGLzvdZIwxxi8rEsYYY/yyImGMMcYvKxLGGGP8+n9ytZ1TLTQUrwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
    ],
   "source": [
    "plt.plot([0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90], silhouette_scores)\n",
    "plt.title(\"Silhouette coefficient score\")\n",
    "plt.xlabel(\"distance_threshold\")\n",
    "plt.ylabel(\"silhouette_scores\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[808, 622, 445, 295, 118, 71, 60, 51, 44]\n"
     ]
    }
   ],
   "source": [
    "labels_count = [len(set(l)) for l in labels_]\n",
    "print(labels_count)"
   ]
  },
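  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To read the two sweeps together, the printed silhouette scores and cluster counts can be placed side by side. The numbers in this sketch are copied from the outputs above, with scores rounded to four decimals; the column names and the variables `sweep_summary` and `best_row` are illustrative choices. In this sweep the highest score falls at the smallest threshold (0.50, 808 clusters), with a slightly lower score at 0.90 (44 clusters), so it may be worth weighing the score against the number of clusters rather than relying on the score alone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Values copied from the printed outputs above (scores rounded to 4 decimals).\n",
    "sweep_summary = pd.DataFrame({\n",
    "    'distance_threshold': [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90],\n",
    "    'silhouette': [0.1349, 0.1110, 0.0852, 0.0389, 0.0569, 0.0583, 0.0915, 0.1009, 0.1251],\n",
    "    'n_clusters': [808, 622, 445, 295, 118, 71, 60, 51, 44],\n",
    "})\n",
    "\n",
    "# Row with the highest silhouette score in this sweep.\n",
    "best_row = sweep_summary.loc[sweep_summary['silhouette'].idxmax()]\n",
    "print(sweep_summary.to_string(index=False))\n",
    "print(best_row)"
   ]
  },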
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAw3ElEQVR4nO3dd3xW9fn/8deVQWJC2BAQEJApoKwoWrVuRWtdddaBVcFatcOfVu20rbaOVqu1WlcVUXDVKrVWS1Frq3WEvQTCUEDZm7ASrt8f55N4wzckdyB3Tsb7+Xicx33uM9/3Lea6zzmfcz7m7oiIiACkxR1ARETqDhUFEREpp6IgIiLlVBRERKScioKIiJRTURARkXIqClKvmNnlZvbfuHPUVWb2lJndHncOqb9UFKRRMjM3sx5x5xCpa1QURKrJzDLizlAbzCw97gxS+1QUpE4ys85m9rKZrTSz1Wb2YAXLdA2/+DMSpr1jZleF8R5m9m8zW29mq8zs+TD93bD4VDPbZGYXhOmnm9kUM1tnZu+b2SEJ211kZjeb2TRgs5llhPdLzWyjmc0xsxP28FmeMrM/mtnfw7Ifmln3JD/D5Wb2npndF3ItMLOvhOmLzWyFmQ3fbZdtzGx82Ne/zaxLwrb7hHlrQubzd8v5sJm9bmabgeOS+68lDYmKgtQ54Rfqa8CnQFegI/DcXmzqV8A/gZZAJ+APAO7+1TB/gLs3dffnzWwQ8GfgaqA18AgwzsyyErZ3EfA1oAXQHbgOONTd84BTgEWVZLkQ+EXIUgTcUY3PMRSYFnKNIfouDgV6AJcAD5pZ04TlLw6fvQ0wBXgWwMxygfFhG+1CpofMrG/Cut8M2fIAXbtphFQUpC46DNgfuMndN7v7Vnffmz9QO4AuwP5JbGMk8Ii7f+jupe4+CtgGHJ6wzAPuvtjdtwClQBbQ18wy3X2Ru8+vZPt/dfeP3L2E6I/0wGp8joXu/qS7lwLPA52BX7r7Nnf/J7CdqECU+bu7v+vu24AfA0eYWWfgdGBR2FaJu08G/gKcl7Duq+7+nrvvdPet1cgoDYSKgtRFnYFPwx/QffFDwICPzGymmV1RybJdgP8XTtGsM7N1Icf+CcssLhtx9yLg+8BtwAoze87MEpfd3bKE8WKg6Z4WrMDyhPEtYf+7T0vcXmLOTcAaos/RBRi622e8GGhf0brSOKkoSF20GDggiQu6m8NrTsK08j9w7r7M3Ue4+/5Ep4UeqqTF0WLgDndvkTDkuPvYhGV2eaSwu49x96OI/tg6cFfVH616n2EvdS4bCaeVWgGfE33Gf+/2GZu6+zUJ6+qxyY2cioLURR8BXwB3mlmumWWb2ZG7L+TuK4GlwCVmlh6OBLqXzTez88ysU3i7lugP3s7wfjlwYMLmHgO+bWZDLZJrZl8zs7yKAppZbzM7Plxz2Er0a31nRctWpqrPsJdOM7OjzKwJ0bWFD9x9MdF1ml5mdqmZZYbhUDM7aB/3Jw2IioLUOeHc+deJzpN/BiwBLtjD4iOAm4DVQD/g/YR5hwIfmtkmYBzwPXdfEObdBowKp1HOd/fCsK0HiQpIEXB5JTGzgDuBVUSnhtoBt1brgyb3GfbGGODnRKeNhhBdjMbdNwInE11g/pwo911En0UEAFMnOyIiUkZHCiIiUk5FQUREyqkoiIhIORUFEREpV68f7NWmTRvv2rVr3DFEROqViRMnrnL3thXNq9dFoWvXrhQWFsYdQ0SkXjGzT/c0T6ePRESknIqCiIiUU1EQEZFyKgoiIlJORUFERMqpKIiISDkVBRERKZfSomBmPwg9Xs0ws7HhufjdQsflRWb2fHjmO2aWFd4XhfldU5VrwcpN3PXGJ+gJsSIiu0pZUTCzjsB3gQJ37w+kEz3H/S7gPnfvQfTc+ivDKlcCa8P0+9i7XqySMmH2Ch5+Zz6P/WdB1QuLiDQiqT59lAHsF7pVzCHqTet44KUwfxRwVhg/
M7wnzD/BzCwVoa46uhun9m/Pnf/4hPeLVqViFyIi9VLKioK7LwV+S9Rz1hfAemAisC6hQ/YlQMcw3pHQaXiYvx5ovft2zWykmRWaWeHKlSv3KpuZcc95AziwbVOuGzuZpeu27NV2REQamlSePmpJ9Ou/G7A/kAsM29ftuvuj7l7g7gVt21b4PKekNM3K4JFLh7CjZCfXPDORrTtK9zWaiEi9l8rTRycCC919pbvvAF4GjgRahNNJAJ2IOi0nvHYGCPObE/VZmzLd2zbld+cPYNqS9fz0lRm68CwijV4qi8JnwOFmlhOuDZwAzALeBs4NywwHXg3j48J7wvy3vBb+Sp/crz3XH9+DFycuYcxHn6V6dyIidVoqryl8SHTBeBIwPezrUeBm4AYzKyK6ZvBEWOUJoHWYfgNwS6qy7e77J/bimF5tuW3cTCZ+ura2disiUudYfT5lUlBQ4DXVn8K64u2c8eB7bCsp5W/XH0W7vOwa2a6ISF1jZhPdvaCiebqjOWiR04Q/XTKE9Vt2cN2zk9lRujPuSCIitU5FIUHf/Ztx1zcO4aNFa7jj77PjjiMiUuvqdXecqXDmwI5MXbyeP7+3kAGdm3P2oE5xRxIRqTU6UqjAraf1YWi3Vtz68nRmfr4+7jgiIrVGRaECmelpPPjNwbTYrwlXj57IuuLtcUcSEakVKgp70DYvi4cvGcyKDdu4fuxkSnfW31ZaIiLJUlGoxKADWvKLM/vxn3mruHf8nLjjiIiknIpCFS467AAuPLQzf3x7Pm/MWBZ3HBGRlFJRSMJtZ/RjQKfm3PjiVIpWbIo7johIyqgoJCE7M52HLxlCVkYaV48uZOPWHXFHEhFJCRWFJO3fYj8e/OZgFq0u5sYXp7JTF55FpAFSUaiGI7q35tZT+/DmzOU8/O/5cccREalxKgrVdOVR3fj6gP357T/n8O7cvev5TUSkrlJRqCYz465vHEzv/DyuHzuZxWuK444kIlJjVBT2Qk6TqCtPd+fq0RPZsl1deYpIw6CisJe6tM7l/gsHMXvZBn781+nqylNEGgQVhX1wXJ92/ODEXrw8eSmj3l8UdxwRkX2WsqJgZr3NbErCsMHMvm9mrcxsvJnNC68tw/JmZg+YWZGZTTOzwanKVpOuO64HJx6Uz+1/n81HC9fEHUdEZJ+kso/mOe4+0N0HAkOAYuCvRH0vT3D3nsAEvuyL+VSgZxhGAg+nKltNSksz7r1gAJ1b5fCdZyexbP3WuCOJiOy12jp9dAIw390/Bc4ERoXpo4CzwviZwNMe+QBoYWYdainfPmmWnckjlw6heHsJ1zw7kW0luvAsIvVTbRWFC4GxYTzf3b8I48uA/DDeEVicsM6SMG0XZjbSzArNrHDlyrpzn0Cv/DzuOXcAkz9bx69emxV3HBGRvZLyomBmTYAzgBd3n+dRk51qNdtx90fdvcDdC9q2bVtDKWvG1w7pwNXHHMgzH3zGC4WLq15BRKSOqY0jhVOBSe6+PLxfXnZaKLyuCNOXAp0T1usUptUrN53cmyN7tOYnr8xg2pJ1cccREamW2igKF/HlqSOAccDwMD4ceDVh+mWhFdLhwPqE00z1RkZ6Gn+4aDBtm2bx7dETWb1pW9yRRESSltKiYGa5wEnAywmT7wROMrN5wInhPcDrwAKgCHgM+E4qs6VSq9wm/OmSIazavJ3rx06mpHRn3JFERJKS0qLg7pvdvbW7r0+YttrdT3D3nu5+oruvCdPd3a919+7ufrC7F6YyW6od3Kk5d5zVn/fnr+aeN9WVp4jUDxlxB2jIzivozLQl63nk3QUc3Kk5px+yf9yRREQqpcdcpNhPT+/LkC4t+eFL05izbGPccUREKqWikGJNMtJ46OLB5GZlcPXoQtZvUVeeIlJ3qSjUgvxm2Tx08WCWrN3CDc9PUVeeIlJnqSjUkkO7tuKnp/dlwicr+MNbRXHHERGpkIpCLbrsiC6cM7gjv58wl7c+WV71CiIitUxFoRaZ
Gb8++2D6dmjG956bwqJVm+OOJCKyCxWFWpadmc6fLhlCeppx9eiJbN5WEnckEZFyKgox6Nwqhz9cNIh5KzZy81+mqStPEakzVBRicnTPttx4Sm9em/YFT/x3YdxxREQAFYVYXXNMd07t357f/OMT3p+/Ku44IiIqCnEyM+45bwDd2uRy3ZjJfL5uS9yRRKSRU1GIWdOsDB65dAjbS3ZyzTMT2bpDXXmKSHxUFOqA7m2bcu/5A5i6ZD0/f3WmLjyLSGxUFOqIk/u15/rje/B84WJGvb8o7jgi0kipKNQhPzixFyf1zeeXr83i3bkr444jIo2QikIdkpZm3HfBQHrl53HtmEnMX7kp7kgi0sikujvOFmb2kpl9YmazzewIM2tlZuPNbF54bRmWNTN7wMyKzGyamQ1OZba6qmlWBo8PL6BJehpXjSpkXfH2uCOJSCOS6iOF+4E33L0PMACYDdwCTHD3nsCE8B7gVKBnGEYCD6c4W53VqWUOj1w6hCVri7l2zCR2qI9nEaklKSsKZtYc+CrwBIC7b3f3dcCZwKiw2CjgrDB+JvB06Kv5A6CFmXVIVb66rqBrK3599sG8V7SaX702K+44ItJIpPJIoRuwEnjSzCab2eNmlgvku/sXYZllQH4Y7wgsTlh/SZi2CzMbaWaFZla4cmXDvhh7XkFnRn71QJ7+36eM/uDTuOOISCOQyqKQAQwGHnb3QcBmvjxVBIBHDfKr1Sjf3R919wJ3L2jbtm2Nha2rbh7Wh+P7tOO2cTN5v0iPwhCR1EplUVgCLHH3D8P7l4iKxPKy00LhdUWYvxTonLB+pzCtUUtPM+6/cCDd2+ZyzbOTWKg+GEQkhVJWFNx9GbDYzHqHSScAs4BxwPAwbTjwahgfB1wWWiEdDqxPOM3UqOVlZ/L4ZYeSZnDlqI9Zv2VH3JFEpIFKdeuj64FnzWwaMBD4NXAncJKZzQNODO8BXgcWAEXAY8B3UpytXjmgdQ5/umQIn60u5vqxkylRiyQRSQGrz8/ZKSgo8MLCwrhj1KrnPvqMW16ezreO7MrPv94v7jgiUg+Z2UR3L6hoXqVHCmaWZmbnpyaW7I0LDzuAK47sxpPvLWLsR5/FHUdEGphKi4K77wR+WEtZJEk/Oq0Px/Rqy09fmcEHC1bHHUdEGpBkrin8y8xuNLPO4REVrcysVcqTyR5lpKfxh28OokvrHK55ZiKfrS6OO5KINBDJFIULgGuBd4GJYWhcJ/LroGbZmTwx/FCcqEXSxq1qkSQi+67KouDu3SoYDqyNcFK5rm1yeejiwSxctZnvjp1M6c7622hAROqGKouCmeWY2U/M7NHwvqeZnZ76aJKMr3Rvw21n9OPtOSu58x+z444jIvVcMqePngS2A18J75cCt6cskVTbJYd3YfgRXXjsPwt5oXBx1SuIiOxBMkWhu7vfDewAcPdiwFKaSqrtp6f35agebfjxX6fz8aI1cccRkXoqmaKw3cz2Izy4zsy6A9tSmkqqLSM9jT9+czCdW+Zw9eiJLF6jFkkiUn3JFIWfA28Anc3sWaKOcXTvQh3UPCeTx4cXUFK6k6tGFbJpW0nckUSknkmm9dF44BzgcmAsUODu76Q2luytA9s25Y8XD6Zo5Sa+/5xaJIlI9ST7QLxjiJ5yehxwdOriSE04umdbfnZ6X/41ewX3vDkn7jgiUo9kVLWAmT0E9CA6SgC42sxOdPdrU5pM9sllR3Rh7vKN/Onf8+nZrinfGNIp7kgiUg9UWRSA44GDQi9pmNkoYGZKU8k+MzNuO6MfC1Zu5taXp9O1TS5DurSMO5aI1HHJnD4qAg5IeN85TJM6LjM9jYcuHkyHFtlcPbqQpeu2xB1JROq4PRYFM/ubmY0D8oDZZvaOmb0NzA7TpB5omduEJ4YXsG1H1CJps1okiUglKjt99NtaSyEp1aNdHn/45iCueOpjbnhhCg9fPIS0NN1/KCL/1x6PFNz934kDMBmYnjBUycwWmdl0M5ti
ZoVhWiszG29m88JryzDdzOwBMysys2lmNnjfP56UObZ3O378tb68OXM5946fG3ccEamjknkg3kgzWwZMI3pkdnUfnX2cuw9M6PrtFmCCu/ckuhHuljD9VKBnGEYCD1djH5KEK47syoWHdubBt4t4dcrSuOOISB2UTOujm4D+7r6qhvZ5JnBsGB8FvAPcHKY/HVo5fWBmLcysg7t/UUP7bfTMjF+e2Z8FqzZz00vT6NI6l4GdW8QdS0TqkGRaH80H9vZBOg7808wmmtnIMC0/4Q/9MiA/jHcEEh/xuSRM20U4cik0s8KVK1fuZazGq0lGGn+6ZAj5zbIY8XQhX6xXiyQR+VIyReFW4H0zeySc83/AzB5IcvtHuftgolND15rZVxNnhqOCaj2Hwd0fdfcCdy9o27ZtdVaVoFVuE54YfijF20oY8XQhxdvVIklEIskUhUeAt4AP+LI7zonJbNzdl4bXFcBfgcOA5WbWASC8rgiLLyW6B6JMpzBNUqBXfh4PXDSImZ9v4MYXp7JTz0gSEZIrCpnufoO7P+nuo8qGqlYys1wzyysbB04GZgDjgOFhseHAq2F8HHBZaIV0OLBe1xNS64SD8rn11D68Pn0Z90+YF3ccEakDkrnQ/I9wPeBvJPSj4O5V9eSSD/zVzMr2M8bd3zCzj4EXzOxK4FPg/LD868BpRHdLFwPfqs4Hkb0z4ugDmbt8E/dPmEfP/Kacfsj+cUcSkRhZeKTRnhcwW1jBZHf3A1MTKXkFBQVeWFid1rFSkW0lpVz82IdMX7qeF799BId0ahF3JBFJITObmHCbwC6S6U+hWwVD7AVBak5WRjp/unQIbZpGLZKWb9gadyQRiUkyN69dVtFQG+Gk9rRpmsXjwwvYuLWEkU8XsnVHadyRRCQGyVxoPjRhOBq4DTgjhZkkJgd1aMbvLxjItKXruemlaVR1alFEGp4qLzS7+/WJ782sBfBcqgJJvE7u156bTunN3W/MoVe7plx/Qs+4I4lILUqm9dHuNgPdajqI1B3XHNOdecs38bvxc+nRrimnHtwh7kgiUkuS6Y7zb3x513Ea0Bd4IZWhJF5mxm/OOZhFqzdzwwtT6dwqh/4dm8cdS0RqQTJNUo9JeFsCfOruS1KaKklqkppaKzZu5awH38OBV687knZ52XFHEpEasK9NUhP7VXivrhQESb12edk8elkB64p3MPLpiWzZrhZJIg1dMk1Szwkd4qw3sw1mttHMNtRGOIlf/47Nue+CAUxbso6rnv5YTVVFGrhkmqTeDZzh7s3dvZm757l7s1QHk7pjWP8O3HPuAN6fv5qRoyeqMIg0YMkUheXuPjvlSaRO+8aQTtx1ziG8O3cl33l2EttKVBhEGqJkmqQWmtnzwCvs+kC8l1MVSuqm8w/tzI6dO/nxX2dw3ZjJPHTxYDLTk/ldISL1RTJFoRnRU0tPTpjmgIpCI3Tx0C6U7nR+9upMvjt2Mg9cNEiFQaQBSeaOZj3CWnZx2RFd2VHq/Oq1WdzwwlTuO38AGSoMIg3C3tzRLMKVR3WjpHQnv/nHJ2SkGb89bwDpaRZ3LBHZRyoKsteuPqY7JTude96cQ3qacfc3DiFNhUGkXlNRkH1y7XE92F6yk/snzCMz3bjjrINVGETqsWRuXvuemTULfSc/YWaTzOzkqtZLWD/dzCab2WvhfTcz+9DMiszseTNrEqZnhfdFYX7Xvf5UUqu+f2JPrj2uO2M/WszPxs3QI7dF6rFkrg5e4e4biFoftQQuBe6sxj6+ByTe53AXcJ+79wDWAleG6VcCa8P0+8JyUg+YGTee3Jurv3ogz3zwGb/42ywVBpF6KpmiUHYu4DRgtLvPTJhW+YpmnYCvAY+H9wYcD7wUFhkFnBXGzwzvCfNPCMtLPWBm3HJqH644shtPvb+IX78+W4VBpB5K5prCRDP7J1EfCreaWR6wM8nt/x74IZAX3rcG1rl7SXi/BOgYxjsCiwHcvcTM1oflVyVu0MxG
AiMBDjjggCRjSG0wM356+kGU7tzJY/9ZSEZ6Gj88pTeq7SL1R6VFIfxS/xnQFljg7sVm1hqo8t4FMzsdWOHuE83s2BrICoC7Pwo8CtGjs2tqu1IzzIzbzujHjp3Ow+/MJzPNuOHk3nHHEpEkVVoU3N3N7HV3Pzhh2mpgdRLbPhI4w8xOA7KJ7oy+H2hhZhnhaKETsDQsvxToDCwxswygeZL7kTrGzLj9zP6UljoPvFVERnoa31W3niL1QjLXFCaZ2aHV3bC73+rundy9K3Ah8Ja7Xwy8DZwbFhsOvBrGx4X3hPlvuU5K11tpaVHvbecM7si94+fy0DtFcUcSkSQkc01hKHCJmS0i6p/ZiA4iDtnLfd4MPGdmtwOTgSfC9CeA0WZWBKwhKiRSj6WlGfecO4DSnc7db8whMy2NEV89MO5YIlKJZIrCKfu6E3d/B3gnjC8ADqtgma3Aefu6L6lb0tOM3503gJKdzh2vzyY9zbjiqG5xxxKRPUjmgXifmtlRQE93f9LM2gJNUx9NGoqM9DR+f8FASkudX742i8x049IjusYdS0QqkMwdzT8nOuVza5iUCTyTylDS8GSmp/HARYM48aB2/PTVmYz58LO4I4lIBZK50Hw2cAbR9QTc/XO+vO9AJGlNMtL448WDOa53W3701+m8ULg47kgisptkisL20ArIAcwsN7WRpCHLykjn4UuGcHTPNtz8l2m8PGlJ3JFEJEEyReEFM3uE6P6CEcC/CI+tENkb2ZnpPHZZAUcc2JobX5zKq1OWVr2SiNSKKouCu/+W6FlEfwF6Az9z9wdSHUwatuzMdB4fXkBB11bc8MJU/j7ti7gjiQjJXWi+y93Hu/tN7n6ju483Mz3BVPZZTpMMnrz8UAZ1bsH3npvMmzOXxR1JpNFL5vTRSRVMO7Wmg0jjlJuVwZPfOpSDOzXnujGTmDB7edyRRBq1PRYFM7vGzKYDvc1sWsKwEJhWexGlocvLzmTUFYdxUIdmXPPMJN6ZsyLuSCKNVmVHCmOArxM9k+jrCcMQd7+kFrJJI9IsO5PRVwylR7umjBw9kf/OW1X1SiJS4/ZYFNx9vbsvAn4CLHP3T4n6VLjEzFrUTjxpTJrnZPLsVUM5sE0uVz39Mf+br4fkitS2ZK4p/AUoNbMeRP0YdCY6ihCpcS1zm/DMVUPp3DKHK576mI8Wrok7kkijkkxR2Bn6PjgH+IO73wR0SG0saczaNM3i2RFD6dAim289+RETP1VhEKktyRSFHWZ2EXAZ8FqYlpm6SCLQLi+bsSMOp12zbIb/+WOmLF4XdySRRiGZovAt4AjgDndfaGbdgNGpjSUC+c2yGTNiKK1ym3DpEx8yfcn6uCOJNHjJ3NE8y92/6+5jw/uF7q6b16RWdGi+H2NGDKVZdiaXPPEhMz9XYRBJpWTuaF5oZgt2H2ojnAhAp5Y5PDfycHKbpHPJ4x/yybINcUcSabCSOX1UABwahqOBB0iiPwUzyzazj8xsqpnNNLNfhOndzOxDMysys+fNrEmYnhXeF4X5Xff6U0mD07lVDmNGHE6TjDQufuxD5i3fGHckkQYpmdNHqxOGpe7+e+BrSWx7G3C8uw8ABgLDzOxw4C7gPnfvAawFrgzLXwmsDdPvC8uJlOvaJpexIw4nLc246LEPmb9yU9yRRBqcZE4fDU4YCszs2yTXjae7e9n/tZlhcOB4oqeuAowCzgrjZ4b3hPknmJkl/UmkUTiwbVPGjhgKON987AMWrdocdySRBiWZ00e/Sxh+AwwBzk9m42aWbmZTgBXAeGA+sC7c9wCwBOgYxjsCiwHC/PVA6wq2OdLMCs2scOXKlcnEkAamR7s8nr3qcHaUOhc99gGfr9sSdySRBiOZ00fHJQwnufsId5+TzMbdvdTdBwKdgMOAPvsWF9z9UXcvcPeCtm3b7uvmpJ7q3T6PZ64cysatJYx4upDi7SVVryQiVdrjaSAzu6GyFd393mR34u7rzOxtovsdWphZ
Rjga6ASUdbu1lOgRGkvMLANoDujhN7JHffdvxgMXDeTKUYXc+OJUHrxoMGlpOuMosi8qO1LIq2KolJm1LXtwnpntR9Qvw2zgbeDcsNhw4NUwPi68J8x/K/QNLbJHx/fJ50enHsTr05fx+wnz4o4jUu/t8UjB3X+xj9vuAIwys3Si4vOCu79mZrOA58zsdmAy8ERY/glgtJkVAWuAC/dx/9JIXHV0N+Yu38gDE+bRs11Tvj5g/7gjidRbVbYiMrNRwPfcfV143xL4nbtfUdl67j4NGFTB9AVE1xd2n74VOC+52CJfMjNuP7s/i1Zv5sYXp9KldQ6HdGoRdyyReimZ1keHlBUEAHdfSwV/7EXilJWRzsOXDKFN0yxGPF3IsvVb444kUi8lUxTSwtEBAGbWiiSOMERqW5umWTw+vICNW0sYObqQrTtK444kUu8ke5/C/8zsV2b2K+B94O7UxhLZOwd1aMb9Fw5i+tL13PTSNNRWQaR6krlP4WmiDnaWh+Ecd9ejs6XOOqlvPj88pQ9/m/o5f3irKO44IvVKUqeB3H0WMCvFWURqzLePOZB5yzdy7/i59GzXlFMPVmeBIslI5vSRSL1jZvz6nIMZdEALbnhhKjOWqh8GkWSoKEiDlZ2ZziOXDqFlTiYjni5kxQa1SBKpioqCNGjt8rJ5bHgB64p3MHL0RLVIEqmCioI0eP32b859FwxgyuJ13PIXtUgSqYyKgjQKw/p34MaTe/HKlM956J35cccRqbN0E5o0Gtce14O5yzdxz5tz6NmuKSf3ax93JJE6R0cK0miYGXefewgDOjXn+89PYdbnG+KOJFLnqChIo5Kdmc5jlxXQLDtqkbRy47a4I4nUKSoK0ui0a5bNY5cVsHrzNr79zES2lahFkkgZFQVplA7u1JzfnTeQiZ+u5Ucvz1CLJJFARUEara8d0oHvn9iTv0xawqPvLog7jkidkLKiYGadzextM5tlZjPN7HtheiszG29m88JryzDdzOwBMysys2lmNjhV2UTKfPf4nnzt4A7c+cYnTJi9PO44IrFL5ZFCCfD/3L0vcDhwrZn1BW4BJrh7T2BCeA9wKtAzDCOBh1OYTQSAtDTjt+cNoP/+zfnu2MnMWbYx7kgisUpZUXD3L9x9UhjfCMwGOgJnAqPCYqOAs8L4mcDTHvkAaGFmerSlpNx+TaIWSblZGVw56mNWb1KLJGm8auWagpl1JerC80Mg392/CLOWAflhvCOwOGG1JWGaSMq1b57No5cVsHLjNq55ZhLbS3bGHUkkFikvCmbWFPgL8H133+VuIY+afFSr2YeZjTSzQjMrXLlyZQ0mlcZuYOcW3H3uIXy0aA0/fUUtkqRxSmlRMLNMooLwrLu/HCYvLzstFF5XhOlLgc4Jq3cK03bh7o+6e4G7F7Rt2zZ14aVROnNgR64/vgfPFy7mif8ujDuOSK1LZesjA54AZrv7vQmzxgHDw/hw4NWE6ZeFVkiHA+sTTjOJ1JofnNiLU/rl8+vXZ/P2nBVVryDSgKTySOFI4FLgeDObEobTgDuBk8xsHnBieA/wOrAAKAIeA76Twmwie5SWZtx3wUD6tG/Gd8dMpmiFWiRJ42H1+bxpQUGBFxYWxh1DGqil67Zw5oPvkZuVzivfOZKWuU3ijiRSI8xsorsXVDRPdzSL7EHHFvvxyKVD+GLdVq55diI7StUiSRo+FQWRSgzp0pK7zj2YDxas4efjZqpFkjR46mRHpApnD+rE3OWbePid+fRq15TLj+wWdySRlNGRgkgSbjq5Nyf1zeeXr83i3bm6P0YaLhUFkSSUtUjqlZ/HtWMmMX/lprgjiaSEioJIkppmZfD48AKapKdx1ahC1hVvjzuSSI1TURCphk4tc3jk0iEsWVvMtWMmqUWSNDgqCiLVVNC1Fb8++2DeK1rNr16bFXcckRql1kcie+G8gs7MW7GJR99dQM/8PC49vEvckURqhI4URPbSzcP6cHyfdtw2bibvF62K
O45IjVBRENlL6WnG/RcOpHvbXK55dhILV22OO5LIPlNRENkHedmZPH7ZoaQZXDnqY9Zv2RF3JJF9oqIgso8OaJ3Dny4Zwmeri7l+7GRK1CJJ6jEVBZEaMPTA1tx+Vn/enbuSO16fHXcckb2m1kciNeTCww5g7vJN/Pm9hfTKz+Oiww6IO5JItakoiNSgH53Wh/krN/GTV2YwYfYKhvVvz4kHtaNFjvpikPpBRUGkBmWkp/GHbw7i/n/N4x/Tv+Bfs5eTnmYcfmArhvVrz8n92pPfLDvumCJ7lLKe18zsz8DpwAp37x+mtQKeB7oCi4Dz3X1t6M/5fuA0oBi43N0nVbUP9bwmdZm7M33pet6YsYw3ZixjQWiyOviAFgzr355T+rWnS+vcmFNKY1RZz2upLApfBTYBTycUhbuBNe5+p5ndArR095tD383XExWFocD97j60qn2oKEh94e4UrdjEGzOW8easZcxYugGAPu3zGNa/PcP6t6d3fh7R7yOR1IqlKIQddwVeSygKc4Bj3f0LM+sAvOPuvc3skTA+dvflKtu+ioLUV4vXFPPmzGW8OXMZhZ+uxR26ts7hlHAEMbBTC9LSVCAkNSorCrV9TSE/4Q/9MiA/jHcEFicstyRM+z9FwcxGAiMBDjhArTukfurcKoerjj6Qq44+kBUbtzJ+1nLenLmcJ/6zkEf+vYD8Zlmc0q89w/q157BurchIV+txqR2xXWh2dzezah+muPujwKMQHSnUeDCRWtYuL5uLh3bh4qFdWF+8g7fmLOeNGct4oXAxT//vU1rkZHLSQfmc0q89R/VsQ3ZmetyRpQGr7aKw3Mw6JJw+WhGmLwU6JyzXKUwTaVSa52Ry9qBOnD2oE1u2l/LvuSt5c+Yy3pi5jBcnLiG3STrH9mnHsH7tOa5PO5pmqQGh1Kza/hc1DhgO3BleX02Yfp2ZPUd0oXl9VdcTRBq6/Zqkl1+E3l6yk/8tWM0bM5YxftYy/j7tC5qkp3FUzzYM69eeE/vm0ypX90LIvktl66OxwLFAG2A58HPgFeAF4ADgU6ImqWtCk9QHgWFETVK/5e5VXkHWhWZpjEp3OpM+W1ve1HXpui2kGQzt1pph/dtzcr98OjTfL+6YUofF1voo1VQUpLFzd2Z+viFq6jpzGfNWbAJgQOcWDOsXHWV0a6N7IWRXKgoijUTRik3lTV2nLVkPQO/8vNDUNZ++HZrpXghRURBpjJau28I/Z0anmD5etIadDh1b7EfHlvvRMieTljlNaJnbhJY5mbTIaRK9D+OtcpvQfL9M0nWvRIOkoiDSyK3atI1/zVrOf+atYuXGbawt3s7a4h2sK95Oyc6K/waYQbPszISiUXUhaZGTqSaz9YCKgohUyN3ZtK2EdcU7WLN5O2uLt7OueMcuRaPsdc3mL+cVby/d4zb3y0z/smjkhoKRUDzKprUM01vkZpKXlaHTWrWoLt3RLCJ1iJmRl51JXnYmnVvlJL3etpLSL4vH5rIiEorG5sSCsp0v1m1gTfF21m/ZwZ5+g2akGXnZGTTbL5Nm2ZnReHYmzfbLIC87c7fxaLkvl8mkaVaGTnXVEBUFEam2rIx08pulV+sx4KU7nQ1b/u9RyNpwhLJxawkbtu6IXrfsYMGqTWzYUsLGrTvYXMmRSZm8rN2LxZ6LSEXjTTL0KBFQURCRWpKeZtH1iL24ya6kdCcbt5aUF44NW3awIWH8y+llhWUHn6/byoatG8N6O9jDpZNy2ZlpuxWRaLxpVgY5TTLIzUonp0kGTcNrbsJrblYGuU0yyGkSjWdlpNXb02EqCiJS52Wkp+11QQHYudPZvL1kl+KxceuO3cZLdikw64u3s3hNMZu3lVC8vZTN20v2ePprd+lpFhWIJhnkZKXvUjCiApJeZwuNioKINHhpaV9eO9mfvbvb293ZsqOUzdtKKd5e8uXr9lI2byvZpXgUbytl07aS8vnF26LX5Ru2RsvUQKH5/om9OGPA/nv1WSqj
oiAikgQzI6dJdCoJsmpkm/tSaFrmZNZIht2pKIiIxCQVhWZf6XK7iIiUU1EQEZFyKgoiIlJORUFERMqpKIiISDkVBRERKaeiICIi5VQURESkXL3uT8HMVgKf7uXqbYBVNRinpihX9ShX9dXVbMpVPfuSq4u7t61oRr0uCvvCzAr31MlEnJSrepSr+upqNuWqnlTl0ukjEREpp6IgIiLlGnNReDTuAHugXNWjXNVXV7MpV/WkJFejvaYgIiL/V2M+UhARkd2oKIiISLkGWRTMbJiZzTGzIjO7pYL5l5vZSjObEoarEuYNN7N5YRheh3KVJkwfV5u5wjLnm9ksM5tpZmMSpsf2fVWRK7bvy8zuS9j3XDNblzAvzn9fleWK8/s6wMzeNrPJZjbNzE5LmHdrWG+OmZ1SF3KZWVcz25Lwff2plnN1MbMJIdM7ZtYpYd6+//ty9wY1AOnAfOBAoAkwFei72zKXAw9WsG4rYEF4bRnGW8adK8zbFOP31ROYXPZdAO3qyPdVYa64v6/dlr8e+HNd+L72lCvu74voguk1YbwvsChhfCpRl2TdwnbS60CursCMGL+vF4HhYfx4YHRN/vtqiEcKhwFF7r7A3bcDzwFnJrnuKcB4d1/j7muB8cCwOpArlZLJNQL4Y/hOcPcVYXrc39eecqVSdf87XgSMDeNxf197ypVKyeRyoFkYbw58HsbPBJ5z923uvhAoCtuLO1cqJZOrL/BWGH87YX6N/PtqiEWhI7A44f2SMG133wiHXy+ZWedqrlvbuQCyzazQzD4ws7NqKFOyuXoBvczsvbD/YdVYN45cEO/3BUSH+US/cMv+B477+9pTLoj3+7oNuMTMlgCvEx3FJLtuHLkAuoXTSv82s6NrKFOyuaYC54Txs4E8M2ud5LpVaohFIRl/A7q6+yFE1XRUzHnKVJari0e3tH8T+L2Zda/FXBlEp2qOJfqF+ZiZtajF/e9JZbni/L7KXAi85O6lMey7MhXlivP7ugh4yt07AacBo82sLvxt2lOuL4AD3H0QcAMwxsyaVbKdmnYjcIyZTQaOAZYCNfZvrC588TVtKZD4C7tTmFbO3Ve7+7bw9nFgSLLrxpQLd18aXhcA7wCDaisX0S+Oce6+IxzGzyX6Yxzr91VJrri/rzIXsuspmri/rz3livv7uhJ4Iez/f0A20cPe4v6+KswVTmetDtMnEl0D6FVbudz9c3c/JxSlH4dp65L8TFVLxcWSOAeiX48LiA6Pyy7U9NttmQ4J42cDH/iXF2oWEl2kaRnGW9WBXC2BrDDeBphHJRcRU5BrGDAqYf+LgdZ14PvaU65Yv6+wXB9gEeEG0brw76uSXHH/+/oHcHkYP4jo3L0B/dj1QvMCau5C877kaluWg+iC8NJa/nffBkgL43cAv6zJf1/7/CHq4kB0qDeXqIL/OEz7JXBGGP8NMDN84W8DfRLWvYLoglYR8K26kAv4CjA9TJ8OXFnLuQy4F5gV9n9hHfm+KswV9/cV3t8G3FnBurF9X3vKFff3RXTh9L2w/ynAyQnr/jisNwc4tS7kAr4R/j+dAkwCvl7Luc4lKtxzic4oZNXkvy895kJERMo1xGsKIiKyl1QURESknIqCiIiUU1EQEZFyKgoiIlJORUHqFTO7zcxuDOO/NLMTK1n2LDPrW3vpdtn3sWb2lYT3T5nZuSnYz6ZqLl/+/e02vauZzai5ZFJfqShIveXuP3P3f1WyyFlEbc3jcCxR+/+kmVlGaqKIJE9FQeo8M/txeP7/f4HeCdPLf32b2Z0W9aswzcx+G36lnwHcE555393MRpjZx2Y21cz+YmY5Cdt5wMzeN7MFib/ozexmM5se1rkzTOtuZm+Y2UQz+4+Z9dktb1fg28APwr7LHpj21d33EY4o/mNRHwazzCzdzO4JOaeZ2dVhuQ5m9m7Y3ozEh7CZ2R0h3wdmll+WwczeCtuYYGYHVPC9DgnrTQWu3cf/TNJQ1OSdeBo01PRA9Pyn
6UAO0WOMi4Abw7yniO7ubE10x2vZzZgtEucnbKt1wvjtwPUJy71I9COpL9GjiwFOBd4HcsL7VuF1AtAzjA8F3qog921lOavYx7HAZqBbeD8S+EkYzwIKiR558P/48u7WdCAvjDvhjlrg7oR1/8aXz9y/Anhl91zANOCrYfweUtRHgIb6NehwVeq6o4G/unsxgFXcK9h6YCvwhJm9Bry2h231N7PbgRZAU+DNhHmvuPtOol/r+WHaicCTZft29zVm1pTotNCLZla2blaSn6WifQB85NED/QBOBg5JOFppTvSQv4+BP5tZZtjOlDB/e8LnnQicFMaP4MvHK48mKhjlwtNkW7j7uwnLnJrk55AGTEVB6j13LzGzw4ATiI4criPqkWp3TwFnuftUM7uc6Fd6mW0J48aepQHr3H3gXkTd0z427zb9endPLFjRDLOvAl8DnjKze939aWCHu5c9q6YU/T8t+0jXFKSuexc4y8z2M7M84Ou7LxB+vTd399eBHwADwqyNQF7ConnAF+HX9sVJ7Hs88K2Eaw+t3H0DsNDMzgvTzMwGVLDu7vtO1pvANSEjZtbLzHIt6hhnubs/RvQQtMFVbOd9okdkQ/RZ/5M406NHLa8zs6MSlhFRUZC6zd0nAc8TPanyH0SnUXaXB7xmZtOA/xJ1fAJRV4Y3WdRDVnfgp8CHRE++/CSJfb8BjAMKzWwKUecmEP0BvTJcoJ1Jxd1e/g04e7cLzcl4nOipr5NCE9FHiH79HwtMDR2rXADcX8V2ricqaNOAS4HvVbDMt4A/hs9W2dGRNCJ6SqqIiJTTkYKIiJRTURARkXIqCiIiUk5FQUREyqkoiIhIORUFEREpp6IgIiLl/j/eNiyGmysiyAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot([0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90], labels_count)\n",
    "plt.title(\"Number of clusters vs. distance threshold\")\n",
    "plt.xlabel(\"distance threshold\")\n",
    "plt.ylabel(\"number of clusters\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Next, we fit the model with the selected distance threshold (d = 0.65)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Euclidean distance is the default, so the version-dependent `affinity`/`metric` argument is omitted\n",
    "model = AgglomerativeClustering(n_clusters=None, linkage=\"average\", distance_threshold=0.65)\n",
    "clustering = model.fit(data)\n",
    "labels = clustering.labels_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "295\n",
      "[ 28  28  28 165  25   1  35 144  28  14]\n"
     ]
    }
   ],
   "source": [
    "print(len(set(labels)))\n",
    "print(labels[:10])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.03891117895152641"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "metrics.silhouette_score(data, labels, metric='euclidean')"
   ]
  },
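  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**A hedged sketch: cross-checking a distance threshold with the silhouette score**\n",
    "\n",
    "The silhouette score ranges from -1 to 1, and values near 0 (such as the 0.039 reported above) usually indicate overlapping clusters. One possible way to cross-check a threshold is to compute the score for several candidate values of d. The cell below is only a sketch on synthetic `make_blobs` data, not on the feature matrix; `X_demo`, `sweep_thresholds`, and the candidate thresholds are hypothetical names and values introduced for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.cluster import AgglomerativeClustering\n",
    "from sklearn.datasets import make_blobs\n",
    "from sklearn.metrics import silhouette_score\n",
    "\n",
    "# Synthetic stand-in for the feature matrix (an assumption for illustration only)\n",
    "X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.5, random_state=0)\n",
    "\n",
    "def sweep_thresholds(X, thresholds):\n",
    "    # Collect (threshold, number of clusters, silhouette score) for each candidate\n",
    "    rows = []\n",
    "    for d in thresholds:\n",
    "        model_d = AgglomerativeClustering(n_clusters=None, linkage='average', distance_threshold=d)\n",
    "        labels_d = model_d.fit_predict(X)\n",
    "        k = len(set(labels_d))\n",
    "        # The silhouette score is only defined for 2 <= k <= n_samples - 1\n",
    "        score = silhouette_score(X, labels_d) if 1 < k < len(X) else float('nan')\n",
    "        rows.append((d, k, score))\n",
    "    return rows\n",
    "\n",
    "for d, k, score in sweep_thresholds(X_demo, [1.0, 2.0, 4.0, 8.0]):\n",
    "    print(f'd={d:.1f}  clusters={k}  silhouette={score:.3f}')"
   ]
  },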
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Add the cluster labels to the data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "data1 = data.copy()\n",
    "data1['cluster'] = labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>aapp</th>\n",
       "      <th>acab</th>\n",
       "      <th>acty</th>\n",
       "      <th>aggp</th>\n",
       "      <th>amas</th>\n",
       "      <th>amph</th>\n",
       "      <th>anab</th>\n",
       "      <th>anim</th>\n",
       "      <th>anst</th>\n",
       "      <th>antb</th>\n",
       "      <th>...</th>\n",
       "      <th>socb</th>\n",
       "      <th>sosy</th>\n",
       "      <th>spco</th>\n",
       "      <th>tisu</th>\n",
       "      <th>tmco</th>\n",
       "      <th>topp</th>\n",
       "      <th>virs</th>\n",
       "      <th>vita</th>\n",
       "      <th>vtbt</th>\n",
       "      <th>cluster</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.100000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.333333</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>165</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 128 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       aapp  acab  acty  aggp  amas  amph  anab  anim  anst  antb  ...  socb  \\\n",
       "1  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   \n",
       "2  0.100000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   \n",
       "3  0.333333   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   \n",
       "4  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   \n",
       "5  0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  ...   0.0   \n",
       "\n",
       "   sosy  spco  tisu  tmco  topp  virs  vita  vtbt  cluster  \n",
       "1   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0       28  \n",
       "2   0.0   0.0   0.0   0.0  0.05   0.0   0.0     0       28  \n",
       "3   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0       28  \n",
       "4   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0      165  \n",
       "5   0.0   0.0   0.0   0.0  0.00   0.0   0.0     0       25  \n",
       "\n",
       "[5 rows x 128 columns]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data1.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Show the distribution of cluster labels**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZQAAAEKCAYAAAA1qaOTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAbnUlEQVR4nO3debRdZX3/8fc3BERQZEqpAmtBlepPu/pTGwHBOhBERAVqAScgBTTWAQec0Ko4i4oiiKLBAGEQmVRSx/IDh9YKGgaRoS0pFQkFE0aVkPn5/bGfnfvck3PvPUmec8+9yfu11ll3j8/+7jPsz97PGW6klJAkaUNNGXQBkqSNg4EiSarCQJEkVWGgSJKqMFAkSVUYKJKkKvoWKBFxdkQsioibi2nbR8SVEXF7/rtdnh4RcXpELIiImyLi2cU6M/Pyt0fEzH7VK0naMP28QjkXOLBj2onAVSmlPYCr8jjAS4E98m0WcCY0AQScBOwF7Amc1IaQJGli6VugpJR+BjzQMfkQYG4engscWkw/LzWuAbaNiCcCLwGuTCk9kFJ6ELiStUNKkjQBTB3n7e2UUronD98L7JSHdwbuKpZbmKeNNH0tETGL5uqGrbfe+m+e9rSnVSx7uAeW3M/2W+0w4vz7ltzHjlvt2LftS1I/XHfddfellKat7/rjHShrpJRSRFT73ZeU0mxgNsD06dPT/PnzazW9lgtuOJ8jn3XUiPPPmX8Ox0w/pm/bl6R+iIg7N2T98f6U1+9zVxb576I8/W5g12K5XfK0kaZLkiaY8Q6UeUD7Sa2ZwBXF9KPzp732Bh7OXWM/Ag6IiO3ym/EH5GmSpAmmb11eEXER8EJgx4hYSPNprZOBSyLiOOBO4Ii8+PeBg4AFwBLgGICU0gMR8XHgV3m5j6WUOt/olyRNAH0LlJTSa0aYNaPLsgl4ywjtnA2cXbE0SVIf+E15SVIVBookqQoDRZJUhYEiSarCQJEkVWGgSJKqMFAkSVUYKJKkKgwUSVIVBookqQoDRZJUhYEiSarCQJEkVWGgSJKqMFAkSVUYKJKkKgwUSVIVBookqQoDRZJUhYEiSarCQJEkVWGgSJKqMFAkSVUYKJKkKgwUSVIVBookqQoDRZJUhYEiSarCQJEkVWGgSJKqMFAkSVUYKJKkKgwUSVIVBookqQoDRZJUxUACJSLeGRG3RMTNEXFRRGwZEbtHxLURsSAiLo6ILfKyj8njC/L83QZRsyRpdOMeKBGxM/A2YHpK6a+AzYBXA58BTk0pPQV4EDgur3Ic8GCefmpeTpI0wQyqy2sq8NiImApsBdwD7AdclufPBQ7Nw4fkcfL8GRER41eqJKkX4x4oKaW7gVOA39EEycPAdcBDKaWVebGFwM55eGfgrrzuyrz8Dp3tRsSsiJgfEfMXL17c352QJK1lEF1e29FcdewOPAnYGjhwQ9tNKc1OKU1PKU2fNm3ahjYnSVpHg+jy2h/4n5TS4pTSCuBbwL7AtrkLDGAX4O48fDewK0Ce/wTg/vEtWZI0lkEEyu+AvSNiq/xeyAzgVuDHwGF5mZnAFXl4Xh4nz786pZTGsV5JUg8G8R7KtTRvrl8P/CbXMBt4H3BCRCygeY9kTl5lDrBDnn4CcOJ41yxJGtvUsRepL6V0EnBSx+Q7gD27LLsUOHw86pIkrT+/KS9JqsJAkSRVYaBIkqowUCRJVRgokqQqDBRJUhUGiiSpik0qUL5z62VjLyRJWi+bVKBIkvrHQJEkVWGgSJKqMFAkSVUYKJKkKgwUSVIVBookqQoDRZJUhYEiSarCQJEkVWGgSJKqMFAkSVUYKJKkKgwUSVIVBookqQoDRZJUhYEiSarCQJEkVWGgSJKqMFAkSVUYKJKkKgwUSVIVBookqQoDRZJUhYEiSarCQJEkVTGQQImIbSPisoj4j4i4LSKeGxHbR8SVEXF7/rtdXjYi4vSIWBARN0XEswdRsyRpdIO6QjkN
+GFK6WnA/wVuA04Erkop7QFclccBXgrskW+zgDPHv1xJ0ljGPVAi4gnA84E5ACml5Smlh4BDgLl5sbnAoXn4EOC81LgG2DYinjiuRUuSxjSIK5TdgcXAORFxQ0R8PSK2BnZKKd2Tl7kX2CkP7wzcVay/ME8bJiJmRcT8iJi/ePHiPpYvSepmEIEyFXg2cGZK6VnAIwx1bwGQUkpAWpdGU0qzU0rTU0rTp02bVq1YSVJvBhEoC4GFKaVr8/hlNAHz+7YrK/9dlOffDexarL9LniZJmkDGPVBSSvcCd0XEU/OkGcCtwDxgZp42E7giD88Djs6f9tobeLjoGpuUZl9z1qBLkKTqpg5ou8cDF0bEFsAdwDE04XZJRBwH3AkckZf9PnAQsABYkpeVJE0wAwmUlNKNwPQus2Z0WTYBb+l3TZKkDeM35SVJVRgokqQqDBRJUhUGiiSpCgNFklRFT4ESEVf1Mk2StOka9WPDEbElsBWwY/45+ciztqHL72lJkjZdY30P5Y3AO4AnAdcxFCh/AM7oX1mSpMlm1EBJKZ0GnBYRx6eUvjRONUmSJqGevimfUvpSROwD7Fauk1I6r091SZImmZ4CJSLOB54M3AisypMTYKBIkoDef8trOvD0/LtakiStpdfvodwM/Hk/C5EkTW69XqHsCNwaEb8ElrUTU0oH96UqSdKk02ugfKSfRUiSJr9eP+X1034XIkma3Hr9lNcfaT7VBbAFsDnwSEppm34VJkmaXHq9Qnl8OxwRARwC7N2voiRJk886/9pwanwHeEn9ciRJk1WvXV6vLEan0HwvZWlfKpIkTUq9fsrrFcXwSuC3NN1ekiQBvb+Hcky/C9kUfe2as3jj3m8YdBmSVEWv/2Brl4j4dkQsyrfLI2KXfhcnSZo8en1T/hxgHs3/RXkS8M95miRJQO+BMi2ldE5KaWW+nQtM62NdkqRJptdAuT8ijoyIzfLtSOD+fhYmSZpceg2UY4EjgHuBe4DDgH/oU02SpEmo148NfwyYmVJ6ECAitgdOoQkaSZJ6vkL56zZMAFJKDwDP6k9JkqTJqNdAmRIR27Uj+Qql16sbSdImoNdQ+Dzwi4i4NI8fDnyyPyVJkiajXr8pf15EzAf2y5NemVK6tX9lSZImm567rXKAGCKSpK7W+efrJUnqxkCRJFUxsEDJ37i/ISK+m8d3j4hrI2JBRFwcEVvk6Y/J4wvy/N0GVbMkaWSDvEJ5O3BbMf4Z4NSU0lOAB4Hj8vTjgAfz9FPzcpKkCWYggZJ/+v5lwNfzeNB8guyyvMhc4NA8fEgeJ8+fkZeXJE0gg7pC+SLwXmB1Ht8BeCiltDKPLwR2zsM7A3cB5PkP5+WHiYhZETE/IuYvXry4j6VLkroZ90CJiJcDi1JK19VsN6U0O6U0PaU0fdo0f1lfksbbIH4+ZV/g4Ig4CNgS2AY4Ddg2Iqbmq5BdgLvz8ncDuwILI2Iq8AT86XxJmnDG/QolpfT+lNIuKaXdgFcDV6eUXgf8mOZn8QFmAlfk4Xl5nDz/6pRSGseSJUk9mEjfQ3kfcEJELKB5j2ROnj4H2CFPPwE4cUD1SZJGMdBfDE4p/QT4SR6+A9izyzJLaX6MUpI0gU2kKxRJ0iRmoEiSqjBQJElVGCiSpCoMFElSFQaKJKkKA0WSVIWBIk1yM05+1aBLkAADRZJUiYEiSarCQJEkVWGgSJKqMFAkSVUYKJKkKgwUSVIVBookqQoDRZJUhYEiSarCQJEkVWGgSJKqMFAkSVUYKJKkKgwUSVIVBookqQoDRZJUhYEiSarCQJEkVWGgSJKqMFAqmXvd3EGXIEkDZaBIkqowUCRJVRgokqQqDBRJUhUGiiSpinEPlIjYNSJ+HBG3RsQtEfH2PH37iLgyIm7Pf7fL0yMiTo+IBRFxU0Q8e7xrliSNbRBXKCuBd6WUng7sDbwlIp4OnAhclVLaA7gqjwO8FNgj32YBZ45/yZKksYx7oKSU7kkpXZ+H/wjc
BuwMHAK0X+aYCxyahw8BzkuNa4BtI+KJ/ajt0t98sx/NStImYaDvoUTEbsCzgGuBnVJK9+RZ9wI75eGdgbuK1RbmaZ1tzYqI+RExf/Hixf0rWpLU1cACJSIeB1wOvCOl9IdyXkopAWld2kspzU4pTU8pTZ82bVrFSjUZHHPhOwddgrTJG0igRMTmNGFyYUrpW3ny79uurPx3UZ5+N7BrsfoueVpVl/3m4tpNSgC85CtHDboEaVwM4lNeAcwBbkspfaGYNQ+YmYdnAlcU04/On/baG3i46BqTJE0QUwewzX2Bo4DfRMSNedoHgJOBSyLiOOBO4Ig87/vAQcACYAlwzLhWK0nqybgHSkrp34AYYfaMLssn4C21tv/tWy7l755xeK3mJEmZ35SXJFVhoEiSqjBQJElVGCiSpCoMlE3Yx//ls4MuQdJGZKMPlHm3fWvshXpw0Y0XVmlHkjZWG32gSJLGh4EiSarCQJEkVWGgSJKqMFAkSVUYKJKkKgwUSVIVBkoXl9zk/5aXpHVloKi698z7yKBLkDQABookqQoDRZJUhYEiSarCQJEkVWGgSJKqMFAkSVUYKOPk67+cM+gSJKmvDJRJ7As//dKgS5CkNQyUjcgpPzlt0CVI2oQZKJKkKgwUSQPx/ONfPugSVJmBIkmqYpMPlMtvvmTQJVT32R9/cdAlSNoEbfKB0ouLfv2NcdnOV/79a+OyHUnqBwNFklTFRhkoDy99aNAlVPGln5856BLG9L5//tigSxjVUee/bdAljJsZnz5i0CWMu+e9/sBBl6DCRhkoqudDP/jUoEuYVF721X8YdAnSwBgoHfz3v3W8+4qTBl2CpHG2SQTKFbddzhW3Xl61zQtuOL9qe7364s/O2KD1P/n/ThlzmZN+eDIf/uGn+XDFq5N3fudD1dqaSF7xtWMHXUJfvOiDrxx0CT1b126vfY96cZ8q0SYRKDVdeMMFPS13zvxzRp3/tWvOqlHOuPun731i0CVU8dq5bxl0CdJGZ9IESkQcGBH/GRELIuLEfm/v4psuAuCb6/CR4bnXzR11/lk9/OLwl//9q5zx86/2vM1On8vfQfnM1af2vM7H/uWz6729Tc2hZ72+6/SXf/WYca5k3e33scMGXcKks88R+w26hEllUgRKRGwGfBl4KfB04DUR8fTBVtXdufPPXTN89q+aq5Q5vzp7zbTZ136963pn/mL2qO2e/m9f4bR//fI61XLy1V/g5Ku/sE7r9MMJ3/nwsPF3fOuDVdo97qITOPYbJ6z3+q85t7lKedU5bwLgiHPexOFn/+Oa+YfNeeOGFVjB/l94TU/LzTj5VX2uZOOz7+v2X6/19jnsRZUrGdte++3T03J7/u3efa5kdJMiUIA9gQUppTtSSsuBbwKHrG9j377l0mHjl/X4bflv3Hhh1+nnXX/e+pYyotE+Mnzqz85Y89P1n//p6T23+amrPg/AJ688hU9c+TkAPl5cnXz0R5/hIz86ea31Oj/p9YHvfYL3f/fjnPjd5iPD5UeH3zvvI2uG33XFh9cKk9LbLv8Ax1/+Ad562ft562XvXzP9zZc2F6BvuuS9ALzx4vfwxovfw6xvvnvNMq+/6F1rho/9xjsBOObCd66ZNvOCt3P0BW8H4Mjzj+d15x0PwOvmvnVYDa8+981AEybd/P2cWbzy62/g7856w1pXJwfPPg6AV3ytuTppP+F10JkzAXjpmTM58CtHA3DAl4/igDOO5MVnHDmsjRef/rrm72mvZf8vvrZrDa39P/fqNcMzPts9QGZ86ghmfLK5lfb7+OFDwx89jP0+OvLVSo33T17wroN7Wq7273k979iXDB+fecCY6+z72rWDZZ9XzRh1nX3+/kU895UvHHH+cw9+AQB7v+L5zd+XPX/MOvZ6yfPY64B9m+EX57/77ztsmT2LYNnzhc/t2s5znrfX0PC+zfBznrvnmNuvIVJK47KhDRERhwEHppRen8ePAvZKKb21WGYWMCuPPhW4
v0tT9wE7jrKpseZPlDYmS5012rDOum1MljprtGGd697GU1NKjx+jnRFNXd8VJ5qU0mxgTb9RRMzvssz0btN7nT9R2pgsddZowzo3zTprtGGd9dsYy2Tp8rob2LUY3yVPkyRNEJMlUH4F7BERu0fEFsCrgXkDrkmSVJgUXV4ppZUR8VbgR8BmwNkppVvGWG2kj02N/nGqsedPlDYmS5012rDOum1MljprtGGd9dsY0aR4U16SNPFNli4vSdIEZ6BIkqqYFO+hrIuIOBC4GNgaWAWsBoJmXxPd97ldJsapzG7SKNtfjeEvafw8QnPM+V9gObA78KGU0qi/LrtRHaSKn2iZBbyC5iD9t8A0moN1Au6h+QLP4rzaEmAZ8ADwME0ILQH+BKykOZgvA27O448W028qpv0pr5vy7RGaB+K+3N7yvL1FwH/k9VbmacuAFXl4RR5v27k3b6u1FPjvYr1lDAXnEuCPeV/aGn+f11ma61yWp19PE7yP5G2uAp6Z60x5fHleNgGXF/W2Vud2ycu0da5ieM2puLV1pWLZ1LH8o8XyqzvWL5dfVbRDUXvZXjt/RW63tSqPr+qY1q5zfcf0tu5ym+3wio46ynY6re7Y13tz2+U+leveVCybGLpvyjbadZYz/L5vH9dHi/XL2st9X073eju3UdZS/i2HV3SMl7V26vY4lo9357rtPrZWAg91qScV80vdnm/ttJWMfB+M9FiWdSwq2u58/i1luPI5XrYBzf1XvkbK50e5P6s6/kL3+6xTuc12W//L0LGkPQ7uCzwWuIPm+DlmXmxUgcLQT7RcDNxG80R7MfCEPH8KsJDmTpzK0Itvc+DSPH1FHn+Y5hNl5HYW09ypm9E8IH8AtqU5gE+h+V7MFIYOPlNyO9vn9dtAeVxuY0rRVnnV1IbJqry9RUUdrXb5zRgKlQAek7ezXW5/ad73xwBbMvSECZpvzO6dp7ehuUdeP/K0+2kCJwHPo/uTf4v8t3yyl8+rziCYmu+7NqyiqKncv/aKMRXDUawzpZjXWlEMP1LMb2tYUsxflu+fcn55UF3UsQ/lVWwbIG3d7bxedL7m/lCsv7KjhgQ8qWP/2u202y3rfqijjmU0z5HNWfu+ap+r5bKtzgMdDH8Oruyog47h1DH+B0Y2heH71bbfWsrQfq6mORCXj/ODRfurGXpetc/H8uC6muFh9yjDD/7tc38s3Q7Yy4u2l9M8/8pQKe+/pQw/+WyPQ229C4vtJJr7qHw9wPDHp/OkqBxfzdqPZRTLbp7b3xG4Ji//OJrj2pbANgyd9Ixpo/qUV/kTLRGxG/CvNN9XOQp4PEMP3KM0dxQ0d9zWwK9pLuvag28p0VyBbFNMa4Nkc4YOqr3qfHG3Z7lbMPxJtnn+2xkotXTWsSrX0e7/cpogfSJrH+A3dLusZzujdf+VtbX34dRieDOG70PZVvtC3IL6XaA17rMa223vg1TMW5+61nV/NmT/2yBp615J87ooLWf4a7Cfr5m2pkF1j5evndpd4W1oTAX+B/glcDjN8fK7wPxNqstrBH9Bc0bwOZon2e0MHWSW0ITJFOBWhif5apqzn9U0Zxubs3a3THt23z7pH2YosNouMPL2lxXDnd0t5VVKO608Ayy70+5l6AAJzVXEUrp3EYzVhdFuo61lCsPDdCpNmLQH2LJrZ0mxH+22yrO/RHO/datlrAPZupzllMuW3QplmECzb+1ZYOdVRXsGObWYXp65LmX4dtr7flkxfXkxv/MMtjyzhOFdK7d27EvZTbQ+Z3ud3TBl25t1zC/renCMdstuqXU9mD469iJrvfZaZa8ADB2z2mUeYe0TunI/O69KynXb10u3brvRtPtfrvvbEZZd3zP2su2y3vJKvXy+rOxYdiTdrj5bUxh6DUyledvgHcBzaN46GPM9940tUDp/omVzYC+aJ9xraO7Iv2R4f+kihi6VVzD8zGabPP54mr7EKXm9+4EdOpZPNOHUdj21QZWALzH0QLUv
7C2Ldacw9FhMZeiBbee13WQB/DlDB4IlwAUMvS/S2Q/c+cIvfzCzfFI9pmO/l+dbG7Ld2pvKUHcKNGHaHsDb+sozybJr6RyGui46X+yd2ym7N2Dtvvxy2TYM2yvKMjTK+7ccL7sQ2vuuc50tOuq/n6Gu0iiWoWO9keqcWow/rdh+eYBruzlGOyB1O3iU2968Y7my3Xa5to7tRtlO227n86mt9Y90fy+q9diO8W771Nkd19kd1GrDpa1/64522/c0y/3rPBC227qFtbsr1yUsy/tkt6KG8n2hsR5DWPuEpbPtzi7kdlq5TOfzujTayVx7Py8HvsdQV/ETad5XeRfwA+DPgPfnL5iPaGMLlDU/0ULzYtqR5tv1OwPPpnmi3Ulz4Gm7d7aieUAPZuhy+k80b3w/xNATur0tzfOn0BzQp9K8oP6bptvsNwwdzNoH6shcX+T278jL356nt9tYlafdwdADfT1wNfBfefyBYp2lNG+ctYGzhOZMqX2vpAyDRPPCLsPhT8U65VXPFJqD5k4Mvd/0QK65PPOjWKf9hdL2PZHlNCGzpNjmqtzuq2gOwJvlW/sBhc422/ba+jvPJNsPMLTK8NmK4e+zrKA5Cy/vj+gYXspQsN5L8wEOim2073Nsy/CD8jKG96G3y5dn9eXZ5l3FPrZv/rf3W3mQSDQfqhjprLPbAb48s23/lvdLO6384AcMv7rrtr37WDvc26u+9vnXtv8gwx/PzquPzoNs+4GJ8mqvPJiWj2N7ItL5PGyH2+d5ayXNiWZ5v7TPhyd31N32JLT7WfYgdNuXOxm63zo/8FGeELRBN9LjWJ6YtPtRLt/tjfjyNdXtMS91vj/1SLHekjz/YeCFDPW6rACekVLaDXhB3s/Pp5RG/R/kG9V7KAARcRDN/0vZiuH9rlNpHvTHMtiPB4/kPpqzxJH6fssDYD+tb7eGpI1HGe7/RdNbMeZ7KBtdoEiSBmNj6/KSJA2IgSJJqsJAkSRVYaBIkqowUCRJVRgo0gaKiI9ExLvXY71tI+LN/ahJGgQDRRqcbYF1CpRo+LrVhOQTU1pHEXF0RNwUEb+OiPM75v0kIqbn4R0j4rd5+BkR8cuIuDGvuwdwMvDkPO1zebn3RMSv8jIfzdN2i4j/jIjzaP6NQvnzQtKEsdH9gy2pnyLiGcAHgX1SSvdFxPbA23pY9R+B01JKF0ZE+7MzJwJ/lVJ6Zm77AJp/IbAnzS8VzIuI5wO/y9NnppSuqb1PUi0GirRu9gMuTSndB5BSeiCip1+p+QXwTxGxC/CtlNLtXdY7IN9uyOOPowmS3wF3Giaa6Ozykupq/7kaFP8KIKX0DZofIH0U+H5E7Ndl3QA+nVJ6Zr49JaU0J897pMvy0oRioEjr5mrg8IjYASB3eZV+C/xNHj6snRgRfwHckVI6HbgC+GuaX6l+fLHuj4BjI+JxeZ2dI+LP+rETUj/Y5SWtg5TSLRHxSeCnEbGKpnvqt8UipwCXRMQsmv8v0ToCOCoiVtD8NP6ncnfZzyPiZuAHKaX3RMT/AX6Ru8P+RPOvDzr/v4g0Iflrw5KkKuzykiRVYaBIkqowUCRJVRgokqQqDBRJUhUGiiSpCgNFklTF/wcj6rpn6ZALOgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Count the sentences in each cluster; the y-axis is capped at 1000\n",
    "sns.countplot(x=\"cluster\", data=data1, palette=\"Greens_d\")\n",
    "plt.ylim(0, 1000)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Check a cluster against its raw criteria sentences**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pair each raw criteria sentence with its cluster label\n",
    "clustering_results = []\n",
    "with open(\"./data/criteria_sentences_preprocessed_metamap_filter(19185).json\", \"r\", encoding=\"utf-8\") as f:\n",
    "    criteria_info = json.load(f)\n",
    "    for criteria in criteria_info[\"criteria\"]:\n",
    "        no = int(criteria[\"No.\"])\n",
    "        criteria_sentence = criteria[\"criteria_sentence\"]\n",
    "        # Label-based lookup: assumes the index of data1 matches the sentence number\n",
    "        cluster_ = data1.loc[no, \"cluster\"]\n",
    "        clustering_results.append([cluster_, no, criteria_sentence])\n",
    "# Sort by cluster label so sentences in the same cluster are adjacent\n",
    "clustering_results_sort = sorted(clustering_results, key=lambda x: x[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cluster: 22, raw ID: 63, raw sentence: (5) 正在参与其他药物临床试验的患者。\n",
      "Cluster: 22, raw ID: 297, raw sentence: 7、参与其他临床研究治疗\n",
      "Cluster: 22, raw ID: 905, raw sentence: ⑧正在参加其他临床试验的患者;\n",
      "Cluster: 22, raw ID: 1548, raw sentence: 2. 正在参加其他药物临床试验患者;\n",
      "Cluster: 22, raw ID: 2250, raw sentence: 7. 患者同时正参加其他临床试验研究。\n",
      "Cluster: 22, raw ID: 3005, raw sentence: (5)病人参与其他试验。\n",
      "Cluster: 22, raw ID: 3034, raw sentence: (8)正参加其它临床试验的患者;\n",
      "Cluster: 22, raw ID: 3092, raw sentence: (3)正在其它临床试验中\n",
      "Cluster: 22, raw ID: 3121, raw sentence: (8)正参加其它临床试验的患者;\n",
      "Cluster: 22, raw ID: 3679, raw sentence: 1. 正参加其它临床试验的患者;\n",
      "Cluster: 22, raw ID: 5195, raw sentence: 8.正参加其它临床试验者。\n",
      "Cluster: 22, raw ID: 5271, raw sentence: (5)正在参加其它临床实验的患者。\n",
      "Cluster: 22, raw ID: 6249, raw sentence: 8、正参加其它临床试验的患者;\n",
      "Cluster: 22, raw ID: 7731, raw sentence: H.伴随其他抗肿瘤治疗或正在参加其他临床试验;\n",
      "Cluster: 22, raw ID: 9178, raw sentence: 9)同时参加其它临床试验者。\n",
      "Cluster: 22, raw ID: 13307, raw sentence: b)患者同时参加了其他临床研究\n",
      "Cluster: 22, raw ID: 13527, raw sentence: 14)同时参与其他临床研究者\n",
      "Cluster: 22, raw ID: 13918, raw sentence: 2.正参加其他临床试验,或瘤体接受其他有创或无创治疗\n",
      "Cluster: 22, raw ID: 15063, raw sentence: (8)近期参加其他临床试验者。\n",
      "Cluster: 22, raw ID: 15272, raw sentence: 1)诊断需进行PCI;\n",
      "Cluster: 22, raw ID: 16973, raw sentence: k) 正在参加其它临床试验者\n",
      "Cluster: 22, raw ID: 17864, raw sentence: (4)正在其它临床试验中;\n",
      "Cluster: 22, raw ID: 18382, raw sentence: ⑤正在参加影响本研究结果评价的其他临床试验者。\n",
      "Cluster: 22, raw ID: 18807, raw sentence: 6、正参加其它临床试验的患者;\n"
     ]
    }
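   ],
   "source": [
    "# Print every sentence assigned to cluster 22 for a manual coherence check\n",
    "for cluster_, no, sentence in clustering_results_sort:\n",
    "    if cluster_ == 22:\n",
    "        print(\"Cluster: {}, raw ID: {}, raw sentence: {}\".format(cluster_, no, sentence.strip()))"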
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the sorted results as rows of (cluster, raw ID, sentence); no header row is written\n",
    "with open(\"./data/hierarchical_cluster_results.csv\", \"w\", newline=\"\", encoding=\"utf-8\") as f:\n",
    "    csv_writer = csv.writer(f)\n",
    "    csv_writer.writerows(clustering_results_sort)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}