{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "zUhHa46Z4lkT"
},
"source": [
"Instructions\n",
"* Read the data file.\n",
"* Preprocess the data: drop the columns we are not interested in, clean up missing values, relabel the classes as 0, 1, 2, 3, ... if they are strings, encode or transform text columns, etc.\n",
"* Split into X and Y.\n",
"* Split into training and test sets (if no split is provided).\n",
"* Normalize.\n",
"* Train the classification models of our choice: make predictions and evaluate them with one or more of the classification metrics we have covered.\n",
"* Compare the results of all the models and keep the best one.\n",
"* Add cross-validation for whichever hyperparameters you consider appropriate!"
]
},
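{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a minimal end-to-end sketch of the steps above, using scikit-learn. It illustrates the workflow under stated assumptions and is not the solution for this dataset. Synthetic data from `make_classification` stands in for the smoking data. The three models (logistic regression, k-nearest neighbours, decision tree) and their hyperparameter grids are arbitrary examples. Each model sits in a `Pipeline` after a `StandardScaler`, so during `GridSearchCV` the scaler is refitted on the training folds only and normalization never sees held-out data. The best model is selected by its cross-validated F1 score, which leaves the test set as a final, untouched check."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch of the workflow above on synthetic data.\n",
"# Assumptions: scikit-learn is installed; make_classification stands in for the\n",
"# real dataset; the models and grids are illustrative, not prescribed.\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.model_selection import train_test_split, GridSearchCV\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.metrics import accuracy_score, f1_score\n",
"\n",
"# Synthetic binary problem standing in for X (features) and y (smoking)\n",
"X_syn, y_syn = make_classification(n_samples=500, n_features=10, random_state=0)\n",
"\n",
"# Stratified hold-out test set (only needed when no split is provided)\n",
"X_tr, X_te, y_tr, y_te = train_test_split(\n",
"    X_syn, y_syn, test_size=0.2, stratify=y_syn, random_state=0)\n",
"\n",
"# Candidate models with small, illustrative hyperparameter grids\n",
"candidates = {\n",
"    'logreg': (LogisticRegression(max_iter=1000), {'model__C': [0.1, 1, 10]}),\n",
"    'knn': (KNeighborsClassifier(), {'model__n_neighbors': [3, 5, 11]}),\n",
"    'tree': (DecisionTreeClassifier(random_state=0), {'model__max_depth': [3, 5, None]}),\n",
"}\n",
"\n",
"results = {}\n",
"for name, (model, grid) in candidates.items():\n",
"    # Scaling inside the pipeline: StandardScaler is refitted on each training fold\n",
"    pipe = Pipeline([('scaler', StandardScaler()), ('model', model)])\n",
"    search = GridSearchCV(pipe, grid, cv=5, scoring='f1')\n",
"    search.fit(X_tr, y_tr)\n",
"    # With the default refit=True, predict() uses the best estimator found\n",
"    y_pred = search.predict(X_te)\n",
"    results[name] = {\n",
"        'best_params': search.best_params_,\n",
"        'cv_f1': search.best_score_,\n",
"        'test_accuracy': accuracy_score(y_te, y_pred),\n",
"        'test_f1': f1_score(y_te, y_pred),\n",
"    }\n",
"\n",
"for name, r in results.items():\n",
"    print(name, r)\n",
"\n",
"# Select by cross-validated F1 so the test set stays an unbiased final check\n",
"best = max(results, key=lambda k: results[k]['cv_f1'])\n",
"print('best model (by CV F1):', best)"
]
},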
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"id": "iNqljQXq4ax8"
},
"outputs": [],
"source": [
"# Display plots inline in the notebook\n",
"%matplotlib inline\n",
"\n",
"# Access the file stored in Google Drive\n",
"from google.colab import drive\n",
"\n",
"# Libraries used\n",
"import random\n",
"import math\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.patches import Patch\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qZqHXxMn4-vc"
},
"source": [
"## Data loading\n",
"\n",
"Data obtained from https://www.kaggle.com/datasets/kukuroo3/body-signal-of-smoking\n",
"\n",
"The output variable is `smoking`. According to the documentation, it takes two values in this dataset:\n",
"* 0 = never smoked\n",
"* 1 = former smoker (no longer smokes)\n",
"\n",
"There used to be a third category, current smokers, which has already been removed from the dataset."
]
},
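{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is one possible sketch of the preprocessing step for the text columns visible in the data preview below (`ID`, `gender`, `oral`, `tartar`). It rests on a few assumptions. The small `df_demo` frame copies a few columns from the first five rows of that preview. The 0/1 coding of `gender` is an arbitrary illustrative choice. `df_demo`, `X_demo` and `y_demo` are hypothetical names, chosen so they do not clash with the notebook's own variables. On the real data, the same calls would be applied to the full DataFrame."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch: preprocessing the text columns seen in the data preview.\n",
"# Assumption: df_demo copies a few columns from the first five preview rows;\n",
"# in the notebook the same steps would be applied to the full DataFrame.\n",
"import pandas as pd\n",
"\n",
"df_demo = pd.DataFrame({\n",
"    'ID': [0, 1, 2, 3, 4],\n",
"    'gender': ['F', 'F', 'M', 'M', 'F'],\n",
"    'age': [40, 40, 55, 40, 40],\n",
"    'oral': ['Y', 'Y', 'Y', 'Y', 'Y'],\n",
"    'tartar': ['Y', 'Y', 'N', 'Y', 'N'],\n",
"    'smoking': [0, 0, 1, 0, 0],\n",
"})\n",
"\n",
"# Drop the identifier column: it is a row index, not a predictor\n",
"df_demo = df_demo.drop(columns=['ID'])\n",
"\n",
"# Encode the two-valued text columns as 0/1 (the F/M coding is an arbitrary choice)\n",
"df_demo['gender'] = df_demo['gender'].map({'F': 0, 'M': 1})\n",
"for col in ['oral', 'tartar']:\n",
"    df_demo[col] = df_demo[col].map({'N': 0, 'Y': 1})\n",
"\n",
"# Count missing values; a category missing from a mapping would also appear as NaN\n",
"print(df_demo.isna().sum().sum())  # 0\n",
"\n",
"# Split into features X and target y\n",
"X_demo = df_demo.drop(columns=['smoking'])\n",
"y_demo = df_demo['smoking']\n",
"print(X_demo.dtypes)\n",
"print(y_demo.value_counts())"
]
},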
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
"id": "VxkNberJgSM9"
},
"outputs": [],
"source": [
"# Optional: download the dataset with the Kaggle CLI (requires Kaggle API credentials)\n",
"#import kaggle\n",
"#! kaggle datasets download kukuroo3/body-signal-of-smoking"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 334
},
"id": "EN7gozHH49ej",
"outputId": "d3082baf-5bc1-43c3-c9fe-e5535304cb8a"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
" ID gender age height(cm) weight(kg) waist(cm) eyesight(left) \\\n",
"0 0 F 40 155 60 81.3 1.2 \n",
"1 1 F 40 160 60 81.0 0.8 \n",
"2 2 M 55 170 60 80.0 0.8 \n",
"3 3 M 40 165 70 88.0 1.5 \n",
"4 4 F 40 155 60 86.0 1.0 \n",
"\n",
" eyesight(right) hearing(left) hearing(right) ... hemoglobin \\\n",
"0 1.0 1.0 1.0 ... 12.9 \n",
"1 0.6 1.0 1.0 ... 12.7 \n",
"2 0.8 1.0 1.0 ... 15.8 \n",
"3 1.5 1.0 1.0 ... 14.7 \n",
"4 1.0 1.0 1.0 ... 12.5 \n",
"\n",
" Urine protein serum creatinine AST ALT Gtp oral dental caries \\\n",
"0 1.0 0.7 18.0 19.0 27.0 Y 0 \n",
"1 1.0 0.6 22.0 19.0 18.0 Y 0 \n",
"2 1.0 1.0 21.0 16.0 22.0 Y 0 \n",
"3 1.0 1.0 19.0 26.0 18.0 Y 0 \n",
"4 1.0 0.6 16.0 14.0 22.0 Y 0 \n",
"\n",
" tartar smoking \n",
"0 Y 0 \n",
"1 Y 0 \n",
"2 N 1 \n",
"3 Y 0 \n",
"4 N 0 \n",
"\n",
"[5 rows x 27 columns]"
],
"text/html": [
"\n",
"
\n", "