Bangladesh Smokers Survey - Data Analysis and Predictive Modeling

Overview

The "Bangladesh Smokers Survey" repository contains a data analysis and predictive modeling project that examines how smoking habits relate to health issues across districts of Bangladesh. The project applies machine learning techniques to survey data collected from both smokers and non-smokers, covering health symptoms, smoking status, and demographic details.

Contents of the Notebook

  • BD - District Wise Smokers Choropleth: Visual representation of smoking prevalence across districts.
  • Exploratory Data Analysis (EDA): In-depth analysis including statistical summaries, missing value analysis, and distribution studies of numerical and categorical features.
  • Predictive Analysis on Health Issues Based on Smoking Habits: Examining the likelihood of health symptoms in relation to smoking habits.
  • Comparison of Health Risks between Smokers and Non-Smokers: Analyzing and contrasting health risks among these two groups.
  • Clustering of Districts Based on Health Data: Grouping districts with similar health and smoking profiles.
  • Predictive Analysis on Exposure to Secondhand Smoke: Assessing the impact of secondhand smoke.
  • Logistic Regression for Smoking Initiation Age Prediction: Predicting the age of smoking initiation.
  • Decision Trees for Predicting Smoking Status: Utilizing decision trees to predict smoking status.
  • Principal Component Analysis (PCA) for Data Visualization: Simplifying the data visualization process with PCA.
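As a rough illustration of the smoker versus non-smoker comparison listed above, the sketch below computes the prevalence of one symptom in each group and the resulting risk ratio. The toy records, the field names (`smoker`, `cough`), and the helpers `prevalence` and `risk_ratio` are hypothetical; they are not taken from the notebook or the survey data.

```python
# Hypothetical toy records; the real survey columns may be named differently.
RECORDS = [
    {"smoker": True, "cough": True},
    {"smoker": True, "cough": True},
    {"smoker": True, "cough": False},
    {"smoker": True, "cough": True},
    {"smoker": False, "cough": False},
    {"smoker": False, "cough": True},
    {"smoker": False, "cough": False},
    {"smoker": False, "cough": False},
]


def prevalence(records, symptom, smoker):
    """Share of respondents in one group (smokers or non-smokers) reporting a symptom."""
    group = [r for r in records if r["smoker"] == smoker]
    if not group:
        raise ValueError("no respondents in the requested group")
    return sum(1 for r in group if r[symptom]) / len(group)


def risk_ratio(records, symptom):
    """Prevalence among smokers divided by prevalence among non-smokers."""
    exposed = prevalence(records, symptom, smoker=True)
    unexposed = prevalence(records, symptom, smoker=False)
    if unexposed == 0:
        raise ZeroDivisionError("symptom absent among non-smokers; ratio undefined")
    return exposed / unexposed


if __name__ == "__main__":
    print("cough prevalence, smokers:", prevalence(RECORDS, "cough", True))
    print("cough prevalence, non-smokers:", prevalence(RECORDS, "cough", False))
    print("risk ratio:", risk_ratio(RECORDS, "cough"))
```

A risk ratio above 1 means the symptom is reported more often among smokers in the sample. It describes an association in the data and does not by itself establish causation.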

Methodology

The project uses several scikit-learn estimators: RandomForestClassifier, GaussianNB, LogisticRegression, and DecisionTreeClassifier for classification, and KMeans for clustering. Principal Component Analysis (PCA) is used to reduce the data to a few dimensions for visualization.
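A minimal sketch of how these estimators might be fitted side by side, assuming scikit-learn and NumPy are installed. The data are synthetic stand-ins generated in the script. The feature meanings, the label, and all hyperparameters are assumptions made for illustration and do not reproduce the notebook's actual pipeline or results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400

# Synthetic features (hypothetical): age, cigarettes per day, years smoked.
X = np.column_stack([
    rng.integers(18, 70, n),
    rng.integers(0, 30, n),
    rng.integers(0, 40, n),
]).astype(float)

# Synthetic binary label: a symptom made more likely by heavier smoking.
logits = 0.15 * X[:, 1] + 0.05 * X[:, 2] - 3.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The four classifiers named in the methodology, with assumed settings.
models = {
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=0),
    "GaussianNB": GaussianNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}

# Unsupervised steps: project to two components and group rows into clusters.
components = PCA(n_components=2).fit_transform(X)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

if __name__ == "__main__":
    for name, acc in scores.items():
        print(f"{name}: held-out accuracy {acc:.2f}")
    print("PCA output shape:", components.shape)
    print("cluster sizes:", np.bincount(clusters))
```

Comparing the classifiers on the same held-out split keeps the accuracy figures comparable. For the real survey, categorical fields such as gender or profession would need encoding first, and cross-validation would give more stable estimates than a single split.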

Data Collection

The survey data cover respondent attributes such as age, gender, and profession, along with health symptoms such as cough, chest pain, and wheezing. The aim is to give a broad picture of smoking trends and associated health issues in Bangladesh.

Key Insights

The notebook reports findings from the survey data on the prevalence of smoking habits, their correlation with health issues, and how well the different predictive models capture these trends.

Contributing

This repository welcomes contributions. Feel free to suggest enhancements, report issues, or submit pull requests.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.