# What is this?
This codebase was created to make it easier for machine learning researchers to innovate in the medical field. Combining machine learning with medicine could reduce misdiagnoses and save lives.

This codebase focuses on predicting lung diseases from a dataset of breathing sounds. Eight experiments were conducted across five machine learning models. In addition, a novel data augmentation algorithm was implemented and tested.

The following experiments were conducted on four classical machine learning methods (decision tree, random forest, SVM, XGBoost):
- Using all features to train the models
- Using less complex models to reduce overfitting
- Using class weights to counter class imbalance in the dataset
- Using fewer features to reduce noise in the data

Experiments on the deep learning model (CNN) were as follows:
- Using all features to train the model
- Using class weights to counter class imbalance in the dataset
- Using a novel data augmentation algorithm
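
This README does not say how the class weights were computed. As a minimal sketch, assuming the common "balanced" heuristic (weight = number of samples / (number of classes × samples in that class), the same formula scikit-learn applies for `class_weight="balanced"`), the weights could be derived as follows. The function name and the label counts are hypothetical and chosen only for illustration; they are not taken from the project's data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight}, where weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution, for illustration only (not the project's data).
example_labels = ["COPD"] * 6 + ["Healthy"] * 3 + ["Pneumonia"] * 1
example_weights = balanced_class_weights(example_labels)

# Rarer classes receive larger weights, so their errors count more during training.
for cls, weight in sorted(example_weights.items()):
    print(f"{cls}: {weight:.3f}")
```

A dictionary like this can usually be passed to a model's class-weight option, for example the `class_weight` parameter of scikit-learn classifiers. Keras `Model.fit` accepts a similar dictionary, but keyed by integer class index rather than by label.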
# Instructions for use
1. Clone the code from GitHub
2. Download the data files and add them to the "dataframes" folder: https://drive.google.com/drive/folders/1ZEXr-3vSjL-_QR6x-cvRpm4XKJG3j-VU?usp=sharing
3. Run the "master" notebook to see the experiment process. Run the "data generation" notebooks to see the data extraction process.
# Thesis
## ["Predicting Respiratory Diseases from Lung Sounds Using Machine Learning"](https://docdro.id/bATDzgO)
![Poster and high-level overview of the code and model results](https://i.ibb.co/5xGTfbx/JPG-richard-annilo-poster-v2.jpg)