a/README.md b/README.md
<div align = center>
 <h1 > 🗃 ViMedical Disease</h1>

[![Visitors](https://api.visitorbadge.io/api/visitors?path=https%3A%2F%2Fgithub.com%2FPB3002%2FViMedical_Disease&label=View&countColor=%230475b6&style=plastic&labelStyle=none)](https://visitorbadge.io/status?path=https%3A%2F%2Fgithub.com%2FPB3002%2FViMedical_Disease) ![Contributor](https://img.shields.io/badge/contributors-2-brightgreen) ![License](https://img.shields.io/badge/license-CC%20BY--NC--SA%204.0-orange)

<p> Creation date: 2024-04-05 <br>Authors: Phuc Nguyen, Dao Thong
</div>

[**Vietnamese version: here**](https://github.com/PB3002/ViMedical_Disease/tree/main/README_Vietnamese.md)

## Overview:

ViMedical Disease is a Vietnamese dataset of more than 12,000 questions describing the symptoms of common diseases. It is designed to support the classification of disease symptoms and the preliminary identification of medical conditions. It covers many prevalent diseases, including cardiovascular, gastrointestinal, neurological, dermatological, endocrine, and other conditions. The dataset can be used for research, for developing disease prediction models, or for giving users information about common disease symptoms.

The dataset is also available on other platforms:

- <img src="https://static-00.iconduck.com/assets.00/kaggle-icon-2048x2048-fxhlmjy3.png" title="" alt="" width="20">  [**<u>Kaggle</u>**](https://www.kaggle.com/datasets/pb30025030/vimedical-disease/data)

- <img title="" src="https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo.png" alt="" width="25"> [**<u>Hugging Face</u>**](https://huggingface.co/datasets/PB3002/ViMedical_Disease)

## ⚠️ Disclaimer:

- This dataset provides information about disease symptoms only; it is not a formal medical diagnosis.

- Users should consult with a healthcare professional for an accurate diagnosis and treatment.

## Data construction process:

The ViMedical_Disease dataset is built on an existing dataset from the Vietnamese Medical Question Answering section of the Kalapa Bytebattles 2023 competition.

That source dataset provides over 600 articles on various diseases, collected from Tam Anh Hospital.

From these articles, 603 distinct diseases were extracted, and 20 questions were generated for each disease, written from the perspective of a patient experiencing its symptoms.

<img src="./asset/image/dataset_progress_en.png"/>

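As a rough sanity check on the construction step, the sketch below counts questions per disease and reports any disease whose count differs from a target. It assumes rows are dictionaries keyed by the two documented columns and that every disease is meant to have exactly 20 questions; the function names and the sample rows are hypothetical.

```python
from collections import Counter


def questions_per_disease(rows):
    """Count how many questions each disease has."""
    return Counter(row["Disease"] for row in rows)


def diseases_off_target(rows, expected=20):
    """Return {disease: count} for diseases whose count differs from `expected`."""
    counts = questions_per_disease(rows)
    return {disease: n for disease, n in counts.items() if n != expected}


# Hypothetical sample rows; the target is lowered to 2 to keep the demo small.
sample = [
    {"Disease": "A", "Question": "q1"},
    {"Disease": "A", "Question": "q2"},
    {"Disease": "B", "Question": "q3"},
]
print(diseases_off_target(sample, expected=2))  # → {'B': 1}
```

On the full dataset, an empty result with `expected=20` would indicate that every disease has the same number of questions.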
## Structure:

Each question is structured as follows: "I/I'm having symptoms like/I'm feeling/I often/..." + description of the disease's symptoms from the article + ". What could be wrong with me?"

Each question begins with a phrase such as "I", "I'm having symptoms like", "I'm feeling", or "I often".

The opening phrase is followed by symptoms of the disease, taken from the article.

The question always ends with "What could be wrong with me?" (in the Vietnamese data, "Tôi có thể đang bị bệnh gì?").

The dataset has 2 columns:

- `Disease`: Disease name

- `Question`: Question and description of disease symptoms

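The question template and the two-column layout can be sketched as follows. This is a minimal illustration, not the authors' generation code: the English opening phrase and symptom text are stand-ins for the Vietnamese originals, the helper names are hypothetical, and storing the rows as CSV is an assumption.

```python
import csv
import io

# English stand-ins for illustration; the dataset's questions are in Vietnamese.
CLOSING = "What could be wrong with me?"
COLUMNS = ["Disease", "Question"]


def build_question(opener: str, symptoms: str) -> str:
    """Join an opening phrase, a symptom description, and the fixed closing question."""
    body = symptoms.strip().rstrip(".")
    return f"{opener} {body}. {CLOSING}"


def rows_to_csv(rows) -> str:
    """Serialize rows to CSV text using the two documented columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = {
    "Disease": "Alzheimer",
    "Question": build_question("I often", "forget what I am doing and why"),
}
csv_text = rows_to_csv([row])

# Read the CSV text back to confirm the round trip preserves both columns.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(parsed[0]["Question"])
# → I often forget what I am doing and why. What could be wrong with me?
```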
## Example data:

| Disease             | Question                                                                                    |
|:------------------- |:------------------------------------------------------------------------------------------- |
| Bệnh Cơ Tim Giãn Nở | Tôi đang cảm thấy mệt mỏi, chóng mặt và nhịp tim không đều. Tôi có thể đang bị bệnh gì?     |
| Alzheimer           | Tôi hay quên mất mình đang làm gì và mục đích của hành động đó. Tôi có thể đang bị bệnh gì? |
| Viêm Cầu Thận Lupus | Tôi đang cảm thấy suy giảm chức năng thận, hội chứng thận hư. Tôi có thể đang bị bệnh gì?   |

English glosses of the rows above (the dataset itself contains only the Vietnamese text):

- Bệnh Cơ Tim Giãn Nở (dilated cardiomyopathy): "I'm feeling tired, dizzy, and my heartbeat is irregular. What could be wrong with me?"

- Alzheimer: "I often forget what I'm doing and the purpose of that action. What could be wrong with me?"

- Viêm Cầu Thận Lupus (lupus nephritis): "I'm experiencing reduced kidney function and nephrotic syndrome. What could be wrong with me?"

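Because every question ends with the same closing sentence, the symptom description can be separated from it with a suffix check. The sketch below uses the three example rows; the function name is hypothetical, and the Unicode normalization step is a precaution added here, not something the dataset is known to require.

```python
import unicodedata
from typing import Tuple

CLOSING_VI = unicodedata.normalize("NFC", "Tôi có thể đang bị bệnh gì?")

rows = [
    {"Disease": "Bệnh Cơ Tim Giãn Nở",
     "Question": "Tôi đang cảm thấy mệt mỏi, chóng mặt và nhịp tim không đều. Tôi có thể đang bị bệnh gì?"},
    {"Disease": "Alzheimer",
     "Question": "Tôi hay quên mất mình đang làm gì và mục đích của hành động đó. Tôi có thể đang bị bệnh gì?"},
    {"Disease": "Viêm Cầu Thận Lupus",
     "Question": "Tôi đang cảm thấy suy giảm chức năng thận, hội chứng thận hư. Tôi có thể đang bị bệnh gì?"},
]


def split_question(question: str) -> Tuple[str, bool]:
    """Return (symptom_text, has_closing) for one question.

    The text is normalized to NFC first so that composed and decomposed
    Vietnamese diacritics compare equal.
    """
    text = unicodedata.normalize("NFC", question).strip()
    if text.endswith(CLOSING_VI):
        return text[: -len(CLOSING_VI)].strip(), True
    return text, False


for row in rows:
    symptoms, has_closing = split_question(row["Question"])
    print(row["Disease"], "|", symptoms, "|", has_closing)
```

Rows for which `has_closing` is `False` would be worth inspecting before training, since they deviate from the documented template.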
## Dataset Usage:

- Data Analysis
- Building a Disease Prediction Model
- Creating a Chatbot
- User Support

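To make the disease prediction use case concrete, here is a toy nearest-neighbour baseline that picks the disease whose known question shares the most words with a new question (Jaccard similarity). It is a sketch only, not the authors' method: it trains on just the three example rows, the query sentence is hypothetical, and a real model would need the full dataset and a proper train/test split.

```python
import re
import unicodedata

TRAIN_ROWS = [
    {"Disease": "Bệnh Cơ Tim Giãn Nở",
     "Question": "Tôi đang cảm thấy mệt mỏi, chóng mặt và nhịp tim không đều. Tôi có thể đang bị bệnh gì?"},
    {"Disease": "Alzheimer",
     "Question": "Tôi hay quên mất mình đang làm gì và mục đích của hành động đó. Tôi có thể đang bị bệnh gì?"},
    {"Disease": "Viêm Cầu Thận Lupus",
     "Question": "Tôi đang cảm thấy suy giảm chức năng thận, hội chứng thận hư. Tôi có thể đang bị bệnh gì?"},
]


def tokenize(text):
    """Lowercase, normalize to NFC, and return the set of word tokens."""
    normalized = unicodedata.normalize("NFC", text).lower()
    return set(re.findall(r"\w+", normalized))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(question, train_rows):
    """Return the disease of the training question most similar to `question`."""
    query_tokens = tokenize(question)
    best = max(train_rows, key=lambda r: jaccard(query_tokens, tokenize(r["Question"])))
    return best["Disease"]


# Hypothetical user question, not taken from the dataset.
query = "Tôi thấy chóng mặt và nhịp tim không đều. Tôi có thể đang bị bệnh gì?"
print(predict(query, TRAIN_ROWS))  # → Bệnh Cơ Tim Giãn Nở
```

Since every question shares the same closing sentence, removing it before tokenizing would likely make the similarity scores more discriminative.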
## Contribute to the Project:

We welcome ideas and contributions that improve the project. If you have suggestions, please email a detailed description of the changes you propose to phucgot3110a1@gmail.com. You can also contribute directly by [creating a Pull Request](https://github.com/PB3002/ViMedical_Disease/pulls). We appreciate and acknowledge all contributions.

## 📢 Copyright:

Copyright © 2024 PB. All rights reserved.

This dataset is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

**By downloading the dataset, users agree to:**

- Use the dataset only for non-commercial purposes, including research, education, and personal use.

- Attribute the dataset to ViMedical by clearly and prominently citing it in all instances of use.

- Not modify, adapt, or create derivative works based on the dataset.

- Comply with all applicable laws and regulations regarding the use of personal data.

- Be solely responsible for any consequences arising from the use of the dataset.