a/README.md | b/README.md | ||
---|---|---|---|
1 | # Deep Medical Diagnosis |
1 | # Deep Medical Diagnosis |
2 | 2 | ||
3 | QA solving using DL methods applied to medical diagnosis. |
3 | QA solving using DL methods applied to medical diagnosis. |
4 | 4 | ||
5 | ## Dataset |
5 | ## Dataset |
6 | 6 | ||
7 | We have a dataset of 2350 diseases with their symptoms. The data format is similar to the dataset from |
7 | We have a dataset of 2350 diseases with their symptoms. The data format is similar to the dataset from
|
8 | bAbI tasks (Facebook). For example, for the disease Achasia: |
8 | bAbI tasks (Facebook). For example, for the disease Achasia: |
9 | 9 | ||
10 |  |
10 |  |
11 | 11 | ||
12 | ### Exploratory analysis of the dataset |
12 | ### Exploratory analysis of the dataset |
13 | 13 | ||
14 | We have 2350 diseases in the dataset. Augmentation has been applied to the original dataset by replacing words with their synonyms and deleting and permuting |
14 | We have 2350 diseases in the dataset. Augmentation has been applied to the original dataset by replacing words with their synonyms and deleting and permuting
|
15 | facts randomly. The final dataset consists of: |
15 | facts randomly. The final dataset consists of: |
16 | 16 | ||
17 | * Vocabulary size: 21520 |
17 | * Vocabulary size: 21520
|
18 | * Story max length: 1647 |
18 | * Story max length: 1647
|
19 | * Number of training stories: 133093 |
19 | * Number of training stories: 133093
|
20 | * Number of test stories: 59394 |
20 | * Number of test stories: 59394 |
21 | 21 | ||
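
The augmentation described above (synonym replacement plus random deletion and permutation of facts) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `SYNONYMS` table, the `augment` function, and the probabilities `p_synonym` and `p_delete` are all hypothetical, since the README does not say which synonym source or rates were used.

```python
import random

# Hypothetical synonym table; the README does not name the resource used.
SYNONYMS = {
    "often": ["oftentimes", "frequently"],
    "problems": ["troubles", "difficulties"],
}


def augment(story, rng, p_synonym=0.3, p_delete=0.2):
    """Return a new story with synonyms swapped, some facts dropped, and facts shuffled.

    `story` is a list of facts; each fact is a list of word tokens.
    `p_synonym` and `p_delete` are assumed rates, chosen only for illustration.
    """
    augmented = []
    for fact in story:
        # Randomly delete whole facts.
        if rng.random() < p_delete:
            continue
        # Replace a word by a random synonym with probability p_synonym.
        new_fact = [
            rng.choice(SYNONYMS[word])
            if word in SYNONYMS and rng.random() < p_synonym
            else word
            for word in fact
        ]
        augmented.append(new_fact)
    if not augmented:
        # Never return an empty story: keep a copy of one original fact.
        augmented.append(list(rng.choice(story)))
    # Randomly permute the order of the remaining facts.
    rng.shuffle(augmented)
    return augmented


story = [
    ["is", "often", "distracted", "."],
    ["has", "problems", "playing", "quietly", "."],
    ["talks", "excessively", "."],
]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
variants = [augment(story, rng) for _ in range(5)]
```

Each element of `variants` is one augmented copy of `story`, labeled with the same disease as the original.
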
22 | 22 | ||
23 | Here's what a sample looks like (input, disease): |
23 | Here's what a sample looks like (input, disease): |
24 | 24 | ||
25 | ([[u'adult', u'symptoms', u'of', u'likewise', u'begin', u'to', u'be', u'far', u'more', u'subtle', u'than', u'childhood', u'symptoms', u'.'], [u'being', u'unable', u'to', u'stick', u'at', u'tasks', u'that', u'are', u'borings', u'or', u'clip', u'down', u'.'], [u'but', u'it', u's', u'recognise', u'that', u'symptoms', u'of', u'prevail', u'from', u'childhood', u'into', u'a', u'person', u's', u'teenage', u'and', u'then', u'adulthood', u'.'], [u'carelessnesses', u'and', u'want', u'of', u'to', u'detail', u'.'], [u'continually', u'starting', u'new', u'tasks', u'before', u'finish', u'old', u'ones', u'.'], [u'being', u'unable', u'to', u'postponement', u'their', u'bend', u'.'], [u'some', u'specialists', u'have', u'suggest', u'the', u'followers', u'list', u'of', u'symptoms', u'associated', u'with', u'in', u'adults', u'.'], [u'the', u'symptoms', u'of', u'can', u'be', u'categorize', u'into', u'two', u'types', u'of', u'behavioral', u'jobs', u'.'], [u'interrupt', u'conversations', u'.'], [u'constantly', u'fidget', u'.'], [u'having', u'difficulty', u'coordinate', u'tasks', u'.'], [u'the', u'main', u'signs', u'of', u'and', u'impulsivenesses', u'are', u'.']],[u'attention deficit hyperactivity disorder']) |
25 | ([[u'adult', u'symptoms', u'of', u'likewise', u'begin', u'to', u'be', u'far', u'more', u'subtle', u'than', u'childhood', u'symptoms', u'.'], [u'being', u'unable', u'to', u'stick', u'at', u'tasks', u'that', u'are', u'borings', u'or', u'clip', u'down', u'.'], [u'but', u'it', u's', u'recognise', u'that', u'symptoms', u'of', u'prevail', u'from', u'childhood', u'into', u'a', u'person', u's', u'teenage', u'and', u'then', u'adulthood', u'.'], [u'carelessnesses', u'and', u'want', u'of', u'to', u'detail', u'.'], [u'continually', u'starting', u'new', u'tasks', u'before', u'finish', u'old', u'ones', u'.'], [u'being', u'unable', u'to', u'postponement', u'their', u'bend', u'.'], [u'some', u'specialists', u'have', u'suggest', u'the', u'followers', u'list', u'of', u'symptoms', u'associated', u'with', u'in', u'adults', u'.'], [u'the', u'symptoms', u'of', u'can', u'be', u'categorize', u'into', u'two', u'types', u'of', u'behavioral', u'jobs', u'.'], [u'interrupt', u'conversations', u'.'], [u'constantly', u'fidget', u'.'], [u'having', u'difficulty', u'coordinate', u'tasks', u'.'], [u'the', u'main', u'signs', u'of', u'and', u'impulsivenesses', u'are', u'.']],[u'attention deficit hyperactivity disorder']) |
26 | 26 | ||
27 | The next plots show 1) the number of facts in each sample and 2) the number of words in each fact: |
27 | The next plots show 1) the number of facts in each sample and 2) the number of words in each fact: |
28 | 28 | ||
29 | 29 | ||
30 | 30 | ||
31 | Facts x sample | Words x fact |
31 | Facts x sample | Words x fact
|
32 | :-------------------------:|:-------------------------: |
32 | :-------------------------:|:-------------------------:
|
33 | <img src="https://github.com/jgpavez/MedicalDiagnosis/blob/master/plots/facts_by_disease.png" width="350"> | <img src="https://github.com/jgpavez/MedicalDiagnosis/blob/master/plots/word_by_fact.png" width="350" > |
33 | <img src="https://github.com/jgpavez/MedicalDiagnosis/blob/master/plots/facts_by_disease.png?raw=true" width="350"> | <img src="https://github.com/jgpavez/MedicalDiagnosis/blob/master/plots/word_by_fact.png?raw=true" width="350" > |
34 | 34 | ||
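
The dataset statistics and the two histograms above could be computed along these lines. This is a sketch on toy data: the `samples` list is made up, and counting story length as the total number of tokens in a story is an assumption, since the README does not define it.

```python
from collections import Counter

# Toy samples in the (story, label) shape shown above; the content is made up.
samples = [
    (
        [["constantly", "fidget", "."], ["interrupt", "conversations", "."]],
        "attention deficit hyperactivity disorder",
    ),
    (
        [
            ["oily", "skin", "."],
            ["some", "blackheads", "."],
            ["painful", "touch", "skin", "."],
        ],
        "acne",
    ),
]

# Vocabulary: every distinct token appearing in any fact.
vocabulary = {word for story, _ in samples for fact in story for word in fact}

# Story length, assumed here to be the total number of tokens in a story.
story_lengths = [sum(len(fact) for fact in story) for story, _ in samples]

# Histogram data: number of facts per sample and number of words per fact.
facts_per_sample = Counter(len(story) for story, _ in samples)
words_per_fact = Counter(len(fact) for story, _ in samples for fact in story)

print("Vocabulary size:", len(vocabulary))
print("Story max length:", max(story_lengths))
print("Facts x sample:", dict(facts_per_sample))
print("Words x fact:", dict(words_per_fact))
```
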
35 | ## Training an RNN |
35 | ## Training an RNN |
36 | 36 | ||
37 | A GRU network is trained on the dataset. The GRU consists of two layers of 128 units with a dropout of 0.5. The output is a softmax layer. Pre-trained word vectors are used as the embedding layer. |
37 | A GRU network is trained on the dataset. The GRU consists of two layers of 128 units with a dropout of 0.5. The output is a softmax layer. Pre-trained word vectors are used as the embedding layer.
|
38 | The network performs very well on the training and test sets but is less reliable on self-made data. Several possible remedies remain to be studied. |
38 | The network performs very well on the training and test sets but is less reliable on self-made data. Several possible remedies remain to be studied. |
39 | 39 | ||
40 | Accuracy curves on the validation set are shown in the next image: |
40 | Accuracy curves on the validation set are shown in the next image: |
41 | 41 | ||
42 |  |
42 |  |
43 | 43 | ||
44 | An example of the output for a sample from the dataset is shown next. Notably, the five most probable diseases returned by the network are related to one another (in this case, all are psychological disorders), which suggests the network has learned that these diseases are similar. |
44 | An example of the output for a sample from the dataset is shown next. Notably, the five most probable diseases returned by the network are related to one another (in this case, all are psychological disorders), which suggests the network has learned that these diseases are similar. |
45 | 45 | ||
46 | ``` |
46 | ```
|
47 | Predictions for data: |
47 | Predictions for data:
|
48 | [[u'is', u'easily', u'distract', u'.'], [u'is', u'oftentimes', u'short', u'in', u'daily', u'activities', u'.'], [u'hyperactivity', u'symptoms', u'.'], [u'fidgetinesses', u'with', u'custodies', u'or', u'pess', u'or', u'wriggles', u'in', u'place', u'.'], [u'leaves', u'place', u'when', u'remain', u'sit', u'is', u'expect', u'.'], [u'runs', u'about', u'or', u'rises', u'in', u'inappropriate', u'situations', u'.'], [u'has', u'jobs', u'playing', u'or', u'workings', u'softly', u'.'], [u'is', u'oft', u'on', u'the', u'turn', u'acts', u'as', u'if', u'drive', u'by', u'a', u'motor', u'.'], [u'negotiations', u'excessively', u'.'], [u'impulsivity', u'symptoms', u'.'], [u'blurts', u'out', u'replies', u'before', u'enquiries', u'have', u'been', u'complete', u'.'], [u'has', u'trouble', u'look', u'crook', u'.'], [u'interrupts', u'or', u'irrupts', u'on', u'others', u'butts', u'into', u'conversations', u'or', u'games', u'.']] |
48 | [[u'is', u'easily', u'distract', u'.'], [u'is', u'oftentimes', u'short', u'in', u'daily', u'activities', u'.'], [u'hyperactivity', u'symptoms', u'.'], [u'fidgetinesses', u'with', u'custodies', u'or', u'pess', u'or', u'wriggles', u'in', u'place', u'.'], [u'leaves', u'place', u'when', u'remain', u'sit', u'is', u'expect', u'.'], [u'runs', u'about', u'or', u'rises', u'in', u'inappropriate', u'situations', u'.'], [u'has', u'jobs', u'playing', u'or', u'workings', u'softly', u'.'], [u'is', u'oft', u'on', u'the', u'turn', u'acts', u'as', u'if', u'drive', u'by', u'a', u'motor', u'.'], [u'negotiations', u'excessively', u'.'], [u'impulsivity', u'symptoms', u'.'], [u'blurts', u'out', u'replies', u'before', u'enquiries', u'have', u'been', u'complete', u'.'], [u'has', u'trouble', u'look', u'crook', u'.'], [u'interrupts', u'or', u'irrupts', u'on', u'others', u'butts', u'into', u'conversations', u'or', u'games', u'.']]
|
49 | Disease: attention deficit hyperactivity disorder |
49 | Disease: attention deficit hyperactivity disorder
|
50 | 5 most prob. diseases: [u'attention deficit hyperactivity disorder', u'oppositional defiant disorder', u'seasonal affective disorder', u'anorexia nervosa', u'language disorder children'] |
50 | 5 most prob. diseases: [u'attention deficit hyperactivity disorder', u'oppositional defiant disorder', u'seasonal affective disorder', u'anorexia nervosa', u'language disorder children']
|
51 | ``` |
51 | ``` |
52 | 52 | ||
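
A "5 most prob. diseases" list like the one above can be obtained by ranking the softmax output. The sketch below shows the idea in plain Python; the `logits` values are invented for illustration, and `softmax` and `top_k` are hypothetical helper names, not functions from the repository.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def top_k(probabilities, labels, k=5):
    """Return the k labels with the highest probability, most probable first."""
    ranked = sorted(zip(probabilities, labels), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked[:k]]


# Disease names taken from the example output; the scores are made up.
labels = [
    "attention deficit hyperactivity disorder",
    "oppositional defiant disorder",
    "seasonal affective disorder",
    "anorexia nervosa",
    "language disorder children",
    "acne",
]
logits = [4.1, 2.3, 1.9, 1.2, 0.8, -0.5]

probabilities = softmax(logits)
top5 = top_k(probabilities, labels, k=5)
print("Disease:", top5[0])
print("5 most prob. diseases:", top5)
```
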
53 | Next, we test the neural network on self-made symptoms; two examples are shown below. In the first case the network correctly identifies the disease; in the second it does not. |
53 | Next, we test the neural network on self-made symptoms; two examples are shown below. In the first case the network correctly identifies the disease; in the second it does not. |
54 | 54 | ||
55 | ``` |
55 | ```
|
56 | [u'oily skin', u'painful touch skin', u'face affected almost everywhere', u'chest affected', u'some blackheads', u'a lot of papules', u'papules', u'nodules', u'cysts'] |
56 | [u'oily skin', u'painful touch skin', u'face affected almost everywhere', u'chest affected', u'some blackheads', u'a lot of papules', u'papules', u'nodules', u'cysts']
|
57 | Disease: acne |
57 | Disease: acne
|
58 | 5 most prob. diseases: [u'acne', u'genital warts', u'scarlet fever', u'erythema multiforme', u'measles'] |
58 | 5 most prob. diseases: [u'acne', u'genital warts', u'scarlet fever', u'erythema multiforme', u'measles'] |
59 | 59 | ||
60 | [u'pulsating feeling in stomach', u'persistent back pain', u'abdominal pain', u'severe pain in the middle abdomen', u'dizziness', u'clammy skin', u'tachycardia', u'loss of consciousness'] |
60 | [u'pulsating feeling in stomach', u'persistent back pain', u'abdominal pain', u'severe pain in the middle abdomen', u'dizziness', u'clammy skin', u'tachycardia', u'loss of consciousness']
|
61 | Disease: abdominal aortic aneurysm |
61 | Disease: abdominal aortic aneurysm
|
62 | 5 most prob. diseases: [u'pulmonary actinomycosis', u'mucormycosis', u'sleeping sickness', u'acute myeloid leukemia', u'bile duct obstruction'] |
62 | 5 most prob. diseases: [u'pulmonary actinomycosis', u'mucormycosis', u'sleeping sickness', u'acute myeloid leukemia', u'bile duct obstruction']
|
63 | ``` |
63 | ``` |
64 | 64 | ||
65 | 65 |