# Deep Medical Diagnosis

Question answering (QA) using deep learning (DL) methods, applied to medical diagnosis.

## Dataset

We have a dataset of 2350 diseases with their symptoms. The data format is similar to that of Facebook's bAbI tasks. For example, for the disease Achasia:

![Achasia](http://www.diegoacuna.me/factlist.png)

### Exploratory analysis of the dataset

The dataset covers 2350 diseases. The original dataset was augmented by replacing words with their synonyms and by randomly deleting and permuting facts. The final dataset consists of:

* Vocabulary size: 21520
* Story max length: 1647
* Number of training stories: 133093
* Number of test stories: 59394
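
As a rough illustration of the augmentation described above, the sketch below applies the three operations (random fact deletion, synonym replacement, fact permutation) to a single story. It is not the code used to build the dataset: the `SYNONYMS` table, the two probabilities and the name `augment_story` are assumptions made for this example.

```python
import random

# Hypothetical synonym table; a real pipeline would need a proper lexical resource.
SYNONYMS = {
    "start": ["begin"],
    "jobs": ["tasks"],
}

def augment_story(facts, rng, p_synonym=0.3, p_delete=0.1):
    """Return an augmented copy of a story (a list of tokenized facts)."""
    # 1) Randomly delete facts, but never return an empty story.
    kept = [fact for fact in facts if rng.random() >= p_delete]
    if facts and not kept:
        kept = [rng.choice(facts)]
    # 2) Replace known words by a synonym with probability p_synonym.
    replaced = [
        [rng.choice(SYNONYMS[w]) if w in SYNONYMS and rng.random() < p_synonym else w
         for w in fact]
        for fact in kept
    ]
    # 3) Permute the order of the remaining facts.
    rng.shuffle(replaced)
    return replaced

story = [["constantly", "fidget", "."], ["interrupt", "conversations", "."]]
augmented = augment_story(story, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, and the input story is left unmodified.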

Here is what a sample looks like (input, disease):

([[u'adult', u'symptoms', u'of', u'likewise', u'begin', u'to', u'be', u'far', u'more', u'subtle', u'than', u'childhood', u'symptoms', u'.'], [u'being', u'unable', u'to', u'stick', u'at', u'tasks', u'that', u'are', u'borings', u'or', u'clip', u'down', u'.'], [u'but', u'it', u's', u'recognise', u'that', u'symptoms', u'of', u'prevail', u'from', u'childhood', u'into', u'a', u'person', u's', u'teenage', u'and', u'then', u'adulthood', u'.'], [u'carelessnesses', u'and', u'want', u'of', u'to', u'detail', u'.'], [u'continually', u'starting', u'new', u'tasks', u'before', u'finish', u'old', u'ones', u'.'], [u'being', u'unable', u'to', u'postponement', u'their', u'bend', u'.'], [u'some', u'specialists', u'have', u'suggest', u'the', u'followers', u'list', u'of', u'symptoms', u'associated', u'with', u'in', u'adults', u'.'], [u'the', u'symptoms', u'of', u'can', u'be', u'categorize', u'into', u'two', u'types', u'of', u'behavioral', u'jobs', u'.'], [u'interrupt', u'conversations', u'.'], [u'constantly', u'fidget', u'.'], [u'having', u'difficulty', u'coordinate', u'tasks', u'.'], [u'the', u'main', u'signs', u'of', u'and', u'impulsivenesses', u'are', u'.']],[u'attention deficit hyperactivity disorder'])

The next plots show (1) the number of facts in each sample and (2) the number of words in each fact:

Facts per sample | Words per fact
:-------------------------:|:-------------------------:
<img src="https://github.com/jgpavez/MedicalDiagnosis/blob/master/plots/facts_by_disease.png" width="350">  | <img src="https://github.com/jgpavez/MedicalDiagnosis/blob/master/plots/word_by_fact.png" width="350">
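
The quantities behind these plots and the summary figures are simple counts over the samples. A minimal sketch follows, assuming each sample is an `(input, disease)` pair shaped like the one shown earlier. The `samples` list is a toy stand-in for the real data, and taking the story length to be the total number of tokens in a sample is an assumption.

```python
from collections import Counter

# Toy stand-in for the dataset: (facts, [disease]) pairs.
samples = [
    ([["constantly", "fidget", "."], ["interrupt", "conversations", "."]],
     ["attention deficit hyperactivity disorder"]),
    ([["oily", "skin", "."]], ["acne"]),
]

# One entry per sample / per fact; these feed the two histograms.
facts_per_sample = [len(facts) for facts, _ in samples]
words_per_fact = [len(fact) for facts, _ in samples for fact in facts]

# Set of distinct tokens (punctuation included, as in the sample format).
vocabulary = {word for facts, _ in samples for fact in facts for word in fact}

# Assumption: story length = total number of tokens in a sample.
story_max_length = max(sum(len(fact) for fact in facts) for facts, _ in samples)

# Histogram data: value -> how many times it occurs.
facts_hist = Counter(facts_per_sample)
words_hist = Counter(words_per_fact)
```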
## Training an RNN

A GRU network is trained on the dataset. It consists of two layers of 128 units each, with a dropout of 0.5, followed by a softmax output layer. Pre-trained word vectors are used as the embedding layer.
The network performs very well on the training and test sets but is suboptimal on self-made data. Several possible ways to address this remain to be studied.
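
For orientation, an architecture of this shape could be written in Keras roughly as follows. This is a sketch under several assumptions, not the project's training code: the README does not say which framework was used, where exactly the dropout is applied, how a story is flattened into a token sequence, or which optimizer and loss were chosen. The function name `build_gru_classifier` and its parameters are hypothetical.

```python
GRU_UNITS = 128  # from the description above
DROPOUT = 0.5    # from the description above

def build_gru_classifier(vocab_size, num_diseases, embedding_matrix):
    """Two stacked GRU layers over pre-trained embeddings, with a softmax output.

    embedding_matrix is assumed to be a NumPy array of shape
    (vocab_size, embedding_dim) holding the pre-trained word vectors.
    """
    # Imported lazily so the constants above can be inspected without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    embedding_dim = embedding_matrix.shape[1]
    model = keras.Sequential([
        # Assumption: index 0 is padding and the pre-trained vectors stay frozen.
        layers.Embedding(
            vocab_size,
            embedding_dim,
            embeddings_initializer=keras.initializers.Constant(embedding_matrix),
            mask_zero=True,
            trainable=False,
        ),
        layers.GRU(GRU_UNITS, return_sequences=True),
        # Assumption: dropout is applied after each recurrent layer.
        layers.Dropout(DROPOUT),
        layers.GRU(GRU_UNITS),
        layers.Dropout(DROPOUT),
        # One output unit per disease.
        layers.Dense(num_diseases, activation="softmax"),
    ])
    # Assumption: optimizer and loss are illustrative choices.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With the figures reported for the dataset, `vocab_size` would be 21520 and `num_diseases` 2350, provided every disease is treated as a separate class.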

The accuracy curves on the validation set are shown in the next image:

![Accuracy](https://github.com/jgpavez/MedicalDiagnosis/blob/master/plots/accuracy.png)

An example of the output for a sample from the dataset is shown next. Notably, the five most probable diseases returned by the network are related to one another (in this case, all are psychological disorders), which suggests that the network has learned that these diseases are related.

```
Predictions for data:
[[u'is', u'easily', u'distract', u'.'], [u'is', u'oftentimes', u'short', u'in', u'daily', u'activities', u'.'], [u'hyperactivity', u'symptoms', u'.'], [u'fidgetinesses', u'with', u'custodies', u'or', u'pess', u'or', u'wriggles', u'in', u'place', u'.'], [u'leaves', u'place', u'when', u'remain', u'sit', u'is', u'expect', u'.'], [u'runs', u'about', u'or', u'rises', u'in', u'inappropriate', u'situations', u'.'], [u'has', u'jobs', u'playing', u'or', u'workings', u'softly', u'.'], [u'is', u'oft', u'on', u'the', u'turn', u'acts', u'as', u'if', u'drive', u'by', u'a', u'motor', u'.'], [u'negotiations', u'excessively', u'.'], [u'impulsivity', u'symptoms', u'.'], [u'blurts', u'out', u'replies', u'before', u'enquiries', u'have', u'been', u'complete', u'.'], [u'has', u'trouble', u'look', u'crook', u'.'], [u'interrupts', u'or', u'irrupts', u'on', u'others', u'butts', u'into', u'conversations', u'or', u'games', u'.']]
Disease: attention deficit hyperactivity disorder
5 most prob. diseases: [u'attention deficit hyperactivity disorder', u'oppositional defiant disorder', u'seasonal affective disorder', u'anorexia nervosa', u'language disorder children']
```
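
A ranking like the one in the last line of this output can be obtained by sorting the per-disease probabilities. Here is a minimal sketch in plain Python, assuming access to one raw score (logit) per disease; if the model already returns probabilities, the softmax step can be skipped. The name `top_k_diseases`, the toy scores and the three labels are illustrative and not taken from the project.

```python
import math

def top_k_diseases(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs under a softmax."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort labels by decreasing probability and keep the first k.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

labels = ["acne", "measles", "scarlet fever"]
top = top_k_diseases([2.0, 0.5, 1.0], labels, k=2)
```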

Next, we test the neural network on self-made symptoms; two examples are shown below. In the first case the network correctly identifies the disease, while in the second it does not.

```
[u'oily skin', u'painful touch skin', u'face affected almost everywhere', u'chest affected', u'some blackheads', u'a lot of papules', u'papules', u'nodules', u'cysts']
Disease: acne
5 most prob. diseases: [u'acne', u'genital warts', u'scarlet fever', u'erythema multiforme', u'measles']

[u'pulsating feeling in stomach', u'persistent back pain', u'abdominal pain', u'severe pain in the middle abdomen', u'dizziness', u'clammy skin', u'tachycardia', u'loss of consciousness']
Disease: abdominal aortic aneurysm
5 most prob. diseases: [u'pulmonary actinomycosis', u'mucormycosis', u'sleeping sickness', u'acute myeloid leukemia', u'bile duct obstruction']
```
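
The self-made inputs above are lists of short phrases, whereas the dataset samples are tokenized facts ending in a period. One possible way to bridge the two formats, and to check how many self-made words fall outside the training vocabulary, is sketched below. Both helper names and the toy vocabulary are assumptions; the project's actual preprocessing is not shown in this README.

```python
def phrases_to_facts(phrases):
    """Turn free-text symptom phrases into lowercase token lists ending with '.'."""
    return [phrase.lower().split() + ["."] for phrase in phrases]

def oov_rate(facts, vocabulary):
    """Fraction of tokens that do not appear in the given vocabulary."""
    tokens = [token for fact in facts for token in fact]
    if not tokens:
        return 0.0
    return sum(token not in vocabulary for token in tokens) / len(tokens)

facts = phrases_to_facts(["oily skin", "some blackheads"])
# Toy vocabulary for illustration; the real one has 21520 entries.
rate = oov_rate(facts, {"oily", "skin", "some", "."})
```

A high out-of-vocabulary rate on self-made inputs would be one thing to rule out when investigating why they are classified less reliably than dataset samples.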