  CHAPTER 1 INTRODUCTION
1. Introduction

This chapter introduces the project so that the overall idea is well understood. It also details the problem statement and the aims and objectives of the project.

   1.1 Overview

A major challenge facing healthcare organizations (hospitals, medical centers) is predicting diseases accurately and at an early stage.
Here, a system is proposed that predicts cancer at an early stage by using genomic expression data instead of only clinical data, which helps achieve better accuracy. Gene data offers an advantage because it can indicate cancer at an earlier stage, allowing the model to be trained more effectively and thus to produce more accurate results overall. Several supervised learning algorithms are used here: the highly versatile support vector machine (SVM), Naive Bayes, decision trees, and the k-nearest neighbors approach. Using these methods, patients are classified to predict whether or not they are suffering from cancer.

Work toward effective cancer treatment has been in progress for a long time. Scientists have applied different approaches, such as early-stage screening, to predict the cancer type before symptoms develop. One such approach is the analysis of multi-omics biological data. With the advancement of new technologies in the field of medicine, vast quantities of cancer data have been collected and are available for medical research, and these datasets are largely genomic. However, accurately predicting a disease at an early stage remains one of the most interesting and challenging tasks for physicians.
1.2 Dataset used
Gene expression profiling is used in the proposed system. It is a form of genomic data: the measurement of the activity of a large number of genes at a single point in time, to create a thorough picture of cellular function.
A laboratory tool called a microarray helps detect many gene expressions simultaneously. DNA microarrays are microscopic slides with hundreds of tiny spots printed at specific positions; each spot contains a known DNA sequence or gene. The DNA molecules on such slides act as probes that detect gene expression, i.e., the RNA transcripts (collectively, the transcriptome) produced by the cells.
In the microarray analysis procedure, RNA molecules from a healthy individual and from a cancer patient are collected. These samples are converted into complementary DNA (cDNA), and each sample is labelled with a different colour. The two samples are then combined on the microarray slide; this process is called hybridization. After hybridization, the microarray is scanned to measure the expression of each gene. If a gene is expressed more strongly in the experimental (cancer) sample, the spot turns red; if it is expressed more strongly in the reference sample, it turns green; and if expression is roughly equal, it turns yellow. In this way, a gene expression profile is generated.
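As a rough numerical illustration of the red/green convention described above, the log2 ratio of the two channel intensities is commonly used as the expression value; the gene names and intensities below are invented purely for illustration.

    import numpy as np

    # Hypothetical scanned intensities for three spots (genes) on one slide:
    # red channel = cancer (experimental) sample, green channel = healthy (reference) sample.
    red = np.array([1800.0, 400.0, 900.0])
    green = np.array([450.0, 1600.0, 880.0])

    # log2 ratio: positive -> spot leans red (higher in cancer),
    # negative -> leans green (higher in healthy), near zero -> yellow (similar).
    log_ratio = np.log2(red / green)

    for gene, value in zip(["GENE_A", "GENE_B", "GENE_C"], log_ratio):
        print(f"{gene}: log2(R/G) = {value:+.2f}")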
1.3 Methods 
A lot of research has been done on breast cancer. Researchers have developed breast cancer risk models that give the probability of cancer occurrence using clinical data. A few models provide such risk probabilities: the International Breast Cancer Intervention Study model (IBIS), the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm model (BOADICEA), the BRCAPRO model, and the Breast Cancer Risk Assessment Tool (BCRAT), also known as the Gail model.
The IBIS and BOADICEA models were trained with around 19,000 samples and achieved accuracies of 71% and 70% respectively, whereas the BRCAPRO and BCRAT models underestimated the risk and had accuracies of about 68% and 60%.
Different methods are used to build such predictive models, and machine learning provides algorithms that can help. Machine learning comprises several types of learning, such as supervised, unsupervised, and semi-supervised learning. Supervised learning is used when the dataset has labelled outputs, unsupervised learning is used when it does not, and semi-supervised learning is used when the dataset contains both labelled and unlabelled values. The datasets used to train the models here have labelled values, so supervised learning is used (see the sketch below). The prediction models built in this study use Support Vector Machine (SVM), Naive Bayes, Decision Tree, and k-Nearest Neighbors (KNN), all of which are supervised learning algorithms.
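Since the labels are available, all four models follow the same supervised fit-and-score pattern. The sketch below uses scikit-learn with synthetic data standing in for the real gene-expression matrix; the feature count and labels are invented purely for illustration.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier

    # Synthetic labelled data standing in for the gene-expression matrix
    # (22 features per sample, label 1 = cancer, 0 = normal).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 22))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    models = {
        "SVM": SVC(kernel="linear"),
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }

    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
        print(f"{name}: mean accuracy = {scores.mean():.3f}")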
     CHAPTER 2 LITERATURE REVIEW
2. Literature Review
A literature review is the part of a scholarly paper that presents the current knowledge on a particular topic, including substantive findings as well as theoretical and methodological contributions. Literature reviews are secondary sources and do not report new or original experimental work.
   2.1 Review of Literature
1. "Predicting Cancer Prognosis Using Functional Genomics Data Sets"
Jishnu Das et al. have compared various computational methods that use different functional genomics datasets. They identify molecular patterns that can be used for predicting the prognosis of various human cancer tumors. Furthermore, they outline the remaining challenges and how such approaches can help address them [1].
2. "Machine learning predicts individual cancer patient responses to therapeutic drugs with high accuracy"
Cai Huang et al. have designed a software platform that predicts individual cancer patient responses to therapeutic drugs from gene expression profiles. They used an SVM-based algorithm together with Recursive Feature Elimination. Their main finding was that the model works best when it uses all probe-set expression profiles of individual patient tumors. They achieved more than 75% accuracy [2].
3. "Machine learning applications in cancer prognosis and prediction"
Konstantina Kourou et al. have evaluated the prominent available ML models, including ANNs, BNs, SVMs, and DTs. The paper aims to validate the best available approaches so that they can be considered in everyday clinical practice [3].
4. "Predicting stage-specific cancer related genes and their dynamic modules by integrating multiple datasets"
Chaima Aouiche et al. have proposed a framework to identify stage-specific cancer-related genes by integrating multiple datasets. They also built a network that takes each sample pathway as a vertex and the relationships between genes as edges [4].
5. "Deep Learning Methods for Predicting Disease Status Using Genomic Data"
Qianfan Wu et al. have studied four articles that predicted cancer using genomic expression. The deep learning methods outperformed existing models such as prediction based on transcript-wise screening and prediction based on principal component analysis [5].
6. "Dermatologist-level classification of skin cancer with deep neural networks"
Esteva A. et al. used convolutional neural networks to classify skin cancer. They used only skin lesion images and disease labels to train the model. The model showed great potential [6].
7. "ImageNet large scale visual recognition challenge"
Russakovsky O. et al. analyzed five years of the ImageNet image classification challenge, drew out useful patterns, and discussed the future development of image classification and its usefulness in disease prediction [7].
8. "A practical guide to support vector classification"
Hsu C-W et al. have explained support vector classification in detail, along with its potential in disease prediction [8].
9. "An Overview of Prognostic Markers in Breast Cancer"
Gu Deshpande et al. reviewed the currently used biomarkers for cancer prediction and concluded that they are not sufficient on their own. They then studied additional biomarkers that could increase the reliability of a model if integrated with the existing biomarkers [9].
10. "A review of feature selection techniques in bioinformatics"
Saeys Y. et al. have reviewed feature selection techniques, providing a basic taxonomy of feature selection, discussing its use, and presenting a variety of applications in both general and bioinformatics settings [10].
11. "Minimum redundancy maximum relevance feature selection approach for temporal gene expression data"
Radovic M. et al. have proposed a temporal minimum redundancy-maximum relevance feature selection approach. The proposed system is able to handle multivariate temporal data without prior data flattening. Redundancy between genes was computed using a dynamic time warping approach [11].
12. "Highly-accurate metabolomic detection of early-stage ovarian cancer"
Gaul DA et al. have proposed a system using a linear support vector machine. The results provided evidence for the importance of lipid and fatty acid metabolism in ovarian cancer and could be used for clinically significant diagnostic tests [12].
13. "Ovarian cancer detection from metabolomic liquid chromatography/mass spectrometry data by support vector machines"
Guan W et al. have developed new approaches for the automatic classification of metabolomic data for ovarian cancer detection. They used SVM together with cross-validation, which provided highly accurate results [13].
14. "Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin"
Hoadley K. et al. performed an integrative analysis using five genome-wide platforms. In this paper, classification combined with correlation methods was used in order to obtain better results [14].
15. "Computational models for predicting drug responses in cancer research"
Azuaje F. et al. developed models that match tumor characteristics to the most effective available therapy, thus providing the patient with suitable precision medicine [15].
16. "From molecular mechanisms of leukemia induction to treatment of chronic myelogenous leukemia"
Salesse S. et al. have studied the molecular mechanisms of leukemia induction through to the treatment of chronic myelogenous leukemia. In this paper, they propose a system with better accuracy [16].
17. "Database resource of the national genomics data center"
Wenming Zhao et al. have provided a suite of genomic database resources. Through the NGDC databases, a large volume of genomic data has been made publicly available for study and research purposes [17].
CHAPTER 3 METHODOLOGIES AND IMPLEMENTATION
3. Methodologies and Implementation
In this chapter, all the methodologies used to build the project are presented, along with the corresponding implementation.
3.1 Design Details
The aim of the proposed methodology is the accurate prediction of cancer using genomic data. Cancer is a complex disease, and the causes behind its development are not yet fully understood. Cancer treatment is also expensive, and the cost increases as the tumour grows; predicting cancer at an earlier stage can therefore also reduce heavy medication expenses.
The methodology of the proposed model is divided into four phases, as shown in Figure 3.1.1.
       Figure 3.1.1 - Phases of prediction model
The phases are described below. 
i) High Dimensional Input features - 
Here microarray gene expression is extracted from online open source repositories [17-18]. The National Center for Biotechnology Information (NCBI) provides access to biomedical and genomic information. The datasets consist of 17,818 genes and 590 samples (including 61 normal tissue samples and 529 breast cancer tissue samples).
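A minimal sketch of loading such an expression matrix with pandas is shown below; the file name and the orientation (samples as rows, genes as columns) are assumptions, since the downloaded export may be laid out differently.

    import pandas as pd

    # Placeholder file name for the downloaded expression matrix.
    expr = pd.read_csv("breast_gene_expression.csv", index_col=0)

    print(expr.shape)        # expected on the order of 590 samples x 17,818 genes
    print(expr.index[:5])    # sample identifiers
    print(expr.columns[:5])  # gene symbols / probe IDs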
ii) Feature Selection/Dimensionality Reduction - 
Since there are many genes, a model trained on all of them may overfit. Moreover, many genes are unrelated to the cancer-causing mutations. To address this, the major breast-cancer-related genes are selected, namely BRCA1, BRCA2, ATM, BARD1, BRIP1, CDH1, CHEK2, MRE11A, MSH6, NBN, PALB2, PMS2, PTEN, RAD50, RAD51C, STK11, TP53, CASP8, CTLA4, CYP19A1, FGFR2, LSP1, and MAP3K1 [19].
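Assuming the expression matrix loaded in the previous sketch uses gene symbols as column names, reducing it to the listed genes is a plain column selection; a guard is included for symbols that a given platform does not carry.

    import pandas as pd

    # Placeholder file name, as in the loading sketch above.
    expr = pd.read_csv("breast_gene_expression.csv", index_col=0)

    # Breast-cancer-associated genes listed in the text.
    SELECTED_GENES = [
        "BRCA1", "BRCA2", "ATM", "BARD1", "BRIP1", "CDH1", "CHEK2", "MRE11A",
        "MSH6", "NBN", "PALB2", "PMS2", "PTEN", "RAD50", "RAD51C", "STK11",
        "TP53", "CASP8", "CTLA4", "CYP19A1", "FGFR2", "LSP1", "MAP3K1",
    ]

    # Keep only the genes actually present as columns in this matrix.
    available = [g for g in SELECTED_GENES if g in expr.columns]
    expr_selected = expr[available]
    print(expr_selected.shape)   # samples x selected genes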
iii) Low Dimensional features - 
The dataset with 22 dimensions is preprocessed first. All field values are numeric; however, many fields had missing values, so these were replaced with the mean of the corresponding column.
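Mean imputation of the missing numeric fields can be done directly in pandas; the sketch below uses toy values.

    import numpy as np
    import pandas as pd

    # Toy slice of the reduced gene table with a few missing readings.
    df = pd.DataFrame({
        "BRCA1": [2.1, np.nan, 1.8, 2.4],
        "BRCA2": [0.9, 1.1, np.nan, 1.0],
        "TP53":  [3.2, 3.0, 2.9, np.nan],
    })

    # Each missing value is replaced by the mean of its own column.
    df_filled = df.fillna(df.mean())
    print(df_filled)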
iv) Prediction Models and Classifiers -
After preprocessing, a dataset of 530 samples with 22 features (genes) is obtained. The Support Vector Machine algorithm is run first in the Weka tool, which provides various built-in machine learning algorithms and can also preprocess the data, train models, and plot graphs. Initially, the dataset was passed to Weka for model building. Later, SVM was implemented in Python 3 on Google Colab, and a Naive Bayes model was also built in Python 3; a sketch of this step follows below.
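A minimal sketch of that Python 3 step is given below; the file name for the preprocessed 530 x 22 table and the name of its 0/1 label column are placeholders.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score

    # Placeholder path; "label" is assumed to hold 1 = cancer, 0 = normal.
    data = pd.read_csv("breast_expression_22genes.csv")
    X = data.drop(columns=["label"])
    y = data["label"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)

    for name, model in [("SVM", SVC(kernel="linear")), ("Naive Bayes", GaussianNB())]:
        model.fit(X_train, y_train)
        print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))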
3.2 Algorithms:
i. Support Vector Machine (SVM)
    The structured support vector machine is a machine learning algorithm that generalizes the Support Vector Machine (SVM) classifier. Whereas the SVM classifier supports binary classification, multiclass classification, and regression, the structured SVM allows training of a classifier for general structured output labels.
ii. Decision Tree
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements.
Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal, but they are also a popular tool in machine learning.
   iii. Naïve Bayes
In machine learning, naïve Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naïve) independence assumptions between the features. They are among the simplest Bayesian network models.

   iv. K-Nearest Neighbors
    k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until function evaluation.
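To make the "lazy learning" point concrete (no model is built up front; the neighbours are only searched when a prediction is requested), a tiny from-scratch k-NN on made-up 2-D points is sketched below, with label 1 = cancer and 0 = normal.

    import numpy as np

    def knn_predict(X_train, y_train, x_query, k=3):
        """Majority vote among the k training points closest to x_query."""
        distances = np.linalg.norm(X_train - x_query, axis=1)  # all work happens at query time
        nearest = np.argsort(distances)[:k]
        votes = y_train[nearest]
        return int(np.round(votes.mean()))                     # 0/1 majority for binary labels

    # Made-up 2-D training data.
    X_train = np.array([[1.0, 1.2], [0.9, 1.0], [3.0, 3.1], [3.2, 2.9]])
    y_train = np.array([0, 0, 1, 1])

    print(knn_predict(X_train, y_train, np.array([2.9, 3.0])))  # expected: 1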
3.3 Implementation
            Figure 3.3.1 - SVM model trained on Weka
Figure 3.3.2 - Expected and Observed results for Cancer in SVM Model
                Figure 3.3.3- UI for model 2
                Figure 3.3.4 - Input values for genes
       Figure 3.3.5 - Result displayed based on model trained
       Figure 3.3.6 - Dataset with same values as given to UI 
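The screenshots above come from a small web front end (HTML/Bootstrap with a Flask back end, see Chapter 5). A rough sketch of how such a prediction endpoint could be wired up is given below; the pickled model file, the template name, and the form field names are hypothetical placeholders, and the gene list is shortened for brevity.

    # app.py - minimal sketch; model file, template, and form fields are placeholders.
    import pickle

    import numpy as np
    from flask import Flask, render_template, request

    app = Flask(__name__)

    with open("svm_model.pkl", "rb") as f:      # previously trained and pickled model
        model = pickle.load(f)

    GENES = ["BRCA1", "BRCA2", "TP53"]          # shortened list for illustration

    @app.route("/", methods=["GET", "POST"])
    def predict():
        result = None
        if request.method == "POST":
            # One numeric input field per gene on the HTML form.
            values = [float(request.form[g]) for g in GENES]
            result = int(model.predict(np.array([values]))[0])   # 1 = cancer, 0 = normal
        return render_template("index.html", result=result)

    if __name__ == "__main__":
        app.run(debug=True)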
    CHAPTER 4 PROJECT ANALYSIS
4. Project Analysis
    This chapter gives the detailed design of the project. It includes the block diagram of the proposed system and UML diagrams (use case diagram, data flow diagram, sequence diagram, etc.) as applicable to the project.
4.1 Project Timeline
Figure 4.1 Project Timeline 1
Figure 4.2 Project Timeline 2
Figure 4.3 Project Timeline 3
4.2 Task Distribution
Table 4.2 Task Distribution
TASK LIST                    ASSIGNED TO                                   STATUS
Defining Project             Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Literature Review            Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Survey Paper                 Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Project Plan                 Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Project Analysis             Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Input Page Design            Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Documentation of Synopsis    Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Dataset formatting           Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Implementation               Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Testing                      Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Final Report                 Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
Final Presentation           Saurabh Sharma, Neel Shah, Rishiraj Singh     Complete
4.3 Development Methodology
This section describes the project in terms of the stages of the software development life cycle (SDLC). The life cycle model used in this project is the waterfall model. The waterfall model comprises a series of well-defined phases, as shown below in Figure 4.4; each phase is intended to start only after the previous one has been completed, with one or more tangible deliverables produced at the end of each phase. Essentially, it starts with a heavily documented requirements planning phase that outlines all the requirements for the project, followed by sequential phases of design, coding, test-casing, optional documentation, verification (alpha testing), validation (beta testing), and finally deployment/release.
Figure 4.4 Waterfall Model
       CHAPTER 5 SYSTEM REQUIREMENTS
5. System Requirements 
    The purpose of this chapter is to identify the platform needed to run the proposed system. The team studied the hardware and software requirements needed to develop the system.
5.1 Hardware Requirements
Processor: Intel(R) Core(TM) i3-7100U
Main Memory (RAM): 8 GB
Cache Memory: 8 MB
Monitor: 13.3" Color Monitor
Keyboard: 108 keys
Mouse: Optical Mouse
Hard Disk: 32 GB or more
System Requirements: 64-bit OS, x64-based processor
5.2 Software Requirements

Front End/Language: HTML, Bootstrap
Back End/Database: Python 3, Flask
Platform: Google Colab
Operating System: Windows 7 / Windows 8 / Windows 10
CHAPTER 6 TESTING
6. Testing
This chapter gives information about the test approach and the test results.
6.1 Test Approach
Software testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test. Software testing also provides an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation.
6.1.1 Black box testing
In black box testing, the system is exercised with a range of inputs for selected functionalities, and the outputs are used to judge whether the system behaves correctly. The internal system design is not considered in this type of testing; tests are based on requirements and functionality. The number of modules and the number of source files required for each module are checked.
6.1.2 White box testing
This testing is based on knowledge of the internal logic of the application's code and is also known as glass box testing. Knowledge of the internal software and how the code works is required for this type of testing. Tests are based on coverage of code statements, branches, paths, and conditions. All modules are tested to verify that their logic functions properly, and the code is checked by supplying different inputs to exercise its functionality.
6.1.3 Unit testing
Unit testing is the testing of individual software components or modules. Each module is run separately to check its output. Unit testing focuses first on the modules, independently of one another, to locate errors; this enables the tester to detect coding and logic errors contained within a module alone, while errors resulting from the interaction between modules are initially avoided. Here, each module is tested individually before the overall system is integrated. Unit testing focuses verification effort on even the smallest unit of software design in each module.
6.1.4 Integration testing
Integration testing verifies that when two or more modules interact, the produced result satisfies the original functional requirements. Integration testing starts after the completion of unit testing.
6.1.5 User Acceptance Testing
User acceptance testing of the system is a key factor for the success of any system. The system under consideration is tested for user acceptance by constantly keeping in touch with prospective users during development and making changes whenever required. This is done with regard to the input screen design and the output screen design. Here, it is checked whether the proposed system has a well-defined UI so that users can interact with the application easily.
6.1.6 Functional Testing
Functional testing is a technique in which all the functionalities of the program are tested to check whether every function proposed during the planning phase is fulfilled and working properly. It is done in two phases: once before integration, to see whether all the unit components work properly, and once after integration, to see whether they still work properly together or whether any functional compatibility issues arise.
6.2 Test Cases
A test case is a specification of the inputs, execution conditions, testing procedure, and expected results that define a single test to be executed to achieve a software testing objective. The test cases for this project are listed in the tables below.
                Table 6.1 - Test cases for model 1

SR NO    INPUTS                       EXPECTED OUTPUT    OBSERVED OUTPUT
1.       Dataset row number = 51      0                  0
2.       Dataset row number = 657     0                  1
3.       Dataset row number = 709     1                  1
4.       Dataset row number = 719     1                  0
5.       Missing values               Error              Error
                    Table 6.2 - Test cases for model 2

SR NO    INPUTS                   EXPECTED OUTPUT    OBSERVED OUTPUT
1.       Dataset row no = 1       1                  0
2.       Dataset row no = 4       0                  0
3.       Dataset row no = 16      1                  1
4.       Dataset row no = 34      0                  1
5.       Missing values           Error              Error
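Row-based checks like these could also be automated. The pytest sketch below is a rough illustration; the CSV path, the label column, and the load_trained_model helper are hypothetical placeholders standing in for the project's actual artifacts.

    # test_model.py - hypothetical automation of the row-based test cases above.
    import pandas as pd
    import pytest

    from my_project import load_trained_model   # placeholder import

    CASES = [(51, 0), (657, 0), (709, 1), (719, 1)]   # (dataset row, expected label)

    @pytest.mark.parametrize("row,expected", CASES)
    def test_row_prediction(row, expected):
        data = pd.read_csv("breast_expression_22genes.csv")   # placeholder path
        model = load_trained_model()
        features = data.drop(columns=["label"]).iloc[[row]]
        assert model.predict(features)[0] == expected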
    CHAPTER 7 RESULT ANALYSIS
7. Result Analysis
In this chapter, the obtained results are analyzed, the different algorithms are compared on the basis of a few parameters, and the data is visualized.
7.1 Evaluation Parameters
1. Accuracy 
The accuracy of a machine learning classification algorithm is one way to measure how often the algorithm classifies a data point correctly. Accuracy is the number of correctly predicted data points out of all the data points.
2. Precision 
 Precision, or the positive predictive value, refers to the fraction of relevant instances among the total retrieved instances. 
       Precision = TP / (TP + FP) 
3. Recall
 Recall, also known as sensitivity, refers to the fraction of relevant instances retrieved over the total amount of relevant instances.
        Recall = TP / (TP + FN)
4. F1 Score
The F score, also called the F1 score or F measure, is a measure of a test's accuracy. The F score is defined as the weighted harmonic mean of the test's precision and recall. F1 Score is calculated as,
        F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
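scikit-learn reports all four of these metrics directly; the sketch below uses made-up ground-truth labels and predictions.

    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    # Made-up ground truth and predictions (1 = cancer, 0 = normal).
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]

    print("Accuracy :", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))
    print("Recall   :", recall_score(y_true, y_pred))
    print("F1 score :", f1_score(y_true, y_pred))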
7.2 Result
Table 7.2.1 - Comparison of performance of machine learning algorithms for model 1

Sr no    Algorithm used    Accuracy    Precision    Recall    F1 Score
1        SVM               0.9768      0.99         0.96      0.97
2        Naïve Bayes       0.9259      0.94         0.91      0.92
3        Decision Tree     0.9898      0.96         0.95      0.96
4        KNN               0.9305      1.0          0.86      0.92
    
       Figure 7.2.1- Scatter plot for brca1, brca2
Figure 7.2.2- Scatter plot for brca2, tp53
Figure 7.2.3- Scatter plot for tp53 and brca1
Figure 7.2.4- Line chart for 50 rows
Figure 7.2.5- Histogram for brca1
Figure 7.2.6- Histogram for brca2
Figure 7.2.7- Histogram for tp53
 CHAPTER 8 CONCLUSION
8.1 Conclusion 
From the above study, it is clear that cancer prognosis is possible in most cases using machine learning on high-dimensional genomic data. Conventional cancer prediction models do not accurately predict cancer at an early stage; using genomic data can fill this void, as it helps in early prediction. Microarray gene expression reflects the mutation of genes, and when such genes are mutated, the chances of a tumour growing and eventually causing cancer increase. Thus, with microarray gene expression, early prediction of cancer is feasible.
8.2 Future Scope
In this application, four machine learning models for the prediction of cancer were implemented. However, this is a partial system: for early prediction of cancer, more dimensions of each individual sample may be required, such as the individual's lifestyle and hereditary factors. Acquiring such datasets and combining them with gene expression is the future task, and machine learning models can then be built on the combined data.
BIBLIOGRAPHY

Journal Papers

1. Jishnu Das, Kaitlyn M. Gayvert, and Haiyuan Yu, "Predicting Cancer Prognosis Using Functional Genomics Data Sets." Published online 2014 Nov 2. doi: 10.4137/CIN.S14064. PMCID: PMC4218897, PMID: 25392695.

2. Cai Huang, Evan A. Clayton, Lilya V. Matyunina, L. DeEtte McDonald, Benedict B. Benigno, Fredrik Vannberg, and John F. McDonald, "Machine learning predicts individual cancer patient responses to therapeutic drugs with high accuracy." Published online 2018 Nov 6. doi: 10.1038/s41598-018-34753-5.

3. Konstantina Kourou, Themis P. Exarchos, Konstantinos P. Exarchos, Michalis V. Karamouzis, and Dimitrios I. Fotiadis, "Machine learning applications in cancer prognosis and prediction." Published online 15 November 2014. doi: 10.1016/j.csbj.2014.11.005.

4. Chaima Aouiche, Bolin Chen, and Xuequn Shang, "Predicting stage-specific cancer related genes and their dynamic modules by integrating multiple datasets." BMC Bioinformatics. 2019; 20(Suppl 7): 194. Published online 2019 May 1. doi: 10.1186/s12859-019-2740-6. PMCID: PMC6509867, PMID: 31074385.

5. Qianfan Wu, Adel Boueiz, and Weiliang Qiu, "Deep Learning Methods for Predicting Disease Status Using Genomic Data." Published online 2018 Dec 11. PMCID: PMC6530791, NIHMSID: NIHMS1024586, PMID: 31131151.

6. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542: 115-118. doi: 10.1038/nature21056.

7. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vision. 2015;115: 211-252.

8. Hsu C-W, Chang C-C, Lin C-J. A practical guide to support vector classification. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan, 2003.

9. Gu Deshpande and Ramji Rai. An Overview of Prognostic Markers in Breast Cancer. Med J Armed Forces India. 1999 Apr; 55(2): 129-132. Published online 2017 Jun 26. doi: 10.1016/S0377-1237(17)30268-X. PMCID: PMC5531823, PMID: 28775603.

10. Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007;23: 2507-2517. doi: 10.1093/bioinformatics/btm344.

11. Radovic M, Ghalwash M, Filipovic N, Obradovic Z. Minimum redundancy maximum relevance feature selection approach for temporal gene expression data. BMC Bioinformatics. 2017;18: 9. doi: 10.1186/s12859-016-1423-9.

12. Gaul DA, Mezencev R, Long TQ, Jones CM, Benigno BB, Gray A, et al. Highly-accurate metabolomic detection of early-stage ovarian cancer. Sci Reports. 2015;5: 16351.

13. Guan W, Zhou M, Hampton CY, Benigno BB, Walker LD, Gray A, et al. Ovarian cancer detection from metabolomic liquid chromatography/mass spectrometry data by support vector machines. BMC Bioinformatics. 2009;10: 259-274. doi: 10.1186/1471-2105-10-259.

14. Hoadley KA, Yau C, Wolf DM, Cherniack AD, Tamborero D, Ng S, et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell. 2014;158: 929-44. doi: 10.1016/j.cell.2014.06.049.

15. Azuaje F. Computational models for predicting drug responses in cancer research. Brief Bioinform. 2016; pii: bbw065 (Epub ahead of print).

16. Salesse S, Verfaillie CM. BCR/ABL: from molecular mechanisms of leukemia induction to treatment of chronic myelogenous leukemia. Oncogene. 2002;21: 8547-59. doi: 10.1038/sj.onc.1206082.

17. Wenming Zhao, Yiming Bao, Shunmin He, Guoqing Zhang, et al. (2020) "Database resource of the national genomics data center."

18. Xie, Haozhe; Li, Jie; Jatkoe, Tim; Hatzis, Christos (2017), "Gene Expression Profiles of Breast Cancer", Mendeley Data, v1.

19. National Center for Biotechnology Information. Accessed on: Feb 13, 2020. Available: https://www.ncbi.nlm.nih.gov/guide/genes-expression

Websites

Breastcancer.org. Accessed on: Feb 13, 2020. Available: https://www.breastcancer.org/risk/factors/genetics
PUBLICATIONS & CERTIFICATES
1. "Abstractive text summarization using artificial intelligence", 2nd International Conference on Advances in Science & Technology (ICAST 2019) SSRN, Elsevier - Abstract id - 3370795.
2. Participated and won the National Level Project Competition KJSIEIT - INTECH '19