a/README.md b/README.md
# AD-Genetics-Prediction
A large-scale phenotype-based AD disease gene prediction

## Introduction
Alzheimer’s disease (AD) is a severe neurodegenerative disorder and has become a global public health problem. Despite intensive research, the pathophysiology of AD has yet to be elucidated. Comorbid diseases often share overlapping patterns of genetic markers, which may point to a common etiology and suggest essential protein targets. The US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) collects large-scale post-marketing surveillance data that provide a unique opportunity to investigate disease co-occurrence patterns. We aim to construct a heterogeneous network that integrates a disease comorbidity network (DCN) derived from FAERS with protein-protein interactions (PPI), and to prioritize AD risk genes on it using a network-based ranking algorithm.

## Methods

<p align="center">
  <img src="./figures/methods.jpg" width="400">
</p>

## Documentation for modules
1. __*assoc_rules:*__ Provides classes for generating and processing association rules from FAERS using the FP-Growth algorithm
1. __*network:*__ Provides classes for constructing the DCN and DCN_PPI networks
1. __*graph_algorithm:*__ Provides classes for the graph algorithms used in this project, including random walk with restart (RWR), random graph generation, and *de novo* prediction of AD risk genes
1. __*util:*__ Provides utility classes used in this project
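As an illustration of the random walk with restart mentioned above, here is a minimal, dependency-free sketch. It is not the code in `graph_algorithm`: the function name `random_walk_with_restart`, the restart probability of 0.7, the convergence settings, and the toy node names are all assumptions made for this example.

```python
def random_walk_with_restart(adjacency, seeds, restart_prob=0.7,
                             tol=1e-8, max_iter=1000):
    """Return a dict of steady-state visiting probabilities per node.

    adjacency: dict mapping each node to the set of its neighbours.
        Assumed to be undirected and unweighted, with every neighbour
        also present as a key.
    seeds: nodes the walker jumps back to on a restart.
    restart_prob: probability of restarting at each step (assumed value).
    """
    nodes = list(adjacency)
    seed_set = {s for s in seeds if s in adjacency}
    if not seed_set:
        raise ValueError("no seed node is present in the network")

    # Restart distribution: uniform over the seed nodes.
    restart = {n: 0.0 for n in nodes}
    for s in seed_set:
        restart[s] = 1.0 / len(seed_set)

    scores = dict(restart)
    for _ in range(max_iter):
        # Every node first receives its share of the restart mass.
        new_scores = {n: restart_prob * restart[n] for n in nodes}
        # Each node then spreads the rest of its score evenly to neighbours.
        # Nodes without neighbours spread nothing, so their mass is dropped.
        for node, neighbours in adjacency.items():
            if not neighbours:
                continue
            share = (1.0 - restart_prob) * scores[node] / len(neighbours)
            for nb in neighbours:
                new_scores[nb] += share
        delta = sum(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tol:
            break
    return scores


def rank_candidates(scores, exclude=()):
    """Sort non-excluded nodes by decreasing score."""
    excluded = set(exclude)
    return sorted((n for n in scores if n not in excluded),
                  key=lambda n: scores[n], reverse=True)


# Toy network with hypothetical node names, not taken from the project data.
toy_net = {
    "AD": {"GENE_A", "DIS_B"},
    "GENE_A": {"AD", "GENE_B"},
    "DIS_B": {"AD"},
    "GENE_B": {"GENE_A"},
}
toy_scores = random_walk_with_restart(toy_net, seeds=["AD"])
print(rank_candidates(toy_scores, exclude=["AD"]))
```

Nodes closer to the seed accumulate more probability mass, which is what makes the scores usable as a ranking of candidate genes.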

## Results
This folder contains the four files listed below.

1. __*fares_comm_net_lift_final_abbr:*__ The disease comorbidity network (DCN) file, which contains 1,538 disease nodes and 21,312 edges extracted from FAERS using association rule mining

    Format: dis1_UMLS|dis1_name|dis1_SOC|dis1_SOC_idx|dis2_UMLS|dis2_name|dis2_SOC|dis2_SOC_idx|conf

    * *dis1_UMLS:* UMLS code for disorder 1
    * *dis1_name:* Name for disorder 1
    * *dis1_SOC:* System Organ Class (MedDRA) for disorder 1
    * *dis1_SOC_idx:* SOC index for disorder 1
    * *dis2_UMLS:* UMLS code for disorder 2
    * *dis2_name:* Name for disorder 2
    * *dis2_SOC:* System Organ Class (MedDRA) for disorder 2
    * *dis2_SOC_idx:* SOC index for disorder 2
    * *conf:* Confidence of the disease-pair relationship. Because the graph is undirected and unweighted, the value is always set to 1.0.

2.  __*DCN_PPI_net.txt:*__ The heterogeneous network file that contains 19,398 nodes (1,538 disease nodes and 17,860 gene nodes) and 1,401,358 edges.

    Format: UMLS_ID or gene symbol|UMLS_ID or gene symbol|weight

    Note: The edge weight is set to 1.0 because the network is undirected and unweighted.

3.  __*disUMLS_name.txt:*__ A disease node mapping file from UMLS ID to disease concept name

    Format: UMLS_ID|disease_name

4.  __*AD_novel_genes.csv:*__ A file that contains the novel AD risk genes we predicted from the DCN_PPI network

    Format: Rank,Gene
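The pipe-delimited formats above can be read with a few lines of Python. The sketch below is only an illustration under stated assumptions: the helper names `parse_dcn_line` and `load_edge_list` are hypothetical, the files are assumed to have no header row and no `|` inside field values, and the sample rows use placeholder identifiers rather than real data.

```python
from collections import defaultdict

# Field names copied from the DCN format description above.
DCN_FIELDS = [
    "dis1_UMLS", "dis1_name", "dis1_SOC", "dis1_SOC_idx",
    "dis2_UMLS", "dis2_name", "dis2_SOC", "dis2_SOC_idx", "conf",
]


def parse_dcn_line(line):
    """Split one DCN row into a dict keyed by the documented field names."""
    values = line.rstrip("\n").split("|")
    if len(values) != len(DCN_FIELDS):
        raise ValueError(
            f"expected {len(DCN_FIELDS)} fields, got {len(values)}")
    record = dict(zip(DCN_FIELDS, values))
    record["conf"] = float(record["conf"])
    return record


def load_edge_list(lines):
    """Build an undirected adjacency dict from 'node|node|weight' rows.

    The weight column is ignored because it is documented as always 1.0.
    """
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        node_a, node_b, _weight = line.split("|")
        adjacency[node_a].add(node_b)
        adjacency[node_b].add(node_a)
    return dict(adjacency)


# Placeholder rows (hypothetical identifiers, not taken from the real files).
sample_dcn_row = "C0000001|disease one|SOC a|1|C0000002|disease two|SOC b|2|1.0"
sample_edges = ["C0000001|C0000002|1.0", "C0000001|GENE1|1.0"]

record = parse_dcn_line(sample_dcn_row)
network = load_edge_list(sample_edges)
print(record["dis1_name"], record["conf"])
print(sorted(network["C0000001"]))

# With the real file, the same loader accepts an open file handle:
# with open("DCN_PPI_net.txt") as handle:
#     network = load_edge_list(handle)
```

Storing neighbours in sets keeps the graph undirected and silently absorbs duplicate edges, which suits a network whose weights are all 1.0.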