A large-scale phenotype-based AD gene prediction
Alzheimer’s disease (AD) is a severe neurodegenerative disorder and has become a global public health problem. Despite intensive research, the pathophysiology of AD has not been fully elucidated. Comorbid diseases often share overlapping patterns of genetic markers, which may point to a common etiology and suggest essential protein targets. The US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) collects large-scale post-marketing surveillance data that provide a unique opportunity to investigate disease co-occurrence patterns. We aim to construct a heterogeneous network that integrates a disease comorbidity network (DCN) from FAERS with protein-protein interactions (PPI) to prioritize AD risk genes using a network-based ranking algorithm.
This folder contains the four files listed below.
fares_comm_net_lift_final_abbr: The disease comorbidity network (DCN) file, which contains 1,538 disease nodes and 21,312 edges extracted from FAERS using association rule mining.
Format:
dis1_UMLS|dis1_name|dis1_SOC|dis1_SOC_idx|dis2_UMLS|dis2_name|dis2_SOC|dis2_SOC_idx|conf
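As an illustration of the format above, the following minimal sketch parses pipe-delimited DCN records into named tuples. It assumes that each record has exactly nine fields, that no field contains a "|" character, and that the file may or may not begin with a header row; none of this is confirmed by the file description. The names ComorbidityEdge and parse_dcn_lines are hypothetical, and the sample record is a made-up placeholder, not data from FAERS.

```python
from typing import Iterable, NamedTuple


class ComorbidityEdge(NamedTuple):
    """One DCN record; field order follows the documented format line."""
    dis1_umls: str
    dis1_name: str
    dis1_soc: str
    dis1_soc_idx: str
    dis2_umls: str
    dis2_name: str
    dis2_soc: str
    dis2_soc_idx: str
    conf: float


def parse_dcn_lines(lines: Iterable[str]) -> list[ComorbidityEdge]:
    """Parse pipe-delimited DCN records, skipping blank lines and a header row if present."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        if len(fields) != 9:
            raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
        if fields[0] == "dis1_UMLS":
            # Assumption: a header row, if any, repeats the documented column names.
            continue
        edges.append(ComorbidityEdge(*fields[:8], float(fields[8])))
    return edges


# Made-up placeholder record, for illustration only.
sample = ["C0000001|disease A|SOC A|1|C0000002|disease B|SOC B|2|0.5"]
edges = parse_dcn_lines(sample)
```

With the real file, the same function can be applied to an open file handle, for example parse_dcn_lines(open("fares_comm_net_lift_final_abbr")).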
DCN_PPI_net.txt: The heterogeneous network file that contains 19,398 nodes (1,538 disease nodes and 17,860 gene nodes) and 1,401,358 edges.
Format: UMLS_ID or gene symbol|UMLS_ID or gene symbol|weight
Note: The network is undirected and unweighted, so every edge weight is set to 1.0.
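The heterogeneous network can be loaded into an undirected adjacency structure with the standard library alone. The sketch below assumes three pipe-delimited fields per line and no header row; it ignores the weight column because the note above states that the network is unweighted. The function names are hypothetical, and the sample node IDs are placeholders rather than real UMLS IDs or gene symbols.

```python
from collections import defaultdict
from typing import Iterable


def load_undirected_network(lines: Iterable[str]) -> dict[str, set[str]]:
    """Build a node -> neighbours map from 'node|node|weight' lines."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        source, target, _weight = line.split("|")
        # Store both directions so lookups work from either endpoint.
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


def count_edges(adjacency: dict[str, set[str]]) -> int:
    """Count each undirected edge once, even if the file lists it in both directions."""
    return len({frozenset((u, v)) for u, nbrs in adjacency.items() for v in nbrs})


# Placeholder IDs, for illustration only.
sample = ["C0000001|GENE_A|1.0", "GENE_A|C0000001|1.0", "GENE_A|GENE_B|1.0"]
network = load_undirected_network(sample)
```

When run on DCN_PPI_net.txt, len(network) and count_edges(network) can be compared with the node and edge counts stated above. A mismatch would suggest that one of the assumptions here (for example, the absence of a header row) does not hold for the actual file.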
disUMLS_name.txt: A disease node mapping file from UMLS ID to disease concept name
Format: UMLS_ID|disease_name
AD_novel_genes.csv: A file that contains the novel AD risk genes we predicted from the DCN_PPI network
Format: Rank,Gene
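The ranked gene list can be read with Python's csv module. This sketch assumes the file starts with a header row "Rank,Gene" and that ranks are integers; top_ranked_genes is a hypothetical helper, and the gene names in the sample are placeholders, not predicted genes.

```python
import csv
import io
from typing import TextIO


def top_ranked_genes(handle: TextIO, k: int) -> list[str]:
    """Return the k best-ranked gene symbols (rank 1 first)."""
    reader = csv.DictReader(handle)
    # Sort explicitly rather than relying on the file already being ordered.
    rows = sorted(reader, key=lambda row: int(row["Rank"]))
    return [row["Gene"] for row in rows[:k]]


# Placeholder content, for illustration only.
sample = io.StringIO("Rank,Gene\n2,GENE_B\n1,GENE_A\n3,GENE_C\n")
top_two = top_ranked_genes(sample, 2)
```

With the real file, pass an open handle instead, for example top_ranked_genes(open("AD_novel_genes.csv", newline=""), 10).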