TY - JOUR
T1 - Analysis for Disease Gene Association Using Machine Learning
AU - Sikandar, Misba
AU - Sohail, Rafia
AU - Saeed, Yousaf
AU - Zeb, Asim
AU - Zareei, Mahdi
AU - Khan, Muhammad Adnan
AU - Khan, Atif
AU - Aldosary, Abdallah
AU - Mohamed, Ehab Mahmoud
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2020
Y1 - 2020
AB - To understand the basis of a disease, it is essential to determine its underlying genes. Understanding the association between underlying genes and genetic disease is a fundamental problem in human health. Identifying genes and associating them with a disease requires time-consuming and expensive experimentation on a large number of potential candidate genes. Therefore, inexpensive and rapid computational methods have been proposed as an alternative for identifying the candidate genes associated with a disease. Most of these methods use either phenotypic similarities, on the grounds that genes causing the same or similar diseases show less variation in their sequences, or network properties of protein-protein interactions, on the premise that genes causing the same or similar diseases lie closer together in the protein interaction network. However, these methods use only basic network properties or topological features, and gene sequence information or biological features, as prior knowledge for identifying gene-disease associations, which restricts the identification process to a single gene-disease association. In this study, we propose and analyze some novel computational methods for the identification of genes associated with diseases. Some advanced topological and biological features that are currently overlooked are introduced for identifying candidate genes. We evaluate different computational methods on disease-gene association data from DisGeNET using 10-fold cross-validation, based on TP rate, FP rate, precision, recall, F-measure, and ROC curve evaluation parameters. The results reveal that various computational methods with the advanced feature set outperform previous state-of-the-art techniques, achieving precision up to 93.8%, recall up to 93.1%, and F-measure up to 92.9%. Significantly, we apply our methods to study four major diseases: Thalassemia, Diabetes, Malaria, and Asthma. Simulation results show that the proposed Deep Extreme Learning Machine (DELM) gives more accurate results than previously published approaches.
KW - Disease gene association
KW - biological features
KW - computational approaches
KW - electron-ion interaction pseudopotential (EIIP)
KW - privacy
KW - protein-protein interaction network (PPIN)
KW - topological features
UR - http://www.scopus.com/inward/record.url?scp=85091309988&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.3020592
DO - 10.1109/ACCESS.2020.3020592
M3 - Article
AN - SCOPUS:85091309988
SN - 2169-3536
VL - 8
SP - 160616
EP - 160626
JO - IEEE Access
JF - IEEE Access
M1 - 9181557
ER -