TY - JOUR
T1 - Splice site recognition - deciphering Exon-Intron transitions for genetic insights using Enhanced integrated Block-Level gated LSTM model
AU - Sha, Mohemmed
AU - Parveen Rahamathulla, Mohamudha
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/7/15
Y1 - 2024/7/15
N2 - Bioinformatics is a contemporary interdisciplinary field focused on analyzing the growing number of genome sequences. Gene variants are differences in DNA sequence among individuals within a population. Splice site recognition is a crucial step in gene expression, in which the coding sequences (exons) of genes are joined together to form mature messenger RNA (mRNA). Genetic variants that disrupt genes are believed to be the primary cause of neurodevelopmental disorders such as Autism Spectrum Disorder (ASD). ASD affects individuals, families, and society; it presents as developmental delay and has been linked to any one of the hundred genes associated with these disorders. Missense variants, premature stop codons, and deletions alter both the quality and quantity of the encoded proteins. Predicting genes from exons and introns presents several challenges, including sequencing errors, short reads, incomplete genes, and overlapping genes. Although many traditional techniques have been used to build exon prediction systems, the primary challenge lies in accurately identifying the length of exons and classifying their location on the spliced strand relative to introns. To this end, the proposed approach uses a deep learning algorithm to analyze intricate and extensive genomic datasets. M-LSTM is used to classify spliced DNA strands into three classes (EI as 1, IE as 2, and neither as 3). M-LSTM can process long sequences from large datasets, retaining long-range information without affecting the current input or output. This enables it to capture long-term dependencies and to address exploding-gradient problems. The proposed model is compared with Naïve Bayes and Random Forest to assess its efficacy. Its performance is further evaluated using recall, F1-score, precision, and accuracy.
AB - Bioinformatics is a contemporary interdisciplinary field focused on analyzing the growing number of genome sequences. Gene variants are differences in DNA sequence among individuals within a population. Splice site recognition is a crucial step in gene expression, in which the coding sequences (exons) of genes are joined together to form mature messenger RNA (mRNA). Genetic variants that disrupt genes are believed to be the primary cause of neurodevelopmental disorders such as Autism Spectrum Disorder (ASD). ASD affects individuals, families, and society; it presents as developmental delay and has been linked to any one of the hundred genes associated with these disorders. Missense variants, premature stop codons, and deletions alter both the quality and quantity of the encoded proteins. Predicting genes from exons and introns presents several challenges, including sequencing errors, short reads, incomplete genes, and overlapping genes. Although many traditional techniques have been used to build exon prediction systems, the primary challenge lies in accurately identifying the length of exons and classifying their location on the spliced strand relative to introns. To this end, the proposed approach uses a deep learning algorithm to analyze intricate and extensive genomic datasets. M-LSTM is used to classify spliced DNA strands into three classes (EI as 1, IE as 2, and neither as 3). M-LSTM can process long sequences from large datasets, retaining long-range information without affecting the current input or output. This enables it to capture long-term dependencies and to address exploding-gradient problems. The proposed model is compared with Naïve Bayes and Random Forest to assess its efficacy. Its performance is further evaluated using recall, F1-score, precision, and accuracy.
KW - Block gate
KW - Deep learning
KW - Divergent gate
KW - Exon
KW - Genome
KW - Intron
KW - LSTM
KW - Merge gate
KW - Mutations
KW - Splicing
UR - http://www.scopus.com/inward/record.url?scp=85189890600&partnerID=8YFLogxK
U2 - 10.1016/j.gene.2024.148429
DO - 10.1016/j.gene.2024.148429
M3 - Article
C2 - 38575098
AN - SCOPUS:85189890600
SN - 0378-1119
VL - 915
JO - Gene
JF - Gene
M1 - 148429
ER -