Assessing the Reliability of ChatGPT and Gemini in Identifying Relevant Orthodontic Literature

Saeed N. Asiri

doi:10.1055/s-0045-1809617

Preventive Dental Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Objectives: Artificial intelligence (AI)-based solutions offer potential remedies to the issues encountered in conventional reference identification methods. However, the effectiveness of these AI models in assisting orthodontic experts in discovering relevant material is unknown. The purpose of this study was to assess the validity of ChatGPT and Google Gemini in providing references for orthodontic literature studies.

Materials and Methods: This study used ChatGPT models (3.5 and 4) and Gemini to search for topics in orthodontics and specific subdomains. To verify the existence and precision of the cited references, several reputable sources were consulted, including PubMed, Google Scholar, and Web of Science.

Statistical Analysis: Descriptive statistics were used to present the data as counts and percentages, focusing on three aspects: completeness, accuracy, and fabrication. Reliability analysis was conducted using Cronbach’s α, and the results were presented visually as a correlation heat map.

Results: Out of all references, only 15.76% were correct, whereas 71.92% were fake or fabricated and 12.32% were inaccurate. Gemini had the highest proportion of correct references (36.36%), followed by GPT 3.5 (15.76%) and GPT 4 (0.95%); the differences among the models were statistically significant (p < 0.01). The reliability score of 0.418 indicates low-to-moderate consistency in the accuracy of the references.

Conclusion: Although Gemini performed better than the GPT models, all three models showed significant limitations in reference generation. These findings advocate balanced and cautious use of AI tools in orthodontic academic research, emphasizing human validation of references and the training of dental professionals and researchers in the efficient use of AI tools.

Original language	English
Journal	European Journal of General Dentistry
DOIs	https://doi.org/10.1055/s-0045-1809617
State	Accepted/In press - 2025

Keywords

artificial intelligence
chatbot
literature
orthodontics
references

Cite this

@article{8fd97f5f27a74493a8195366dce8eedb,

title = "Assessing the Reliability of ChatGPT and Gemini in Identifying Relevant Orthodontic Literature",

abstract = "Objectives: Artificial intelligence (AI)-based solutions offer potential remedies to the issues encountered in conventional reference identification methods. However, the effectiveness of these AI models in assisting orthodontic experts in discovering relevant material is unknown. The purpose of this study was to assess the validity of ChatGPT and Google Gemini in providing references for orthodontic literature studies. Materials and Methods: This study used ChatGPT models (3.5 and 4) and Gemini to search for topics in orthodontics and specific subdomains. To verify the existence and precision of the cited references, several reputable sources were consulted, including PubMed, Google Scholar, and Web of Science. Statistical Analysis: Descriptive statistics were used to present the data as counts and percentages, focusing on three aspects: completeness, accuracy, and fabrication. Reliability analysis was conducted using Cronbach{\textquoteright}s α, and the results were presented visually as a correlation heat map. Results: Out of all references, only 15.76\% were correct, whereas 71.92\% were fake or fabricated and 12.32\% were inaccurate. Gemini had the highest proportion of correct references (36.36\%), followed by GPT 3.5 (15.76\%) and GPT 4 (0.95\%); the differences among the models were statistically significant (p < 0.01). The reliability score of 0.418 indicates low-to-moderate consistency in the accuracy of the references. Conclusion: Although Gemini performed better than the GPT models, all three models showed significant limitations in reference generation. These findings advocate balanced and cautious use of AI tools in orthodontic academic research, emphasizing human validation of references and the training of dental professionals and researchers in the efficient use of AI tools.",

keywords = "artificial intelligence, chatbot, literature, orthodontics, references",

author = "Asiri, Saeed N.",

note = "Publisher Copyright: {\textcopyright} 2025. The Author(s).",

year = "2025",

doi = "10.1055/s-0045-1809617",

language = "English",

journal = "European Journal of General Dentistry",

issn = "2278-9626",

publisher = "Georg Thieme Verlag",

}

TY - JOUR

T1 - Assessing the Reliability of ChatGPT and Gemini in Identifying Relevant Orthodontic Literature

AU - Asiri, Saeed N.

PY - 2025

Y1 - 2025

N2 - Objectives: Artificial intelligence (AI)-based solutions offer potential remedies to the issues encountered in conventional reference identification methods. However, the effectiveness of these AI models in assisting orthodontic experts in discovering relevant material is unknown. The purpose of this study was to assess the validity of ChatGPT and Google Gemini in providing references for orthodontic literature studies. Materials and Methods: This study used ChatGPT models (3.5 and 4) and Gemini to search for topics in orthodontics and specific subdomains. To verify the existence and precision of the cited references, several reputable sources were consulted, including PubMed, Google Scholar, and Web of Science. Statistical Analysis: Descriptive statistics were used to present the data as counts and percentages, focusing on three aspects: completeness, accuracy, and fabrication. Reliability analysis was conducted using Cronbach’s α, and the results were presented visually as a correlation heat map. Results: Out of all references, only 15.76% were correct, whereas 71.92% were fake or fabricated and 12.32% were inaccurate. Gemini had the highest proportion of correct references (36.36%), followed by GPT 3.5 (15.76%) and GPT 4 (0.95%); the differences among the models were statistically significant (p < 0.01). The reliability score of 0.418 indicates low-to-moderate consistency in the accuracy of the references. Conclusion: Although Gemini performed better than the GPT models, all three models showed significant limitations in reference generation. These findings advocate balanced and cautious use of AI tools in orthodontic academic research, emphasizing human validation of references and the training of dental professionals and researchers in the efficient use of AI tools.

AB - Objectives: Artificial intelligence (AI)-based solutions offer potential remedies to the issues encountered in conventional reference identification methods. However, the effectiveness of these AI models in assisting orthodontic experts in discovering relevant material is unknown. The purpose of this study was to assess the validity of ChatGPT and Google Gemini in providing references for orthodontic literature studies. Materials and Methods: This study used ChatGPT models (3.5 and 4) and Gemini to search for topics in orthodontics and specific subdomains. To verify the existence and precision of the cited references, several reputable sources were consulted, including PubMed, Google Scholar, and Web of Science. Statistical Analysis: Descriptive statistics were used to present the data as counts and percentages, focusing on three aspects: completeness, accuracy, and fabrication. Reliability analysis was conducted using Cronbach’s α, and the results were presented visually as a correlation heat map. Results: Out of all references, only 15.76% were correct, whereas 71.92% were fake or fabricated and 12.32% were inaccurate. Gemini had the highest proportion of correct references (36.36%), followed by GPT 3.5 (15.76%) and GPT 4 (0.95%); the differences among the models were statistically significant (p < 0.01). The reliability score of 0.418 indicates low-to-moderate consistency in the accuracy of the references. Conclusion: Although Gemini performed better than the GPT models, all three models showed significant limitations in reference generation. These findings advocate balanced and cautious use of AI tools in orthodontic academic research, emphasizing human validation of references and the training of dental professionals and researchers in the efficient use of AI tools.

KW - artificial intelligence

KW - chatbot

KW - literature

KW - orthodontics

KW - references

UR - http://www.scopus.com/inward/record.url?scp=105012980627&partnerID=8YFLogxK

U2 - 10.1055/s-0045-1809617

DO - 10.1055/s-0045-1809617

M3 - Article

AN - SCOPUS:105012980627

SN - 2278-9626

JO - European Journal of General Dentistry

JF - European Journal of General Dentistry

ER -
