TY - JOUR
T1 - Extending ImageNet to Arabic using Arabic WordNet
AU - Alsudais, Abdulkareem
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2022/3
Y1 - 2022/3
AB - This paper investigates the extension of ImageNet and its millions of English-labeled images to Arabic using Arabic WordNet. The primary finding is the identification of Arabic synsets for 1219 of the 21,841 synsets used in ImageNet, which represents 1.1 million images. By leveraging the parent-child structure of synsets in ImageNet, this dataset is extended to 10,462 synsets (and 7.1 million images) that have an Arabic label, which is either a match or a direct hypernym, and to 17,438 synsets (and 11 million images) when a hypernym of a hypernym is included. Samples evaluated suggest that generating Arabic labels for images in ImageNet using hypernyms does indeed produce meaningful results. The precision values for seven evaluated samples exceeded 90%. Moreover, when all the images in the samples were combined, the precision value equaled 93%. For the entire ImageNet, when all hypernyms for a node are considered, an Arabic synset is found for all but four synsets. This represents the major contribution of this work: a dataset of 14,195,756 images that have Arabic labels. The resulting dataset presents Arabic labels for 99.9% of the images in ImageNet.
KW - Arabic computer vision
KW - Arabic WordNet
KW - Computer vision
KW - ImageNet
KW - Language and computer vision
KW - Linked data
UR - http://www.scopus.com/inward/record.url?scp=85124346517&partnerID=8YFLogxK
U2 - 10.1007/s11042-022-11981-6
DO - 10.1007/s11042-022-11981-6
M3 - Article
AN - SCOPUS:85124346517
SN - 1380-7501
VL - 81
SP - 8835
EP - 8852
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 6
ER -