TY - GEN
T1 - Feature Impact on Sentiment Extraction of TEnglish Code-Mixed Movie Tweets
AU - Padmaja, S.
AU - Nikitha, M.
AU - Bandu, Sasidhar
AU - Sameen Fatima, S.
N1 - Publisher Copyright: © 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2021
Y1 - 2021
N2 - Sentiment extraction is a natural language processing task that detects and classifies sentiments in monolingual and bilingual texts. Automating sentiment extraction from social media text is a pertinent research area because such text contains an enormous amount of noisy multilingual content. This work focuses on extracting sentiments from code-mixed Telugu–English (TEnglish) bilingual movie tweets in Roman script, collected using the Twitter API. Initially, every tweet in the dataset was annotated with the source language of each word and with the sentiment expressed in the tweet. Sentiment extraction on the annotated data was then automated through a machine learning-based approach. Sentiment classification was performed with features such as character N-grams, emoticons, repetitive characters, intensifiers, and negation words, using a support vector machine classifier with a radial basis function kernel, as it performs efficiently on high-dimensional feature vectors. The study aimed to identify which types of features have the most impact in capturing sentiments. The results show that character N-grams, emoticons, and negation words are the features that affect accuracy the most.
KW - Code-mixed tweets
KW - Natural language processing
KW - Sentiment extraction
UR - https://www.scopus.com/pages/publications/85112710592
U2 - 10.1007/978-981-16-1502-3_48
DO - 10.1007/978-981-16-1502-3_48
M3 - Conference contribution
AN - SCOPUS:85112710592
SN - 9789811615016
T3 - Smart Innovation, Systems and Technologies
SP - 487
EP - 493
BT - Smart Computing Techniques and Applications - Proceedings of the 4th International Conference on Smart Computing and Informatics
A2 - Satapathy, Suresh Chandra
A2 - Bhateja, Vikrant
A2 - Favorskaya, Margarita N.
A2 - Adilakshmi, T.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 4th International Conference on Smart Computing and Informatics, SCI 2020
Y2 - 9 October 2020 through 10 October 2020
ER -