Feature Impact on Sentiment Extraction of TEnglish Code-Mixed Movie Tweets

S. Padmaja; M. Nikitha; Sasidhar Bandu; S. Sameen Fatima

doi:10.1007/978-981-16-1502-3_48

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Sentiment extraction is a natural language processing task that deals with detecting and classifying sentiments in monolingual and bilingual texts. Automating sentiment extraction from social media text is a pertinent area of research because such text contains an enormous amount of noisy multilingual content. This work focuses on extracting sentiments from code-mixed Telugu–English (TEnglish) bilingual movie tweets written in Roman script and collected using the Twitter API. Initially, every tweet in the dataset was annotated with the source language of each word and with the sentiment expressed in the tweet. Sentiment extraction on the annotated data was then automated using a machine learning-based approach. Sentiment classification was performed with features such as character N-grams, emoticons, repetitive characters, intensifiers, and negation words, using a support vector machine classifier with a radial basis function kernel, as it performs efficiently on high-dimensional feature vectors. The study aimed to identify which type of feature has the most impact in capturing sentiments. The results show that character N-grams, emoticons, and negation words are the features that affect accuracy the most.
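The five feature types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word lists, the emoticon pattern, the trigram size, and the `gamma` value are all assumptions chosen for illustration, and the function names are hypothetical. The sketch turns a Roman-script tweet into a sparse feature dictionary and shows the radial basis function kernel that an SVM would evaluate between two such feature vectors; training the classifier itself is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical word lists for illustration only; they are not from the paper.
NEGATION_WORDS = {"not", "no", "never", "kadu", "ledu"}
INTENSIFIERS = {"very", "too", "chala"}

# Assumed pattern for a few common Western-style emoticons such as :) ;-( =D
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DP]")
# A character repeated three or more times in a row, as in "sooooper"
REPEAT_RE = re.compile(r"(\w)\1{2,}")


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of the lowercased text."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def extract_features(tweet):
    """Map a tweet to a sparse feature dictionary covering all five feature types."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    # Prefix n-gram keys so they cannot collide with the count features below.
    feats = Counter({f"ng:{g}": c for g, c in char_ngrams(tweet).items()})
    feats["emoticons"] = len(EMOTICON_RE.findall(tweet))
    feats["repeats"] = len(REPEAT_RE.findall(tweet))
    feats["intensifiers"] = sum(t in INTENSIFIERS for t in tokens)
    feats["negations"] = sum(t in NEGATION_WORDS for t in tokens)
    return feats


def rbf_kernel(a, b, gamma=0.1):
    """RBF kernel exp(-gamma * ||a - b||^2) between two sparse feature dicts."""
    keys = set(a) | set(b)
    squared_distance = sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys)
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    positive = extract_features("movie chala baagundi :) sooooper")
    negative = extract_features("story ledu, not good :(")
    print(positive["emoticons"], positive["repeats"], positive["intensifiers"])
    print(negative["negations"], rbf_kernel(positive, negative))
```

Measuring the impact of one feature type, as the study sets out to do, would amount to dropping the corresponding keys from every dictionary before training and comparing the resulting accuracy.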

Original language	English
Title of host publication	Smart Computing Techniques and Applications - Proceedings of the 4th International Conference on Smart Computing and Informatics
Editors	Suresh Chandra Satapathy, Vikrant Bhateja, Margarita N. Favorskaya, T. Adilakshmi
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	487-493
Number of pages	7
ISBN (Print)	9789811615016
DOIs	https://doi.org/10.1007/978-981-16-1502-3_48
State	Published - 2021
Externally published	Yes
Event	4th International Conference on Smart Computing and Informatics, SCI 2020 - Hyderabad, India. Duration: 9 Oct 2020 → 10 Oct 2020

Publication series

Name	Smart Innovation, Systems and Technologies
Volume	224
ISSN (Print)	2190-3018
ISSN (Electronic)	2190-3026

Conference

Conference	4th International Conference on Smart Computing and Informatics, SCI 2020
Country/Territory	India
City	Hyderabad
Period	9 Oct 2020 → 10 Oct 2020

Keywords

Code-mixed tweets
Natural language processing
Sentiment extraction

Access to Document

10.1007/978-981-16-1502-3_48

Cite this

Padmaja, S., Nikitha, M., Bandu, S., & Sameen Fatima, S. (2021). Feature Impact on Sentiment Extraction of TEnglish Code-Mixed Movie Tweets. In S. C. Satapathy, V. Bhateja, M. N. Favorskaya, & T. Adilakshmi (Eds.), Smart Computing Techniques and Applications - Proceedings of the 4th International Conference on Smart Computing and Informatics (pp. 487-493). (Smart Innovation, Systems and Technologies; Vol. 224). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-16-1502-3_48

Padmaja, S. ; Nikitha, M. ; Bandu, Sasidhar et al. / Feature Impact on Sentiment Extraction of TEnglish Code-Mixed Movie Tweets. Smart Computing Techniques and Applications - Proceedings of the 4th International Conference on Smart Computing and Informatics. editor / Suresh Chandra Satapathy ; Vikrant Bhateja ; Margarita N. Favorskaya ; T. Adilakshmi. Springer Science and Business Media Deutschland GmbH, 2021. pp. 487-493 (Smart Innovation, Systems and Technologies).

@inproceedings{67fe2dd3fe9a4b6e91036d83ff3b1607,

title = "Feature Impact on Sentiment Extraction of TEnglish Code-Mixed Movie Tweets",

abstract = "Sentiment extraction is a natural language processing task dealing with the detection and classification of sentiments in various monolingual and bilingual texts. In this context, the automation of extracting sentiments from social media text is one of the pertinent areas of research as there is an enormous noisy multilingual content. This work focuses on extracting sentiments for code-mixed Telugu–English (TEnglish) bilingual Roman script movie tweets extracted using Twitter API. Initially, every tweet in the dataset was annotated with the source language of all the words present and also the sentiment expressed in the code-mixed tweet. The annotated data was automated for sentiment extraction through machine learning-based approach. Sentiment classification was accomplished with features like character N-grams, emoticons, repetitive characters, intensifiers, and negation words using support vector machine classifier with radial basis function as it performs efficiently in high-dimensional feature vectors. The study was to focus on identifying the type of feature which has more impact in capturing sentiments. The results show that character N-grams, emoticons, and negation words are the features that affect the accuracy most.",

keywords = "Code-mixed tweets, Natural language processing, Sentiment extraction",

author = "S. Padmaja and M. Nikitha and Sasidhar Bandu and {Sameen Fatima}, S.",

note = "Publisher Copyright: {\textcopyright} 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.; 4th International Conference on Smart Computing and Informatics, SCI 2020 ; Conference date: 09-10-2020 Through 10-10-2020",

year = "2021",

doi = "10.1007/978-981-16-1502-3\_48",

language = "English",

isbn = "9789811615016",

series = "Smart Innovation, Systems and Technologies",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "487--493",

editor = "Satapathy, {Suresh Chandra} and Vikrant Bhateja and Favorskaya, {Margarita N.} and T. Adilakshmi",

booktitle = "Smart Computing Techniques and Applications - Proceedings of the 4th International Conference on Smart Computing and Informatics",

address = "Germany",

}

Padmaja, S, Nikitha, M, Bandu, S & Sameen Fatima, S 2021, Feature Impact on Sentiment Extraction of TEnglish Code-Mixed Movie Tweets. in SC Satapathy, V Bhateja, MN Favorskaya & T Adilakshmi (eds), Smart Computing Techniques and Applications - Proceedings of the 4th International Conference on Smart Computing and Informatics. Smart Innovation, Systems and Technologies, vol. 224, Springer Science and Business Media Deutschland GmbH, pp. 487-493, 4th International Conference on Smart Computing and Informatics, SCI 2020, Hyderabad, India, 9/10/20. https://doi.org/10.1007/978-981-16-1502-3_48

Feature Impact on Sentiment Extraction of TEnglish Code-Mixed Movie Tweets. / Padmaja, S.; Nikitha, M.; Bandu, Sasidhar et al.
Smart Computing Techniques and Applications - Proceedings of the 4th International Conference on Smart Computing and Informatics. ed. / Suresh Chandra Satapathy; Vikrant Bhateja; Margarita N. Favorskaya; T. Adilakshmi. Springer Science and Business Media Deutschland GmbH, 2021. p. 487-493 (Smart Innovation, Systems and Technologies; Vol. 224).

TY - GEN

T1 - Feature Impact on Sentiment Extraction of TEnglish Code-Mixed Movie Tweets

AU - Padmaja, S.

AU - Nikitha, M.

AU - Bandu, Sasidhar

AU - Sameen Fatima, S.

PY - 2021

Y1 - 2021

AB - Sentiment extraction is a natural language processing task dealing with the detection and classification of sentiments in various monolingual and bilingual texts. In this context, the automation of extracting sentiments from social media text is one of the pertinent areas of research as there is an enormous noisy multilingual content. This work focuses on extracting sentiments for code-mixed Telugu–English (TEnglish) bilingual Roman script movie tweets extracted using Twitter API. Initially, every tweet in the dataset was annotated with the source language of all the words present and also the sentiment expressed in the code-mixed tweet. The annotated data was automated for sentiment extraction through machine learning-based approach. Sentiment classification was accomplished with features like character N-grams, emoticons, repetitive characters, intensifiers, and negation words using support vector machine classifier with radial basis function as it performs efficiently in high-dimensional feature vectors. The study was to focus on identifying the type of feature which has more impact in capturing sentiments. The results show that character N-grams, emoticons, and negation words are the features that affect the accuracy most.

KW - Code-mixed tweets

KW - Natural language processing

KW - Sentiment extraction

UR - https://www.scopus.com/pages/publications/85112710592

U2 - 10.1007/978-981-16-1502-3_48

DO - 10.1007/978-981-16-1502-3_48

M3 - Conference contribution

AN - SCOPUS:85112710592

SN - 9789811615016

T3 - Smart Innovation, Systems and Technologies

SP - 487

EP - 493

BT - Smart Computing Techniques and Applications - Proceedings of the 4th International Conference on Smart Computing and Informatics

A2 - Satapathy, Suresh Chandra

A2 - Bhateja, Vikrant

A2 - Favorskaya, Margarita N.

A2 - Adilakshmi, T.

PB - Springer Science and Business Media Deutschland GmbH

T2 - 4th International Conference on Smart Computing and Informatics, SCI 2020

Y2 - 9 October 2020 through 10 October 2020

ER -

Padmaja S, Nikitha M, Bandu S, Sameen Fatima S. Feature Impact on Sentiment Extraction of TEnglish Code-Mixed Movie Tweets. In Satapathy SC, Bhateja V, Favorskaya MN, Adilakshmi T, editors, Smart Computing Techniques and Applications - Proceedings of the 4th International Conference on Smart Computing and Informatics. Springer Science and Business Media Deutschland GmbH. 2021. p. 487-493. (Smart Innovation, Systems and Technologies). doi: 10.1007/978-981-16-1502-3_48
