SRPOL - A lexicon based framework for sentiment strength of Serbian texts

Identifying the semantic orientation or polarity of words is one of the most important topics in sentiment analysis tasks.

In this work, we propose a new lexicon based approach for text polarity detection using sentiment triggers which are adding contextual semantic during the analysis. Lexicon SentiWords.SR is based on the English version lexicon SentiWords containing roughly 155,000 English words associated with a sentiment score in the range [-1, 1]. SentiWords.SR contains ~15,000 words (e.g. lemma & PoS pairs) which are derived upon extensive evaluation of the translated lexicon.

The existing word polarity dictionary in Serbian has been extended containing approximately 15,000 words annotated with polarity strength.

Serbian sentiment framework (SRPOL), relying on the new lexicon and the following sentiment triggers:

Adverb Modifiers

have a purpose to modify the polarity intensity of an upcoming sentiment-laden word, but not to change its orientation.

$$\mathit{"\mathbf{Veoma}\ (\rightarrow MOD=1.2)\ \mathbf{dobar}\ (p=+0.43)\ film..." \xrightarrow[]{1.2 \times (+0.43) } +0.52}$$

Negations

SRPOL considers negations for the upcoming phrase which could include adverb and negation modifiers in addition to the first upcoming standard sentiment-laden word. It identifies negation signals such as the Serbian words ne, ni or nije (eng. 'not') which reverse the score:

$$\mathit{"Film\ \mathbf{nije}\ (\rightarrow\$ NEG)\ \mathbf{zanimljiv}" (p=+0.53) \xrightarrow[\times (-1)]{+0.53 } -0.53}$$

Exclamation Marks

exclamation mark increase the perceived sentiment by an average of 6% for one, and of 18% for the sequence of more than two exclamation marks.

$$\mathit{"Odličan film\mathbf{!}"(\leftarrow\times 1.06)} \xrightarrow[\times 1.06]{p=+0.57}+0.60$$

Elongated Words

Words that contains a repeating character or group of characters more than two times, emphasizing that word has been identified as sentiment trigger

$$\mathit{"Tako\ \mathbf{dooooosadan} (p=-0.24)..." \xrightarrow[1.27]{\times 1.05^{chr(o)} } "Tako\ \mathbf{dosadan}..."(p=-0.30)}$$

Emoticons and Emojis

Sentiment of emojis intensities have been utilized by the Emoji Sentiment Ranking v1.0 lexicon.

$$\mathit{"Divan (p=+0.4)\ film (p=+0.14)}$$

🙂

$$\mathit{(p=+0.678)" \xrightarrow[]{} +0.41}$$

Text Segmentation

The primary goal of splitting text into segments is to help in improving the polarity scoring for the long text with mixed sentiments detected on the containing segments. SRPOL assess the polarity score for each segment (sentence) of a particular text and using a majority rule approach, predicts sentiment score for the given text:

$$P_{text} = \frac{\sum_{i}w_{i} * P_{s}^{i}}{\sum_{i}w_{i}}, i \in [1, S]$$

$$w_{i}=\sum_{m}^{S}|sign(P_{s}^{i}) = sign(P_{s}^{m})|$$

$$P_{s} = \frac{\sum_{i}P_{w}^{i}}{k}, i \in [1, k], k \in [1, S]$$

Further Work

Increase the number of the words in the SentiWords.SR lexicon by using advanced machine learning methods
Evaluate other possible sentiment triggers
Evaluate other segmentation techniques and methods for the final score calculation

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
images		images
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SRPOL - A lexicon based framework for sentiment strength of Serbian texts

Adverb Modifiers

Negations

Exclamation Marks

Elongated Words

Emoticons and Emojis

Text Segmentation

Further Work

About

Uh oh!

Releases

Packages

License

milena-sosic/SRPOL

Folders and files

Latest commit

History

Repository files navigation

SRPOL - A lexicon based framework for sentiment strength of Serbian texts

Adverb Modifiers

Negations

Exclamation Marks

Elongated Words

Emoticons and Emojis

Text Segmentation

Further Work

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages