The Detection of Online Textual Hate Speech
Hate speech is a pervasive societal problem with real-life consequences, especially when spread on social media platforms. Its detection has been attempted by both manual and automated methods. Manual methods rely on cyber-security personnel filtering through large quantities of text to flag or delete hateful content; this approach does not scale to the sheer volume of material posted online. Automated methods, on the other hand, use either hand-crafted rules rooted in language syntax or machine learning techniques. Hand-crafted rules cannot account for the nuances of hate speech and are therefore impractical. Machine learning techniques have propelled the detection task forward by providing speed and some gains in accuracy.
Despite these advances from machine learning techniques, most of which frame detection as a text classification task, there remains a need to improve both the data and the features used to build these models. There is also a need to rethink and reframe what the models detect in each sentence: the presence of hate, or its target.
This thesis aims to improve the detection of hate speech on social media platforms by addressing the challenges with the data, constructing new informative features and providing better explainability by detecting which protected category a hateful instance is attacking. This thesis develops a novel token replacement-based data augmentation method, proposing new sources of synonyms to provide better substitute word options and novel methods of selecting the best substitute words in a sentence so as to preserve its original meaning. It also investigates the effect of using homogeneous versus heterogeneous embeddings for replacement and for classification. The results show that the new sources of synonyms and the use of a heuristic to select the best substitute words improved performance. Further improvement is achieved when the replacement and classification embeddings are kept homogeneous.
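The core of token replacement-based augmentation can be illustrated with a minimal sketch. The synonym table and rate parameter below are illustrative stand-ins: the thesis proposes richer synonym sources and a heuristic for choosing the best substitute, neither of which is reproduced here.

```python
import random

# Illustrative synonym table only; the thesis draws substitutes from
# richer sources and ranks candidates with a selection heuristic.
SYNONYMS = {
    "awful": ["terrible", "dreadful"],
    "people": ["folks", "individuals"],
}

def augment(sentence, rate=0.5, seed=0):
    """Return a copy of `sentence` with some words replaced by synonyms."""
    rng = random.Random(seed)
    words = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < rate:
            words.append(rng.choice(options))  # swap in a synonym
        else:
            words.append(word)                 # keep the original token
    return " ".join(words)
```

With `rate=1.0` every word that has synonyms is swapped, producing a label-preserving paraphrase that can be added to the training set.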
This thesis constructs new syntactic features that preserve the sentence’s contextual information, thereby improving the detection of closely related offensive language and hate speech. The proposed methods show improvement in the detection task and strongly convey the importance of discerning the context surrounding a sentence. This context can change a sentence from hateful to merely offensive.
This thesis, continuing to incorporate ethical and platform-independent contextual information into the features, explores the construction of semantic features by harnessing word embeddings and emotional information. The results show the ability of the proposed methods to construct enriching features that improve the detection of hate speech. They also demonstrate the influence of negative emotions on hate speech.
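One simple way to combine the two signals above is to average a sentence's word vectors and append an emotion-based count. The toy embedding table and lexicon below are placeholders: the thesis uses pretrained word embeddings and an emotion resource, not these hand-made values.

```python
# Toy embedding table and negative-emotion lexicon for illustration only;
# the thesis relies on pretrained embeddings and an emotion lexicon.
EMBEDDINGS = {
    "hate": [0.9, 0.1],
    "love": [0.1, 0.9],
    "you": [0.5, 0.5],
}
NEGATIVE_EMOTION = {"hate", "angry", "disgusting"}

def semantic_features(sentence):
    """Average the word vectors, then append a negative-emotion count."""
    words = sentence.lower().split()
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    dim = 2  # dimensionality of the toy embeddings above
    if vectors:
        average = [sum(v[i] for v in vectors) / len(vectors)
                   for i in range(dim)]
    else:
        average = [0.0] * dim  # no known words: zero vector
    negative = sum(w in NEGATIVE_EMOTION for w in words)
    return average + [float(negative)]
```

The resulting vector can be fed to any standard classifier; the appended count lets the model weigh negative emotion alongside the distributional semantics.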
This thesis also focuses on detecting the target a hateful sentence is attacking, rather than merely labelling the sentence as hate speech or not. This finer granularity promises to bring more explainability into the detection paradigm. The method also casts the detection problem as semi-supervised, allowing the use of both labelled and unlabelled data. The results show the ability of the proposed methods to effectively detect which protected category a hateful sentence is targeting.
To sum up, this thesis develops multiple approaches that improve the effectiveness of hate speech detection. The proposed methods are evaluated on publicly available English-language hate speech datasets. When incorporated into detection systems deployed on social media platforms, these methods can help curb the proliferation of hateful and harmful messages.