Microblogging websites such as Twitter have gained popularity as an effective and quick means of expressing opinions, sharing news, and promoting information and updates. As a result, data generated on Twitter has become a vital and rich source for tasks such as sentiment mining or newsgathering. However, a significant portion of such data is biased, untruthful, spam, or otherwise non-credible. Consequently, filtering out non-credible tweets becomes a crucial step in any data analysis task on Twitter. In this work, we present a credibility model for content on Twitter. Unlike previous work that focused on English content or factual tweets, our work analyses the credibility of any tweet type and targets Arabic tweets, a challenging language for NLP in general. We focus on Arabic tweets due to the recent popularity of Twitter in the Arab world and the presence of a large portion of non-credible tweets in Arabic. We build a binary credibility classifier that classifies a tweet belonging to a given topic as either credible or non-credible. Our classifier relies on an exhaustive set of features extracted from both the author of the tweet (user-based) and the tweet itself (content-based). To achieve our objective, we collected 36,155,670 tweets through the Twitter streaming API over a period of two weeks and created an index to search our tweet collection. Three topics about the Syrian revolution were retrieved from the collection and given to annotators. Unlike previous work, we provided annotators with a unique interface that supplied real context for each tweet, such as the author's profile and a Web search on the tweet's content, which we deemed useful for judging the credibility of a tweet. Overall, 3,393 tweets were annotated for credibility using this interface.
Next, we extracted 22 user-based features, such as the author's expertise on the topic of the tweet, the time spacing between her previous tweets, and her follower count. In addition, we extracted 22 content-based features, including sentiment, retweet count, and URL count. Finally, we trained a set of classifiers on all the extracted features, using our annotated corpus of tweets as training data. We evaluated our credibility classifiers through a series of carefully designed experiments. Using cross-validation on our three topics individually and on a combined dataset containing the tweets from all topics, our classifiers surpassed the accuracy of a number of baseline approaches by significant margins. We then applied feature reduction and normalization, which yielded an additional marginal improvement in accuracy. Finally, to test the robustness of our chosen feature set, we evaluated our model using different training and testing sets; our classifiers continued to consistently surpass the accuracy of the baselines. Furthermore, we analyzed our feature set by comparing the accuracy of the classifier when trained on user-based features only versus content-based features only. Overall, content-based features alone produced better accuracy than user-based features alone when tested on multiple topics.
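The pipeline described above (a user-based and a content-based feature matrix feeding a binary credibility classifier evaluated with cross-validation) can be sketched as follows. This is a minimal illustration with synthetic data: the feature names, the feature counts shown, and the choice of a random-forest classifier are assumptions for demonstration, not the paper's exact setup.

```python
# Illustrative sketch: binary credibility classification over
# user-based and content-based tweet features, with 5-fold
# cross-validation. Features here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_tweets = 200

# Hypothetical user-based features (e.g. follower count, topic
# expertise score, spacing between previous tweets).
X_user = rng.random((n_tweets, 3))

# Hypothetical content-based features (e.g. sentiment score,
# retweet count, URL count).
X_content = rng.random((n_tweets, 3))

# Combine both feature families into one matrix.
X = np.hstack([X_user, X_content])
y = rng.integers(0, 2, n_tweets)  # 1 = credible, 0 = non-credible

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
print(round(scores.mean(), 3))
```

The same scaffold supports the paper's ablation comparison: training on `X_user` alone versus `X_content` alone and comparing the resulting cross-validation scores.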

