Automated essay scoring using structural and grammatical features
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Forum Proceedings, Volume 2012, Issue 1, Oct 2012, CSP33
Abstract
Automated essay scoring is a research field that continues to gain popularity. Grading essays by hand is expensive and time consuming; automated scoring systems can provide fast, effective, and affordable solutions that make it practical to use essays and other sophisticated testing tools at scale. This study was conducted on a dataset, provided by the Hewlett Foundation, of thousands of English essays divided into eight categories, where each category corresponds to a single question or problem statement. For each essay in the training set, the dataset provides a score assigned by human raters. Several features were designed to predict the final grade. First, the number of occurrences of each of the 100 most frequent English words is computed for each essay. Then, for every word occurring in the training set, the average score of the essays containing it is determined. From this list of average scores, several statistics are used as separate features, including the minimum, maximum, mean, median, variance, skewness, and kurtosis. The same statistical features are also computed for the list of average scores associated with each constituent bigram (sequence of two words). Moreover, each word in the essays was tagged with its grammatical role (verb, noun, adverb, etc.) using the NLTK toolkit, and the number of occurrences of each grammatical role is used as a further feature. All of these features were combined using different classifiers, with random forests generally preferred. This system participated in the Automated Essay Scoring Contest sponsored by the Hewlett Foundation. The results were evaluated using the quadratic weighted kappa metric, which measures the agreement between the human rater and the automatic rater; it typically varies from 0 (only random agreement) to 1 (complete agreement). This method scored 0.76519 and ranked 13th out of 156 teams: http://www.kaggle.com/c/asap-aes/leaderboard. The proposed system combines structural and grammatical features to automatically grade essays and achieves promising performance. Work is ongoing to extend the method to short-essay scoring and to grading an unseen category of essays.
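To make the feature pipeline concrete, the sketch below shows roughly how the described features could be assembled in Python. It is an illustrative reconstruction, not the authors' code: the names (TOP_100_WORDS, POS_TAGS, extract_features, and so on) are hypothetical, the word and tag lists are truncated, and NLTK's punkt and averaged_perceptron_tagger resources are assumed to have been downloaded.

```python
"""Illustrative sketch of the feature extraction described in the abstract."""
from collections import Counter, defaultdict

import nltk
import numpy as np
from scipy import stats

# Hypothetical stand-ins: in practice these would be the full 100-word
# English frequency list and the full Penn Treebank tag set.
TOP_100_WORDS = ["the", "be", "to", "of", "and", "a", "in", "that"]  # truncated
POS_TAGS = ["NN", "NNS", "VB", "VBD", "JJ", "RB", "IN", "DT"]        # truncated


def word_average_scores(train_essays, train_scores):
    """Average human score of the training essays containing each word."""
    totals, counts = defaultdict(float), defaultdict(int)
    for essay, score in zip(train_essays, train_scores):
        for word in set(nltk.word_tokenize(essay.lower())):
            totals[word] += score
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def distribution_features(values):
    """Min, max, mean, median, variance, skewness and kurtosis of a list."""
    v = np.asarray(values, dtype=float)
    if v.size == 0:
        return [0.0] * 7
    return [v.min(), v.max(), v.mean(), np.median(v), v.var(),
            float(stats.skew(v)), float(stats.kurtosis(v))]


def extract_features(essay, avg_word_score):
    """One feature vector per essay, following the abstract's description."""
    tokens = nltk.word_tokenize(essay.lower())
    counts = Counter(tokens)
    # 1) occurrences of the 100 most frequent English words
    features = [counts[w] for w in TOP_100_WORDS]
    # 2) statistics over the average training scores of the essay's words
    #    (the abstract applies the same statistics to bigrams as well)
    scores = [avg_word_score[t] for t in tokens if t in avg_word_score]
    features += distribution_features(scores)
    # 3) number of occurrences of each grammatical role (POS tag)
    tag_counts = Counter(tag for _, tag in nltk.pos_tag(tokens))
    features += [tag_counts[t] for t in POS_TAGS]
    return features
```

These vectors would then be fed to a classifier; per the abstract, a random forest (e.g. scikit-learn's RandomForestClassifier) was generally preferred, presumably trained separately for each of the eight essay categories since each corresponds to a different question.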
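The quadratic weighted kappa used for evaluation penalizes disagreements by the squared distance between the two scores. scikit-learn implements it directly via cohen_kappa_score; the snippet below is a minimal illustration with made-up score arrays, not contest data.

```python
from sklearn.metrics import cohen_kappa_score

human_scores = [2, 3, 4, 4, 1, 3]    # illustrative human-rater scores
machine_scores = [2, 3, 3, 4, 2, 3]  # illustrative predicted scores

# weights="quadratic" yields the quadratic weighted kappa: roughly 0 for
# chance-level agreement and 1 for complete agreement, as in the abstract.
qwk = cohen_kappa_score(human_scores, machine_scores, weights="quadratic")
print(f"quadratic weighted kappa: {qwk:.3f}")
```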