One Million Posts Corpus
The “One Million Posts” corpus is an annotated data set of German-language user comments posted to an Austrian newspaper website. It comprises approximately one million posts, about 11,000 of which are manually annotated with the following categories: sentiment (negative/neutral/positive), off-topic (yes/no), inappropriate (yes/no), discriminating (yes/no), feedback to the article author (yes/no), user personal stories (yes/no), and arguments used (yes/no).
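For readers who want to work with the annotations programmatically, the categories listed above could be modelled roughly as follows. This is only a minimal sketch: the names `Sentiment`, `PostAnnotation`, `BINARY_CATEGORIES`, and `flagged_categories` are hypothetical and do not reflect the file format or schema of the released corpus, and the sketch assumes every category is present for a given post, which may not hold for the actual data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Sentiment(Enum):
    """Three-way sentiment label, as described in the category list."""
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"


# Hypothetical field names for the six yes/no categories.
BINARY_CATEGORIES = (
    "off_topic",
    "inappropriate",
    "discriminating",
    "feedback",
    "personal_stories",
    "arguments_used",
)


@dataclass(frozen=True)
class PostAnnotation:
    """Hypothetical record holding the manual labels of one post."""
    sentiment: Sentiment
    off_topic: bool
    inappropriate: bool
    discriminating: bool
    feedback: bool
    personal_stories: bool
    arguments_used: bool

    def flagged_categories(self) -> List[str]:
        """Return the names of all yes/no categories marked 'yes'."""
        return [name for name in BINARY_CATEGORIES if getattr(self, name)]


# Usage: a negative post that is inappropriate and contains arguments.
example = PostAnnotation(
    sentiment=Sentiment.NEGATIVE,
    off_topic=False,
    inappropriate=True,
    discriminating=False,
    feedback=False,
    personal_stories=False,
    arguments_used=True,
)
print(example.flagged_categories())
```

Keeping sentiment as an enum and the remaining categories as booleans mirrors the one three-valued and six yes/no labels named in the description.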
Publications
- Dietmar Schabus, Marcin Skowron, Martin Trapp. "One Million Posts: A Data Set of German Online Discussions." In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 1241-1244. Tokyo, Japan, August 2017. DOI: 10.1145/3077136.3080711.
- Dietmar Schabus and Marcin Skowron. "Academic-Industrial Perspective on the Development and Deployment of a Moderation System for a Newspaper Website." In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), pp. 1602-1605. Miyazaki, Japan, May 2018.
Authors
- Dietmar Schabus
- Marcin Skowron
- Martin Trapp
Licence
Sponsor
- Version: 1.0.0
- Release date: 01 August 2017
- Language: German
- Modality: Text
- Licence: CC BY-NC-SA 4.0
- Contact: Marcin Skowron