Andrew S. Gordon: Story Resource Page

The Story Resource Page

Collecting and analyzing millions of personal stories of everyday life

1. Overview

Over the last several years, my students and I have been developing technologies for collecting, analyzing, and reasoning with millions of personal stories extracted from Internet weblogs.

For an overview of these efforts, see the following papers.

Gordon, A. (2008) Story Management Technologies for Organizational Learning. International Conference on Knowledge Management, Special Track on Intelligent Assistance for Self-Directed and Organizational Learning, Graz, Austria, September 3-5, 2008.
Gordon, A. and Swanson, R. (2008) Envisioning With Weblogs. International Conference on New Media Technology, Special Track on Knowledge Acquisition From the Social Web, Graz, Austria, September 3-5, 2008.

2. Large-scale story corpora

To facilitate the distribution of large-scale story corpora, our group has identified the individual blog posts that contain personal stories within existing large-scale corpora of posts. Most recently, we identified nearly one million personal stories in the ICWSM 2009 Spinn3r Blog Dataset, which we call the ICWSM 2009 Story Subset. Information about obtaining the ICWSM 2009 Spinn3r Blog Dataset is available here. Once you have this dataset, use one of the following releases to identify the story subset:

Version 1.0 (5/29/09) Initial release
Version 1.1 (6/15/09) Fixed off-by-one errors in both the index file and the Java extractor
Version 2.0 (12/21/09) Results using new classifier from Reid Swanson's PhD Dissertation
Version 2.1 (5/26/11) Made syntax changes to the Python extractor for compatibility with Python 2.6+

If you use this ICWSM 2009 Story Subset in your research, please send Andrew Gordon an email, and be sure to cite the following paper:

Gordon, A. and Swanson, R. (2009) Identifying Personal Stories in Millions of Weblog Entries. Third International Conference on Weblogs and Social Media, Data Challenge Workshop, San Jose, CA, May 20, 2009.

3. Reasoning with weblog stories

One of the primary aims of our work is to solve the problem of knowledge acquisition in commonsense reasoning. Below are papers that describe some of our previous attempts to exploit commonsense knowledge that exists within millions of weblog stories.

Gordon, A., Bejan, C., and Sagae, K. (2011) Commonsense Causal Reasoning Using Millions of Personal Stories. Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI-11), San Francisco, CA, August 7-11, 2011.
Gordon, A. (2010) Mining Commonsense Knowledge From Personal Stories in Internet Weblogs. Proceedings of the First Workshop on Automated Knowledge Base Construction, Grenoble, France, May 17-19, 2010.
Gerber, M., Gordon, A., and Sagae, K. (2010) Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories. Proceedings of the 1st International Workshop on Formalisms and Methodology for Learning by Reading (FAM-LbR), NAACL 2010 Workshop, Los Angeles, CA, June 6, 2010.

4. Related work

Several groups have pursued related work on automatically extracting stories from text. Some of the more recent work achieves higher accuracy than our efforts and may be a more promising basis for future work.

Eisenberg, J. and Finlayson, M. (2017) A Simpler and More Generalizable Story Detector using Verb and Character Features. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark.
Eisenberg, J. D., Yarlott, W. V. H., and Finlayson, M. A. (2016) Comparing Extant Story Classifiers: Results & New Directions. Seventh International Workshop on Computational Models of Narrative, Krakow, Poland.
Ceran, B., Karad, R., Corman, S., and Davulcu, H. (2012) A Hybrid Model and Memory Based Story Classifier. Proceedings of the 3rd International Workshop on Computational Models of Narrative (CMN’12), pages 60-64, Istanbul, Turkey.