The Story Resource Page
Collecting and analyzing millions of personal stories of everyday life
1. Overview
Over the last several years, my students and I have been developing technologies for collecting, analyzing, and reasoning with millions of personal stories extracted from Internet weblogs.
For an overview of these efforts, see the following two papers. The first focuses on the knowledge management aspects of this research, while the second focuses on the artificial intelligence opportunities.
- Gordon, A. (2008) Story Management Technologies for Organizational Learning. International Conference on Knowledge Management, Special Track on Intelligent Assistance for Self-Directed and Organizational Learning, Graz, Austria, September 3-5, 2008. pdf
- Gordon, A. and Swanson, R. (2008) Envisioning With Weblogs. International Conference on New Media Technology, Special Track on Knowledge Acquisition From the Social Web, Graz, Austria, September 3-5, 2008. pdf
2. Large-scale story corpora
To facilitate the distribution of large-scale story corpora, our group has identified individual blog posts that contain personal stories within existing large-scale corpora of posts. Most recently, we identified nearly one million personal stories in the ICWSM 2009 Spinn3r Blog Dataset; we call this collection the ICWSM 2009 Story Subset. Information about obtaining the ICWSM 2009 Spinn3r Blog Dataset is available here. To identify the story subset once you have this dataset, please use one of the following releases:
- Version 1.0 (5/29/09) Initial release
- Version 1.1 (6/15/09) Fixed off-by-one errors in both the index file and the Java extractor
- Version 2.0 (12/21/09) Results using new classifier from Reid Swanson's PhD Dissertation
- Version 2.1 (5/26/11) Made some syntax changes to the Python extractor to achieve compatibility with Python 2.6+
If you use the ICWSM 2009 Story Subset in your research, please send Andrew Gordon an email, and be sure to cite the following paper:
- Gordon, A. and Swanson, R. (2009) Identifying Personal Stories in Millions of Weblog Entries. Third International Conference on Weblogs and Social Media, Data Challenge Workshop, San Jose, CA, May 20, 2009. pdf
3. Reasoning with weblog stories
One of the primary aims of our work is to solve the problem of knowledge acquisition in commonsense reasoning. Below are papers that describe some of our previous attempts to exploit commonsense knowledge that exists within millions of weblog stories.
- Gordon, A., Bejan, C., and Sagae, K. (2011) Commonsense Causal Reasoning Using Millions of Personal Stories. Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), August 7–11, 2011, San Francisco, CA. pdf
- Gordon, A. (2010) Mining Commonsense Knowledge From Personal Stories in Internet Weblogs. Proceedings of the First Workshop on Automated Knowledge Base Construction, Grenoble, France, May 17-19, 2010. pdf
- Gerber, M., Gordon, A., and Sagae, K. (2010) Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories. Proceedings of the 1st International Workshop on Formalisms and Methodology for Learning by Reading (FAM-LbR) NAACL 2010 Workshop, Los Angeles, CA, June 6, 2010. pdf
4. Contact
For information about this research, please contact Andrew S. Gordon (gordon @ ict.usc.edu) of the USC Institute for Creative Technologies.