Andrew S. Gordon

The Story Resource Page

Collecting and analyzing millions of personal stories of everyday life

1. Overview

Over the last several years, my students and I have been developing technologies for collecting, analyzing, and reasoning with millions of personal stories extracted from Internet weblogs.

For an overview of these efforts, see the following papers.

2. Large-scale story corpora

To facilitate the distribution of large-scale story corpora, our group has identified the individual blog posts that contain personal stories within existing large-scale corpora of blog posts. Most recently, we identified nearly one million personal stories in the ICWSM 2009 Spinn3r Blog Dataset, which we call the ICWSM 2009 Story Subset. Information about obtaining the ICWSM 2009 Spinn3r Blog Dataset is available here. To identify the story subset once you have this dataset, please use one of the following:

If you use this ICWSM 2009 Story Subset in your research, please send Andrew Gordon an email, and be sure to cite the following paper:

3. Reasoning with weblog stories

One of the primary aims of our work is to solve the problem of knowledge acquisition in commonsense reasoning. Below are papers that describe some of our previous attempts to exploit commonsense knowledge that exists within millions of weblog stories.

4. Related work

Several groups have pursued related work in automated story extraction from text. Some of the more recent work achieves higher accuracy than our efforts and may be a more promising basis for future work.