Human Communication and Machine Learning:
Learning multimodal computational models of human interactions
Course: CSCI 599, Fall 2010
Time: Thursday 2pm-4:50pm
Classroom: KAP 165
URL: http://people.ict.usc.edu/~morency/courses/fall2010/
Instructor: Professor Louis-Philippe Morency, morency@ict.usc.edu, 310-448-5323
Office: ICT 333
Recommended preparation: CSCI 542 or CSCI 567 or CSCI 573 or equivalent. Students should have a proper academic background in probability, statistics and linear algebra. Previous experience in machine learning is suggested but not obligatory. This course is not a replacement for the Machine Learning course (CSCI 567).
Human face-to-face communication is a little like a dance, in that participants continuously adjust their behaviors based on verbal and nonverbal displays and signals. Human interpersonal behaviors have long been studied in linguistics, communication, sociology and psychology. Recent advances in machine learning, pattern recognition and signal processing have enabled a new generation of computational tools to analyze, recognize and predict human communication behaviors during social interactions. This new research direction has broad applicability, including the improvement of human behavior recognition, the synthesis of natural animations for robots and virtual humans, the development of intelligent tutoring systems, and the diagnosis of social disorders (e.g., autism spectrum disorder).
The objectives of this course are:
(1) To give a general overview of human communicative behaviors (language, vocal and nonverbal) and show a parallel with computer science subfields (natural language processing, speech processing and computer vision);
(2) To understand the multimodal challenge of human communication (e.g. speech and gesture synchrony) and learn about multimodal signal processing;
(3) To understand the social aspect of human communication and its implications for statistical and probabilistic modeling;
(4) To learn about recent advances in machine learning and pattern recognition to analyze, recognize and predict human communicative behaviors;
(5) To give students practical experience in the computational study of human social communication through a course project.
Each class will be three hours including two short pauses. The first two hours will consist of lectures given by Prof. Morency or one of the guest lecturers. The last hour will be a discussion about the assigned research papers. Two students will be assigned to lead each discussion.
Required:
· Reading material will be based on published technical papers available via the ACM/IEEE/Springer digital libraries or freely available online. All USC students have automatic access to these digital archives.
Optional:
· Machine Learning for Audio, Image and Video Analysis: Theory and Applications, Francesco Camastra and Alessandro Vinciarelli, Springer, 2008, DOI: 10.1007/978-1-84800-007-0 (freely available on SpringerLink for USC students)
· Multimodal Processing and Interaction, Gros, Potamianos and Maragos, SpringerLink, 2008, DOI: 10.1007/978-0-387-76316-3 (freely available on SpringerLink for USC students)
· Nonverbal Communication in Human Interaction (7th edition), Mark Knapp and Judith Hall, Wadsworth, 2010
· Speech and Language Processing (2nd edition), Daniel Jurafsky and James Martin, Pearson, 2008
| Classes | Lectures (2:00pm-3:50pm) | Readings for discussion sessions (4:00pm-4:50pm) | Discussion leaders |
| August 26 | Introduction · A multi-modal, multi-party, multi-label dynamic problem · Human communication dynamics · Applications and domains · Mid-term and final projects | | |
| Sept 2 | Communication models · Emitter-receiver models · Communicative signals: signs and symbols · Common ground · Datasets and sensing tools | Introduction · Morency et al. (2010), Human Communication dynamics · Vinciarelli et al. (2009), Social Signal Processing · Carletta (2007), AMI dataset | Mohit Goenka · Khawaja Shams |
| Sept 9 | Verbal messages · Language models and N-grams · Boundaries, fillers and disfluencies · Syntax and part-of-speech tagging · Sphinx, HTK and syntax parsers | Communication models · Krauss et al. (2002), The psychology of Verbal Communication · Clark and Brennan (1991) Grounding in Communication · Pentland (2008), Honest Signals, Ch. 1 · (optional) Taylor (2009) Text-to-speech Synthesis, Chapter 2 | Lucy Abramyan · Congkai Sun |
| Sept 16 | Vocal messages · Phonetics and phonology · Rhythm, stress and intonation · Audio representation · Praat and OpenEar | Verbal messages · Jurafsky and Martin (2008), Speech and Language Processing, 4.1-4.4, 5.1-5.3 and 12.1-12.2 · Kim and Hovy (2004) Determining the sentiment of opinions · Liu et al. (2004) Metadata extraction | Saurabh Dhupar · Lucy Abramyan |
| Sept 23 ** Draft project proposals due. | Visual messages · Gesture, gaze, posture and proxemics · Facial expressions · Image and video representation · Watson, FaceAPI, AAM and EyeAPI | Vocal messages · Taylor (2009), Text-to-speech Synthesis, Sections 6.1-6.5 and 9.1-9.2 · Ang et al. (2002), Prosodic-based detection of annoyance and frustration · Ward and Tsukahara (2000) · (optional) Jurafsky and Martin (2008), Speech and Language Processing, Ch. 7, Sect. 7.1-7.4 | Elaine Short · Paul Rundle |
| Sept 30 | Conversational messages · Discourse analysis · Turn-taking and backchannel · Semantics and pragmatics · Speech and dialogue acts | Visual messages · Kramer (2008) Nonverbal communication · Kendon (1995) Gesture studies · Argyle and Dean (1965) Eye-Contact, Distance and Affiliation | Prateek Joshi · Arjun Gupta |
| Oct 7 | Multimodal representation · Speech and gestures · Signals and symbols · Multimodal fusion · Statistical analysis | Conversational messages · Duncan (1974) Signals for speaking turns · Stolcke et al. (2000) Dialogue act modeling · Bohus and Horvitz (2010), Computational Turn-taking · (optional) Jurafsky and Martin (2008), Speech and Language Processing, Sect. 17.2-17.3 and 21.1-21.4 | Christopher Wienberg · Khawaja Shams · Sanil Ghatpande |
| Oct 14 ** Project proposals due. | Multimodal processing · Audio-visual recognition · Hidden Markov Models · Multi-streams, coupled, factorial and asynchronous HMMs | Multimodal representation · Gros et al. (2008), Multimodal processing and Interaction, Chapter 1 · McNeill (1985) Gestures · (optional) Ambady and Rosenthal (1992) Thin slicing | Victor Luo · Ross Mead |
| Oct 21 | Multimodal behavior analysis · Dimensionality reduction · Data clustering · Dynamic time warping · Feature selection | Multimodal processing · Nefian et al. (2002) Audio-visual speech recognition · McCowan et al. (2005) Multimodal group actions · (optional) Chapter 10 of Machine Learning for Audio, Image and Video · (optional) Dupont and Luettin (2000) Audio-visual speech recognition | Kamlesh Lakshminarayanan · Payal Doshi |
| Nov 1 ** Course rescheduled. Monday 5-8pm | Affective messages and personality traits · Emotion and cognitive modeling · Big five personality dimensions · Social behaviors | Multimodal behavior analysis · Wu et al. (2004) Multimedia data analysis · Zhou et al. (2010) Unsupervised discovery of facial events · (optional) Chapters 6 and 11 of Machine Learning for Audio, Image and Video | Chung-Cheng Chiu · Sanjay Verghese |
| Nov 4 ** Mid-term projects due. | Multimodal behavior recognition (1/2) · Bootstrapping and Co-training · Nearest-neighbor · Decision trees · Support vector machines | Affective messages and personality traits · Gratch and Marsella (2005), Emotion Psychology · Barrick and Mount (1991), Big Five personality | Joshua Doubleday · Derya Ozkan |
| Nov 15 ** Course rescheduled. Monday 5-8pm | Multimodal behavior recognition (2/2) · Conditional random fields · Latent-dynamic CRF · Dynamic Bayesian networks | Multimodal behavior recognition (1/2) · Christoudias et al. (2006) Co-adaptation of audio-visual speech and gestures · Kapoor and Picard (2005) Multimodal affect recognition · W. Lin and A. Hauptmann (2002) SVM-based multimodal classifiers · (optional) P. Verlinde and G. Chollet (1999) Decision fusion paradigms · (optional) Chapter 9 of Machine Learning for Audio, Image and Video | Bostjan Kaluza · Vandana Moparthi |
| Nov 18 | Subjective and quantitative evaluations · Coder agreement, kappa · User studies | Multimodal behavior recognition (2/2) · El Kaliouby and Robinson (2005) Real-Time Inference of Complex Mental States · Morency et al. (2008) Context-based recognition · (optional) Sutton and McCallum (2007) Conditional Random Fields · (optional) Morency et al. (2007) Latent-dynamic CRF · (optional) Tong et al. (2009) A unified probabilistic framework for facial action modeling | Lixing Huang · Victor Luo |
| Nov 25 | ** Thanksgiving ** | | |
| Dec 2 ** 2-6pm ** Final projects due 12/6 at 8pm | Final project presentations | | |
Primary readings
Introduction
Communication models
Verbal messages
Vocal messages
Visual messages
Conversational messages
Multimodal representation
Multimodal processing
Multimodal behavior analysis
29. Y. Wu, E. Chang, K. Chang and J. Smith (2004) Optimal multimodal fusion for multimedia data analysis, Proceedings of the 12th annual ACM international conference on Multimedia (ACM Multimedia)
Affective messages
Multimodal behavior recognition (1/2)
35. A. Kapoor and R. Picard (2005) Multimodal affect recognition in learning environments, Proceedings of the 13th annual ACM international conference on Multimedia (ACM Multimedia)
36. W. Lin and A. Hauptmann (2002) News video classification using SVM-based multimodal classifiers and combination strategies, Proceedings of the 10th annual ACM international conference on Multimedia (ACM Multimedia)
Multimodal behavior recognition (2/2)
Complementary readings
· Grading breakdown
o Attendance and participation 10% (1 free absence)
o Reading assignments 15%
o Leading class discussion 15%
o Course project: mid-term report 20%
§ Draft project proposal: 4%
§ Revised project proposal: 4%
§ Mid-term report: 12%
o Course project: final report and presentation 40%
· Attendance
o Students are expected to attend every class (1 free absence allowed) and participate actively during the group discussions.
· Reading assignments
o The reading assignment for each class will consist of 2-4 research papers (posted online at least one week before the class). These papers are specially selected to complement the lectures and show state-of-the-art research.
o On the Sunday before each class, 1-3 questions will be posted online.
o Students must send their answers by 5pm the day before the class. The answers will be part of the group discussion.
· Group discussions
o Each student will lead the group discussion twice during the semester. A signup sheet will be available during the first class.
o Students can lead the discussion individually or pair with another student. The pairing should be different for the second group discussion.
o Since all students are expected to read the research papers, the discussion should bring something new and interactive to the class. This includes: example datasets, simple implementations of the algorithms, demos, new challenging questions and applications.
· Course project:
o The goal of this course is to analyze human communicative behaviors in social settings using state-of-the-art statistical and probabilistic models. The course project is specifically designed to give students practical experience in the computational study of human social communication.
o Students can perform the project individually or in teams of two. The mid-term and final reports will need to outline the tasks of each participant. Team projects will be expected to include a deeper analysis than individual projects.
o Mid-term report: The mid-term report will present a qualitative analysis of the selected dataset and communicative behaviors. The report should include correct transcriptions and annotations of the language, vocal and nonverbal behaviors. Using standard statistical tools and qualitative observations, the students should highlight the challenges with this dataset (and communicative behaviors) and suggest an approach to solve them.
o Final report and presentation: Using the same dataset as the mid-term report, the final report will include a quantitative analysis of the human communicative behaviors. The final report should be written as a research paper describing either a comparative study of different statistical and probabilistic approaches or a new technique for behavior modeling.
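As a rough illustration of how the grading breakdown listed above combines into a single course score, the following Python sketch computes a weighted sum. The weights come from the breakdown itself; the component names, the 0-100 scoring scale and the `course_grade` helper are assumptions made for this example and are not part of the official grading procedure.

```python
# Weights copied from the grading breakdown; the dictionary keys are
# hypothetical labels chosen for this sketch.
WEIGHTS = {
    "attendance_participation": 0.10,
    "reading_assignments": 0.15,
    "leading_discussion": 0.15,
    "draft_proposal": 0.04,       # part of the 20% mid-term component
    "revised_proposal": 0.04,     # part of the 20% mid-term component
    "midterm_report": 0.12,       # part of the 20% mid-term component
    "final_report_presentation": 0.40,
}


def course_grade(scores):
    """Return the weighted course score, assuming each component is scored 0-100."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores for one student.
    example = {name: 90.0 for name in WEIGHTS}
    example["final_report_presentation"] = 80.0
    print(round(course_grade(example), 2))
```

The three project items before the final report sum to the 20% mid-term component, and all weights together sum to 100%.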
Any student requesting academic accommodations based on a disability is required to register with Disability Services and Programs (DSP) each semester. A letter of verification for approved accommodations can be obtained from DSP. Please be sure the letter is delivered to me (or to the TA) as early in the semester as possible. DSP is located in STU 301 and is open 8:30 a.m.–5:00 p.m., Monday through Friday. The phone number for DSP is (213) 740-0776.
Statement on Academic Integrity
USC seeks to maintain an optimal learning environment. General principles of academic honesty include the concept of respect for the intellectual property of others, the expectation that individual work will be submitted unless otherwise allowed by an instructor, and the obligations both to protect one’s own academic work from misuse by others as well as to avoid using another’s work as one’s own. All students are expected to understand and abide by these principles. Scampus, the Student Guidebook, contains the Student Conduct Code in Section 11.00, while the recommended sanctions are located in Appendix A: http://www.usc.edu/dept/publications/SCAMPUS/gov/. Students will be referred to the Office of Student Judicial Affairs and Community Standards for further review, should there be any suspicion of academic dishonesty. The Review process can be found at: http://www.usc.edu/student-affairs/SJACS/.