Human Communication and Machine Learning:

Learning multimodal computational models of human interactions

 


 

Course: CSCI 599, Fall 2010

Time: Thursday 2pm-4:50pm

Classroom: KAP 165

URL: http://people.ict.usc.edu/~morency/courses/fall2010/

Instructor: Professor Louis-Philippe Morency, morency@ict.usc.edu, 310-448-5323

Office: ICT 333

 

Recommended preparation: CSCI 542 or CSCI 567 or CSCI 573 or equivalent. Students should have a solid academic background in probability, statistics and linear algebra. Previous experience in machine learning is suggested but not required. This course is not a replacement for the Machine Learning course (CSCI 567).

 

Introduction and Purposes

Human face-to-face communication is a little like a dance, in that participants continuously adjust their behaviors based on verbal and nonverbal displays and signals. Human interpersonal behaviors have long been studied in linguistics, communication, sociology and psychology. Recent advances in machine learning, pattern recognition and signal processing have enabled a new generation of computational tools to analyze, recognize and predict human communication behaviors during social interactions. This new research direction has broad applicability, including the improvement of human behavior recognition, the synthesis of natural animations for robots and virtual humans, the development of intelligent tutoring systems, and the diagnosis of social disorders (e.g., autism spectrum disorder).

 

The objectives of this course are:

(1) To give a general overview of human communicative behaviors (language, vocal and nonverbal) and show a parallel with computer science subfields (natural language processing, speech processing and computer vision);

 (2) To understand the multimodal challenge of human communication (e.g. speech and gesture synchrony) and learn about multimodal signal processing;

(3) To understand the social aspect of human communication and its implications for statistical and probabilistic modeling;

(4)  To learn about recent advances in machine learning and pattern recognition to analyze, recognize and predict human communicative behaviors;

(5) To give students practical experience in the computational study of human social communication through a course project.

 

 

Course format

Each class lasts three hours, including two short breaks. The first two hours will consist of lectures given by Prof. Morency or one of the guest lecturers. The last hour will be a discussion of the assigned research papers. Two students will be assigned to lead each discussion.

 

 

Course Material

Required:

·        Reading material will be based on published technical papers available via the ACM/IEEE/Springer digital libraries or freely available online.  All USC students have automatic access to these digital archives.

Optional:

·        Machine Learning for Audio, Image and Video Analysis: Theory and Applications, Francesco Camastra and Alessandro Vinciarelli, Springer, 2008, DOI: 10.1007/978-1-84800-007-0 (freely available on SpringerLink for USC students)

·        Multimodal Processing and Interaction, Gros, Potamianos and Maragos, SpringerLink, 2008, DOI: 10.1007/978-0-387-76316-3 (freely available on SpringerLink for USC students)

·        Nonverbal Communication in Human Interaction (7th edition), Mark Knapp and Judith Hall, Wadsworth, 2010

·        Speech and Language Processing (2nd edition), Daniel Jurafsky and James Martin, Pearson, 2008

 

Course Topics and Readings

    ** Topics and readings may change based on student interest **

Classes  |  Lectures (2:00pm-3:50pm)  |  Readings for discussion sessions (4:00pm-4:50pm)  |  Discussion leaders

 August 26

Introduction

·       A multi-modal, multi-party, multi-label dynamic problem

·       Human communication dynamics

·       Applications and domains

·       Mid-term and final projects

 

 

 

 Sept 2

Communication models

·     Emitter-receiver models

·     Communicative signals: signs and symbols

·     Common ground

·     Datasets and sensing tools

Introduction

·     Morency et al. (2010), Human Communication dynamics

·     Vinciarelli et al. (2009), Social Signal Processing

·     Carletta (2007), AMI dataset

 

·     Mohit Goenka

·     Khawaja Shams

 Sept 9

 

Verbal messages

·     Language models and N-grams

·     Boundaries, fillers and disfluencies

·     Syntax and part-of-speech tagging

·     Sphinx, HTK and syntax parsers

Communication models

·     Krauss et al. (2002), The psychology of Verbal Communication

·     Clark and Brennan (1991) Grounding in Communication

·     Pentland (2008), Honest Signals, Ch. 1

·     (optional) Taylor (2009) Text-to-speech Synthesis, Chapter 2

 

·     Lucy Abramyan

·     Congkai Sun

 Sept 16

 

Vocal messages

·     Phonetics and phonology

·     Rhythm, stress and Intonation

·     Audio representation

·     Praat and openEAR

Verbal messages

·     Jurafsky and Martin (2008), Speech and Language Processing, 4.1-4.4, 5.1-5.3 and 12.1-12.2

·     Kim and Hovy (2004) Determining the sentiment of opinions

·     Liu et al. (2004) Metadata extraction

 

·     Saurabh Dhupar

·     Lucy Abramyan

 Sept 23

** Draft project proposals due.

Visual messages

·     Gesture, gaze,  posture and proxemics

·     Facial expressions

·     Image and video representation

·     Watson, FaceAPI, AAM and EyeAPI

Vocal messages

·     Taylor (2009), Text-to-speech Synthesis, Sections 6.1-6.5 and 9.1-9.2

·     Ang et al. (2002), Prosodic-based detection of annoyance and frustration

·     Ward and Tsukahara (2000), Prosodic features which cue back-channel responses

·     (optional) Jurafsky and Martin (2008), Speech and Language Processing, Ch. 7, Sect. 7.1-7.4

 

·     Elaine Short

·     Paul Rundle

Sept 30

 

Conversational messages

·     Discourse analysis

·     Turn-taking and backchannel

·     Semantics and pragmatics

·     Speech and dialogue acts

Visual messages

·     Kramer (2008) Nonverbal communication

·     Kendon (1995) Gesture studies

·     Argyle and Dean (1965) Eye-Contact, Distance and Affiliation

 

·     Prateek Joshi

·     Arjun Gupta

Oct 7

 

Multimodal representation

·     Speech and gestures

·     Signals and symbols

·     Multimodal fusion

·     Statistical analysis

 

Conversational messages

·     Duncan (1974) Signals for speaking turns

·     Stolcke et al (2000) Dialogue act modeling

·     Bohus and Horvitz (2010), Computational Turn-taking

·     (optional) Jurafsky and Martin (2008), Speech and Language Processing, Sect. 17.2-17.3 and 21.1-21.4

 

·     Christopher Wienberg

·     Khawaja Shams

·     Sanil Ghatpande

Oct 14

** Project proposals due.

Multimodal processing

·     Audio-visual recognition

·     Hidden Markov Models

·     Multi-streams, coupled, factorial and asynchronous HMMs

Multimodal representation

·     Gros et al. (2008), Multimodal processing and Interaction, Chapter 1

·     McNeill (1985) Gestures

·     (optional) Ambady and Rosenthal (1992) Thin slicing

 

·     Victor Luo

·     Ross Mead

Oct 21

 

Multimodal behavior analysis

·     Dimensionality reduction

·     Data clustering

·     Dynamic time warping

·     Feature selection

 

 

Multimodal processing

·     Nefian et al. (2002) Audio-visual speech recognition

·     McCowan et al (2005) Multimodal group actions

·     (optional) Chapter 10 of Machine Learning for Audio, Image and Video

·     (optional) Dupont and Luettin (2000) Audio-visual speech recognition

 

·     Kamlesh Lakshminarayanan

·     Payal Doshi

 

Nov 1

** Course rescheduled.

Monday

5-8pm

 

Affective messages and personality traits

·     Emotion and cognitive modeling

·     Big five personality dimensions

·     Social behaviors

 

Multimodal behavior analysis

·     Wu et al. (2004) Multimedia data analysis

·     Zhou et al. (2010) Unsupervised discovery of facial events

·     (optional) Chapter 6 and 11 of Machine Learning for Audio, Image and Video

 

·     Chung-Cheng Chiu

·     Sanjay Verghese

 

Nov 4

** Mid-term projects due.

Multimodal behavior recognition (1/2)

·     Bootstrapping and Co-training

·     Nearest-neighbor

·     Decision trees

·     Support vector machines

Affective messages and personality traits

·     Gratch and Marsella (2005), Emotion Psychology

·     Barrick and Mount (1991), Big Five personality

 

·     Joshua Doubleday

·     Derya Ozkan

 

Nov 15

** Course rescheduled.

Monday

5-8pm

Multimodal behavior recognition (2/2)

·     Conditional random fields

·     Latent-dynamic CRF

·     Dynamic Bayesian networks

Multimodal behavior recognition (1/2)

·     Christoudias et al. (2006) Co-adaptation of audio-visual speech and gestures

·     Kapoor and Picard (2005) Multimodal affect recognition

·     W. Lin and A. Hauptmann (2002) SVM-based multimodal classifiers

·     (optional) P. Verlinde and G. Chollet (1999) Decision fusion paradigms

·     (optional) Chapter 9 of Machine Learning for Audio, Image and Video

 

·     Bostjan Kaluza

·     Vandana Moparthi

 Nov 18

Subjective and quantitative evaluations

·     Coder agreement, kappa

·     User studies

 

Multimodal behavior recognition (2/2)

·     El Kaliouby and Robinson (2005) Real-Time Inference of Complex Mental States

·     Morency et al. (2008) Context-based recognition

·     (optional) Sutton and McCallum (2007) Conditional Random Fields

·     (optional) Morency et al (2007) Latent-dynamic CRF

·     (optional) Tong et al. (2009) A unified probabilistic framework for facial action modeling

 

·     Lixing Huang

·     Victor Luo

Nov 25

 

** Thanksgiving **

 

 

Dec 2

** 2-6pm

** Final projects due 12/6 at 8pm

Final project presentations

 

 

 

Bibliography

Primary readings

Introduction

  1. Morency, L.-P., Modeling Human Communication Dynamics, IEEE Signal Processing Magazine, September 2010 [USC blackboard]
  2. A. Vinciarelli, M. Pantic and H. Bourlard, Social Signal Processing: Survey of an Emerging Domain, in Image and Vision Computing Journal, vol. 27, no. 12, pp. 1743-1759, December 2009
  3. Jean Carletta (2007) Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus, Language Resources and Evaluation, Springer

Communication models

  1. Krauss, R.M. (2002). The psychology of verbal communication. In, N. Smelser & P. Baltes (eds.), International Encyclopedia of the Social and Behavioral Sciences. London: Elsevier.
  2. Clark and Brennan (1991) Grounding in Communication, [USC blackboard]
  3. Pentland, Honest Signals, Chapter 1 [USC blackboard]
  4. (Optional) Paul Taylor (2009), Text-to-speech synthesis, Chapter 2 [USC blackboard]

Verbal messages

  1. Jurafsky and Martin (2008), Speech and Language Processing, Sections 4.1-4.4, 5.1-5.3 and 12.1-12.2 [USC blackboard]
  2. Soo-Min Kim and Eduard Hovy (2004) Determining the Sentiment of Opinions, Proceedings of the COLING conference, Geneva
  3. Yang Liu, Elizabeth Shriberg, Andreas Stolcke, Dustin Hillard, Mari Ostendorf, Barbara Peskin, and Mary Harper. 2004. The ICSI-SRI-UW Metadata Extraction System, ICSLP 2004

Vocal messages

  1. Paul Taylor (2009), Text-to-speech synthesis, Sections 6.1-6.5 and 9.1-9.2 [USC blackboard]
  2. Ang, Jeremy ; Dhillon, Rajdip ; Krupski, Ashley ; Shriberg, Elizabeth ; Stolcke, Andreas (2002): Prosody-based automatic detection of annoyance and frustration in human-computer dialog, In ICSLP-2002, 2037-2040
  3. N. Ward and W. Tsukahara (2000), Prosodic features which cue back-channel responses in English and Japanese, Journal of Pragmatics, Vol. 32, 2000, pp. 1177-1207
  4. (Optional) Jurafsky and Martin (2008), Speech and Language Processing, Chapter 7, Sections 7.1-7.4 [USC blackboard]

Visual messages

  1. Krämer, N. C. (2008). Nonverbal Communication. In J. Blascovich & C. Hartel (eds.), Human behavior in military contexts (pp. 150-188). Washington: The National Academies Press. [USC blackboard]
  2. Adam Kendon, An Agenda for Gesture Studies, Semiotic Review of Books, Volume 7 (3)
  3. Michael Argyle and Janet Dean, Eye-contact, distance and Affiliation, Sociometry, Vol. 28, No. 3, pp. 289-304, 1965

Conversational messages

  1. Duncan (1974) Some Signals and Rules for Taking Speaking Turns in Conversations
  2. Andreas Stolcke, Noah Coccaro, Rebecca Bates, Paul Taylor, Carol Van Ess-Dykema, Klaus Ries, Elizabeth Shriberg, Daniel Jurafsky, Rachel Martin, Marie Meteer, Dialogue act modeling for automatic tagging and recognition of conversational speech, Computational Linguistics, v.26 n.3, p.339-373, September 2000
  3. Bohus, D., Horvitz, E., (2010) - Computational Models for Multiparty Turn-Taking, Microsoft Technical Report MSR-TR-2010-115
  4. (optional) Jurafsky and Martin (2008), Speech and Language Processing, Sections 17.1-17.4 and 21.1-21.4 [USC blackboard]

Multimodal representation

  1. Gros, Potamianos and Maragos (2008) Multimodal Processing and Interaction, SpringerLink, Chapter 1 [SpringerLink or USC blackboard]
  2. McNeill, D. (1985). "So you think gestures are nonverbal?" In Psychological Review 92:350-371.
  3.  (optional) Nalini Ambady and Robert Rosenthal (1992) Thin-slices of Expressive Behavior as Predictor of Interpersonal Consequences: A Meta-Analysis, Psychological Bulletin, Vol. 111, No. 2, 256-274

Multimodal processing

  1. A. Nefian, L. Liang, X. Pi, X. Liu and K. Murphy, (2002) Dynamic Bayesian networks for audio-visual speech recognition, EURASIP Journal on Applied Signal Processing, Volume 2002, Issue 1
  2. I. McCowan, D. Gatica-Perez, S. Bengio, G. Lathoud, M. Barnard, D. Zhang, “Automatic analysis of multimodal group actions in meetings”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, pp. 305–317, 2005
  3. (optional) Francesco Camastra and Alessandro Vinciarelli (2008) Machine Learning for Audio, Image and Video Analysis: Theory and Applications, Chapter 10
  4. (optional) S. Dupont and J. Luettin (2000) Audio-visual speech modeling for continuous speech recognition, IEEE Transactions on multimedia, 2000

Multimodal behavior analysis

  1. Y. Wu, E. Chang, K. Chang and J. Smith (2004) Optimal multimodal fusion for multimedia data analysis, Proceedings of the 12th annual ACM international conference on Multimedia (ACM Multimedia)
  2. F. Zhou, F. De la Torre and J. F. Cohn (2010), Unsupervised Discovery of Facial Events, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  3. (Optional) Francesco Camastra and Alessandro Vinciarelli (2008) Machine Learning for Audio, Image and Video Analysis: Theory and Applications, Chapter 6 and Chapter 11

Affective messages

  1. Gratch and Marsella (2005) Lessons from Emotion Psychology for the Design of Lifelike Characters
  2. M. R. Barrick and M. K. Mount (1991) The Big Five Personality Dimensions and Job Performance: A Meta-Analysis, Personnel Psychology

Multimodal behavior recognition (1/2)

  1. C. Christoudias, K. Saenko, L.-P. Morency, and T. Darrell (2006) Co-Adaptation of Audio-Visual Speech and Gesture Classifiers, International Conference on Multimodal Interactions
  2. A. Kapoor and R. Picard (2005) Multimodal affect recognition in learning environments, Proceedings of the 13th annual ACM international conference on Multimedia (ACM Multimedia)
  3. W. Lin and A. Hauptmann (2002) News video classification using SVM-based multimodal classifiers and combination strategies, Proceedings of the 10th annual ACM international conference on Multimedia (ACM Multimedia)
  4. (optional) P. Verlinde and G. Chollet (1999), Comparing decision fusion paradigms using k-NN based classifiers, decision trees and logistic regression in a multi-modal identity verification application, Proceedings of the International Conference on Audio and Video-Based Biometric Person Authentication
  5. (Optional) Francesco Camastra and Alessandro Vinciarelli (2008) Machine Learning for Audio, Image and Video Analysis: Theory and Applications, Chapter 9

Multimodal behavior recognition (2/2)

  1. Rana El Kaliouby and Peter Robinson (2005) Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures, Proceedings of the workshop on Real-Time Vision for Human-Computer Interaction
  2. L.-P. Morency, I. de Kok, & J. Gratch, “Context-based recognition during human interactions: Automatic feature selection and encoding dictionary,” In proceedings of 10th International Conference on Multimodal Interfaces (ICMI 2008), October 2008
  3. (optional) C. Sutton and A. McCallum (2007) An Introduction to Conditional Random Fields for Relational Learning, Introduction to Statistical Relational Learning, Chapter 4
  4. (optional) Y. Tong, J. Chen & Q. Ji (2010), “A Unified Probabilistic Framework for Spontaneous Facial Action Modeling and Understanding”, IEEE Transactions on Pattern Analysis and Machine Intelligence.
  5. (optional) L.-P. Morency, A. Quattoni and Trevor Darrell (2007),  Latent-Dynamic Discriminative Models for Continuous Gesture Recognition, Proceedings IEEE Conference on Computer Vision and Pattern Recognition, June 2007

 

Complementary readings

  1. Cole et al. (1996), Discourse and Dialogue, Chapter 6
  2. Brown & Yule (1983) Discourse Analysis, Chapter 1
  3. T. Kawahara, Z.Q. Chang, and K. Takanashi (2010), Analysis on prosodic features of Japanese reactive tokens in poster conversations. In Proc. Int'l Conf. Speech Prosody
  4. Kendon (1994) "Do Gestures Communicate? A Review." In Research on Language and Social Interaction 27(3):175-200
  5. Allan (1995), Outline of English syntax, Chapter 2
  6. A. K. Noulas, G. Englebienne, B. J.A. Krose (2009) “Multimodal Speaker Diarization,” Preprint submitted to Computer Vision and Image Understanding
  7. Z. Zeng, M. Pantic, G.I. Roisman and T.S. Huang, 'A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions', in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39-58, January 2009
  8. Nalini Ambady and Robert Rosenthal (1992) Thin-slices of Expressive Behavior as Predictor of Interpersonal Consequences: A Meta-Analysis, Psychological Bulletin, Vol. 111, No. 2, 256-274
  9. W. Jiang, C. Cotton, S.-F. Chang, D. Ellis, & A. Loui (2009), “Short-Term Audio-Visual Atoms for Generic Video Concept Classification,” Proc. ACM MultiMedia-09, pp. 5-14
  10. Stivers, Tanya, Nicholas J. Enfield, Penelope Brown, Christina Englert, Makoto Hayashi, Trine Heinemann, Gertie Hoymann, Federico Rossano, Jan Peter De Ruiter, Kyung-Eun Yoon and Stephen C. Levinson (2009). Universals and cultural variation in turn-taking in conversation. Proceedings of the National Academy of Sciences 106(26): 10587-10592.
  11. C. Rolf, E. Kjell, S. Marc, “Perceptual judgments of pitch range,” In Proceedings of the Speech Prosody 2004 Conference, Nara, Japan, March 23-26, 2004, pp. 689-692.
  12. De Ruiter, Jan Peter; Mitterer, Holger; Enfield, N. J. (2006) Projecting the end of a speaker's turn: A cognitive cornerstone of conversation. Language, 82 (3), 515-535
  13. C. Sutton, A. McCallum, “An Introduction to Conditional Random Fields for Relational Learning”, Book chapter in Introduction to Statistical Relational Learning. Edited by Lise Getoor and Ben Taskar. MIT Press. 2006.
  14. M. Pantic (2009), Machine analysis of facial behaviour: Naturalistic and dynamic behaviour.  Philosophical Transactions of Royal Society B, vol. 364, pp. 3505-3513
  15. I. H. Witten, E. Frank, “Data Mining: Practical Machine Learning Tools and Techniques”, 2nd Edition ed. San Francisco: Morgan Kaufmann, 2005.
  16. S. Duncan Jr., & N. T. Collier, “C-Quence: A tool for analyzing qualitative sequential data,” Behavior Research Methods, Instruments, and Computers, Vol. 34, 2002, pp. 108-116.
  17. M. S. Magnusson, “Discovering hidden time patterns in behavior: T-patterns and their detection,” Behavior Research Methods, Instruments, & Computers, Vol. 32, 2000, pp. 93-110.
  18. K. Shockley, “Cross recurrence quantification of interpersonal postural activity,” In M. A. Riley & G. C. Van Orden (Eds.), Tutorials in contemporary nonlinear methods for the behavioral sciences (pp. 142-177). Retrieved February 26, 2010
  19. C. Mario Christoudias, Raquel Urtasun, Ashish Kapoor & Trevor Darrell, “Co-training with Noisy Perceptual Observations”, Conference on Computer Vision and Pattern Recognition, 2009
  20. L.-P. Morency, I. de Kok, & J. Gratch, “Context-based recognition during human interactions: Automatic feature selection and encoding dictionary,” In proceedings of 10th International Conference on Multimodal Interfaces (ICMI 2008), October 2008
  21. L. Chen, R. Travis, F. Parrill, X. Han, J. Tu, Z. Huang, I. Kimbara, H. Welji, M. Harper, F. Quek, D. McNeill, S. Duncan, R. Tuttle, and T. Huang, “VACE Multimodal Meeting Corpus,” Proceedings of MLMI 2005 Workshop, Edinburgh, July 2005.
  22. J. Garofolo, C. Laprun, M. Michel, V. Stanford, E. Tabassi, “The NIST Meeting Room Pilot Corpus”, In: Proc. of Language Resource and Evaluation Conference, 2004.
  23. C. Busso, Z. Deng, U. Neumann, and S. Narayanan, "Natural head motion synthesis driven by acoustic prosodic features," Computer Animation and Virtual Worlds, Vol. 16, no. 3-4, pp. 283-290, July 2005.
  24. Bohus, D., Horvitz, E. (2009). Dialog in the Open World: Platform and Applications, in Proceedings of ICMI'09, Boston, MA 
  25. S. Kohonen, “Turn-taking in conversation: overlaps and interruptions in intercultural talk”, Cahiers 10, Vol. 1, 2004, pp. 15–32.
  26. S. Duncan Jr., (1969). “Nonverbal communication,” Psychological Bulletin, Vol. 72, 1969, pp. 118-137.
  27. K. Drummond & R. Hopper, “Back channels revisited: acknowledgement tokens and speakership incipiency,” Research on Language and Social Interaction, Vol. 26, 1993, pp. 157-177.
  28. C. Goodwin, “Conversational Organization: Interaction between Hearers and Speakers,” New York, NY: Academic Press, 1981.
  29. Speaking while monitoring addressees for understanding. Clark, H. H. & Krych, M. A. (2004) Journal of Memory and Language, 50(1), 62-81
  30. Timing in conversation: the anticipation of turn endings. Magyari, L. & De Ruiter, J.P. In: Proceedings of the 12th Workshop on the Semantics and Pragmatics of Dialogue (LONDIAL 2008)
  31. Modeling Dominance in Group Conversations from Nonverbal Activity Cues. D. B. Jayagopi, H. Hung, C. Yeo, and D. Gatica-Perez, IEEE Trans. on Audio, Speech, and Language Processing, Vol. 17, No. 3, pp. 501-513. Mar. 2009
  32. Coordinating with each other in a material world. Clark, H. H. Discourse Studies 7, pp 507-525, 2005.
  33. Universals and cultural variation in turn-taking in conversation. Stivers, Tanya, Nicholas J. Enfield, Penelope Brown, Christina Englert, Makoto Hayashi, Trine Heinemann, Gertie Hoymann, Federico Rossano, Jan Peter De Ruiter, Kyung-Eun Yoon and Stephen C. Levinson (2009). Proceedings of the National Academy of Sciences 106(26): 10587-10592.
  34. Recognition of para-linguistic information and its application to spoken dialogue system. Shinya Fujie, Yasushi Ejiri, Yosuke Matsusaka, Hideaki Kikuchi, Tetsunori Kobayashi. IEEE ASRU2003 (Automatic Speech Recognition and Understanding Workshop), pp.231-236, Dec. 2003.
  35. To Signal Is Human. Alex (Sandy) Pentland. American Scientist 98, pp 204-211. 2010.
  36. Modeling Group Discussion Dynamics. Wen Dong, Ankur Mani, Alex Pentland, Bruno Lepri, Fabio Pianesi. 2009. Submitted to the IEEE Transactions on Autonomous Mental Development. Also MIT Media Lab, Vision and Modeling Group Technical Report 628

 

Grades

·        Grading breakdown

o   Attendance and participation 10% (1 free absence)

o   Reading assignments 15%

o   Leading class discussion  15%

o   Course project: mid-term report 20%

§  Draft project proposal: 4%

§  Revised project proposal: 4%

§  Mid-term report: 12%

o   Course project: final report and presentation 40%


·        Attendance

o   Students are expected to attend every class (1 free absence allowed) and participate actively during the group discussions.

·        Reading assignments

o   The reading assignment for each class will consist of 2-4 research papers (posted online at least one week before the class). These papers are specially selected to complement the lectures and show state-of-the-art research.

o   On the Sunday before each class, 1-3 questions will be posted online.

o   Students must send their answers by 5pm the day before the class. The answers will be part of the group discussion.

·        Group discussions

o   Each student will lead the group discussion twice during the semester. A signup sheet will be available during the first class.

o   Students can lead the discussion individually or pair with another student. The pairing should be different for the second group discussion.

o   Since all students are expected to read the research papers, the discussion should bring something new and interactive to the class. This can include example datasets, simple implementations of the algorithms, demos, and challenging new questions and applications.

·        Course project:

o   The goal of this course is to analyze human communicative behaviors in social settings using state-of-the-art statistical and probabilistic models. The course project is specifically designed to give students practical experience in the computational study of human social communication.

o   Students can perform the project individually or in teams of two. The mid-term and final reports will need to outline the tasks of each participant. Team projects will be expected to include a deeper analysis than individual projects.

o   Mid-term report: The mid-term report will present a qualitative analysis of the selected dataset and communicative behaviors. The report should include correct transcription and annotations of the language, vocal and nonverbal behaviors. Using standard statistical tools and qualitative observations, the students should highlight the challenges with this dataset (and communicative behaviors) and suggest an approach to solve them.

o   Final report and presentation: Using the same dataset as the mid-term report, the final report will include a quantitative analysis of the human communicative behaviors. The final report should be written as a research paper describing either a comparative study of different statistical and probabilistic approaches or a new technique for behavior modeling.

 

Statement for Students with Disabilities

Any student requesting academic accommodations based on a disability is required to register with Disability Services and Programs (DSP) each semester. A letter of verification for approved accommodations can be obtained from DSP. Please be sure the letter is delivered to me (or to the TA) as early in the semester as possible. DSP is located in STU 301 and is open 8:30 a.m.–5:00 p.m., Monday through Friday. The phone number for DSP is (213) 740-0776.

 

Statement on Academic Integrity

USC seeks to maintain an optimal learning environment. General principles of academic honesty include the concept of respect for the intellectual property of others, the expectation that individual work will be submitted unless otherwise allowed by an instructor, and the obligations both to protect one’s own academic work from misuse by others as well as to avoid using another’s work as one’s own. All students are expected to understand and abide by these principles. Scampus, the Student Guidebook, contains the Student Conduct Code in Section 11.00, while the recommended sanctions are located in Appendix A: http://www.usc.edu/dept/publications/SCAMPUS/gov/. Students will be referred to the Office of Student Judicial Affairs and Community Standards for further review, should there be any suspicion of academic dishonesty. The Review process can be found at: http://www.usc.edu/student-affairs/SJACS/.