About the data

Content

The database currently consists of 30 conversations between pairs of speakers, each lasting approximately five minutes, making up 300 minutes of audio-video data (both sides of each conversation are recorded). There were 16 speakers in total, 12 male and 4 female, between the ages of 25 and 56; hence, some speakers were paired multiple times with different conversational partners.

Currently, 8 conversations have annotations for facial expressions, verbal and non-verbal utterances, and transcribed speech. This adds up to 40 minutes of conversations currently annotated, or 80 minutes if the two sides of the conversation are used independently.

Annotation of the remaining 22 conversations is ongoing, and we are seeking the community's assistance to complete this task.

Currently available for download are:

  • 2D video for 30 conversations
  • Manual annotations (performed using ELAN) for 8 conversations consisting of:
    • Backchannel
    • Frontchannel
    • Agree
    • Disagree
    • Utterance (verbal and non-verbal)
    • Happy (smile or laugh)
    • Surprise
    • Thinking
    • Confusion
    • Head Nodding
    • Head Shake
    • Head Tilt
  • Extracted audio-visual features for 8 conversations consisting of:
    • Active Shape Model facial parameters (raw, smoothed and derivatives)
    • Audio features:
      • pitch
      • intensity
      • the first two formant frequencies
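The page does not say how these audio features were computed, so the following is only a minimal, self-contained sketch of how such frame-level features can be estimated with textbook methods: autocorrelation for pitch, RMS energy for intensity, and LPC root finding for the first two formants. It is not the project's actual pipeline, and the function name and parameters are illustrative.

```python
import numpy as np

def frame_features(signal, sr, frame_len=1024):
    """Estimate pitch (autocorrelation), intensity (RMS in dB), and the
    first two formants (LPC root finding) for one analysis frame.
    Illustrative only -- not the CCDb extraction pipeline."""
    frame = signal[:frame_len] * np.hanning(frame_len)

    # Intensity: RMS energy of the windowed frame, in dB
    rms = np.sqrt(np.mean(frame ** 2))
    intensity_db = 20 * np.log10(rms + 1e-12)

    # Pitch: strongest autocorrelation peak within a plausible F0 range
    ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
    lo, hi = sr // 400, sr // 60          # search roughly 60-400 Hz
    lag = lo + np.argmax(ac[lo:hi])
    pitch_hz = sr / lag

    # Formants: angles of the complex roots of an LPC polynomial
    order = 2 + sr // 1000                # common rule of thumb for LPC order
    design = np.column_stack(
        [frame[i:frame_len - order + i] for i in range(order)])
    a = np.linalg.lstsq(design, frame[order:], rcond=None)[0]
    roots = np.roots(np.r_[1, -a[::-1]])
    roots = roots[np.imag(roots) > 0]     # keep one of each conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    f1, f2 = freqs[:2] if len(freqs) >= 2 else (np.nan, np.nan)
    return pitch_hz, intensity_db, (f1, f2)
```

In practice such features would be computed over a sliding window across each five-minute recording; the CCDb release also ships raw, smoothed, and derivative versions of the visual parameters, suggesting similar post-processing of the audio tracks.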

Data acquisition

Two 3dMD dynamic scanners captured 3D video; a Basler A312fc FireWire CCD camera captured 2D colour video at the standard video frame rate; and a microphone placed in front of the participant, out of view of the camera, captured sound at 44.1 kHz.

To ensure all audio and video could be reliably synchronized, each speaker had a handheld buzzer and LED (light-emitting diode) device, used to mark the beginning of each recording session. A single button activated the buzzer and LED simultaneously. No equipment was altered between recording sessions, except for the height of the chair, adjusted to ensure the speaker's head was clearly visible to the cameras.
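The buzzer/LED scheme implies a simple alignment procedure: locate the buzzer onset in the audio and the LED onset in the video, then shift one stream by the difference. A minimal sketch of that idea follows; it assumes the audio is normalised and that a per-frame LED-region brightness trace has already been extracted, and is not the project's actual code.

```python
import numpy as np

def sync_offset(audio, sr, led_brightness, fps, threshold=0.5):
    """Estimate the audio-video offset from a simultaneous buzzer/LED marker.

    audio:          mono waveform, normalised to [-1, 1]
    led_brightness: per-frame mean brightness of the LED region, in [0, 1]
    Returns (video time - audio time) of the marker onset, in seconds.
    Illustrative only -- not the CCDb synchronization code.
    """
    # First audio sample where the buzzer exceeds the threshold
    audio_onset = np.argmax(np.abs(audio) > threshold) / sr
    # First video frame where the LED exceeds the threshold
    video_onset = np.argmax(led_brightness > threshold) / fps
    return video_onset - audio_onset
```

Because both markers are triggered by one button press, the returned offset can be applied once per recording session to align the two streams.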

The database is described in the following paper:

A.J. Aubrey, D. Marshall, P.L. Rosin, J. Vandeventer, D.W. Cunningham, C. Wallraven, "Cardiff Conversation Database (CCDb): A Database of Natural Dyadic Conversations", V & L Net Workshop on Language for Vision, 2013.

About us

The CCDb was created as part of the research project 'Actions and Events in Images and Videos' within the Welsh Government funded Research Institute of Visual Computing (RIVIC), a collaborative amalgamation of research programmes between the computer science departments of Aberystwyth, Bangor, Cardiff and Swansea Universities.

Professor David Marshall and Professor Paul Rosin were the lead investigators on the project. They are both Professors in the Visual Computing research group in the School of Computer Science & Informatics.

Professor Christian Wallraven and Professor Douglas Cunningham were RIVIC funded Senior Visiting Research Fellows on this project.

The EPSRC Vision & Language (V & L) Network funded a travel grant for the Research Assistant, Dr Andrew Aubrey, to facilitate some aspects of this work.

Jason Vandeventer is a School of Computer Science & Informatics PhD student.

Want to help?

Annotating the whole database is a massive, time-consuming effort, so the community's assistance is sought to complete the task of annotating and validating the remaining conversations. Please contact Professor David Marshall or Professor Paul Rosin if you would like to help.