Multimedia Grand Challenge 2010

And the winner is…

Thanks to all participants, jury members, and the audience, the 2010 Multimedia Grand Challenge is now history. It was again great fun, and good to see that all participants put so much effort into crafting an inspiring and interesting pitch. Participation is more important than winning, but our industry partners nonetheless selected three winners. The winners are (presenters in bold):

1. Jana Machajdik, Allan Hanbury, Julian Stöttinger. Understanding Affect in Images.

2. Wei Song, Dian Tjondronegoro, Ivan Himawan. ROI-based Content Adaptation for Mobile Device Usage of Video Conferencing.

3. Julien Law-To, Gregory Grefenstette, Jean-Luc Gauvain, Guillaume Gravier, Lori Lamel, Julien Despres. Introducing topic segmentation and segmented-based browsing tools into a content based video retrieval system.

Congratulations to the winners, and we hope to see you all again next year!

-Cees & Malcolm

Multimedia Grand Challenge Program

The finalists for the Multimedia Grand Challenge are known. We received many good submissions this year. We gave preference to contributions backed by an accepted paper from the regular program, and included only the most suitable ones that

  • Significantly address one of the industry grand challenges.
  • Depict working, presentable systems or demos.
  • Describe why the system presents a novel and interesting solution.

The 2010 Multimedia Grand Challenge will feature the following presentations:

Title | Authors | Challenge
A low-cost performance analysis and coaching system for tennis | Philip T. Kelly; Petros Daras; Noel E. O’Connor; Juan Diego Pérez-Moneo Agapito | 3DLife
Human Body Tracking of Tennis Players using Hierarchical Particle Filtering and Variable Windows | Adolfo López-Méndez; Marcel Alcoverro; Montse Pardas; Josep R. Casas | 3DLife
Audience Dependent Photo Collection Summarization | Pere Obrador; Rodrigo de Oliveira; Nuria Oliver | CeWe
Social Game Epitome versus Automatic Visual Analysis | Peter Vajda; Ivan Ivanov; Lutz Goldmann; Touradj Ebrahimi | HP
Learning-to-Photograph towards HP Challenge | Bin Cheng; Bingbing Ni; Shuicheng Yan; Qi Tian | HP
Using Android and Indoor Localization for Diaries | Eladio Martin; Oriol Vinyals; Gerald Friedland; Ruzena Bajcsy | Google diaries
Improving Personal Diaries Using Social Audio Features | Michael Kuhn; Roger Wattenhofer; Samuel Welten | Google diaries
Google Challenge 2010: Efficient Genre-specific Semantic Video Indexing | Jun Wu; Marcel Worring | Google video genre
Video Classification based on Contextual Visual Vocabulary | Shiliang Zhang; Qi Tian; Gang Hua; Qingming Huang; Shuqiang Jiang; Wen Gao | Google video genre
VIRaL: Visual Image Retrieval and Localization | Yannis Avrithis; Yannis Kalantidis; Giorgos Tolias | Nokia
ROI-based Content Adaptation for Mobile Device Usage of Video Conferencing | Wei Song; Dian Tjondronegoro; Ivan Himawan | Radvision adapt
Gaze Awareness and Interaction Support in Presentations: Video Conference Experience Grand Challenge Statement | Kar-Han Tan; Dan Gelb; Ramin Samadani; Ian N. Robinson; Bruce Culbertson; John Apostolopoulos | Radvision video conf
Rendering Panorama in Mobile Video Conferencing | Shu Shi; Zhengyou Zhang | Radvision video conf
Multi-Scale Entropy analysis of Dominance in Social Creative Activities | Donald Glowinski; Paolo Coletta; Maurizio Mancini | Radvision video conf
Searching and Browsing Social Images through iAVATAR | Aixin Sun; Sourav Bhowmick | Yahoo classification
Understanding Affect in Images | Jana Machajdik; Allan Hanbury; Julian Stöttinger | Yahoo classification
Introducing topic segmentation and segmented-based browsing tools into a content based video retrieval system | Julien Law-To; Gregory Grefenstette; Jean-Luc Gauvain; Guillaume Gravier; Lori Lamel; Julien Despres | Yahoo segmentation
Towards Yahoo! Challenge: A Generic Event Detection and Segmentation System for Video Navigation and Search | Tianzhu Zhang; Changsheng Xu; Guangyu Zhu; Si Liu; Hanqing Lu | Yahoo segmentation

Extra prize money announcement

We are happy to inform you that this year’s Grand Challenge will award three prizes:

  • Gold medal — 1500 USD
  • Silver medal — 1000 USD
  • Bronze medal — 500 USD

Researchers are encouraged to submit working systems in response to the challenges to win the Grand Challenge competition! A number of solutions (perhaps 10-20) will be selected as finalists and invited to describe their work, demonstrate their solution, and argue for the paper’s success in the Grand Challenge Session at ACM Multimedia 2010 in Florence. Each finalist will have several minutes to present their case. Final winners will be chosen by industry scientists, engineers, and business luminaries.

Prepare your submissions according to these guidelines, and submit your grand challenge participation before August 1st.

See you in Florence.

Malcolm & Cees.

The challenge is on!

What problems do Google, Yahoo!, HP, Radvision, CeWe, Nokia and other companies see driving the future of multimedia? The Multimedia Grand Challenge is a set of problems and issues from these industry leaders, geared to engage the multimedia research community in solving relevant, interesting and challenging questions on the multimedia industry’s 2-5 year horizon. The Grand Challenge was first presented as part of ACM Multimedia 2009, and it will be presented again in slightly modified form at ACM Multimedia 2010. Researchers are encouraged to submit working systems in response to these challenges to win the Grand Challenge competition!

CeWe Challenge 2010: Automatic Theme Identification of Photo Sets for Digital Print Products

With the advent of digital photography, the number of photos taken has increased tremendously. Not long ago, in the analogue days, a small number of films documented a two-week holiday; nowadays we take and store hundreds or even thousands of digital photos. As in the analogue days, a common way of preserving memories or making them available to others is individual photo products such as posters, calendars or photo books. Today the processes of designing these products have also become digital, and many companies offer digitally designed counterparts of the former analogue photo products. Popular examples are photo books like the CEWE PHOTOBOOK, in which the digital photos are arranged over the pages of the book in different designs.

Concrete problem

When a user wants to create a CEWE PHOTOBOOK, it can be designed with the help of a dedicated software application. This software first assists the user in selecting photos, laying them out on the pages, and adding titles and captions. The user may also design the individual pages with regard to the number of images and their position, size, rotation and so on. A concept not unique to the CEWE PHOTOBOOK software is the use of different styles for different kinds of photo books. Such a style influences how photos are laid out on the pages, which backgrounds are chosen, and which text font is used for headings and captions. In the CEWE PHOTOBOOK software about 100 styles are available to suit different user tastes and different types of photo books. These styles are defined by skilled designers who are experts both in general layout principles and in understanding photo book users’ needs. However, although this is a huge enhancement over purely manual layout, the user still has to select the style for a photo set manually out of a large and potentially overwhelming set of styles.

The challenge

To further assist the process of photo book design, CEWE is seeking ways to simplify this style selection step. The CEWE PHOTOBOOK software today provides different styles for different events (such as parties, holidays, chronicles, or family events) and for different seasons (such as Christmas, summer, Easter), as well as styles for different design tastes (classical, funky, cute, …). These styles are organized in different categories. The challenge CEWE wants to address is to support users in selecting and assigning the right style to an individual photo set. The goal, however, is not necessarily to automatically determine the one and only perfect style, but rather to provide the user with a reasonable selection of styles he or she can choose from. This selection should fit the user’s preferences, the images in the set, and the current structure of the photo book, i.e., the distribution of the photos over the individual pages.
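As a concrete illustration, style suggestion can be framed as ranking: aggregate tags over the photo set and score each style against a keyword profile. The style names, profiles, and the `suggest_styles` helper below are hypothetical, a minimal sketch rather than CEWE's actual method.

```python
from collections import Counter

# Hypothetical style profiles: each style described by a few keywords.
# CEWE's real styles and taxonomy would replace these.
STYLE_PROFILES = {
    "Christmas": {"snow", "tree", "winter", "candles"},
    "Summer holiday": {"beach", "sea", "sun", "travel"},
    "Party": {"people", "night", "indoor", "dancing"},
}

def suggest_styles(photo_tags, k=2):
    """Rank styles by how often the photo set's tags hit each style's profile.

    photo_tags: one set of tags per photo in the set.
    Returns the k best-matching style names.
    """
    counts = Counter(tag for tags in photo_tags for tag in tags)
    score = lambda style: sum(counts[t] for t in STYLE_PROFILES[style])
    return sorted(STYLE_PROFILES, key=score, reverse=True)[:k]
```

A real system would derive the tags from image analysis (scene, season, faces) and return the top few styles for the user to choose from, in line with the challenge's goal of offering a reasonable selection rather than a single answer.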

Data Set and Evaluation

For researchers interested in the described challenge, CEWE COLOR will provide selected photo sets together with a list of suitable styles for each set. This list of styles covers the styles currently offered in the CEWE PHOTOBOOK application and is organized in a taxonomy, which will also be provided upon request. The developed methods for matching photo sets to photo book styles should be evaluated on the provided photo sets. Please contact Sabine Thieme for details.

About CeWe

CeWe Color is the Number One services partner for first-class trade brands on the European photographic market. CeWe supplies both stores and Internet retailers (e-commerce) with photographic products.

Feel free to correspond with the challenge authors via the comments form below.

For private correspondence, consult the About page for contact details.

3DLife Challenge 2010: Sports Activity Analysis in Camera Networks

This challenge is designed to facilitate exploration of some of the key research challenges facing the future media internet in a specific application domain: sports. Advances in the availability and utility of cameras are rapidly changing the sporting landscape. In professional sports we are familiar with high-end camera technology being used to enhance the viewer experience above and beyond a traditional broadcast. High-profile examples include the Hawk-Eye officiating system used in tennis and cricket, or ESPN’s recent announcement that it will showcase 3D broadcasts in its coverage of the 2010 FIFA World Cup. Whilst extremely valuable to the viewing experience, such technologies are really only feasible for high-profile professional sports. On the other hand, advances in camera technology coupled with falling prices mean that reasonable-quality visual capture is now within reach of most local and amateur sporting and leisure organizations. Thus it becomes feasible for every field sports club, whether tennis, soccer, cricket or hockey, to install its own camera network at the local ground. In fact, the same goes for other leisure activities, such as dance, aerobics and performance art, that take place in a constrained environment and would benefit from visual capture. In these cases, the motivation is usually not broadcast, nor having the technology act as a “video referee” or adjudicator, but rather helping coaches and mentors provide better feedback to athletes based on recorded competitive training matches, training drills or any prescribed set of activities.

This challenge focuses on exploring the limits of what is possible in terms of 2D and 3D data extraction from a low-cost camera network for sports. It hopefully provides opportunities for research in areas such as:

  • Content & context fusion for improved multimedia access;
  • 3D content generation leveraging emerging acquisition channels;
  • Immersive multimedia experiences;
  • Multimedia, multimodal & deformable objects search

More generally, we hope the dataset and challenge will help researchers wishing to address the broader issues posed by the increasing availability of such capture technologies, which brings many exciting new challenges (see, for example, the recent white paper by the Future Media Internet task force that outlines these challenges).

Tennis is chosen as a case study because it is a sporting environment that is relatively easy to instrument with cheap cameras and features a small number of actors (players) who exhibit explosive, rapid and sophisticated motion. Video data from an AV network of 9 cameras with built-in microphones, installed around an indoor court and capturing real athletes, is provided for experimentation purposes. The capture infrastructure is deliberately set up to model what is feasible for a local tennis club using commercial off-the-shelf components, i.e., 720 x 680, MPEG-4, 25 Hz cameras that are not calibrated or synchronized and that share only limited overlapping fields of view. We are interested in submissions that explore the limits of what is possible from such a real-world capture scenario in terms of:

  • Player localization on court and tracking through multiple camera views;
  • Event-based analysis and human behaviour modeling using multiple views of the same event / activity: one example is robustly classifying every stroke as a serve, backhand, forehand, etc considering fusion across multiple camera views; another example is detecting the game structure automatically (point, game, match).
  • 3D reconstruction of the playing arena and/or the players or their actions; an example is using player location and stroke classification to animate an avatar of the player, even coarsely;
  • Longitudinal analysis of player activity and motion over an entire training session;
  • Novel visualization and feedback mechanisms of any analysis results.


The data features audio and video from up to 9 CCTV-like cameras placed at different points around a tennis court, showing 2 players involved in competitive training matches. Camera calibration data, court measurements and relative camera placement details are also provided. In addition to the video, accelerometer data from inertial measurement units was captured with each sequence. Two accelerometer units were placed on each player: one on the player’s dominant forearm and one on the torso (chest). Each provides time-stamped accelerometer, gyroscope and magnetometer data at its location for the duration of the session.
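As one example of event-based analysis, explosive strokes show up as short bursts in the acceleration magnitude from the forearm unit. The sketch below assumes a hypothetical sample format of (timestamp, ax, ay, az) in g; the threshold and refractory values are illustrative, not derived from the actual dataset.

```python
import math

def detect_strokes(samples, threshold=2.5, refractory=10):
    """Return timestamps of likely strokes in an accelerometer stream.

    samples: list of (t, ax, ay, az) tuples in temporal order, values in g.
    A stroke is flagged when the acceleration magnitude exceeds the
    threshold; a refractory period (in samples) suppresses duplicate hits
    from the same swing.
    """
    strokes, cooldown = [], 0
    for t, ax, ay, az in samples:
        mag = math.sqrt(ax * ax + ay * ay + az * az)
        if cooldown > 0:
            cooldown -= 1          # still inside the previous stroke's window
        elif mag > threshold:
            strokes.append(t)      # burst detected: record a candidate stroke
            cooldown = refractory
    return strokes
```

Detected bursts could then be cross-checked against the camera views to classify each candidate as a serve, forehand or backhand, as suggested in the event-analysis bullet above.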

The data set is now available at the following link:

Feel free to correspond with the challenge authors via the comments form below.

For private correspondence, consult the About page for contact details.

Alive and kicking

The Multimedia Grand Challenge is alive and kicking. Most challenges on the website still reflect the challenges of 2009, but we are in the process of updating them for the 2010 edition. Check the Nokia 2010 challenge.

In the meantime, we can already announce one innovation for 2010. We are opening the challenge to all papers (long, short, and demo) accepted for the ACM Multimedia conference. So if your (accepted) paper is related to one of the challenges, you are eligible for the prize money. More details on this new procedure later.

Stay tuned!

Malcolm & Cees.

Google Challenge 2010: Robust, As-Accurate-As-Human Genre Classification for Video

A notion of browsing collections is naturally associated with videos. Having videos classified into a pre-existing hierarchy of genres is one way to make the browsing task easier. The goal of this task would be to take user generated videos (along with their sparse and noisy metadata) and automatically classify them into genres. A public genre hierarchy like ODP (Open Directory Project) can be used as a target for this task.

Evaluations can be based on purely video content features as well as a combination of content and metadata features. Features that bring in information from other public data sources can also be used (e.g., object detectors trained on a separate public dataset). Thinking of new (and surprising) features is encouraged!
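One common way to combine content and metadata evidence is late fusion: train one model per feature family and blend their per-genre scores. The function and the `alpha` mixing weight below are a generic sketch under that assumption, not a method prescribed by the challenge.

```python
def fuse_scores(content_scores, metadata_scores, alpha=0.6):
    """Late fusion: blend per-genre scores from a content model and a
    metadata model into a single ranking. alpha weights the content model."""
    genres = set(content_scores) | set(metadata_scores)
    return {g: alpha * content_scores.get(g, 0.0)
               + (1.0 - alpha) * metadata_scores.get(g, 0.0)
            for g in genres}

# Usage: pick the top-scoring genre for one video (toy scores).
fused = fuse_scores({"Sports": 0.8, "Music": 0.1},
                    {"Sports": 0.5, "Music": 0.9})
best = max(fused, key=fused.get)
```

Late fusion keeps the two feature families decoupled, so a noisy-metadata model can be down-weighted without retraining the content model.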

Any dataset that reflects a breadth of content is acceptable, and of course YouTube and Google Video are recommended sources. In particular, the data should cover most of the common video genres. If the dataset consists of web videos, sharing a list of links and corresponding labels would be ideal so that researchers can compare notes. You may want to consult the YouTube Data API for retrieving video data.


We propose two types of evaluations for this challenge:

  • Offline (direct evaluation): Use a labeled test set to measure precision/recall for the ODP categories.
  • Online (indirect): Give users a browsing interface built on your classifiers and measure how easily they can find target concepts (e.g., a basketball scoring scene). Note that classifier errors can be compensated for here, since a video can appear in multiple categories, so one could conceive of training with different loss functions.

The ideal target in this case would match the optimal score for human agreement on the dataset.  If 5 raters categorize each video and we have agreement in 92% of the cases, we expect the automatic classifier to hit the same agreement rate.
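The offline evaluation and the human-agreement target can both be made concrete with a few lines of scoring code. The sketch below assumes multi-label ground truth (a set of ODP-style categories per video) and a simple majority notion of rater agreement; both are illustrative choices, not the challenge's official metric.

```python
def precision_recall(predicted, actual):
    """Micro-averaged precision/recall for multi-label genre assignments.

    predicted, actual: one set of category labels per video.
    """
    tp = sum(len(p & a) for p, a in zip(predicted, actual))
    pred_total = sum(len(p) for p in predicted)
    act_total = sum(len(a) for a in actual)
    precision = tp / pred_total if pred_total else 0.0
    recall = tp / act_total if act_total else 0.0
    return precision, recall

def agreement_rate(rater_labels):
    """Fraction of videos on which a majority of raters pick the same category.

    rater_labels: per video, the list of category labels given by each rater.
    """
    agree = 0
    for labels in rater_labels:
        top = max(labels.count(l) for l in set(labels))
        if top > len(labels) / 2:
            agree += 1
    return agree / len(rater_labels)
```

Under the scenario in the text, `agreement_rate` over the 5-rater labels would yield the 92% ceiling that an automatic classifier is expected to match.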

Feel free to correspond with the challenge authors via the comments form below.

For private correspondence, consult the About page for contact details.

Google Challenge 2010: Indexing and Fast Interactive Searching in Personal Diaries

The developing interest in recording digital diaries or archives of one’s life needs a good indexing and search capability to be useful or interesting. Diaries can be any combination of audio, video, geographic location, photos, phone logs, and whatever other multimedia data the user generates or accesses. To make the data accessible, it needs to be parsed into indexable, browsable, and searchable structures such as places, environments, episodes, actions, and events of various sorts, and clustered and tagged with categories, identities, and tags of whatever sort the user proposes. A UI is needed to browse and manually improve these structures and tags, and to search for things that the user knows about but that the system hasn’t yet learned a name for.

The challenge is to develop good schema, algorithms, UI, etc., that will be useful for diaries from audio-only through full-featured multimedia. Specializations to certain contexts, as well as generic systems, are all of interest.
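As a minimal example of parsing a diary stream into episodes, one simple heuristic is to start a new episode whenever the time gap between consecutive events exceeds a threshold. The event format and the 30-minute gap below are assumptions for illustration; a real system would also fold in location, audio and other cues.

```python
def segment_episodes(events, max_gap=1800):
    """Group a time-ordered diary stream into episodes.

    events: list of (timestamp_in_seconds, payload), sorted by time.
    A gap longer than max_gap (default: 30 minutes) starts a new episode.
    """
    episodes = []
    for t, payload in events:
        if episodes and t - episodes[-1][-1][0] <= max_gap:
            episodes[-1].append((t, payload))   # continue current episode
        else:
            episodes.append([(t, payload)])     # gap too large: new episode
    return episodes
```

Each resulting episode is a natural unit for the clustering, tagging and browsing structures described above.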

Feel free to correspond with the challenge authors via the comments form below.

For private correspondence, consult the About page for contact details.

Radvision Challenge 2010: Video Conferencing To Surpass “In-Person” Meeting Experience

Video conferencing is part of a $5 billion real-time collaboration market that includes audio, video and web conferencing products and services.

The great challenge for video conferencing vendors is to supply users with a meeting experience that equals or surpasses “in-person” meetings. It is assumed that once the meeting experience is good enough, or even better, the technology could potentially minimize the need for physical meetings (at least for business purposes). Such a reduction would mean less travel, lower costs (to people, organizations, and the planet), greater efficiency and better communication.

This challenge focuses on developing new technologies and ideas to surpass the “in-person” meeting experience. In the process a set of subjective and objective measures to evaluate “meeting” experience will be developed. With these measures, alternative solutions could be compared to each other and to in-person meetings, and optimized accordingly.


Data Set

Not required.

Evaluation

As noted above, we are hoping for new metrics, objective and subjective, that capture the meeting experience. Ideally the objective and subjective metrics will correlate highly, and the metrics will be robust and reliable. These metrics could then be used to compare existing video conferencing solutions, in-person meetings, and newly suggested technologies.
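Once candidate metrics exist, the desired correlation between an objective metric and subjective ratings (e.g., mean opinion scores) can be checked with a standard Pearson coefficient. The sketch below is generic; the metric and rating values in any usage are placeholders.

```python
import math

def pearson(objective, subjective):
    """Pearson correlation between objective metric values and subjective
    scores collected for the same set of meetings (equal-length lists)."""
    n = len(objective)
    mx = sum(objective) / n
    my = sum(subjective) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(objective, subjective))
    sx = math.sqrt(sum((x - mx) ** 2 for x in objective))
    sy = math.sqrt(sum((y - my) ** 2 for y in subjective))
    return cov / (sx * sy)
```

A coefficient near 1.0 would indicate that the objective metric tracks the subjective meeting-experience ratings closely, which is exactly the property the challenge asks for.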

About Radvision

Radvision (Nasdaq: RVSN) is the industry’s leading provider of products and technologies for unified visual communications over IP, 3G and emerging IMS/Next Generation networks, enabling high-definition video conferencing, converged video telephony services, and scalable desktop-based visual communications.

Feel free to correspond with the challenge authors via the comments form below.

For private correspondence, consult the About page for contact details.