This challenge is designed to facilitate exploration of some of the key research challenges facing the future media internet in a specific application domain: sports. Advances in the availability and utility of cameras are rapidly changing the sporting landscape. In professional sports, we are familiar with high-end camera technology being used to enhance the viewer experience above and beyond a traditional broadcast. High-profile examples include the Hawk-Eye Officiating System, as used in tennis and cricket, and ESPN’s recent announcement that it will showcase 3D broadcast in its coverage of the 2010 FIFA World Cup. Whilst extremely valuable to the viewing experience, such technologies are really only feasible for high-profile professional sports. On the other hand, advances in camera technology coupled with falling prices mean that reasonable-quality visual capture is now within reach of most local and amateur sporting and leisure organizations. Thus it becomes feasible for every field sports club, whether tennis, soccer, cricket or hockey, to install its own camera network at its local ground. In fact, the same goes for other leisure activities such as dance, aerobics and performance art that take place in a constrained environment and would benefit from visual capture. In these cases, the motivation is usually not broadcast, or for the technology to act as a “video referee” or adjudicator, but rather to help coaches and mentors provide better feedback to athletes based on recorded competitive training matches, training drills or any prescribed set of activities.
This challenge focuses on exploring the limits of what is possible in terms of 2D and 3D data extraction from a low-cost camera network for sports. It hopefully provides opportunities for research in areas such as:
- Content & context fusion for improved multimedia access;
- 3D content generation leveraging emerging acquisition channels;
- Immersive multimedia experiences;
- Multimedia, multimodal & deformable objects search.
More generally, the data set and challenge will hopefully help researchers wishing to address the broader issues posed by the increasing availability of such capture technologies, which brings many exciting new challenges (see, for example, the recent white paper by the Future Media Internet task force that outlines these challenges).
Tennis is chosen as a case study because it is a sporting environment that is relatively easy to instrument with cheap cameras and features a small number of actors (players) who exhibit rapid, explosive and sophisticated motion. Video data from an AV network of 9 cameras with built-in microphones, installed around an indoor court and capturing real athletes, is provided for experimentation purposes. The capture infrastructure is deliberately set up to model what is feasible for a local tennis club using commercial off-the-shelf components, i.e. 720 x 680, MPEG-4, 25Hz cameras that are not calibrated or synchronized and that share only limited overlapping fields of view. We are interested in submissions that explore the limits of what is possible from such a real-world capture scenario in terms of:
- Player localization on court and tracking through multiple camera views;
- Event-based analysis and human behaviour modeling using multiple views of the same event / activity: one example is robustly classifying every stroke as a serve, backhand, forehand, etc., by fusing evidence across multiple camera views; another example is detecting the game structure automatically (point, game, match);
- 3D reconstruction of the playing arena and/or the players or their actions; an example is using player location and stroke classification to animate an avatar of the player, even coarsely;
- Longitudinal analysis of player activity and motion over an entire training session;
- Novel visualization and feedback mechanisms of any analysis results.
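As an illustration of the multi-view fusion idea in the list above, here is a minimal sketch of late fusion for stroke classification. It assumes that some per-camera classifier (not specified by the challenge) already outputs a score per stroke label for each camera that saw the event. The function name, the camera identifiers and the score values are all hypothetical, and simple averaging is only one of many possible fusion rules.

```python
from collections import defaultdict


def fuse_stroke_scores(per_camera_scores):
    """Average per-label scores over the cameras that observed a stroke.

    per_camera_scores maps a (hypothetical) camera id to a dict of
    {stroke_label: score}. Cameras that did not see the player pass
    an empty dict and are ignored.
    Returns (best_label, fused_scores), or (None, {}) if no camera
    contributed any scores.
    """
    totals = defaultdict(float)
    contributing = 0
    for scores in per_camera_scores.values():
        if not scores:
            continue  # this camera had no view of the event
        contributing += 1
        for label, score in scores.items():
            totals[label] += score
    if contributing == 0:
        return None, {}
    fused = {label: total / contributing for label, total in totals.items()}
    best = max(fused, key=fused.get)
    return best, fused


if __name__ == "__main__":
    # Made-up scores for a single stroke seen by three of four cameras.
    observations = {
        "cam1": {"serve": 0.2, "forehand": 0.7, "backhand": 0.1},
        "cam4": {"serve": 0.1, "forehand": 0.4, "backhand": 0.5},
        "cam7": {},  # player outside this camera's field of view
        "cam9": {"serve": 0.1, "forehand": 0.6, "backhand": 0.3},
    }
    label, fused = fuse_stroke_scores(observations)
    print(label, fused)
```

Averaging treats every contributing camera equally; a submission might instead weight cameras by viewing angle or by how much of the player is visible, which this sketch does not attempt.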
Dataset
The data set features audio and video from up to 9 CCTV-like cameras placed at different points around a tennis court. Camera calibration data is provided. The data set features 2 players involved in competitive training matches. Court measurements and relative camera placement details are also provided. In addition to the video, data from inertial measurement units was captured with each sequence. Two units were placed on each player: one on the player’s dominant forearm and one on the torso (chest). Each provides time-stamped accelerometer, gyroscope and magnetometer data at its location for the duration of the session.
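As a rough idea of how the time-stamped accelerometer data might be used, the sketch below flags candidate stroke times from a forearm sensor by thresholding the acceleration magnitude. The sample layout `(t, ax, ay, az)`, the units, the threshold and the minimum gap are all assumptions made for illustration; the actual file format and sensible parameter values should be taken from the data set documentation.

```python
import math


def candidate_stroke_times(samples, threshold, min_gap):
    """Return timestamps where acceleration magnitude first exceeds threshold.

    samples: iterable of (t_seconds, ax, ay, az) tuples in time order
             (an assumed layout, not the data set's actual format).
    threshold: magnitude at or above which a sample counts as a burst.
    min_gap: minimum time in seconds between two reported events, so
             that one swing is not reported several times.
    """
    events = []
    for t, ax, ay, az in samples:
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude < threshold:
            continue
        # Report only the first sample of each burst of activity.
        if not events or t - events[-1] >= min_gap:
            events.append(t)
    return events


if __name__ == "__main__":
    # Made-up samples: two bursts of high acceleration about 1.3 s apart.
    demo = [
        (0.0, 0.0, 0.0, 1.0),
        (0.1, 0.0, 0.0, 1.0),
        (0.2, 3.0, 4.0, 0.0),
        (0.3, 6.0, 8.0, 0.0),
        (0.4, 0.0, 0.0, 1.0),
        (1.5, 0.0, 5.0, 0.0),
        (1.6, 0.0, 0.0, 1.0),
    ]
    print(candidate_stroke_times(demo, threshold=4.0, min_gap=0.5))
```

Such candidate times could then be used to select short video windows for stroke classification, provided the sensor and camera clocks can be related to each other.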
The data set is now available at the following link:
http://www.cdvp.dcu.ie/tennisireland/TennisVideos/acm_mm_3dlife_grand_challenge/
Feel free to correspond with the challenge authors via the comments form below.
For private correspondence, consult the About page for contact details.







on Apr 6th, 2010 at 9:52 am
[...] challenge is on! The 3DLife Challenge 2010: “Sports Activity Analysis in Camera Networks” is now live within the ACM Multimedia Grand Challenge [...]
on Apr 26th, 2010 at 9:04 am
It is written: “Data will be available in early April 2010”.
Was the dataset published?
on Apr 26th, 2010 at 9:51 am
Hi,
Data release was delayed slightly due to some last-minute problems with the capture session. However, the data set will be released within the next couple of days.
Noel
on Apr 27th, 2010 at 10:39 am
All,
The data set for the 3DLife grand challenge has now been released. Go to the following URL, download and fill out the provided document, scan and send back to 3DLifeGrandChallenge@gmail.com and you’re good to go!
http://www.cdvp.dcu.ie/tennisireland/TennisVideos/acm_mm_3dlife_grand_challenge/
Noel & Phil
on Jun 1st, 2010 at 11:25 am
Hi,
The data set description at http://www.cdvp.dcu.ie/tennisireland/TennisVideos/acm_mm_3dlife_grand_challenge/ reads: “the start time of each video is synchronised via software at the start of each sequence”. What precisely does this mean?
a) The capture command for the drivers was issued by software at the same precise time (this does not guarantee a synchronized start).
b) The headers of the video files have been timestamped with the exact time reference of the first frame (were the clocks of the capturing devices synchronized?).
Thanks for your answers,
Josep