ACM MM 2009

Accenture Challenge: Analysis of Video Footage Captured in Uncontrolled Environments

The proliferation of cameras has led to an explosion of video content. Often, it is necessary to analyze this corpus after an event, or to search it for that event. An event might involve one or more objects (e.g., people or cars) and the objects’ interactions with each other. We might then want to search the corpus for similar events or for objects that were part of the event. Often, we might not know the objects of interest until we see the events. Sample questions that may arise, then, include:

•    What are some categories of objects and a set of higher-level events that the objects could help identify?
•    Can the system identify key objects if given footage of an event?
•    Can we use the system or parts of it (recognition algorithms, event inference, etc.) to analyze real-time camera feeds?
•    In many cases, one needs to identify the ‘onset’ of an event, rather than the event itself. How would a system find event onsets?

Application

We are looking for applications that can address the questions listed above. How would one build an application or interface that is event- and concept-centric? What is the ideal interface for searching and navigating the video corpus? Can we use the video of the event to let users select certain objects and then track or detect those objects?

Input – A video corpus, such as data from surveillance cameras, and knowledge about the camera networks (if any). However, since this data might be hard to obtain, any available video dataset can be used.

Output – Categories of objects, the events that can be identified from these objects and their interactions, and a good representation for these objects or events. We would like to see which events can be reliably identified from the objects, possibly taking domain- or task-specific knowledge into account. We would like to track higher-level semantic events (in the context of the dataset) as opposed to visual events. For example, in a dataset of sports videos, we do not want to stop at tracking a soccer ball or a red shirt. Rather, we would define the event (at the least) as a soccer game between team X and team Y. Similarly, in surveillance, we would not stop at detecting faces and silhouettes. Rather, we would like to define the event (which results from the objects and their interactions) at a higher level, e.g., an assault. More importantly, we would like to see what events the community can define in the context of surveillance.

Evaluation

We will evaluate performance in the following areas, along with the ability to work in uncontrolled environments:
•    Categories of objects and events
•    Ability of the system to incorporate new objects
•    Precision and timing (retrieval of similar videos)
•    Application: Ease of use in identifying events and categorical assets
Systems can make assumptions regarding video quality (resolution, etc.), and systems will be compared in a manner consistent with the assumptions they make.
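
As one illustration of the "precision and timing" criterion for retrieving similar videos, here is a minimal Python sketch. It is not part of the challenge specification: the function names (precision_at_k, timed_query), the toy search function, and the video IDs are all hypothetical, and it assumes that a retrieval system returns a ranked list of video IDs and that a set of relevant IDs is known for each query.

```python
import time
from typing import Callable, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved video IDs that are relevant.

    Assumption: `retrieved` is ranked best-first. If fewer than k results
    are returned, the missing slots count as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for video_id in retrieved[:k] if video_id in relevant)
    return hits / k


def timed_query(
    search: Callable[[str], Sequence[str]], query: str
) -> Tuple[Sequence[str], float]:
    """Run a (hypothetical) search function and return its results
    together with the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    results = search(query)
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # Toy stand-in for a real retrieval system (hypothetical data).
    def toy_search(query: str) -> Sequence[str]:
        return ["vid_07", "vid_12", "vid_03", "vid_21"]

    relevant_ids = {"vid_07", "vid_03", "vid_99"}
    results, seconds = timed_query(toy_search, "example event clip")
    print("precision@4:", precision_at_k(results, relevant_ids, 4))
    print("query time (s):", seconds)
```

Reporting precision together with query time, under each system's stated assumptions about video quality, is one way to keep such comparisons consistent.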

Feel free to correspond with the challenge authors via the comments form below.

For private correspondence, consult the About page for contact details.
