February 28, 2016 by Frederic Fol Leymarie

CVMP London, BFI 2015


Movement Description and Gesture Recognition for Live Media Arts - presented by DynAikon at the CVMP 2015 symposium, BFI London 2015.

Our research aims to develop novel techniques that recognise different sequential gestures well enough to describe and compute articulated movements in real time. In the context of live media arts, the research outcomes could change the paradigm of creating, learning, performing and designing, by giving feedback on a performance after analysing its streaming video in real time.

Understanding how we recognise and respond to rhythm patterns kinaesthetically may support the development of real-time movement description and gesture recognition.

One of the main problems in computer vision is defining and describing movement. Traditional approaches rely mainly on either body-mounted sensors or markers. With these, performers have to adapt their natural performance, or need substantial training to adopt the tools in their performance. Our principal focus is the development of a system that will free performers from the constraints of wearable technology and allow them to perform in a natural way, without restrictions on movement and coordination. This should also allow greater precision in analysing movements while tracking the performance, yielding a higher level of movement detail and in turn allowing performers to track their performance with greater ease.

Key problems in computer vision for the live media arts include the lack of real-time feedback on performance, the inability to seamlessly control body movement in response to audio and visual output, and the unavailability of consistent feedback and evaluation for performers on their practice. The main goals of our work are:

  • To make an analytical tool for performers that will act as a real-time evaluator and assessor by providing generative feedback on their performance. Expert performers are involved in training the evaluation system in order to generate valid ground-truth data. Our research in computer vision and information retrieval algorithms provides an algorithmic chain capable of: (i) analysing shapes in sequential video frames, and (ii) extracting vision-based features that will then be used to detect the performers’ gestures and movements. Here, the input and feedback of expert performers will be used to build a vocabulary of gestures and to validate the results. This will create a feedback loop between the performer and the system, which will also be useful when applying state-of-the-art machine learning techniques to improve the performance and accuracy of the system over time. In the context of live media arts practice, it will stimulate new approaches to creating, rehearsing, performing and evaluating work.

  • To make a creative tool for multi-media augmentation of live performance. Once the shape has been described and analysed, the system can be used to augment the surroundings and the stage with multi-media techniques (e.g. projections, interactive audio, responsive lighting, robotics) tightly synced with the movements of the performers and without affecting their freedom of movement.

  • To make a learning tool that will act as a virtual personal tutor to performers with different levels of experience. The system will be able to produce detailed descriptions of the performer’s shape and movements. This information, coupled with machine learning techniques, will allow the system to be trained with the input of a number of expert performers. Once the system reaches a level judged acceptable by the experts, it will be possible to use it as a virtual tutor for performers with different levels of experience. The trainee will also be able to choose a preferred learning style, depending on the styles of the experts who initially trained the system. Moreover, colours and textures (e.g. costumes, stage materials) can be added to provide more information for improving accuracy.

We present our most recent results in applying a novel representation of dynamic shapes undergoing articulated movement. Our method is based on an adaptation of perception-based results, including the codon features and medialness hotspots. In every frame of a video sequence (from a single camera) we can reliably obtain a shape representation which expresses in a compact way the highest values of medialness (also known as hot spots) and the most descriptive convexities and concavities along contours.
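As a rough illustration of what labelling convexities and concavities along a contour can mean, the sketch below classifies each vertex of a closed polygonal contour by the sign of the turn at that vertex. It is a deliberate simplification and not the codon or medialness computation used in the work presented here; the function names `signed_area` and `classify_vertices` and the sample shape are hypothetical, chosen only for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(contour: List[Point]) -> float:
    """Shoelace formula: positive when the contour is counter-clockwise."""
    total = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def classify_vertices(contour: List[Point]) -> List[str]:
    """Label each vertex of a closed polygon as 'convex', 'concave' or 'flat'.

    Assumption: the contour is a simple (non self-intersecting) polygon,
    e.g. a silhouette outline already extracted from one video frame.
    """
    n = len(contour)
    # Normalise for traversal direction so labels do not depend on it.
    orientation = 1.0 if signed_area(contour) > 0 else -1.0
    labels = []
    for i in range(n):
        px, py = contour[i - 1]          # previous vertex (wraps around)
        cx, cy = contour[i]              # current vertex
        nx, ny = contour[(i + 1) % n]    # next vertex (wraps around)
        # z-component of the cross product of incoming and outgoing edges
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        turn = cross * orientation
        if turn > 0:
            labels.append("convex")
        elif turn < 0:
            labels.append("concave")
        else:
            labels.append("flat")
    return labels


if __name__ == "__main__":
    # Hypothetical outline: a square with a notch cut into its top edge.
    shape = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
    for vertex, label in zip(shape, classify_vertices(shape)):
        print(vertex, label)
```

On a real silhouette the contour would be sampled much more densely, and runs of consecutive convex or concave vertices would be grouped into contour segments rather than reported point by point.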

The outcomes of the research may lead to applications in areas where a change of shape and colour indicates variations in the characteristics of an observed object in motion or under transformation: VFX, gesture interface development, enhanced learning for people with disabilities (e.g. improving coordination through feedback), medical procedures (enhanced feedback on surgical procedures, anomaly detection), and scientific measurement and observation of dynamic processes (e.g. in graphonomics).