Research Projects

SCalable Room Acoustic Modelling (SCReAM)

Funder:

EPSRC (UK)

Institution:

University of Surrey (UK)

Budget:

£407k

Website:

Click here (opens new tab)

My role:

I am the Principal Investigator of the project.

We spend the majority of our lives indoors. Within enclosed spaces, sound is reflected numerous times, leading to reverberation. We are accustomed to perceiving reverberation: we unconsciously use it to navigate the space and, when it is absent, we notice. Similarly, our electronic devices, such as laptops, TVs, or smart home devices, are exposed to reverberation and need to take its presence into account. Being able to predict, synthesise, and control reverberation is therefore important. This is done using room acoustic models.

Existing room acoustic models suffer from two main limitations. First, they were originally developed from very different starting points and for very different purposes, which has led to a highly fragmented research field where advancements in one area do not translate to advancements in other areas, slowing down research. Second, each model has a specific accuracy and a specific computational complexity: some very accurate models take several days to run (physical models), while others run in real time but with low accuracy and only aim to create a pleasing reverberant sound (perceptual models). Thus, there is no single model that can scale continuously from one extreme to the other.

This project will overcome both limitations by defining a novel, unifying room acoustic model that combines appealing properties of all main types of models and that can scale on demand from a lightweight perceptual model to a full-scale physical model. Such a SCalable Room Acoustic Model (SCReAM) will bring benefits in many applications, ranging from consumer electronics and communications, to computer games, immersive media, and architectural acoustics. The model will be able to adapt in real time, enabling end-users to get the best possible auditory experience allowed by the available computing resources. Audio software developers will not need to update their development chains once more powerful machines become available, thus reducing costs. Electronic equipment, such as hands-free devices, smart loudspeakers, and sound reinforcement systems, will be able to build a more flexible internal representation of room acoustics, allowing them to reduce unwanted echoes, to remove acoustic feedback, and/or to improve the tonal balance of reproduced sound.

The research will be conducted at the University of Surrey with industrial support by Sonos (audio consumer electronics), Electronic Arts (computer games), Audio Software Development Limited (computer games audio consultancy), and Adrian James Acoustics (acoustics consultancy).

The Spatial Dynamics of Room Acoustics (SONORA)

Funder:

ERC (European Commission)

Institution:

KU Leuven (Belgium)

Budget:

€2m

Website:

Click here (opens new tab)

My role:

I am a Strategic Collaborator on the project. The PI is Prof Toon van Waterschoot.

The research grant funds a project that will be executed by Prof. van Waterschoot and his team over the next five years. The project “The Spatial Dynamics of Room Acoustics (SONORA)” starts from the question of how sound propagation in rooms can be modelled when sound sources and observers are moving. Although we face such dynamic acoustic scenarios in everyday life, the current scientific understanding of the spatial dynamics of room acoustics is limited. The primary objectives of the SONORA project are therefore to develop efficient models for spatially dynamic room acoustics, and subsequently to use these models to design novel measurement protocols and signal processing algorithms that can be employed in dynamic acoustic scenarios. In the long term, these fundamentally new results will facilitate a broad variety of applications in hearing technology, virtual reality, human-machine interaction, musicology, and acoustic monitoring.

Environment-aware Listener-Optimized Binaural Enhancement of Speech (E-LOBES)

Funder:

EPSRC (UK)

Institution:

Imperial College London (UK), University College London (UK)

Budget:

£984k

Website:

Click here (opens new tab)

My role:

I collaborated on the project during a six-month research visit at Imperial. The PI was Dr Mike Brookes, and the Co-I was Prof Patrick Naylor.

Age-related hearing loss affects over half the UK population aged over 60. Hearing loss makes communication difficult and so has severe negative consequences for quality of life. The most common treatment for mild-to-moderate hearing loss is the use of hearing aids. However, even with aids, hearing-impaired listeners are worse at understanding speech in noisy environments because their auditory system is less effective at separating wanted speech from unwanted noise. One solution is to use speech enhancement algorithms that selectively amplify the desired speech signals while attenuating the unwanted background noise.

It is well known that normal-hearing listeners can better understand speech in noise when listening with two ears rather than with only one. Differences between the signals at the two ears allow the speech and noise to be separated based on their spatial locations, resulting in improved intelligibility. Technological advances now make it feasible to use two hearing aids that share information via a wireless link. By sharing information in this way, it becomes possible for the speech enhancement algorithms within the hearing aids to localize sound sources more accurately and, by jointly processing the signals for both ears, to ensure that the spatial cues present in the acoustic signals are retained. The goal of this project is to exploit these binaural advantages by developing speech enhancement algorithms that jointly enhance the speech received by the two ears.
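As a rough illustration of the binaural cues involved (a sketch only, not the project's algorithms), the interaural time and level differences can be estimated from a pair of ear signals along these lines:

```python
# A rough sketch only (not the project's algorithms): estimating interaural
# time and level differences (ITD/ILD) from the left/right ear signals.
import numpy as np

def itd_ild(left, right, fs):
    """Return (ITD in seconds, ILD in dB) for a pair of ear signals."""
    # ITD: lag of the cross-correlation peak between the two ear signals
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    itd = lag / fs
    # ILD: ratio of the signal energies at the two ears, in decibels
    ild = 10 * np.log10(np.sum(left ** 2) / (np.sum(right ** 2) + 1e-12))
    return itd, ild
```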

Most current speech enhancement techniques have evolved from the telecommunications industry and are designed to act only on monaural signals. Many of these techniques can improve the perceived quality of already intelligible speech, but binary masking is one of the few techniques that have been shown to improve the intelligibility of noisy speech for both normal-hearing and hearing-impaired listeners. In the binary masking approach, regions of the time-frequency domain that contain significant speech energy are left unchanged, while regions that contain little speech energy are muted. In this project we will extend existing monaural binary masking techniques to provide binaural speech enhancement while preserving the interaural time and level differences that are critical for the spatial separation of sound sources.
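To illustrate the idea, here is a minimal monaural sketch in the short-time Fourier domain, assuming oracle access to the clean speech purely for demonstration; a real system must estimate the mask from the noisy signal:

```python
# A minimal sketch of monaural binary masking, assuming oracle access to the
# clean speech for illustration; a real system must estimate the mask instead.
import numpy as np
from scipy.signal import stft, istft

def binary_mask_enhance(noisy, clean, fs, threshold_db=0.0):
    """Mute time-frequency cells whose local SNR falls below threshold_db."""
    _, _, noisy_tf = stft(noisy, fs, nperseg=512)
    _, _, clean_tf = stft(clean, fs, nperseg=512)
    noise_tf = noisy_tf - clean_tf                        # residual noise component
    snr_db = 20 * np.log10(np.abs(clean_tf) / (np.abs(noise_tf) + 1e-12))
    mask = (snr_db > threshold_db).astype(float)          # 1 = keep, 0 = mute
    _, enhanced = istft(noisy_tf * mask, fs, nperseg=512)
    return enhanced[: len(noisy)]
```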

To train and tune our binaural speech enhancement algorithm, we will also develop within the project an intelligibility metric that predicts the intelligibility of a speech signal for a binaural listener with normal or impaired hearing in the presence of competing noise sources. This metric is the key to automatically finding the optimum settings of an individual listener's hearing aids in a particular environment.
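As a toy illustration of that optimization loop, a simple exhaustive search over candidate settings might look as follows; the predictor and the per-band gain grid below are hypothetical placeholders, not the project's metric:

```python
# A toy sketch of automatically fitting hearing-aid settings by maximising a
# predicted binaural intelligibility score; predict_intelligibility() and the
# per-band gain grid are hypothetical placeholders, not the project's metric.
import itertools

def best_settings(left, right, predict_intelligibility):
    """Grid-search per-band gains (dB) and keep the highest-scoring setting."""
    best_score, best_gains = float("-inf"), None
    for gains in itertools.product([0, 6, 12], repeat=3):    # 3 frequency bands
        score = predict_intelligibility(left, right, gains)  # binaural prediction
        if score > best_score:
            best_score, best_gains = score, gains
    return best_gains, best_score
```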

The final evaluation and development of the binaural enhancement algorithm will assess speech perception in noise with a panel of hearing-impaired listeners, who will also be asked to assess the quality of the enhanced speech signals.

Dereverberation and Reverberation of Audio, Music and Speech (DREAMS)

Funder:

FP7 (European Commission)

Institution:

KU Leuven (Belgium), Imperial College London (UK), Aalborg University (Denmark), University of Oldenburg (Germany)

Budget:

€4.1m

Website:

Click here (opens new tab)

My role:

I was a Postdoctoral Fellow and coordinator of WP1.

The DREAMS Initial Training Network will investigate the problem of modeling, controlling, removing, and synthesizing acoustic reverberation with the aim of enhancing the quality and intelligibility of audio, music, and speech signals. The proposed research and training program builds upon four disciplines that are equally important in understanding and tackling the (de-)reverberation problem: room acoustics, signal processing, psychoacoustics, and speech and audio processing. The strong commitment of the private sector in the proposed ITN consortium, consisting of 4 academic and 8 industrial partners, illustrates the timeliness and importance of the (de-)reverberation problem in a wide variety of applications. However, delivering application-driven solutions is not the only objective of the DREAMS research program: the aim is also to take a significant step forward and make fundamental scientific contributions in each of the four disciplines mentioned above.

To this end, the ITN will host 12 early-stage researchers and 4 experienced researchers, each performing an individual research project around one of four themes that reflect the most challenging open problems in the area of (de-)reverberation. The DREAMS ITN will be implemented so as to maximize the international and intersectoral experience of the research fellows, by defining relevant secondments in academia and industry, both in the host country and abroad. Moreover, experienced researchers will be expected to take on a supervisory role in coordinating one of the four research themes, with the aim of developing solid skills in leadership and research management. Finally, a training program of extremely high quality is proposed, with local as well as network-wide training, which relies on the scientific excellence of the involved partners and of invited external researchers, and which draws heavily on the input of the private sector.

Perceptual Soundfield Reconstruction (PSR)

Funder:

EPSRC (UK)

Institution:

King's College London (UK)

Budget:

£390k

Website:

Click here (opens new tab)

My role:

I was a PhD student funded under the project. The PI was Prof Zoran Cvetkovic.

The project is concerned with the development of a new 5-10 channel audio technology which would improve over existing ones in terms of (a) realism, (b) accuracy and stability of the auditory perspective, (c) size of the sweet spot, and (d) the envelopment experience. Since the new technology aims to create a 360-degree auditory perspective, the reproduction will take place over speakers positioned at the vertices of a regular polygon. Each speaker will consist of two components: one which radiates the direct sound field toward the listener, and another which reproduces the diffuse sound field by introducing additional scattering. The goal of the tasks listed below is to find optimal ways to capture sound field cues and render them using the proposed playback system in a manner that provides the most convincing illusion of the original or desired sound field.

(i) Optimal microphone arrays for the proposed playback system will be investigated. Arrays considered will consist of microphones placed in the horizontal plane at the vertices of a regular polygon, with the number of microphones equal to the number of speakers. For each array, different diameters, in the range from near-coincident up to somewhat beyond the optimal value, and different microphone directivity patterns will be considered. These studies will be repeated for a few diameters of the speaker configuration to investigate whether the optimal array diameter depends on the size of the speaker layout and, if so, to characterize that dependence. Possible dependencies between the optimal microphone directivity patterns and array diameters will also be investigated and characterized. Arrays will be evaluated in critical listening tests according to criteria (a)-(d) stated above. Experiments will be guided by simulations providing an initial objective assessment of the ITD and ILD cues generated within the listening area. In parallel, mathematical models of the sound fields generated by the proposed technology will be investigated, which could provide additional insight into optimal microphone array design.

(ii) The impact of playback with cross-talk cancellation will be systematically investigated. Existing cross-talk cancellation algorithms will be used first and, if necessary, new algorithms that are numerically efficient and effective in a range of listening environments will be developed. Then optimal microphone arrays for playback with cross-talk cancellation will be investigated, i.e. the work described under (i) will be repeated for reproduction with cross-talk cancellation. Finally, the optimal systems with and without cross-talk cancellation will be compared.

(iii) Algorithms for direct/diffuse sound field separation will be studied. When the number of instruments does not exceed the number of microphones, multichannel equalization techniques can be used to find dry source signals, which can then be convolved with the direct/reverberant parts of room impulse responses to obtain the direct/diffuse sound field components, respectively (see the sketch after this list). Multichannel equalization in audio is, however, particularly challenging owing to excessively long impulse responses, and we will develop numerically efficient algorithms for multichannel equalization for audio applications. We will then study psychoacoustic approximations to direct/diffuse sound field decomposition with no restriction on the number of sources.

(iv) Combinations of near-coincident directional microphone arrays, for acquiring direct sound field cues, and widely spaced arrays based on omnidirectional or bidirectional microphones, for acquiring diffuse sound field cues, will be systematically investigated in critical listening tests according to criteria (a)-(d). This approach will be evaluated in comparison with the approach described in (i)-(iii), where the same array is used for both sound field components.
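As a simplified illustration of the direct/diffuse split mentioned in (iii), and under the assumption that the dry source signal and a room impulse response are already available, the decomposition could look like the following sketch (the 2.5 ms window after the direct-sound peak is a hypothetical choice, not a project specification):

```python
# A simplified sketch of the direct/diffuse split in (iii), assuming the dry
# source signal and the room impulse response are already available; the
# 2.5 ms window after the direct-sound peak is a hypothetical choice.
import numpy as np
from scipy.signal import fftconvolve

def direct_diffuse_components(dry, rir, fs, direct_window_ms=2.5):
    """Split the RIR around its direct-sound peak and convolve the dry signal
    with each part to obtain the direct and diffuse sound field components."""
    peak = int(np.argmax(np.abs(rir)))                  # direct-sound arrival
    split = peak + int(direct_window_ms * 1e-3 * fs)    # end of the direct part
    direct = fftconvolve(dry, rir[:split])
    diffuse = fftconvolve(dry, np.concatenate((np.zeros(split), rir[split:])))
    return direct, diffuse
```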

Other Projects

Covid Listening Project

Institution:

University of Surrey (UK)

Website:

Click here (opens new tab)

In the news:

One of the outputs of the project was featured on the prime-time Italian national TV programme diMartedì (approximately 2-3 million viewers).

My role:

I initiated the project in collaboration with Dr Milton Mermikides.

These uncertain times have brought into sharp focus the importance of scientific research and communication. Our understanding of the world is filtered through our conceptual and scientific knowledge, and the value of multi-modal representations of data is growing. In response, we have set up the Covid-19 Listening Project, dedicated to the sonic and musical representation of Covid-19 data.

In consultation with geneticist Gemma Bruno (Telethon Institute of Genetics and Medicine, Italy), programming and music technology resources are employed to communicate relevant genetic patterns in the disease. Data such as the structure of the Covid-19 genome, its points of mutation, protein coding regions, the comparative genetics in the phylogenetic tree of Covid-19 samples, and their geographical distribution are translated into pitches, rhythms, and harmonies to create rich and compelling communicative works.
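As a purely illustrative example of this kind of mapping (a toy sketch, not the mapping used in the pieces), a nucleotide sequence can be translated into pitches as follows:

```python
# An illustrative toy only, not the project's actual mapping: translating a
# nucleotide sequence into MIDI note numbers, one note per base.
NUCLEOTIDE_TO_PITCH = {"A": 60, "C": 63, "G": 67, "U": 70}  # hypothetical mapping

def sonify(sequence):
    """Map an RNA sequence to a list of MIDI note numbers, skipping unknown symbols."""
    return [NUCLEOTIDE_TO_PITCH[base] for base in sequence.upper()
            if base in NUCLEOTIDE_TO_PITCH]

print(sonify("AUGGCCUUUACG"))  # a made-up example fragment
```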

A 42-minute choral piece and a genomic-spatial realisation have already been produced using these techniques.

National Gallery X

Website:

Click here (opens new tab)

In the news:

The opening was featured in the Financial Times: [article]. The event saw the participation of Sir Tim Berners-Lee and Gabriele Finaldi, the National Gallery's director.

My role:

In collaboration with Prof Zoran Cvetkovic and Dr Ali Hossaini (https://pantar.com), the project's leader, I helped set up the surround sound system and accompanying auralisation technology within NGX, the National Gallery's new experimental space.

Working in partnership with King's College London, the National Gallery is setting out to create the sorts of new museum experiences technology could make possible in ten years' time. The challenge for NGX is to create these experiences today.

2019 marks 30 years since the birth of the World Wide Web and five years until the Gallery turns 200. It’s an important moment, when the next generation of technology could profoundly alter how art is created, presented and engaged with.

We are certain that new technologies, such as 5G, advanced robotics and artificial intelligence, will change the world in the next ten years, but how will they change the Gallery as it enters its third century?

How can contemporary artists, thinkers and creatives use these technologies in their practice and how can they help us understand old masters in new ways? How can they create new ways to access, engage and experience the Gallery and its collection?

Through a partnership with King’s College London, the National Gallery opened a studio at the Gallery. The studio will provide a space for residencies and events where artists and creatives can explore experimental technologies as well as King’s critical arts, humanities and social science research on culture and the (digital) creative industries. This exploration of current King’s research will be combined with the art and audiences of the Gallery to form a unique vision of the museum of the future.

Ouroboros

Website:

Click here (opens new tab)

My role:

Together with Prof Zoran Cvetkovic and Keir Vine, I helped Dr Ali Hossaini (https://pantar.com), the project's leader, set up a multichannel sound composition for the piece. The performance toured to the Click Festival (Denmark) and around the world.

“Ouroboros: The History of the Universe,” a 3-D visual collage of vibrating mandalas, exploding galaxies, astronauts and corporate logos, among much more, on six screens, all in the service of reconnecting consciousness and cosmos.
- The New York Times

Ouroboros is an attempt to reconcile science with spiritual longing. I call it a 'cathedral of science' because I want to erode the boundaries between art and science: the split between knowing and feeling that's led humanity towards extinction.

The Piano

Institution:

King's College London (UK)

Website:

Click here (opens new tab)

My role:

Together with Prof Zoran Cvetkovic and Prof Huseyin Hacihabiboglu, I helped create a multichannel sound experience around performances by piano superstar Yuja Wang.

Imagine watching a natural history documentary and feeling transported to the Amazon rainforest itself by incredibly convincing, realistic surround sound.

That’s just the hope of researchers who say they have made a major technology breakthrough which could transform the auditory landscapes of live entertainment, film, music, gaming and virtual reality.

The system has room simulators that actually sound like the spaces they are trying to emulate, whether it is a broadcast from the Royal Opera or an event at a massive sporting arena.

Current commercial surround sound systems do not really deliver, they say, whereas advanced solutions like wave field synthesis use hundreds of channels, making them impractical for mainstream use.

This technology is pragmatic, they say, and can be used either for recording and live broadcasting, using an array of microphones, or in a ‘virtual’ form, as synthesised playback that creates a convincing illusion of a certain space.

The King’s team behind the research has a number of EU and US patents for the technology and is currently in talks with major entertainment companies.

Project leader Professor Zoran Cvetkovic of King’s Department of Informatics said: ‘What we want to achieve is to reproduce the “real thing”: a very crisp and clear auditory perspective which places you faithfully inside that space, rather than somewhere else that sounds unreal.’

As visual technology, VR and 3D visuals rapidly advance, it is vital that sound matches what you see, he says.

To explore the technology’s use, globally acclaimed concert pianist Yuja Wang visited King’s earlier this year for a ground-breaking experiment. With a team from renowned events company 59 Productions and members of King’s NMS, they recorded Ms Wang in an adapted space in the Great Hall and then invited an audience to witness its playback in an immersive, audio-visual experience.

Ms Wang said: ‘I’ve loved it. Playing old pieces in a very new way. It’s an experience, and experimental. They attached cameras all over me; on my arms, my shoulder, my chest.

‘The audio recordings are great quality. And they can manipulate the sound as if you are in a really small room or in the Carnegie Hall. I like the idea of being able to hear what’s happening in and around the piano. And it’s King’s Audio Department who are doing all of that. Amazing.’

