Smart AI Removes Sensitive Attributes from Speech Recordings

Artificial intelligence grew out of the idea of building machines that carry out tasks on our command. Voice assistants such as Alexa and Siri, for example, activate when their user speaks a voice command. One of the biggest drawbacks of these systems, however, is that speech data can give away private attributes such as the speaker's gender.

In a recent study, researchers at the Chalmers University of Technology and the RISE Research Institutes of Sweden found that voice recognition systems can give away private information, putting users at risk. They developed a system that acts as a filter to obfuscate attributes such as gender in speech data and then generates new, synthetic information independent of the filtered details, so that sensitive information remains hidden without sacrificing realism or utility.

Maintaining privacy with voice assistants is a challenging task, given that state-of-the-art AI techniques are used to infer attributes such as intention, gender, emotion, and identity from timbre, pitch, and speaking style. Recent reporting revealed that accidental voice assistant activations exposed private conversations; the risk is such that law firms, including Mishcon de Reya, have advised staff to mute smart speakers when they discuss client matters. Voice platforms such as Google Assistant, Siri, and Cortana allow users to delete recorded data, but doing so takes some effort, and in several cases substantial effort.

The researchers built their system around a generative adversarial network (GAN), a two-part AI model: a generator that creates samples and a discriminator that tries to distinguish the generated samples from real-world samples. The system maps speech recordings to Mel spectrograms, which represent the frequency spectrum of the audio signal as it varies over time, and passes them through a filter that removes sensitive information and a generator that adds synthetic information in its place. PCMelGAN then inverts the Mel spectrogram output into audio in the form of a raw waveform.
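To make the Mel spectrogram step more concrete, here is a minimal Python sketch of how a waveform can be turned into one using only NumPy. It is not the researchers' code: the function names, the frame size, hop length, number of Mel bands, and the HTK-style Mel formula are all assumptions chosen for illustration, and the input is a synthetic tone rather than real speech.

```python
import numpy as np

def hz_to_mel(hz):
    # HTK-style Mel formula (one common convention; an assumption here)
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the Mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge points: filter m uses edges m, m + 1, m + 2
    edges = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    )
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=40):
    """Return Mel-band power with shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum of each frame: shape (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Pool FFT bins into Mel bands
    return mel_filterbank(sr, n_fft, n_mels) @ power.T

# Usage: one second of a synthetic 440 Hz tone at 16 kHz (not real speech)
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
mel = mel_spectrogram(tone, sr)
print(mel.shape)  # (40, 122)
```

Each column of the result describes one short slice of time, and each row one Mel band, which is why a model can treat the spectrogram like an image: a filter can alter the bands that carry a sensitive attribute before a separate step turns the result back into a waveform.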

The experiment was performed on 10,000 samples. When a recording carried an attribute that the algorithm recognized as private information, such as the speaker's gender, the system filtered it out and altered the voice so that external parties could no longer recognize that attribute.

Overall, if the approach works in practice as designed, it would be a major step toward safer and more secure use of digital platforms.