Credit: Unsplash/CC0 Public Domain
Sound is a powerful source of information. When algorithms are trained to identify distinct sound signatures, audio can reveal what a person is doing, whether that is cooking, vacuuming or washing dishes. And while it is valuable in certain contexts, using sound to identify activities comes with privacy concerns, since microphones can reveal sensitive information.
To enable audio sensing without compromising privacy, researchers at Carnegie Mellon University have developed an on-device filter, called Kirigami, that can detect and remove segments of human speech collected by audio sensors before the audio is used for activity recognition.
“The data contained in sound can help power valuable applications such as activity recognition, health monitoring and even environmental sensing. However, this data can also be used to invade people’s privacy,” said Sudershan Boovaraghavan, who earned his Ph.D. from the Software and Societal Systems Department (S3D) in CMU’s School of Computer Science. “Kirigami can be installed on a variety of microphone-equipped sensors deployed in the field to filter out speech before the data leaves the sensor, thus protecting people’s privacy.”
Many existing techniques for preserving privacy in audio sensing involve altering or transforming the data, for example by excluding certain frequencies from the audio spectrum or training the computer to ignore human speech. While these methods are fairly effective at making conversations unintelligible to humans, generative AI has complicated matters. Speech recognition programs like Whisper by OpenAI can piece together fragments of conversations from processed audio that was once indecipherable.
“Given the large amount of data these models have, some of the previous techniques would leave enough residual information, small snippets, that can help recover part of the speech content,” said Yuvraj Agarwal, an associate professor in S3D, the Human-Computer Interaction Institute (HCII) and the Department of Electrical and Computer Engineering. “Kirigami can prevent these models from having access to those snippets.”
In today’s world, devices like smart speakers that prioritize utility over privacy can essentially listen to everything people say. Although the most aggressive option for preserving privacy is to avoid microphones altogether, doing so would prevent people from reaping the benefits of a powerful sensing medium. Agarwal and his collaborators wanted to find a solution for developers that would let them balance privacy and utility.
The researchers’ intuition was to design a lightweight filter that could run on even the smallest and most affordable microcontrollers. This filter could then identify and remove likely speech content so that sensitive data never leaves the device, an approach often called processing on the edge.
The filter works as a simple binary classifier that determines whether speech is present in the audio. The team designed the filter by empirically analyzing how much leaked speech content deep learning-based automatic speech recognition models could still recognize.
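The idea of a binary speech filter can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kirigami's implementation: the function names, the frame length and the energy-based `toy_is_speech` classifier are hypothetical stand-ins for whatever trained model a real deployment would use.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def split_frames(samples: Sequence[float], frame_len: int) -> List[Frame]:
    """Split samples into consecutive frames; the last frame may be shorter."""
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples), frame_len)]


def filter_speech(samples: Sequence[float],
                  frame_len: int,
                  is_speech: Callable[[Frame], bool]) -> List[float]:
    """Replace every frame the classifier flags as speech with silence."""
    cleaned: List[float] = []
    for frame in split_frames(samples, frame_len):
        if is_speech(frame):
            cleaned.extend([0.0] * len(frame))  # drop the frame's content
        else:
            cleaned.extend(frame)               # pass non-speech through
    return cleaned


def toy_is_speech(frame: Frame) -> bool:
    """Hypothetical stand-in classifier: flags frames with high mean energy.

    A real filter would use a trained model; this rule only exists so the
    sketch runs end to end.
    """
    energy = sum(x * x for x in frame) / len(frame)
    return energy > 0.25


# Made-up samples: a quiet frame, a loud frame, and a short quiet tail.
audio = [0.1, -0.1, 0.1, -0.1, 0.9, -0.9, 0.9, -0.9, 0.05, 0.0]
cleaned = filter_speech(audio, frame_len=4, is_speech=toy_is_speech)
```

In this sketch only the loud middle frame is silenced, and the output keeps the same length as the input so downstream activity recognition still sees aligned audio.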
Kirigami also balances how aggressively it removes possible speech content with a configurable threshold. With an aggressive threshold, the filter prioritizes removing speech but may also clip some non-speech audio that could be useful for other applications. With a less aggressive threshold, the filter lets more environmental and activity sounds pass through for better application utility, but increases the risk that some speech-related content makes it off the sensor.
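The trade-off can be made concrete with a small sketch. It assumes the classifier outputs a speech probability per audio frame and that frames at or above the threshold are dropped, so a lower threshold is the more aggressive setting. The probabilities and labels below are made-up toy values, not results from the papers, and the function names are hypothetical.

```python
from typing import List, Tuple


def frames_to_drop(speech_probs: List[float], threshold: float) -> List[bool]:
    """Return True for each frame whose speech probability meets the threshold."""
    return [p >= threshold for p in speech_probs]


def tradeoff(speech_probs: List[float],
             is_speech_truth: List[bool],
             threshold: float) -> Tuple[int, int]:
    """Count (speech frames that leak through, non-speech frames dropped)."""
    dropped = frames_to_drop(speech_probs, threshold)
    leaked = sum(1 for d, s in zip(dropped, is_speech_truth) if s and not d)
    lost = sum(1 for d, s in zip(dropped, is_speech_truth) if d and not s)
    return leaked, lost


# Toy per-frame scores and ground-truth labels (illustrative only).
probs = [0.95, 0.70, 0.40, 0.35, 0.10, 0.05]
truth = [True, True, True, False, False, False]

for threshold in (0.3, 0.5, 0.8):
    leaked, lost = tradeoff(probs, truth, threshold)
    print(f"threshold={threshold}: leaked speech={leaked}, lost non-speech={lost}")
```

On these toy values, the lowest threshold lets no speech through but discards one non-speech frame, while the highest threshold keeps all non-speech audio and lets two speech frames escape.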
“Kirigami removes most of the speech content, but not the other ambient sounds that matter for activity recognition,” said Haozhe Zhou, an S3D Ph.D. student who led the project with Boovaraghavan. “You can always couple it with previous techniques to give you additional privacy.”
Researchers are currently exploring many useful applications of activity sensing. For example, Mayank Goel, an associate professor in S3D and HCII, uses audio sensing to remind people living with dementia of daily tasks, monitor children with attention-deficit/hyperactivity disorder for behavioral abnormalities, and assess students for signs of depression.
“These are just examples of work being done in our labs,” said Goel. “You will find similar scenarios worldwide where you need noninvasive data from a person about their daily life.”
As interest in smart home infrastructure and the Internet of Things continues to grow, the team believes developers could easily modify Kirigami to meet their specific privacy needs.
Papers detailing Kirigami appeared in both the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies and ACM MobiCom ’24: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking.
More information:
Haozhe Zhou et al, On-Device Speech Filtering for Privacy-Preserving Acoustic Activity Recognition, Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (2024). DOI: 10.1145/3636534.3698865
Sudershan Boovaraghavan et al, Kirigami: Lightweight Speech Filtering for Privacy-Preserving Activity Recognition using Audio, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (2024). DOI: 10.1145/3643502
Provided by Carnegie Mellon University
Citation: Audio privacy protection: Speech filtering technology balances privacy and utility in smart devices (2025, April 21), retrieved April 21, 2025