A team of researchers from British universities has trained a deep learning model that can recover data from keystrokes recorded with a microphone with 95% accuracy. When the sound classification algorithm was trained on keystrokes recorded over Zoom, prediction accuracy dropped only slightly, to 93%, which is still alarmingly high and a record for that medium.
This type of attack poses a serious threat to data security, as it can potentially expose sensitive information like passwords, discussions, messages, and other confidential data to malicious third parties.
Other side-channel attacks often require specific conditions and are limited by data rates and distances. Acoustic attacks are particularly concerning because they have become much simpler to execute: microphone-equipped devices capable of capturing high-quality audio are now everywhere.
With the rapid progress in machine learning, sound-based side-channel attacks have become more feasible and much more dangerous than previously imagined.
Eavesdropping on keyboard inputs
The first step of the attack is to record keystrokes on the target’s keyboard, since that data is needed to train the prediction algorithm. This can be done with a nearby microphone or by infecting the target’s phone with malware that has access to its microphone.
Alternatively, keystrokes can be recorded during a Zoom call, where an unauthorized participant correlates the typed messages with the sound recordings.
To obtain the necessary training data, the researchers pressed 36 keys on a modern MacBook Pro 25 times each, recording the sound produced by each keystroke. They then transformed these recordings into waveforms and spectrograms, which reveal distinct patterns for each key. Specific data processing techniques were employed to enhance the signals for keystroke identification.
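As a rough illustration of the waveform-to-spectrogram step, the sketch below computes a plain short-time Fourier magnitude spectrogram using only the Python standard library. It is not the researchers' pipeline: the synthetic "keystroke" (a decaying 2 kHz tone), the 16 kHz sample rate, and the frame and hop sizes are all assumptions chosen to keep the example small.

```python
import cmath
import math

SAMPLE_RATE = 16000  # Hz, assumed for this sketch
FRAME_SIZE = 64      # samples per analysis frame, assumed
HOP = 32             # samples between frame starts, assumed


def spectrogram(samples, frame_size=FRAME_SIZE, hop=HOP):
    """Return one list of magnitudes (bins 0..frame_size//2) per frame."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k; fine for a demo, too slow for real audio.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Synthetic stand-in for a recorded keystroke: a decaying 2 kHz tone.
tone = [math.exp(-n / 200) * math.sin(2 * math.pi * 2000 * n / SAMPLE_RATE)
        for n in range(512)]

spec = spectrogram(tone)
# Strongest frequency bin in the first frame, converted to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
```

Each row of `spec` is one time slice and each column one frequency band, which is what lets the recording be treated as an image by a classifier.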
Using the spectrogram images, the researchers trained ‘CoAtNet,’ an image classifier. They tuned hyperparameters such as the number of epochs, the learning rate, and the data split to achieve the best prediction accuracy.
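To make the data-splitting step concrete, here is a minimal sketch of a per-key (stratified) split of 36 keys with 25 presses each. The choice of letters and digits as the 36 keys, the 19/3/3 split per key, and the function name `stratified_split` are assumptions for illustration, not values taken from the paper.

```python
import random

# Assumed key set: 26 letters plus 10 digits gives 36 keys.
KEYS = "abcdefghijklmnopqrstuvwxyz0123456789"
PRESSES_PER_KEY = 25


def stratified_split(keys, presses_per_key, n_val=3, n_test=3, seed=0):
    """Split (key, press_index) pairs so every key appears in every subset."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for key in keys:
        indices = list(range(presses_per_key))
        rng.shuffle(indices)
        test += [(key, i) for i in indices[:n_test]]
        val += [(key, i) for i in indices[n_test:n_test + n_val]]
        train += [(key, i) for i in indices[n_test + n_val:]]
    return train, val, test


train, val, test = stratified_split(KEYS, PRESSES_PER_KEY)
```

Splitting per key rather than over the pooled 900 recordings guarantees that no key is missing from the validation or test set, which matters when each class has only 25 examples.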
Throughout their experiments, the researchers used the same laptop, whose keyboard design has been common across Apple laptops for the past two years. Recordings were made with an iPhone 13 mini placed 17 cm from the target and, for some samples, over Zoom.
The CoAtNet classifier achieved 95% accuracy on the smartphone recordings and 93% on recordings captured through Zoom. Skype produced a lower but still usable 91.7%.
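These figures are per-keystroke accuracies. As a back-of-the-envelope illustration, and not a result reported by the researchers, the sketch below estimates how often an entire typed string would be recovered without a single error if each key were classified independently. The independence assumption and the function name `whole_string_accuracy` are mine. Under that assumption, a ten-character string at 95% per-key accuracy comes out fully correct only about 60% of the time.

```python
def whole_string_accuracy(per_key_accuracy, length):
    """Chance that all `length` keys are classified correctly,
    assuming errors on different keys are independent."""
    return per_key_accuracy ** length


# Per-key accuracies quoted in the article, by recording medium.
REPORTED = [("phone microphone", 0.95), ("Zoom", 0.93), ("Skype", 0.917)]

for medium, accuracy in REPORTED:
    for length in (8, 12):
        chance = whole_string_accuracy(accuracy, length)
        print(f"{medium}: {length} keys -> {chance:.1%} fully correct")
```

Even a partially wrong guess narrows the search considerably, so this estimate should not be read as a measure of how safe a long password is.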
Potential countermeasures or remedies
The paper proposes several strategies that users concerned about acoustic side-channel attacks can employ. These include modifying their typing styles or using randomized passwords to make it harder for attackers to infer sensitive information from keystroke sounds.
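For the randomized-password suggestion, a minimal sketch using Python's standard `secrets` module is shown below. The 16-character length and the mixed-case alphanumeric alphabet are assumptions, not recommendations from the paper. One plausible reading of the advice is that a random string gives an attacker no dictionary word to fall back on when a few keys are misclassified.

```python
import secrets
import string

# Assumed alphabet: upper- and lower-case letters plus digits.
ALPHABET = string.ascii_letters + string.digits


def random_password(length=16):
    """Build a password from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = random_password()
```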
Other defensive measures include software that plays back keystroke sounds, white noise, and software-based keystroke audio filters, all of which aim to obscure the keystrokes in any captured audio.
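The white-noise idea can be simulated in a few lines. The sketch below mixes Gaussian white noise into a synthetic keystroke at a chosen signal-to-noise ratio, approximating what a microphone might pick up when noise is playing. The signal, the 0 dB target, and the function names are assumptions, and the sketch says nothing about how much noise is needed to defeat the classifier.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in samples) / len(samples)


def add_white_noise(signal, snr_db, seed=0):
    """Return (signal + noise, noise) with the noise scaled to hit snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # SNR in dB is 10*log10(signal_power / noise_power); solve for noise power.
    target_noise_power = power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    scaled = [n * scale for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled)]
    return mixed, scaled


# Synthetic stand-in for a keystroke: a short decaying tone.
keystroke = [math.exp(-n / 200) * math.sin(2 * math.pi * n / 8)
             for n in range(512)]

masked, noise = add_white_noise(keystroke, snr_db=0.0)
measured_snr_db = 10 * math.log10(power(keystroke) / power(noise))
```

At 0 dB the noise carries as much power as the keystroke itself; lowering `snr_db` buries the keystroke further.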
However, the attack model proved effective even against very quiet keyboards, so adding sound dampeners to a mechanical keyboard or switching to a membrane keyboard is unlikely to help much.
For enhanced security, the paper recommends considering biometric authentication when possible, and relying on password managers to reduce the need for manual input of sensitive information, thus further mitigating the risk of acoustic side-channel attacks.