Acoustic Keystroke Recovery - Reconstructing Typed Text from a Laptop Microphone (Full Guide, 85% success rate)

What: A walkthrough of how typed text can be recovered from audio captured by a laptop's built-in microphone.
Impact: Could expose sensitive information during video calls or voice recordings.

Before anything else, the boring but necessary question: who actually gets attacked by this?

The realistic threat model is not an attacker physically planting a microphone next to your keyboard. The realistic threat model is the microphone you voluntarily turned on and pointed at your keyboard six hours a day:

Video conferencing. Zoom, Teams, Meet, Discord. The other participants - or anyone who got a recording - have a usable audio capture of every key you pressed during the call, including the password you typed when you alt-tabbed to log into the production database.

Voice notes and phone calls. A laptop on a desk is a near-field microphone. A phone call placed on speaker next to a keyboard is the same.

Compromised endpoints. Malware does not need root or kernel access to record audio on most desktop OSes. Browser tabs sometimes do not need it either.

Public spaces. Coffee shops, conference rooms, libraries. With the microphones in modern flagship handsets, a phone on the next table is a sufficient capture device.

This is not a theoretical attack. Asonov and Agrawal demonstrated the first practical version in 2004. Zhuang, Zhou, and Tygar layered language models on top in 2005. In 2023, Harrison, Toreini, and Mehrnezhad published results of around 95% top-1 accuracy on a MacBook Pro using only a smartphone recording from 17 cm away. The capability has been in the open literature for two decades. What has changed is that the models are now small enough to train on a laptop in an afternoon.

The point of this writeup is not to enable attacks. It is to make the threat concrete enough that you actually mute your microphone when you are about to type something sensitive. If you finish reading and do exactly that, the writeup paid for itself.

Why it works (the physics)

Press the A key on a typical chiclet laptop keyboard. The microphone picks up roughly three things:

The push event. The mechanical click of the key reaching the bottom of its travel - the dome collapsing or the scissor mechanism bottoming out. Sharp transient, broadband, ~5–10 ms duration.

The release event. The dome springing back up. Lower amplitude, slightly different spectral content. ~10–30 ms after the push.

The chassis resonance. The keystroke vibrates the laptop's case, and the case rings briefly at its own characteristic frequencies. This is the part that leaks position.

The third part is where the attack lives. Different keys are different distances from the case's resonant nodes, hit slightly different points on the same shared keyboard plate, and excite slightly different vibrational modes. The differences are small - humans cannot reliably distinguish them - but they are stable, repeatable, and large enough for a neural network to pick up.

There is also a behavioral component: most people press different keys with different fingers, at slightly different angles and pressures, which adds another layer of class-distinguishing signal. This is why models trained on one user generalize poorly to other users on the same keyboard, but extremely well to the same user on the same keyboard. It is also why touch typists are easier to attack than hunt-and-peck typists - touch typing is more consistent.

The relevant audio band is roughly 400 Hz to 12 kHz. Below 400 Hz you mostly capture room noise and HVAC; above 12 kHz the keystroke energy has fallen off and you are mostly recording microphone hiss. Standard 44.1 kHz or 48 kHz mono capture is more than enough - there is no need for a fancy mic.

The setup

Hardware:

Any laptop with a built-in microphone that you will use both as the recording device and the typing target. Using the same machine for both roles makes data collection drastically simpler.

No external mic needed. The point of the exercise is to show what happens with the microphone you already have. Adding a Blue Yeti improves results but obscures the actual threat.
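To make the 400 Hz to 12 kHz band from the physics section concrete, here is a minimal sketch of a band-pass filter applied to a synthetic signal. It is an illustration under assumptions, not the article's own code: the Butterworth design, the filter order, the function names `bandpass` and `tone_amplitude`, and the test tones are all choices made for this example.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(signal, sample_rate, low_hz=400.0, high_hz=12_000.0, order=4):
    """Zero-phase Butterworth band-pass (hypothetical helper, assumed design)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass",
                 fs=sample_rate, output="sos")
    # sosfiltfilt runs the filter forward and backward, so there is no phase shift.
    return sosfiltfilt(sos, signal)


def tone_amplitude(signal, sample_rate, freq_hz):
    """Amplitude of the FFT bin closest to freq_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) * 2.0 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return spectrum[np.argmin(np.abs(freqs - freq_hz))]


# Synthetic one-second test signal: a 100 Hz hum (standing in for room/HVAC
# noise) plus a quieter 3 kHz tone that sits inside the band of interest.
sample_rate = 48_000
t = np.arange(sample_rate) / sample_rate
hum = np.sin(2 * np.pi * 100 * t)
in_band = 0.5 * np.sin(2 * np.pi * 3000 * t)

filtered = bandpass(hum + in_band, sample_rate)

print("100 Hz amplitude after filtering:", tone_amplitude(filtered, sample_rate, 100))
print("3 kHz amplitude after filtering:", tone_amplitude(filtered, sample_rate, 3000))
```

The low-frequency hum should be strongly attenuated while the in-band tone passes almost unchanged. The same filter works at 44.1 kHz, since 12 kHz is still below that rate's Nyquist limit.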
Software (Python 3.11+, all from pip):

    python -m venv venv
    source venv/bin/activate
    pip install numpy scipy librosa soundfile sounddevice pynput torch tqdm matplotlib

A note on pynput: this is the keylogger. It runs locally, on your own machine, recording your own keystrokes for the express purpose of building a labeled dataset. Do not run a keylogger on a machine that is not yours. This is not a legal grey area; it is a black one in most jurisdictions, including the EU under the GDPR and various national computer-misuse acts. If you would not be comfortable explaining what you are doing to a judge, do not do it. The rest of this tutorial assumes you are working on hardware you own and recording only your own keystrokes.

Working directory layout:

    keystroke-attack/
    ├── collect.py     # Records audio + keystroke labels in parallel
    ├── extract.py     # Splits raw audio into per-keystroke clips
    ├── features.py    # Mel-spectrogram extraction
    ├── train.py       # CNN training loop
    ├── predict.py     # Inference on new audio
    ├── data/
    │   ├── raw/       # WAV files + JSON keystroke ...
