2026-03-15 | Arjoonn S
Simple and elegant voice typing on your laptop using your phone.
About STT Bridge
Laptop microphones are mediocre and prone to CPU fan noise. Phone keyboards have had excellent speech-to-text built in for years. The STT Bridge exploits that gap: it serves a tiny web page from your laptop, you open it on your phone, and every keystroke from your phone's voice keyboard is replayed on your laptop via xdotool. No cloud service, no app install on your phone, no special driver - just a browser tab.
Get started with a one-liner on Linux
# Download pre-built binary
curl -L https://midpath.in/cdn/sttbridge/stt -o stt && chmod +x stt
./stt # listens on :8080 by default
| Tool | Why / Why not? |
|---|---|
| STT Bridge | Single binary on laptop, quick and good. |
| KDE Connect | Daemon on laptop, app on phone. Keyboard event transmission / configuration issues. |
| Remote Mouse / similar | Invasive install on laptop. Ads on phone app. |
| whisper.cpp / similar | Complex setup! Almost always runs into NVIDIA issues! No easy way to share the setup with teammates. |
KDE Connect is great, but it requires installing an app on the phone and a daemon on the laptop, and it is tightly coupled to the KDE ecosystem. Remote Mouse (and similar tools) require a proprietary app on the phone. whisper.cpp runs an ASR model locally on the laptop, which is quite powerful if, and only if, you can get it running.
STT Bridge has none of these trade-offs: the phone's built-in voice keyboard already runs a fast, high-quality on-device model; we just pipe its output to the laptop over HTTP.
The whole system is a single Go binary (~200 lines) that does two things: serve a web UI and translate incoming text into X11 keystrokes.
Because keystrokes are injected at the X11 level through xdotool, whatever application has focus receives them - exactly as if you had typed on a physical keyboard. This means you can dictate into any application that accepts keyboard input.
Focus the window, switch to your phone, dictate. That's it.
Because the keystrokes land exactly like real keyboard input, you can freely mix voice and keyboard typing in the same document. Type a sentence on your laptop, continue with a voice paragraph on your phone, then jump back to the keyboard to fix a word - all without switching context in the application. This makes STT Bridge genuinely powerful for long-form writing, coding comments, and chat: use whichever input mode is faster for the next thing you want to say.
Modern on-device STT keyboards (Google Gboard, Samsung Keyboard, SwiftKey, etc.) do a lot of heavy lifting that you would otherwise need to implement yourself in a model-based pipeline:
Say "comma", "question mark", or "exclamation mark" and the correct character appears; many keyboards also handle currency, percentages, and common symbols by voice.

All of this happens before the text even reaches the server - you get clean, formatted output without writing a single line of post-processing code.
Mobile STT keyboards don't emit individual keystrokes; they replace the entire textarea value on each
recognition event. The server keeps the previous text it received. On each
POST /type it computes a longest-common-prefix diff:
- xdotool key BackSpace once per character that was deleted.
- xdotool type for the characters that were added.

This keeps keystrokes minimal and avoids retyping the entire transcript on every update. A mutex serialises concurrent requests so xdotool calls never interleave.
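The longest-common-prefix diff can be sketched in a few lines of Go. This is an illustration of the technique rather than the code in the STT Bridge binary: the names `diff` and `plan` are hypothetical, counting one BackSpace per Unicode code point is an assumption, and so is the use of the conventional `--` end-of-options marker before the typed text. The sketch only builds the xdotool argument lists; it does not execute them.

```go
package main

import "fmt"

// diff compares the text the server last received (prev) with the new
// textarea value (next). It returns how many trailing characters of prev
// must be deleted and which characters must then be typed.
func diff(prev, next string) (backspaces int, added string) {
	p, n := []rune(prev), []rune(next) // compare by rune so multi-byte characters count once
	k := 0
	for k < len(p) && k < len(n) && p[k] == n[k] {
		k++ // k ends up as the length of the longest common prefix
	}
	return len(p) - k, string(n[k:])
}

// plan renders the diff as xdotool argument lists without running them.
func plan(backspaces int, added string) [][]string {
	var cmds [][]string
	if backspaces > 0 {
		keys := []string{"xdotool", "key"}
		for i := 0; i < backspaces; i++ {
			keys = append(keys, "BackSpace") // one key press per deleted character
		}
		cmds = append(cmds, keys)
	}
	if added != "" {
		// Assumption: "--" stops option parsing so text starting with "-" is typed literally.
		cmds = append(cmds, []string{"xdotool", "type", "--", added})
	}
	return cmds
}

func main() {
	prev := "their cat"
	for _, next := range []string{"there cat", "there cat sat"} {
		b, a := diff(prev, next)
		fmt.Printf("%q -> %q: %v\n", prev, next, plan(b, a))
		prev = next // the server remembers the latest text for the next request
	}
}
```

When the keyboard corrects "their cat" to "there cat", the shared prefix is "the", so the plan is six BackSpace presses followed by typing "re cat". A pure append such as "there cat" to "there cat sat" needs no deletions at all.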
By default the server listens on your LAN. If you want to dictate from your phone while your laptop is on a different network (say, your laptop is connected to public Wi-Fi and your phone is on mobile data), you can still get it to work: install a VPN like Tailscale on both devices.
Tailscale assigns each device a stable private IP (e.g. 100.x.y.z) that works regardless
of which physical network either device is on. Point your phone's browser at
http://<tailscale-ip>:8080 and everything works exactly the same way, with
end-to-end WireGuard encryption for free.
No port forwarding, no dynamic DNS, no VPN configuration files - just install Tailscale and go.
Install xdotool by following its installation instructions, then download the stt binary and run it.
# Download pre-built binary
curl -L https://midpath.in/cdn/sttbridge/stt -o stt && chmod +x stt
./stt # listens on :8080 by default
Open http://<your-laptop-ip>:8080 on your phone, and start typing away!
You can also put the binary into a Docker container along with xdotool to avoid installing xdotool on your system entirely. Docker also lets you remap ports. We publish the output of docker image save, which you can load with:
curl -L https://midpath.in/cdn/sttbridge/stt.tar.gz | docker load
Then allow Docker to send keystrokes to your X session and run the container:
# Allow local Docker containers to send keystrokes to your X session
xhost +local:docker
# -p remaps the port in case 8080 is occupied on the host
# --restart always starts the container again after a reboot (it cannot be combined with --rm)
# DISPLAY and the /tmp/.X11-unix socket let xdotool reach your X server
docker run \
  -p "39215:8080" \
  --restart always \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  midpath_stt_bridge:latest
xdotool does not work on Wayland (yet) or macOS. Support for other systems may be added based on demand.

STT Bridge is MIT-licensed and free to use, fork, and modify. If it saves you time or you just want to encourage more tools like this, consider supporting us by paying for this tool.
Every contribution helps us spend more time on open, dependency-light developer tools.