Speech-to-Text Without the Internet

Your audio never
leaves your machine.

AirScribe runs AI speech recognition entirely on-device. No uploads. No subscriptions. No cloud dependency. Built for teams that handle sensitive audio.

Processing
100% local
Latency
<2s on-device
Languages
99+ supported
Live Transcription Local Mode
14:32:07 Dr. Reyes: The patient presented with intermittent chest discomfort and elevated blood pressure consistent with the past three visits. We'll adjust the dosage accordingly.

14:32:18 Nurse Chen: Noted. I'll update the chart and schedule the follow-up for two weeks out.
Manifesto

The cloud was a necessary compromise. It no longer is.

In 2019, you had no choice. Your phone couldn't run a state-of-the-art speech recognition model. Every transcription shipped your audio to someone else's server. That made sense when the alternative was nothing.

Now your laptop runs Whisper faster than most cloud APIs return results.

The economics have flipped. Cloud subscriptions cost $15–30 per month. Over three years that's $540–$1,080. AirScribe runs the same model on your hardware for a one-time cost.

But cost is secondary to risk. When your audio hits a server, it's no longer yours. It can be stored, analyzed, leaked, subpoenaed, or resold. The Terms of Service you agreed to may give that company perpetual rights to use your voice data for training.

For a clinician discussing a patient, a lawyer preparing testimony, or a journalist protecting a source — that exposure is unacceptable. AirScribe exists because some conversations should never leave the room.

How It Works

State-of-the-art AI.
Zero network dependency.

AirScribe runs a quantized Whisper model directly on your hardware. The same architecture powering millions of cloud transcriptions — but private, offline, and yours to control.

Quantized Whisper
INT8-quantized Whisper Large runs at ~1.5x realtime on modern laptops. No GPU required. The model is downloaded once and cached locally — subsequent sessions start instantly.
WebAssembly (WASM)
All inference runs in a WebAssembly sandbox. Zero native installation required. AirScribe works in any modern browser on Windows, macOS, and Linux — from a single binary or a web page.
🔒
No Server Contact
The model file lives on disk. Audio stays in browser memory during processing. No WebSocket connections, no telemetry, no heartbeat pings. AirScribe generates zero outbound network traffic during transcription.
📋
Speaker Diarization
Pyannote-based speaker segmentation labels different voices automatically. Output includes timestamps, speaker labels, and confidence scores — ready for medical or legal documentation.
🌐
Offline API
A local HTTP API accepts audio and returns JSON transcripts. Drop-in replacement for cloud STT providers. Integrate into existing pipelines without changing your workflow — just point to localhost.
🎯
99+ Languages
Multilingual transcription out of the box. English accuracy rivals human transcriptionists on clean audio. Mixed-language conversations, code-switching, and accented speech handled natively.
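
For readers curious what "INT8-quantized" means in the Quantized Whisper card above, here is a minimal sketch of symmetric per-tensor quantization. It is a generic illustration, not AirScribe's actual scheme, which this page does not describe; the function names and sample weights are made up for the example.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


def storage_bytes(n_params: int, bits_per_param: int) -> int:
    """Raw weight storage, ignoring file headers and per-tensor scales."""
    return n_params * bits_per_param // 8


# Illustrative weights only; real model tensors hold millions of values.
weights = [0.8, -1.27, 0.05, 0.0, 0.4]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding moves each weight by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)

# 8-bit storage is a quarter of 32-bit storage for the same parameter count.
print(storage_bytes(1_000_000, 32) // storage_bytes(1_000_000, 8))
```

The trade-off is a small rounding error per weight in exchange for a model file roughly four times smaller than its 32-bit equivalent, which is what makes a one-time download and CPU-only inference practical.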
Privacy

Compliance-ready.
Out of the box.

AirScribe's architecture eliminates entire categories of data risk. No data leaves your device — so there's nothing to breach, nothing to audit, nothing to misconfigure.

🛡 Security Audit Checklist
No audio stored on servers
Processing happens in-memory. Audio is never written to disk or transmitted.
No training on user data
Model runs locally. Your audio is never used for model training or improvement.
HIPAA-ready architecture
PHI never leaves the device. Meets HIPAA Safe Harbor requirements for covered entities.
Zero outbound traffic
After model download, AirScribe operates fully offline. Network can be disabled.
No third-party SDKs in stack
Fully open-source stack. No black-box dependencies, no telemetry, no third-party calls.

Traditional cloud transcription creates a paper trail that compliance teams spend months auditing. AirScribe eliminates the audit surface entirely — the data was never there to protect.

  • No PHI transmitted to external systems
  • No data residency concerns — audio stays in the room
  • Works in air-gapped environments
  • No vendor risk — no SaaS provider to breach
  • Suitable for attorney-client privileged material
  • Journalist source protection built in
Use Cases

Built for sensitive
environments.

AirScribe was designed for professions where confidentiality isn't optional — it's the job requirement.

Clinical Documentation
Ambient scribing for medical encounters. Patient conversations never leave the exam room. Compatible with major EHR systems via local API.
Legal Transcription
Deposition recording and attorney notes. Covers attorney-client privilege, work product doctrine, and court reporting requirements natively.
🎭
Journalism & Research
Interview transcription with source protection. Air-gapped deployment for investigative reporting in hostile environments.
💼
Financial Services
FINRA compliance recording, advisor notes, and client meetings. Works in regulated environments where cloud transcription is prohibited.
🏫
Government & Defense
Classified environment transcription. STIG-compliant deployment for facilities without internet connectivity.
🎓
Academic Research
IRB-compliant participant interviews. Human subjects data never leaves campus infrastructure. Works with institutional firewall policies.
🛠
Field Operations
Remote transcription in low-connectivity environments. Maritime, field research, and disaster response — anywhere the network doesn't reach.
🎯
Developer Integration
Local HTTP API for pipeline integration. Drop-in replacement for Deepgram, AssemblyAI, or Whisper API — same request format, full data control.
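
As a rough sketch of what "just point to localhost" could look like from a pipeline script, the snippet below builds a POST request for a local transcription server. The port, path, and content type are placeholders chosen for illustration, since this page does not document the request format. The loopback check is an extra guard added here, not a stated AirScribe feature.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Hypothetical endpoint: the port and path are placeholders, not documented values.
LOCAL_ENDPOINT = "http://127.0.0.1:8080/v1/transcribe"
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def build_request(audio: bytes, endpoint: str = LOCAL_ENDPOINT) -> urllib.request.Request:
    """Build a POST request, refusing any endpoint that is not loopback."""
    host = urlparse(endpoint).hostname
    if host not in LOOPBACK_HOSTS:
        raise ValueError(f"refusing to send audio to non-local host: {host!r}")
    return urllib.request.Request(
        endpoint,
        data=audio,
        headers={"Content-Type": "audio/wav"},  # assumed content type
        method="POST",
    )


def transcribe(audio: bytes, endpoint: str = LOCAL_ENDPOINT) -> dict:
    """Send audio to the local server and decode the JSON transcript."""
    with urllib.request.urlopen(build_request(audio, endpoint), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network, so it is safe to inspect.
request = build_request(b"placeholder audio bytes")
print(request.get_method(), request.full_url)
```

With a server actually listening on that address, `transcribe(open("meeting.wav", "rb").read())` would return the parsed transcript. Rejecting non-loopback hosts in the client means a misconfigured endpoint fails loudly instead of quietly uploading audio.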
Architecture

How audio becomes
transcript — locally.

AirScribe processes audio in a strict sequence with no network calls after initialization.

  • 🎤 Audio Input [Input]: Microphone, file upload, or streaming
  • 🔍 VAD (Voice Activity Detection) [Edge]: Silero VAD, isolates speech segments
  • Whisper Inference [Local]: INT8 quantized, local model, browser WASM
  • 📋 Speaker Diarization [Local]: Pyannote, timestamps and speaker labels
  • JSON Transcript [Output]: Words, timestamps, speakers, confidence

The model is downloaded once (~155MB for Whisper Large quantized). After that, the entire pipeline runs without internet access. Audio buffers stay in browser memory and are never written to disk or sent to any server.
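
The stage order above can be sketched in a few lines. This is an illustration of the data flow only: a simple energy threshold stands in for Silero VAD, a caller-supplied function stands in for Whisper inference, and precomputed speaker turns stand in for Pyannote output. The JSON field names are assumptions, not AirScribe's documented schema.

```python
import json
from typing import Callable, Dict, List, Tuple

Turn = Tuple[float, float, str]  # (start_seconds, end_seconds, speaker_label)


def detect_speech(samples: List[float], frame: int = 160,
                  threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Energy-based stand-in for a VAD: return (start, end) sample indices of speech."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        energy = sum(s * s for s in chunk) / len(chunk)
        if energy > threshold and start is None:
            start = i
        elif energy <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:  # speech ran to the end of the buffer
        segments.append((start, len(samples)))
    return segments


def assign_speaker(start: float, end: float, turns: List[Turn]) -> str:
    """Label a segment with the speaker turn that overlaps it the most."""
    def overlap(turn: Turn) -> float:
        return min(end, turn[1]) - max(start, turn[0])
    best = max(turns, key=overlap, default=None)
    return best[2] if best and overlap(best) > 0 else "unknown"


def run_pipeline(samples: List[float], rate: int,
                 transcribe: Callable[[List[float]], str],
                 turns: List[Turn]) -> str:
    """VAD -> inference -> speaker labels -> JSON, with no network calls."""
    results: List[Dict] = []
    for first, last in detect_speech(samples):
        start, end = first / rate, last / rate
        results.append({
            "start": start,
            "end": end,
            "speaker": assign_speaker(start, end, turns),
            "text": transcribe(samples[first:last]),
        })
    return json.dumps({"segments": results})


# Synthetic input: two bursts of "speech" separated by silence, at 16 kHz.
samples = [0.0] * 1600 + [0.5] * 1600 + [0.0] * 1600 + [0.5] * 1600
turns = [(0.0, 0.25, "speaker_0"), (0.25, 0.5, "speaker_1")]


def fake_transcribe(clip: List[float]) -> str:
    """Placeholder for model inference; reports the clip length instead of text."""
    return f"<{len(clip)} samples>"


print(run_pipeline(samples, 16000, fake_transcribe, turns))
```

Every stage here is an ordinary function call on an in-memory buffer, which is the property the section describes: once the model is available locally, nothing in the loop needs a socket.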

Model Size
~155MB
RAM Usage
<512MB
Min CPU
Apple M1 / i5+
Languages
99+ languages