Speech-to-Text Without the Internet

Your audio never
leaves your machine.

AirScribe runs AI speech recognition entirely on-device. No uploads. No subscriptions. No cloud dependency. Built for teams that handle sensitive audio.

Processing

100% local

Latency

<2s on-device

Languages

99+ supported

Live Transcription · Local Mode

14:32:07 Dr. Reyes: The patient presented with intermittent chest discomfort and elevated blood pressure consistent with the past three visits. We'll adjust the dosage accordingly.

14:32:18 Nurse Chen: Noted. I'll update the chart and schedule the follow-up for two weeks out.

Manifesto

The cloud was a necessary compromise. It no longer is.

In 2019, you had no choice. Your phone couldn't run a neural network. Every transcription shipped your audio to someone else's server. That made sense when the alternative was nothing.

Now your laptop runs Whisper faster than most cloud APIs return results.

The economics have flipped. Cloud subscriptions cost $15–30 per month. Over three years that's $540–$1080. AirScribe runs the same model on your hardware — once.
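The three-year figure is plain multiplication. A quick check, assuming 36 flat monthly payments and no price changes (the function name is illustrative only):

```python
# Three-year cost of a cloud transcription subscription,
# using the $15-30 monthly range quoted above.
MONTHS = 3 * 12


def subscription_total(monthly_usd: int, months: int = MONTHS) -> int:
    """Total paid over `months` at a flat monthly price."""
    return monthly_usd * months


low, high = subscription_total(15), subscription_total(30)
print(f"${low}-${high} over {MONTHS} months")  # $540-$1080 over 36 months
```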

But cost is secondary to risk. When your audio hits a server, it's no longer yours. It can be stored, analyzed, leaked, subpoenaed, or resold. The Terms of Service you agreed to may give that company perpetual rights to use your voice data for training.

For a clinician discussing a patient, a lawyer preparing testimony, or a journalist protecting a source — that exposure is unacceptable. AirScribe exists because some conversations should never leave the room.

How It Works

State-of-the-art AI.
Zero network dependency.

AirScribe runs a quantized Whisper model directly on your hardware. The same architecture powering millions of cloud transcriptions — but private, offline, and yours to control.

⚙

Quantized Whisper

INT8-quantized Whisper Large runs at ~1.5x realtime on modern laptops. No GPU required. The model is downloaded once and cached locally — subsequent sessions start instantly.
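To get a feel for what ~1.5x realtime means in practice, here is a rough estimate. It assumes the figure means audio is processed 1.5 times faster than it plays back; actual throughput will vary with hardware and audio content.

```python
def processing_minutes(audio_minutes: float, speed_factor: float = 1.5) -> float:
    """Estimated wall-clock minutes to transcribe a recording.

    speed_factor is how many minutes of audio are processed per minute
    of compute (1.5 is the approximate figure quoted above; assumption).
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_minutes / speed_factor


print(processing_minutes(60))  # 40.0
```

Under that assumption, a one-hour recording finishes in about 40 minutes of compute.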

⚡

WebAssembly (WASM)

All inference runs in a WebAssembly sandbox. Zero native installation required. AirScribe works in any modern browser on Windows, macOS, and Linux — from a single binary or a web page.

🔒

No Server Contact

The model file lives on disk. Audio stays in browser memory during processing. No WebSocket connections, no telemetry, no heartbeat pings. AirScribe generates zero outbound network traffic during transcription.

📋

Speaker Diarization

Pyannote-based speaker segmentation labels different voices automatically. Output includes timestamps, speaker labels, and confidence scores — ready for medical or legal documentation.
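As an illustration of how timestamps, speaker labels, and confidence scores might be consumed downstream, here is a minimal sketch. The JSON layout and field names (`segments`, `start`, `end`, `speaker`, `text`, `confidence`) are assumptions made for this example, not AirScribe's documented schema.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # seconds from the start of the recording
    end: float
    speaker: str
    text: str
    confidence: float  # assumed to be in the range 0.0-1.0


def parse_segments(payload: str) -> list[Segment]:
    """Parse a JSON transcript with a top-level "segments" list (hypothetical layout)."""
    return [Segment(**item) for item in json.loads(payload)["segments"]]


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, dropping the fractional part."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render(segments: list[Segment], min_confidence: float = 0.0) -> list[str]:
    """One line per segment; flag segments below the confidence threshold for review."""
    lines = []
    for seg in segments:
        flag = "" if seg.confidence >= min_confidence else " [low confidence]"
        lines.append(f"{format_timestamp(seg.start)} {seg.speaker}: {seg.text}{flag}")
    return lines


sample = json.dumps({"segments": [
    {"start": 0.0, "end": 9.5, "speaker": "Speaker 1",
     "text": "We'll adjust the dosage accordingly.", "confidence": 0.94},
    {"start": 11.0, "end": 15.2, "speaker": "Speaker 2",
     "text": "Noted.", "confidence": 0.62},
]})

for line in render(parse_segments(sample), min_confidence=0.8):
    print(line)
```

Flagging low-confidence segments rather than dropping them keeps the record complete while telling a reviewer where to listen again.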

🌐

Offline API

A local HTTP API accepts audio and returns JSON transcripts. Drop-in replacement for cloud STT providers. Integrate into existing pipelines without changing your workflow — just point to localhost.
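A call to a local transcription API could look roughly like the sketch below. The port, path, and content type are placeholders chosen for illustration; they are assumptions, not AirScribe's documented endpoint, so check the actual API reference before relying on them.

```python
import json
import urllib.request

# Hypothetical endpoint: port and path are assumptions for this sketch.
DEFAULT_URL = "http://localhost:8080/v1/transcribe"


def build_request(audio: bytes, url: str = DEFAULT_URL,
                  content_type: str = "audio/wav") -> urllib.request.Request:
    """Build a POST request carrying raw audio bytes in the body."""
    return urllib.request.Request(
        url,
        data=audio,
        headers={"Content-Type": content_type},
        method="POST",
    )


def transcribe(audio: bytes, url: str = DEFAULT_URL, timeout: float = 120.0) -> dict:
    """Send audio to the local API and return the decoded JSON transcript."""
    with urllib.request.urlopen(build_request(audio, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the local server to be running):
#     with open("visit.wav", "rb") as f:
#         transcript = transcribe(f.read())
```

Only the standard library is used, so the client itself adds no third-party dependency to the pipeline.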

🎯

99+ Languages

Multilingual transcription out of the box. English accuracy rivals human transcriptionists on clean audio. Mixed-language conversations, code-switching, and accented speech handled natively.

Privacy

Compliance-ready.
Out of the box.

AirScribe's architecture eliminates entire categories of data risk. No data leaves your device — so there's nothing to breach, nothing to audit, nothing to misconfigure.

🛡 Security Audit Checklist

✓

No audio stored on servers

Processing happens in-memory. Audio is never written to disk or transmitted.

✓

No training on user data

Model runs locally. Your audio is never used for model training or improvement.

✓

HIPAA-ready architecture

PHI never leaves the device. Meets HIPAA Safe Harbor requirements for covered entities.

✓

Zero outbound traffic

After model download, AirScribe operates fully offline. Network can be disabled.

✓

No third-party SDKs in stack

Fully open-source stack. No black-box dependencies, no telemetry, no third-party calls.

Traditional cloud transcription creates a paper trail that compliance teams spend months auditing. AirScribe eliminates the audit surface entirely — the data was never there to protect.

✓ No PHI transmitted to external systems
✓ No data residency concerns — audio stays in the room
✓ Works in air-gapped environments
✓ No vendor risk — no SaaS provider to breach
✓ Suitable for attorney-client privilege
✓ Journalist source protection built in

Use Cases

Built for sensitive
environments.

AirScribe was designed for professions where confidentiality isn't optional — it's the job requirement.

⚕

Clinical Documentation

Ambient scribing for medical encounters. Patient conversations never leave the exam room. Compatible with major EHR systems via local API.

⚖

Legal Transcription

Deposition recording and attorney notes. Covers attorney-client privilege, work product doctrine, and court reporting requirements natively.

🎭

Journalism & Research

Interview transcription with source protection. Air-gapped deployment for investigative reporting in hostile environments.

💼

Financial Services

FINRA compliance recording, advisor notes, and client meetings. Works in regulated environments where cloud transcription is prohibited.

🏫

Government & Defense

Classified environment transcription. STIG-compliant deployment for facilities without internet connectivity.

🎓

Academic Research

IRB-compliant participant interviews. Human subjects data never leaves campus infrastructure. Works with institutional firewall policies.

🛠

Field Operations

Remote transcription in low-connectivity environments. Maritime, field research, and disaster response — anywhere the network doesn't reach.

🎯

Developer Integration

Local HTTP API for pipeline integration. Drop-in replacement for Deepgram, AssemblyAI, or Whisper API — same request format, full data control.
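When swapping a cloud endpoint for a local one, a pipeline can also enforce that audio is only ever sent to a loopback address. The guard below is a minimal sketch of that idea; the function name is hypothetical and it is not part of AirScribe.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP.

    Note: "localhost" is trusted by name here; on a machine with a modified
    hosts file it could resolve elsewhere, so stricter setups may prefer
    literal addresses such as 127.0.0.1 or ::1.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as remote


for candidate in ("http://127.0.0.1:8080/v1/transcribe",
                  "http://[::1]:8080/v1/transcribe",
                  "https://api.example.com/v1/listen"):
    print(candidate, is_loopback_url(candidate))
```

Calling this check before every request turns "audio stays on the machine" from a configuration habit into something the pipeline refuses to violate.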

Architecture

How audio becomes
transcript — locally.

AirScribe processes audio in a strict sequence with no network calls after initialization.

🎤

Audio Input

Microphone, file upload, or streaming

Input

↓

🔍

VAD (Voice Activity Detection)

Silero VAD — isolates speech segments

Edge

↓

⚙

Whisper Inference

INT8 Quantized — local model, browser WASM

Local

↓

📋

Speaker Diarization

Pyannote — timestamps and speaker labels

Local

↓

✓

JSON Transcript

Words, timestamps, speakers, confidence

Output

The model is downloaded once (~155MB for Whisper Large quantized). After that, the entire pipeline runs without internet access. Audio buffers stay in browser memory and are never written to disk or sent to any server.
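The stage order in the diagram can be sketched as a plain function composition. This is an illustration only: the stage functions are passed in as parameters, and the toy voice-activity detector below is a simple amplitude threshold standing in for Silero VAD, not the real model. All names are hypothetical.

```python
from typing import Callable, Sequence

Samples = Sequence[float]
Span = tuple[int, int]  # [start, end) sample indices


def run_pipeline(samples: Samples, sample_rate: int,
                 detect_speech: Callable[[Samples], list[Span]],
                 transcribe: Callable[[Samples], str],
                 label_speaker: Callable[[Samples], str]) -> dict:
    """VAD -> inference -> diarization -> JSON-ready dict, with no network calls."""
    segments = []
    for start, end in detect_speech(samples):
        chunk = samples[start:end]
        text = transcribe(chunk)        # speech-to-text stage
        speaker = label_speaker(chunk)  # diarization stage
        segments.append({
            "start": start / sample_rate,
            "end": end / sample_rate,
            "speaker": speaker,
            "text": text,
        })
    return {"segments": segments}


def toy_vad(samples: Samples, threshold: float = 0.1) -> list[Span]:
    """Stand-in VAD: contiguous runs where |sample| exceeds the threshold."""
    spans, start = [], None
    for i, value in enumerate(samples):
        if abs(value) > threshold and start is None:
            start = i
        elif abs(value) <= threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(samples)))
    return spans


audio = [0.0, 0.5, 0.6, 0.0, 0.0, 0.9, 0.9]
result = run_pipeline(audio, sample_rate=1,
                      detect_speech=toy_vad,
                      transcribe=lambda chunk: f"{len(chunk)} samples",
                      label_speaker=lambda chunk: "Speaker 1")
print(result)
```

Because every stage is an ordinary local function, nothing in this flow needs a socket: the only input is the audio buffer and the only output is the transcript dict.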

Model Size

~155MB

RAM Usage

<512MB

Min CPU

Apple M1 / i5+

Languages

99+ languages
