Tools & Resources Archive Details

Whisper Timestamped – a Hugging Face Space by Xenova

What it is

A Hugging Face Space that generates word-level timestamps for audio transcriptions. At the time of capture it had 217 likes and an active discussion tab, and it exposes both its files and the app interface for processing audio in real time.

Gabriel’s notes

Whisper Timestamped: In-browser speech recognition with word-level timestamps. whisper-base (timestamped) is a 73-million-parameter speech recognition model that can generate word-level timestamps across 100 different languages. Once loaded, the model (196 MB) is cached and reused when you revisit the page. Everything runs locally in the browser, using Transformers.js and ONNX Runtime Web, so no API calls are made to a server for inference. You can disconnect from the internet after the model has loaded.
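
To illustrate what word-level timestamps make possible, here is a minimal TypeScript sketch that turns a list of timed words into SRT subtitle cues. It does not call the model. The `WordChunk` shape (a `text` string plus a `[start, end]` pair in seconds) is an assumption about what a word-timestamped transcription returns, and `formatSrtTime`, `wordsToSrt`, and the `maxWords` parameter are hypothetical names introduced only for this example.

```typescript
// Assumed shape of one word-level result: the word plus [start, end] in seconds.
// This is an illustrative assumption, not a documented type from the Space.
interface WordChunk {
  text: string;
  timestamp: [number, number];
}

// Format a time in seconds as HH:MM:SS,mmm (the SRT time format).
function formatSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Group words into cues of at most `maxWords` words and render them as SRT text.
// Each cue spans from the first word's start to the last word's end.
function wordsToSrt(words: WordChunk[], maxWords = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const group = words.slice(i, i + maxWords);
    const start = group[0].timestamp[0];
    const end = group[group.length - 1].timestamp[1];
    const text = group.map((w) => w.text.trim()).join(" ");
    cues.push(
      `${cues.length + 1}\n${formatSrtTime(start)} --> ${formatSrtTime(end)}\n${text}`
    );
  }
  return cues.join("\n\n");
}
```

Grouping by a fixed word count is the simplest possible choice; a real subtitle workflow might instead split on pauses between words or on punctuation.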

Good fit if you want to:

  • create, edit, or analyze audio/video content and media workflows.

Pricing snapshot (auto-enriched): Free tier available with basic CPU Spaces hosting; usage-based pricing applies for upgraded hardware and inference endpoints; monthly subscription plans available for PRO ($9/month) and Team ($20/user/month) with additional features and quotas.

Work-use / compliance snapshot (auto-enriched): Hugging Face Spaces, including Whisper Timestamped by Xenova, are generally suitable for workplace use: the platform has SOC2 Type 2 certification and is GDPR compliant, and GDPR data processing agreements are offered through Enterprise plans. HIPAA compliance and SSO availability are not clearly stated.

Alternatives (auto-enriched): AssemblyAI. AssemblyAI offers robust speech-to-text capabilities with advanced features such as multi-speaker diarization and custom vocabulary, while Whisper Timestamped focuses on word-level timestamps using the Whisper model on Hugging Face Spaces.

Note: pricing and policy details can change; verify on the official site before making decisions.
