What it is
A Hugging Face Space that transcribes audio and generates word-level timestamps. The Space has 217 likes and an active discussion tab, and it provides access to both its source files and an app interface for processing audio.
Gabriel’s notes
Whisper Timestamped: in-browser speech recognition with word-level timestamps. whisper-base (timestamped) is a 73-million-parameter speech recognition model that can generate word-level timestamps in 100 languages. Once loaded, the model (196 MB) is cached and reused when you revisit the page. Everything runs locally in the browser using Transformers.js and ONNX Runtime Web, so inference makes no API calls to a server, and you can disconnect from the internet once the model has loaded.
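A common use for word-level timestamps is building subtitles. The sketch below converts word chunks into a WebVTT file with one cue per word. It assumes the chunks have the shape `{ text, timestamp: [start, end] }` (in seconds), which is how Transformers.js speech-recognition output typically represents timestamped chunks; verify this against the version you use. The names `WordChunk`, `toVttTime`, and `chunksToVtt` are hypothetical helpers for illustration only and are not part of the Space or of Transformers.js.

```typescript
// Assumed shape of one word-level chunk (seconds); the end time may be null.
interface WordChunk {
  text: string;
  timestamp: [number, number | null];
}

// Format a time in seconds as a WebVTT timestamp: HH:MM:SS.mmm
function toVttTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Build a WebVTT document containing one numbered cue per word chunk.
function chunksToVtt(chunks: WordChunk[]): string {
  const cues = chunks.map((chunk, i) => {
    const [start, end] = chunk.timestamp;
    // If the end time is missing, fall back to the start time.
    const stop = end ?? start;
    return `${i + 1}\n${toVttTime(start)} --> ${toVttTime(stop)}\n${chunk.text.trim()}`;
  });
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

For example, `chunksToVtt([{ text: " Hello", timestamp: [0, 0.42] }, { text: " world", timestamp: [0.42, 1.1] }])` yields two cues, `00:00:00.000 --> 00:00:00.420` for "Hello" and `00:00:00.420 --> 00:00:01.100` for "world". One cue per word is the simplest mapping. For readable captions, you would normally group several consecutive words into each cue.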
Good fit if you want to:
- create, edit, or analyze audio/video content, or build media workflows.
Pricing snapshot (auto-enriched): Free tier available with basic CPU Spaces hosting; usage-based pricing applies for upgraded hardware and inference endpoints; monthly subscription plans available for PRO ($9/month) and Team ($20/user/month) with additional features and quotas.
Work-use / compliance snapshot (auto-enriched): Hugging Face Spaces, including Whisper Timestamped by Xenova, can be suitable for workplace use. Hugging Face reports SOC 2 Type 2 certification and GDPR compliance, and it offers GDPR data processing agreements through Enterprise plans. HIPAA compliance and SSO availability are not clearly stated.
Alternatives (auto-enriched): Alternative: AssemblyAI | Comparison: AssemblyAI offers robust speech-to-text capabilities with advanced features like multi-speaker diarization and custom vocabulary, while Whisper Timestamped focuses on word-level timestamps using the Whisper model on Hugging Face Spaces.
Note: pricing and policy details can change—verify on the official site before making decisions.