Whisper, OpenAI’s Automatic Speech Recognition system, delivers multilingual, noise-tolerant, and technical-language-ready transcription through a streamlined encoder-decoder architecture. With Vodia PBX’s integration, organizations can choose between using OpenAI’s service or hosting Whisper AI locally for complete data sovereignty and control. This on-premise option ensures that sensitive call data stays within your infrastructure while still benefiting from powerful transcription capabilities. To explore deployment options, see our Whisper AI on-premise setup documentation, review a self-hosted integration example, or follow our cloud-based call transcription guide.
Whisper is OpenAI’s Automatic Speech Recognition (ASR) system. It was trained on about 680,000 hours of multilingual, multitask supervised data collected from the Internet. Thanks to the size and diversity of this training set, Whisper handles accents, background noise, and technical language well. It also transcribes speech in numerous languages and can translate that speech into English.
Whisper is implemented as an encoder-decoder transformer, a simple end-to-end approach. Input audio is split into 30-second chunks, and each chunk is converted into a log-Mel spectrogram and passed through the encoder. The decoder is trained to predict the corresponding text caption, interleaved with special tokens that direct the single model to perform language identification, multilingual speech transcription, phrase-level timestamps, and speech translation into English.
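To make the pipeline above more concrete, here is a minimal Python sketch of two of its steps: cutting audio into 30-second windows and assembling the special tokens that tell the decoder which task to run. It is an illustration under stated assumptions, not Whisper's own code and not part of the Vodia PBX. The function names split_into_chunks and task_prompt are hypothetical, the 16 kHz sample rate and the token spellings follow the open-source Whisper release, and the real model moves its window based on predicted timestamps rather than in fixed steps.

```python
from typing import List

SAMPLE_RATE = 16_000                      # Hz; assumed from the open-source Whisper release
CHUNK_SECONDS = 30                        # window length described in the text
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples: List[float]) -> List[List[float]]:
    """Split mono audio into 30-second windows, zero-padding the last one.

    Hypothetical helper: a simplified, fixed-step version of the chunking step.
    """
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        # Pad a short final window with silence so every chunk has equal length.
        chunk = chunk + [0.0] * (CHUNK_SAMPLES - len(chunk))
        chunks.append(chunk)
    return chunks


def task_prompt(language: str, translate: bool = False,
                timestamps: bool = True) -> List[str]:
    """Build the special-token prefix that selects the decoder's task.

    Hypothetical helper: the language is passed in here, whereas the model
    can also predict the language token itself (language identification).
    """
    tokens = ["<|startoftranscript|>", f"<|{language}|>"]
    tokens.append("<|translate|>" if translate else "<|transcribe|>")
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    audio = [0.1] * (SAMPLE_RATE * 70)    # 70 seconds of dummy audio
    print(len(split_into_chunks(audio)), "chunks")
    print(task_prompt("de", translate=True, timestamps=False))
```

With 70 seconds of input, the sketch yields three windows, the last one containing 10 seconds of audio followed by 20 seconds of padding. Because the same model handles every task, switching from transcription to translation only changes one token in the prefix.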
In November of last year we announced a beta version of the Vodia PBX that connects the telephone system to the beta version of the OpenAI realtime API. If your organization prioritizes data sovereignty and on-premises processing, Vodia also supports the deployment of Whisper AI within your dedicated infrastructure. This enables you to maintain full control over your transcription processes, ensuring sensitive call data remains securely within your network boundaries.
To view the transcribed content, simply log in to your user portal, navigate to the 'History' section, select the desired call, then examine the 'call content' area.
To ensure optimal performance when running Whisper AI on your own hardware, refer to the official hardware requirements outlined in the OpenAI Whisper GitHub repository.
Now that we’re supporting real-time AI API integration with OpenAI, we’re also looking at integrating with more AI providers, so we can provide seamless AI integration within workflows. We’d love to tell you all about it - reach out to us at sales@vodia.com or call +1 (617) 861-3490 (United States), +61 2 7201 0788 (APAC), or +49 30 555 78749 (Europe).
Vodia is excited to share the recording of our April 8, 2025 webinar on "Real-Time Media Streaming in Vodia PBX: AI, Call Transcription, and Security in V69.5.6." In this session, we explore how AI, call transcription, and security have become core features in the latest version of our PBX. We dive into the practical benefits of AI, including how it improves business communication with real-time transcription, enhances security with advanced features, and streamlines workflows through integrations like OpenAI and Microsoft Teams. Watch the recording to learn how these innovations can transform your business communications and boost efficiency.
Vodia offers advanced solutions for managing Call Data Records (CDRs) and call recordings, allowing businesses to handle data efficiently. Traditionally, CDRs were stored locally in CSV files and had to be transferred manually to remote locations, which could be time-consuming for organizations handling large volumes. Vodia simplifies this process by providing real-time streaming of CDRs using protocols like WebCDR, JSONS, TCP, and MongoDB, allowing for immediate analysis and custom report generation. Additionally, Vodia integrates with cloud storage services like AWS, DigitalOcean, Linode, and Wasabi, eliminating the need for manual synchronization and local storage. This integration enhances data security, scalability, and ease of management.
Vodia PBX now offers real-time call transcription through a seamless integration with Whisper AI, OpenAI’s advanced speech recognition system. With support for multiple languages, technical vocabulary, and noisy environments, Whisper delivers accurate transcriptions even in complex call scenarios. Administrators can enable transcription per tenant using an OpenAI API key, making setup simple and flexible. Once active, all calls are automatically transcribed and accessible in the user portal for easy review and record-keeping. This powerful integration brings enhanced clarity, compliance, and insight to voice communication—whether you're managing support teams, analyzing conversations, or working across language barriers.