OpenAI Whisper

Product

Base system (platform):	Artificial intelligence (AI)
Developers:	OpenAI
Date of the premiere of the system:	March 2023
Branches:	Information Technology
Technology:	Application Development Tools

Content

2025: Support in ValueAI
2024: OpenAI model used in hospitals found to hallucinate
2023: Speech-to-Text System Announcement
Notes

Main article: Neural networks

2025: Support in ValueAI

ValueAI has expanded its data analysis capabilities by adding support for the Yandex SpeechKit and OpenAI Whisper services, which convert audio files to text. This functionality speeds up the extraction of useful information from unstructured audio data and opens up new scenarios in which businesses can apply artificial intelligence to automate processes and make decisions. WaveAccess announced this on July 2, 2025.

2024: OpenAI model used in hospitals found to hallucinate

The OpenAI model used in hospitals turned out to be prone to hallucinations.

Generative artificial intelligence models are prone to producing incorrect information. Surprisingly, this problem also affects automatic transcription, where the model is supposed to reproduce an audio recording accurately. Software engineers, developers and scientists are seriously concerned about the transcriptions produced by OpenAI's Whisper, Hightech+ reported on October 28, 2024, citing the Associated Press.

A University of Michigan researcher found hallucinations in the transcriptions of eight out of ten audio recordings. A machine learning engineer who studied more than 100 hours of Whisper transcriptions found errors in more than half of them. Another developer said that he discovered invented content in almost all of the 26,000 transcriptions he created with Whisper.

Scientists from Cornell University, the University of Washington and other institutions found that Whisper "hallucinates" about 1% of the time, inventing whole sentences during pauses in recordings. Pauses are particularly common in the speech of people with aphasia, the researchers note. Some of the invented phrases contain aggression and racism, and some are simply nonsense.

Hallucinations included invented medical terms or phrases one would expect from YouTube videos, such as "Thank you for watching!" OpenAI reportedly used Whisper to transcribe more than 1 million hours of YouTube videos to train GPT-4.

All this creates serious risks, since Whisper is used in medical institutions. For example, Nabla uses Whisper in its medical transcription tool. According to the company's estimates, the tool has transcribed 7 million conversations with doctors. More than 30,000 doctors and 40 health systems use it. Nabla is reportedly aware of the Whisper hallucinations and is "addressing this issue."

"We thank the researchers for sharing their findings," OpenAI said.^[1]^[2]

2023: Speech-to-Text System Announcement

On March 1, 2023, OpenAI introduced the application programming interface (API) for the Whisper system, which debuted in September 2022.

Whisper is an automatic speech-to-text system trained on 680,000 hours of multilingual and "multitask" data collected from the internet. The system copes with accented pronunciation, background noise and technical jargon. According to OpenAI, the solution can "reliably" transcribe speech in several languages, as well as translate from these languages into English. However, Whisper has its limitations, especially in the area of next-word prediction. In addition, the quality of Whisper's output varies between languages.
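
The two modes described above (transcription in the source language and translation into English) can be sketched with the open-source `openai-whisper` Python package. This is a minimal illustration, not OpenAI's reference usage: the helper names `build_options` and `run_whisper` are hypothetical, and the default model size `"base"` is an arbitrary choice made for the example.

```python
from typing import Optional

# The two tasks mentioned in the text: same-language transcription
# and translation into English.
VALID_TASKS = {"transcribe", "translate"}


def build_options(task: str = "transcribe", language: Optional[str] = None) -> dict:
    """Build keyword arguments for the model's transcribe() call (hypothetical helper)."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    options = {"task": task}
    if language is not None:
        # Optional hint; when omitted, Whisper tries to detect the language.
        options["language"] = language
    return options


def run_whisper(path: str, task: str = "transcribe",
                language: Optional[str] = None, model_name: str = "base") -> str:
    """Run the open-source model on a local audio file and return the text."""
    # Imported lazily so build_options() can be used without the package installed.
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **build_options(task, language))
    return result["text"]
```

For example, `run_whisper("interview.mp3", task="translate")` would return an English rendering of a non-English recording; the file name is a placeholder. Given the limitations noted above, the output should be reviewed before it is relied on.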

OpenAI introduces open speech-to-text API

With the Whisper API, third-party developers can integrate this neural network into their applications. The API accepts files in various formats, including M4A, MP3, MP4, MPEG, MPGA, WAV and WEBM. Using the Whisper large-v2 model costs $0.006 per minute of audio. The resulting text can then be used in other neural network-based applications.
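
A minimal sketch of how an application might check a file against the formats listed above, estimate the price at $0.006 per minute, and send the file to the hosted API with the official `openai` Python package. The helper names are hypothetical, the model identifier `"whisper-1"` does not appear in this article and is an assumption, and the cost estimate assumes billing is simply proportional to duration.

```python
from pathlib import Path

# Formats and price as stated in the text above.
SUPPORTED_FORMATS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}
PRICE_PER_MINUTE_USD = 0.006


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_FORMATS


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the transcription price for audio of the given length.

    Assumption: cost scales linearly with duration; the real service
    may round differently.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return round(duration_seconds / 60 * PRICE_PER_MINUTE_USD, 6)


def transcribe(path: str) -> str:
    """Send one audio file to the hosted API and return the transcript text."""
    if not is_supported(path):
        raise ValueError(f"unsupported audio format: {path}")
    # Imported lazily so the helpers above work without the package installed.
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",  # assumed model identifier, not taken from the article
            file=audio_file,
        )
    return result.text
```

Under these assumptions, a ten-minute recording (`estimate_cost_usd(600)`) comes to about $0.06.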

"We released the tool, but it wasn't really enough for the entire developer ecosystem to build around it. The Whisper API is the same large model that you can get with open source, but we have optimized it as much as possible. This is much faster and very convenient," TechCrunch quotes Greg Brockman, president and chairman of the board of OpenAI, as saying.

The Whisper API is already used by Speak, an AI-based application for learning foreign languages. In particular, the API will be used to create a "new companion AI product".^[3]

Notes

↑ The OpenAI model used in hospitals turned out to be subject to hallucinations, https://hightech.plus/2024/10/28/ispolzuemaya-v-bolnicah-model-openai-okazalas-podverzhena-gallyucinaciyam
↑ OpenAI debuts Whisper API for speech-to-text transcription and translation

Source: https://tadviser.com/index.php/Product:OpenAI_Whisper
