Audio
Models
AudioResponseFormat = "json" or "text" or "srt" or "verbose_json" or "vtt" or "diarized_json"
The format of the output: one of json, text, srt, verbose_json, vtt, or diarized_json. For gpt-4o-transcribe and gpt-4o-mini-transcribe, the only supported format is json. For gpt-4o-transcribe-diarize, the supported formats are json, text, and diarized_json; diarized_json is required to receive speaker annotations.
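The model-specific restrictions above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: the helper name `is_supported_format` is hypothetical, and the choice to accept every listed format for models the text does not mention is an assumption.

```python
# All response formats named in the description above.
ALL_FORMATS = {"json", "text", "srt", "verbose_json", "vtt", "diarized_json"}

# Restrictions stated in the description above, keyed by model name.
RESTRICTED_FORMATS = {
    "gpt-4o-transcribe": {"json"},
    "gpt-4o-mini-transcribe": {"json"},
    "gpt-4o-transcribe-diarize": {"json", "text", "diarized_json"},
}


def is_supported_format(model: str, response_format: str) -> bool:
    """Return True if response_format is allowed for model (hypothetical helper)."""
    if response_format not in ALL_FORMATS:
        return False
    allowed = RESTRICTED_FORMATS.get(model)
    # Assumption: a model with no stated restriction accepts any listed format.
    return allowed is None or response_format in allowed
```

A caller that needs speaker annotations could check `is_supported_format(model, "diarized_json")` and fail early instead of waiting for the API to reject the request.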
Audio Transcriptions
Turn audio into text or text into audio.
Create transcription
Models
Transcription = object { text, logprobs, usage }
Represents a transcription response returned by the model, based on the provided input.
logprobs: optional array of object { token, bytes, logprob }
The log probabilities of the tokens in the transcription. Only returned with the models gpt-4o-transcribe and gpt-4o-mini-transcribe if logprobs is added to the include array.
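One way to use the logprobs array is to reduce it to a single confidence figure. The sketch below assumes the response has already been parsed into a list of dictionaries with the `token`, `bytes`, and `logprob` keys shown above; the function name `geometric_mean_probability` is hypothetical and the metric is an illustrative choice, not something the API defines.

```python
import math
from typing import Mapping, Optional, Sequence


def geometric_mean_probability(logprobs: Sequence[Mapping]) -> Optional[float]:
    """Average the token log probabilities and map the result back to a probability.

    Returns None when logprobs is empty, for example when logprobs
    was not requested through the include array.
    """
    if not logprobs:
        return None
    mean_logprob = sum(item["logprob"] for item in logprobs) / len(logprobs)
    # exp(mean of logs) is the geometric mean of the token probabilities.
    return math.exp(mean_logprob)
```

Averaging in log space keeps one very unlikely token from being hidden by many confident ones, which a plain arithmetic mean of probabilities would tend to do.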
TranscriptionDiarized = object { duration, segments, task, 2 more }
Represents a diarized transcription response returned by the model, including the combined transcript and speaker-segment annotations.
Segments of the transcript annotated with timestamps and speaker labels.
TranscriptionDiarizedSegment = object { id, end, speaker, 3 more }
A segment of diarized transcript text with speaker metadata.
TranscriptionSegment = object { id, avg_logprob, compression_ratio, 7 more }
Average logprob of the segment. If the value is lower than -1, consider the logprobs failed.
Compression ratio of the segment. If the value is greater than 2.4, consider the compression failed.
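The two thresholds above can be applied mechanically to each segment. This is a minimal sketch assuming a segment has been parsed into a dictionary with the `avg_logprob` and `compression_ratio` keys listed for TranscriptionSegment; the function name `segment_warnings` and the warning strings are hypothetical.

```python
from typing import List, Mapping

# Thresholds taken from the field descriptions above.
MIN_AVG_LOGPROB = -1.0
MAX_COMPRESSION_RATIO = 2.4


def segment_warnings(segment: Mapping) -> List[str]:
    """Return a list of reasons the segment looks unreliable (empty if none)."""
    warnings = []
    if segment["avg_logprob"] < MIN_AVG_LOGPROB:
        warnings.append("low avg_logprob")
    if segment["compression_ratio"] > MAX_COMPRESSION_RATIO:
        warnings.append("high compression_ratio")
    return warnings
```

Both comparisons are strict, matching the wording "lower than -1" and "greater than 2.4", so a segment sitting exactly on a threshold is not flagged.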
TranscriptionStreamEvent = TranscriptionTextSegmentEvent { id, end, speaker, 3 more } or TranscriptionTextDeltaEvent { delta, type, logprobs, segment_id } or TranscriptionTextDoneEvent { text, type, logprobs, usage }
TranscriptionTextSegmentEvent = object { id, end, speaker, 3 more }
Emitted when a diarized transcription returns a completed segment with speaker information. Only emitted when you create a transcription with stream set to true and response_format set to diarized_json.
TranscriptionTextDeltaEvent = object { delta, type, logprobs, segment_id }
Emitted when there is an additional text delta. This is also the first event emitted when the transcription starts. Only emitted when you create a transcription with the stream parameter set to true.
logprobs: optional array of object { token, bytes, logprob }
The log probabilities of the delta. Only included if you create a transcription with the include[] parameter set to logprobs.
TranscriptionTextDoneEvent = object { text, type, logprobs, usage }
Emitted when the transcription is complete. Contains the complete transcription text. Only emitted when you create a transcription with the stream parameter set to true.
logprobs: optional array of object { token, bytes, logprob }
The log probabilities of the individual tokens in the transcription. Only included if you create a transcription with the include[] parameter set to logprobs.
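A consumer of the three stream events typically dispatches on the `type` field. The sketch below works on events already parsed into dictionaries and does not call the API. The type strings `transcript.text.delta`, `transcript.text.segment`, and `transcript.text.done` are assumptions, since the listing above does not show the values of `type`, and the function name `collect_transcript` is hypothetical.

```python
from typing import Iterable, List, Mapping, Tuple

# Assumed values of the "type" field; the listing above does not state them.
DELTA_EVENT = "transcript.text.delta"
SEGMENT_EVENT = "transcript.text.segment"
DONE_EVENT = "transcript.text.done"


def collect_transcript(events: Iterable[Mapping]) -> Tuple[str, List[str]]:
    """Consume stream events and return (transcript text, speakers seen in order)."""
    parts: List[str] = []
    speakers: List[str] = []
    for event in events:
        kind = event.get("type")
        if kind == DELTA_EVENT:
            # Each delta event carries an additional piece of text.
            parts.append(event["delta"])
        elif kind == SEGMENT_EVENT:
            # Diarized streams report completed segments with a speaker label.
            speakers.append(event["speaker"])
        elif kind == DONE_EVENT:
            # The done event contains the complete transcription text.
            return event["text"], speakers
    # No done event arrived: fall back to the concatenated deltas.
    return "".join(parts), speakers
```

Preferring the text from the done event over the joined deltas follows the description above, which says the done event contains the complete transcription text.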
TranscriptionVerbose = object { duration, language, text, 3 more }
Represents a verbose_json transcription response returned by the model, based on the provided input.
Segments of the transcribed text and their corresponding details.
Average logprob of the segment. If the value is lower than -1, consider the logprobs failed.
Compression ratio of the segment. If the value is greater than 2.4, consider the compression failed.
usage: optional object { seconds, type }
Usage statistics for models billed by audio input duration.
Audio Translations
Create translation
Models
TranslationVerbose = object { duration, language, text, segments }
Segments of the translated text and their corresponding details.
Average logprob of the segment. If the value is lower than -1, consider the logprobs failed.
Compression ratio of the segment. If the value is greater than 2.4, consider the compression failed.
Audio Speech
Create speech
Audio Voices
Create voice
Audio Voice Consents