Speech to Text using OpenAI API
Introduction
The OpenAI API provides a speech-to-text endpoint that converts spoken audio into written text. This tutorial covers how to call the endpoint, the parameters it accepts, and examples in JavaScript and Python.
Endpoint Overview
The speech-to-text endpoint uses OpenAI's Whisper speech recognition model to transcribe spoken language into text. Typical uses include transcription services and voice-controlled applications.
Using the Speech to Text Endpoint
API Request
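To convert speech to text, send a POST request to https://api.openai.com/v1/audio/transcriptions with your API key in the Authorization header. The body is a multipart/form-data upload containing the model name and the audio file.
    POST /v1/audio/transcriptions HTTP/1.1
    Host: api.openai.com
    Authorization: Bearer YOUR_API_KEY
    Content-Type: multipart/form-data; boundary=X

    --X
    Content-Disposition: form-data; name="model"

    whisper-1
    --X
    Content-Disposition: form-data; name="file"; filename="audio.mp3"

    [Binary audio data]
    --X--
In this example, replace [Binary audio data] with the actual binary data of the audio file, and YOUR_API_KEY with your own key.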
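OpenAI's transcription endpoint expects the audio as a multipart/form-data upload, with a model field sent alongside the file. The following is a minimal sketch of how such a body can be assembled using only the Python standard library. The helper name build_multipart and the sample bytes are illustrative and not part of any SDK; in practice an HTTP client library usually does this encoding for you.

```python
import uuid


def build_multipart(fields, file_field, file_name, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns a tuple: (body as bytes, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex  # random separator unlikely to occur in the data
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{file_name}"\r\n\r\n'
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Placeholder bytes stand in for real audio data.
body, content_type = build_multipart(
    {"model": "whisper-1"}, "file", "audio.mp3", b"\x00\x01fake-audio"
)
print(content_type)
```

The returned content type string goes into the request's Content-Type header so the server knows which boundary separates the parts.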
API Response
By default, the API responds with a JSON object containing the transcribed text.
    HTTP/1.1 200 OK
    Content-Type: application/json

    { "text": "This is an example of speech to text conversion." }
The text field holds the transcription of the provided audio file.
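In the default JSON response format, the transcription arrives in a text field, while failed requests return an error object with a message. A minimal sketch of reading such a response body defensively follows. The function name extract_transcript is hypothetical, and the error shape shown is an assumption based on the API's usual error format.

```python
import json


def extract_transcript(response_body):
    """Return the transcribed text from a JSON response body.

    Raises ValueError if the body carries an error object or has no text.
    """
    payload = json.loads(response_body)
    if "error" in payload:
        # Assumed error shape: {"error": {"message": "..."}}
        message = payload["error"].get("message", "unknown error")
        raise ValueError(f"Transcription failed: {message}")
    if "text" not in payload:
        raise ValueError("Response has no 'text' field")
    return payload["text"]


sample = '{"text": "This is an example of speech to text conversion."}'
print(extract_transcript(sample))
```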
Parameters
The speech-to-text endpoint requires two parameters, both sent as multipart form fields:
- model: The model to use for transcribing speech. Example: "whisper-1".
- file: The audio file to transcribe, uploaded as binary data (for example, an MP3 or WAV file).
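Besides the model and the audio file, the API reference also lists optional form fields such as language, response_format, and temperature. A small sketch of collecting these into the text fields of an upload is shown here. The helper build_form_fields is hypothetical, and the set of options should be checked against the current API reference before relying on it.

```python
def build_form_fields(model="whisper-1", language=None,
                      response_format=None, temperature=None):
    """Collect the text fields that accompany the audio file in the upload."""
    fields = {"model": model}
    if language is not None:
        fields["language"] = language  # ISO-639-1 code such as "en"
    if response_format is not None:
        fields["response_format"] = response_format  # e.g. "json" or "text"
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)  # form values are sent as text
    return fields


print(build_form_fields(language="en", temperature=0.2))
```

Leaving an option as None omits it from the request, so the server falls back to its default for that field.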
Examples in JavaScript
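Here's how you can use the speech-to-text endpoint in JavaScript (Node.js). This example relies on the axios and form-data packages:
    const axios = require('axios');
    const FormData = require('form-data');
    const fs = require('fs');

    const apiKey = 'YOUR_API_KEY';
    const endpoint = 'https://api.openai.com/v1/audio/transcriptions';

    async function transcribeSpeech(audioFile) {
      try {
        // The endpoint expects a multipart form with the file and the model name.
        const form = new FormData();
        form.append('file', fs.createReadStream(audioFile));
        form.append('model', 'whisper-1');

        const response = await axios.post(endpoint, form, {
          headers: {
            ...form.getHeaders(), // Content-Type including the multipart boundary
            Authorization: `Bearer ${apiKey}`
          }
        });
        console.log(response.data.text);
      } catch (error) {
        // Prefer the API's error payload when one is available.
        console.error('Error:', error.response ? error.response.data : error.message);
      }
    }

    transcribeSpeech('path/to/audio/file.mp3');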
Examples in Python
Here's how you can use the speech-to-text endpoint in Python. This example uses the official openai package (version 1.0 or later):
    from openai import OpenAI

    client = OpenAI(api_key='YOUR_API_KEY')

    def transcribe_speech(audio_file):
        # Open the audio in binary mode and upload it for transcription.
        with open(audio_file, 'rb') as file:
            response = client.audio.transcriptions.create(
                model="whisper-1",
                file=file
            )
        return response.text

    transcript = transcribe_speech('path/to/audio/file.mp3')
    print(transcript)
Conclusion
The speech-to-text endpoint in the OpenAI API converts spoken audio into written text with a single authenticated file upload. With the request format, parameters, and the JavaScript and Python examples above, you can integrate speech recognition into your own applications, from transcription services to voice-driven interfaces.