
Speech to Text using OpenAI API

Introduction

The OpenAI API provides a speech to text endpoint that converts spoken audio into written text. This tutorial covers how to use the speech to text endpoint effectively, including its parameters and examples in JavaScript and Python.

Endpoint Overview

The speech to text endpoint uses OpenAI's Whisper speech recognition model to transcribe spoken language into text. Typical uses include transcription services and voice-controlled applications.

Using the Speech to Text Endpoint

API Request

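To convert speech to text, send a POST request to the /v1/audio/transcriptions endpoint with your API key in the Authorization header. The request body is multipart/form-data and carries the audio file together with the model name.

POST /v1/audio/transcriptions HTTP/1.1
Host: api.openai.com
Content-Type: multipart/form-data; boundary=BOUNDARY
Authorization: Bearer YOUR_API_KEY

[Multipart form data: "model" field and "file" field with binary audio data]

In this example, the "file" part of the form carries the actual binary data of the audio file, and the "model" part names the transcription model (for example, whisper-1).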
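
The transcription endpoint, as documented by OpenAI, accepts the audio as a multipart/form-data upload. As a rough illustration of what such a request body looks like, the sketch below assembles one by hand using only the Python standard library. The helper name build_multipart_body and the placeholder audio bytes are hypothetical, and the whisper-1 model name is taken from OpenAI's API reference rather than from this tutorial. In practice an HTTP client library would normally build this body for you.

```python
import uuid


def build_multipart_body(fields, file_field, filename, file_bytes, file_type):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    Hypothetical helper for illustration only.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain text field (for example, the model name).
    for name, value in fields.items():
        parts.append(f"--{boundary}".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        parts.append(b"")
        parts.append(str(value).encode())
    # The file part carries a filename and its own Content-Type.
    parts.append(f"--{boundary}".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode()
    )
    parts.append(f"Content-Type: {file_type}".encode())
    parts.append(b"")
    parts.append(file_bytes)
    # The closing boundary ends with two extra dashes.
    parts.append(f"--{boundary}--".encode())
    parts.append(b"")
    body = b"\r\n".join(parts)
    return body, f"multipart/form-data; boundary={boundary}"


# Placeholder bytes stand in for a real audio file.
body, content_type = build_multipart_body(
    {"model": "whisper-1"}, "file", "audio.mp3", b"\x00\x01fake-audio", "audio/mpeg"
)
print(content_type)
print(len(body), "bytes")
```

The returned content type string goes into the Content-Type header of the request, and the body bytes are sent as the request payload.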

API Response

The API responds with the transcribed text. With the default JSON response format, the body is a JSON object with a single text field.

HTTP/1.1 200 OK
Content-Type: application/json

{
    "text": "This is an example of speech to text conversion."
}

The text field holds the transcription of the provided audio file.
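
A minimal sketch of reading such a response in Python is shown here. It assumes the default JSON response format, where the transcription is returned in a text field, and the usual OpenAI error shape with a top-level error object. The function name extract_transcript is hypothetical.

```python
import json


def extract_transcript(response_body):
    """Return the transcribed text from a JSON response body.

    Raises ValueError if the body reports an API error or has no text field.
    """
    data = json.loads(response_body)
    if "error" in data:
        # Assumed error shape: {"error": {"message": "..."}}
        raise ValueError(data["error"].get("message", "unknown API error"))
    if "text" not in data:
        raise ValueError("response has no 'text' field")
    return data["text"]


sample = '{"text": "This is an example of speech to text conversion."}'
print(extract_transcript(sample))
# prints: This is an example of speech to text conversion.
```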

Parameters

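Here are some common parameters you can use with the speech to text endpoint:

  • file: The audio file to transcribe, sent as a form field (required).
  • model: The model to use for transcribing speech. Example: "whisper-1" (required).
  • language: The language of the input audio as an ISO-639-1 code, such as "en" (optional).
  • response_format: The format of the output, such as "json" or "text" (optional).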
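
One way to keep request parameters tidy is to collect them in a small helper before sending the request. The sketch below is an illustration, not part of any SDK: the function name build_transcription_fields is hypothetical, and the optional parameter names (language, response_format, temperature) and their allowed values are assumptions based on OpenAI's API reference, which may change over time.

```python
# Response formats assumed from OpenAI's API reference at the time of writing.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcription_fields(model="whisper-1", language=None,
                               response_format=None, temperature=None):
    """Build the non-file form fields for a transcription request."""
    fields = {"model": model}
    if language is not None:
        fields["language"] = language
    if response_format is not None:
        if response_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format}")
        fields["response_format"] = response_format
    if temperature is not None:
        # Assumed valid range for the sampling temperature is 0 to 1.
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    return fields


print(build_transcription_fields(language="en", response_format="text"))
```

The resulting dictionary can be passed as the form fields of the multipart request, alongside the audio file itself.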

Examples in JavaScript

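Here's how you can use the speech to text endpoint in JavaScript (Node.js), using the axios and form-data packages:

const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const apiKey = 'YOUR_API_KEY';
const endpoint = 'https://api.openai.com/v1/audio/transcriptions';

async function transcribeSpeech(audioFile) {
    try {
        // The endpoint expects multipart/form-data with "file" and "model" fields.
        const form = new FormData();
        form.append('file', fs.createReadStream(audioFile));
        form.append('model', 'whisper-1');

        const response = await axios.post(endpoint, form, {
            headers: {
                // getHeaders() supplies the multipart Content-Type with its boundary.
                ...form.getHeaders(),
                'Authorization': `Bearer ${apiKey}`
            }
        });

        // The default JSON response carries the transcription in "text".
        console.log(response.data.text);
    } catch (error) {
        // Prefer the API's error payload when one is available.
        console.error('Error:', error.response ? error.response.data : error.message);
    }
}

transcribeSpeech('path/to/audio/file.mp3');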

Examples in Python

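Here's how you can use the speech to text endpoint in Python, using the official openai package (version 1.0 or later):

from openai import OpenAI

api_key = 'YOUR_API_KEY'
client = OpenAI(api_key=api_key)

def transcribe_speech(audio_file):
    # Open the audio in binary mode; the client uploads it as multipart form data.
    with open(audio_file, 'rb') as file:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=file
        )
    # The transcription is exposed on the response's text attribute.
    return response.text

transcript = transcribe_speech('path/to/audio/file.mp3')
print(transcript)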
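
Before uploading, it can help to check the audio file locally so that obvious problems are caught without a network round trip. The sketch below is a hedged example: the 25 MB size limit and the list of file extensions are assumptions drawn from OpenAI's documentation at the time of writing and should be verified against the current API reference. The function names validate_upload and validate_audio_path are hypothetical.

```python
import os

# Assumed limits; check the current API reference before relying on them.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def validate_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes == 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, over the {MAX_UPLOAD_BYTES}-byte limit"
        )
    return problems


def validate_audio_path(path):
    """Convenience wrapper that reads the size of a file on disk."""
    return validate_upload(path, os.path.getsize(path))


print(validate_upload("meeting.mp3", 3_000_000))  # prints []
print(validate_upload("notes.txt", 10))
```

Calling validate_audio_path on the file before transcribing it lets the application report a clear message instead of relaying an API error.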
                

Conclusion

The speech to text endpoint in the OpenAI API converts spoken audio into written text. With the request format, the parameters, and the JavaScript and Python examples above, you can integrate speech recognition into your applications for voice-driven interactions and transcription services.