For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://modelgates.ai/docs/_mcp/server.
Audio
ModelGates supports both sending audio files to compatible models and receiving audio responses via the API. This guide covers how to work with audio inputs and outputs.
Audio Inputs
Send audio files to compatible models for transcription, analysis, and processing. Audio input requests use the /api/v1/chat/completions API with the input_audio content type. Audio files must be base64-encoded and include the format specification.
Note: Audio files must be base64-encoded - direct URLs are not supported for audio content.
You can search for models that support audio input by filtering to audio input modality on our Models page.
Sending Audio Files
Here's how to send an audio file for processing:
import { ModelGates } from '@modelgates/sdk';import fs from "fs/promises"; const modelgates = new ModelGates({ apiKey: '{}',}); async function encodeAudioToBase64(audioPath: string): Promise<string> { const audioBuffer = await fs.readFile(audioPath); return audioBuffer.toString("base64");} // Read and encode the audio fileconst audioPath = "path/to/your/audio.wav";const base64Audio = await encodeAudioToBase64(audioPath); const result = await modelgates.chat.send({ model: "{}", messages: [ { role: "user", content: [ { type: "text", text: "Please transcribe this audio file.", }, { type: "input_audio", inputAudio: { data: base64Audio, format: "wav", }, }, ], }, ], stream: false,}); console.log(result);import requestsimport jsonimport base64 def encode_audio_to_base64(audio_path): with open(audio_path, "rb") as audio_file: return base64.b64encode(audio_file.read()).decode('utf-8') url = "https://modelgates.ai/api/v1/chat/completions"headers = { "Authorization": f"Bearer {API_KEY_REF}", "Content-Type": "application/json"} # Read and encode the audio fileaudio_path = "path/to/your/audio.wav"base64_audio = encode_audio_to_base64(audio_path) messages = [ { "role": "user", "content": [ { "type": "text", "text": "Please transcribe this audio file." }, { "type": "input_audio", "input_audio": { "data": base64_audio, "format": "wav" } } ] }] payload = { "model": "{{MODEL}}", "messages": messages} response = requests.post(url, headers=headers, json=payload)print(response.json())import fs from "fs/promises"; async function encodeAudioToBase64(audioPath: string): Promise<string> { const audioBuffer = await fs.readFile(audioPath); return audioBuffer.toString("base64");} // Read and encode the audio fileconst audioPath = "path/to/your/audio.wav";const base64Audio = await encodeAudioToBase64(audioPath); const response = await fetch("https://modelgates.ai/api/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${API_KEY_REF}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "{{MODEL}}", messages: [ { role: "user", content: [ { type: "text", text: "Please transcribe this audio file.", }, { type: "input_audio", input_audio: { data: base64Audio, format: "wav", }, }, ], }, ], }),}); const data = await response.json();console.log(data);Supported Audio Input Formats
Supported audio formats vary by provider. Common formats include:
wav- WAV audiomp3- MP3 audioaiff- AIFF audioaac- AAC audioogg- OGG Vorbis audioflac- FLAC audiom4a- M4A audiopcm16- PCM16 audiopcm24- PCM24 audio
Note: Check your model's documentation to confirm which audio formats it supports. Not all models support all formats.
Audio Output
ModelGates supports receiving audio responses from models that have audio output capabilities. To request audio output, include the modalities and audio parameters in your request.
You can search for models that support audio output by filtering to audio output modality on our Models page.
Requesting Audio Output
To receive audio output, set modalities to ["text", "audio"] and provide the audio configuration with your desired voice and format:
import requestsimport jsonimport base64 url = "https://modelgates.ai/api/v1/chat/completions"headers = { "Authorization": f"Bearer {API_KEY_REF}", "Content-Type": "application/json"} payload = { "model": "{{MODEL}}", "messages": [ { "role": "user", "content": "Say hello in a friendly tone." } ], "modalities": ["text", "audio"], "audio": { "voice": "alloy", "format": "wav" }, "stream": True} # Audio output requires streaming — the response is delivered as SSE chunksresponse = requests.post(url, headers=headers, json=payload, stream=True) audio_data_chunks = []transcript_chunks = [] for line in response.iter_lines(): if not line: continue decoded = line.decode("utf-8") if not decoded.startswith("data: "): continue data = decoded[len("data: "):] if data.strip() == "[DONE]": break chunk = json.loads(data) delta = chunk["choices"][0].get("delta", {}) audio = delta.get("audio", {}) if audio.get("data"): audio_data_chunks.append(audio["data"]) if audio.get("transcript"): transcript_chunks.append(audio["transcript"]) transcript = "".join(transcript_chunks)print(f"Transcript: ") # Combine and decode the base64 audio chunks, then savefull_audio_b64 = "".join(audio_data_chunks)audio_bytes = base64.b64decode(full_audio_b64)with open("output.wav", "wb") as f: f.write(audio_bytes)const response = await fetch("https://modelgates.ai/api/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${API_KEY_REF}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "{{MODEL}}", messages: [ { role: "user", content: "Say hello in a friendly tone.", }, ], modalities: ["text", "audio"], audio: { voice: "alloy", format: "wav", }, stream: true, }),}); // Audio output requires streaming — parse the SSE chunksconst reader = response.body!.getReader();const decoder = new TextDecoder(); const audioDataChunks: string[] = [];const transcriptChunks: string[] = [];let buffer = ""; while (true) { const { done, value } = await reader.read(); if (done) break; buffer += decoder.decode(value, { stream: true }); const lines = buffer.split("\n"); buffer = lines.pop()!; // keep incomplete line in buffer for (const line of lines) { if (!line.startsWith("data: ")) continue; const data = line.slice("data: ".length).trim(); if (data === "[DONE]") break; const chunk = JSON.parse(data); const audio = chunk.choices?.[0]?.delta?.audio; if (audio?.data) audioDataChunks.push(audio.data); if (audio?.transcript) transcriptChunks.push(audio.transcript); }} const transcript = transcriptChunks.join("");console.log(`Transcript: $`); // audioDataChunks joined together is the full base64-encoded audioconst fullAudioB64 = audioDataChunks.join("");Streaming Chunk Format
Audio output requires streaming (stream: true). Audio data and transcript are delivered incrementally via the delta.audio field in each chunk:
{ "choices": [ { "delta": { "audio": { "data": "<base64-encoded audio chunk>", "transcript": "Hello" } } } ]}Audio Configuration Options
The audio parameter accepts the following options:
| Option | Description |
|---|---|
voice | The voice to use for audio generation (e.g., alloy, echo, fable, onyx, nova, shimmer). Available voices vary by model. |
format | The audio format for the output (e.g., wav, mp3, flac, opus, pcm16). Available formats vary by model. |