Speech to Text: Audio Transcription Tool

PocketPaw can transcribe audio files to text using OpenAI’s Whisper API.

Setup

Set your OpenAI API key as an environment variable:

export POCKETPAW_OPENAI_API_KEY="sk-..."

Configuration

| Setting | Env Variable | Default | Description |
|---------|--------------|---------|-------------|
| Model | `POCKETPAW_STT_MODEL` | `whisper-1` | Whisper model to use |
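
As a rough illustration of how this setting resolves, the sketch below reads `POCKETPAW_STT_MODEL` from the environment and falls back to the documented default. The helper name `resolve_stt_model` is hypothetical and is not part of PocketPaw; the actual configuration loader may behave differently (for example, in how it treats an empty value).

```python
import os

DEFAULT_STT_MODEL = "whisper-1"  # default value documented for this setting


def resolve_stt_model(env=None):
    """Return the Whisper model name to use (hypothetical helper, not PocketPaw's API)."""
    env = os.environ if env is None else env
    # Assumption: an unset or empty variable falls back to the default.
    return env.get("POCKETPAW_STT_MODEL") or DEFAULT_STT_MODEL
```

Passing a plain dictionary instead of `os.environ` makes it easy to check the fallback without touching the real environment.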

Usage

User: Transcribe this audio file: /path/to/recording.mp3
Agent: [uses stt tool] → "Here is the transcription..."

Tool Schema

{
  "name": "stt",
  "description": "Transcribe audio to text using OpenAI Whisper",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "Path to the audio file to transcribe"
      },
      "language": {
        "type": "string",
        "description": "Language code (optional, auto-detected)"
      }
    },
    "required": ["file_path"]
  }
}
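
To make the schema concrete, here is a minimal sketch of what a tool like this might do with its input: validate the payload, then pass the file to OpenAI's transcription endpoint. This is not PocketPaw's implementation. The function names `validate_stt_input` and `transcribe`, and the injectable `client` parameter, are assumptions for illustration; the only external call used is `client.audio.transcriptions.create` from the OpenAI Python SDK (v1 or later).

```python
import os


def validate_stt_input(args):
    """Check a payload against the stt schema (illustrative, hypothetical helper)."""
    if not isinstance(args, dict):
        raise TypeError("input must be an object")
    if "file_path" not in args:
        raise ValueError("file_path is required")
    for key in ("file_path", "language"):
        if key in args and not isinstance(args[key], str):
            raise TypeError(f"{key} must be a string")
    return args


def transcribe(args, client=None, model=None):
    """Transcribe the audio file named in args and return the text."""
    args = validate_stt_input(args)
    if client is None:
        # Imported lazily so the validation helper works without the SDK installed.
        from openai import OpenAI
        client = OpenAI(api_key=os.environ["POCKETPAW_OPENAI_API_KEY"])
    model = model or os.environ.get("POCKETPAW_STT_MODEL", "whisper-1")

    params = {"model": model}
    if args.get("language"):
        # Only send a language hint when the caller supplied one;
        # otherwise the API detects the language itself.
        params["language"] = args["language"]

    with open(args["file_path"], "rb") as audio:
        result = client.audio.transcriptions.create(file=audio, **params)
    return result.text
```

A call such as `transcribe({"file_path": "/path/to/recording.mp3", "language": "en"})` mirrors the tool input shown in the schema. Accepting a `client` argument is a design choice for this sketch only: it lets the function be exercised with a stand-in object instead of a live API call.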

Supported Formats

Whisper supports: mp3, mp4, mpeg, mpga, m4a, wav, webm.
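
If you want to reject unsupported files before making a request, a simple extension check along these lines may be enough. The helper `is_supported_audio` is hypothetical and not part of PocketPaw. It inspects only the file name, not the file contents, so a mislabeled file would still pass.

```python
from pathlib import Path

# Formats listed in this section.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def is_supported_audio(file_path):
    """Return True if the file extension is one of the supported formats."""
    # Normalize ".MP3" to "mp3" so the comparison is case-insensitive.
    suffix = Path(file_path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_FORMATS
```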

Policy Group

This tool belongs to the `group:voice` policy group.

Last updated: April 29, 2026
