Speech to Text: Audio Transcription Tool
PocketPaw can transcribe audio files to text using OpenAI’s Whisper API.
Setup
export POCKETPAW_OPENAI_API_KEY="sk-..."
Configuration
| Setting | Env Variable | Default | Description |
|---|---|---|---|
| Model | POCKETPAW_STT_MODEL | whisper-1 | Whisper model to use |
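To illustrate how the default in the table applies, a script could resolve the model name as sketched below. This is a minimal sketch, not PocketPaw's actual lookup: `resolve_stt_model` is a hypothetical helper, and treating an empty variable as unset is an assumption. Only the variable name and the default value come from the table.

```python
import os

DEFAULT_STT_MODEL = "whisper-1"  # default value from the configuration table


def resolve_stt_model(env=None):
    """Return the configured Whisper model name, or the default.

    Hypothetical helper for illustration; PocketPaw's own lookup may differ.
    """
    env = os.environ if env is None else env
    # Assumption: an unset or empty variable means "use the default".
    return env.get("POCKETPAW_STT_MODEL") or DEFAULT_STT_MODEL
```

Passing a plain dict instead of `os.environ` makes it easy to try the helper without changing the real environment.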
Usage
User: Transcribe this audio file: /path/to/recording.mp3
Agent: [uses stt tool] → "Here is the transcription..."
Tool Schema
{
  "name": "stt",
  "description": "Transcribe audio to text using OpenAI Whisper",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "Path to the audio file to transcribe"
      },
      "language": {
        "type": "string",
        "description": "Language code (optional, auto-detected)"
      }
    },
    "required": ["file_path"]
  }
}
Supported Formats
Whisper supports: mp3, mp4, mpeg, mpga, m4a, wav, webm.
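A caller might want to check a tool input against the schema fields and the format list before invoking `stt`. The following is a minimal sketch under stated assumptions: `validate_stt_input` is a hypothetical helper that is not part of PocketPaw, and it assumes the file extension reliably indicates the audio format.

```python
from pathlib import Path

# Formats listed above as supported by Whisper.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def validate_stt_input(tool_input):
    """Return a list of problems with an `stt` tool input (empty if none).

    Hypothetical helper: mirrors the tool schema's fields (`file_path`
    required, `language` optional) and inspects the file extension only.
    """
    problems = []

    file_path = tool_input.get("file_path")
    if not isinstance(file_path, str) or not file_path:
        problems.append("file_path is required and must be a string")
    else:
        # Compare the extension case-insensitively, without the leading dot.
        ext = Path(file_path).suffix.lstrip(".").lower()
        if ext not in SUPPORTED_FORMATS:
            problems.append(f"unsupported format: {ext or '(none)'}")

    language = tool_input.get("language")
    if language is not None and not isinstance(language, str):
        problems.append("language must be a string when provided")

    return problems
```

For example, `validate_stt_input({"file_path": "/path/to/recording.mp3"})` returns an empty list, while a `.txt` path yields a single "unsupported format" message.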
Policy Group
Belongs to group:voice.
Related
Voice & TTS
Convert text to speech with OpenAI TTS or ElevenLabs.
OCR Tool
Extract text from images using GPT-4o Vision.
Tools Overview
Browse all 50+ built-in tools available in PocketPaw.