GPT-4o mini Realtime Preview is a real-time multimodal model for interactive voice and visual experiences. It handles speech, text, and images with streaming input and output, plus tool/function calling for grounded actions. Typical uses include voice assistants, live call handling, real-time captioning, and visual question answering over camera or screen content. Technical highlights include bidirectional audio, vision understanding, streaming responses, and structured outputs via functions.
Pricing for GPT-4o mini Realtime Preview
Pricing for GPT-4o mini Realtime Preview is usage-based and quoted in USD per million tokens, with separate rates for input and output. You pay only for the tokens you consume, so costs scale with your usage.
Comet Price (USD / M Tokens) | Official Price (USD / M Tokens) | Discount
Input: $75/M, Output: $300/M | Input: $93.75/M, Output: $375/M | -20%
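To make the per-million-token rates concrete, the following is a minimal sketch of how a request's cost could be estimated from the listed prices. The function names (estimate_cost, discount_percent) and the token counts in the usage note are hypothetical and chosen only for illustration; the rates are copied from the listing above.

```python
from decimal import Decimal

# Rates copied from the pricing listing, in USD per million tokens.
COMET_RATES = {"input": Decimal("75"), "output": Decimal("300")}
OFFICIAL_RATES = {"input": Decimal("93.75"), "output": Decimal("375")}
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, rates: dict) -> Decimal:
    """Return the USD cost of one request at the given per-million-token rates.

    Hypothetical helper for illustration; not part of any provider SDK.
    """
    total = (
        Decimal(input_tokens) * rates["input"]
        + Decimal(output_tokens) * rates["output"]
    )
    return total / TOKENS_PER_MILLION


def discount_percent(discounted: Decimal, official: Decimal) -> Decimal:
    """Return the relative price difference as a percentage (negative = cheaper)."""
    return (discounted - official) / official * 100


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    print(estimate_cost(10_000, 2_000, COMET_RATES))
    print(estimate_cost(10_000, 2_000, OFFICIAL_RATES))
    print(discount_percent(COMET_RATES["input"], OFFICIAL_RATES["input"]))
```

Decimal is used instead of float so that rates such as 93.75 are handled exactly. Under these assumptions, both the input and the output rate work out to the listed -20% discount.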
Sample code and API for GPT-4o mini Realtime Preview
Access comprehensive sample code and API resources for GPT-4o mini Realtime Preview to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of GPT-4o mini Realtime Preview in your projects.
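As a rough illustration of what an integration might send, the sketch below builds two JSON events in the style of the publicly documented OpenAI Realtime API: a session.update that registers a function tool, and a response.create that requests a reply. It makes no network calls. The model identifier, the get_weather tool, and the helper function names are assumptions for illustration, and the endpoint, authentication, and exact event fields may differ for your provider, so check the official documentation before relying on this.

```python
import json

# Assumed model identifier; confirm the exact name in your provider's model list.
MODEL = "gpt-4o-mini-realtime-preview"

# Hypothetical tool definition, used only to illustrate function calling.
WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_session_update(instructions: str, tools: list) -> str:
    """Serialize a session.update event that sets instructions and tools."""
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "instructions": instructions,
            "tools": tools,
            "tool_choice": "auto",
        },
    }
    return json.dumps(event)


def build_response_create(modalities: list) -> str:
    """Serialize a response.create event asking the model to respond."""
    event = {"type": "response.create", "response": {"modalities": modalities}}
    return json.dumps(event)


if __name__ == "__main__":
    # These strings would be sent over an already-open realtime connection.
    print(build_session_update("You are a concise voice assistant.", [WEATHER_TOOL]))
    print(build_response_create(["text"]))
```

Keeping event construction separate from the transport makes the payloads easy to inspect and test before wiring them to a live connection.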
Versions of GPT-4o mini Realtime Preview
GPT-4o mini Realtime Preview may be offered as multiple snapshots for several reasons: outputs can change after an update, so older snapshots are kept for consistency; developers get a transition period to adapt and migrate; and different snapshots may correspond to global or regional endpoints to optimize the user experience. For detailed differences between versions, refer to the official documentation.