Tier summary
| Tier | Sustained RPM | Burst RPM | Context | Thinking |
|---|---|---|---|---|
| Async | Queue | Not applicable | 16K | Not recommended |
| Lite | 50 | 100 for 5 min/h | 8K default · 32K configurable | 100 req/month |
| Pro | 200 | 400 for 5 min/h | 8K default · 32K configurable | 1,000 req/month |
| Pro+ | 500 | 700 for 5 min/h | 16K default · 32K configurable | Unlimited |
| Scale | 5,000+ | Negotiable | Custom | Unlimited · high priority |
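To stay under a tier's sustained RPM, a client can pace its own requests before the server has to reject them. The sketch below is one way to do that, a rolling-window limiter. The class name `SlidingWindowLimiter` and its methods are hypothetical, and the service's actual windowing algorithm is not documented here, so treat the 60-second rolling window as an assumption.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window`-second span.

    Hypothetical client-side helper; the server may count requests differently.
    """

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self.sent = deque()         # timestamps of recent requests, oldest first

    def wait_time(self):
        """Return how many seconds to wait before the next request is allowed."""
        now = self.clock()
        # Drop timestamps that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # The window is full: wait until the oldest request leaves it.
        return self.window - (now - self.sent[0])

    def record(self):
        """Register that a request was just sent."""
        self.sent.append(self.clock())
```

A Lite client might create `SlidingWindowLimiter(limit=50)`, then sleep for `wait_time()` seconds and call `record()` around each request. Pacing like this reduces 429 responses but does not replace handling them.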
Bundle sublimits
- Embeddings have a separate quota so conversational traffic is not blocked.
- Whisper and TTS requests are limited by RPM and must keep audio or text input to a reasonable size.
- Handle 429 responses with exponential backoff and jittered retries.
When a limit is exceeded
The API returns HTTP `429` along with enough metadata to schedule a retry. On Pro and higher tiers, you can negotiate a larger burst allowance or move batch workloads to the Async tier.
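Retrying on `429` with exponential backoff and jitter could look like the minimal sketch below. It makes several assumptions: `send` is a hypothetical callable returning `(status, headers, body)`, the retry metadata arrives as a standard `Retry-After` header in whole seconds (the exact metadata format is not specified here), and the base delay, cap, and attempt count are illustrative values.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(send, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status, headers, body).
    The last response is returned either way, so callers can inspect it.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts: do not sleep again
        # Assumption: the server hint is a Retry-After header in whole seconds.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = backoff_delay(attempt, rng=rng)
        sleep(delay)
    return status, headers, body
```

Preferring the server's hint over the computed delay keeps clients from retrying earlier than the API asks, while the jitter spreads out retries from clients that were throttled at the same moment.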