Rate limits · Tessera

Rate-limiting model

Each tier has a sustained rate in requests per minute (RPM) and a burst capacity that absorbs short spikes. RPM is measured over a rolling 60 s window. Bursting above the sustained rate is allowed for up to 5 minutes per hour; once those 5 minutes are used, requests above the sustained rate return HTTP 429.
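
To stay under the sustained rate without relying on 429s, a client can keep its own rolling window. The sketch below is one way to do that, not Tessera's server-side algorithm: `RollingWindowLimiter` and its parameters are hypothetical names, it ignores burst capacity entirely, and the server's window may not line up exactly with the client's clock.

```python
import time
from collections import deque


class RollingWindowLimiter:
    """Client-side throttle: at most `rpm` calls in any `window` seconds."""

    def __init__(self, rpm, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.window = window
        self.clock = clock    # injectable so the limiter can be tested
        self.sleep = sleep
        self.sent = deque()   # timestamps of calls still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()

    def wait_time(self):
        """Seconds until the next call fits in the window (0.0 if it fits now)."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.rpm:
            return 0.0
        return self.window - (now - self.sent[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self.sleep(delay)
        self.sent.append(self.clock())
```

Calling `limiter.acquire()` before each request, with `rpm` set to the tier's sustained RPM, keeps the client at or below that rate as seen from its own clock.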

Limits by tier

Tier	Sustained RPM	Burst	Per-service sublimits
Async	— (job queue)	—	LLM + embeddings only; no Whisper/TTS
Lite (€450/mo)	50	100 (5 min/h)	Embeddings 100 RPM · Whisper 10 RPM · TTS 10 RPM
Pro (€650/mo)	200	400 (5 min/h)	Embeddings 400 RPM · Whisper 30 RPM · TTS 30 RPM
Scale	500	700 (5 min/h)	No sublimits — full bundle
Enterprise	5,000+	Negotiable per hardware	No sublimits
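
For client configuration, the fixed-price rows of the table can be encoded as plain data. This is only a sketch: the `TIERS` layout and the helpers `sublimit` and `hourly_ceiling` are hypothetical, Async and Enterprise are left out because the table gives no fixed numbers for them, and `hourly_ceiling` assumes the burst rate can be held for the full 5 minutes, which is one reading of the model above rather than a documented guarantee.

```python
# Values copied from the tier table; the structure itself is an assumption.
TIERS = {
    "Lite": {
        "sustained": 50,
        "burst": 100,
        "sublimits": {"embeddings": 100, "whisper": 10, "tts": 10},
    },
    "Pro": {
        "sustained": 200,
        "burst": 400,
        "sublimits": {"embeddings": 400, "whisper": 30, "tts": 30},
    },
    "Scale": {"sustained": 500, "burst": 700, "sublimits": {}},
}


def sublimit(tier, service):
    """Per-service RPM cap for a tier, or None when the table lists none."""
    return TIERS[tier]["sublimits"].get(service)


def hourly_ceiling(tier, burst_minutes=5):
    """Upper bound on requests per hour: burst minutes plus sustained minutes."""
    t = TIERS[tier]
    return t["burst"] * burst_minutes + t["sustained"] * (60 - burst_minutes)
```

Under that assumption, `hourly_ceiling("Pro")` is 5 × 400 + 55 × 200 = 13,000 requests.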

Response headers

Every API response carries rate-limit headers, so a client can track its quota without waiting for a 429.

Header	Meaning
`X-RateLimit-Limit-Requests`	Your sustained RPM for this endpoint
`X-RateLimit-Remaining-Requests`	Remaining requests in the current window
`X-RateLimit-Reset-Requests`	Seconds until the quota resets
`Retry-After` (only on 429)	Seconds to wait before retrying
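
A client can read these headers after every response and pause before the quota runs out. The following is a minimal sketch: `QuotaState`, `parse_quota`, and `pause_before_next` are hypothetical helpers, and it assumes the header values are plain numbers, as the table suggests. It takes any mapping of header names to strings; with a plain `dict` the lookup is case-sensitive, so pass your HTTP client's case-insensitive header object where available.

```python
from dataclasses import dataclass


@dataclass
class QuotaState:
    limit: int            # X-RateLimit-Limit-Requests
    remaining: int        # X-RateLimit-Remaining-Requests
    reset_seconds: float  # X-RateLimit-Reset-Requests


def parse_quota(headers):
    """Return a QuotaState, or None if a header is missing or not numeric."""
    try:
        return QuotaState(
            limit=int(headers["X-RateLimit-Limit-Requests"]),
            remaining=int(headers["X-RateLimit-Remaining-Requests"]),
            reset_seconds=float(headers["X-RateLimit-Reset-Requests"]),
        )
    except (KeyError, ValueError):
        return None


def pause_before_next(state):
    """Seconds to wait before the next request: the reset time if the
    quota is exhausted, otherwise 0."""
    if state is None or state.remaining > 0:
        return 0.0
    return state.reset_seconds
```

Sleeping for `pause_before_next(parse_quota(response.headers))` between calls trades a short wait for an avoided 429.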

Handling 429

When you receive a 429, the response includes a `Retry-After` header. Wait at least that long before retrying. We recommend exponential backoff with jitter:

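import random
import time

# RateLimitError is a placeholder for whatever exception your HTTP client
# raises on a 429; it is assumed to expose the response as `e.response`.

def call_with_retry(fn, max_retries=5):
    """Call fn(), retrying on 429 for up to max_retries attempts in total."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Honor Retry-After (seconds); fall back to 1s if it is absent.
            retry_after = float(e.response.headers.get("Retry-After", "1"))
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of jitter.
            backoff = (2 ** attempt) + random.random()
            time.sleep(max(retry_after, backoff))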

Upgrading or downgrading tiers

Upgrades take effect within 60 s of confirmation; the new RPM applies from the next rate-limit window.
Downgrades apply on the next billing cycle to avoid mid-month cuts.
For temporary upgrades (campaign, event), ask support for an "extended burst window" instead of changing tiers.