OpenAI-compatible API · no token-meter

Dedicated AI inference in your region. Flat monthly invoice.

Tessera replaces OpenAI with a base_url change. Open-source models on dedicated GPU in EU, LATAM and US. No token-meter, signed DPA, human support in English or Spanish.

5-minute trial · 3 lines of code
Dedicated GPU in EU, LATAM and US
GDPR and AI Act by design
python · three lines to try
from openai import OpenAI

client = OpenAI(
  base_url="https://api.tesseraai.cloud/v1",
  api_key="sk-tessera-…",
)
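Once the client is pointed at Tessera, the first request is a standard OpenAI-compatible `/v1/chat/completions` body. A minimal sketch of that body; the model id below is assumed from the benchmarks section, so confirm the exact id for your account before shipping.

```python
import json

# Body for POST https://api.tesseraai.cloud/v1/chat/completions
# (standard OpenAI-compatible shape). The model id is an assumption
# taken from the benchmarks section, not a guaranteed catalogue id.
payload = {
    "model": "qwen3.6-35b-a3b",
    "messages": [{"role": "user", "content": "Summarize our DPA in one line."}],
    "max_tokens": 128,
}

body = json.dumps(payload)
```

With the client above, the same request is simply `client.chat.completions.create(**payload)`.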
Measured benchmarks

Numbers we measured, not promised.

Metrics from a sustained run on real infrastructure with 25 active customers. No per-token competitor publishes its own infrastructure benchmarks. We do.

  • TTFT P95: 350 ms (qwen3.6-35b-a3b)
  • Success rate: 99.95% (8,000 requests · 0 generation errors)
  • Included capacity: 100 RPM per Pro customer (burst 200)
  • Tokens validated: 5.91 M (1 h 50 min sustained run)

Measured on 2026-04-27 on RTX PRO 6000 Blackwell with 25 simultaneous customers. Full report (saturation curve, noisy neighbor, long context) available under NDA.
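For readers reproducing the table: TTFT P95 is the 95th percentile of per-request time-to-first-token samples. A minimal nearest-rank computation (our internal tooling may use a different percentile method):

```python
import math

def p95(samples_ms):
    """95th percentile of latency samples, nearest-rank method."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# 100 synthetic samples of 1..100 ms: the nearest-rank P95 is 95 ms.
print(p95(range(1, 101)))  # → 95
```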

Why Tessera

Three pillars. No fine print.

Predictability over virtuosity

Flat monthly fee. The invoice fits in a single cell. No surprises on Black Friday or end of quarter. Your CFO signs without flinching and you stop defending variance to the finance committee.

Sovereignty by design

GPU physically in EU, LATAM or US, your choice. Your data never crosses a jurisdiction you didn’t sign for. GDPR and AI Act by architecture for EU; data residency guaranteed for US and LATAM. DPA available across all tiers, public subprocessor list.

Continuity without lock-in

OpenAI v1-compatible API. Open-source models with Apache 2.0 license. If you decide to leave, you leave in an afternoon. We earn the renewal every month.

Models included

One bundle. Five open models. Zero upsell.

Every tier accesses the full catalogue. We don’t bill per model. There’s no "premium" tier hiding the good model behind a paywall.

chat

Qwen 3.6-35B-A3B

Primary chat / reasoning model. 32K context, direct and thinking modes switchable per request. Ideal for assistants, RAG and classification.

License: Apache 2.0
audio · transcription

Whisper large-v3 + turbo

Multilingual transcription in two flavours on the same endpoint: `large-v3` for top accuracy and `large-v3-turbo` (distilled decoder) up to ~54% faster on long audio. Native ES, EN, PT, CA.

License: MIT
audio · synthesis

Kokoro 82M TTS

Natural voice synthesis with strong Spanish coverage. Sub-200 ms latency, ideal for IVR and conversational agents.

License: Apache 2.0
embeddings

Qwen3-Embedding-8B

Embeddings for retrieval, clustering and semantic search. 4,096 dimensions, multilingual, optimized for long contexts.

License: Apache 2.0
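A sketch of the embeddings request and a shape check against the 4,096-dimension claim above. The model id string is assumed to match the catalogue name, and the response values are illustrative, not real output:

```python
import json

# Body for POST /v1/embeddings (OpenAI-compatible shape).
payload = {
    "model": "Qwen3-Embedding-8B",  # id assumed from the catalogue; confirm in your dashboard
    "input": ["flat invoice", "token meter"],
}

# Illustrative response shape (zero vectors, not real embeddings):
sample = {"data": [{"index": 0, "embedding": [0.0] * 4096}]}
dims = len(sample["data"][0]["embedding"])
```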
reranker

Qwen3-Reranker-4B

Second-stage RAG reranking. Trained jointly with Qwen3-Embedding-8B (same family, no cross-vendor penalty). Cohere-compatible response shape — drop-in migration from Cohere / Voyage / Jina.

License: Apache 2.0
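Cohere's rerank wire format, for reference, is a JSON body with `query`, `documents` and `top_n`, answered with `results` of `(index, relevance_score)` pairs. A sketch under the assumption that Tessera mirrors that shape, as the card above suggests; the scores are invented for illustration:

```python
# Cohere-style rerank request (shape assumed to match the card above).
request = {
    "model": "Qwen3-Reranker-4B",
    "query": "flat-fee inference in the EU",
    "documents": ["per-token pricing", "flat monthly invoice", "GPU residency"],
    "top_n": 2,
}

# Illustrative response shape (values invented for the sketch):
response = {"results": [{"index": 1, "relevance_score": 0.91},
                        {"index": 2, "relevance_score": 0.47}]}

# Results reference the original documents by index, so no re-sending is needed.
best = request["documents"][response["results"][0]["index"]]
```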

When we ship a new model, we tell you a month in advance. 12-month model-freeze clause with opt-in free upgrade.

How it works

Three steps. One afternoon.

  1. Pick your tier and region

    EU for GDPR-bound companies, LATAM for Latin-American sovereignty, US for companies that prefer American residency. Residency is contracted, not discovered on a status page.

  2. Change the base_url in your code

    Your OpenAI client stays the same. It just points to api.tesseraai.cloud. The rest of the SDK, your LangChain or LlamaIndex code and your prompts stay untouched.

  3. Pay a flat monthly fee

    One invoice, one cell. No surprises on traffic spikes. We earn the renewal every month; no exit clauses to negotiate.

Trying takes 5 minutes and 3 lines of code. Full migration with tests, one day with founder-led support.
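Step 2 can even be pure configuration: the official openai Python SDK reads its endpoint and key from the environment at client construction, so an existing app can be pointed at Tessera without touching code. A sketch (the key below is a placeholder, not a real credential):

```python
import os

# The openai Python SDK (v1+) picks these up when the client is built,
# so existing LangChain / LlamaIndex code keeps working unchanged.
os.environ["OPENAI_BASE_URL"] = "https://api.tesseraai.cloud/v1"
os.environ["OPENAI_API_KEY"] = "sk-tessera-placeholder"  # placeholder key
```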
Honest comparison

How we compare with the rest.

No asterisks. If something doesn’t apply, we put a dash.

Feature | Tessera | OpenAI / Anthropic | AWS Bedrock
Pricing model | Flat monthly | Per-token variable | Per-token variable
Data residency | EU + LATAM + US | US | Multi-region
EN / ES support | Yes, founder direct | EN only | Limited
OpenAI v1 compatibility | 100% | Native | Via adapter
Enterprise SLA | 99.5% – 99.95% | 99.9% | 99.9%
Signed GDPR DPA | Yes | Yes (addendum) | Yes
Monthly cost at ~5M tokens/day | €650 | ~€3,500 | ~€2,500
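The last table row is straightforward arithmetic. A sketch using a blended per-token rate back-derived from the table's own ~€3,500 figure; the €23.3 per 1M tokens rate is an assumption for illustration, not a quoted competitor price:

```python
def per_token_monthly_eur(tokens_per_day, eur_per_million_tokens, days=30):
    """Monthly spend on a per-token meter."""
    return tokens_per_day * days / 1_000_000 * eur_per_million_tokens

FLAT_EUR = 650  # Tessera Pro flat fee

# ~€3,500/month at 5M tokens/day implies a blended rate near €23.3 per 1M tokens.
variable = per_token_monthly_eur(5_000_000, 23.3)

# Daily volume at which the flat fee matches the meter (under that assumed rate).
breakeven_tokens_per_day = FLAT_EUR / (23.3 / 1_000_000) / 30
```

Under that assumed rate, the flat fee breaks even just under 1M tokens/day and everything above it is variance you no longer pay for.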
Pricing

One invoice. One cell.

Flat fee, dedicated GPU on Pro and above, no token-meter or seasonality surcharges.

no commitment
Tessera Async
€200/mo

For overnight processing and batch jobs.

  • Job queue instead of RPM limits
  • 16K context
  • LLM + embeddings
  • Best-effort <30 s P95 turnaround
  • No Whisper / TTS
Get started
Tessera Lite
€450/mo

For small teams with a single use case.

  • 50 sustained RPM · 100 burst (5 min/h)
  • Full bundle: LLM + embeddings + Whisper + TTS
  • Sublimits: embeddings 100 RPM · Whisper 10 RPM · TTS 10 RPM
  • 8K context default · up to 32K configurable
  • Thinking mode: 100 requests/mo
  • Email · <24h business-hour response
  • Region: EU, LATAM or US
Get started
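The sustained/burst split in the tiers reads naturally as a token bucket: refill at the sustained RPM, cap at the burst size. The server-side algorithm is not documented on this page, so this client-side pacer is a sketch under that assumption, using the Lite numbers:

```python
class Pacer:
    """Client-side pacing for a 'sustained RPM · burst' limit (token bucket).

    Defaults model the Lite tier (50 sustained RPM, 100 burst); the server's
    actual algorithm and the 5 min/h burst window are assumptions here.
    """

    def __init__(self, sustained_rpm, burst):
        self.capacity = burst              # most requests allowed back-to-back
        self.tokens = float(burst)
        self.rate = sustained_rpm / 60.0   # refill: sustained requests per second
        self.last = 0.0

    def allow(self, now):
        """Return True if a request may be sent at time `now` (seconds)."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A `Pacer(50, 100)` lets 100 requests through back-to-back, then settles to roughly 50 per minute.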
Most popular
Tessera Pro
€650/mo

For typical mid-market production workloads.

  • 200 sustained RPM · 400 burst (5 min/h)
  • Full bundle: LLM + embeddings + Whisper + TTS
  • Sublimits: embeddings 400 RPM · Whisper 30 RPM · TTS 30 RPM
  • 8K context default · up to 32K configurable
  • Thinking mode: 1,000 requests/mo
  • Email + chat · <8h business-hour response
  • Status page with usage metrics
  • Region: EU, LATAM or US
Get started
Tessera Pro+
€1,200/mo

For high concurrency and long context.

  • 500 sustained RPM · 700 burst (5 min/h)
  • Full bundle without additional sublimits
  • 32K context default · up to 128K configurable
  • Thinking mode: unlimited
  • Scheduling priority over Pro and Lite
  • Custom event webhooks
  • Priority chat · <4h business-hour response
  • Region: EU, LATAM or US
Get started
Tessera Scale
from €5,000/mo

For integrated products with very high concurrency.

  • 5,000+ RPM (negotiable per hardware)
  • Full bundle + optional LoRA fine-tuning
  • 128K context default · up to 256K native
  • Dedicated hardware of your choice
  • Thinking mode: unlimited, high priority
  • Maximum scheduling priority
  • Shared Slack · <1h business-hour response
  • Onboarding assisted by senior engineer
  • Region: EU, LATAM or US
Talk to sales
Tessera Enterprise
from €15,000/mo

Dedicated server, custom configuration, RFP-ready.

  • 100% dedicated server (no multi-tenant)
  • RPM, context, SLA and models tailored
  • Hardware chosen per workload
  • Compliance: SOC 2 Type I, ISO 27001 (in progress)
  • Dedicated support: founder + senior engineer
  • Negotiated roadmap commitment
  • Deployment: Tessera cloud, private cloud or on-premise
  • Fine-tuning on your data under NDA (optional)
Talk to sales
Commercial honesty

Who Tessera fits. And who it doesn’t.

We don’t compete on raw price against the cheap end of the market. If we fit, you save money and headaches; if we don’t fit, we tell you on the first call.

Tessera fits if

  • You spend €2,000–5,000 a month on frontier today (GPT-5.5, Opus 4.7, Gemini Pro) and the monthly variance complicates your reporting.
  • Your DPO or legal team asks where customer data physically lives, and the answer matters.
  • You want to consolidate invoices instead of running three different cloud providers.
  • You need human support in English or Spanish, in European, Latin American or US business hours, with bounded response times.
  • Your team builds product, not ML infrastructure: you want a drop-in that works, not a prompt-caching pipeline that shaves 90% off a per-token bill.

Tessera doesn’t fit if

  • You already use Gemini Flash-Lite or GPT-5.4 nano and it works for you — that’s your win, not Tessera’s.
  • Your traffic is extremely bursty (0 to 10,000 RPM in seconds). Per-token serverless beats dedicated GPU.
  • You have an in-house ML team that optimizes every prompt and squeezes hyperscaler volume discounts.
  • You need specific closed models (GPT-4o image, Sora, Veo) that only run in their native cloud.

If your case sits in the right column, we tell you on the first conversation. We don’t push contracts that don’t fit.

Built for developers, governed for enterprise

The piece your team and your DPO sign on the same day.

Documentation a developer reads in fifteen minutes. Compliance a DPO validates on Monday.

Docs with real examples

Python, Node.js, Go and cURL snippets per endpoint. Editable cookbook, errors documented with cause and workaround.

Public status page

status.tesseraai.cloud. Per-region latency in real time. Postmortems published within five business days, before customers ask.

Usage webhooks for your billing

Consumption events on every request. Plug your own chargeback or cost-center system without going through the dashboard.

Exportable audit logs

Signed logs, exportable to your S3 or GCS bucket. Configurable retention for DORA, SOC 2 and AI Act audits.

Trying Tessera takes 5 minutes. Full migration, one day.

OpenAI v1-compatible API, open models on dedicated GPU in EU, LATAM or US. Flat monthly invoice, no token-meter. Founder-led support in English or Spanish.

Get started with the Tessera trial · Book a 20-min call