OpenAI-compatible API · no token-meter

Dedicated AI inference in your region. Flat monthly invoice.

Tessera replaces OpenAI with a base_url change. Open-source models on dedicated GPU in EU, LATAM and US. No token-meter, signed DPA, human support in English or Spanish.

5-minute trial · 3 lines of code
Dedicated GPU in EU, LATAM and US
GDPR and AI Act by design
python · three lines to try
from openai import OpenAI

client = OpenAI(
  base_url="https://api.tesseraai.cloud/v1",
  api_key="sk-tessera-…",
)
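Once the client is pointed at Tessera, the first request is a standard OpenAI-compatible `/v1/chat/completions` body. A minimal sketch of that body; the model id below is assumed from the benchmarks section, so confirm the exact id for your account before shipping.

```python
import json

# Body for POST https://api.tesseraai.cloud/v1/chat/completions
# (standard OpenAI-compatible shape). The model id is an assumption
# taken from the benchmarks section, not a guaranteed catalogue id.
payload = {
    "model": "qwen3.6-35b-a3b",
    "messages": [{"role": "user", "content": "Summarize our DPA in one line."}],
    "max_tokens": 128,
}

body = json.dumps(payload)
```

With the client above, the same request is simply `client.chat.completions.create(**payload)`.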
Measured benchmarks

Numbers we measured, not promised.

Metrics from a sustained run on real infrastructure with 25 active customers. No per-token competitor publishes its own infrastructure benchmarks. We do.

  • TTFT P95: 350 ms (qwen3.6-35b-a3b)
  • Success rate: 99.95% (8,000 requests · 0 generation errors)
  • Included capacity: 100 RPM per Pro customer (burst 200)
  • Tokens validated: 5.91 M (1 h 50 min sustained run)

Measured on 2026-04-27 on RTX PRO 6000 Blackwell with 25 simultaneous customers. Full report (saturation curve, noisy neighbor, long context) available under NDA.
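For readers reproducing the table: TTFT P95 is the 95th percentile of per-request time-to-first-token samples. A minimal nearest-rank computation (our internal tooling may use a different percentile method):

```python
import math

def p95(samples_ms):
    """95th percentile of latency samples, nearest-rank method."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# 100 synthetic samples of 1..100 ms: the nearest-rank P95 is 95 ms.
print(p95(range(1, 101)))  # → 95
```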

Why Tessera

Three pillars. No fine print.

Predictability over virtuosity

Flat monthly fee. The invoice fits in a single cell. No surprises on Black Friday or end of quarter. Your CFO signs without flinching and you stop defending variance to the finance committee.

Sovereignty by design

GPU physically in EU, LATAM or US, your choice. Your data never crosses a jurisdiction you didn’t sign for. GDPR and AI Act by architecture for EU; data residency guaranteed for US and LATAM. DPA available across all tiers, public subprocessor list.

Continuity without lock-in

OpenAI v1-compatible API. Open-source models with Apache 2.0 license. If you decide to leave, you leave in an afternoon. We earn the renewal every month.

Models included

One bundle. Five open models. Zero upsell.

Every tier accesses the full catalogue. We don’t bill per model. There’s no "premium" tier hiding the good model behind a paywall.

chat

Qwen 3.6-35B-A3B

Primary chat / reasoning model. 32K context, direct and thinking modes switchable per request. Ideal for assistants, RAG and classification.

License: Apache 2.0
audio · transcription

Whisper large-v3 + turbo

Multilingual transcription in two flavours on the same endpoint: `large-v3` for top accuracy and `large-v3-turbo` (distilled decoder) up to ~54% faster on long audio. Native ES, EN, PT, CA.

License: MIT
audio · synthesis

Kokoro 82M TTS

Natural voice synthesis with strong Spanish coverage. Sub-200 ms latency, ideal for IVR and conversational agents.

License: Apache 2.0
embeddings

Qwen3-Embedding-8B

Embeddings for retrieval, clustering and semantic search. 4,096 dimensions, multilingual, optimized for long contexts.

License: Apache 2.0
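A sketch of the embeddings request and a shape check against the 4,096-dimension claim above. The model id string is assumed to match the catalogue name, and the response values are illustrative, not real output:

```python
import json

# Body for POST /v1/embeddings (OpenAI-compatible shape).
payload = {
    "model": "Qwen3-Embedding-8B",  # id assumed from the catalogue; confirm in your dashboard
    "input": ["flat invoice", "token meter"],
}

# Illustrative response shape (zero vectors, not real embeddings):
sample = {"data": [{"index": 0, "embedding": [0.0] * 4096}]}
dims = len(sample["data"][0]["embedding"])
```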
reranker

Qwen3-Reranker-4B

Second-stage RAG reranking. Trained jointly with Qwen3-Embedding-8B (same family, no cross-vendor penalty). Cohere-compatible response shape — drop-in migration from Cohere / Voyage / Jina.

License: Apache 2.0
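Cohere's rerank wire format, for reference, is a JSON body with `query`, `documents` and `top_n`, answered with `results` of `(index, relevance_score)` pairs. A sketch under the assumption that Tessera mirrors that shape, as the card above suggests; the scores are invented for illustration:

```python
# Cohere-style rerank request (shape assumed to match the card above).
request = {
    "model": "Qwen3-Reranker-4B",
    "query": "flat-fee inference in the EU",
    "documents": ["per-token pricing", "flat monthly invoice", "GPU residency"],
    "top_n": 2,
}

# Illustrative response shape (values invented for the sketch):
response = {"results": [{"index": 1, "relevance_score": 0.91},
                        {"index": 2, "relevance_score": 0.47}]}

# Results reference the original documents by index, so no re-sending is needed.
best = request["documents"][response["results"][0]["index"]]
```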

When we ship a new model, we tell you a month in advance. 12-month model-freeze clause with opt-in free upgrade.

How it works

Three steps. One afternoon.

  1. Pick your tier and region

    EU for GDPR-bound companies, LATAM for Latin-American sovereignty, US for companies that prefer American residency. Residency is contracted, not discovered on a status page.

  2. Change the base_url in your code

    Your OpenAI client stays the same. It just points to api.tesseraai.cloud. The rest of the SDK, your LangChain or LlamaIndex code and your prompts stay untouched.

  3. Pay a flat monthly fee

    One invoice, one cell. No surprises on traffic spikes. We earn the renewal every month; no exit clauses to negotiate.

Trying takes 5 minutes and 3 lines of code. Full migration with tests, one day with founder-led support.
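Step 2 can even be pure configuration: the official openai Python SDK reads its endpoint and key from the environment at client construction, so an existing app can be pointed at Tessera without touching code. A sketch (the key below is a placeholder, not a real credential):

```python
import os

# The openai Python SDK (v1+) picks these up when the client is built,
# so existing LangChain / LlamaIndex code keeps working unchanged.
os.environ["OPENAI_BASE_URL"] = "https://api.tesseraai.cloud/v1"
os.environ["OPENAI_API_KEY"] = "sk-tessera-placeholder"  # placeholder key
```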
Honest comparison

How we compare with the rest.

No asterisks. If something doesn’t apply, we put a dash.

Feature | Tessera | OpenAI / Anthropic | AWS Bedrock
Pricing model | Flat monthly | Per-token variable | Per-token variable
Data residency | EU + LATAM + US | US | Multi-region
EN / ES support | Yes, founder direct | EN only | Limited
OpenAI v1 compatibility | 100% | Native | Via adapter
Enterprise SLA | 99.5% – 99.95% | 99.9% | 99.9%
Signed GDPR DPA | Yes | Yes (addendum) | Yes
Monthly cost at ~5M tokens/day | €650 | ~€3,500 | ~€2,500
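The last table row is straightforward arithmetic. A sketch using a blended per-token rate back-derived from the table's own ~€3,500 figure; the €23.3 per 1M tokens rate is an assumption for illustration, not a quoted competitor price:

```python
def per_token_monthly_eur(tokens_per_day, eur_per_million_tokens, days=30):
    """Monthly spend on a per-token meter."""
    return tokens_per_day * days / 1_000_000 * eur_per_million_tokens

FLAT_EUR = 650  # Tessera Pro flat fee

# ~€3,500/month at 5M tokens/day implies a blended rate near €23.3 per 1M tokens.
variable = per_token_monthly_eur(5_000_000, 23.3)

# Daily volume at which the flat fee matches the meter (under that assumed rate).
breakeven_tokens_per_day = FLAT_EUR / (23.3 / 1_000_000) / 30
```

Under that assumed rate, the flat fee breaks even just under 1M tokens/day and everything above it is variance you no longer pay for.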
Pricing

One invoice. One cell.

Flat fee, dedicated GPU on Pro and above, no token-meter or seasonality surcharges.

no commitment
Tessera Async
€200/mo

For overnight processing and batch jobs.

  • Job queue instead of RPM limits
  • 16K context
  • LLM + embeddings
  • Best-effort <30 s P95 turnaround
  • No Whisper / TTS
Get started
Tessera Lite
€450/mo

For small teams with a single use case.

  • 50 sustained RPM · 100 burst (5 min/h)
  • Full bundle: LLM + embeddings + Whisper + TTS
  • Sublimits: embeddings 100 RPM · Whisper 10 RPM · TTS 10 RPM
  • 8K context default · up to 32K configurable
  • Thinking mode: 100 requests/mo
  • Email · <24h business-hour response
  • Region: EU, LATAM or US
Get started
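The sustained/burst split in the tiers reads naturally as a token bucket: refill at the sustained RPM, cap at the burst size. The server-side algorithm is not documented on this page, so this client-side pacer is a sketch under that assumption, using the Lite numbers:

```python
class Pacer:
    """Client-side pacing for a 'sustained RPM · burst' limit (token bucket).

    Defaults model the Lite tier (50 sustained RPM, 100 burst); the server's
    actual algorithm and the 5 min/h burst window are assumptions here.
    """

    def __init__(self, sustained_rpm, burst):
        self.capacity = burst              # most requests allowed back-to-back
        self.tokens = float(burst)
        self.rate = sustained_rpm / 60.0   # refill: sustained requests per second
        self.last = 0.0

    def allow(self, now):
        """Return True if a request may be sent at time `now` (seconds)."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A `Pacer(50, 100)` lets 100 requests through back-to-back, then settles to roughly 50 per minute.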
Most popular
Tessera Pro
€650/mo

For typical mid-market production workloads.

  • 200 sustained RPM · 400 burst (5 min/h)
  • Full bundle: LLM + embeddings + Whisper + TTS
  • Sublimits: embeddings 400 RPM · Whisper 30 RPM · TTS 30 RPM
  • 8K context default · up to 32K configurable
  • Thinking mode: 1,000 requests/mo
  • Email + chat · <8h business-hour response
  • Status page with usage metrics
  • Region: EU, LATAM or US
Get started
Tessera Pro+
€1,200/mo

For high concurrency and long context.

  • 500 sustained RPM · 700 burst (5 min/h)
  • Full bundle without additional sublimits
  • 32K context default · up to 128K configurable
  • Thinking mode: unlimited
  • Scheduling priority over Pro and Lite
  • Custom event webhooks
  • Priority chat · <4h business-hour response
  • Region: EU, LATAM or US
Get started
Tessera Scale
from €5,000/mo

For integrated products with very high concurrency.

  • 5,000+ RPM (negotiable per hardware)
  • Full bundle + optional LoRA fine-tuning
  • 128K context default · up to 256K native
  • Dedicated hardware of your choice
  • Thinking mode: unlimited, high priority
  • Maximum scheduling priority
  • Shared Slack · <1h business-hour response
  • Onboarding assisted by senior engineer
  • Region: EU, LATAM or US
Talk to sales
Tessera Enterprise
from €15,000/mo

Dedicated server, custom configuration, RFP-ready.

  • 100% dedicated server (no multi-tenant)
  • RPM, context, SLA and models tailored
  • Hardware chosen per workload
  • Compliance: SOC 2 Type I, ISO 27001 (in progress)
  • Dedicated support: founder + senior engineer
  • Negotiated roadmap commitment
  • Deployment: Tessera cloud, private cloud or on-premise
  • Fine-tuning on your data under NDA (optional)
Talk to sales
Commercial honesty

Who Tessera fits. And who it doesn’t.

We don’t compete on raw price against the cheap end of the market. If we fit, you save money and headaches; if we don’t fit, we tell you on the first call.

Tessera fits if

  • You spend €2,000–5,000 a month on frontier today (GPT-5.5, Opus 4.7, Gemini Pro) and the monthly variance complicates your reporting.
  • Your DPO or legal team asks where customer data physically lives, and the answer matters.
  • You want to consolidate invoices instead of running three different cloud providers.
  • You need human support in English or Spanish, in European, Latin American or US business hours, with bounded response times.
  • Your team builds product, not ML infrastructure: you want a drop-in that works, not a prompt-caching pipeline that shaves 90% off a per-token bill.

Tessera doesn’t fit if

  • You already use Gemini Flash-Lite or GPT-5.4 nano and it works for you — that’s your win, not Tessera’s.
  • Your traffic is extremely bursty (0 to 10,000 RPM in seconds). Per-token serverless beats dedicated GPU.
  • You have an in-house ML team that optimizes every prompt and squeezes hyperscaler volume discounts.
  • You need specific closed models (GPT-4o image, Sora, Veo) that only run in their native cloud.

If your case sits in the right column, we tell you on the first conversation. We don’t push contracts that don’t fit.

Built for developers, governed for enterprise

The piece your team and your DPO sign on the same day.

Documentation a developer reads in fifteen minutes. Compliance a DPO validates on Monday.

Docs with real examples

Python, Node.js, Go and cURL snippets per endpoint. Editable cookbook, errors documented with cause and workaround.

Public status page

status.tesseraai.cloud. Per-region latency in real time. Postmortems published within five business days, before customers ask.

Usage webhooks for your billing

Consumption events on every request. Plug your own chargeback or cost-center system without going through the dashboard.

Exportable audit logs

Signed logs, exportable to your S3 or GCS bucket. Configurable retention for DORA, SOC 2 and AI Act audits.

Trying Tessera takes 5 minutes. Full migration, one day.

OpenAI v1-compatible API, open models on dedicated GPU in EU, LATAM or US. Flat monthly invoice, no token-meter. Founder-led support in English or Spanish.

Get started with the Tessera trial · Book a 20-min call