Migrating from OpenAI
Tessera implements the OpenAI v1 API. In most cases the migration is a base_url swap and a rename of model identifiers. This guide documents the rest: what is identical, what behaves differently, and what we don’t support yet.
TL;DR
If your code uses the official OpenAI SDKs (Python, Node, Go) against the classic chat, embeddings, audio and TTS endpoints, the migration is three lines. The client stays the OpenAI client — only its target changes.
- Change base_url to https://api.tesseraai.cloud/v1.
- Swap your API key for the Tessera one (sk-tessera-…).
- Rename the model strings according to the table below.
- No custom SDK required. No wrapper either.
1. Swap the base_url
The official OpenAI SDKs expose a configuration parameter for this: base_url in Python, baseURL in Node. Pass it to the client constructor and the rest of your code is unaffected.
# Before (OpenAI)
client = OpenAI(api_key="sk-…")

# After (Tessera) — only the target changes
client = OpenAI(
    base_url="https://api.tesseraai.cloud/v1",
    api_key="sk-tessera-…",
)
response = client.chat.completions.create(
    model="Qwen/Qwen3.6-35B-A3B",
    messages=[{"role": "user", "content": "Hola"}],
)
- base_url="https://api.openai.com/v1"
+ base_url="https://api.tesseraai.cloud/v1"
2. Map your models
Tessera offers five open models on dedicated GPUs. The table maps the most-used OpenAI models to their closest Tessera equivalent. It is not a 1:1 equivalence — different architectures, different training. We recommend validating against your real prompts before promoting the swap to production.
| OpenAI model | Tessera equivalent | Notes |
|---|---|---|
| gpt-4o · gpt-4-turbo · gpt-4 | Qwen/Qwen3.6-35B-A3B | Our chat model. General reasoning, tool calling, structured JSON, 128K context. |
| gpt-4o-mini | Qwen/Qwen3.6-35B-A3B | We don’t ship a mid-tier model: the chat model serves every case. Latency tends to match gpt-4o-mini and quality lands above it. |
| gpt-3.5-turbo | Qwen/Qwen3.6-35B-A3B (async queue) | Same model on an async/lower-priority queue. Built for batch jobs and non-interactive workloads. |
| text-embedding-3-small · -large · ada-002 | Qwen3-Embedding-8B | 4096-dim output. Accepts input string or array, same as OpenAI. The exact model string is published when the endpoint goes live. |
| whisper-1 | whisper-large-v3 | WAV / MP3 / M4A / WebM up to 25 MB. Translation and verbose JSON supported. The exact model string is published when the endpoint goes live. |
| tts-1 · tts-1-hd | kokoro-82m | Voices in es-ES, es-LA and en-US. MP3 / WAV / OGG output. The exact model string is published when the endpoint goes live. |
We don’t publish comparative benchmarks against OpenAI: any number outside your own dataset is marketing. The home calculator lets you simulate your cost on your real volumes.
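The mapping in the table can live in code as a small routing helper, so the rename happens in exactly one place. A minimal sketch (the OPENAI_TO_TESSERA dict and tessera_model helper are names invented here; the embedding, audio and TTS identifiers are the table's provisional names, which Tessera may refine when those endpoints go live):

```python
# Model mapping from the table above. The non-chat identifiers are
# placeholders until Tessera publishes the exact model strings.
OPENAI_TO_TESSERA = {
    "gpt-4o": "Qwen/Qwen3.6-35B-A3B",
    "gpt-4-turbo": "Qwen/Qwen3.6-35B-A3B",
    "gpt-4": "Qwen/Qwen3.6-35B-A3B",
    "gpt-4o-mini": "Qwen/Qwen3.6-35B-A3B",
    "gpt-3.5-turbo": "Qwen/Qwen3.6-35B-A3B",  # served via the async queue
    "text-embedding-3-small": "Qwen3-Embedding-8B",
    "text-embedding-3-large": "Qwen3-Embedding-8B",
    "text-embedding-ada-002": "Qwen3-Embedding-8B",
    "whisper-1": "whisper-large-v3",
    "tts-1": "kokoro-82m",
    "tts-1-hd": "kokoro-82m",
}

def tessera_model(openai_model: str) -> str:
    """Return the Tessera equivalent, failing loudly on unmapped models."""
    try:
        return OPENAI_TO_TESSERA[openai_model]
    except KeyError:
        raise ValueError(f"no Tessera mapping for {openai_model!r}") from None
```

Failing loudly on unmapped models is deliberate: a silent fallback would hide exactly the cases the paragraph above says to validate by hand.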
3. Supported endpoints
Coverage targets the most-used routes of OpenAI’s v1 API. What we don’t expose yet is marked “roadmap” — if it blocks your migration, ping us and we prioritise it.
| Endpoint | Status | Note |
|---|---|---|
| POST /v1/chat/completions | Supported | SSE streaming, tools, JSON mode, response_format=json_schema. |
| POST /v1/embeddings | Supported | input string or array. encoding_format float (base64 on the roadmap). |
| POST /v1/rerank | Supported | Tessera-native endpoint (not in OpenAI). Response shape matches Cohere — drop-in migration from Cohere / Voyage / Jina. |
| POST /v1/audio/transcriptions | Supported | multipart/form-data identical to OpenAI. response_format json / text / verbose_json. |
| POST /v1/audio/translations | Supported | Same contract. Translates to English. |
| POST /v1/audio/speech | Supported | Voices es-ES / es-LA / en-US. response_format mp3 / wav / opus. |
| GET /v1/models | Supported | Returns the five Tessera models with extended metadata. |
| POST /v1/moderations | Partial | Endpoint exposed. Always responds "flagged: false" — we don’t run managed moderation. |
| /v1/responses (Responses API) | Roadmap | Migrate to Chat Completions: every SDK still ships full support for that contract. |
| /v1/realtime (WebSocket) | Roadmap | Voice + low-latency duplex. No ETA. |
| /v1/batch | Roadmap | For async volumes use the chat model’s async queue for now (same identifier, lower priority). |
| /v1/fine-tuning | Roadmap | Let’s talk: the model is open-weights, and there are non-standard paths for custom checkpoints. |
| /v1/images | Roadmap | No image model. No ETA. |
| /v1/assistants · /v1/threads · /v1/runs | Roadmap | OpenAI marked them deprecated in favour of Responses. We recommend migrating to Chat Completions + tools. |
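The /v1/rerank row says the response shape matches Cohere. As a hedged sketch, here is a helper that builds the request body assuming the request fields also follow the Cohere rerank convention (model, query, documents, top_n); the model id is a placeholder, not a published name, so check the Tessera API reference before relying on any of these field names:

```python
import json

def build_rerank_payload(query, documents, top_n=3, model="tessera-rerank"):
    """Assemble a rerank request body. Field names follow the Cohere
    convention, which this doc says the response mirrors; the request
    shape is an assumption, and "tessera-rerank" is a placeholder id."""
    return {
        "model": model,
        "query": query,
        "documents": list(documents),
        # Never ask for more results than documents supplied.
        "top_n": min(top_n, len(documents)),
    }

payload = build_rerank_payload("gpu pricing", ["doc a", "doc b"], top_n=5)
body = json.dumps(payload)  # ready to POST to /v1/rerank with your HTTP client
```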
4. Documented behavioural differences
Same contract doesn’t mean same behaviour. These are the points where Tessera diverges from what you’d see against OpenAI, ordered by likelihood of affecting you.
- temperature, top_p, max_tokens — Identical. max_tokens is interpreted as "output tokens"; input context does not count against it.
- tool_choice and tools — OpenAI-compatible function calling. tool_choice accepts "auto", "none", "required" and { type: "function", function: { name } }. Parallel tool calls supported.
- response_format — JSON mode (response_format: { type: "json_object" }) and JSON Schema (type: "json_schema") supported. Strict validation: the runtime retries internally before failing if the model leaves the schema.
- stream — SSE identical to OpenAI: data: {…}, terminated by data: [DONE]. 15 s heartbeats to keep proxies alive.
- seed — Accepted but does not guarantee absolute determinism across model versions. OpenAI doesn’t guarantee it either — the difference is we say so.
- logprobs / top_logprobs — Not yet supported on /v1/chat/completions. On the roadmap. If your use case depends on logprobs, tell us.
- n (multiple completions) — Accepted but always returns n=1. For multi-sampling, call N times — same cost, more predictable latency.
- logit_bias — Not supported. Most uses fold into constrained generation; we recommend response_format=json_schema instead.
- Vision (image inputs) — Not supported. Qwen3.6-35B-A3B is text-only. If vision is a hard blocker, ping us: there is a private beta on a different model.
- Rate limits and errors — Same error envelope (error.type, error.code, error.message). 429 with Retry-After header. Detail in /docs/api/rate-limits.
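The stream contract described above (data: {…} events, a data: [DONE] terminator, periodic heartbeats) is easy to parse by hand if you are not using an SDK. A minimal sketch, assuming heartbeats arrive as SSE comment lines starting with ":" (the conventional encoding; verify against a real stream):

```python
import json

def iter_sse_chunks(lines):
    """Yield decoded JSON chunks from an SSE body: each event is
    'data: {json}', heartbeats (assumed to be ':' comment lines) and
    blank separators are skipped, and 'data: [DONE]' ends the stream."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue  # blank separator or keep-alive heartbeat
        if not line.startswith("data: "):
            continue  # ignore other SSE fields (event:, id:, ...)
        data = line[len("data: "):]
        if data == "[DONE]":
            return
        yield json.loads(data)

sample = [
    'data: {"choices": [{"delta": {"content": "Ho"}}]}',
    ": keepalive",
    'data: {"choices": [{"delta": {"content": "la"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in iter_sse_chunks(sample))
# text == "Hola"
```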
5. Gradual rollout
Once it works locally, the prudent move is to send real traffic gradually and compare against the OpenAI flow before pulling the plug. Typical pattern, feature-flagged or percentage-based:
# Python — percentage routing, no changes to the product SDK
import random
from openai import OpenAI
openai_client = OpenAI(api_key=OPENAI_KEY)
tessera_client = OpenAI(
base_url="https://api.tesseraai.cloud/v1",
api_key=TESSERA_KEY,
)
def chat(messages, *, tessera_share=0.10):
    use_tessera = random.random() < tessera_share
    client = tessera_client if use_tessera else openai_client
    model = "Qwen/Qwen3.6-35B-A3B" if use_tessera else "gpt-4o"
    return client.chat.completions.create(model=model, messages=messages)
Start with tessera_share=0.05 on low-criticality traffic. Log both sides and diff the outputs offline for 1–2 days before turning the dial up.
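Since Tessera answers rate limits with a 429 and a Retry-After header (section 4), the rollout wrapper pairs naturally with a retry helper. A sketch, assuming your client surfaces the response headers on the raised exception (the attribute names here are illustrative; with the OpenAI SDK you would narrow the except clause to openai.RateLimitError):

```python
import random
import time

def call_with_retry(fn, *, max_attempts=4):
    """Call fn(), retrying on failures and honouring a Retry-After
    header when the raised exception exposes response headers.
    Falls back to exponential backoff, plus jitter to avoid thundering
    herds. Attribute names (.response.headers) are assumptions."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:  # narrow to your SDK's rate-limit error
            if attempt == max_attempts - 1:
                raise
            headers = getattr(getattr(exc, "response", None), "headers", {}) or {}
            retry_after = float(headers.get("Retry-After", 2 ** attempt))
            time.sleep(retry_after + random.uniform(0, 0.25))
```

Wrap the chat() router from the block above: call_with_retry(lambda: chat(messages)).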
What not to migrate yet
Some flows still depend on capabilities we don’t expose. If your product uses them, keep that subset on OpenAI — everything else can live on Tessera without issue.
- Realtime API (duplex voice over WebSocket).
- Batch API: use the chat model’s async queue if your workload tolerates async.
- Fine-tuning via API. Open-weights enables alternative paths — talk to us.
- Vision (image inputs): the model is text-only.
- Image generation, DALL·E, Sora.
- Assistants API and Agents SDK: OpenAI is winding them down; we recommend Chat Completions + tools.
Need help?
Migrations usually take less than an afternoon. If you get stuck — an endpoint you can’t find, output that doesn’t match, a prompt that behaves differently — write directly to the founder.