Migrating from OpenAI
Tessera implements the OpenAI v1 API. In most cases the migration is a base_url swap and a rename of model identifiers. This guide documents the rest: what is identical, what behaves differently, and what we don’t support yet.
TL;DR
If your code uses the official OpenAI SDKs (Python, Node, Go) against the classic chat, embeddings, audio and TTS endpoints, the migration is three lines. The client stays the OpenAI client — only its target changes.
- Change base_url to https://api.tesseraai.cloud/v1.
- Swap your API key for the Tessera one (sk-tessera-…).
- Rename the model strings according to the table below.
- No custom SDK required. No wrapper either.
1. Swap the base_url
The official OpenAI SDKs expose a configuration parameter for this: base_url in Python, baseURL in Node. Pass it to the client constructor and the rest of your code is unaffected.
# Before (OpenAI)
client = OpenAI(api_key="sk-…")

# After (Tessera) — only the target changes
client = OpenAI(
    base_url="https://api.tesseraai.cloud/v1",
    api_key="sk-tessera-…",
)
response = client.chat.completions.create(
    model="Qwen/Qwen3.6-35B-A3B",
    messages=[{"role": "user", "content": "Hola"}],
)
- base_url="https://api.openai.com/v1"
+ base_url="https://api.tesseraai.cloud/v1"
2. Map your models
Tessera offers five open models on dedicated GPUs. The table maps the most-used OpenAI models to their closest Tessera equivalent. It is not a 1:1 equivalence — different architectures, different training. We recommend validating against your real prompts before promoting the swap to production.
| OpenAI model | Tessera equivalent | Notes |
|---|---|---|
| gpt-4o · gpt-4-turbo · gpt-4 | Qwen/Qwen3.6-35B-A3B | Our chat model. General reasoning, tool calling, structured JSON, 128K context. |
| gpt-4o-mini | Qwen/Qwen3.6-35B-A3B | We don’t ship a mid-tier model: the chat model serves every case. Latency tends to match gpt-4o-mini and quality lands above it. |
| gpt-3.5-turbo | Qwen/Qwen3.6-35B-A3B (async queue) | Same model on an async/lower-priority queue. Built for batch jobs and non-interactive workloads. |
| text-embedding-3-small · -large · ada-002 | Qwen3-Embedding-8B | 4096-dim output. Accepts input string or array, same as OpenAI. The exact model string is published when the endpoint goes live. |
| whisper-1 | whisper-large-v3 | WAV / MP3 / M4A / WebM up to 25 MB. Translation and verbose JSON supported. The exact model string is published when the endpoint goes live. |
| tts-1 · tts-1-hd | kokoro-82m | Voices in es-ES, es-LA and en-US. MP3 / WAV / OGG output. The exact model string is published when the endpoint goes live. |
We don’t publish comparative benchmarks against OpenAI: any number outside your own dataset is marketing. The home calculator lets you simulate your cost on your real volumes.
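The mapping in the table can live in code as a small routing helper, so the rename happens in exactly one place. A minimal sketch (the OPENAI_TO_TESSERA dict and tessera_model helper are names invented here; the embedding, audio and TTS identifiers are the table's provisional names, which Tessera may refine when those endpoints go live):

```python
# Model mapping from the table above. The non-chat identifiers are
# placeholders until Tessera publishes the exact model strings.
OPENAI_TO_TESSERA = {
    "gpt-4o": "Qwen/Qwen3.6-35B-A3B",
    "gpt-4-turbo": "Qwen/Qwen3.6-35B-A3B",
    "gpt-4": "Qwen/Qwen3.6-35B-A3B",
    "gpt-4o-mini": "Qwen/Qwen3.6-35B-A3B",
    "gpt-3.5-turbo": "Qwen/Qwen3.6-35B-A3B",  # served via the async queue
    "text-embedding-3-small": "Qwen3-Embedding-8B",
    "text-embedding-3-large": "Qwen3-Embedding-8B",
    "text-embedding-ada-002": "Qwen3-Embedding-8B",
    "whisper-1": "whisper-large-v3",
    "tts-1": "kokoro-82m",
    "tts-1-hd": "kokoro-82m",
}

def tessera_model(openai_model: str) -> str:
    """Return the Tessera equivalent, failing loudly on unmapped models."""
    try:
        return OPENAI_TO_TESSERA[openai_model]
    except KeyError:
        raise ValueError(f"no Tessera mapping for {openai_model!r}") from None
```

Failing loudly on unmapped models is deliberate: a silent fallback would hide exactly the cases the paragraph above says to validate by hand.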
3. Supported endpoints
Coverage targets the most-used routes of OpenAI’s v1 API. What we don’t expose yet is marked “roadmap” — if it blocks your migration, ping us and we prioritise it.
| Endpoint | Status | Note |
|---|---|---|
| POST /v1/chat/completions | Supported | SSE streaming, tools, JSON mode, response_format=json_schema. |
| POST /v1/embeddings | Supported | input string or array. encoding_format float (base64 on the roadmap). |
| POST /v1/rerank | Supported | Tessera-native endpoint (not in OpenAI). Response shape matches Cohere — drop-in migration from Cohere / Voyage / Jina. |
| POST /v1/audio/transcriptions | Supported | multipart/form-data identical to OpenAI. response_format json / text / verbose_json. |
| POST /v1/audio/translations | Supported | Same contract. Translates to English. |
| POST /v1/audio/speech | Supported | Voices es-ES / es-LA / en-US. response_format mp3 / wav / opus. |
| GET /v1/models | Supported | Returns the five Tessera models with extended metadata. |
| POST /v1/moderations | Partial | Endpoint exposed. Always responds "flagged: false" — we don’t run managed moderation. |
| /v1/responses (Responses API) | Roadmap | Migrate to Chat Completions: every SDK still ships full support for that contract. |
| /v1/realtime (WebSocket) | Roadmap | Voice + low-latency duplex. No ETA. |
| /v1/batch | Roadmap | For async volumes use the chat model’s async queue for now (same identifier, lower priority). |
| /v1/fine-tuning | Roadmap | Let’s talk: the model is open-weights, and there are non-standard paths for custom checkpoints. |
| /v1/images | Roadmap | No image model. No ETA. |
| /v1/assistants · /v1/threads · /v1/runs | Roadmap | OpenAI marked them deprecated in favour of Responses. We recommend migrating to Chat Completions + tools. |
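The /v1/rerank row says the response shape matches Cohere. As a hedged sketch, here is a helper that builds the request body assuming the request fields also follow the Cohere rerank convention (model, query, documents, top_n); the model id is a placeholder, not a published name, so check the Tessera API reference before relying on any of these field names:

```python
import json

def build_rerank_payload(query, documents, top_n=3, model="tessera-rerank"):
    """Assemble a rerank request body. Field names follow the Cohere
    convention, which this doc says the response mirrors; the request
    shape is an assumption, and "tessera-rerank" is a placeholder id."""
    return {
        "model": model,
        "query": query,
        "documents": list(documents),
        # Never ask for more results than documents supplied.
        "top_n": min(top_n, len(documents)),
    }

payload = build_rerank_payload("gpu pricing", ["doc a", "doc b"], top_n=5)
body = json.dumps(payload)  # ready to POST to /v1/rerank with your HTTP client
```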
4. Documented behavioural differences
Same contract doesn’t mean same behaviour. These are the points where Tessera diverges from what you’d see against OpenAI, ordered by likelihood of affecting you.
- temperature, top_p, max_tokens — Identical. max_tokens is interpreted as "output tokens"; input context does not count against it.
- tool_choice and tools — OpenAI-compatible function calling. tool_choice accepts "auto", "none", "required" and { type: "function", function: { name } }. Parallel tool calls supported.
- response_format — JSON mode (response_format: { type: "json_object" }) and JSON Schema (type: "json_schema") supported. Strict validation: the runtime retries internally before failing if the model leaves the schema.
- stream — SSE identical to OpenAI: data: {…}, terminated by data: [DONE]. 15 s heartbeats to keep proxies alive.
- seed — Accepted but does not guarantee absolute determinism across model versions. OpenAI doesn’t guarantee it either — the difference is we say so.
- logprobs / top_logprobs — Not yet supported on /v1/chat/completions. On the roadmap. If your use case depends on logprobs, tell us.
- n (multiple completions) — Accepted but always returns n=1. For multi-sampling, call N times — same cost, more predictable latency.
- logit_bias — Not supported. Most uses fold into constrained generation; we recommend response_format=json_schema instead.
- Vision (image inputs) — Not supported. Qwen3.6-35B-A3B is text-only. If vision is a hard blocker, ping us: there is a private beta on a different model.
- Rate limits and errors — Same error envelope (error.type, error.code, error.message). 429 with Retry-After header. Detail in /docs/api/rate-limits.
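The stream contract described above (data: {…} events, a data: [DONE] terminator, periodic heartbeats) is easy to parse by hand if you are not using an SDK. A minimal sketch, assuming heartbeats arrive as SSE comment lines starting with ":" (the conventional encoding; verify against a real stream):

```python
import json

def iter_sse_chunks(lines):
    """Yield decoded JSON chunks from an SSE body: each event is
    'data: {json}', heartbeats (assumed to be ':' comment lines) and
    blank separators are skipped, and 'data: [DONE]' ends the stream."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue  # blank separator or keep-alive heartbeat
        if not line.startswith("data: "):
            continue  # ignore other SSE fields (event:, id:, ...)
        data = line[len("data: "):]
        if data == "[DONE]":
            return
        yield json.loads(data)

sample = [
    'data: {"choices": [{"delta": {"content": "Ho"}}]}',
    ": keepalive",
    'data: {"choices": [{"delta": {"content": "la"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in iter_sse_chunks(sample))
# text == "Hola"
```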
5. Gradual rollout
Once it works locally, the prudent move is to send real traffic gradually and compare against the OpenAI flow before pulling the plug. Typical pattern, feature-flagged or percentage-based:
# Python — percentage routing, no changes to the product SDK
import random
from openai import OpenAI
openai_client = OpenAI(api_key=OPENAI_KEY)
tessera_client = OpenAI(
base_url="https://api.tesseraai.cloud/v1",
api_key=TESSERA_KEY,
)
def chat(messages, *, tessera_share=0.10):
    use_tessera = random.random() < tessera_share
    client = tessera_client if use_tessera else openai_client
    model = "Qwen/Qwen3.6-35B-A3B" if use_tessera else "gpt-4o"
    return client.chat.completions.create(model=model, messages=messages)
Start with tessera_share=0.05 on low-criticality traffic. Log both sides and diff the outputs offline for 1–2 days before turning the dial up.
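Since Tessera answers rate limits with a 429 and a Retry-After header (section 4), the rollout wrapper pairs naturally with a retry helper. A sketch, assuming your client surfaces the response headers on the raised exception (the attribute names here are illustrative; with the OpenAI SDK you would narrow the except clause to openai.RateLimitError):

```python
import random
import time

def call_with_retry(fn, *, max_attempts=4):
    """Call fn(), retrying on failures and honouring a Retry-After
    header when the raised exception exposes response headers.
    Falls back to exponential backoff, plus jitter to avoid thundering
    herds. Attribute names (.response.headers) are assumptions."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:  # narrow to your SDK's rate-limit error
            if attempt == max_attempts - 1:
                raise
            headers = getattr(getattr(exc, "response", None), "headers", {}) or {}
            retry_after = float(headers.get("Retry-After", 2 ** attempt))
            time.sleep(retry_after + random.uniform(0, 0.25))
```

Wrap the chat() router from the block above: call_with_retry(lambda: chat(messages)).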
What not to migrate yet
Some flows still depend on capabilities we don’t expose. If your product uses them, keep that subset on OpenAI — everything else can live on Tessera without issue.
- Realtime API (duplex voice over WebSocket).
- Batch API: use the chat model’s async queue if your workload tolerates async.
- Fine-tuning via API. Open-weights enables alternative paths — talk to us.
- Vision (image inputs): the model is text-only.
- Image generation, DALL·E, Sora.
- Assistants API and Agents SDK: OpenAI is winding them down; we recommend Chat Completions + tools.
Need help?
Migrations usually take less than an afternoon. If you get stuck — an endpoint you can’t find, output that doesn’t match, a prompt that behaves differently — write directly to the founder.