Overview
Converts one or more strings into dense vectors of up to 4,096 dimensions. Designed for RAG, semantic search, clustering, and deduplication. Multilingual: works equally well in English, Spanish, Portuguese, Catalan, and many other languages. Supports the `dimensions` parameter (Matryoshka) to shrink the vector without retraining.
Endpoint and model
POST `https://api.tesseraai.cloud/v1/embeddings`. Pass `model: "Qwen/Qwen3-Embedding-8B"` in the request body.
| Attribute | Value |
|---|---|
| Dimensions | 4,096 native; configurable downward (32–4,096) via the `dimensions` parameter |
| Languages | Multilingual (100+ languages, including EN, ES, PT, CA, IT, FR, DE) |
| Input type | string or array of strings |
| Max length | 8,192 tokens per input |
| Quantisation | Q8_0 GGUF (near-FP16 quality) |
| Licence | Apache 2.0 |
Output dimensions
The model is trained with Matryoshka Representation Learning (MRL): the first N components of the vector form a self-contained representation of the input. You can request shorter vectors without retraining and with minimal quality loss. Pass the OpenAI-standard `dimensions` parameter and Tessera truncates the vector and L2-renormalises it server-side, ready to use with cosine similarity.
- `dimensions` is optional; omit it to get the native 4,096-dim vector.
- Accepts any value from 32 to 4,096. Common values: 1,536 (fits under pgvector's 2,000-dimension HNSW index limit), 1,024, 768, 512.
- No extra inference cost: truncation happens post-inference in the gateway.
- Returned vectors are L2-normalised, ready for `cosine_distance` in pgvector / Qdrant / Weaviate without extra steps.
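Conceptually, the gateway's Matryoshka handling amounts to slicing the first N components and re-normalising. A minimal plain-Python sketch of that behaviour (toy vector values, not real model output):

```python
import math

def truncate_and_renormalise(vec, dimensions):
    """Keep the first `dimensions` components, then L2-normalise,
    mimicking what the gateway does after inference."""
    head = vec[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 8-dim "native" vector truncated to 4 dims
v = truncate_and_renormalise([0.4, 0.2, -0.1, 0.3, 0.05, -0.2, 0.1, 0.0], 4)
print(len(v))                                   # 4
print(abs(sum(x * x for x in v) - 1.0) < 1e-9)  # unit length → True
```

Because the result is re-normalised, cosine similarity between truncated vectors stays directly comparable, with no client-side work needed.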
```bash
# cURL
curl https://api.tesseraai.cloud/v1/embeddings \
  -H "Authorization: Bearer $TESSERA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Embedding-8B",
    "input": ["Text 1", "Text 2"],
    "dimensions": 1536
  }'
```
```python
# Python with the OpenAI SDK
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.tesseraai.cloud/v1",
    api_key=os.environ["TESSERA_API_KEY"],
)
resp = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-8B",
    input=["Text 1", "Text 2"],
    dimensions=1536,
)
# resp.data[0].embedding → list of 1,536 floats, L2-normalised
# Ready for INSERT INTO docs (embedding) VALUES (%s) on pgvector
```

Request
```bash
curl https://api.tesseraai.cloud/v1/embeddings \
  -H "Authorization: Bearer $TESSERA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Embedding-8B",
    "input": [
      "I want to know my invoice balance",
      "Cancel my subscription"
    ]
  }'
```

Response
Identical structure to OpenAI: `data[].embedding` is a 4,096-float array per input (or whatever was requested via `dimensions`), L2-normalised.
```json
{
  "object": "list",
  "model": "Qwen/Qwen3-Embedding-8B",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.013, 0.0062, ...]},
    {"object": "embedding", "index": 1, "embedding": [-0.022, 0.041, ...]}
  ],
  "usage": {"prompt_tokens": 12, "total_tokens": 12}
}
```

Typical use cases
- RAG: index your corpus in Qdrant / pgvector / Weaviate and retrieve passages by similarity before sending them to the chat model.
- Semantic search: replacement for keyword search where "unpaid invoice" should match "outstanding balance".
- Clustering / topic modelling: group tickets, documents or emails by similarity for unsupervised analysis.
- Deduplication: detect near-identical content (duplicate FAQ entries, repeated leads) by cosine-similarity threshold.
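Since the returned vectors are L2-normalised, cosine similarity reduces to a plain dot product, which makes threshold-based deduplication trivial. A minimal sketch (toy 2-dim unit vectors and an illustrative threshold, not real embeddings):

```python
def cosine_similarity(a, b):
    # Vectors from the API are already L2-normalised,
    # so cosine similarity is just the dot product.
    return sum(x * y for x, y in zip(a, b))

def find_duplicates(embeddings, threshold=0.95):
    """Return index pairs whose cosine similarity meets the threshold."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# The first two vectors are near-identical; the third is orthogonal
vecs = [[1.0, 0.0], [0.999, 0.0447], [0.0, 1.0]]
print(find_duplicates(vecs))  # [(0, 1)]
```

In production you would let the vector store (pgvector, Qdrant, Weaviate) do this pairwise search via its index rather than the O(n²) loop above; the threshold itself should be tuned on your own data.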