AI Act LLM Inference: Compliance & Deployment Guide
How the EU AI Act applies to LLM inference across the EU, the US, and LATAM: roles, deadlines, deployer duties, DPA clauses, zone checklist.
The EU AI Act (Regulation (EU) 2024/1689) classifies most large language models as General-Purpose AI (GPAI), not high-risk systems by default. A managed inference platform hosting third-party open-weight models typically acts as a deployer or distributor, and that role determines every contract, log, and impact assessment your customers need.
As of 2026-05-18, GPAI provider obligations under Articles 53 and 55 have been active since 2025-08-02. The full high-risk rules under Chapter III take effect on 2026-08-02.
This guide maps those rules for LLM inference across Tessera’s three main markets: the EU, the US, and Latin America. It is written for engineering, legal, and security teams building a defensible compliance posture before that deadline.
Who Is the Provider, Who Is the Deployer
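The Act looks at what you actually do, not what your company calls itself. Article 3 defines a provider as the entity that develops an AI system or GPAI model (or has one developed) and places it on the market or puts it into service under its own name or trademark. A deployer is an entity that uses an AI system under its own authority in a professional capacity.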
For a managed LLM inference platform hosting open-weight models like Qwen, Llama, or Mistral, three roles usually overlap:
- The upstream publisher (Alibaba for Qwen, Meta for Llama, Mistral AI for its family) stays the GPAI provider under Article 53.
- The inference platform acts as a distributor or service provider. It only becomes a new provider if it fine-tunes, retrains, or rebrands the model in a way that changes its intended purpose.
- The customer is typically the deployer. If they use the LLM for an Annex III high-risk task like employment screening or credit scoring, full deployer rules apply.
If your platform ships a fine-tuned version under its own brand, it inherits GPAI provider obligations for that derivative. Tessera serves open-weight models exactly as released, without retraining or rebranding, so the customer remains the deployer for any high-risk application built on top.
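The role split above can be sketched as a small decision helper. This is a minimal illustration, not a legal test: the `Party` fields and the `likely_roles` function are hypothetical names, and the real classification depends on the facts of each deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    """Hypothetical description of what one party actually does with a model."""
    develops_model: bool = False
    modifies_or_rebrands: bool = False   # fine-tunes, retrains, or ships under own brand
    hosts_third_party_model: bool = False
    uses_system_in_own_business: bool = False


def likely_roles(party: Party) -> set[str]:
    """Return the AI Act roles this party most plausibly holds (illustrative only)."""
    roles: set[str] = set()
    if party.develops_model or party.modifies_or_rebrands:
        roles.add("provider")
    # A host that serves the model unchanged stays a distributor or service provider.
    if party.hosts_third_party_model and "provider" not in roles:
        roles.add("distributor_or_service_provider")
    if party.uses_system_in_own_business:
        roles.add("deployer")
    return roles


publisher = Party(develops_model=True)
platform = Party(hosts_third_party_model=True)
customer = Party(uses_system_in_own_business=True)
rebranding_platform = Party(hosts_third_party_model=True, modifies_or_rebrands=True)
```

Under these assumptions, a platform that starts shipping a rebranded fine-tune moves from the distributor bucket into the provider bucket for that derivative.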
What Is in Force Today and What Changes in August 2026
The Act rolls out in phases across three concurrent tracks.
Already active as of 2026-05-18:
- Article 4 (AI literacy) has been live since 2025-02-02. Staff using AI systems need a baseline understanding of how they work.
- Article 5 (prohibited practices) went live on 2025-02-02. It bans eight categories of AI use, including manipulative or deceptive techniques and real-time remote biometric identification in publicly accessible spaces for law enforcement outside narrow exceptions. Read the Article 5 text.
- Articles 53 to 55 (GPAI provider duties) started on 2025-08-02, covering documentation, copyright policy, training-data summaries, and systemic-risk rules for top-tier models.
Starting 2026-08-02:
- Chapter III, Section 2 high-risk requirements take effect. Providers must address risk management, data governance, technical documentation, logging, transparency, human oversight, accuracy, cybersecurity, and robustness, plus conformity assessment and registration.
- Chapter III, Section 3 deployer obligations for high-risk systems (Article 26) apply from the same date.
Travers Smith reported on 2026-05-07 that EU officials are considering a delay to the 2026-08-02 deadline. The amendment has not passed. Treat 2026-08-02 as the hard date until the Official Journal publishes any change. Check the EU AI Act timeline for the full schedule.
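The phased schedule above can be encoded as a simple lookup. This is a sketch using only the dates listed in this guide; `MILESTONES`, `in_force`, and `days_until_high_risk` are hypothetical names, and the table would need updating if the Official Journal publishes a change.

```python
from datetime import date

# Application dates as stated in this guide (not an exhaustive schedule).
MILESTONES = [
    (date(2025, 2, 2), "Article 4 AI literacy"),
    (date(2025, 2, 2), "Article 5 prohibited practices"),
    (date(2025, 8, 2), "Articles 53 to 55 GPAI provider duties"),
    (date(2026, 8, 2), "High-risk requirements and deployer obligations"),
]

HIGH_RISK_DATE = date(2026, 8, 2)


def in_force(as_of: date) -> list[str]:
    """List the obligations whose application date is on or before `as_of`."""
    return [name for start, name in MILESTONES if start <= as_of]


def days_until_high_risk(as_of: date) -> int:
    """Days remaining before the high-risk rules apply (negative once passed)."""
    return (HIGH_RISK_DATE - as_of).days


active_today = in_force(date(2026, 5, 18))
remaining = days_until_high_risk(date(2026, 5, 18))
```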
GPAI Obligations and the 10^25 FLOPs Threshold
Article 53 lists four baseline duties for any GPAI provider. The European Commission AI policy page summarizes them:
- Draft and maintain technical documentation (Annex XI).
- Share documentation with downstream AI providers who integrate the model (Annex XII).
- Publish a copyright compliance policy (Article 53(1)(c)).
- Release a detailed summary of the training data (Article 53(1)(d)).
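Article 55 adds further duties when a GPAI model reaches systemic-risk classification. Article 55(1) groups them into four points; model evaluation and adversarial testing are listed separately here: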
- Run standardized benchmark evaluations.
- Conduct adversarial testing.
- Assess and mitigate systemic risks.
- Track, document, and report serious incidents.
- Maintain strong cybersecurity protections.
The Act presumes systemic risk when cumulative training compute crosses 10^25 floating-point operations (Article 51). The AI Office can also flag a model below that threshold if its capabilities or real-world impact warrant it.
Qwen3.6-35B-A3B sits well below the 10^25 FLOPs mark based on published training budgets, so the systemic-risk rules do not apply to customers using Tessera’s hosted Qwen. If you fine-tune an open-weight model with heavy compute, verify whether your total training run crosses that line.
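A rough way to sanity-check a training run against the threshold is the common 6 x parameters x tokens heuristic for dense transformer training compute. The Act does not prescribe this formula; it is an assumption used here for a back-of-the-envelope estimate. The function names and the parameter and token counts below are hypothetical, not published figures for any specific model.

```python
SYSTEMIC_RISK_FLOPS = 1e25  # Article 51 presumption threshold


def estimate_training_flops(active_params: float, training_tokens: float) -> float:
    """Approximate training compute with the 6 * N * D heuristic (an assumption).

    For mixture-of-experts models, pass the parameters active per token.
    """
    return 6.0 * active_params * training_tokens


def presumed_systemic_risk(base_flops: float, fine_tune_flops: float = 0.0) -> bool:
    """True when cumulative compute is greater than the 10^25 FLOPs presumption."""
    return base_flops + fine_tune_flops > SYSTEMIC_RISK_FLOPS


# Hypothetical example: 3e9 active parameters trained on 3e13 tokens.
small_run = estimate_training_flops(3e9, 3e13)
# Hypothetical example: 4e11 parameters trained on 1.5e13 tokens.
large_run = estimate_training_flops(4e11, 1.5e13)

small_flagged = presumed_systemic_risk(small_run)
large_flagged = presumed_systemic_risk(large_run)
```

The heuristic only gives an order-of-magnitude figure, so a result close to the threshold calls for a proper accounting of the actual compute used.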
When the AI Act Reaches You from Outside the EU
Article 2 sets the territorial scope. Three of its triggers matter for teams headquartered outside the EU:
- The provider places an AI system or GPAI model on the EU market or puts it into service there, wherever the provider is established.
- The deployer is established or located in the EU.
- The AI system’s output is used in the Union, regardless of where the provider or deployer is based.
US and LATAM teams consistently underestimate that third trigger. You don’t need an EU office, EU servers, or EU staff. If the inference output reaches an EU user, regulator, or downstream system, the Act applies.
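The three triggers can be expressed as a simple scope check. This is an illustrative sketch only: `Exposure`, `article2_triggers`, and `in_scope` are hypothetical names, and the boolean inputs stand in for facts a legal review would need to establish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    """Hypothetical facts about how a service touches the EU."""
    placed_on_eu_market: bool = False
    deployer_in_eu: bool = False
    output_used_in_eu: bool = False


# Field name -> human-readable trigger, in the order listed in this guide.
TRIGGERS = (
    ("placed_on_eu_market", "system placed on the market or put into service in the EU"),
    ("deployer_in_eu", "deployer established or located in the EU"),
    ("output_used_in_eu", "output used in the Union"),
)


def article2_triggers(exposure: Exposure) -> list[str]:
    """Return every territorial trigger that applies to this exposure."""
    return [label for field, label in TRIGGERS if getattr(exposure, field)]


def in_scope(exposure: Exposure) -> bool:
    """A single trigger is enough to bring the service in scope."""
    return bool(article2_triggers(exposure))


# A US SaaS with no EU office or servers whose output reaches EU users.
us_saas = Exposure(output_used_in_eu=True)
us_saas_triggers = article2_triggers(us_saas)
```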
If You Operate from the United States
There is no broad federal AI law as of 2026-05-18. Federal oversight relies on voluntary NIST guidance (the AI Risk Management Framework and its Generative AI Profile) and FTC enforcement against deceptive AI claims under existing consumer-protection rules.
State and local laws fill the gaps:
- The Colorado AI Act (SB 24-205) was originally set to take effect on 2026-02-01; a 2025 amendment (SB 25B-004) moved the effective date to 2026-06-30. It regulates high-risk AI used for consequential decisions in employment, housing, education, lending, healthcare, insurance, legal services, and essential goods. A hosted LLM only falls under this if it supports one of those decisions; a general chat assistant is not automatically covered.
- NYC Local Law 144 of 2021 is active. It requires independent bias audits for automated employment decision tools that substantially assist or replace discretionary hiring or promotion decisions affecting NYC residents.
- Illinois BIPA (740 ILCS 14) creates liability for any inference platform processing biometric identifiers like face geometry, voiceprints, or iris scans. Speech-to-text and image captioning don’t automatically trigger this; the law targets the biometric identifier itself.
- California SB 1047 was vetoed in September 2024 and never became law, so it should not anchor your compliance strategy.
If your US product reaches EU users, layer EU AI Act compliance on top of your state obligations. The biggest 2026 compliance event for a US LLM SaaS will be the 2026-08-02 EU high-risk deadline, not a federal move. Holland & Knight and Modulos have detailed breakdowns at hklaw.com and modulos.ai.
If You Operate from Latin America
LATAM is shifting toward risk-based AI rules. Right now, only Brazil and Chile have concrete legislative texts you can plan around, per the Baker McKenzie regional map, and neither AI bill has been enacted yet.
Brazil’s PL 2.338/2023 is still pending in Congress. It proposes a risk-based model with significant penalties, but there is no enacted effective date yet. The Lei Geral de Proteção de Dados (LGPD) and ANPD guidance already apply to LLM inference that handles personal data, especially for cross-border transfers and sensitive categories.
Chile introduced an AI Bill modeled on international frameworks and updated its National AI Policy. The new Personal Data Protection Law applies whenever LLM inference touches profiling, biometrics, or automated decisions.
Mexico, Colombia, and Argentina are following the same regional trend, but no specific enacted AI statute for LLM inference exists in any of those countries yet. Existing data-protection laws, including Mexico’s LFPDPPP, Colombia’s Ley 1581/2012, and Argentina’s Ley 25.326, still apply when prompts or outputs contain personal data.
For LATAM teams selling to Europe, the Act’s extraterritorial reach is the real pressure point. Article 2 attaches the moment output gets used in the Union. High-risk providers without an EU establishment must appoint an authorized representative.
Customer Obligations as a Deployer of a High-Risk System
When a customer uses a hosted LLM for an Annex III high-risk task, deployer duties under Article 26 (Chapter III, Section 3) take effect on 2026-08-02. KPMG covers the full checklist in their decoding-the-AI-Act primer. The core duties are:
- Follow the provider’s instructions for use.
- Keep competent personnel in the loop for human oversight.
- Monitor the system against its intended purpose, and suspend use immediately if a deviation threatens health, safety, or fundamental rights.
- Keep system logs for the required period. The Act mandates a minimum of six months, unless EU or national law requires longer.
- Run a Fundamental Rights Impact Assessment (FRIA) under Article 27 if you are a public body, a private operator providing public services, or working in specific Annex III categories.
- Notify workers and their representatives before deploying high-risk systems in the workplace.
Generic chat models rarely fit high-risk tasks out of the box. A deployer that integrates a general LLM into a hiring screen, credit decision, or eligibility ranking takes on deployer duties. If they also brand the resulting system under their own name, they may trigger provider obligations as well.
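The six-month log retention floor from the list above can be enforced with a small date helper. This is a minimal sketch using calendar months; the function names are hypothetical, and a longer period required by EU or national law is passed in as `required_months`.

```python
import calendar
from datetime import date

MIN_RETENTION_MONTHS = 6  # floor stated in the deployer duties above


def add_months(day: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    month_index = day.month - 1 + months
    year = day.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def earliest_deletion_date(logged_on: date, required_months: int = MIN_RETENTION_MONTHS) -> date:
    """First date a log entry may be deleted; never earlier than the six-month floor."""
    return add_months(logged_on, max(required_months, MIN_RETENTION_MONTHS))


def may_delete(logged_on: date, today: date, required_months: int = MIN_RETENTION_MONTHS) -> bool:
    """True once the retention period for this log entry has fully elapsed."""
    return today >= earliest_deletion_date(logged_on, required_months)


first_deletion = earliest_deletion_date(date(2026, 8, 2))
too_early = may_delete(date(2026, 8, 2), date(2027, 1, 15))
```

Clamping the requested period to the floor means a misconfigured shorter retention value cannot silently undercut the minimum.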
DPA Clauses That Allocate AI Act Roles
A managed inference contract that omits AI Act role allocation creates legal exposure. The clauses below cover what customers ask their DPAs to address, following recommendations from Bastion’s SaaS compliance guide:
- Role allocation. State clearly which party is the provider, deployer, or other role for each defined use case. The statute overrides contracts, but a clear statement keeps evidence clean.
- Intended purpose. Document the approved use case. Require written notice and role re-allocation if the customer plans to use the model in a high-risk Annex III context.
- Documentation delivery. The platform commits to supplying technical documentation, model version identifiers, instructions for use, known limitations, and safety notes. Customers need this to meet their deployer or downstream provider duties.
- Change notice. Give advance warning of fine-tuning, retraining, version upgrades, safety-filter changes, or hosting-location shifts that could affect compliance.
- Logging and retention. Define what the platform logs, how long it retains it, and the export format. AI Act logging duties sit with the deployer; the platform must enable that capability, not block it.
- Incident reporting. Set the notification path and SLA for serious incidents under Article 73 or systemic-risk reporting under Article 55.
- Article 50 transparency split. Divide AI-generated content disclosures between the platform (system-level notices and machine-readable metadata) and the customer (end-user-facing disclosures in their workflows).
- Subprocessor list. Disclose key subprocessors and upstream model licensors.
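The clause list above lends itself to an automated coverage check during contract review. This is a sketch under assumptions: the clause identifiers and the `missing_clauses` helper are hypothetical, and the check confirms only that a section exists, not that its wording is adequate.

```python
# Hypothetical identifiers for the eight clauses listed in this guide.
REQUIRED_CLAUSES = (
    "role_allocation",
    "intended_purpose",
    "documentation_delivery",
    "change_notice",
    "logging_and_retention",
    "incident_reporting",
    "article_50_transparency_split",
    "subprocessor_list",
)


def missing_clauses(dpa_sections: list[str]) -> list[str]:
    """Return required clause identifiers absent from a DPA's section list."""
    present = {section.strip().lower() for section in dpa_sections}
    return [clause for clause in REQUIRED_CLAUSES if clause not in present]


# Example: a draft DPA that covers only four of the eight clauses.
draft_dpa = [
    "role_allocation",
    "intended_purpose",
    "Logging_and_Retention",
    "subprocessor_list",
]
gaps = missing_clauses(draft_dpa)
```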
How Tessera Maps to AI Act Obligations
Tessera runs managed inference on dedicated GPU clusters in the EU and Latin America. Three structural choices keep the compliance surface small for customers:
- Dedicated, not shared, GPUs. Compute never moves across tenants. Data residency stays enforceable per cluster.
- Open-weight models served as released. The customer’s upstream GPAI provider is the model publisher (Alibaba for Qwen, Meta for Llama, Mistral AI for its family). Tessera does not retrain or rebrand the base model.
- Flat monthly pricing. Predictable costs remove the audit volatility that token-metered providers create. Security and legal teams can model the cost of a compliance-driven workload review without guessing (see the pricing calculator).
Operators pick their region per deployment. The GDPR posture page details EU residency clusters. The AI Act page covers role allocation. The DPA template ships the clauses above.
Compliance Checklist by Zone
The lists below summarize what to act on this quarter in each jurisdiction, checked against the EU AI Act service-desk timeline.
EU-headquartered operator using Tessera EU clusters:
- Confirm the use case is not Annex III high-risk. If it is, prepare for the 2026-08-02 deadline.
- Review the GPAI documentation supplied for the deployed model.
- Roll out Article 4 AI literacy training for staff using the system.
- Plan logging and retention around deployer obligations.
US-headquartered operator with EU users:
- Treat the Article 2 output-trigger as active from day one of EU traffic.
- Appoint an EU authorized representative if the system reaches high-risk classification.
- Layer Colorado SB 24-205 (effective date moved to 2026-06-30 by SB 25B-004) on top when the workload supports consequential decisions.
- Track NYC LL 144 and Illinois BIPA exposure separately when applicable.
LATAM-headquartered operator:
- Apply LGPD (Brazil), Ley 1581 (Colombia), and LFPDPPP (Mexico) to all prompts and outputs containing personal data.
- Watch PL 2.338/2023 (Brazil) and the Chilean AI Bill for enactment.
- Apply EU AI Act extraterritorial scope when serving EU users.
- Use LATAM data residency clusters to keep workloads onshore.
FAQ
Does the EU AI Act apply to LLM inference?
Yes, whenever its output is used in the Union. Article 2 applies the Regulation to AI systems on that basis, regardless of where the provider or deployer sits. A non-EU LLM inference service falls in scope if EU users, employees, or downstream systems consume the output. The official scope guidance confirms the output-trigger rule.
Is a hosted LLM a high-risk AI system under the AI Act?
Not by default. Most general-purpose LLMs qualify as GPAI models (defined in Article 3(63)), and their providers carry the Article 53 obligations. The same model only becomes part of a high-risk system when a deployer integrates it into an Annex III use case like employment, education, credit, or essential services. Classification depends on the use case, not the model itself.
What is the systemic-risk threshold for a GPAI model?
Cumulative training compute above 10^25 floating-point operations triggers a presumption of systemic risk under Article 51. The AI Office can also designate a model below that threshold based on capability or real-world impact.
When does the EU AI Act apply to a US or LATAM company?
When the AI system’s output gets used in the Union (Article 2). The provider does not need an EU office, server, or staff member. Output consumption inside the EU is sufficient.
What documentation do downstream deployers need from a GPAI provider?
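Annex XII sets the minimum content a GPAI provider must give downstream providers who integrate the model: intended tasks, acceptable use policies, release date and distribution methods, architecture and parameter count, input and output modality and format, the licence, the technical means needed for integration, and information on the data used for training, testing, and validation. Computing resources and detailed evaluation results belong to the Annex XI documentation kept for the AI Office and national authorities. The provider must keep the documentation up to date.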