
Cloudflare AI Gateway

Cloudflare AI Gateway is a free observability and control layer for AI applications that provides caching, rate limiting, analytics, cost tracking, and request management across 20+ AI providers, deeply integrated with the Cloudflare Workers ecosystem.

Overview

Cloudflare AI Gateway acts as a proxy between your application and AI providers. By routing requests through the gateway, you gain visibility into usage patterns, control over costs, and resilience through caching and fallbacks — all without changing your application's core logic.

The gateway is available on all Cloudflare plans, including the free tier. Core features like analytics, caching, rate limiting, and logging have no per-call gateway fee.

Key Features

Caching

AI Gateway can serve identical requests directly from Cloudflare's global edge cache, reducing latency by up to 90% and significantly cutting costs by avoiding repeated API calls to providers. This is especially effective for applications with predictable query patterns, such as FAQ bots, documentation assistants, or classification tasks.

Cache behavior is configurable per gateway:

  • TTL control — Set how long cached responses remain valid
  • Cache bypass — Skip cache for specific requests when fresh responses are needed
  • Cache analytics — Monitor hit rates to understand cost savings
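Per-request cache settings are applied with request headers. A minimal sketch, assuming the `cf-aig-cache-ttl` and `cf-aig-skip-cache` header names from Cloudflare's caching docs (the helper function itself is illustrative, not part of any SDK):

```javascript
// Build headers for a gateway request with per-request cache control.
// cf-aig-cache-ttl sets the cache lifetime in seconds;
// cf-aig-skip-cache bypasses the cache when set to "true".
function gatewayCacheHeaders({ apiKey, ttlSeconds, skipCache = false }) {
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };
  if (skipCache) {
    headers['cf-aig-skip-cache'] = 'true';
  } else if (ttlSeconds !== undefined) {
    headers['cf-aig-cache-ttl'] = String(ttlSeconds);
  }
  return headers;
}

// Cache an FAQ-style request for one hour:
const faqHeaders = gatewayCacheHeaders({ apiKey: 'sk-...', ttlSeconds: 3600 });
```

Sending the same headers with an otherwise identical request body is what produces a cache hit, so deterministic prompts (classification, FAQ lookups) benefit most.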

Rate Limiting

Rate limiting prevents unexpected cost spikes and protects against abuse. You can configure limits based on:

  • Requests per time window — Set maximum requests per second, minute, or hour
  • Sliding or fixed windows — Choose the counting method that fits your use case
  • Per-user or global limits — Apply limits at different granularity levels
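When a configured limit is exceeded, the gateway rejects the request with an HTTP 429, so clients should be prepared to back off and retry. A minimal sketch of that client-side handling (the backoff policy and the `doFetch` parameter are illustrative, not part of the gateway API):

```javascript
// Retry a request when the gateway responds 429 (rate limited),
// backing off exponentially between attempts. `doFetch` is any
// function returning a Response-like object with a `status` field.
async function fetchWithBackoff(doFetch, { maxAttempts = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await doFetch();
    if (response.status !== 429) return response;
    // Wait 500ms, 1000ms, 2000ms, ... before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
  throw new Error(`Still rate limited after ${maxAttempts} attempts`);
}
```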

Analytics and Logging

The AI Gateway dashboard provides real-time visibility into:

  • Request volume — Total requests, successful responses, and errors
  • Token usage — Input and output tokens consumed per model and provider
  • Cost tracking — Estimated spend across all providers
  • Latency metrics — Response times by provider and model
  • Error rates — Provider-level error tracking for debugging

All requests are logged with full request and response metadata, enabling post-hoc analysis and debugging.
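Logged requests can also be tagged with your own metadata so they are easier to filter later. A sketch, assuming the `cf-aig-metadata` request header (a JSON object) from Cloudflare's docs; the specific fields here are illustrative:

```javascript
// Attach custom metadata to a gateway request so it can be
// filtered in the AI Gateway logs. The header value is a flat
// JSON object of your own keys.
function withRequestMetadata(headers, metadata) {
  return {
    ...headers,
    'cf-aig-metadata': JSON.stringify(metadata),
  };
}

const tagged = withRequestMetadata(
  { 'Content-Type': 'application/json' },
  { userId: 'user-42', feature: 'faq-bot' } // illustrative fields
);
```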

Request Retries and Model Fallback

When a provider returns an error, AI Gateway can automatically retry the request or fall back to an alternative model or provider. This keeps your application running even when individual providers experience outages.
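With the universal endpoint, a fallback chain can be expressed as an ordered array of provider steps that the gateway tries in turn until one succeeds. A hedged sketch of building such a payload (the step fields `provider`, `endpoint`, `headers`, and `query` follow the universal endpoint schema as described in Cloudflare's docs; verify the exact shape before relying on it):

```javascript
// Build a universal-endpoint payload that tries OpenAI first and
// falls back to Workers AI if the first step fails. Each step names
// a provider, a provider-relative endpoint, auth headers, and the
// request body ("query").
function buildFallbackPayload({ openaiKey, prompt }) {
  return [
    {
      provider: 'openai',
      endpoint: 'chat/completions',
      headers: {
        Authorization: `Bearer ${openaiKey}`,
        'Content-Type': 'application/json',
      },
      query: { model: 'gpt-4o', messages: [{ role: 'user', content: prompt }] },
    },
    {
      provider: 'workers-ai',
      endpoint: '@cf/meta/llama-3.1-8b-instruct',
      headers: { 'Content-Type': 'application/json' },
      query: { messages: [{ role: 'user', content: prompt }] },
    },
  ];
}
```

The payload is POSTed to the universal endpoint; order matters, since the gateway walks the array top to bottom.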

Guardrails

AI Gateway includes content moderation and safety features that can filter requests and responses based on configurable rules, helping prevent misuse and enforce content policies.

Supported Providers

Cloudflare AI Gateway supports a broad range of AI providers:

  • OpenAI — GPT-4o, GPT-4 Turbo, o1, o3, and other models
  • Anthropic — Claude model family
  • Google AI Studio — Gemini models
  • Google Vertex AI — Enterprise Gemini access
  • AWS Bedrock — Multi-provider managed models
  • Azure OpenAI — Microsoft-hosted OpenAI models
  • Mistral AI — Mistral and Mixtral models
  • Meta / Llama — Via supported inference providers
  • Groq — High-speed inference
  • Cohere — Command and Embed models
  • DeepSeek — DeepSeek models
  • xAI — Grok models
  • Workers AI — Cloudflare's own serverless AI inference
  • Replicate — Open-source model hosting
  • Hugging Face — Open-source model inference
  • Perplexity — Search-augmented models
  • ElevenLabs — Voice and audio generation
  • Fal AI — Image and video generation

Workers AI Integration

Cloudflare AI Gateway integrates natively with Workers AI, Cloudflare's serverless AI inference platform. You can route Workers AI requests through the gateway using either the REST API or an environment binding in your Worker script:

// Using the AI binding in a Cloudflare Worker
export default {
  async fetch(request, env) {
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [
        { role: 'user', content: 'What is an AI Gateway?' }
      ],
    });

    return Response.json(response);
  },
};

When the Worker is configured with an AI Gateway binding, all env.AI.run() calls automatically route through the gateway, gaining caching, rate limiting, and analytics without additional code.
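Outside a Worker, the same Workers AI models can be reached over REST by composing the gateway URL. A sketch of the URL shape (the `workers-ai` path segment follows the provider-path pattern shown in Cloudflare's docs; account and gateway IDs are placeholders):

```javascript
// Compose the AI Gateway URL for a Workers AI model. The path shape is
// gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/{model}.
function workersAiGatewayUrl(accountId, gatewayId, model) {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/workers-ai/${model}`;
}

const url = workersAiGatewayUrl('acct123', 'my-gateway', '@cf/meta/llama-3.1-8b-instruct');
// → https://gateway.ai.cloudflare.com/v1/acct123/my-gateway/workers-ai/@cf/meta/llama-3.1-8b-instruct
```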

Universal Endpoint

For non-Workers applications, AI Gateway provides a universal endpoint that proxies requests to any supported provider. You replace the provider's base URL with the gateway URL in your existing API calls:

# Instead of calling OpenAI directly:
# https://api.openai.com/v1/chat/completions

# Route through AI Gateway:
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

This works with any HTTP client or SDK — you only need to change the base URL.
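Since only the base URL changes, the rewrite can be mechanical. A small helper that maps a provider name to its gateway base (the provider path segment follows the pattern in the curl example above; check each segment against Cloudflare's provider list):

```javascript
// Rewrite a provider-direct base URL into its AI Gateway equivalent.
// The final path segment selects the provider (e.g. "openai", "anthropic").
function gatewayBaseUrl(accountId, gatewayId, provider) {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}`;
}

// An SDK configured with this base URL sends every call through the gateway:
const openaiBase = gatewayBaseUrl('acct123', 'my-gateway', 'openai');
// then e.g. POST `${openaiBase}/chat/completions` with your usual body and headers
```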

Pricing

Cloudflare AI Gateway is free on all Cloudflare plans:

  • No per-call gateway fee — Core gateway features (analytics, caching, rate limiting, logging) are free
  • Provider costs — You pay the underlying AI provider directly for model usage
  • Unified Billing — Cloudflare offers the option to pay for third-party model usage directly through your Cloudflare invoice, with a small transaction convenience fee
  • Workers AI — Cloudflare's own inference service has its own pricing, with a generous free tier

When to Use Cloudflare AI Gateway

Best suited for:

  • Applications already using Cloudflare Workers or the Cloudflare developer platform
  • Teams that want free caching and rate limiting for AI requests
  • Projects that need detailed analytics and cost tracking across providers
  • Applications with predictable query patterns that benefit from response caching
  • Teams that want a single dashboard for all AI provider usage

Consider alternatives when:

  • You need tight SDK integration (Vercel AI Gateway may be better for AI SDK users)
  • You want a unified API that normalizes request formats across providers (OpenRouter or LiteLLM)
  • You need advanced routing features like cost-optimized or throughput-optimized routing
  • You require an on-premises or self-hosted solution

Related Resources

  • AI Gateways — Overview of AI Gateways and when to use them
  • Cloudflare — General Cloudflare platform overview
  • Vercel AI Gateway — Alternative gateway integrated with Vercel AI SDK
  • OpenRouter — Alternative unified API with 300+ models
