AI Agent Frameworks
AI agent frameworks are the libraries that turn an LLM call into an autonomous loop — the model picks tools, calls them, reads results, decides what to do next. By 2026 the category has split into roughly eight serious players with very different opinions: lightweight TypeScript SDKs that emphasize streaming UIs, Python-first graph orchestrators built for stateful long-horizon workflows, role-based crew abstractions for non-developers, and provider-locked SDKs from the model labs themselves.
This comparison covers what each framework is built for, where the tradeoffs bite, and how to pick.
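Every framework in this list is, at its core, a wrapper around the same loop: the model picks a tool, the runtime executes it, the result goes back into the context, and the loop repeats until the model answers or a step cap is hit. A minimal sketch of that loop, with a hypothetical stub standing in for the LLM (no real framework or API involved):

```python
def fake_model(messages):
    """Stand-in for an LLM: asks for a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "add", "args": {"a": 2, "b": 3}}
    return {"type": "answer", "text": f"The sum is {messages[-1]['content']}"}

TOOLS = {"add": lambda a, b: a + b}

def run_agent(model, tools, prompt, max_steps=5):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):              # hard cap: uncapped loops burn tokens
        action = model(messages)
        if action["type"] == "answer":
            return action["text"]
        result = tools[action["name"]](**action["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded max_steps")
```

Everything the frameworks below add (state, hooks, graphs, roles, validation) is layered on top of this loop.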
Why this category fragmented
By late 2024 a single "agent framework" mostly meant LangChain. By 2026 there are at least eight serious options, fragmented across three axes:
- Language: TypeScript-first vs Python-first vs both.
- State: stateless tool loops vs durable multi-step graphs vs role-based crews.
- Lock-in: model-agnostic vs single-provider (Claude Agent SDK, OpenAI Agents SDK).
Picking is now a real architecture decision rather than a default. The framework you pick shapes what your agents can do — short tool loops vs hours-long stateful workflows — and where they can run.
The eight serious frameworks
Vercel AI SDK
What it is: TypeScript-native, streaming-first, built for embedding LLM features in web apps. Tight integration with Next.js, React (useChat, useCompletion), and the Vercel deployment platform.
Language: TypeScript.
State: stateless or manual. There is no built-in durability layer; for long-horizon agent loops you compose it with Vercel Workflow DevKit.
Tool calling: native, with Zod schemas for type-safe tool definitions. Provider-agnostic via "provider/model" strings through Vercel AI Gateway.
Observability: AI Gateway dashboards if you route through it; otherwise minimal built-in.
Best for: streaming chat UIs, AI features inside Next.js apps, multi-provider routing without lock-in.
Limitations: tool loops bounded by Vercel function timeouts (300–800s); not the right tool for hours-long autonomous agents.
See: AI SDK reference.
Claude Agent SDK
What it is: Anthropic's official agent SDK, exposing the same primitives that power Claude Code (file editing, bash, search, custom tools, subagents). Built for code-touching and codebase-aware agents.
Language: TypeScript and Python — the two SDKs are version-aligned.
State: session-based, persisted to disk. listSessions() and getSessionMessages() let you resume long-running agent sessions.
Tool calling: native, with built-in tools (Read, Write, Edit, Glob, Grep, Bash) and custom tools via the tool() helper or MCP servers.
Observability: hooks (PreToolUse, PostToolUse) for logging, validation, and refusal of specific actions.
Best for: agents that touch codebases, dev tooling, Claude-only workflows where the built-in tools cover most needs.
Limitations: Claude-only. Doesn't easily route across providers. Less appropriate for general chat-style agents that don't need the file/shell toolset.
See: Claude Agent SDK reference.
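The hook pattern the SDK exposes (PreToolUse / PostToolUse) is worth understanding on its own: a pre-hook can veto a tool call before it runs, and a post-hook observes the result. A sketch of the pattern, with illustrative class and method names rather than the SDK's real interface:

```python
class ToolRefused(Exception):
    pass

class HookedRunner:
    def __init__(self):
        self.pre_hooks, self.post_hooks, self.log = [], [], []

    def on_pre_tool_use(self, hook):
        self.pre_hooks.append(hook)

    def on_post_tool_use(self, hook):
        self.post_hooks.append(hook)

    def call_tool(self, name, fn, **args):
        for hook in self.pre_hooks:         # any pre-hook may refuse the call
            if hook(name, args) is False:
                raise ToolRefused(name)
        result = fn(**args)
        for hook in self.post_hooks:        # post-hooks observe the result
            hook(name, args, result)
        return result

runner = HookedRunner()
runner.on_pre_tool_use(lambda name, args: name != "Bash")   # block shell access
runner.on_post_tool_use(lambda name, args, result: runner.log.append((name, result)))
```

This is the mechanism behind "logging, validation, and refusal of specific actions": policy lives in hooks, not in the agent loop itself.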
LangGraph
What it is: graph-based orchestration framework from the LangChain team. Workflows are directed graphs with cycles and persistent state. The most expressive option for complex multi-step agents.
Language: Python (TypeScript in beta).
State: built-in checkpointing — every node execution can persist to a backing store, allowing pause/resume across days. Best-in-class state management.
Tool calling: native, plus MCP support via adapter.
Observability: LangSmith — mature tracing and debugging UI; arguably the strongest debugging story in the category.
Best for: long-horizon stateful agents, complex multi-step orchestration with branching and recovery, Python-shop teams comfortable with the framework's depth.
Limitations: steep learning curve. The graph DSL is powerful but unfamiliar; smaller agents end up over-engineered.
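The core idea is simpler than the framework's surface area suggests: a directed graph of node functions over a shared state dict, with the state checkpointed after every node so a run can pause and resume. A sketch of that pattern, not LangGraph's real API, using an in-memory dict as the checkpoint store:

```python
def run_graph(nodes, edges, state, entry, store, resume_from=None):
    """nodes: name -> fn(state) -> state; edges: name -> fn(state) -> next node or None."""
    current = resume_from or entry
    while current is not None:
        state = nodes[current](state)
        store[current] = dict(state)        # checkpoint: persist state after every node
        current = edges[current](state)     # edges may loop back (cycles allowed)
    return state

nodes = {
    "draft":  lambda s: {**s, "text": "draft v%d" % (s["tries"] + 1), "tries": s["tries"] + 1},
    "review": lambda s: {**s, "ok": s["tries"] >= 2},
}
edges = {
    "draft":  lambda s: "review",
    "review": lambda s: None if s["ok"] else "draft",   # cycle until review passes
}
store = {}
final = run_graph(nodes, edges, {"tries": 0}, "draft", store)
```

Swap the dict `store` for a database and you have the pause-for-days resumability the framework is built around.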
Mastra
What it is: TypeScript-first agent framework with Agent Networks (LLM-routed multi-agent coordination), graph workflows, and built-in observability. Effectively the LangGraph of TypeScript.
Language: TypeScript.
State: built-in with multiple memory types (working memory, conversation history, semantic memory, episodic memory). Adapters for Postgres, Convex, and other stores.
Tool calling: native, with native MCP support; OpenTelemetry built in.
Observability: built-in dashboard, local playground, autogenerated Swagger / OpenAPI docs.
Best for: TypeScript teams that want LangGraph-level capability without leaving the JS/TS ecosystem; full-stack apps where the agent runtime lives next to the API and the UI.
Limitations: newer than LangGraph; community and integration ecosystem still maturing.
CrewAI
What it is: Python framework that abstracts agents into roles, goals, backstories, and delegations. The role-based mental model is the differentiator — agents are designed like a small team rather than a single loop.
Language: Python.
State: short-term memory + long-term memory + task state. Less granular than LangGraph but easier to reason about.
Tool calling: native, with native MCP support since v1.10.
Observability: CrewAI dashboard; managed cloud option (CrewAI Enterprise / AMP Cloud) for hosted execution.
Best for: prototyping multi-agent systems quickly, teams that prefer high-level abstractions over graph-level control, fast iteration on "what if I had a researcher + writer + editor agent" patterns.
Limitations: less expressive than LangGraph for non-role-based workflows; the role abstraction can feel forced for systems that aren't actually role-based.
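The role-based mental model reduces to something like the following sketch: each agent is a role plus a goal, and the crew runs tasks in sequence, passing each result into the next agent's context. Names are illustrative, not CrewAI's actual API; lambdas stand in for LLM calls:

```python
class Agent:
    def __init__(self, role, goal, work):
        # work stands in for the LLM call the real framework would make
        self.role, self.goal, self.work = role, goal, work

    def perform(self, context):
        return self.work(context)

class Crew:
    def __init__(self, agents):
        self.agents = agents

    def kickoff(self, brief):
        context = brief
        for agent in self.agents:           # sequential hand-off: researcher -> writer -> editor
            context = agent.perform(context)
        return context

crew = Crew([
    Agent("researcher", "gather facts",  lambda c: c + " [facts]"),
    Agent("writer",     "draft prose",   lambda c: c + " [draft]"),
    Agent("editor",     "polish output", lambda c: c + " [edited]"),
])
```

If your workflow naturally decomposes into a hand-off chain like this, the abstraction fits; if it doesn't, that's the "forced" feeling the limitations note describes.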
LangChain
What it is: the original, foundational agent framework. Mature, ubiquitous, and now somewhat displaced by LangGraph for new agent work.
Language: Python (TypeScript exists, less complete).
State: 8+ memory types — the broadest in the category.
Tool calling: native; broadest integration ecosystem (everything has a LangChain integration).
Observability: LangSmith.
Best for: general-purpose tasks, retrieval-augmented generation, prototyping where the integration ecosystem matters more than agent-loop sophistication.
Limitations: for new multi-agent or stateful work, LangGraph is the recommended evolution. LangChain alone is increasingly the wrong layer.
Pydantic AI
What it is: Python framework for type-safe agents with runtime validation. Minimal setup, Zod-style schemas, opinionated about correctness over feature breadth.
Language: Python.
State: manual (via tools and code).
Tool calling: native; community MCP support.
Observability: Logfire (Pydantic team's observability product).
Best for: Python teams that prioritize type safety and runtime validation, single-agent systems where correctness is paramount, integrations with FastAPI / Pydantic-heavy stacks.
Limitations: lighter on multi-agent and complex orchestration than LangGraph or CrewAI; primarily a single-agent abstraction with optional pydantic-graph for multi-step.
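The pattern Pydantic AI is built around can be sketched without the framework: declare the shape of the agent's output, validate at runtime, and re-ask the model when validation fails. Validation is hand-rolled here (the real framework uses Pydantic models), and the "model" is a stub that fails once, then complies:

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    total: float
    currency: str

def validate_invoice(raw):
    if not isinstance(raw.get("total"), (int, float)):
        raise ValueError("total must be a number")
    if raw.get("currency") not in ("USD", "EUR"):
        raise ValueError("unknown currency")
    return Invoice(total=float(raw["total"]), currency=raw["currency"])

def run_typed(model, validate, max_retries=2):
    error = None
    for _ in range(max_retries + 1):
        raw = model(error)                  # the validation error is fed back to the model
        try:
            return validate(raw)
        except ValueError as exc:
            error = str(exc)
    raise RuntimeError("model never produced valid output")

attempts = iter([{"total": "12.5", "currency": "USD"},   # bad: total is a string
                 {"total": 12.5, "currency": "USD"}])    # good
result = run_typed(lambda err: next(attempts), validate_invoice)
```

The feedback loop (validation error re-prompted to the model) is what "correctness over feature breadth" buys you in practice.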
OpenAI Agents SDK
What it is: OpenAI's official agent SDK, released in March 2025. Lightweight Python framework with strong adoption (10M+ monthly downloads, 19k+ GitHub stars).
Language: Python; TypeScript SDK community-driven.
State: lightweight session management; less full-featured than LangGraph or Mastra.
Tool calling: native, optimized for OpenAI's o3 and gpt-4o models with strong typed tool calling.
Observability: OpenAI Traces dashboard.
Best for: OpenAI-only workflows, teams that already pay for OpenAI and want minimal framework overhead, agent prototypes where the OpenAI-specific tuning produces best output.
Limitations: provider lock-in — like Claude Agent SDK in reverse. Doesn't route across providers well.
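"Lightweight session management," as found here and in the Claude Agent SDK, amounts to a keyed transcript store that a later process can resume from. A sketch with a dict playing the role of the persistence layer; names are illustrative, not either SDK's API:

```python
class SessionStore:
    def __init__(self):
        self._sessions = {}

    def append(self, session_id, role, content):
        self._sessions.setdefault(session_id, []).append(
            {"role": role, "content": content})

    def resume(self, session_id):
        """Return the full transcript so a new run can pick up mid-conversation."""
        return list(self._sessions.get(session_id, []))

    def list_sessions(self):
        return sorted(self._sessions)

store = SessionStore()
store.append("job-1", "user", "refactor the parser")
store.append("job-1", "assistant", "done; tests pass")
```

The gap between this and LangGraph's checkpointing is the unit of persistence: whole transcripts here, per-node state snapshots there.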
Side-by-side
| | Vercel AI SDK | Claude Agent SDK | LangGraph | Mastra | CrewAI | LangChain | Pydantic AI | OpenAI Agents |
|---|---|---|---|---|---|---|---|---|
| Language | TS | TS + Py | Py (TS beta) | TS | Py | Py (TS-lite) | Py | Py (TS comm.) |
| State | Stateless / external | Sessions | Built-in checkpointing | Built-in (4 memory types) | Short + long memory | 8 memory types | Manual | Light sessions |
| Tool calling | Native (Zod) | Native + built-ins | Native + MCP | Native + MCP | Native + MCP | Native | Native | Native |
| Multi-agent | Manual | Subagents native | Graph nodes | Agent Networks | Crews (core) | Via LangGraph | Pydantic-graph | Light |
| Observability | AI Gateway | Hooks | LangSmith | Built-in | Dashboard | LangSmith | Logfire | Traces |
| Provider lock-in | None (gateway) | Anthropic | None | None | None | None | None | OpenAI |
| Long-running agents | No (timeouts) | Sessions | Yes (best) | Yes | Limited | Limited | Limited | Limited |
| Maturity | Mature | New (2025) | Mature | New (2024) | Mature | Most mature | Mature | New (2025) |
| Best for | TS web apps | Code-touching agents | Stateful Python workflows | TS multi-agent | Role-based crews | RAG, prototypes | Type-safe Python | OpenAI-native |
Decision matrix
| Job-to-be-done | Pick |
|---|---|
| Streaming chat UI in Next.js | Vercel AI SDK + maybe Workflow DevKit if you need durability |
| Agent that edits a codebase | Claude Agent SDK — the built-in file/shell tools are the right toolset |
| Multi-step Python workflow with state | LangGraph — the standard answer in 2026 |
| Multi-agent system in TypeScript | Mastra — closest LangGraph-equivalent in TS |
| Quick role-based prototype | CrewAI — if "researcher + writer + editor" maps to your work |
| OpenAI-native single-purpose agent | OpenAI Agents SDK — best output if you're committed to OpenAI |
| Type-safe agent with runtime validation | Pydantic AI — Pydantic-shop default |
| Hours-long durable agent (any language) | LangGraph (Python) or Mastra + Vercel Workflow (TS) |
| Provider-agnostic streaming | Vercel AI SDK + AI Gateway |
| Production agent shipping in 2 weeks | Claude Agent SDK or Mastra — fastest to working code |
If forced to pick a single default for new TypeScript work in 2026: Mastra. For Python: LangGraph. For Claude-only code-touching agents in either: Claude Agent SDK.
Honest tradeoffs you'll feel
- Vercel AI SDK is great until your agent needs to run for 20 minutes. Tool loops bounded by function timeouts; pair with Vercel Workflow DevKit for durability.
- LangGraph is the most expressive but has the steepest curve. Small agents built on it end up over-engineered, looking like they should have been a single function.
- CrewAI is the fastest path to a running multi-agent prototype, but the role abstraction can feel forced. Some workflows aren't actually role-based; forcing them into roles makes them harder to build and reason about.
- Provider-locked SDKs (Claude Agent, OpenAI Agents) trade flexibility for tight integration. Worth it if the lock-in matches your strategy; expensive if you need to migrate.
- The best observability story is LangSmith (LangChain / LangGraph). The closest TypeScript alternative is Mastra's built-in dashboard plus OpenTelemetry.
What none of these solve
- Truly autonomous, hours-long agents are still hard regardless of framework. Token cost, drift, and the lack of self-correcting loops produce expensive failures. Limit `max_steps` aggressively.
- Quality regression tracking. No framework's observability fully captures "is the agent producing worse output than last week?" That's the LLM Quality Monitoring layer's job.
- Cross-framework portability. Code written for LangGraph doesn't move to Mastra without significant rewriting. Pick deliberately.
Cross-references
Frameworks that pair with the agents you build here:
- Browser automation in agents: Browserbase
- Sandboxed code execution: Vercel Sandbox
- Routing across providers: Vercel AI Gateway
- MCP for tool integration: MCP — Model Context Protocol
- Memory systems for long-running agents: AI Agent Memory Systems, AI Memory Architecture Decision Framework
- Coding harnesses (different category): Claude Code, Cursor