Navigating the Agentic Framework Landscape - A Practical Evaluation

The agentic AI landscape is evolving rapidly, and as a developer you quickly run into the question: which framework should I use? LangChain, OpenAI Agents SDK, Google ADK, Autogen, CrewAI, and many other frameworks are available.

In this post, I evaluate these frameworks based on a specific use case: building a single-agent chat-based assistant. I assess them on feature completeness, usability, control, and production readiness.

After looking over the frameworks and writing a first version of this post, I stumbled upon the blog post Choosing Your Agent Framework by Allen Hutchison, which covers almost the same topic. Interestingly, we had reached very similar conclusions! As he explains, building agents is significantly harder than writing a simple loop, due to requirements like tool validation, persistence, and error management. This justifies the existence of agentic frameworks and their sometimes-criticized complexity. Let's dive in!

What is Common Across Frameworks

Before diving into the specifics, it’s worth noting what all these frameworks share. Each provides a main class or method that defines the agent with its system prompt, tools, and other specifications. They all have a method or class that runs the agent with some input (typically a user message) and yields data back.

agent = Agent(
    system_prompt=...,
    model=...,
    tools=...,
    other_things=...,
)

async for event in agent.run_stream(input_):
    event = process_event(event)
    if event.worth_yielding():
        yield event

The process_event logic (the code that handles the stream of events coming from the agent) is unavoidable and somewhat painful to write. What makes it tedious is the variety of event types you need to handle: partial JSON deltas, tool call chunks versus text chunks, and different event shapes across providers. It tends to be very debug-driven: you run the agent, inspect what events come back, and write handlers accordingly. The documentation across all frameworks tends to be thin or confusing on this point.


CrewAI (v1.4.0)

CrewAI is distinct from the others because it is purpose-built for multi-agent task workflows, not conversational chatbots. It is structured around the concepts of an Agent, a Task, and a Crew.

If you are trying to build a chat interface, you will hit friction immediately. CrewAI has no native support for chat interactions, token streaming, or conversation memory. It is designed to take a task and execute it until completion, rather than maintaining a persistent dialogue with a user. Take a look at the following example:

crew = Crew(
    agents=[research_agent, writer_agent],
    tasks=[research_task, write_article_task],
    verbose=True
)
crew_output = crew.kickoff()

A workaround for building a conversational chatbot is to define a single agent whose task is to answer the user's question based on the chat history, but it feels very unnatural. If your use case is truly multi-agent task execution, CrewAI shines; for chat-based assistants, look elsewhere. That said, it's worth noting that developers effectively use CrewAI tools inside a LangChain/LangGraph agent to get the best of both worlds: a proper chat interface with complex task delegation in the backend.
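The workaround boils down to flattening the conversation into a fresh task description on every turn. A plain-Python sketch of that prompt-building step (the helper name and message format are illustrative, not CrewAI API):

```python
def build_task_description(history: list[dict], user_msg: str) -> str:
    """Flatten the chat history into a one-shot task description."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return (
        "Answer the user's latest question based on the conversation so far.\n\n"
        f"Conversation history:\n{transcript}\n\n"
        f"Latest question: {user_msg}"
    )

history = [
    {"role": "user", "content": "What is CrewAI?"},
    {"role": "assistant", "content": "A multi-agent task framework."},
]
description = build_task_description(history, "Does it support streaming?")
# The description would then feed a Task for the single agent, and the
# crew would be kicked off again on every user turn.
```

Rebuilding and re-running a crew per message is exactly why this feels unnatural compared to frameworks with native conversation state.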


LangChain (v1.0.3)

LangChain v1 pulls the core agent-building tooling that previously lived in LangGraph into the main package and rewrites its single-agent implementation. It remains the heavyweight in the room, bringing maturity and a massive ecosystem.

Setting up streaming can be confusing. The different streaming modes take a bit longer to understand, and the process_event function requires patience to set up correctly. There are also minor changes in streamed data across model providers, but they are manageable once you know what to expect.

async for mode, data in agent.astream(
    input_,
    {"configurable": {"thread_id": "1"}},
    stream_mode=["messages", "values"],
):
    ...

On the data side, LangChain has an excellent unified vector store interface with batch indexing and many integrations. RAG tools can be implemented natively. Even if you don’t use LangChain for the agent itself, the vector store interface seems like a solid option on its own. Production-ready checkpointing and chat memory are available with different providers like SQLite and Postgres.
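To make the RAG point concrete, here is a minimal, framework-free sketch of what a retrieve_context tool does underneath: rank stored documents by embedding similarity and return the top hits. The toy hand-written embeddings stand in for a real embedding model; LangChain's vector stores handle this scoring, plus persistence and batch indexing, for you:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy corpus with made-up embeddings; a real store would call an
# embedding model and persist the vectors.
corpus = {
    "LangChain has a unified vector store interface.": [0.9, 0.1, 0.0],
    "CrewAI targets multi-agent task workflows.":      [0.1, 0.9, 0.0],
    "Autogen ships a Docker code executor.":           [0.0, 0.2, 0.9],
}

def retrieve_context(query_embedding: list[float], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(corpus[doc], query_embedding),
        reverse=True,
    )
    return ranked[:k]
```

Exposing this as a tool is then just a matter of wrapping it with the framework's tool decorator.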

A standout feature in v1.0.0 is the middleware system, which allows you to run logic before or after the LLM, Agent, or Tool executes. This is ideal for context management, and there are built-in middlewares for summarization, PII detection, and more. Workflow interruptions are also supported via HumanInTheLoopMiddleware, which is a feature many other frameworks lack.

agent = create_agent(
    model=model,
    tools=[python_interpreter, retrieve_context, internet_search] + mcp_tools,
    system_prompt=SYSTEM_PROMPT,
    checkpointer=agent_checkpointer,
    middleware=[
        HumanInTheLoopMiddleware(
            interrupt_on={
                "python_interpreter": {
                    "allowed_decisions": ["approve", "edit", "reject"]
                },
            },
            description_prefix="Tool execution requires approval",
        ),
    ],
)

For tools that need access to context and runtime information (like chat history, memory, or state), LangChain provides ToolRuntime, an optional parameter that can be passed to tools. This makes tools significantly more powerful.

from langchain.tools import tool, ToolRuntime

@tool
def summarize_conversation(runtime: ToolRuntime) -> str:
    """Summarize the conversation so far."""
    messages = runtime.state["messages"]
    human_msgs = sum(1 for m in messages if m.__class__.__name__ == "HumanMessage")
    ai_msgs = sum(1 for m in messages if m.__class__.__name__ == "AIMessage")
    tool_msgs = sum(1 for m in messages if m.__class__.__name__ == "ToolMessage")
    return (
        f"Conversation has {human_msgs} user messages, "
        f"{ai_msgs} AI responses, and {tool_msgs} tool messages."
    )

On the downside, the code interpreter tools suggested in the docs require API keys from external services that appear to no longer accept new users or are obsolete. However, there are plenty of web search integrations available that work well.


Autogen (v0.7.5)

Autogen is simple to get working, and streaming is straightforward. It's excellent for rapid prototyping when you want something running fast.

However, the non-OpenAI provider integration feels experimental. I ran into issues where the reflect_on_tool_use parameter sometimes defaults to False, which causes the agent to call tools but then stop without providing an answer to the user. This is a frustrating default that can waste debugging time.

Autogen proposes an interesting memory protocol: a vector store is queried on every agent call, and the top results extend the context automatically. It’s a neat idea, though the vector store integrations are more limited and the implementations feel slower than LangChain’s.

from autogen_agentchat.agents import AssistantAgent
from autogen_core.memory import ListMemory, MemoryContent, MemoryMimeType
from autogen_ext.models.openai import OpenAIChatCompletionClient

memory = ListMemory()
# Memory.add is a coroutine, so these calls run inside an async function.
await memory.add(MemoryContent(content="User likes pizza.", mime_type=MemoryMimeType.TEXT))
await memory.add(MemoryContent(content="User dislikes cheese.", mime_type=MemoryMimeType.TEXT))

agent = AssistantAgent(
    name="assistant",
    model_client=OpenAIChatCompletionClient(model="gpt-4o-mini"),
    memory=[memory],
    system_message="You are a helpful assistant.",
)
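Underneath, the protocol amounts to: before each agent call, score the stored memories against the query and prepend the winners to the context. A framework-free sketch of that pattern (toy keyword scoring instead of a real vector store; this is the idea, not Autogen's Memory API):

```python
import string

def _words(text: str) -> set[str]:
    """Lowercased, punctuation-stripped word set."""
    return {w.strip(string.punctuation) for w in text.lower().split()}

def keyword_score(memory_item: str, query: str) -> int:
    """Toy relevance score: number of shared words."""
    return len(_words(memory_item) & _words(query))

def extend_context(messages: list[str], memory_store: list[str],
                   query: str, k: int = 2) -> list[str]:
    """Prepend the top-k relevant memories to the message context."""
    relevant = sorted(
        memory_store,
        key=lambda m: keyword_score(m, query),
        reverse=True,
    )[:k]
    return [f"[memory] {m}" for m in relevant] + messages
```

Swapping keyword_score for an embedding lookup gives you the vector-store variant Autogen supports.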

One strong point is the built-in code interpreter with safe Docker execution. Chat persistence works by default, but there’s no database integration available, so you’re limited in production scenarios. The CodeExecutorAgent (experimental) allows a human to validate code executions before they run, which is a nice touch.

Fine-grained control equivalent to LangChain’s middleware is missing. The documentation suggests a “group-chat” pattern to achieve human-in-the-loop validation, but it’s not very convincing.

from autogen_agentchat.agents import AssistantAgent, UserProxyAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
assistant = AssistantAgent("assistant", model_client=model_client)
user_proxy = UserProxyAgent("user_proxy", input_func=input)

termination = TextMentionTermination("APPROVE")
team = RoundRobinGroupChat([assistant, user_proxy], termination_condition=termination)

I also could not find a way to give tools access to runtime information like LangChain’s ToolRuntime.


OpenAI Agents SDK (v0.5.0)

The OpenAI Agents SDK is minimalist and solid. It strips away much of the abstraction found in LangChain, which can be refreshing.

The agent needs a Runner to execute, but the process_event logic is easier to define than in other frameworks. Non-OpenAI providers are only supported through LiteLLM, so if you're not using OpenAI models, there's an extra layer of indirection.

from agents import Runner
from openai.types.responses import ResponseTextDeltaEvent

result = Runner.run_streamed(agent, user_msg)
async for event in result.stream_events():
    if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
        yield event.data.delta

There’s no RAG or vector store support built-in (you have to implement it yourself). A Python interpreter is available with built-in cloud execution, but it’s undocumented and hard to discover. Chat persistence with SQLite or SQLAlchemy works out of the box.

A highlight is the very flexible context object that can be used to store state information. The context is available to all tools, making it practically equivalent to LangChain’s ToolRuntime. This is essential for building tools that need access to chat history or other runtime state.

from agents import Agent, RunContextWrapper, Runner, function_tool
from dataclasses import dataclass

@dataclass
class UserInfo:
    name: str
    uid: int

@function_tool
async def fetch_user_age(wrapper: RunContextWrapper[UserInfo]) -> str:
    """Fetch the age of the user."""
    return f"The user {wrapper.context.name} is 47 years old"

# The context object is supplied at run time and shared with every tool:
# result = await Runner.run(agent, "How old is the user?", context=UserInfo("Alice", 1))

The guardrail options are limited (you can intercept and validate user input and agent output, but that’s about it). Notably, there’s no support for human-in-the-loop workflows in the Python SDK, though the JavaScript version does have it.
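Conceptually, those two hooks amount to validating the input before the run and the output after it. A plain-Python sketch of that shape (not the SDK's actual guardrail API; the names here are illustrative):

```python
class GuardrailTripped(Exception):
    """Raised when a validator rejects the input or output."""

def run_with_guardrails(agent_fn, user_input, input_checks, output_checks):
    """Run input validators, then the agent, then output validators."""
    for check in input_checks:
        if not check(user_input):
            raise GuardrailTripped(f"input rejected by {check.__name__}")
    output = agent_fn(user_input)
    for check in output_checks:
        if not check(output):
            raise GuardrailTripped(f"output rejected by {check.__name__}")
    return output

def no_secrets(text: str) -> bool:
    """Example validator: reject text that mentions a password."""
    return "password" not in text.lower()
```

What's missing relative to LangChain's middleware is everything in between: there's no hook around individual tool calls or model invocations inside the run.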


Google ADK (v1.18.0)

Google’s Agent Development Kit is similar to OpenAI’s SDK in that it has native support for Gemini models, but LiteLLM is needed for everything else. It’s a more ambitious framework than the OpenAI SDK, including multi-agent support, sub-agents with agent hierarchies, and other unique features.

Defining and running the agent is less straightforward than in other frameworks. The interfaces can be cumbersome (for example, types.Content contains types.Part, which adds verbosity). Streaming was particularly frustrating: I couldn't get simple token streaming to work. ADK introduces bidirectional streaming, but I couldn't get that working either. Higher-level helpers like event.text_content or async_stream_query exist to simplify this, but I'm showing the lower-level Runner API, since it is the equivalent of process_event in other frameworks and the first approach suggested in the documentation. Here is what the streaming code looks like:

message = types.Content(role="user", parts=[types.Part(text=user_msg)])
async for event in runner.run_async(
    user_id=user_id,
    session_id=thread_id,
    new_message=message,
):
    if not (hasattr(event, "is_final_response") and event.is_final_response()):
        continue
    if not (hasattr(event, "content") and event.content):
        continue
    if not (hasattr(event.content, "parts") and event.content.parts):
        continue
    for part in event.content.parts:
        if not (hasattr(part, "text") and part.text):
            continue
        yield part.text

The memory protocol has two components: chat history and a vector store that can be queried by tools. For development, there’s InMemorySessionService, and for production, the managed VertexAiMemoryBankService is available.

There are a lot of dependencies on GCP. Code execution engines are available in preview, hosted in GCP. RAG is handled through the Vertex AI RAG Engine. If you’re already in the Google ecosystem, these integrations are convenient, otherwise it might be a dealbreaker.

A unique feature is the ability to resume stopped agents (which no other framework offers). Extra control over agent behavior is provided through a “planner” that touches the model configuration (including thinking mode) and the system prompt. Callbacks are supported with the familiar pattern: before/after for LLM, Agent, and Tool execution.

app = App(
    name='my_resumable_agent',
    root_agent=root_agent,
    resumability_config=ResumabilityConfig(
        is_resumable=True,
    ),
)

my_agent = Agent(
    model="gemini-2.5-flash",
    planner=BuiltInPlanner(
        thinking_config=types.ThinkingConfig(
            include_thoughts=True,
            thinking_budget=1024,
        ),
    ),
    # ... your tools here
)

Summary and Recommendations

When comparing the frameworks side-by-side, the choice depends heavily on your specific constraints.

Vibe-based Comparison

Framework           Good                                              Bad
LangChain           ✓ Maturity, control, ecosystem                    ✗ Overly complex, learning curve
Autogen             ✓ Simple, good built-in tools                     ✗ Lacks control and production readiness
OpenAI Agents SDK   ✓ Minimalist and solid                            ✗ Lacks control and customizability
Google ADK          ✓ Feature-rich, good bridge from single- to       ✗ Cumbersome interfaces, many Google Cloud
                      multi-agent                                       dependencies, cryptic streaming setup

Detailed Scoring

I rated the frameworks on interface clarity, completeness, customizability, production readiness, and documentation.

Framework           Interface Clarity   Completeness   Customizability   Production Readiness   Documentation   Mean
LangChain           ★★                  ★★★            ★★★               ★★★                    ★★★             2.8
Autogen             ★★                  ★★             ★★                                                       1.4
OpenAI Agents SDK   ★★★                 ★★             ★★                ★★                     ★★              2.2
Google ADK          ★★                  ★★             ★★★               ★★                     ★★★             2.4

From the scores we can conclude that:

  • LangChain: 2.8 — Winner on completeness and ecosystem
  • Google ADK: 2.4 — Strong customizability
  • OpenAI Agents SDK: 2.2 — Best interface clarity
  • Autogen: 1.4 — Struggles with production readiness

Allen Hutchison’s Decision Guide

I wanted to finish the post by adding Allen Hutchison’s decision guide, which is not far from my own findings.

Based on your System:

  • Need granular control? → LangGraph (LangChain)
  • Need minimalism? → OpenAI Agents SDK
  • RAG-centered application? → Phidata
  • Building an enterprise-scale hierarchy? → Google ADK
  • Need intuitive multi-agent roles? → CrewAI
  • Conversational collaboration? → Microsoft Autogen

Based on Constraints:

  • Need to prototype fast? → CrewAI or LangChain
  • Production in Google Cloud? → Google ADK
  • Production in Azure? → Microsoft Autogen
  • Mixed technical team? → CrewAI
  • No/low-code requirements? → Voiceflow, Botpress, n8n

For a deeper dive into the code and to run the benchmarks yourself, check out the repository: h2oai/agentic-frameworks