
Unmasking the Black Box: Mastering LangChain Callbacks for Debugging
A comprehensive guide for developers on using LangChain's callback system to monitor, debug, and optimize AI applications.
The Black Box Problem
If you have deployed an LLM application beyond a simple "Hello World" script, you know the feeling. You run a complex chain—maybe an Agent with access to three different tools and a vector database. You hit enter. The terminal hangs for six seconds. The output arrives, and it’s... not what you asked for.
Where did it go wrong? Did the retrieval step fail? Did the agent hallucinate parameters for the tool? Did the system prompt get overridden?
When building with Large Language Models (LLMs), we often treat them as black boxes. We feed in text, and we get text out. But in engineering, observability is not optional. To build robust micro-SaaS tools or intelligent agents, you need to see the nervous system of your application. You need to know exactly when a chain starts, what the prompt looked like after template injection, and how many tokens were consumed.
In the LangChain ecosystem, the answer to this is Callbacks.
In this post, we are going to tear down the LangChain callback system. I’m not just going to show you how to print to the console; we are going to build custom handlers that give you actual control over your application's execution flow.
What Are Callbacks in LangChain?
At a high level, LangChain’s callback system is an implementation of the Observer Pattern. It allows you to hook into various stages of your LLM application's lifecycle.
Your application emits "events" during execution, such as:
- on_llm_start: when the LLM starts processing.
- on_llm_end: when the LLM finishes generating text.
- on_chain_start: when a chain (a sequence of calls) begins.
- on_tool_start: when an agent decides to use a specific tool.
- on_tool_error: when that tool crashes.
By attaching a CallbackHandler to these events, you can log data, stream tokens to a frontend, inspect intermediate steps, or even calculate costs in real-time. This is the difference between hoping your code works and knowing how it works.
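To make the Observer Pattern concrete, here is a toy dispatcher in plain Python. This is not LangChain's actual implementation — `SimpleCallbackManager` and `RecordingHandler` are invented names for illustration — but it shows the core mechanic: the runtime emits named events, and every registered handler observes them.

```python
class RecordingHandler:
    """A handler that records every event it observes."""
    def __init__(self):
        self.events = []

    def on_chain_start(self, payload):
        self.events.append(("on_chain_start", payload))

    def on_llm_start(self, payload):
        self.events.append(("on_llm_start", payload))

    def on_llm_end(self, payload):
        self.events.append(("on_llm_end", payload))


class SimpleCallbackManager:
    """Fans each emitted event out to every registered handler."""
    def __init__(self, handlers):
        self.handlers = handlers

    def emit(self, event, payload):
        for handler in self.handlers:
            getattr(handler, event)(payload)


# Simulate one run of a chain
handler = RecordingHandler()
manager = SimpleCallbackManager([handler])
manager.emit("on_chain_start", {"topic": "callbacks"})
manager.emit("on_llm_start", "Explain callbacks.")
manager.emit("on_llm_end", "Callbacks are hooks into the lifecycle.")

print([name for name, _ in handler.events])
# ['on_chain_start', 'on_llm_start', 'on_llm_end']
```

Swap the `RecordingHandler` for one that logs, streams, or bills, and you have the essence of the real system.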
Level 1: The Quick Debug (StdOutCallbackHandler)
Before we write custom code, let’s look at the built-in tool that saves me hours of headache: the StdOutCallbackHandler.
When you are prototyping a chain, you often want to see exactly what is happening under the hood without setting up complex logging infrastructure. LangChain includes a handler that prints everything to your terminal.
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.callbacks import StdOutCallbackHandler

# Initialize the handler
handler = StdOutCallbackHandler()

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
prompt = ChatPromptTemplate.from_template("Explain {topic} to a 5 year old.")

# Build the chain with LCEL pipe syntax
chain = prompt | llm

# Pass the handler at invocation time (a "request" callback)
response = chain.invoke(
    {"topic": "Quantum Physics"},
    config={"callbacks": [handler]},
)
Why is this useful?
When you run this, your terminal won't just show the final answer. It will show you the exact formatted prompt sent to OpenAI. If you are using few-shot prompting or dynamic variable injection, this is the fastest way to verify that your prompt templates are rendering correctly.
Level 2: Building a Custom Callback Handler
Printing to the console is fine for localhost, but in production, we need structured logs. We might want to send data to Datadog, save inputs to a database for fine-tuning later, or track latency.
To do this, we subclass BaseCallbackHandler.
Let’s build a custom handler that logs the prompt and the completion tokens, effectively creating a simple audit log for our application.
from langchain.callbacks.base import BaseCallbackHandler
from typing import Dict, Any, List
import time

class AuditLogHandler(BaseCallbackHandler):
    def __init__(self):
        self.step_times = {}

    def on_chain_start(self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any) -> Any:
        print(f"[INFO] Chain started with inputs: {list(inputs.keys())}")
        self.step_times["chain_start"] = time.time()

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> Any:
        print("[INFO] LLM processing started...")
        # This is crucial: seeing the ACTUAL prompt sent to the model
        for i, p in enumerate(prompts):
            print(f"\n--- PROMPT {i} PREVIEW ---\n{p[:100]}...\n-----------------------")

    def on_llm_end(self, response, **kwargs: Any) -> Any:
        # Elapsed time since the chain started, not just the LLM call
        duration = time.time() - self.step_times.get("chain_start", time.time())
        print(f"[INFO] LLM finished {duration:.2f}s after chain start")
        # Access token usage if the provider reports it in the response
        if response.llm_output and "token_usage" in response.llm_output:
            usage = response.llm_output["token_usage"]
            print(f"[COST] Tokens used: {usage}")

    def on_tool_start(self, serialized: Dict[str, Any], input_str: str, **kwargs: Any) -> Any:
        print(f"[AGENT] Tool requested: {serialized.get('name')} with input: {input_str}")
The Breakdown:
- Inheritance: We inherit from BaseCallbackHandler so we don't have to implement every single method, only the ones we care about.
- Inspection: In on_llm_start, we inspect prompts. This is the raw text hitting the API. If you have a bug where your context isn't being inserted, you will see it here.
- Performance: In on_llm_end, we can calculate latency or inspect metadata like token counts provided by the provider.
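You don't need a live LLM call to sanity-check logic like this: the hooks are plain methods, so you can drive them by hand with stub data. Below is a simplified stand-in for the handler above (stdlib only, no LangChain imports, names invented for illustration) that collects log lines in a list instead of printing, which makes the behavior easy to assert on.

```python
import time

class AuditLog:
    """Simplified stand-in for AuditLogHandler, for testing in isolation."""
    def __init__(self):
        self.step_times = {}
        self.lines = []  # collect instead of print, so we can inspect

    def on_chain_start(self, inputs, **kwargs):
        self.lines.append(f"[INFO] Chain started with inputs: {list(inputs.keys())}")
        self.step_times["chain_start"] = time.time()

    def on_llm_end(self, llm_output, **kwargs):
        duration = time.time() - self.step_times.get("chain_start", time.time())
        self.lines.append(f"[INFO] LLM finished in {duration:.2f}s")
        if llm_output and "token_usage" in llm_output:
            self.lines.append(f"[COST] Tokens used: {llm_output['token_usage']}")

# Drive the hooks manually with stub data -- no model, no network
log = AuditLog()
log.on_chain_start({"topic": "Quantum Physics"})
log.on_llm_end({"token_usage": {"prompt_tokens": 12, "completion_tokens": 48}})

for line in log.lines:
    print(line)
```

The same approach works for the real handler: instantiate it and call its hooks directly in a unit test before wiring it into a chain.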
Level 3: Constructor vs. Request Callbacks (The Scope Trap)
This is where most developers trip up. In LangChain, there are two ways to inject these callbacks, and the difference matters for thread safety and scope.
1. Constructor Callbacks (Global/Object Scope)
You pass the callback when you initialize the LLM or Chain.
llm = ChatOpenAI(callbacks=[AuditLogHandler()])
Use case: Logging that should happen for every call this object makes, regardless of the user. Good for system-wide monitoring.
2. Request Callbacks (Runtime Scope)
You pass the callback inside the invoke, run, or call method via the config parameter.
chain.invoke(
    {"input": "..."},
    config={"callbacks": [AuditLogHandler()]}
)
Use case: This is critical for web servers (FastAPI/Django). If you want to stream a response back to a specific user via WebSocket, you must use Request Callbacks. If you attach a streaming handler to the global Constructor, and two users hit your API at once, User A might receive User B’s tokens. Always use Request callbacks for user-session specific logic.
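Here is a stdlib-only sketch of why per-request instantiation keeps streams isolated. There is no real WebSocket here, and `UserStreamHandler` is an invented name: each "request" creates its own handler, so tokens buffered for User A can never leak into User B's stream.

```python
class UserStreamHandler:
    """One instance per request: buffers tokens for exactly one user."""
    def __init__(self, user_id):
        self.user_id = user_id
        self.tokens = []

    def on_llm_new_token(self, token):
        # In a real app this would push to this user's WebSocket
        self.tokens.append(token)

def handle_request(user_id, generated_tokens):
    # A fresh handler is created inside the request, never shared globally
    handler = UserStreamHandler(user_id)
    for token in generated_tokens:
        handler.on_llm_new_token(token)
    return handler

a = handle_request("user_a", ["Hello", " A"])
b = handle_request("user_b", ["Hello", " B"])

print("".join(a.tokens))  # Hello A
print("".join(b.tokens))  # Hello B
```

Had both requests shared one handler attached at construction time, both users' tokens would land in the same buffer, which is exactly the cross-talk bug described above.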
Handling Async: The AsyncCallbackHandler
Modern AI engineering is asynchronous. If you are building a high-throughput agent, you are likely using await chain.ainvoke().
If you use a standard synchronous callback handler in an async chain, you risk blocking the event loop. For this, LangChain provides AsyncCallbackHandler. The hook names are identical (on_llm_start, on_llm_new_token, and so on); the only difference is that each method is defined with async def.
from langchain.callbacks.base import AsyncCallbackHandler

class AsyncStreamLogger(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called for every token generated (streaming)
        print(token, end="", flush=True)
Use on_llm_new_token for streaming logic. This is how chat interfaces like ChatGPT make the text appear as if it's being typed in real-time. By hooking into this event, you can push tokens into a WebSocket queue immediately.
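The "push tokens into a WebSocket queue" pattern can be sketched with stdlib asyncio alone. In this illustration (all names invented; `fake_llm` stands in for a streaming model call), the async handler drops each token into an asyncio.Queue and a consumer task drains it — in production, the consumer would write to the socket instead of collecting a string.

```python
import asyncio

class QueueStreamHandler:
    """Async handler that forwards each token into a queue."""
    def __init__(self):
        self.queue = asyncio.Queue()

    async def on_llm_new_token(self, token, **kwargs):
        await self.queue.put(token)

async def fake_llm(handler):
    # Stand-in for a streaming LLM call firing the token hook
    for token in ["Lang", "Chain", " call", "backs"]:
        await handler.on_llm_new_token(token)
    await handler.queue.put(None)  # sentinel: stream finished

async def consumer(handler):
    chunks = []
    while (token := await handler.queue.get()) is not None:
        chunks.append(token)  # real code: await websocket.send_text(token)
    return "".join(chunks)

async def main():
    handler = QueueStreamHandler()
    producer = asyncio.create_task(fake_llm(handler))
    text = await consumer(handler)
    await producer
    return text

result = asyncio.run(main())
print(result)  # LangChain callbacks
```

The queue decouples generation speed from delivery speed, which matters when the client connection is slower than the model.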
Real-World Use Case: Cost Tracking
One of the most practical uses of callbacks is managing the "API Bill Shock." You can use the built-in get_openai_callback context manager to track spending for a specific block of code.
from langchain_community.callbacks import get_openai_callback

with get_openai_callback() as cb:
    result = chain.invoke({"topic": "The future of AI"})
    result2 = chain.invoke({"topic": "Rust vs Python"})

print(f"Total Tokens: {cb.total_tokens}")
print(f"Total Cost (USD): ${cb.total_cost}")
Under the hood, this context manager registers a handler scoped to the enclosing context that sums up token usage from the LLM outputs, and it unregisters itself automatically when the context exits.
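If you are not on OpenAI, or want to own the billing logic, the same idea is straightforward to reproduce. Below is a stdlib-only sketch of a token-and-cost accumulator used as a context manager. The per-1K-token prices are made up for illustration, and `record_usage` is an invented method you would call from your handler's on_llm_end.

```python
# Illustrative per-1K-token prices -- NOT real provider pricing
PRICE_PER_1K = {"prompt": 0.01, "completion": 0.03}

class CostTracker:
    """Accumulates token usage across calls; usable as a context manager."""
    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def record_usage(self, usage):
        # In a real handler, call this from on_llm_end with the
        # provider-reported token_usage dict
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens

    @property
    def total_cost(self):
        return (self.prompt_tokens / 1000) * PRICE_PER_1K["prompt"] \
             + (self.completion_tokens / 1000) * PRICE_PER_1K["completion"]

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False  # don't swallow exceptions from the block

with CostTracker() as cb:
    cb.record_usage({"prompt_tokens": 100, "completion_tokens": 200})
    cb.record_usage({"prompt_tokens": 50, "completion_tokens": 150})

print(f"Total Tokens: {cb.total_tokens}")          # 500
print(f"Total Cost (USD): ${cb.total_cost:.4f}")   # $0.0120
```

Summing inside a single tracker per block, rather than a global counter, is what lets you attribute spend to a specific feature or request.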
Conclusion: Observability is a Feature
Building with LangChain without callbacks is like flying a plane with no instrument panel. You might get where you're going, but you won't know how much fuel you used, and if you crash, you won't know why.
Whether you are using the simple StdOutCallbackHandler for a quick debug session or writing custom Async handlers to stream data to a React frontend, mastering this system is essential for moving from "script kiddie" to "AI Engineer."
Start by adding the StdOut handler to your current project. Look at the prompts. You might be surprised by what your application is actually sending to the model.