
LangChain vs. LlamaIndex vs. Vanilla: The Builder's Stack Selection Matrix
A comprehensive decision matrix for AI engineers choosing between LangChain, LlamaIndex, and DIY stacks. Includes architecture diagrams, folder structures, and complexity analysis.
If you hang out in AI engineering Discord servers long enough, you'll see the same question asked ten times a day: "Should I use LangChain or just call the API directly?"
It's the wrong question.
The right question is: Where does the complexity in your application live?
As builders, we often fall into the trap of "Resume Driven Development." We pick tools because they are trending, not because they solve the specific friction point of our architecture. I've built micro-SaaS tools using raw OpenAI calls, and I've built complex agentic workflows using LangGraph. Neither approach is universally correct.
This guide is the decision matrix I use when scoping a new build. No marketing fluff: just trade-offs, architecture patterns, and the code structures that actually work in production.
The Landscape: Three distinct paths
Before we look at code, let's define the three main contenders in the orchestration layer.
1. LangChain / LangGraph (The Generalist)
Best for: Agentic workflows, tool calling, and applications that need to switch context frequently.
LangChain is the "Swiss Army Knife." It provides standard interfaces for everything (memory, prompt templates, vector stores). However, its biggest value add currently is LangGraph. If you are building stateful agents that need loops, conditional branching, and human-in-the-loop checkpoints, building this from scratch is painful. LangGraph solves the "control flow" problem.
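To make the "control flow" point concrete, here is a minimal LangGraph sketch: one node revises a draft, and a conditional edge decides whether to loop again or stop. The node name, state fields, and revision cap are illustrative, not a prescribed pattern.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    draft: str
    revisions: int

def write(state: AgentState) -> dict:
    # Placeholder node: a real agent would call an LLM here.
    return {"draft": state["draft"] + " ...", "revisions": state["revisions"] + 1}

def should_continue(state: AgentState) -> str:
    # Loop until the draft has been revised three times, then stop.
    return "write" if state["revisions"] < 3 else END

graph = StateGraph(AgentState)
graph.add_node("write", write)
graph.set_entry_point("write")
graph.add_conditional_edges("write", should_continue)
app = graph.compile()

print(app.invoke({"draft": "Outline", "revisions": 0}))
```

Loops, branches, and checkpoints are exactly the parts that get ugly when you hand-roll them with `while` loops and ad-hoc state dicts; that is the problem LangGraph is buying you out of.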
2. LlamaIndex (The Data Expert)
Best for: RAG (Retrieval-Augmented Generation), dealing with messy data sources, and structured output.
If your app is essentially "Chat with my PDF/Database/Notion," LlamaIndex is usually the superior choice. Their data ingestion pipeline (LlamaHub) and indexing strategies are far more mature than LangChain's equivalents. They treat data retrieval as a first-class citizen, not an afterthought.
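For the "chat with my documents" case, the quickstart pattern is only a few lines. This sketch assumes a recent llama-index release (the `llama_index.core` import path), a local `./docs` folder, and whatever default embedding/LLM your environment configures; all of those are placeholders.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load everything in ./docs (PDFs, markdown, etc.) into Document objects.
documents = SimpleDirectoryReader("./docs").load_data()

# Chunk, embed, and index with the configured defaults.
index = VectorStoreIndex.from_documents(documents)

# Ask a question over the indexed content.
query_engine = index.as_query_engine()
response = query_engine.query("What is the notice period in the vendor contract?")
print(response)
```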
3. Vanilla / DIY (The Surgeon)
Best for: Single-purpose features, high-performance endpoints, and learning the fundamentals.
"Vanilla" means using the official SDKs (OpenAI, Anthropic, Mistral) directly. This is the path of least resistance for debugging. When you use a framework, you inherit its abstractions. When those abstractions break, you are debugging the framework, not your code. For simple prompts or linear chains, frameworks add unnecessary latency and complexity.
The Decision Matrix: What to build?
I don't make decisions based on "vibes." I use a complexity heuristic. Here is the decision logic I apply to every new project:
The "What Stack?" Flowchart
- Question 1: Does the app require looping, self-correction, or multi-step planning?
- Yes: Use LangGraph (LangChain).
- No: Go to Question 2.
- Question 2: Is the primary value based on retrieving specific data from large, unorganized documents?
- Yes: Use LlamaIndex.
- No: Go to Question 3.
- Question 3: Is this a linear process (Input -> Prompt -> Output) or a simple classification task?
- Yes: Use Vanilla SDKs.
Complexity vs. Control: The Abstraction Tax
Every builder needs to understand the "Abstraction Tax."
- High Abstraction (LangChain/LlamaIndex): You write less code to get started. You can swap LLM providers easily. However, you lose visibility. When an agent loops infinitely, tracing the error through 15 layers of framework code is a nightmare.
- Low Abstraction (Vanilla): You write more boilerplate. You have to handle retries and parsing yourself. But, you know exactly what is being sent to the LLM. You have total control over latency and token usage.
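That boilerplate is real but small. Here is a hedged sketch of the kind of retry wrapper you end up owning yourself when you skip the framework; the backoff policy, exception choices, and model name are illustrative, not a recommendation.

```python
import time
from openai import OpenAI, APIConnectionError, RateLimitError

client = OpenAI()

def complete_with_retry(messages: list[dict], retries: int = 3) -> str:
    """Minimal retry wrapper: the boilerplate a framework would otherwise hide."""
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model
                messages=messages,
            )
            return response.choices[0].message.content
        except (RateLimitError, APIConnectionError):
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
```

Twenty lines of your own code, but every token, retry, and failure mode is yours to inspect.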
Blueprint: Starter Folder Structures
Theory is fine, but let's see how these stacks actually look in a repository. Organizing your code correctly on Day 1 saves you a refactor on Day 7.
Scenario A: The Autonomous Agent (LangGraph)
Use case: A research assistant that searches the web, summarizes findings, and writes a report.
```
/src
  /agents
    /researcher
      graph.py      # Defines the state graph (nodes & edges)
      state.py      # Pydantic models for AgentState
      prompts.py    # System prompts specific to this agent
  /tools
    search.py       # Tavily/Serper integration
    scraper.py      # Web scraping logic
  /checkpoints      # Persistence layer for long-running threads
  main.py           # Entry point invoking the graph
```
Why this works: LangGraph demands you think in "State." By separating the graph logic from the tools, you can modularize the agent's capabilities.
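As a rough sketch, `state.py` might hold nothing more than a model like the one below (the field names are invented for illustration). LangGraph accepts a Pydantic model or a TypedDict as the graph's state schema, and every node reads from and writes to it.

```python
from pydantic import BaseModel, Field

class AgentState(BaseModel):
    """Everything the researcher graph is allowed to remember between nodes."""
    query: str
    search_results: list[str] = Field(default_factory=list)
    summary: str = ""
    report: str = ""

# graph.py would then build on it: StateGraph(AgentState)
```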
Scenario B: The Knowledge Engine (LlamaIndex)
Use case: A legal document Q&A bot over 5,000 PDF contracts.
```
/src
  /ingestion
    loader.py        # LlamaParse or DirectoryReader logic
    pipeline.py      # Node parsing and metadata extraction
  /indices
    vector_store.py  # Pinecone/Qdrant setup
    retriever.py     # Custom retrieval logic (e.g., hybrid search)
  /engine
    chat.py          # The chat engine wrapper with memory
  config.py          # Embedding model settings
```
Why this works: In RAG, 80% of the battle is ingestion. This structure isolates the "ETL" (Extract, Transform, Load) pipeline from the query logic.
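A hedged sketch of what `pipeline.py` might contain, using LlamaIndex's `IngestionPipeline` with a sentence splitter. The folder path and chunk sizes are placeholders; metadata extractors would slot into the same transformations list.

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

# Extract: load the raw PDFs (LlamaParse could replace this for messier scans).
documents = SimpleDirectoryReader("./contracts").load_data()

# Transform: chunk documents into nodes; sizes here are illustrative, not tuned.
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=64)]
)
nodes = pipeline.run(documents=documents)

# Load: hand the nodes to whatever vector store /indices/vector_store.py configures.
print(f"Produced {len(nodes)} nodes ready for indexing")
```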
Scenario C: The Micro-Tool (Vanilla)
Use case: An email generator that takes bullet points and outputs a formatted HTML email.
```
/src
  /services
    llm_client.py       # Wrapper around OpenAI/Anthropic SDK
  /prompts
    email_templates.py  # Raw string templates with f-strings
  /utils
    parser.py           # Regex or JSON extraction logic
  app.py                # FastAPI/Streamlit entry point
```
Why this works: It's flat and transparent. You don't need a `VectorStoreIndex` to write an email. You need a good prompt and a reliable API client.
The Hybrid Approach: The Secret Weapon
Here is the reality senior engineers know: You don't have to pick just one.
My favorite stack right now is the Hybrid Stack:
- LlamaIndex for the data layer (ingestion, chunking, and retrieval).
- LangChain/LangGraph for the orchestration (making decisions based on that data).
- Vanilla for the final generation step where specific formatting is critical.
For example, use LlamaIndex to fetch relevant context from your database. Pass that context into a LangGraph node. Let the graph decide if the context is sufficient. If yes, generate the answer. If no, trigger a web search tool.
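Here is a compressed sketch of that flow, with LlamaIndex as the retriever and LangGraph as the decision layer. The sufficiency check, folder path, and the stubbed web-search and generation nodes are placeholders for whatever your real tools do.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Data layer: LlamaIndex owns ingestion and retrieval.
index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./docs").load_data())
retriever = index.as_retriever(similarity_top_k=4)

class HybridState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: HybridState) -> dict:
    nodes = retriever.retrieve(state["question"])
    return {"context": "\n\n".join(n.get_content() for n in nodes)}

def route(state: HybridState) -> str:
    # Crude sufficiency check for the sketch; a real graph might grade the context with an LLM.
    return "generate" if len(state["context"]) > 500 else "web_search"

def web_search(state: HybridState) -> dict:
    return {"context": state["context"] + "\n\n<results from your search tool>"}

def generate(state: HybridState) -> dict:
    # Final generation step: a plain SDK call (see Scenario C) slots in here.
    return {"answer": f"Answer based on: {state['context'][:80]}..."}

# Orchestration layer: LangGraph decides what happens with the retrieved context.
graph = StateGraph(HybridState)
graph.add_node("retrieve", retrieve)
graph.add_node("web_search", web_search)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_conditional_edges("retrieve", route)
graph.add_edge("web_search", "generate")
graph.add_edge("generate", END)
app = graph.compile()

result = app.invoke({"question": "What is the refund policy?", "context": "", "answer": ""})
```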
Final Verdict: Ship, Don't Stall
The ecosystem moves too fast for you to be a purist.
If you are building a prototype to validate an idea: Use LangChain. It has the most pre-built integrations. You can hack a feature together in an afternoon.
If you are optimizing for production reliability: Audit your code. Replace the "magic" chains with Vanilla code where possible to reduce latency and dependencies.
Pick the stack that matches your complexity, not the hype. Now, go build.