
LangChain vs. LlamaIndex vs. Vanilla: The Builder's Stack Selection Matrix
A comprehensive decision matrix for AI engineers choosing between LangChain, LlamaIndex, and DIY stacks. Includes architecture diagrams, folder structures, and complexity analysis.
If you hang out in AI engineering Discord servers long enough, you'll see the same question asked ten times a day: "Should I use LangChain or just call the API directly?"
It's the wrong question.
The right question is: Where does the complexity in your application live?
As builders, we often fall into the trap of "Resume Driven Development." We pick tools because they are trending, not because they solve the specific friction point of our architecture. I've built micro-SaaS tools using raw OpenAI calls, and I've built complex agentic workflows using LangGraph. Neither approach is universally correct.
This guide is the decision matrix I use when scoping a new build. No marketing fluff: just trade-offs, architecture patterns, and the code structures that actually work in production.
The Landscape: Three distinct paths
Before we look at code, let's define the three main contenders in the orchestration layer.
1. LangChain / LangGraph (The Generalist)
Best for: Agentic workflows, tool calling, and applications that need to switch context frequently.
LangChain is the "Swiss Army Knife." It provides standard interfaces for everything (memory, prompt templates, vector stores). However, its biggest value add currently is LangGraph. If you are building stateful agents that need loops, conditional branching, and human-in-the-loop checkpoints, building this from scratch is painful. LangGraph solves the "control flow" problem.
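To make the "control flow" point concrete, here is a minimal LangGraph sketch: one node revises a draft, and a conditional edge decides whether to loop again or stop. The node name, state fields, and revision cap are illustrative, not a prescribed pattern.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    draft: str
    revisions: int

def write(state: AgentState) -> dict:
    # Placeholder node: a real agent would call an LLM here.
    return {"draft": state["draft"] + " ...", "revisions": state["revisions"] + 1}

def should_continue(state: AgentState) -> str:
    # Loop until the draft has been revised three times, then stop.
    return "write" if state["revisions"] < 3 else END

graph = StateGraph(AgentState)
graph.add_node("write", write)
graph.set_entry_point("write")
graph.add_conditional_edges("write", should_continue)
app = graph.compile()

print(app.invoke({"draft": "Outline", "revisions": 0}))
```

Loops, branches, and checkpoints are exactly the parts that get ugly when you hand-roll them with `while` loops and ad-hoc state dicts; that is the problem LangGraph is buying you out of.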
2. LlamaIndex (The Data Expert)
Best for: RAG (Retrieval-Augmented Generation), dealing with messy data sources, and structured output.
If your app is essentially "Chat with my PDF/Database/Notion," LlamaIndex is usually the superior choice. Their data ingestion pipeline (LlamaHub) and indexing strategies are far more mature than LangChain's equivalents. They treat data retrieval as a first-class citizen, not an afterthought.
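For the "chat with my documents" case, the quickstart pattern is only a few lines. This sketch assumes a recent llama-index release (the `llama_index.core` import path), a local `./docs` folder, and whatever default embedding/LLM your environment configures; all of those are placeholders.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load everything in ./docs (PDFs, markdown, etc.) into Document objects.
documents = SimpleDirectoryReader("./docs").load_data()

# Chunk, embed, and index with the configured defaults.
index = VectorStoreIndex.from_documents(documents)

# Ask a question over the indexed content.
query_engine = index.as_query_engine()
response = query_engine.query("What is the notice period in the vendor contract?")
print(response)
```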
3. Vanilla / DIY (The Surgeon)
Best for: Single-purpose features, high-performance endpoints, and learning the fundamentals.
"Vanilla" means using the official SDKs (OpenAI, Anthropic, Mistral) directly. This is the path of least resistance for debugging. When you use a framework, you inherit its abstractions. When those abstractions break, you are debugging the framework, not your code. For simple prompts or linear chains, frameworks add unnecessary latency and complexity.
The Decision Matrix: What to build?
I don't make decisions based on "vibes." I use a complexity heuristic. Here is the decision logic I apply to every new project:
The "What Stack?" Flowchart
- Question 1: Does the app require looping, self-correction, or multi-step planning?
- Yes: Use LangGraph (LangChain).
- No: Go to Question 2.
- Question 2: Is the primary value based on retrieving specific data from large, unorganized documents?
- Yes: Use LlamaIndex.
- No: Go to Question 3.
- Question 3: Is this a linear process (Input -> Prompt -> Output) or a simple classification task?
- Yes: Use Vanilla SDKs.
Complexity vs. Control: The Abstraction Tax
Every builder needs to understand the "Abstraction Tax."
- High Abstraction (LangChain/LlamaIndex): You write less code to get started. You can swap LLM providers easily. However, you lose visibility. When an agent loops infinitely, tracing the error through 15 layers of framework code is a nightmare.
- Low Abstraction (Vanilla): You write more boilerplate. You have to handle retries and parsing yourself. But, you know exactly what is being sent to the LLM. You have total control over latency and token usage.
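That boilerplate is real but small. Here is a hedged sketch of the kind of retry wrapper you end up owning yourself when you skip the framework; the backoff policy, exception choices, and model name are illustrative, not a recommendation.

```python
import time
from openai import OpenAI, APIConnectionError, RateLimitError

client = OpenAI()

def complete_with_retry(messages: list[dict], retries: int = 3) -> str:
    """Minimal retry wrapper: the boilerplate a framework would otherwise hide."""
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model
                messages=messages,
            )
            return response.choices[0].message.content
        except (RateLimitError, APIConnectionError):
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
```

Twenty lines of your own code, but every token, retry, and failure mode is yours to inspect.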
Blueprint: Starter Folder Structures
Theory is fine, but let's see how these stacks actually look in a repository. Organizing your code correctly on Day 1 saves you a refactor on Day 7.
Scenario A: The Autonomous Agent (LangGraph)
Use case: A research assistant that searches the web, summarizes findings, and writes a report.
```
/src
  /agents
    /researcher
      graph.py      # Defines the state graph (nodes & edges)
      state.py      # Pydantic models for AgentState
      prompts.py    # System prompts specific to this agent
  /tools
    search.py       # Tavily/Serper integration
    scraper.py      # Web scraping logic
  /checkpoints      # Persistence layer for long-running threads
  main.py           # Entry point invoking the graph
```
Why this works: LangGraph demands you think in "State." By separating the graph logic from the tools, you can modularize the agent's capabilities.
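As a rough sketch, `state.py` might hold nothing more than a model like the one below (the field names are invented for illustration). LangGraph accepts a Pydantic model or a TypedDict as the graph's state schema, and every node reads from and writes to it.

```python
from pydantic import BaseModel, Field

class AgentState(BaseModel):
    """Everything the researcher graph is allowed to remember between nodes."""
    query: str
    search_results: list[str] = Field(default_factory=list)
    summary: str = ""
    report: str = ""

# graph.py would then build on it: StateGraph(AgentState)
```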
Scenario B: The Knowledge Engine (LlamaIndex)
Use case: A legal document Q&A bot over 5,000 PDF contracts.
```
/src
  /ingestion
    loader.py        # LlamaParse or DirectoryReader logic
    pipeline.py      # Node parsing and metadata extraction
  /indices
    vector_store.py  # Pinecone/Qdrant setup
    retriever.py     # Custom retrieval logic (e.g., hybrid search)
  /engine
    chat.py          # The chat engine wrapper with memory
  config.py          # Embedding model settings
```
Why this works: In RAG, 80% of the battle is ingestion. This structure isolates the "ETL" (Extract, Transform, Load) pipeline from the query logic.
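A hedged sketch of what `pipeline.py` might contain, using LlamaIndex's `IngestionPipeline` with a sentence splitter. The folder path and chunk sizes are placeholders; metadata extractors would slot into the same transformations list.

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

# Extract: load the raw PDFs (LlamaParse could replace this for messier scans).
documents = SimpleDirectoryReader("./contracts").load_data()

# Transform: chunk documents into nodes; sizes here are illustrative, not tuned.
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=64)]
)
nodes = pipeline.run(documents=documents)

# Load: hand the nodes to whatever vector store /indices/vector_store.py configures.
print(f"Produced {len(nodes)} nodes ready for indexing")
```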
Scenario C: The Micro-Tool (Vanilla)
Use case: An email generator that takes bullet points and outputs a formatted HTML email.
```
/src
  /services
    llm_client.py       # Wrapper around OpenAI/Anthropic SDK
  /prompts
    email_templates.py  # Raw string templates with f-strings
  /utils
    parser.py           # Regex or JSON extraction logic
  app.py                # FastAPI/Streamlit entry point
```
Why this works: It's flat and transparent. You don't need a `VectorStoreIndex` to write an email. You need a good prompt and a reliable API client.
The Hybrid Approach: The Secret Weapon
Here is the reality senior engineers know: You don't have to pick just one.
My favorite stack right now is the Hybrid Stack:
- LlamaIndex for the data layer (ingestion, chunking, and retrieval).
- LangChain/LangGraph for the orchestration (making decisions based on that data).
- Vanilla for the final generation step where specific formatting is critical.
For example, use LlamaIndex to fetch relevant context from your database. Pass that context into a LangGraph node. Let the graph decide if the context is sufficient. If yes, generate the answer. If no, trigger a web search tool.
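Here is a compressed sketch of that flow, with LlamaIndex as the retriever and LangGraph as the decision layer. The sufficiency check, folder path, and the stubbed web-search and generation nodes are placeholders for whatever your real tools do.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Data layer: LlamaIndex owns ingestion and retrieval.
index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./docs").load_data())
retriever = index.as_retriever(similarity_top_k=4)

class HybridState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: HybridState) -> dict:
    nodes = retriever.retrieve(state["question"])
    return {"context": "\n\n".join(n.get_content() for n in nodes)}

def route(state: HybridState) -> str:
    # Crude sufficiency check for the sketch; a real graph might grade the context with an LLM.
    return "generate" if len(state["context"]) > 500 else "web_search"

def web_search(state: HybridState) -> dict:
    return {"context": state["context"] + "\n\n<results from your search tool>"}

def generate(state: HybridState) -> dict:
    # Final generation step: a plain SDK call (see Scenario C) slots in here.
    return {"answer": f"Answer based on: {state['context'][:80]}..."}

# Orchestration layer: LangGraph decides what happens with the retrieved context.
graph = StateGraph(HybridState)
graph.add_node("retrieve", retrieve)
graph.add_node("web_search", web_search)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_conditional_edges("retrieve", route)
graph.add_edge("web_search", "generate")
graph.add_edge("generate", END)
app = graph.compile()

result = app.invoke({"question": "What is the refund policy?", "context": "", "answer": ""})
```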
Final Verdict: Ship, Don't Stall
The ecosystem moves too fast for you to be a purist.
If you are building a prototype to validate an idea: Use LangChain. It has the most pre-built integrations. You can hack a feature together in an afternoon.
If you are optimizing for production reliability: Audit your code. Replace the "magic" chains with Vanilla code where possible to reduce latency and dependencies.
Pick the stack that matches your complexity, not the hype. Now, go build.