LangChain and LangGraph Practical Tutorial - From Basics to Agent Workflows
A hands-on tutorial covering LangChain core concepts (chains, prompts, retrievers), LangGraph state management and agent workflows, RAG implementation, tool integration, multi-agent patterns, and production deployment.
LangChain is one of the most widely adopted frameworks for building LLM-powered applications, and LangGraph extends it with stateful, graph-based agent workflows. This tutorial walks through both frameworks from foundational concepts to production deployment, with code at every step.
1. LangChain Overview
What is LangChain?
LangChain is an open-source framework that simplifies the development of applications powered by large language models. Rather than making raw API calls and manually gluing components together, LangChain provides standardized abstractions for prompts, models, output parsing, retrieval, and chaining.
┌────────────────────────────────────────────────────┐
│ LangChain Ecosystem │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ LangChain │ │ LangGraph │ │ LangSmith │ │
│ │ Core │ │ │ │ │ │
│ │ Chains, │ │ Stateful │ │ Tracing, │ │
│ │ prompts, │ │ agents, │ │ evaluation │ │
│ │ LCEL │ │ cycles │ │ monitoring │ │
│ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│ └───────────────┼───────────────┘ │
│ ┌─────┴──────┐ │
│ │ LangServe │ │
│ │ REST API │ │
│ └────────────┘ │
└────────────────────────────────────────────────────┘
Ecosystem Components
| Component | Purpose | Key Features |
|---|---|---|
| LangChain Core | Foundation library | Chains, prompts, output parsers, LCEL |
| LangGraph | Agent orchestration | StateGraph, cycles, persistence, human-in-the-loop |
| LangSmith | Observability platform | Tracing, evaluation, prompt management |
| LangServe | Deployment | FastAPI-based REST API serving |
| Community | Third-party integrations | 700+ integrations (vector stores, LLMs, tools) |
When to Use LangChain
Good fit: RAG applications, multi-step chains with structured I/O, prototyping LLM workflows, projects needing many integrations, and agent systems requiring tool calling.
Consider alternatives when: you only need simple API calls (use the provider SDK directly), you need maximum performance with minimal overhead, or your application logic does not fit the chain/graph paradigm.
Installation
pip install langchain langchain-core langchain-community
pip install langchain-openai langchain-anthropic
pip install langgraph
pip install langchain-chroma chromadb langchain-text-splitters
pip install langserve fastapi uvicorn
2. Core Concepts
The Runnable Interface
Every component in LangChain implements the Runnable interface with three standard invocation methods.
| Method | Description | Use Case |
|---|---|---|
| invoke(input) | Process a single input synchronously | Simple single request |
| stream(input) | Yield output chunks as generated | Real-time streaming responses |
| batch(inputs) | Process multiple inputs in parallel | Bulk processing |
Each method has an async counterpart: ainvoke, astream, abatch.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o", temperature=0)
# invoke
response = model.invoke("Explain what LangChain is in one sentence.")
# stream
for chunk in model.stream("Explain what LangChain is in one sentence."):
    print(chunk.content, end="", flush=True)
# batch
responses = model.batch(["What is LangChain?", "What is LangGraph?"])
ChatModel
ChatModels are the primary LLM interface. They accept a list of messages and return an AI message.
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage
model = ChatOpenAI(model="gpt-4o", temperature=0)
messages = [
    SystemMessage(content="You are a helpful coding assistant."),
    HumanMessage(content="Write a Python function that checks if a number is prime."),
]
response = model.invoke(messages)
PromptTemplate
PromptTemplates define reusable prompt structures with variable placeholders.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert in {domain}. Answer concisely."),
    ("human", "{question}"),
])
# With conversation history support
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])
OutputParser
OutputParsers transform raw LLM text into structured data.
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is deprecated in recent releases
parser = StrOutputParser() # Most common - extracts string content
class BookRecommendation(BaseModel):
    title: str = Field(description="Book title")
    author: str = Field(description="Author name")
    reason: str = Field(description="Why this book is recommended")
json_parser = JsonOutputParser(pydantic_object=BookRecommendation)
LCEL (LangChain Expression Language)
LCEL is the declarative composition syntax that connects Runnables using the pipe (|) operator.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{question}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()
result = chain.invoke({"question": "What is LCEL?"})
for chunk in chain.stream({"question": "What is LCEL?"}):
    print(chunk, end="", flush=True)
Note: LCEL is not just syntactic sugar. It automatically handles streaming propagation, async support, batch parallelism, and tracing through the entire chain.
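To make the pipe composition less magical, here is a toy sketch of how an `|` operator can chain steps. The `Step` class, `fake_model`, and the other names are hypothetical stand-ins written for this illustration; this is not LangChain's implementation, and it leaves out streaming, async, and tracing.

```python
from typing import Any, Callable, Iterable


class Step:
    """Toy stand-in for a Runnable: wraps one function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def batch(self, values: Iterable[Any]) -> list:
        # This toy version is sequential; it only mirrors the method name.
        return [self.invoke(v) for v in values]

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new Step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


fill_prompt = Step(lambda d: f"Question: {d['question']}")
fake_model = Step(lambda prompt: prompt.upper())  # placeholder for an LLM call
parse = Step(lambda text: text.strip())

toy_chain = fill_prompt | fake_model | parse
print(toy_chain.invoke({"question": "what is lcel?"}))
```

Because every composed object exposes the same methods, `invoke` and `batch` work on the whole chain without extra wiring, which is the property the note above describes.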
3. Chains and LCEL
Building Chains with the Pipe Operator
The | operator creates a sequential pipeline where each component's output becomes the next component's input.
translate_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Translate the following text to {language}."),
        ("human", "{text}"),
    ])
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)
result = translate_chain.invoke({"language": "Korean", "text": "LangChain is great."})
RunnablePassthrough
Passes input through unchanged, optionally adding extra fields.
from langchain_core.runnables import RunnablePassthrough
chain = (
    RunnablePassthrough.assign(word_count=lambda x: len(x["text"].split()))
    | ChatPromptTemplate.from_messages([
        ("system", "Summarize the following text ({word_count} words) in 2 sentences."),
        ("human", "{text}"),
    ])
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)
RunnableLambda
Wraps any Python function as a Runnable for custom logic in chains.
from langchain_core.runnables import RunnableLambda
def preprocess(input_dict: dict) -> dict:
    return {"text": input_dict["text"].strip().lower()}
def postprocess(output: str) -> dict:
    return {"summary": output, "length": len(output)}
chain = (
    RunnableLambda(preprocess)
    | ChatPromptTemplate.from_messages([("human", "Summarize: {text}")])
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
    | RunnableLambda(postprocess)
)
RunnableParallel
Executes multiple chains concurrently and collects outputs into a dictionary.
from langchain_core.runnables import RunnableParallel
model = ChatOpenAI(model="gpt-4o", temperature=0)
analysis_chain = RunnableParallel(
    summary=(
        ChatPromptTemplate.from_messages([("human", "Summarize: {text}")])
        | model | StrOutputParser()
    ),
    keywords=(
        ChatPromptTemplate.from_messages([("human", "Extract 5 keywords from: {text}")])
        | model | StrOutputParser()
    ),
    sentiment=(
        ChatPromptTemplate.from_messages([("human", "Analyze the sentiment of: {text}")])
        | model | StrOutputParser()
    ),
)
result = analysis_chain.invoke({"text": "LangChain makes building LLM apps easy."})
Branching with RunnableBranch
Routes input to different chains based on conditions.
from langchain_core.runnables import RunnableBranch
branch_chain = RunnableBranch(
    (
        lambda x: x["language"] == "technical",
        ChatPromptTemplate.from_messages([
            ("system", "You are a technical expert."), ("human", "{question}"),
        ]) | model | StrOutputParser()
    ),
    (
        lambda x: x["language"] == "simple",
        ChatPromptTemplate.from_messages([
            ("system", "Explain as if talking to a 10-year-old."), ("human", "{question}"),
        ]) | model | StrOutputParser()
    ),
    # Default branch
    ChatPromptTemplate.from_messages([("human", "{question}")]) | model | StrOutputParser()
)
Fallbacks
Define backup chains that run when the primary chain fails.
from langchain_anthropic import ChatAnthropic
primary = ChatOpenAI(model="gpt-4o")
fallback = ChatAnthropic(model="claude-sonnet-4-20250514")
model_with_fallback = primary.with_fallbacks([fallback])
chain = (
    ChatPromptTemplate.from_messages([("human", "{question}")])
    | model_with_fallback
    | StrOutputParser()
)
Note: Fallbacks handle transient API errors, rate limits, and provider outages without disrupting the user experience.
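The fallback behavior can be pictured as "try each option in order until one succeeds". The sketch below shows that control flow in plain Python; `invoke_with_fallbacks`, `flaky_primary`, and `backup` are hypothetical names for this illustration and do not reflect how `with_fallbacks` is implemented internally.

```python
from typing import Callable, Sequence


def invoke_with_fallbacks(question: str, handlers: Sequence[Callable[[str], str]]) -> str:
    """Try each handler in order and return the first successful result."""
    last_error = None
    for handler in handlers:
        try:
            return handler(question)
        except Exception as exc:  # a real setup would catch narrower error types
            last_error = exc
    raise RuntimeError("all handlers failed") from last_error


def flaky_primary(question: str) -> str:
    # Simulates a provider outage on the primary model.
    raise TimeoutError("simulated provider outage")


def backup(question: str) -> str:
    return f"backup answer to: {question}"


answer = invoke_with_fallbacks("What is LCEL?", [flaky_primary, backup])
print(answer)
```

The caller only sees the final answer, so a primary failure stays invisible unless every option fails.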
4. RAG with LangChain
RAG Pipeline Architecture
Indexing: Documents → Text Splitter → Embeddings → Vector Store
Query: User Query → Retriever → Context + Query → LLM → Response
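Before wiring up real components, it may help to see the two phases end to end with nothing but the standard library. This sketch assumes a toy word-count "embedding" and a plain list as the "vector store"; `embed`, `cosine`, and `retrieve` are hypothetical helpers for this illustration, and a real pipeline would use an embedding model and a vector database instead.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real pipeline uses a model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Indexing: documents -> embeddings -> "vector store" (a plain list here)
documents = [
    "LangGraph builds stateful agent workflows as graphs.",
    "Chroma is a vector store for embeddings.",
    "LCEL composes runnables with the pipe operator.",
]
index = [(doc, embed(doc)) for doc in documents]


def retrieve(query: str, k: int = 1) -> list:
    """Query: embed the query, rank stored chunks by similarity, keep the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


question = "Which operator does LCEL use?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"  # this string would go to the LLM
print(prompt)
```

The steps that follow replace each toy piece with a LangChain component: loaders and splitters feed the index, and a retriever plus prompt template build the final LLM input.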
Step 1: Load Documents
from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader, DirectoryLoader
pdf_docs = PyPDFLoader("data/report.pdf").load()
web_docs = WebBaseLoader("https://example.com/article").load()
all_docs = DirectoryLoader("data/docs/", glob="**/*.txt").load()
Step 2: Split Text
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(all_docs)
Step 3: Create Embeddings and Vector Store
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
    collection_name="my_documents",
)
Step 4: Create Retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)
# MMR retriever for diverse results
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.7},
)
Step 5: Build the RAG Chain
from langchain_core.runnables import RunnablePassthrough
def format_docs(docs):
    return "\n\n---\n\n".join(doc.page_content for doc in docs)
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", """Answer based only on the following context.
If the context is insufficient, say so.
Context:
{context}"""),
    ("human", "{question}"),
])
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)
answer = rag_chain.invoke("What are the key findings?")
Complete Working Example
"""Complete RAG pipeline: load, split, embed, store, retrieve, generate."""
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
docs = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings(model="text-embedding-3-small"))
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | ChatPromptTemplate.from_messages([
        ("system", "Answer based on the context below.\n\nContext:\n{context}"),
        ("human", "{question}"),
    ])
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)
print(rag_chain.invoke("What are the key components of an AI agent?"))
Note: For production RAG, consider adding a reranker (e.g., Cohere Rerank) between retrieval and generation, and hybrid search combining BM25 keyword search with semantic search.
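One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The tutorial does not prescribe a fusion method, so treat this as one possible approach; the document ids and both input rankings are made up for the example, and `k=60` is simply a conventional default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists of doc ids; each doc scores the sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical BM25 order
semantic_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical vector-search order

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear high in both lists rise to the top, and RRF needs only ranks, so the two retrievers' raw scores never have to be put on a common scale.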
5. LangGraph Fundamentals
What is LangGraph?
LangGraph is a framework for building stateful, multi-step agent workflows as directed graphs. While LangChain chains are linear pipelines, LangGraph supports cycles, conditional branching, and persistent state -- making it ideal for agent loops.
LangChain Chains vs LangGraph
| Aspect | LangChain Chains (LCEL) | LangGraph |
|---|---|---|
| Execution flow | Acyclic (linear or branching DAG) | Cycles allowed |
| State management | Input/output only | Explicit state object |
| Control flow | Pipe operator, branch | Conditional edges, loops |
| Persistence | None built-in | Built-in checkpointing |
| Human-in-the-loop | Not supported | interrupt_before / interrupt_after |
| Best for | Simple pipelines, RAG | Agents, multi-step reasoning |
StateGraph Core Concepts
┌────────────────────────────────────────┐
│ StateGraph │
│ │
│ ┌────────┐ edge ┌────────┐ │
│ │ Node A │────────→│ Node B │ │
│ └────────┘ └───┬────┘ │
│ conditional │
│ ┌─────┴─────┐ │
│ ▼ ▼ │
│ ┌────────┐ ┌────────┐ │
│ │ Node C │ │ Node D │ │
│ └────────┘ └────────┘ │
│ │
│ State: TypedDict flowing through all │
└────────────────────────────────────────┘
- Nodes: Python functions that receive the current state, perform work, and return state updates.
- Edges: Connections defining execution order.
- Conditional Edges: Routing functions that decide the next node based on state.
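The three concepts above can be sketched as a tiny graph runner in plain Python. This is a toy model for intuition only: `run_graph`, the `END` sentinel, and the node names are hypothetical and are not LangGraph's API, which follows in the next example.

```python
from typing import Callable, Dict, Union

State = dict
Router = Callable[[State], str]
END = "__end__"


def run_graph(nodes: Dict[str, Callable[[State], dict]],
              edges: Dict[str, Union[str, Router]],
              start: str, state: State, max_steps: int = 20) -> State:
    """Walk the graph from `start`, merging each node's update into the state."""
    current = start
    for _ in range(max_steps):
        if current == END:
            return state
        state = {**state, **nodes[current](state)}  # node returns a partial update
        target = edges[current]
        # A plain string is a fixed edge; a callable is a conditional edge.
        current = target(state) if callable(target) else target
    raise RuntimeError("max_steps exceeded")


nodes = {
    "classify": lambda s: {"kind": "math" if any(c.isdigit() for c in s["query"]) else "general"},
    "math": lambda s: {"response": "routed to math"},
    "general": lambda s: {"response": "routed to general"},
}
edges = {
    "classify": lambda s: s["kind"],  # conditional edge: picks the next node from state
    "math": END,
    "general": END,
}

final = run_graph(nodes, edges, "classify", {"query": "What is 2 + 2?"})
print(final["response"])
```

The `max_steps` guard matters once cycles are allowed: a graph that loops needs some bound so that a routing bug cannot run forever.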
Basic LangGraph Example
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from operator import add
class State(TypedDict):
    messages: Annotated[list[str], add] # Accumulate via add reducer
    current_step: str
def step_one(state: State) -> dict:
    return {"messages": ["Step 1 completed"], "current_step": "step_one"}
def step_two(state: State) -> dict:
    return {"messages": ["Step 2 completed"], "current_step": "step_two"}
graph = StateGraph(State)
graph.add_node("step_one", step_one)
graph.add_node("step_two", step_two)
graph.add_edge(START, "step_one")
graph.add_edge("step_one", "step_two")
graph.add_edge("step_two", END)
app = graph.compile()
result = app.invoke({"messages": [], "current_step": ""})
print(result["messages"]) # ["Step 1 completed", "Step 2 completed"]
Conditional Edges
from typing import Literal, TypedDict
from langgraph.graph import StateGraph, START, END
class State(TypedDict):
    query: str
    query_type: str
    response: str
def classify_query(state: State) -> dict:
    query = state["query"].lower()
    if "code" in query:
        return {"query_type": "coding"}
    elif "math" in query:
        return {"query_type": "math"}
    return {"query_type": "general"}
def handle_coding(state: State) -> dict:
    return {"response": f"[Coding Expert] {state['query']}"}
def handle_math(state: State) -> dict:
    return {"response": f"[Math Expert] {state['query']}"}
def handle_general(state: State) -> dict:
    return {"response": f"[General] {state['query']}"}
def route_query(state: State) -> Literal["handle_coding", "handle_math", "handle_general"]:
    return f"handle_{state['query_type']}"
graph = StateGraph(State)
graph.add_node("classify", classify_query)
graph.add_node("handle_coding", handle_coding)
graph.add_node("handle_math", handle_math)
graph.add_node("handle_general", handle_general)
graph.add_edge(START, "classify")
graph.add_conditional_edges("classify", route_query)
graph.add_edge("handle_coding", END)
graph.add_edge("handle_math", END)
graph.add_edge("handle_general", END)
app = graph.compile()
result = app.invoke({"query": "Write a sorting algorithm", "query_type": "", "response": ""})
Note: The Annotated[list, add] pattern is the most common reducer. It ensures every node's output messages are appended rather than replaced -- essential for agent conversation history.
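To see what "appended rather than replaced" means in practice, the sketch below merges node updates by hand. The `reducers` table and `apply_update` are hypothetical helpers that mimic the idea of a reducer; LangGraph derives the same information from the `Annotated` type hints rather than from a separate dictionary.

```python
from operator import add

# Field name -> reducer. Fields without a reducer are overwritten.
reducers = {"messages": add}


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state, honoring reducers."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)  # e.g. list + list
        else:
            merged[key] = value  # plain replacement
    return merged


state = {"messages": ["hello"], "current_step": "start"}
state = apply_update(state, {"messages": ["Step 1 completed"], "current_step": "step_one"})
state = apply_update(state, {"messages": ["Step 2 completed"], "current_step": "step_two"})
print(state)
```

After both updates, `messages` holds all three entries while `current_step` holds only the latest value, which is the difference between a reduced field and a plain one.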
6. Building Agents with LangGraph
The Agent Loop
An agent dynamically chooses its next step based on observations, following the ReAct pattern: Reason, Act, Observe, repeat.
┌────────────────────────────┐
│ Agent Loop │
│ │
│ ┌─────┐ ┌─────────┐ │
│ │ LLM │───→│ Decide │ │
│ └──▲──┘ └────┬────┘ │
│ │ ┌────▼────┐ │
│ │ │ Tool │ │
│ │ │ Call │ │
│ │ └────┬────┘ │
│ │ ┌────▼────┐ │
│ └───────│ Observe │ │
│ └─────────┘ │
│ Exit: LLM returns final │
│ answer without tool call │
└────────────────────────────┘
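The loop in the diagram can be sketched without any LLM by scripting the model's decisions. Everything here is hypothetical scaffolding for illustration: `scripted_model` stands in for a real chat model, and the message dictionaries are a simplified format rather than LangChain's message classes.

```python
from typing import Callable, Dict, List


def multiply(a: int, b: int) -> int:
    return a * b


TOOLS: Dict[str, Callable[..., int]] = {"multiply": multiply}


def scripted_model(messages: List[dict]) -> dict:
    """Stand-in for an LLM: asks for a tool once, then answers from the result."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "assistant", "content": "",
                "tool_call": {"name": "multiply", "args": {"a": 15, "b": 37}}}
    return {"role": "assistant", "content": f"The answer is {tool_results[-1]['content']}."}


def run_agent(question: str, max_turns: int = 5) -> List[dict]:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = scripted_model(messages)  # reason: the model decides what to do
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:  # no tool call means this is the final answer
            return messages
        result = TOOLS[call["name"]](**call["args"])  # act: run the requested tool
        messages.append({"role": "tool", "content": str(result)})  # observe
    raise RuntimeError("agent did not finish within max_turns")


history = run_agent("What is 15 * 37?")
print(history[-1]["content"])  # The answer is 555.
```

The exit condition matches the diagram: the loop ends as soon as the model replies without requesting a tool.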
ReAct Agent with create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
@tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    return f"Search results for '{query}': LangChain is a framework for LLM apps..."
@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    # Demo only: eval() executes arbitrary code, so never use it on untrusted input.
    try:
        return str(eval(expression))
    except Exception as e:
        return f"Error: {e}"
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is 22C and sunny."
agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o", temperature=0),
    tools=[search_web, calculator, get_weather],
    prompt="You are a helpful assistant. Use tools when needed.",
)
result = agent.invoke({
    "messages": [{"role": "user", "content": "What is 15 * 37 and what's the weather in Seoul?"}]
})
for msg in result["messages"]:
    print(f"[{msg.__class__.__name__}] {msg.content}")
Custom Agent with Tool Calling
from typing import TypedDict, Annotated, Literal
from langchain_core.messages import HumanMessage, ToolMessage, BaseMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from operator import add
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add]
@tool
def lookup_database(query: str) -> str:
    """Look up information in the internal database."""
    db = {"langchain": "LangChain is a framework for LLM apps.",
          "langgraph": "LangGraph builds stateful agent workflows."}
    return db.get(query.lower().strip(), f"No results for '{query}'.")
tools = [lookup_database]
tool_map = {t.name: t for t in tools}
model = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools(tools)
def call_model(state: AgentState) -> dict:
    return {"messages": [model.invoke(state["messages"])]}
def call_tools(state: AgentState) -> dict:
    # Run every tool the model requested and return the results as ToolMessages
    last_message = state["messages"][-1]
    results = []
    for tc in last_message.tool_calls:
        result = tool_map[tc["name"]].invoke(tc["args"])
        results.append(ToolMessage(content=str(result), tool_call_id=tc["id"]))
    return {"messages": results}
def should_continue(state: AgentState) -> Literal["call_tools", "end"]:
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "call_tools"
    return "end"
graph = StateGraph(AgentState)
graph.add_node("call_model", call_model)
graph.add_node("call_tools", call_tools)
graph.add_edge(START, "call_model")
graph.add_conditional_edges("call_model", should_continue, {"call_tools": "call_tools", "end": END})
graph.add_edge("call_tools", "call_model") # Loop back
agent = graph.compile()
result = agent.invoke({"messages": [HumanMessage(content="Look up 'langchain' in the database.")]})
Adding Memory with Checkpointing
from langgraph.checkpoint.memory import MemorySaver
memory = MemorySaver()
agent_with_memory = graph.compile(checkpointer=memory)
config = {"configurable": {"thread_id": "user-123"}}
result1 = agent_with_memory.invoke(
    {"messages": [HumanMessage(content="Look up langchain")]}, config=config,
)
# Second turn - agent remembers conversation
result2 = agent_with_memory.invoke(
    {"messages": [HumanMessage(content="Summarize what you found.")]}, config=config,
)
Note: MemorySaver stores state in memory (lost on restart). For production, use SqliteSaver or PostgresSaver for durable persistence.
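Conceptually, a checkpointer is a store keyed by `thread_id` that is read before a turn and written after it. The sketch below illustrates that idea with the standard-library `sqlite3` module; `ToyCheckpointer` and `chat_turn` are hypothetical names, and the real savers store much richer checkpoint data than a single JSON blob.

```python
import json
import sqlite3


class ToyCheckpointer:
    """Stores one JSON state blob per thread_id in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints (thread_id TEXT PRIMARY KEY, state TEXT)"
        )

    def load(self, thread_id: str) -> dict:
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {"messages": []}

    def save(self, thread_id: str, state: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints (thread_id, state) VALUES (?, ?)",
            (thread_id, json.dumps(state)),
        )
        self.conn.commit()


def chat_turn(store: ToyCheckpointer, thread_id: str, user_text: str) -> dict:
    """Load the thread's state, append the new message, and persist it."""
    state = store.load(thread_id)
    state["messages"].append(user_text)
    store.save(thread_id, state)
    return state


store = ToyCheckpointer()  # pass a file path instead of ":memory:" to survive restarts
chat_turn(store, "user-123", "Look up langchain")
state = chat_turn(store, "user-123", "Summarize what you found.")
other = chat_turn(store, "user-456", "Hello")
print(len(state["messages"]), len(other["messages"]))  # 2 1
```

Each thread accumulates its own history, which is why the same `thread_id` must be passed on every turn of a conversation.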
7. Multi-Agent Workflows
Why Multi-Agent?
Complex tasks benefit from decomposing work across multiple specialized agents. Each agent has its own tools, prompts, and expertise. A supervisor delegates tasks and synthesizes results.
┌─────────────────────────────────────┐
│ Supervisor Pattern │
│ │
│ ┌──────────────────┐ │
│ │ Supervisor │ │
│ └──────┬───────────┘ │
│ ┌────┼────┐ │
│ ▼ ▼ ▼ │
│ ┌─────┐┌─────┐┌─────┐ │
│ │Rsrch││Code ││Write│ │
│ │Agent││Agent││Agent│ │
│ └─────┘└─────┘└─────┘ │
└─────────────────────────────────────┘
Supervisor + Worker Implementation
from typing import TypedDict, Annotated, Literal
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, BaseMessage, SystemMessage
from langgraph.graph import StateGraph, START, END
from operator import add
class MultiAgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add]
    next_agent: str
    research_result: str
    code_result: str
    final_answer: str
model = ChatOpenAI(model="gpt-4o", temperature=0)
def research_agent(state: MultiAgentState) -> dict:
    response = model.invoke([
        SystemMessage(content="You are a research specialist. Gather relevant information."),
        HumanMessage(content=f"Research: {state['messages'][-1].content}"),
    ])
    return {"research_result": response.content, "messages": [response]}
def code_agent(state: MultiAgentState) -> dict:
    response = model.invoke([
        SystemMessage(content="You are a coding specialist. Write clean code."),
        HumanMessage(content=f"Based on research:\n{state.get('research_result', '')}\n\nImplement."),
    ])
    return {"code_result": response.content, "messages": [response]}
def writing_agent(state: MultiAgentState) -> dict:
    response = model.invoke([
        SystemMessage(content="You are a technical writer."),
        HumanMessage(content=f"Research:\n{state.get('research_result', '')}\n"
                             f"Code:\n{state.get('code_result', '')}\n\nWrite documentation."),
    ])
    return {"final_answer": response.content, "messages": [response]}
def supervisor(state: MultiAgentState) -> dict:
    response = model.invoke([
        SystemMessage(content="""Decide which agent to call next.
Agents: research, code, writing, FINISH. Respond with only the name."""),
        HumanMessage(content=f"Task: {state['messages'][0].content}\n"
                             f"Research: {'done' if state.get('research_result') else 'pending'}\n"
                             f"Code: {'done' if state.get('code_result') else 'pending'}\n"
                             f"Final: {'done' if state.get('final_answer') else 'pending'}"),
    ])
    return {"next_agent": response.content.strip().lower()}
def route_supervisor(state: MultiAgentState) -> Literal["research", "code", "writing", "end"]:
    n = state.get("next_agent", "")
    if "research" in n: return "research"
    elif "code" in n: return "code"
    elif "writing" in n: return "writing"
    return "end"
graph = StateGraph(MultiAgentState)
graph.add_node("supervisor", supervisor)
graph.add_node("research", research_agent)
graph.add_node("code", code_agent)
graph.add_node("writing", writing_agent)
graph.add_edge(START, "supervisor")
graph.add_conditional_edges("supervisor", route_supervisor,
    {"research": "research", "code": "code", "writing": "writing", "end": END})
graph.add_edge("research", "supervisor")
graph.add_edge("code", "supervisor")
graph.add_edge("writing", "supervisor")
multi_agent = graph.compile()
result = multi_agent.invoke({
    "messages": [HumanMessage(content="Create a Python rate limiter with token bucket")],
    "next_agent": "", "research_result": "", "code_result": "", "final_answer": "",
})
Sequential Pipeline
For fixed execution order, use a simple sequential pattern.
graph = StateGraph(MultiAgentState)
graph.add_node("research", research_agent)
graph.add_node("code", code_agent)
graph.add_node("writing", writing_agent)
graph.add_edge(START, "research")
graph.add_edge("research", "code")
graph.add_edge("code", "writing")
graph.add_edge("writing", END)
pipeline = graph.compile()
Parallel Execution
When agents are independent, run them concurrently.
from langchain_core.runnables import RunnableParallel
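# web_research_chain, database_research_chain, and api_research_chain are
# placeholders for independent chains that you define elsewhere.
parallel_research = RunnableParallel(
    web=web_research_chain,
    db=database_research_chain,
    api=api_research_chain,
)
results = parallel_research.invoke({"query": "LangChain best practices"})
Human-in-the-Loop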
LangGraph supports human intervention using interrupt_before or interrupt_after.
from langgraph.checkpoint.memory import MemorySaver
# Here `graph` is the tool-calling agent graph from section 6 (it has a "call_tools" node)
agent = graph.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["call_tools"], # Pause before tool execution
)
config = {"configurable": {"thread_id": "review-123"}}
# First invocation - pauses before tools
result = agent.invoke(
    {"messages": [HumanMessage(content="Delete all old records")]}, config=config,
)
# Inspect planned action
print(f"Agent wants to call: {result['messages'][-1].tool_calls}")
# Approve and resume
result = agent.invoke(None, config=config)
# Or modify before resuming
agent.update_state(config, {"messages": [HumanMessage(content="Only delete records > 1 year old")]})
result = agent.invoke(None, config=config)
Note: Human-in-the-loop is critical for high-stakes operations like database modifications, financial transactions, or sending communications.
8. LangSmith Observability
What is LangSmith?
LangSmith is LangChain's observability platform for tracing, evaluating, and monitoring LLM applications. It provides visibility into every step -- prompts sent, LLM responses, latency, and cost.
Setup
pip install langsmith
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="ls__your_api_key_here"
export LANGCHAIN_PROJECT="my-langchain-project"
Note: Once environment variables are set, all LangChain and LangGraph operations are automatically traced with no code changes needed.
Tracing
# Automatic tracing - just invoke your chain
result = chain.invoke(
    {"question": "What is LangSmith?"},
    config={"run_name": "langsmith-demo"}, # Custom name for identification
)
Each trace shows full input/output at every step, latency breakdown, token usage, cost estimates, and error details.
Custom Tracing with Decorators
from langsmith import traceable
@traceable(name="my-rag-pipeline")
def rag_query(question: str) -> str:
    docs = retriever.invoke(question)
    context = "\n".join(doc.page_content for doc in docs)
    return chain.invoke({"context": context, "question": question})
Evaluation
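from langsmith import Client
from langsmith.evaluation import LangChainStringEvaluator, evaluate
client = Client()
dataset = client.create_dataset("qa-eval-dataset")
client.create_examples(
    inputs=[{"question": "What is LangChain?"}, {"question": "What is LangGraph?"}],
    outputs=[
        {"answer": "A framework for building LLM applications."},
        {"answer": "A library for stateful agent workflows."},
    ],
    dataset_id=dataset.id,
)
def predict(inputs: dict) -> dict:
    return {"answer": chain.invoke(inputs)}
# evaluate() takes evaluator objects or callables rather than bare strings;
# LangChainStringEvaluator wraps LangChain's built-in "qa" and criteria evaluators.
evaluators = [
    LangChainStringEvaluator("qa"),
    LangChainStringEvaluator("criteria", config={"criteria": "relevance"}),
]
results = evaluate(predict, data=dataset.name, evaluators=evaluators,
                   experiment_prefix="rag-v1")
Prompt Hub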
from langchain import hub
prompt = hub.pull("rlm/rag-prompt") # Pull shared prompt
hub.push("my-org/my-prompt", prompt, new_repo_is_public=False) # Push your own
9. Production Deployment
LangServe (FastAPI)
LangServe exposes any LangChain Runnable as a REST API with automatic OpenAPI docs and streaming.
# server.py
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
chain = (
    ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{question}"),
    ])
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)
app = FastAPI(title="LangChain API", version="1.0")
add_routes(app, chain, path="/chat")
# Run: uvicorn server:app --host 0.0.0.0 --port 8000
# Auto-generated endpoints:
# POST /chat/invoke - Single invocation
# POST /chat/batch - Batch invocation
# POST /chat/stream - Streaming
# GET /chat/playground - Interactive UI
Client:
from langserve import RemoteRunnable
chain = RemoteRunnable("http://localhost:8000/chat")
result = chain.invoke({"question": "What is LangChain?"})
Error Handling
import logging
from langchain_core.messages import AIMessage
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
logger = logging.getLogger(__name__)
# Retry with backoff
model = ChatOpenAI(model="gpt-4o", max_retries=3, request_timeout=30)
# Fallback models
resilient_model = ChatOpenAI(model="gpt-4o").with_fallbacks([
    ChatAnthropic(model="claude-sonnet-4-20250514")
])
# Error handling in LangGraph nodes
def safe_node(state: dict) -> dict:
    try:
        result = model.invoke(state["messages"])
        return {"messages": [result]}
    except Exception as e:
        logger.error(f"Node failed: {e}")
        return {"messages": [AIMessage(content=f"Error: {e}. Please try again.")]}
Streaming
from fastapi.responses import StreamingResponse
import json
@app.post("/stream")
async def stream_response(request: dict):
    async def generate():
        async for chunk in chain.astream(request):
            yield f"data: {json.dumps({'content': chunk})}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(generate(), media_type="text/event-stream")
# LangGraph event streaming (async for must run inside a coroutine)
async def stream_agent_events():
    async for event in agent.astream_events(
        {"messages": [HumanMessage(content="Search for LangChain")]}, version="v2",
    ):
        if event["event"] == "on_chat_model_stream":
            print(event["data"]["chunk"].content, end="")
Caching
from langchain_core.globals import set_llm_cache
from langchain_community.cache import InMemoryCache, SQLiteCache
from langchain_openai import OpenAIEmbeddings
# Pick one cache; each set_llm_cache call replaces the previous one
set_llm_cache(InMemoryCache()) # Fast, in-memory
set_llm_cache(SQLiteCache(database_path=".langchain.db")) # Persistent
# Semantic cache for similar queries
from langchain_community.cache import RedisSemanticCache
set_llm_cache(RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings(),
    score_threshold=0.95,
))
Rate Limiting
import asyncio
from collections import deque
from time import time
class RateLimiter:
    """Sliding-window limiter: at most max_calls per time_window seconds."""
    def __init__(self, max_calls: int, time_window: float = 60.0):
        self.max_calls = max_calls
        self.time_window = time_window
        self.calls = deque()
        self._lock = asyncio.Lock() # Serialize waiters so concurrent callers cannot overshoot
    async def acquire(self):
        async with self._lock:
            while True:
                now = time()
                # Drop timestamps that have left the window
                while self.calls and self.calls[0] <= now - self.time_window:
                    self.calls.popleft()
                if len(self.calls) < self.max_calls:
                    self.calls.append(now)
                    return
                # Wait until the oldest call expires, then re-check
                await asyncio.sleep(self.calls[0] + self.time_window - now)
rate_limiter = RateLimiter(max_calls=50, time_window=60)
async def rate_limited_invoke(chain, input_data):
    await rate_limiter.acquire()
    return await chain.ainvoke(input_data)
Monitoring Best Practices
| Area | Tool / Approach | What to Track |
|---|---|---|
| Tracing | LangSmith | Every invocation, latency, tokens |
| Metrics | Prometheus + Grafana | Request rate, error rate, p50/p95 latency |
| Logging | Structured JSON logs | Input/output summaries, errors, tool calls |
| Alerts | PagerDuty / Opsgenie | Error rate spikes, latency, cost anomalies |
| Cost | LangSmith / custom | Token usage per chain, daily cost trends |
| Quality | LangSmith evaluations | Correctness, relevance, hallucination rate |
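As one way to implement the "Structured JSON logs" row, the sketch below emits one JSON object per call using only the standard library. `JsonFormatter`, `log_call`, and the `fields` attribute are hypothetical names chosen for this example; a real service would typically write to stdout or a log shipper rather than an in-memory buffer.

```python
import io
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # `fields` is a hypothetical extra attribute set by log_call below
            **getattr(record, "fields", {}),
        }
        return json.dumps(payload)


stream = io.StringIO()  # swap for sys.stdout or a file handler in a service
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("llm_app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def log_call(chain_name: str, func, *args):
    """Run func, then log its latency and outcome as structured fields."""
    start = time.perf_counter()
    status = "error"
    try:
        result = func(*args)
        status = "ok"
        return result
    finally:
        latency_ms = round((time.perf_counter() - start) * 1000, 2)
        logger.info("chain_call", extra={"fields": {
            "chain": chain_name, "status": status, "latency_ms": latency_ms}})


log_call("demo", lambda text: text.upper(), "hello")
print(stream.getvalue().strip())
```

Logging in the `finally` clause means failed calls are recorded with `status` set to `error`, so error-rate metrics can be derived from the same log lines.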
Deployment Checklist
- Environment variables: API keys in secrets manager, not in code
- Rate limiting: Configured within provider quotas
- Fallbacks: Backup model for critical chains
- Caching: Enabled for repeated queries
- Streaming: Implemented for user-facing endpoints
- Error handling: Graceful degradation on API failures
- Timeouts: Set at HTTP and LLM client levels
- Tracing: LangSmith or equivalent enabled
- Evaluation: Baseline metrics established
- Cost monitoring: Alerts for unexpected token usage
- Input validation: Sanitized before reaching prompts
- Prompt injection defense: System prompts hardened
References
- LangChain Documentation: https://python.langchain.com/docs/
- LangGraph Documentation: https://langchain-ai.github.io/langgraph/
- LangSmith Documentation: https://docs.smith.langchain.com/
- LangServe Documentation: https://python.langchain.com/docs/langserve/
- LangChain GitHub: https://github.com/langchain-ai/langchain
- LangGraph GitHub: https://github.com/langchain-ai/langgraph
- LCEL Conceptual Guide: https://python.langchain.com/docs/concepts/lcel/
- LangChain RAG Tutorial: https://python.langchain.com/docs/tutorials/rag/
- LangGraph Agent Tutorial: https://langchain-ai.github.io/langgraph/tutorials/introduction/
— Data Dynamics Engineering Team