
LangChain and LangGraph Practical Tutorial - From Basics to Agent Workflows

A hands-on tutorial covering LangChain core concepts (chains, prompts, retrievers), LangGraph state management and agent workflows, RAG implementation, tool integration, multi-agent patterns, and production deployment.

Data Dynamics · April 16, 2026 · 18 min read

LangChain is one of the most widely adopted frameworks for building LLM-powered applications, and LangGraph extends it with stateful, graph-based agent workflows. This tutorial walks through both frameworks from foundational concepts to production deployment, with working code at every step.

1. LangChain Overview

What is LangChain?

LangChain is an open-source framework that simplifies the development of applications powered by large language models. Rather than making raw API calls and manually gluing components together, LangChain provides standardized abstractions for prompts, models, output parsing, retrieval, and chaining.


Ecosystem Components

Component	Purpose	Key Features
LangChain Core	Foundation library	Chains, prompts, output parsers, LCEL
LangGraph	Agent orchestration	StateGraph, cycles, persistence, human-in-the-loop
LangSmith	Observability platform	Tracing, evaluation, prompt management
LangServe	Deployment	FastAPI-based REST API serving
Community	Third-party integrations	700+ integrations (vector stores, LLMs, tools)

When to Use LangChain

Good fit: RAG applications, multi-step chains with structured I/O, prototyping LLM workflows, projects needing many integrations, and agent systems requiring tool calling.

Consider alternatives when: you only need simple API calls (use the provider SDK directly), you need maximum performance with minimal overhead, or your application logic does not fit the chain/graph paradigm.

Installation

pip install langchain langchain-core langchain-community
pip install langchain-openai langchain-anthropic
pip install langgraph
pip install langchain-chroma chromadb langchain-text-splitters
pip install langserve fastapi uvicorn

2. Core Concepts

The Runnable Interface

Every component in LangChain implements the Runnable interface with three standard invocation methods.

Method	Description	Use Case
`invoke(input)`	Process a single input synchronously	Simple single request
`stream(input)`	Yield output chunks as generated	Real-time streaming responses
`batch(inputs)`	Process multiple inputs in parallel	Bulk processing

Each method has an async counterpart: ainvoke, astream, abatch.

from langchain_openai import ChatOpenAI
 
model = ChatOpenAI(model="gpt-4o", temperature=0)
 
# invoke
response = model.invoke("Explain what LangChain is in one sentence.")
 
# stream
for chunk in model.stream("Explain what LangChain is in one sentence."):
    print(chunk.content, end="", flush=True)
 
# batch
responses = model.batch(["What is LangChain?", "What is LangGraph?"])

ChatModel

ChatModels are the primary LLM interface. They accept a list of messages and return an AI message.

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage
 
model = ChatOpenAI(model="gpt-4o", temperature=0)
 
messages = [
    SystemMessage(content="You are a helpful coding assistant."),
    HumanMessage(content="Write a Python function that checks if a number is prime."),
]
response = model.invoke(messages)

PromptTemplate

PromptTemplates define reusable prompt structures with variable placeholders.

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
 
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert in {domain}. Answer concisely."),
    ("human", "{question}"),
])
 
# With conversation history support
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

OutputParser

OutputParsers transform raw LLM text into structured data.

from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is a deprecated shim
 
parser = StrOutputParser()  # Most common - extracts string content
 
class BookRecommendation(BaseModel):
    title: str = Field(description="Book title")
    author: str = Field(description="Author name")
    reason: str = Field(description="Why this book is recommended")
 
json_parser = JsonOutputParser(pydantic_object=BookRecommendation)

LCEL (LangChain Expression Language)

LCEL is the declarative composition syntax that connects Runnables using the pipe (|) operator.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
 
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{question}"),
])
 
chain = prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()
 
result = chain.invoke({"question": "What is LCEL?"})
 
for chunk in chain.stream({"question": "What is LCEL?"}):
    print(chunk, end="", flush=True)

Note: LCEL is not just syntactic sugar. It automatically handles streaming propagation, async support, batch parallelism, and tracing through the entire chain.
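
To make the pipe operator less magical, here is a minimal pure-Python sketch of how `|` can compose steps through `__or__`. The `Step` class and the three stand-in stages are hypothetical and are not LangChain's implementation; real Runnables add streaming, async, batching, and tracing on top of this idea.

```python
class Step:
    """Toy stand-in for a Runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # `a | b` builds a new Step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages mirroring prompt | model | parser
fill_prompt = Step(lambda d: f"Question: {d['question']}")
fake_model = Step(lambda text: {"content": text.upper()})  # pretend LLM call
parse = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_model | parse
toy_result = toy_chain.invoke({"question": "What is LCEL?"})
print(toy_result)
```

Because every stage exposes the same `invoke` method, any stage can be swapped out without touching the others, which is the property the Runnable interface provides.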

3. Chains and LCEL

Building Chains with the Pipe Operator

The | operator creates a sequential pipeline where each component's output becomes the next component's input.

translate_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Translate the following text to {language}."),
        ("human", "{text}"),
    ])
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)
 
result = translate_chain.invoke({"language": "Korean", "text": "LangChain is great."})

RunnablePassthrough

Passes input through unchanged, optionally adding extra fields.

from langchain_core.runnables import RunnablePassthrough
 
chain = (
    RunnablePassthrough.assign(word_count=lambda x: len(x["text"].split()))
    | ChatPromptTemplate.from_messages([
        ("system", "Summarize the following text ({word_count} words) in 2 sentences."),
        ("human", "{text}"),
    ])
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

RunnableLambda

Wraps any Python function as a Runnable for custom logic in chains.

from langchain_core.runnables import RunnableLambda
 
def preprocess(input_dict: dict) -> dict:
    return {"text": input_dict["text"].strip().lower()}
 
def postprocess(output: str) -> dict:
    return {"summary": output, "length": len(output)}
 
chain = (
    RunnableLambda(preprocess)
    | ChatPromptTemplate.from_messages([("human", "Summarize: {text}")])
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
    | RunnableLambda(postprocess)
)

RunnableParallel

Executes multiple chains concurrently and collects outputs into a dictionary.

from langchain_core.runnables import RunnableParallel
 
model = ChatOpenAI(model="gpt-4o", temperature=0)
 
analysis_chain = RunnableParallel(
    summary=(
        ChatPromptTemplate.from_messages([("human", "Summarize: {text}")])
        | model | StrOutputParser()
    ),
    keywords=(
        ChatPromptTemplate.from_messages([("human", "Extract 5 keywords from: {text}")])
        | model | StrOutputParser()
    ),
    sentiment=(
        ChatPromptTemplate.from_messages([("human", "Analyze the sentiment of: {text}")])
        | model | StrOutputParser()
    ),
)
 
result = analysis_chain.invoke({"text": "LangChain makes building LLM apps easy."})
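
The fan-out behavior can be sketched without LangChain: give every branch the same input, run the branches concurrently, and collect the results under the branch names. The `run_parallel` helper and the two branch functions below are illustrative assumptions, not the library's code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(branches: dict, payload):
    """Run every branch on the same input and collect results by key."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(fn, payload) for name, fn in branches.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for the summary / keywords / sentiment chains
branches = {
    "word_count": lambda x: len(x["text"].split()),
    "shouted": lambda x: x["text"].upper(),
}
parallel_out = run_parallel(branches, {"text": "LangChain makes building LLM apps easy."})
print(parallel_out)
```

Threads suit this pattern because LLM calls are I/O-bound: the total latency is roughly that of the slowest branch rather than the sum of all branches.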

Branching with RunnableBranch

Routes input to different chains based on conditions.

from langchain_core.runnables import RunnableBranch
 
branch_chain = RunnableBranch(
    (
        lambda x: x["language"] == "technical",
        ChatPromptTemplate.from_messages([
            ("system", "You are a technical expert."), ("human", "{question}"),
        ]) | model | StrOutputParser()
    ),
    (
        lambda x: x["language"] == "simple",
        ChatPromptTemplate.from_messages([
            ("system", "Explain as if talking to a 10-year-old."), ("human", "{question}"),
        ]) | model | StrOutputParser()
    ),
    # Default branch
    ChatPromptTemplate.from_messages([("human", "{question}")]) | model | StrOutputParser()
)

Fallbacks

Define backup chains that run when the primary chain fails.

primary = ChatOpenAI(model="gpt-4o")
fallback = ChatAnthropic(model="claude-sonnet-4-20250514")
model_with_fallback = primary.with_fallbacks([fallback])
 
chain = (
    ChatPromptTemplate.from_messages([("human", "{question}")])
    | model_with_fallback
    | StrOutputParser()
)

Note: Fallbacks handle transient API errors, rate limits, and provider outages without disrupting the user experience.
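
Conceptually, a fallback wrapper tries each candidate in order and returns the first success. The sketch below is a simplified illustration under that assumption; `with_fallbacks`, `flaky_primary`, and `steady_backup` are hypothetical names, not the LangChain implementation.

```python
def with_fallbacks(primary, fallbacks):
    """Return a callable that tries `primary`, then each fallback in order."""

    def call(prompt):
        first_error = None
        for candidate in [primary, *fallbacks]:
            try:
                return candidate(prompt)
            except Exception as exc:  # any failure moves on to the next candidate
                if first_error is None:
                    first_error = exc
        raise first_error  # every candidate failed: surface the original error

    return call


def flaky_primary(prompt):
    raise TimeoutError("primary provider timed out")


def steady_backup(prompt):
    return f"backup answer to: {prompt}"


resilient = with_fallbacks(flaky_primary, [steady_backup])
fallback_result = resilient("What is LCEL?")
print(fallback_result)
```

In a real deployment you would usually narrow the caught exception types so that programming errors are not silently retried on another provider.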

4. RAG with LangChain

RAG Pipeline Architecture


Step 1: Load Documents

from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader, DirectoryLoader
 
pdf_docs = PyPDFLoader("data/report.pdf").load()
web_docs = WebBaseLoader("https://example.com/article").load()
all_docs = DirectoryLoader("data/docs/", glob="**/*.txt").load()

Step 2: Split Text

from langchain_text_splitters import RecursiveCharacterTextSplitter
 
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(all_docs)
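
To see what `chunk_size` and `chunk_overlap` mean, consider a plain sliding window over characters. This is a simplified sketch: `sliding_chunks` is a hypothetical helper, and RecursiveCharacterTextSplitter additionally tries to cut at the listed separators, so its real chunks will differ.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces, start = [], 0
    while True:
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
        start += step
    return pieces


demo_chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.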

Step 3: Create Embeddings and Vector Store

from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
 
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
 
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
    collection_name="my_documents",
)

Step 4: Create Retriever

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)
 
# MMR retriever for diverse results
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.7},
)
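
The MMR parameters are easier to reason about with a small worked example. The sketch below implements maximal marginal relevance over toy 2-D vectors: `lambda_mult=1.0` ranks purely by relevance, while lower values penalize candidates that resemble documents already selected. The vectors and helper names are made up for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query_vec, doc_vecs, k, lambda_mult):
    """Return indices of k documents chosen by maximal marginal relevance."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet)
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.10], [1.0, 0.12], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates

relevance_only = mmr(query, docs, k=2, lambda_mult=1.0)
diverse = mmr(query, docs, k=2, lambda_mult=0.3)
print(relevance_only, diverse)
```

With pure relevance the two near-duplicates are returned together; with the lower `lambda_mult` the second slot goes to the less similar document. In the retriever, `fetch_k` plays the role of the candidate pool size before this selection runs.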

Step 5: Build the RAG Chain

from langchain_core.runnables import RunnablePassthrough
 
def format_docs(docs):
    return "\n\n---\n\n".join(doc.page_content for doc in docs)
 
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", """Answer based only on the following context.
If the context is insufficient, say so.
 
Context:
{context}"""),
    ("human", "{question}"),
])
 
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)
 
answer = rag_chain.invoke("What are the key findings?")

Complete Working Example

"""Complete RAG pipeline: load, split, embed, store, retrieve, generate."""
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
 
docs = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings(model="text-embedding-3-small"))
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
 
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
 
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | ChatPromptTemplate.from_messages([
        ("system", "Answer based on the context below.\n\nContext:\n{context}"),
        ("human", "{question}"),
    ])
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)
 
print(rag_chain.invoke("What are the key components of an AI agent?"))

Note: For production RAG, consider adding a reranker (e.g., Cohere Rerank) between retrieval and generation, and hybrid search combining BM25 keyword search with semantic search.
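
One common way to combine keyword and semantic results is reciprocal rank fusion (RRF). RRF is not named in this tutorial; it is shown here only as one possible fusion strategy, with hypothetical document IDs and the conventional constant `k=60`.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly in any list receive a larger contribution
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from a BM25 search
semantic_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. from a vector search

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Because RRF uses only ranks, it sidesteps the problem that BM25 scores and cosine similarities live on different scales. A reranker would then reorder the top of the fused list before generation.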

5. LangGraph Fundamentals

What is LangGraph?

LangGraph is a framework for building stateful, multi-step agent workflows as directed graphs. LCEL chains are acyclic: data flows one way through the pipeline, even when it branches or fans out. LangGraph adds cycles, conditional branching, and persistent state, which makes it well suited to agent loops.

LangChain Chains vs LangGraph

Aspect	LangChain Chains (LCEL)	LangGraph
Execution flow	Linear (DAG)	Cycles allowed
State management	Input/output only	Explicit state object
Control flow	Pipe operator, branch	Conditional edges, loops
Persistence	None built-in	Built-in checkpointing
Human-in-the-loop	Not supported	`interrupt_before` / `interrupt_after`
Best for	Simple pipelines, RAG	Agents, multi-step reasoning

StateGraph Core Concepts


Nodes: Python functions that receive the current state, perform work, and return state updates.
Edges: Connections defining execution order.
Conditional Edges: Routing functions that decide the next node based on state.

Basic LangGraph Example

from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from operator import add
 
class State(TypedDict):
    messages: Annotated[list[str], add]  # Accumulate via add reducer
    current_step: str
 
def step_one(state: State) -> dict:
    return {"messages": ["Step 1 completed"], "current_step": "step_one"}
 
def step_two(state: State) -> dict:
    return {"messages": ["Step 2 completed"], "current_step": "step_two"}
 
graph = StateGraph(State)
graph.add_node("step_one", step_one)
graph.add_node("step_two", step_two)
graph.add_edge(START, "step_one")
graph.add_edge("step_one", "step_two")
graph.add_edge("step_two", END)
 
app = graph.compile()
result = app.invoke({"messages": [], "current_step": ""})
print(result["messages"])  # ["Step 1 completed", "Step 2 completed"]

Conditional Edges

from typing import Literal
 
class State(TypedDict):
    query: str
    query_type: str
    response: str
 
def classify_query(state: State) -> dict:
    query = state["query"].lower()
    if "code" in query:
        return {"query_type": "coding"}
    elif "math" in query:
        return {"query_type": "math"}
    return {"query_type": "general"}
 
def handle_coding(state: State) -> dict:
    return {"response": f"[Coding Expert] {state['query']}"}
 
def handle_math(state: State) -> dict:
    return {"response": f"[Math Expert] {state['query']}"}
 
def handle_general(state: State) -> dict:
    return {"response": f"[General] {state['query']}"}
 
def route_query(state: State) -> Literal["handle_coding", "handle_math", "handle_general"]:
    return f"handle_{state['query_type']}"
 
graph = StateGraph(State)
graph.add_node("classify", classify_query)
graph.add_node("handle_coding", handle_coding)
graph.add_node("handle_math", handle_math)
graph.add_node("handle_general", handle_general)
 
graph.add_edge(START, "classify")
graph.add_conditional_edges("classify", route_query)
graph.add_edge("handle_coding", END)
graph.add_edge("handle_math", END)
graph.add_edge("handle_general", END)
 
app = graph.compile()
result = app.invoke({"query": "Write a sorting algorithm", "query_type": "", "response": ""})

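Note: Annotated[list, add], used in the basic example above, is a common reducer pattern. Each node's returned list is concatenated onto the existing value instead of replacing it, which is what lets an agent accumulate conversation history. Keys without a reducer, such as query_type in the conditional example, are simply overwritten. For message lists, LangGraph also provides the add_messages reducer (langgraph.graph.message), which additionally updates an existing message when IDs match.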
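
The reducer rule can be sketched in a few lines of plain Python: when a node returns an update, keys with a reducer are merged and all other keys are overwritten. `apply_update` is a hypothetical helper that mimics this behavior; it is not LangGraph's internal code.

```python
from operator import add


def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge a node's partial update into the state, honoring per-key reducers."""
    new_state = dict(state)
    for key, value in update.items():
        if key in reducers:
            new_state[key] = reducers[key](state[key], value)  # e.g. list + list
        else:
            new_state[key] = value  # no reducer: last write wins
    return new_state


reducers = {"messages": add}
toy_state = {"messages": ["hello"], "current_step": ""}
toy_state = apply_update(toy_state, {"messages": ["Step 1 completed"], "current_step": "step_one"}, reducers)
toy_state = apply_update(toy_state, {"messages": ["Step 2 completed"], "current_step": "step_two"}, reducers)
print(toy_state)
```

Note that nodes return only the keys they change; the merge step is what keeps the rest of the state intact.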

6. Building Agents with LangGraph

The Agent Loop

An agent dynamically chooses its next step based on observations, following the ReAct pattern: Reason, Act, Observe, repeat.


ReAct Agent with create_react_agent

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
 
@tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    return f"Search results for '{query}': LangChain is a framework for LLM apps..."
 
@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
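        # Demo only: eval() executes arbitrary Python, so never use it on
        # model- or user-supplied text in production.
        return str(eval(expression))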
    except Exception as e:
        return f"Error: {e}"
 
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is 22C and sunny."
 
agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o", temperature=0),
    tools=[search_web, calculator, get_weather],
    prompt="You are a helpful assistant. Use tools when needed.",
)
 
result = agent.invoke({
    "messages": [{"role": "user", "content": "What is 15 * 37 and what's the weather in Seoul?"}]
})
 
for msg in result["messages"]:
    print(f"[{msg.__class__.__name__}] {msg.content}")
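
The `calculator` tool above relies on `eval()`, which is risky when the expression comes from a model. One possible safer approach, sketched here as an assumption rather than a vetted sandbox, is to parse the expression with `ast` and allow only numeric literals and basic arithmetic. `safe_eval` is a hypothetical helper; exponentiation is deliberately left out to avoid huge computations.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str):
    """Evaluate +, -, *, /, % on numeric literals; reject everything else."""

    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval").body)


product = safe_eval("15 * 37")
print(product)
```

Names, attribute access, and function calls all fall through to the `ValueError`, so an input such as `__import__('os')` is rejected instead of executed.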

Custom Agent with Tool Calling

from typing import TypedDict, Annotated, Literal
from operator import add

from langchain_core.messages import HumanMessage, ToolMessage, BaseMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
 
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add]
 
@tool
def lookup_database(query: str) -> str:
    """Look up information in the internal database."""
    db = {"langchain": "LangChain is a framework for LLM apps.",
          "langgraph": "LangGraph builds stateful agent workflows."}
    return db.get(query.lower().strip(), f"No results for '{query}'.")
 
tools = [lookup_database]
tool_map = {t.name: t for t in tools}
model = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools(tools)
 
def call_model(state: AgentState) -> dict:
    return {"messages": [model.invoke(state["messages"])]}
 
def call_tools(state: AgentState) -> dict:
    last_message = state["messages"][-1]
    results = []
    for tc in last_message.tool_calls:
        result = tool_map[tc["name"]].invoke(tc["args"])
        results.append(ToolMessage(content=str(result), tool_call_id=tc["id"]))
    return {"messages": results}
 
def should_continue(state: AgentState) -> Literal["call_tools", "end"]:
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "call_tools"
    return "end"
 
graph = StateGraph(AgentState)
graph.add_node("call_model", call_model)
graph.add_node("call_tools", call_tools)
graph.add_edge(START, "call_model")
graph.add_conditional_edges("call_model", should_continue, {"call_tools": "call_tools", "end": END})
graph.add_edge("call_tools", "call_model")  # Loop back
 
agent = graph.compile()
result = agent.invoke({"messages": [HumanMessage(content="Look up 'langchain' in the database.")]})

Adding Memory with Checkpointing

from langgraph.checkpoint.memory import MemorySaver
 
memory = MemorySaver()
agent_with_memory = graph.compile(checkpointer=memory)
 
config = {"configurable": {"thread_id": "user-123"}}
result1 = agent_with_memory.invoke(
    {"messages": [HumanMessage(content="Look up langchain")]}, config=config,
)
# Second turn - agent remembers conversation
result2 = agent_with_memory.invoke(
    {"messages": [HumanMessage(content="Summarize what you found.")]}, config=config,
)

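Note: MemorySaver keeps state in process memory, so it is lost on restart. For production, use a durable checkpointer such as SqliteSaver or PostgresSaver, which ship in the separate langgraph-checkpoint-sqlite and langgraph-checkpoint-postgres packages.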
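
The role of `thread_id` can be illustrated with a toy store: each thread maps to its own saved history, and every turn loads that history, extends it, and saves it back. `ToyCheckpointer` and `run_turn` are hypothetical and far simpler than LangGraph's checkpointers, which snapshot the full graph state at each step.

```python
class ToyCheckpointer:
    """In-memory store mapping thread_id -> saved message list."""

    def __init__(self):
        self._threads = {}

    def load(self, thread_id):
        return list(self._threads.get(thread_id, []))

    def save(self, thread_id, messages):
        self._threads[thread_id] = list(messages)


def run_turn(checkpointer, thread_id, user_text):
    history = checkpointer.load(thread_id)  # resume from the last checkpoint
    history.append(("user", user_text))
    turn = sum(1 for role, _ in history if role == "user")
    history.append(("assistant", f"reply {turn}"))  # stand-in for a model call
    checkpointer.save(thread_id, history)
    return history


store = ToyCheckpointer()
run_turn(store, "user-123", "Look up langchain")
second = run_turn(store, "user-123", "Summarize what you found.")
other = run_turn(store, "user-456", "Hello")
print(len(second), len(other))
```

The second call on `user-123` sees the first exchange, while `user-456` starts from an empty history, which mirrors how separate `thread_id` values isolate conversations.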

7. Multi-Agent Workflows

Why Multi-Agent?

Complex tasks benefit from decomposing work across multiple specialized agents. Each agent has its own tools, prompts, and expertise. A supervisor delegates tasks and synthesizes results.


Supervisor + Worker Implementation

from typing import TypedDict, Annotated, Literal
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, BaseMessage, SystemMessage
from langgraph.graph import StateGraph, START, END
from operator import add
 
class MultiAgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add]
    next_agent: str
    research_result: str
    code_result: str
    final_answer: str
 
model = ChatOpenAI(model="gpt-4o", temperature=0)
 
def research_agent(state: MultiAgentState) -> dict:
    response = model.invoke([
        SystemMessage(content="You are a research specialist. Gather relevant information."),
        HumanMessage(content=f"Research: {state['messages'][-1].content}"),
    ])
    return {"research_result": response.content, "messages": [response]}
 
def code_agent(state: MultiAgentState) -> dict:
    response = model.invoke([
        SystemMessage(content="You are a coding specialist. Write clean code."),
        HumanMessage(content=f"Based on research:\n{state.get('research_result', '')}\n\nImplement."),
    ])
    return {"code_result": response.content, "messages": [response]}
 
def writing_agent(state: MultiAgentState) -> dict:
    response = model.invoke([
        SystemMessage(content="You are a technical writer."),
        HumanMessage(content=f"Research:\n{state.get('research_result', '')}\n"
                     f"Code:\n{state.get('code_result', '')}\n\nWrite documentation."),
    ])
    return {"final_answer": response.content, "messages": [response]}
 
def supervisor(state: MultiAgentState) -> dict:
    response = model.invoke([
        SystemMessage(content="""Decide which agent to call next.
Agents: research, code, writing, FINISH. Respond with only the name."""),
        HumanMessage(content=f"Task: {state['messages'][0].content}\n"
                     f"Research: {'done' if state.get('research_result') else 'pending'}\n"
                     f"Code: {'done' if state.get('code_result') else 'pending'}\n"
                     f"Final: {'done' if state.get('final_answer') else 'pending'}"),
    ])
    return {"next_agent": response.content.strip().lower()}
 
def route_supervisor(state: MultiAgentState) -> Literal["research", "code", "writing", "end"]:
    n = state.get("next_agent", "")
    if "research" in n: return "research"
    elif "code" in n: return "code"
    elif "writing" in n: return "writing"
    return "end"
 
graph = StateGraph(MultiAgentState)
graph.add_node("supervisor", supervisor)
graph.add_node("research", research_agent)
graph.add_node("code", code_agent)
graph.add_node("writing", writing_agent)
 
graph.add_edge(START, "supervisor")
graph.add_conditional_edges("supervisor", route_supervisor,
    {"research": "research", "code": "code", "writing": "writing", "end": END})
graph.add_edge("research", "supervisor")
graph.add_edge("code", "supervisor")
graph.add_edge("writing", "supervisor")
 
multi_agent = graph.compile()
result = multi_agent.invoke({
    "messages": [HumanMessage(content="Create a Python rate limiter with token bucket")],
    "next_agent": "", "research_result": "", "code_result": "", "final_answer": "",
})

Sequential Pipeline

For fixed execution order, use a simple sequential pattern.

graph = StateGraph(MultiAgentState)
graph.add_node("research", research_agent)
graph.add_node("code", code_agent)
graph.add_node("writing", writing_agent)
graph.add_edge(START, "research")
graph.add_edge("research", "code")
graph.add_edge("code", "writing")
graph.add_edge("writing", END)
pipeline = graph.compile()

Parallel Execution

When agents are independent, run them concurrently.

from langchain_core.runnables import RunnableParallel
 
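# web_research_chain, database_research_chain, and api_research_chain are
# placeholders for Runnables defined elsewhere in your application.
parallel_research = RunnableParallel(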
    web=web_research_chain,
    db=database_research_chain,
    api=api_research_chain,
)
results = parallel_research.invoke({"query": "LangChain best practices"})

Human-in-the-Loop

LangGraph supports human intervention using interrupt_before or interrupt_after.

from langgraph.checkpoint.memory import MemorySaver
 
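# `graph` must be the tool-calling agent graph from Section 6 (the one with a
# "call_tools" node), not the sequential pipeline defined earlier in this section.
agent = graph.compile(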
    checkpointer=MemorySaver(),
    interrupt_before=["call_tools"],  # Pause before tool execution
)
 
config = {"configurable": {"thread_id": "review-123"}}
 
# First invocation - pauses before tools
result = agent.invoke(
    {"messages": [HumanMessage(content="Delete all old records")]}, config=config,
)
 
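# Inspect planned action
print(f"Agent wants to call: {result['messages'][-1].tool_calls}")

# Option A: approve and resume (passing None continues from the checkpoint)
result = agent.invoke(None, config=config)

# Option B (instead of A): reject the call with feedback. Appending a bare
# HumanMessage would leave the tool call unanswered, so reply to each pending
# call with a ToolMessage, recorded as if "call_tools" had run. The model then
# sees the feedback and replans.
pending = agent.get_state(config).values["messages"][-1].tool_calls
agent.update_state(
    config,
    {"messages": [
        ToolMessage(content="Rejected by reviewer: only delete records > 1 year old",
                    tool_call_id=tc["id"])
        for tc in pending
    ]},
    as_node="call_tools",
)
result = agent.invoke(None, config=config)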

Note: Human-in-the-loop is critical for high-stakes operations like database modifications, financial transactions, or sending communications.

8. LangSmith Observability

What is LangSmith?

LangSmith is LangChain's observability platform for tracing, evaluating, and monitoring LLM applications. It provides visibility into every step -- prompts sent, LLM responses, latency, and cost.

Setup

pip install langsmith
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="ls__your_api_key_here"
export LANGCHAIN_PROJECT="my-langchain-project"

Note: Once environment variables are set, all LangChain and LangGraph operations are automatically traced with no code changes needed.

Tracing

# Automatic tracing - just invoke your chain
result = chain.invoke(
    {"question": "What is LangSmith?"},
    config={"run_name": "langsmith-demo"},  # Custom name for identification
)

Each trace shows full input/output at every step, latency breakdown, token usage, cost estimates, and error details.

Custom Tracing with Decorators

from langsmith import traceable
 
@traceable(name="my-rag-pipeline")
def rag_query(question: str) -> str:
    docs = retriever.invoke(question)
    context = "\n".join(doc.page_content for doc in docs)
    return chain.invoke({"context": context, "question": question})

Evaluation

from langsmith import Client
from langsmith.evaluation import evaluate
 
client = Client()
 
dataset = client.create_dataset("qa-eval-dataset")
client.create_examples(
    inputs=[{"question": "What is LangChain?"}, {"question": "What is LangGraph?"}],
    outputs=[
        {"answer": "A framework for building LLM applications."},
        {"answer": "A library for stateful agent workflows."},
    ],
    dataset_id=dataset.id,
)
 
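def predict(inputs: dict) -> dict:
    return {"answer": chain.invoke(inputs)}

# evaluate() expects evaluator callables, not string names. This one is a
# deliberately crude check; use an LLM-as-judge evaluator for real scoring.
def keyword_overlap(run, example) -> dict:
    predicted = (run.outputs or {}).get("answer", "").lower()
    reference = example.outputs["answer"].lower().split()
    hits = sum(word.strip(".") in predicted for word in reference)
    return {"key": "keyword_overlap", "score": hits / len(reference)}

results = evaluate(predict, data=dataset.name, evaluators=[keyword_overlap],
                   experiment_prefix="rag-v1")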

Prompt Hub

from langchain import hub
 
prompt = hub.pull("rlm/rag-prompt")               # Pull shared prompt
hub.push("my-org/my-prompt", prompt, new_repo_is_public=False)  # Push your own

9. Production Deployment

LangServe (FastAPI)

LangServe exposes any LangChain Runnable as a REST API with automatic OpenAPI docs and streaming.

# server.py
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
 
chain = (
    ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{question}"),
    ])
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)
 
app = FastAPI(title="LangChain API", version="1.0")
add_routes(app, chain, path="/chat")
# Run: uvicorn server:app --host 0.0.0.0 --port 8000

# Auto-generated endpoints:
# POST /chat/invoke     - Single invocation
# POST /chat/batch      - Batch invocation
# POST /chat/stream     - Streaming
# GET  /chat/playground - Interactive UI

Client:

from langserve import RemoteRunnable
 
chain = RemoteRunnable("http://localhost:8000/chat")
result = chain.invoke({"question": "What is LangChain?"})

Error Handling

import logging

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import AIMessage
from langchain_openai import ChatOpenAI

logger = logging.getLogger(__name__)
 
# Retry with backoff
model = ChatOpenAI(model="gpt-4o", max_retries=3, request_timeout=30)
 
# Fallback models
resilient_model = ChatOpenAI(model="gpt-4o").with_fallbacks([
    ChatAnthropic(model="claude-sonnet-4-20250514")
])
 
# Error handling in LangGraph nodes
def safe_node(state: dict) -> dict:
    try:
        result = model.invoke(state["messages"])
        return {"messages": [result]}
    except Exception as e:
        logger.error(f"Node failed: {e}")
        return {"messages": [AIMessage(content=f"Error: {e}. Please try again.")]}
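
Setting `max_retries` delegates retrying to the model client. For calls that are not covered by it (custom tools, internal APIs), the same idea can be sketched by hand as exponential backoff with jitter. `retry_with_backoff` is a hypothetical helper, and the delay schedule is an assumption, not the schedule any particular SDK uses.

```python
import random
import time


def retry_with_backoff(fn, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Demo: a call that fails twice, then succeeds. Delays are recorded, not slept.
delays = []
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


outcome = retry_with_backoff(flaky, max_retries=3, base_delay=0.5, sleep=delays.append)
print(outcome, len(delays))
```

The jitter spreads out retries from many clients so they do not all hit the provider again at the same instant. Injecting `sleep` keeps the helper testable without real waiting.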

Streaming

from fastapi.responses import StreamingResponse
import json
 
@app.post("/stream")
async def stream_response(request: dict):
    async def generate():
        async for chunk in chain.astream(request):
            yield f"data: {json.dumps({'content': chunk})}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(generate(), media_type="text/event-stream")
 
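# LangGraph event streaming (async iteration must run inside a coroutine)
async def stream_agent_tokens():
    async for event in agent.astream_events(
        {"messages": [HumanMessage(content="Search for LangChain")]}, version="v2",
    ):
        if event["event"] == "on_chat_model_stream":
            print(event["data"]["chunk"].content, end="")

# Run with: asyncio.run(stream_agent_tokens())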

Caching

from langchain_core.globals import set_llm_cache
from langchain_community.cache import InMemoryCache, SQLiteCache
 
# Pick one backend: each set_llm_cache call replaces the previous cache.
set_llm_cache(InMemoryCache())                                # Fast, in-memory
# set_llm_cache(SQLiteCache(database_path=".langchain.db"))   # Persistent
 
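# Semantic cache for similar queries (needs a Redis instance with vector search)
from langchain_community.cache import RedisSemanticCache
from langchain_openai import OpenAIEmbeddings

# Assumption: in langchain_community, score_threshold is a maximum vector
# distance (lower is stricter, default 0.2), not a similarity score, so 0.95
# would match almost any prompt. Verify against your installed version.
set_llm_cache(RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings(),
    score_threshold=0.05,
))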

Rate Limiting

import asyncio
from collections import deque
from time import time
 
class RateLimiter:
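    def __init__(self, max_calls: int, time_window: float = 60.0):
        self.max_calls = max_calls
        self.time_window = time_window
        self.calls = deque()         # timestamps of calls inside the window
        self._lock = asyncio.Lock()  # serialize waiters so concurrent tasks cannot overshoot

    async def acquire(self):
        async with self._lock:
            while True:
                now = time()
                # Drop timestamps that have left the sliding window
                while self.calls and self.calls[0] <= now - self.time_window:
                    self.calls.popleft()
                if len(self.calls) < self.max_calls:
                    self.calls.append(now)
                    return
                # Window is full: wait until the oldest call expires, then re-check
                await asyncio.sleep(self.calls[0] + self.time_window - now)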
 
rate_limiter = RateLimiter(max_calls=50, time_window=60)
 
async def rate_limited_invoke(chain, input_data):
    await rate_limiter.acquire()
    return await chain.ainvoke(input_data)

Monitoring Best Practices

Area	Tool / Approach	What to Track
Tracing	LangSmith	Every invocation, latency, tokens
Metrics	Prometheus + Grafana	Request rate, error rate, p50/p95 latency
Logging	Structured JSON logs	Input/output summaries, errors, tool calls
Alerts	PagerDuty / Opsgenie	Error rate spikes, latency, cost anomalies
Cost	LangSmith / custom	Token usage per chain, daily cost trends
Quality	LangSmith evaluations	Correctness, relevance, hallucination rate

Deployment Checklist

Environment variables: API keys in secrets manager, not in code
Rate limiting: Configured within provider quotas
Fallbacks: Backup model for critical chains
Caching: Enabled for repeated queries
Streaming: Implemented for user-facing endpoints
Error handling: Graceful degradation on API failures
Timeouts: Set at HTTP and LLM client levels
Tracing: LangSmith or equivalent enabled
Evaluation: Baseline metrics established
Cost monitoring: Alerts for unexpected token usage
Input validation: Sanitized before reaching prompts
Prompt injection defense: System prompts hardened

— Data Dynamics Engineering Team