Agentic AI refers to AI systems that can act towards a goal, rather than just responding to a single prompt. Agentic AI can plan, decide, act, and iterate to complete tasks with minimal human input. Before digging deeper, let's understand the tools we need to design, build, deploy & maintain AI agents.
We need to understand the below important aspects of agents before discussing further topics. Think of an agent system like a company :
- Root Agent - CEO (decides the overall goal & strategy)
- Sub Agent - Manager (handles specific tasks)
- Tool - Employee/Utility (does the actual work)
One root agent can be associated with multiple sub-agents; similarly, one sub-agent can be associated with multiple tools. A tool could be an API call, a RAG system, a DB, etc. We will see more details further on.
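The company analogy above can be sketched in plain Python. This is only a conceptual sketch with no framework involved; the class and tool names are made up for illustration.

```python
# Conceptual sketch of the "company" analogy - no agent framework used.
def weather_tool(city: str) -> str:      # Tool = employee doing the actual work
    return f"Sunny in {city}"            # stand-in for a real API call

def flight_tool(city: str) -> str:
    return f"Flights to {city} found"

class SubAgent:                          # Manager: owns a set of tools
    def __init__(self, tools):
        self.tools = tools
    def handle(self, city):
        return [tool(city) for tool in self.tools]

class RootAgent:                         # CEO: decides which sub-agent acts
    def __init__(self, sub_agents):
        self.sub_agents = sub_agents
    def run(self, goal, city):
        return self.sub_agents[goal].handle(city)

travel = SubAgent([weather_tool, flight_tool])
root = RootAgent({"plan_trip": travel})
print(root.run("plan_trip", "Goa"))
# ['Sunny in Goa', 'Flights to Goa found']
```

In a real agent system, the root agent's "decision" would itself come from an LLM; here the routing is hardcoded just to show the hierarchy.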
We need a set of tools or frameworks to build Agentic AI systems as below.
- LangChain (LangChain is a framework, and below are the sub-systems in it)
- LangChain (LangChain itself is also a sub-system)
- LangGraph
- LangSmith (meant for observability - logging & debugging)
- LangGraph Cloud
In production environments, we integrate LangGraph with LangSmith, and LangSmith starts logging information. If we want to deploy a LangGraph application into a cloud environment (e.g. Azure), we can use LangGraph Cloud.
'Lang' represents Large Language Models; 'chain' means that along with LLMs, we need to integrate other aspects as below.
An LLM alone is not sufficient to build an application. We should be aware of the below integrations :
- Text Processing
- Text Splitting
- Retrieval
- Memory
- Indexing
- Output Formats
- Runnables
- Agents
Let us consider that we need to build an agent that processes input data, uses an LLM as the thinking engine, and converts the result into JSON format. We need a chaining system to connect all these individual components, with LLMs at the core. This is the reason it is called LangChain.
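The chaining idea can be sketched in plain Python before we touch LangChain. Here fake_llm is a stand-in for a real model call; in a real chain, that step would invoke an LLM.

```python
# Sketch of a chain: input processing -> "thinking engine" -> JSON output.
import json

def preprocess(text: str) -> str:
    return text.strip().lower()

def fake_llm(prompt: str) -> str:        # placeholder thinking engine
    return f"answer for: {prompt}"

def to_json(answer: str) -> str:
    return json.dumps({"response": answer})

def chain(text, steps):
    for step in steps:                   # each step's output feeds the next
        text = step(text)
    return text

print(chain("  What is Agentic AI?  ", [preprocess, fake_llm, to_json]))
```

LangChain formalizes exactly this composition (plus retries, streaming, parallelism, etc.) with its pipe syntax, which we will see later.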
Alternatives to LangChain : each has its own pros & cons, but LangChain is production ready.
- Microsoft AutoGen
- Crew AI
- AutoGPT
- Semantic Kernel
- LangGraph
- Smolagents etc.
- Google ADK (Agent Development Kit)
Note : We are going to discuss the following topics in future blogs. We need to think about all these aspects while designing an agent, but the thinking engine is always an LLM.
- LangChain
- LangGraph
- Context Engineering
- Prompt Engineering - Production grade
- RAG
- Text Splitters/Chunking
- Embeddings
- Retrievers
- Vector DB
- Re-Ranking (assigns similarity scores to decide the source of truth)
- Cross Encoders
- Filtering (to restrict data from unauthorized access)
- Retriever Techniques
- RAG + Prompt Engineering
- ICL (In-context learning)
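To give a flavour of one item above, re-ranking by similarity score can be sketched in pure Python. The vectors below are toy values, not real embeddings; real systems use embedding models and often cross-encoders on top.

```python
# Toy re-ranking: order candidate documents by cosine similarity to the query.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query = [0.9, 0.1, 0.0]
candidates = {
    "doc_a": [0.8, 0.2, 0.1],   # toy embeddings, not from a real model
    "doc_b": [0.1, 0.9, 0.3],
}
ranked = sorted(candidates, key=lambda d: cosine(query, candidates[d]), reverse=True)
print(ranked)   # most similar document first
```

The retriever returns many candidates; re-ranking like this decides which ones actually reach the LLM as context.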
What if a model is not returning the expected results ? When a model is not giving accurate answers, changing the model is not always the correct solution. We need to understand the underlying problems.
- If the LLM is not performing as expected, then we can apply :
- PEFT - Parameter Efficient Fine Tuning(LoRA, QLoRA)
- FFT - Full Fine Tuning
- ICL - In-Context learning
- Prompt Engineering
- zero/one/few-shot prompting
- chain of thought
- tree of thoughts
- The above techniques help reduce hallucination and increase accuracy
Observability : The below tools are used to keep track of an Agentic AI application. We will see LangSmith & also custom logic.
- LangSmith
- Custom Logic
- Grafana
- Watchdog
- Opik
- OpenTelemetry
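As a preview of the "Custom Logic" option, here is a minimal stdlib sketch that logs each tool call with its timing. Tools like LangSmith do this (and much more) automatically; the decorator and function names here are made up for illustration.

```python
# Custom observability: log every tool/agent call with its duration.
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def observed(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        log.info("%s took %.4fs", fn.__name__, time.perf_counter() - start)
        return result
    return wrapper

@observed
def search_tool(query: str) -> str:      # hypothetical tool
    return f"results for {query}"

print(search_tool("agentic ai"))
```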
Agentic AI design patterns are very important for building Agentic AI applications. It is very difficult to design an application from scratch without understanding design patterns.
Security : We will discuss Guardrails.
- Input & Output Guardrails
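A minimal sketch of the guardrail idea, in plain Python. Real guardrail frameworks are far richer; the blocked-word list here is purely illustrative.

```python
# Toy input/output guardrails: reject risky prompts, redact risky responses.
BLOCKED = {"password", "ssn"}            # illustrative list only

def input_guardrail(prompt: str) -> str:
    # reject the prompt before it ever reaches the model
    if any(word in prompt.lower() for word in BLOCKED):
        raise ValueError("Prompt rejected by input guardrail")
    return prompt

def output_guardrail(response: str) -> str:
    # redact sensitive terms from the model's response
    for word in BLOCKED:
        response = response.replace(word, "[REDACTED]")
    return response

safe_prompt = input_guardrail("Summarize this document")
print(output_guardrail("Your password is 1234"))  # Your [REDACTED] is 1234
```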
Finally, we will discuss Deployment & Operations of AI applications. This is full-stack Agentic AI.
Production Tools :
Generally, it takes a Data Engineer months to build a proper end-to-end pipeline and deploy it in production, including design, development, testing, configuration, etc. But if we use production tools and craft an accurate prompt for the same work, it can take just a day! We will see this in Agentic AI development.
In current market,
- Anthropic Claude Code can help achieve this task (but the prompt should be very accurate)
And our responsibility is then just to test the code. Using Codex 5.x we can generate a website within a day. Similarly, we can use Cursor AI, GitHub Copilot, and Roo Code for these types of activities.
Going forward, it is predicted that Agentic AI jobs will centre on such productivity tools. But we should be able to judge whether the generated code is correct by knowing all the above techniques. This will increase productivity and reduce dependency on developers.
Langchain
LangChain is a framework used to build applications powered by Large Language Models. LangChain helps you connect LLMs with tools, data, and workflows to build real world AI apps.
LLMs alone can :
- Answer questions
- Generate text
But they can't directly :
- Access your database
- Call your APIs
- Remember past conversations
- Perform multi-step reasoning
LangChain solves this by acting as a bridge & orchestrator. LangChain documentation is available at official website https://docs.langchain.com/
Real-time use case :
Consider that we implemented a POC for an AI agent and used an OpenAI LLM for the application. Once the POC was completed, the client recommended using an LLM from Google (Gemini) or Anthropic (Claude) instead! Rebuilding the entire application for a different LLM is very complex, as each has different standards. To fix this issue, LangChain came up with wrappers. Even if we build the POC using the langchain_openai wrapper and later have to change to Gemini, we simply switch to the langchain_google_genai wrapper instead. As simple as that.
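The wrapper idea can be sketched in plain Python. The Fake* classes below are stand-ins, not real LangChain classes; the point is that application code depends only on a common invoke() interface, so the provider behind it can be swapped.

```python
# Sketch of the "wrapper" pattern: one interface, swappable providers.
class FakeOpenAI:
    def invoke(self, prompt: str) -> str:
        return f"[openai] {prompt}"

class FakeGemini:
    def invoke(self, prompt: str) -> str:
        return f"[gemini] {prompt}"

def build_app(model) -> str:
    # application code depends only on .invoke(), not on the provider
    return model.invoke("What is the capital of India?")

print(build_app(FakeOpenAI()))
print(build_app(FakeGemini()))   # swap providers without rewriting the app
```

LangChain's chat-model wrappers expose the same invoke() interface, which is exactly what makes the one-line provider swap possible.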
LangChain Ecosystem : It is important to note that LangChain is the core engine. On top of LangChain, the below libraries/modules are built.
- LangChain_Core - serving LLMs, Chat Models, Prompt Templates, Runnable, Output Parsers
- LangChain_Community - APIs, Vector DBs, Data Loaders, Tools (deals with external sources)
- LangChain_Experimental - new features like incremental data loading etc. (may not be production ready - read the documentation to decide on readiness for production)
The above information is very important for developers when selecting libraries while building AI agents.
Core components of LangChain
1) Models - Models are of three types as below :
- LLM
- Chat Model
- Embedding Model
LLM :
- Traditional or older model type
- Input & output are both text/string
- Largely deprecated; rarely used now.
Chat Model :
- To understand the chat model, we must be aware of :
- System Message - Instruction to model (Ex : You are a helpful assistant user chat bot )
- User Message - User question (Ex : What is the capital of India ? )
- AI Message - Response from model (Ex : Capital of India is New Delhi )
Embedding Model :
- An embedding model converts text into numbers (vectors) that capture its meaning (semantics).
- Instead of generating text like an LLM, it creates a mathematical representation of text as below.
- "I love dogs" will be converted into some vector like [0.21, -0.45, 0.88, ..., 0.12]
- Embeddings will help when we are constructing a RAG application
Sample code for Embedding Model as below :
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
##https://platform.openai.com/docs/guides/embeddings
load_dotenv()
embedding = OpenAIEmbeddings(model="text-embedding-3-large")
result = embedding.embed_query("Delhi is the capital of India")
print(len(result))
Please refer to the following OpenAI documentation if you need more information about the above embedding model : https://developers.openai.com/api/docs/models/text-embedding-3-large
Also note that I will use OpenAI models going forward, hence we need an OpenAI key. We can generate it here : https://platform.openai.com/api-keys (Please recharge at least $5 and we can use it for learning AI. Do not share that key with others !)
In case you are not comfortable generating an OpenAI key, we can use a free key from Hugging Face : https://huggingface.co/
Remember, accuracy will generally be higher for OpenAI models, which are closed source.
Please see the below image to understand the retrieval part of a RAG flow (not the full flow of RAG). We have to divide the input files into chunks that fit the context window size, then convert them into embeddings and store them in a vector DB.
The vector DB could be FAISS, MongoDB, MySQL, Oracle, etc. We will see more implementation in future blogs of Agentic AI.
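Chunking itself is a simple idea and can be sketched in pure Python. Real pipelines use LangChain's text splitters; the tiny sizes here are only for illustration.

```python
# Sliding-window chunking: fixed-size chunks with a small overlap.
def chunk_text(text: str, chunk_size: int = 20, overlap: int = 5):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap    # step forward, keeping some overlap
    return chunks

doc = "Delhi is the capital of India and a very large city."
for c in chunk_text(doc):
    print(repr(c))
```

Note the overlap: the last few characters of one chunk repeat at the start of the next, so content at chunk boundaries is not lost without context.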
Where do we keep the models' API keys while programming ? Put them in a .env file under your project directory. We should not hardcode them. We are charged per token based on this API key; if you hardcode it, it becomes public and anyone can use your key. At run time, we load this .env file and use the keys. The below code does this work.
from dotenv import load_dotenv
load_dotenv()
Let's see some code !
Implementation of LLM model :
from langchain_openai import OpenAI #Wrapper for openAI from langchain
from dotenv import load_dotenv # dotenv provides load_dotenv
load_dotenv() # initiating .env file load process
llm = OpenAI(model='gpt-3.5-turbo-instruct')
# Created instance with name as llm for OpenAI class by passing model name
result = llm.invoke("What is the capital of India ?")
# calling llm.invoke() to send request & get response from model
print(result)
We can see all the available GPT models from this official website of OpenAI : https://developers.openai.com/api/docs/models/all
Output :
Now this program works like ChatGPT. If we ask the same question in ChatGPT, we will get a similar response. The only difference is that the current version of ChatGPT uses GPT-5.2, while in the above program we are using an older model.
Implementation of the Chat model :
# importing ChatOpenAI as we are using the Chat Model
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
# Setting temperature=0 for deterministic output
# If we need creativity, increase the value like temperature = 1.5
llm = ChatOpenAI(model='gpt-4', temperature=0)
result = llm.invoke("Suggest 5 indian names")
# invoke() returns a message object; the 'content' attribute holds the generated names
print(result.content)
Please see comments in the code for more information.
The difference between the LLM & Chat model is the class name and the return type : the LLM model returns plain text/string, but the Chat model returns a message object (AIMessage), from which we read .content.
Implementation of Embedding model :
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
load_dotenv()
embedding = OpenAIEmbeddings(model='text-embedding-3-large')
result = embedding.embed_query("Delhi is the capital of India")
print(len(result))
Output :
3072
Note : Try printing just result; you will see the embedding vector with 3072 values/dimensions.
- We use the OpenAIEmbeddings class from the langchain_openai wrapper
- While developing RAG, after extracting data from the input, it is converted into chunks and then stored in a vector DB. This is where embeddings are used.
- Note that we used embed_query() to convert a single query into a vector/embedding.
In the above code, we used only one query, i.e. "Delhi is the capital of India". Let's see how to convert multiple queries into embeddings using the embedding model, as below.
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
##https://platform.openai.com/docs/guides/embeddings
load_dotenv()
documents = [
"Delhi is the capital of India",
"Hyderabad is the capital of Telangana",
"Paris is the capital of France"
]
embedding = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=10)
result = embedding.embed_documents(documents)
print(str(result))
Output : Note that it printed embeddings for the 3 input documents.
[[-0.28195515275001526, 0.47907423973083496, -0.012972092255949974, 0.778949320316314, -0.01074627973139286, 0.20619246363639832, -0.048287373036146164, 0.1165928840637207, -0.1593511700630188, 0.010469825007021427],
[0.2670007646083832, 0.5190368890762329, -0.07652971148490906, 0.09313521534204483, 0.05723319947719574, 0.1342223584651947, 0.08466838300228119, -0.15410958230495453, -0.4568154513835907, -0.6195887923240662],
[-0.22880549728870392, 0.678465723991394, -0.07834821194410324, 0.4880889356136322, -0.2670133709907532, -0.12555591762065887, 0.2731972932815552, -0.1368194967508316, -0.01236096303910017, -0.24978668987751007]]
- Note that we have used embed_documents() here to convert the input documents into embeddings/vectors.
- Also, for simplicity we limited the dimensions to 10 (from 3072) - otherwise the above print statement would have printed 3072-dimension vectors, the native size of "text-embedding-3-large"
- Important thing to remember : controlling the dimensions controls the cost. In real projects, try running multiple times with a lower number of dimensions and check the accuracy/groundedness of the response; if we get a good response with a lower number of dimensions, set that value permanently. It will reduce the cost a lot.
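The tuning workflow described above can be sketched in pure Python: compare the ranking produced by full-size vectors against truncated ones. The numbers below are toy values, not real embeddings; in practice you would re-embed with a smaller dimensions value and compare answer quality.

```python
# Compare retrieval ranking at full vs reduced dimensionality (toy data).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query_full = [0.9, 0.1, 0.2, 0.05]
docs_full = {"delhi": [0.85, 0.15, 0.25, 0.1], "paris": [0.1, 0.9, 0.1, 0.4]}

def rank(query, docs, dims):
    q = query[:dims]                     # toy truncation, for illustration only
    return sorted(docs, key=lambda d: cosine(q, docs[d][:dims]), reverse=True)

print(rank(query_full, docs_full, dims=4))   # full dimensions
print(rank(query_full, docs_full, dims=2))   # cheaper; check if ranking holds
```

If the cheaper ranking matches the full one on your evaluation set, the lower dimension count is usually good enough.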
2) Prompts - We are going to discuss basic to intermediate-level prompts in this blog. I will cover production-grade prompts in an upcoming blog with real-time use cases.
Implementation of Static Prompt :
# Not flexible for dynamic user input.
# Cannot adapt to changing conversation flow.
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
# If we don't mention a model name, it uses the default model
model = ChatOpenAI()
result = model.invoke("Explain langchain?")
print(result.content)
Output :
LangChain is a decentralized open-source blockchain project that aims to revolutionize
the language service industry. It uses Artificial Intelligence and blockchain
technology to translate language. The main goal of LangChain is to reduce the cost of
language services and provide fair share of profits to participating translators.
It includes features like trustable translation, personalized learning,
and secured data. The platform typically uses LangCoin tokens to support
its ecosystem and facilitate transactions.
- The above example represents a static prompt
- We can't change that prompt at run time
- This is not the correct approach; we are learning the incorrect approaches before landing at the correct one!
Implementation of Dynamic Prompt :
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI()
prompt = PromptTemplate(
input_variables=["topic"],
template="Explain {topic} in simple terms."
)
formatted_prompt = prompt.format(topic="Agentic AI")
print(formatted_prompt)
print("++++++++++++++++++++++++++++++++++++++++")
result = model.invoke(formatted_prompt)
print(result.content)
Explain Agentic AI in simple terms.
++++++++++++++++++++++++++++++++++++++++
Agentic AI refers to artificial intelligence systems that are designed to act
autonomously and make decisions on behalf of users, without direct input or
intervention. These systems can perform tasks such as gathering and analyzing data,
making recommendations, and controlling systems or devices, all without human
intervention. Agentic AI is a powerful tool that can help automate complex
processes and improve efficiency in a wide range of industries.
- For a dynamic prompt, we use the PromptTemplate class from langchain_core.prompts
- Observe the instance created for PromptTemplate
- We have an input_variables list and a template
- Instead of hardcoding, we are dynamically sending the topic
In real time, we use PromptTemplate for generating the prompt.
Implementation of a dynamic prompt by randomly selecting a template from a list of templates :
# 💡 Definition:
# A flexible and adaptive prompt that changes based on user input, external data, or context.
# 📌 Characteristics:
# - The prompt adjusts dynamically.
# - Uses real-time variables (user input, API results, etc.).
# - Good for personalized, multi-step conversations.
import random
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv # ✅ Updated import (newer LangChain versions)
load_dotenv()
model = ChatOpenAI()
# Define multiple prompt templates
templates = [
"Summarize {topic} in one paragraph.",
"Give a brief explanation of {topic}.",
"Explain {topic} as if I am a house wife."
]
# Randomly select a template
selected_template = random.choice(templates)
print("🔹 Selected Template:", selected_template)
# Create a PromptTemplate object
prompt = PromptTemplate(
input_variables=["topic"],
template=selected_template
)
# Format the prompt with a specific topic
formatted_prompt = prompt.format(topic="Artificial Intelligence")
print("✅ Formatted Prompt:", formatted_prompt)
result = model.invoke(formatted_prompt)
print("✅ Result:", result.content)
We have randomly selected the prompt from a list of templates in the above example.
Implementing a context-aware chat prompt :
# 📌 Use Cases:
# ✔️ Conversational AI & Chatbots (context-aware prompts).
# ✔️ Adaptive Question Answering (modifies based on previous responses).
# ✔️ Personalized User Interactions (e.g., changing prompts based on user profiles).
# ❌ Limitations:
# More complex to implement than static prompting.
# Requires external logic (e.g., history tracking).
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
chat_prompt = ChatPromptTemplate.from_messages([
("system", "You are a sales assistant."),
MessagesPlaceholder(variable_name="chat_history"),
("human", "{question}")
])
formatted_messages = chat_prompt.format_messages(
chat_history=[
HumanMessage(content="Suggest a destination for a summer vacation."),
AIMessage(content="How about Bali, Indonesia? It's great for summer!")
],
question="What are the best activities to do there?"
)
llm = ChatOpenAI(model='gpt-4')
result = llm.invoke(formatted_messages)
print(result.content)
Explanation :
- This code builds a chat-based AI assistant with memory (chat history) and sends it to the LLM
- Flow :
- Load environmental variables
- Create a prompt template
- Inject chat history + new question
- Send to LLM
- Print response
- from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
- ChatPromptTemplate - Helps structure chat prompts (system + user + history)
- MessagesPlaceholder - Allows dynamic insertion of chat history
- from langchain_core.messages import AIMessage, HumanMessage
- HumanMessage - represents user input
- AIMessage - represents the model's response
- The above 2 classes simulate a real conversation
- from langchain_openai import ChatOpenAI
- Connects to OpenAI chat model
- Prompt template creation :
MessagesPlaceholder(variable_name="chat_history") - This is dynamic memory injection. At run time, this will be replaced with past conversation
("human", "{question}") - {question} is a variable that will be filled later
- Formatting the prompt :
chat_history=[
HumanMessage(content="Suggest a destination for a summer vacation."),
AIMessage(content="How about Bali, Indonesia? It's great for summer!")
] - This simulates, User - Suggest a destination; AI - Bali
- New question :
- question="What are the best activities to do there?"
- Final prompt sent to LLM :
- System: You are a sales assistant.
- User: Suggest a destination for a summer vacation.
- AI: How about Bali, Indonesia? It's great for summer!
- User: What are the best activities to do there?
- This is why the model understands “there” = Bali
- Initialize LLM
- llm = ChatOpenAI(model='gpt-4')
- This is a chat model, not a simple LLM
- Invoke the model
- Output
Key Concepts we should take away :
- PromptTemplates - Reusable structure for AI inputs
- Chat History = Memory
- Message Types
- System - behavior
- Human - user input
- AI - model response
- Context Awareness
- Because of the history, "there" is understood as Bali
Implementing a prompt template with multiple variables :
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI()
# Define a template with multiple placeholders
template = PromptTemplate(
input_variables=["name", "hobby"],
template="Hello {name}! I heard you like {hobby}. Can you tell me more about it?"
)
# Format with different values
formatted_prompt = template.format(name="Anil", hobby="cricket")
print(formatted_prompt)
print("*************************************")
result = model.invoke(formatted_prompt)
print(result.content)
We have already seen this pattern; the only change is that we are handling multiple variables in the prompt.
Implementation of Few-Shot Prompting with FewShotPromptTemplate :
from langchain_core.prompts import FewShotPromptTemplate
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI()
# Define examples
examples = [
{"input": "Explain AI", "output": "AI is the simulation of human intelligence in machines."},
{"input": "Explain Blockchain", "output": "Blockchain is a decentralized digital ledger."}
]
# Define an example template
example_template = PromptTemplate(
input_variables=["input", "output"],
template="Q: {input}\nA: {output}"
)
# Create Few-Shot PromptTemplate
few_shot_prompt = FewShotPromptTemplate(
examples=examples,
example_prompt=example_template,
prefix="Answer the following questions:",
suffix="Q: {question}\nA:",
input_variables=["question"]
)
# Format the prompt
formatted_prompt = few_shot_prompt.format(question="Explain Data Science")
print(formatted_prompt)
result = model.invoke(formatted_prompt)
print(result.content)
Output :
Answer the following questions:
Q: Explain AI
A: AI is the simulation of human intelligence in machines.
Q: Explain Blockchain
A: Blockchain is a decentralized digital ledger.
Q: Explain Data Science
A:
Data Science is the study of data, involving the collection, analysis,
interpretation, and presentation of large amounts of data to gain insights
and make decisions.
Implementing a chatbot in a simplified way :
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI()
chat_history = [
SystemMessage(content='You are a helpful AI assistant')
]
while True:
    user_input = input('You: ')
    if user_input == 'exit':
        break
    chat_history.append(HumanMessage(content=user_input))
    result = model.invoke(chat_history)
    chat_history.append(AIMessage(content=result.content))
    print("AI: ", result.content)
print(chat_history)
See how simple it is to implement a chatbot !
Note : We have covered the Models and Prompts concepts. We will dig deeper into these in future; to dig deep, we need the basics, hence we covered them here. We have hardcoded all the prompts in code, but in real applications they come at run time from a source. We will see this clearly.
What have we discussed so far in this blog ?
- LangChain
- Models
- LLM Model
- Chat Model
- Embedding Model
- Prompts
- Static Prompt
- Dynamic Prompt using PromptTemplate, MessagePlaceholder, AIMessages, HumanMessages, SystemMessages, FewShotPromptTemplate
We will continue with the remaining topics of LangChain. The pending topics are as below.
- Chains
- Output Parsers
- Indexes
- Data Loaders, Splitting, Embedding, Vector DB, Retriever
- Memory
- Agents
- Callbacks (Logging & Monitoring)
The next important concept is called Chains, Runnables, or LCEL.
3) Chains / Runnables / LCEL (LangChain Expression Language)
When we give a prompt to an LLM, it processes it and gives back a response in some format - say JSON in this case. Prompt, LLM, and Response are 3 different entities, connected using a concept called chains. Using chains, we integrate one step with another in a sequential/parallel manner. How do we represent a chain in LangChain ? Using the pipe symbol '|', which represents the connection of the different entities.
Example : Prompt | LLM | Output
Implementation of Simple Chain :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
load_dotenv()
model = ChatOpenAI()
prompt = PromptTemplate(
template= "Generate 2 important topics about {topic}",
input_variables=["topic"]
)
parser = StrOutputParser()
chain = prompt | model | parser
result = chain.invoke({"topic":"what is chains in langchain"})
print(result)
Note that we have used the StrOutputParser class and included it while creating the chain. The pipe symbol represents a chain in LangChain.
Implementation of a Sequential Chain :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
load_dotenv()
model = ChatOpenAI()
prompt_1 = PromptTemplate(
template= "Generate a detailed report about a {topic}",
input_variables=["topic"]
)
prompt_2 = PromptTemplate(
template= "create a top 5 interview questions {topic}",
input_variables=["topic"]
)
parser = StrOutputParser()
chain = prompt_1 | model | parser | prompt_2 | model | parser
result = chain.invoke({"topic":"challenges with deep neural network"})
print(result)
chain.get_graph().print_ascii()
Explanation :
- Instead of creating one prompt, we have created 2 prompts
- While creating the chain, we have sequentially chained prompt_1 followed by prompt_2
- During execution of the chain, the output from the first parser acts as the input context to prompt_2
Implementing a Parallel chain :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.schema.runnable import RunnableParallel
load_dotenv()
model = ChatOpenAI()
prompt_1 = PromptTemplate(
template= "Generate a detailed report about a \n {topic}",
input_variables=["topic"]
)
prompt_2 = PromptTemplate(
template= "create a top 5 interview questions \n {topic}",
input_variables=["topic"]
)
prompt_3 = PromptTemplate(
template= " merge the provided detailed report and 5 interview questions into a single document \n notes -> {notes} and quiz -> {quiz}",
input_variables=["notes", "quiz"]
)
parser = StrOutputParser()
parallel_chain = RunnableParallel({
"notes": prompt_1 | model | parser,
"quiz" : prompt_2 | model |parser
}
)
merge_chain = prompt_3 | model | parser
chain = parallel_chain | merge_chain
topic = """
Large Language Models (LLMs) are advanced artificial intelligence models designed to process and generate human-like text based on vast amounts of training data. These models, such as OpenAI’s GPT series, Google’s Gemini, and Meta’s LLaMA, use deep learning techniques, particularly transformer architectures, to understand and generate contextually relevant responses. LLMs power a wide range of applications, including chatbots, content creation, code generation, and research assistance. They excel in natural language understanding and generation, making them valuable for automating tasks that require linguistic intelligence. However, they also pose challenges, such as bias, misinformation, and high computational requirements, necessitating careful deployment and ethical considerations.
"""
result = chain.invoke({"topic":topic})
print(result)
chain.get_graph().print_ascii()
Explanation :
- We have created 3 prompts, each with a separate prompt template
- The 1st and 2nd prompts are straightforward, as we can see
- The 3rd prompt merges the outputs of the 1st and 2nd prompts into a single document
- We created the parallel_chain instance of the RunnableParallel class, which takes a dict
- It has 2 keys, i.e. notes and quiz, each with a separate chain as its value
- The values will be the parser outputs of those chains
- We created merge_chain = prompt_3 | model | parser
- Then the final chain, chain = parallel_chain | merge_chain
- The result will have the detailed report from the 1st prompt and the 5 interview questions from the 2nd prompt
The main takeaway from this implementation is using the RunnableParallel class to create parallelism.
Let's understand RunnableLambda :
RunnableLambda means : wrap my Python function so LangChain can use it in pipelines.
# 1. RunnableLambda
# A RunnableLambda allows wrapping a simple Python function into a
# LangChain-compatible Runnable.
from langchain_core.runnables import RunnableLambda
# Define a simple function
def reverse_string(s: str) -> str:
return s[::-1]
# Convert it into a Runnable
runnable = RunnableLambda(reverse_string)
# Execute
print(runnable.invoke("LangChain")) # Output: "niahCgnaL"
Explanation :
- Remember that we are talking about Chains, Runnables & LCEL. Hence everything must be a Runnable object to be used in runnable pipelines
- The above code shows how to convert a simple function into a Runnable so that we can use it in chains
- The RunnableLambda class helps us achieve this task
- We have a simple function to reverse a string, which we converted into a RunnableLambda and used to reverse the string "LangChain"
- We haven't used any model here. This example is all about converting a plain function into a Runnable
RunnablePassthrough : Generally in Python, if we don't want to define any code inside a function, we simply use 'pass'. Similarly, in chains we have a class called RunnablePassthrough, which returns its input as-is, as below. We will see why we need this at a later stage.
# 2. RunnablePassthrough
# A RunnablePassthrough is a basic implementation that returns
# the input as output without any modifications. It is useful when
# integrating components that do not require processing at a certain stage.
from langchain_core.runnables import RunnablePassthrough
runnable = RunnablePassthrough()
print(runnable.invoke("Hello, World!")) # Output: "Hello, World!"
RunnableParallel : This class is useful for running multiple Runnable components in parallel; it outputs a dictionary of results.
# 3. RunnableParallel
# RunnableParallel allows running multiple Runnable components in parallel,
# and it outputs a dictionary of results.
from langchain.schema.runnable import RunnableParallel, RunnableLambda, RunnablePassthrough
# Define multiple runnables
# uppercase_runnable = RunnableLambda(lambda x: x.upper())
# reverse_runnable = RunnableLambda(lambda x: x[::-1])
# Run them in parallel
parallel_runnable = RunnableParallel({
"uppercase": RunnableLambda(lambda x: x.upper()),
"reverse": RunnableLambda(lambda x: x[::-1]),
"same data": RunnablePassthrough()
})
print(parallel_runnable.invoke("LangChain"))
# Output: {'uppercase': 'LANGCHAIN', 'reverse': 'niahCgnaL', 'same data': 'LangChain'}
The program is self-explanatory : we use the RunnableParallel class to run a dictionary of runnables, each with its own implementation, and each output is associated with its key (uppercase, reverse, same data).
RunnableMap : applying a single operation to each element of an input list. We can achieve this using the RunnableLambda class itself.
# 4. RunnableMap
# A RunnableMap works like RunnableParallel but applies a single Runnable to each element of an input list.
from langchain_core.runnables import RunnableLambda
# Define a RunnableLambda that applies upper() to each element in a list
uppercase_runnable = RunnableLambda(lambda x: [word.upper() for word in x])
# Invoke with a list of strings
print(uppercase_runnable.invoke(["hello", "world"]))
# Output: ['HELLO', 'WORLD']
RunnableBranch :
# 5. RunnableBranch
# A RunnableBranch allows conditional branching, executing different Runnable components based on a condition.
from langchain_core.runnables import RunnableBranch,RunnableLambda
# Define different functions
uppercase = RunnableLambda(lambda x: x.upper())
reverse = RunnableLambda(lambda x: x[::-1])
default = RunnableLambda(lambda x: f"Unknown: {x}")
# Create a branch
branch = RunnableBranch(
(lambda x: "uppercase" in x, uppercase),
(lambda x: "reverse" in x, reverse),
default # Default branch
)
print(branch.invoke("uppercase me")) # Output: "UPPERCASE ME"
print(branch.invoke("reverse me")) # Output: "em esrever"
print(branch.invoke("something else")) # Output: "Unknown: something else"
Another example of RunnableBranch :
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from dotenv import load_dotenv
from langchain_core.runnables import RunnableSequence, RunnableParallel, RunnablePassthrough, RunnableBranch, RunnableLambda
load_dotenv()
prompt1 = PromptTemplate(
template='Write a detailed report on {topic}',
input_variables=['topic']
)
prompt2 = PromptTemplate(
template='Summarize the following text \n {text}',
input_variables=['text']
)
model = ChatOpenAI()
parser = StrOutputParser()
report_gen_chain = prompt1 | model | parser
branch_chain = RunnableBranch(
(lambda x: len(x.split())>300, prompt2 | model | parser),
RunnablePassthrough()
)
final_chain = RunnableSequence(report_gen_chain, branch_chain)
print(final_chain.invoke({'topic':'IPL 2025'}))
final_chain.get_graph().print_ascii()
Explanation :
- RunnableBranch evaluates its condition on the output of report_gen_chain.
- If the report generated from prompt1 is longer than 300 words (len(x.split()) > 300), the summarization chain (prompt2 | model | parser) runs.
- Otherwise, RunnablePassthrough returns the report unchanged.
3) Output Parsers
Output parsers instruct the LLM to write data in a specific format and then parse the model's raw response into that format. We have the below types of output parsers, which we can use extensively.
- String output parser
- JSON output parser
- Structured output parser
- Pydantic output parser
- CSV output parser
StrOutputParser :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
load_dotenv()
model = ChatOpenAI()
# 1st prompt -> detailed report
template1 = PromptTemplate(
template='Write a detailed report on {topic}',
input_variables=['topic']
)
# 2nd prompt -> summary
template2 = PromptTemplate(
template='Write a 5 line summary on the following text. \n {text}',
input_variables=['text']
)
parser = StrOutputParser()
chain = template1 | model | parser | template2 | model | parser
result = chain.invoke({'topic':'Generative AI'})
print(result)
JsonOutputParser :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
load_dotenv()
model = ChatOpenAI()
parser = JsonOutputParser()
template = PromptTemplate(
template='Give me 5 facts about {topic} \n {format_instruction}',
input_variables=['topic'],
partial_variables={'format_instruction': parser.get_format_instructions()}
)
chain = template | model | parser
result = chain.invoke({'topic':'Generative AI'})
print(result)
# 👉 In PromptTemplate, there are two types of variables:
# 1. input_variables
# Provided at runtime
# Example:
# chain.invoke({'topic': 'Generative AI'})
# 2. partial_variables
# Provided at template creation time
# Automatically injected
# User does NOT pass them later
# ✅ So this line means:
# 👉 “Before running the chain, always fill {format_instruction}
# with this value.”
Please see the comments in the above program for a detailed explanation of the code. The main takeaway is that a PromptTemplate has two kinds of variables: input_variables and partial_variables.
- input_variables come into the picture at runtime, as shown in the code
- partial_variables are injected at template-declaration time itself
- within the template, format_instruction is set based on the parser
- with JsonOutputParser, the output is parsed as a JSON/dict type
- with CommaSeparatedListOutputParser (the CSV parser), the output is parsed as a comma-separated list
parser = JsonOutputParser()
template = PromptTemplate(
template='Give me 5 facts about {topic} \n {format_instruction}',
input_variables=['topic'],
partial_variables={'format_instruction': parser.get_format_instructions()}
)
As per the above code, format_instruction is set based on the parser (parser.get_format_instructions()). In case we need our own instructions, we can subclass the parser and override get_format_instructions() with our own implementation.
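The partial-binding idea can be illustrated with plain Python's functools.partial. This is only an analogy for how partial_variables behave (not LangChain's actual implementation), and the function and instruction text are made up:

```python
from functools import partial

def render_prompt(format_instruction: str, topic: str) -> str:
    # Emulates a PromptTemplate: topic is the runtime "input variable",
    # format_instruction is bound once at "template creation" time.
    return f"Give me 5 facts about {topic}\n{format_instruction}"

# Bind the format instruction up front, like partial_variables
template = partial(render_prompt, "Return the answer as a JSON object.")

# Only the runtime variable is supplied at invocation, like input_variables
print(template("Generative AI"))
```

At invocation time the caller never sees format_instruction, exactly like partial_variables being "automatically injected".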
We will extensively use this JSON output parser while building Agents.
StructuredOutputParser :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StructuredOutputParser, ResponseSchema
load_dotenv()
# Define the model
model = ChatOpenAI()
schema = [
ResponseSchema(name='fact_1', description='Fact 1 about the topic'),
ResponseSchema(name='fact_2', description='Fact 2 about the topic'),
ResponseSchema(name='fact_3', description='Fact 3 about the topic'),
]
parser = StructuredOutputParser.from_response_schemas(schema)
template = PromptTemplate(
template='Give 3 facts about {topic} \n {format_instruction}',
input_variables=['topic'],
partial_variables={'format_instruction':parser.get_format_instructions()}
)
chain = template | model | parser
result = chain.invoke({'topic':'Generative AI'})
print(result)
This is like defining a table schema: we create the schema and pass it while creating the parser instance, as shown in the above code. It produces output in JSON format, which we can use to query data from a DB, or dump into databases like MongoDB, Cassandra, Oracle etc.
PydanticOutputParser : This class is extensively used to generate output in a structured, validated format, ready to save/load in a DB in real time. We create a class called Person inheriting BaseModel, with fields name (str), age (float, must be greater than 18) and city (str), and pass this class as the pydantic_object while creating the PydanticOutputParser instance, as shown in the code below.
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
load_dotenv()
# Define the model
model = ChatOpenAI()
class Person(BaseModel):
name: str = Field(description='Name of the person')
age: float = Field(gt=18, description='Age of the person')
city: str = Field(description='Name of the city the person belongs to')
parser = PydanticOutputParser(pydantic_object=Person)
template = PromptTemplate(
template='Generate the name, age and city of a fictional {place} person \n {format_instruction}',
input_variables=['place'],
partial_variables={'format_instruction':parser.get_format_instructions()}
)
chain = template | model | parser
final_result = chain.invoke({'place':'Iran'})
print(final_result)
Mostly we will use either JSON or Pydantic output parsers in real time.
RAG - Retrieval-Augmented Generation
- Retrieval
- Augmentation
- Generation
Let us say our project's data lives in the following sources.
- DB, files
- Confluence, PPTs
- Excel (.xls, .xlsx)
- HTML
- JSON, web pages etc.
Retrieval mechanism
Step1 : Data extraction from source.
Step2 : Chunking
Step3 : Convert chunks into Embeddings
Step4 : Store Embeddings into vector DB
Augmentation mechanism : The user query is converted into embeddings (split into chunks first only if it is very long). These embeddings are used to search the vector DB for similar entries (this is similarity search; keyword search is also possible). The matched context, along with the user input, is sent to the LLM. This entire process is called Augmentation.
Generation mechanism : The LLM finally articulates the output. This is called Generation.
This entire process of Retrieval, Augmentation & Generation is called RAG.
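The whole flow above can be sketched end to end with a toy bag-of-words "embedding" and cosine similarity. Every document, function name, and query here is an illustrative stand-in; real systems use learned embedding models and a vector DB:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words term counts (stand-in for a real model)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-2: source data, already "chunked"
chunks = [
    "LangChain is a framework for building LLM applications.",
    "FAISS stores embeddings for fast similarity search.",
    "Chunking splits documents to fit model token limits.",
]
# Steps 3-4: embed each chunk and "store" it (a list plays the vector DB)
store = [(c, embed(c)) for c in chunks]

# Augmentation: embed the query, find the most similar chunk,
# and build a prompt combining context + question for the LLM
query = "What is LangChain?"
best = max(store, key=lambda item: cosine(embed(query), item[1]))
prompt = f"Context: {best[0]}\nQuestion: {query}"
print(prompt)
```

Generation would then be a single LLM call on `prompt`; everything before that call is the Retrieval + Augmentation half of RAG.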
Document Loaders : Document Loaders in RAG are utilities that help you load text data from various sources like PDF, CSV, URL etc. into a standard format (Document objects) for downstream processing like chunking, embedding, retrieval.
Why are document loaders important ?
- Unified format - all documents, no matter the source, are turned into Document objects.
- Metadata retention - you can retain source information like name, URL, author, page number etc.
- Flexible ingestion - load from local files, APIs, databases etc.
Example 1 : Text Loader
from langchain_community.document_loaders import TextLoader
# Load a simple text file
loader = TextLoader("example.txt")
documents = loader.load()
# Print loaded documents
for doc in documents:
print("Content:", doc.page_content)
print("Metadata:", doc.metadata)
- loader is an instance of the TextLoader class
- loader.load() loads the entire file
- content and metadata are printed separately
Example 2 : PDF Loader
from langchain_community.document_loaders import PyPDFLoader
# Load a PDF document
loader = PyPDFLoader("AttentionAllYouNeed.pdf")
documents = loader.load()
# Each page becomes a separate Document
for i, doc in enumerate(documents):
print(f"Page {i+1} Content:\n{doc.page_content[:100]}...")
print("Metadata:", doc.metadata)
print("-" * 50)
With PyPDFLoader, one Document object is created for each page in the PDF. The above .pdf file has 15 pages, so 15 Document objects are created.
As per the above code, the first 100 characters of each page's content are displayed in the output.
Example 3 : Load web page using WebBaseLoader
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://en.wikipedia.org/wiki/LangChain")
documents = loader.load()
for doc in documents:
print("Page Content (excerpt):", doc.page_content[:200])
print("Metadata:", doc.metadata)
Note : In real projects we often have to write custom logic, since many production systems won't use LangChain's loaders directly. Just note that Document Loaders are useful for loading data from various sources.
When we implement a project, we write custom logic to handle data extraction from all kinds of sources like Jira, Confluence, GitHub, DBs etc., and logic to handle all file formats and complex PDFs, with incremental data loading support.
Now, once data loading part is done, next step in RAG, especially retrieval part is chunking. For chunking we have Text Splitters in Lang Chain.
In a RAG system, data chunking is crucial because LLMs and embedding models have input token limits. Chunking ensures information is split meaningfully to preserve content and semantics while remaining within token constraints.
We have 3 types of splitting :
- Character based
- Word based
- Token based - subword tokenization; a library like tiktoken takes care of it
Chunking techniques
Fixed-size chunking (character based) :
def fixed_size_chunking(text, chunk_size=50):
chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
return chunks
# Sample text
text = "This is a simple example to demonstrate fixed size chunking. We split the text into chunks of equal length."
# Call the function
chunks = fixed_size_chunking(text, chunk_size=50)
# Display results
for i, chunk in enumerate(chunks):
print(f"Chunk {i+1}:")
print(chunk)
print("-" * 40)
This just prints chunks of size 50. But we lose the meaning of the context, because we are chunking character by character; this issue exists for all fixed-size chunking.
Fixed-size chunking is the most basic chunking technique. Because of this issue, in real projects we use semantic, LLM-based, parent-child, or document-based chunking.
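As a small step toward those context-preserving techniques, here is a sketch of sentence-based chunking, which packs whole sentences into chunks and never cuts a sentence in half (the helper name and sample text are made up for illustration):

```python
import re

def sentence_chunking(text: str, max_chars: int = 80) -> list:
    # Split on sentence boundaries, then greedily pack whole
    # sentences into chunks of at most max_chars characters
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

text = ("This is a simple example. It demonstrates sentence chunking. "
        "No sentence is ever cut in half.")
for i, chunk in enumerate(sentence_chunking(text, max_chars=60)):
    print(f"Chunk {i+1}: {chunk}")
```

Unlike the character-based version, a chunk boundary here can only fall between sentences, so each chunk stays readable on its own.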
Output :
Chunk 1:
This is a simple example to demonstrate fixed size
----------------------------------------
Chunk 2:
chunking. We split the text into chunks of equal
----------------------------------------
Chunk 3:
length.
----------------------------------------
Fixed-size chunking (word based) :
def fixed_word_chunking(text, chunk_size=10):
words = text.split()
chunks = [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]
return chunks
# Sample text
text = "This is a simple example to demonstrate fixed size chunking. We split the text into chunks of equal length."
# Call the function
chunks = fixed_word_chunking(text, chunk_size=10)
# Display results
for i, chunk in enumerate(chunks):
print(f"Chunk {i+1}:")
print(chunk)
print("-" * 40)
Output :
Chunk 1:
This is a simple example to demonstrate fixed size chunking.
----------------------------------------
Chunk 2:
We split the text into chunks of equal length.
----------------------------------------
Fixed-size chunking (Token based with tiktoken) :
import tiktoken
def fixed_token_chunking(text, chunk_size=10):
enc = tiktoken.get_encoding("cl100k_base") # Use encoding for OpenAI models
tokens = enc.encode(text)
chunks = [tokens[i:i+chunk_size] for i in range(0, len(tokens), chunk_size)]
return [enc.decode(chunk) for chunk in chunks]
# Sample text
text = "This is a simple example to demonstrate fixed size chunking. We split the text into chunks of equal length."
# Call the function
chunks = fixed_token_chunking(text, chunk_size=10)
# Display results
for i, chunk in enumerate(chunks):
print(f"Chunk {i+1}:")
print(chunk)
print("-" * 40)
For token-based chunking, we use a library called tiktoken. It has two methods: encode converts text into token IDs, and decode converts token IDs back into text.
cl100k_base is the encoding used by OpenAI models. The code is almost the same as before; we just add the encode and decode steps.
Output :
Chunk 1:
This is a simple example to demonstrate fixed size chunk
----------------------------------------
Chunk 2:
ing. We split the text into chunks of equal
----------------------------------------
Chunk 3:
length.
----------------------------------------
But in all the above methods, we still lose context at chunk boundaries. Hence overlap comes into the picture.
In sliding-window chunking, consecutive chunks overlap by a certain number of words or tokens. Purpose : preserve context between chunks.
Parameters :
chunk_size : 512 tokens
overlap : 50 - 100 tokens
Great for minimizing context loss during splitting.
Example : Sliding window chunking(word based)
def sliding_window_chunking(text, chunk_size=10, overlap=3):
words = text.split()
step = chunk_size - overlap
chunks = [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), step)]
return chunks
# Sample text
text = "This is a simple example to demonstrate sliding window chunking. It helps preserve context between chunks by overlapping."
# Call the function
chunks = sliding_window_chunking(text, chunk_size=10, overlap=3)
# Display results
for i, chunk in enumerate(chunks):
print(f"Chunk {i+1}:")
print(chunk)
print("-" * 40)
Output :
Chunk 1:
This is a simple example to demonstrate sliding window chunking.
----------------------------------------
Chunk 2:
sliding window chunking. It helps preserve context between chunks by
----------------------------------------
Chunk 3:
between chunks by overlapping.
----------------------------------------
Once chunking is done, we need to convert these chunks into embeddings. Let's see the embeddings process.
Implementing Embeddings :
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
# 1. Load a sample text file
loader = TextLoader("C:/Personal/2024/Learning/Generative AI/Agents_Practice/Langchain/5_Indexes/9_Document Loaders/example.txt")
documents = loader.load()
# 2. Split text into chunks
text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
# 3. Initialize OpenAI Embeddings
embedding_model = OpenAIEmbeddings()
# 4. Create a FAISS vector store from documents using embeddings
vectorstore = FAISS.from_documents(docs, embedding_model)
# 5. Confirm the vectorstore is built
print("✅ FAISS vector store created with", len(docs), "chunks.")
Output :
✅ FAISS vector store created with 2 chunks.
Instead of writing custom logic, the above code uses existing LangChain classes to chunk the data and convert the chunks into embeddings.
FAISS is one of the vector stores available in LangChain; we use it to store the embeddings. In real projects, a dedicated vector DB is used for embeddings, often alongside regular databases like MySQL or MongoDB for the rest of the data.
Note : This entire process of extracting data from a source, converting it into chunks, converting the chunks into embeddings, and storing those embeddings in a vector DB is the ingestion side of RAG. Some books call this step Indexing, and Retrieval and Indexing are often used interchangeably for it.
This is a basic RAG; we will see production-grade RAG soon.
Augmentation :
As part of augmentation process, we need to do similarity search for user query in vector DB as implemented in the below code.
query = "What is LangChain?"
results = vectorstore.similarity_search(query, k=1)
for i, doc in enumerate(results):
print(f"🔍 Result {i+1}:\n{doc.page_content}\n")
Output :
🔍 Result 1:
LangChain is a powerful framework for building applications with language models.
It provides abstractions and utilities to make LLM-powered apps easier to develop.
similarity_search() converts the query string into an embedding and then performs a similarity search in the vector DB. Keyword search can be used as well.
k=1 means it returns the top 1 result; with k=3, it would return the top 3 results.
Once similar embeddings are matched in the vector DB, both the query and the retrieved context are sent to the LLM for processing. This is very important to understand. If part of the retrieval should use similarity search and the rest keyword search, it is called Hybrid Search.
Hybrid Search = Semantic Search + Keyword Search
For keyword search, we have the BM25 algorithm.
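BM25 ranks documents by how often they contain the query's terms, weighted by how rare each term is in the corpus. A compact sketch using the usual default constants k1=1.5 and b=0.75, plus one simple way to blend BM25 with a made-up semantic score into a hybrid score (the documents and scores are illustrative):

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    # Score each document against the query with BM25
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)  # document frequency
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # rarity weight
            f = tf[term]  # term frequency in this document
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores

docs = [
    "LangChain builds LLM applications",
    "BM25 is a keyword ranking function",
    "Vector search uses embeddings",
]
scores = bm25_scores("keyword ranking", docs)
best = docs[scores.index(max(scores))]
print(best)

# Hybrid search: normalize BM25, then take a weighted sum
# with an illustrative semantic-similarity score per document
semantic = [0.2, 0.5, 0.9]
norm = [s / max(scores) for s in scores] if max(scores) > 0 else scores
hybrid = [0.5 * a + 0.5 * k for a, k in zip(semantic, norm)]
```

The 0.5/0.5 weights are a tunable design choice; production systems often use reciprocal rank fusion instead of a raw weighted sum.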
In real projects, we will use vector DBs like Pinecone, Qdrant, Milvus etc. We will see more details going forward.
Retrievers
Implementing Vector Store Retriever:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
# Step 1: Your source documents
documents = [
Document(page_content="LangChain helps developers build LLM applications easily."),
Document(page_content="Chroma is a vector database optimized for LLM-based search."),
Document(page_content="Embeddings convert text into high-dimensional vectors."),
Document(page_content="OpenAI provides powerful embedding models."),
]
# Step 2: Initialize embedding model
embedding_model = OpenAIEmbeddings()
# Step 3: Create Chroma vector store in memory
vectorstore = Chroma.from_documents(
documents=documents,
embedding=embedding_model,
collection_name="my_collection"
)
# Step 4: Convert vectorstore into a retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
query = "What is Chroma used for?"
results = retriever.invoke(query)
for i, doc in enumerate(results):
print(f"\n--- Result {i+1} ---")
print(doc.page_content)
Output :
--- Result 1 ---
Chroma is a vector database optimized for LLM-based search.
--- Result 2 ---
LangChain helps developers build LLM applications easily.
Note that we are still in the Retrieval & Augmentation phases; we haven't gone into the Generation phase, where the LLM is used.
Also, as we are in the learning phase, we are using fixed-size chunking and Chroma as the vector DB. In real projects, we use different mechanisms altogether, which we will see in future blogs.
Let's see a basic agent implementation :
from langchain.agents import initialize_agent, AgentType
from langchain.agents import Tool
from langchain_openai import ChatOpenAI
from langchain_community.utilities import SerpAPIWrapper
from dotenv import load_dotenv
load_dotenv()
# Define a tool (SerpAPI for web search)
search = SerpAPIWrapper()
tools = [
Tool(
name="Search",
func=search.run,
description="Useful for answering questions about current events",
),
]
# LLM
llm = ChatOpenAI(temperature=0)
# Initialize agent
agent = initialize_agent(
tools,
llm,
agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
verbose=True,
)
# Run
#response = agent.run("What's the latest news about generative AI?")
response = agent.run("""A company is designing a water tank shaped like a cylinder with a hemisphere on top (like a capsule). The cylindrical part has a height of 20 meters and a radius of 7 meters. The hemispherical dome on top has the same radius (7 meters).Compute the total volume of the tank.
If the tank is filled with water up to 80% of its capacity, calculate the volume of water stored.
Suppose the cost of painting the outer surface (only the curved cylinder + hemisphere, not the base) is $15 per square meter. Compute the total painting cost.
Provide both the exact symbolic answer (in terms of π) and the approximate numerical value rounded to two decimal places.""")
print(response)
In the above logic, we are using SerpAPIWrapper, which wraps the SerpAPI web search service. We can also implement a tool with custom logic, as below.
from langchain.agents import Tool, initialize_agent, AgentType
from langchain_openai import ChatOpenAI
# Define a custom tool
def multiply_numbers(query: str) -> str:
numbers = [int(x) for x in query.split() if x.isdigit()]
return str(numbers[0] * numbers[1]) if len(numbers) >= 2 else "Need 2 numbers."
tools = [
Tool(
name="MultiplyTool",
func=multiply_numbers,
description="Multiplies two numbers given in a query like '3 and 4'",
),
]
# LLM
llm = ChatOpenAI()
# Agent
agent = initialize_agent(
tools=tools,
llm=llm,
agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
verbose=True,
)
# Run
print(agent.run("multiply 3 4"))
Simply put, we have implemented a tool, given it a description, and then called this tool from inside the agent.
The agent's verbose output above helps us understand how an agent thinks, almost like a human being.
That's all for this blog. See you in next blog.
Thank you for reading this blog !