Agentic AI refers to AI systems that can act towards a goal, rather than just responding to a single prompt. Agentic AI can plan, decide, act, and iterate to complete tasks with minimal human input. Before digging deeper, let's understand the tools we need to design, build, deploy & maintain AI agents.
We need to understand the below important aspects of agents before discussing further topics. Think of an agent system like a company :
- Root Agent - CEO (decides the overall goal & strategy)
- Sub Agent - Manager (handles specific tasks)
- Tool - Employee/Utility (does the actual work)
One root agent can be associated with multiple sub-agents; similarly, one sub-agent can be associated with multiple tools. A tool could be an API call, a RAG system, a DB, etc. We will see more details further on.
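The company analogy above can be sketched in plain Python. This is only a conceptual sketch with no framework involved; the class and tool names are made up for illustration.

```python
# Conceptual sketch of the "company" analogy - no agent framework used.
def weather_tool(city: str) -> str:      # Tool = employee doing the actual work
    return f"Sunny in {city}"            # stand-in for a real API call

def flight_tool(city: str) -> str:
    return f"Flights to {city} found"

class SubAgent:                          # Manager: owns a set of tools
    def __init__(self, tools):
        self.tools = tools
    def handle(self, city):
        return [tool(city) for tool in self.tools]

class RootAgent:                         # CEO: decides which sub-agent acts
    def __init__(self, sub_agents):
        self.sub_agents = sub_agents
    def run(self, goal, city):
        return self.sub_agents[goal].handle(city)

travel = SubAgent([weather_tool, flight_tool])
root = RootAgent({"plan_trip": travel})
print(root.run("plan_trip", "Goa"))
# ['Sunny in Goa', 'Flights to Goa found']
```

In a real agent system, the root agent's "decision" would itself come from an LLM; here the routing is hardcoded just to show the hierarchy.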
We need a set of tools or frameworks to build Agentic AI systems as below.
- LangChain (LangChain is a framework, and below are the sub-systems in it)
- LangChain (LangChain itself is also a sub-system)
- LangGraph
- LangSmith (meant for observability - logging & debugging)
- LangGraph Cloud
In production environments, we integrate LangGraph with LangSmith, and LangSmith starts logging information. If we want to deploy a LangGraph application into a cloud environment (e.g. Azure), we can use LangGraph Cloud.
'Lang' represents Large Language Models; 'chain' means that along with LLMs, we need to integrate other aspects as below.
An LLM alone is not sufficient to build an application. We should be aware of the below integrations :
- Text Processing
- Text Splitting
- Retrieval
- Memory
- Indexing
- Output Formats
- Runnables
- Agents
Let us consider that we need to build an agent that processes input data, uses an LLM as the thinking engine, and converts the result into JSON format. We need a chaining system to connect all these individual components, with LLMs at the core. This is the reason it is called LangChain.
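The chaining idea can be sketched in plain Python before we touch LangChain. Here fake_llm is a stand-in for a real model call; in a real chain, that step would invoke an LLM.

```python
# Sketch of a chain: input processing -> "thinking engine" -> JSON output.
import json

def preprocess(text: str) -> str:
    return text.strip().lower()

def fake_llm(prompt: str) -> str:        # placeholder thinking engine
    return f"answer for: {prompt}"

def to_json(answer: str) -> str:
    return json.dumps({"response": answer})

def chain(text, steps):
    for step in steps:                   # each step's output feeds the next
        text = step(text)
    return text

print(chain("  What is Agentic AI?  ", [preprocess, fake_llm, to_json]))
```

LangChain formalizes exactly this composition (plus retries, streaming, parallelism, etc.) with its pipe syntax, which we will see later.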
Alternatives to LangChain : each has its own pros & cons, but LangChain is production ready.
- Microsoft AutoGen
- Crew AI
- AutoGPT
- Semantic Kernel
- LangGraph
- Smolagents etc.
- Google ADK (Agent Development Kit)
Note : We are going to discuss the following topics in future blogs. We need to think about all these aspects while designing an agent, but the thinking engine is always an LLM.
- LangChain
- LangGraph
- Context Engineering
- Prompt Engineering - Production grade
- RAG
- Text Splitters/Chunking
- Embeddings
- Retrievers
- Vector DB
- Re-Ranking (assigns similarity scores to decide the source of truth)
- Cross Encoders
- Filtering (to restrict data from unauthorized access)
- Retriever Techniques
- RAG + Prompt Engineering
- ICL (In-context learning)
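To give a flavour of one item above, re-ranking by similarity score can be sketched in pure Python. The vectors below are toy values, not real embeddings; real systems use embedding models and often cross-encoders on top.

```python
# Toy re-ranking: order candidate documents by cosine similarity to the query.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query = [0.9, 0.1, 0.0]
candidates = {
    "doc_a": [0.8, 0.2, 0.1],   # toy embeddings, not from a real model
    "doc_b": [0.1, 0.9, 0.3],
}
ranked = sorted(candidates, key=lambda d: cosine(query, candidates[d]), reverse=True)
print(ranked)   # most similar document first
```

The retriever returns many candidates; re-ranking like this decides which ones actually reach the LLM as context.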
What if a model is not returning the expected results ? When a model is not giving accurate answers, changing the model is not always the correct solution. We need to understand the underlying problems.
- If the LLM is not performing as expected, then we can apply :
- PEFT - Parameter Efficient Fine Tuning(LoRA, QLoRA)
- FFT - Full Fine Tuning
- ICL - In-Context learning
- Prompt Engineering
- zero/one/few-shot prompting
- chain of thought
- tree of thoughts
- The above techniques help reduce hallucination and increase accuracy
Observability : The below tools are used to keep track of an Agentic AI application. We will see LangSmith & also custom logic.
- LangSmith
- Custom Logic
- Grafana
- Watchdog
- Opik
- OpenTelemetry
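As a preview of the "Custom Logic" option, here is a minimal stdlib sketch that logs each tool call with its timing. Tools like LangSmith do this (and much more) automatically; the decorator and function names here are made up for illustration.

```python
# Custom observability: log every tool/agent call with its duration.
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def observed(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        log.info("%s took %.4fs", fn.__name__, time.perf_counter() - start)
        return result
    return wrapper

@observed
def search_tool(query: str) -> str:      # hypothetical tool
    return f"results for {query}"

print(search_tool("agentic ai"))
```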
Agentic AI design patterns are very important for building Agentic AI applications. It is very difficult to design an application from scratch without understanding design patterns.
Security : We will discuss Guardrails.
- Input & Output Guardrails
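A minimal sketch of the guardrail idea, in plain Python. Real guardrail frameworks are far richer; the blocked-word list here is purely illustrative.

```python
# Toy input/output guardrails: reject risky prompts, redact risky responses.
BLOCKED = {"password", "ssn"}            # illustrative list only

def input_guardrail(prompt: str) -> str:
    # reject the prompt before it ever reaches the model
    if any(word in prompt.lower() for word in BLOCKED):
        raise ValueError("Prompt rejected by input guardrail")
    return prompt

def output_guardrail(response: str) -> str:
    # redact sensitive terms from the model's response
    for word in BLOCKED:
        response = response.replace(word, "[REDACTED]")
    return response

safe_prompt = input_guardrail("Summarize this document")
print(output_guardrail("Your password is 1234"))  # Your [REDACTED] is 1234
```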
Finally, we will discuss Deployment & Operations of AI applications. This is full-stack Agentic AI.
Production Tools :
Generally, it takes a Data Engineer months to build a proper end-to-end pipeline and deploy it in production, including design, development, testing, configuration, etc. But if we use production tools and craft an accurate prompt for the same work, it can take just a day! We will see this in Agentic AI development.
In current market,
- Anthropic Claude Code can help achieve this task (but the prompt should be very accurate)
And our responsibility is then just to test the code. Using Codex 5.x we can generate a website within a day. Similarly, we can use Cursor AI, GitHub Copilot, and Roo Code for these types of activities.
Going forward, it is predicted that Agentic AI jobs will centre on such productivity tools. But we should be able to judge whether the generated code is correct by knowing all the above techniques. This will increase productivity and reduce dependency on developers.
Langchain
LangChain is a framework used to build applications powered by Large Language Models. LangChain helps you connect LLMs with tools, data, and workflows to build real world AI apps.
LLMs alone can :
- Answer questions
- Generate text
But they can't directly :
- Access your database
- Call your APIs
- Remember past conversations
- Perform multi-step reasoning
LangChain solves this by acting as a bridge & orchestrator. LangChain documentation is available at official website https://docs.langchain.com/
Real-time use case :
Consider that we implemented a POC for an AI agent and used an OpenAI LLM for the application. Once the POC was completed, the client recommended using an LLM from Google (Gemini) or Anthropic (Claude) instead! Rebuilding the entire application for a different LLM is very complex, as each has different standards. To fix this issue, LangChain came up with wrappers. Even if we build the POC using the langchain_openai wrapper and later have to change to Gemini, we simply switch to the langchain_google_genai wrapper instead. As simple as that.
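The wrapper idea can be sketched in plain Python. The Fake* classes below are stand-ins, not real LangChain classes; the point is that application code depends only on a common invoke() interface, so the provider behind it can be swapped.

```python
# Sketch of the "wrapper" pattern: one interface, swappable providers.
class FakeOpenAI:
    def invoke(self, prompt: str) -> str:
        return f"[openai] {prompt}"

class FakeGemini:
    def invoke(self, prompt: str) -> str:
        return f"[gemini] {prompt}"

def build_app(model) -> str:
    # application code depends only on .invoke(), not on the provider
    return model.invoke("What is the capital of India?")

print(build_app(FakeOpenAI()))
print(build_app(FakeGemini()))   # swap providers without rewriting the app
```

LangChain's chat-model wrappers expose the same invoke() interface, which is exactly what makes the one-line provider swap possible.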
LangChain Ecosystem : It is important to note that LangChain is the core engine. On top of LangChain, the below libraries/modules are built.
- LangChain_Core - serving LLMs, Chat Models, Prompt Templates, Runnable, Output Parsers
- LangChain_Community - APIs, Vector DBs, Data Loaders, Tools (deals with external sources)
- LangChain_Experimental - new features like incremental data loading etc. (may not be production ready - read the documentation to decide on readiness for production)
The above information is very important for developers when selecting libraries while building AI agents.
Core components of LangChain
1) Models - Models are of three types as below :
- LLM
- Chat Model
- Embedding Model
LLM :
- Traditional or older model type
- Input & output are both text/string
- Largely deprecated; rarely used now.
Chat Model :
- To understand the chat model, we must be aware of :
- System Message - Instruction to model (Ex : You are a helpful assistant user chat bot )
- User Message - User question (Ex : What is the capital of India ? )
- AI Message - Response from model (Ex : Capital of India is New Delhi )
Embedding Model :
- An embedding model converts text into numbers (vectors) that capture its meaning (semantics).
- Instead of generating text like an LLM, it creates a mathematical representation of text as below.
- "I love dogs" will be converted into some vector like [0.21, -0.45, 0.88, ..., 0.12]
- Embeddings will help when we are constructing a RAG application
Sample code for Embedding Model as below :
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
##https://platform.openai.com/docs/guides/embeddings
load_dotenv()
embedding = OpenAIEmbeddings(model="text-embedding-3-large")
result = embedding.embed_query("Delhi is the capital of India")
print(len(result))
Please refer to the following OpenAI documentation if you need more information about the above embedding model : https://developers.openai.com/api/docs/models/text-embedding-3-large
Also note that I will use OpenAI models going forward, hence we need an OpenAI key. We can generate it here : https://platform.openai.com/api-keys (Please recharge at least $5 and we can use it for learning AI. Do not share that key with others !)
In case you are not comfortable generating an OpenAI key, we can use a free key from Hugging Face : https://huggingface.co/
Remember, accuracy will generally be higher for OpenAI models, which are closed source.
Please see the below image to understand the retrieval part of a RAG flow (not the full flow of RAG). We have to divide the input files into chunks that fit the context window size, then convert them into embeddings and store them in a vector DB.
The vector DB could be FAISS, MongoDB, MySQL, Oracle, etc. We will see more implementation in future blogs of Agentic AI.
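Chunking itself is a simple idea and can be sketched in pure Python. Real pipelines use LangChain's text splitters; the tiny sizes here are only for illustration.

```python
# Sliding-window chunking: fixed-size chunks with a small overlap.
def chunk_text(text: str, chunk_size: int = 20, overlap: int = 5):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap    # step forward, keeping some overlap
    return chunks

doc = "Delhi is the capital of India and a very large city."
for c in chunk_text(doc):
    print(repr(c))
```

Note the overlap: the last few characters of one chunk repeat at the start of the next, so content at chunk boundaries is not lost without context.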
Where do we keep the models' API keys while programming ? Put them in a .env file under your project directory. We should not hardcode them. We are charged per token based on this API key; if you hardcode it, it becomes public and anyone can use your key. At run time, we load this .env file and use the keys. The below code does this work.
from dotenv import load_dotenv
load_dotenv()
Let's see some code !
Implementation of LLM model :
from langchain_openai import OpenAI #Wrapper for openAI from langchain
from dotenv import load_dotenv # dotenv provides load_dotenv
load_dotenv() # initiating .env file load process
llm = OpenAI(model='gpt-3.5-turbo-instruct')
# Created instance with name as llm for OpenAI class by passing model name
result = llm.invoke("What is the capital of India ?")
# calling llm.invoke() to send request & get response from model
print(result)
We can see all the available GPT models from this official website of OpenAI : https://developers.openai.com/api/docs/models/all
Output :
Now this program works like ChatGPT. If we ask the same question in ChatGPT, we will get a similar response. The only difference is that the current version of ChatGPT uses GPT-5.2, while in the above program we are using an older model.
Implementation of the Chat model :
# importing ChatOpenAI as we are using the Chat Model
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
# Setting temperature=0 for deterministic output
# If we need creativity, increase the value like temperature = 1.5
llm = ChatOpenAI(model='gpt-4', temperature=0)
result = llm.invoke("Suggest 5 indian names")
# invoke() returns a message object; the 'content' attribute holds the generated names
print(result.content)
Please see comments in the code for more information.
The difference between the LLM & Chat model is the class name and the return type : the LLM model returns plain text/string, but the Chat model returns a message object (AIMessage), from which we read .content.
Implementation of Embedding model :
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
load_dotenv()
embedding = OpenAIEmbeddings(model='text-embedding-3-large')
result = embedding.embed_query("Delhi is the capital of India")
print(len(result))
Output :
3072
Note : Try printing just result; you will see the embedding vector with 3072 values/dimensions.
- We use the OpenAIEmbeddings class from the langchain_openai wrapper
- While developing RAG, after extracting data from the input, it is converted into chunks and then stored in a vector DB. This is where embeddings are used.
- Note that we used embed_query() to convert a single query into a vector/embedding.
In the above code, we used only one query, i.e. "Delhi is the capital of India". Let's see how to convert multiple queries into embeddings using the embedding model, as below.
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
##https://platform.openai.com/docs/guides/embeddings
load_dotenv()
documents = [
"Delhi is the capital of India",
"Hyderabad is the capital of Telangana",
"Paris is the capital of France"
]
embedding = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=10)
result = embedding.embed_documents(documents)
print(str(result))
Output : Note that it printed embeddings for the 3 input documents.
[[-0.28195515275001526, 0.47907423973083496, -0.012972092255949974, 0.778949320316314, -0.01074627973139286, 0.20619246363639832, -0.048287373036146164, 0.1165928840637207, -0.1593511700630188, 0.010469825007021427],
[0.2670007646083832, 0.5190368890762329, -0.07652971148490906, 0.09313521534204483, 0.05723319947719574, 0.1342223584651947, 0.08466838300228119, -0.15410958230495453, -0.4568154513835907, -0.6195887923240662],
[-0.22880549728870392, 0.678465723991394, -0.07834821194410324, 0.4880889356136322, -0.2670133709907532, -0.12555591762065887, 0.2731972932815552, -0.1368194967508316, -0.01236096303910017, -0.24978668987751007]]
- Note that we have used embed_documents() here to convert the input documents into embeddings/vectors.
- Also, for simplicity we limited the dimensions to 10 (from 3072) - otherwise the above print statement would have printed 3072-dimension vectors, the native size of "text-embedding-3-large"
- Important thing to remember : controlling the dimensions controls the cost. In real projects, try running multiple times with a lower number of dimensions and check the accuracy/groundedness of the response; if we get a good response with a lower number of dimensions, set that value permanently. It will reduce the cost a lot.
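The tuning workflow described above can be sketched in pure Python: compare the ranking produced by full-size vectors against truncated ones. The numbers below are toy values, not real embeddings; in practice you would re-embed with a smaller dimensions value and compare answer quality.

```python
# Compare retrieval ranking at full vs reduced dimensionality (toy data).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query_full = [0.9, 0.1, 0.2, 0.05]
docs_full = {"delhi": [0.85, 0.15, 0.25, 0.1], "paris": [0.1, 0.9, 0.1, 0.4]}

def rank(query, docs, dims):
    q = query[:dims]                     # toy truncation, for illustration only
    return sorted(docs, key=lambda d: cosine(q, docs[d][:dims]), reverse=True)

print(rank(query_full, docs_full, dims=4))   # full dimensions
print(rank(query_full, docs_full, dims=2))   # cheaper; check if ranking holds
```

If the cheaper ranking matches the full one on your evaluation set, the lower dimension count is usually good enough.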
2) Prompts - We are going to discuss basic to intermediate-level prompts in this blog. I will cover production-grade prompts in an upcoming blog with real-time use cases.
Implementation of Static Prompt :
# Not flexible for dynamic user input.
# Cannot adapt to changing conversation flow.
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
# If we don't mention a model name, it uses the default model
model = ChatOpenAI()
result = model.invoke("Explain langchain?")
print(result.content)
Output :
LangChain is a decentralized open-source blockchain project that aims to revolutionize
the language service industry. It uses Artificial Intelligence and blockchain
technology to translate language. The main goal of LangChain is to reduce the cost of
language services and provide fair share of profits to participating translators.
It includes features like trustable translation, personalized learning,
and secured data. The platform typically uses LangCoin tokens to support
its ecosystem and facilitate transactions.
- The above example represents a static prompt
- We can't change that prompt at run time
- This is not the correct approach; we are learning the incorrect approaches before landing at the correct one!
Implementation of Dynamic Prompt :
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI()
prompt = PromptTemplate(
input_variables=["topic"],
template="Explain {topic} in simple terms."
)
formatted_prompt = prompt.format(topic="Agentic AI")
print(formatted_prompt)
print("++++++++++++++++++++++++++++++++++++++++")
result = model.invoke(formatted_prompt)
print(result.content)
Explain Agentic AI in simple terms.
++++++++++++++++++++++++++++++++++++++++
Agentic AI refers to artificial intelligence systems that are designed to act
autonomously and make decisions on behalf of users, without direct input or
intervention. These systems can perform tasks such as gathering and analyzing data,
making recommendations, and controlling systems or devices, all without human
intervention. Agentic AI is a powerful tool that can help automate complex
processes and improve efficiency in a wide range of industries.
- For a dynamic prompt, we use the PromptTemplate class from langchain_core.prompts
- Observe the instance created for PromptTemplate
- We have an input_variables list and a template
- Instead of hardcoding, we are dynamically sending the topic
In real time, we use PromptTemplate for generating the prompt.
Implementation of a dynamic prompt by randomly selecting a template from a list of templates :
# 💡 Definition:
# A flexible and adaptive prompt that changes based on user input, external data, or context.
# 📌 Characteristics:
# - The prompt adjusts dynamically.
# - Uses real-time variables (user input, API results, etc.).
# - Good for personalized, multi-step conversations.
import random
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv # ✅ Updated import (newer LangChain versions)
load_dotenv()
model = ChatOpenAI()
# Define multiple prompt templates
templates = [
"Summarize {topic} in one paragraph.",
"Give a brief explanation of {topic}.",
"Explain {topic} as if I am a house wife."
]
# Randomly select a template
selected_template = random.choice(templates)
print("🔹 Selected Template:", selected_template)
# Create a PromptTemplate object
prompt = PromptTemplate(
input_variables=["topic"],
template=selected_template
)
# Format the prompt with a specific topic
formatted_prompt = prompt.format(topic="Artificial Intelligence")
print("✅ Formatted Prompt:", formatted_prompt)
result = model.invoke(formatted_prompt)
print("✅ Result:", result.content)
We have randomly selected the prompt from a list of templates in the above example.
Implementing a context-aware chat prompt :
# 📌 Use Cases:
# ✔️ Conversational AI & Chatbots (context-aware prompts).
# ✔️ Adaptive Question Answering (modifies based on previous responses).
# ✔️ Personalized User Interactions (e.g., changing prompts based on user profiles).
# ❌ Limitations:
# More complex to implement than static prompting.
# Requires external logic (e.g., history tracking).
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
chat_prompt = ChatPromptTemplate.from_messages([
("system", "You are a sales assistant."),
MessagesPlaceholder(variable_name="chat_history"),
("human", "{question}")
])
formatted_messages = chat_prompt.format_messages(
chat_history=[
HumanMessage(content="Suggest a destination for a summer vacation."),
AIMessage(content="How about Bali, Indonesia? It's great for summer!")
],
question="What are the best activities to do there?"
)
llm = ChatOpenAI(model='gpt-4')
result = llm.invoke(formatted_messages)
print(result.content)
Explanation :
- This code builds a chat-based AI assistant with memory (chat history) and sends it to the LLM
- Flow :
- Load environmental variables
- Create a prompt template
- Inject chat history + new question
- Send to LLM
- Print response
- from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
- ChatPromptTemplate - Helps structure chat prompts (system + user + history)
- MessagesPlaceholder - Allows dynamic insertion of chat history
- from langchain_core.messages import AIMessage, HumanMessage
- HumanMessage - represents user input
- AIMessage - represents the model's response
- The above 2 classes simulate a real conversation
- from langchain_openai import ChatOpenAI
- Connects to OpenAI chat model
- Prompt template creation :
MessagesPlaceholder(variable_name="chat_history") - This is dynamic memory injection. At run time, this will be replaced with past conversation
("human", "{question}") - {question} is a variable that will be filled later
- Formatting the prompt :
chat_history=[
HumanMessage(content="Suggest a destination for a summer vacation."),
AIMessage(content="How about Bali, Indonesia? It's great for summer!")
] - This simulates, User - Suggest a destination; AI - Bali
- New question :
- question="What are the best activities to do there?"
- Final prompt sent to LLM :
- System: You are a sales assistant.
- User: Suggest a destination for a summer vacation.
- AI: How about Bali, Indonesia? It's great for summer!
- User: What are the best activities to do there?
- This is why the model understands “there” = Bali
- Initialize LLM
- llm = ChatOpenAI(model='gpt-4')
- This is a chat model, not a simple LLM
- Invoke the model
- Output
Key Concepts we should take away :
- PromptTemplates - Reusable structure for AI inputs
- Chat History = Memory
- Message Types
- System - behavior
- Human - user input
- AI - model response
- Context Awareness
- Because of the history, "there" is understood as Bali
Implementing a prompt template with multiple variables :
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI()
# Define a template with multiple placeholders
template = PromptTemplate(
input_variables=["name", "hobby"],
template="Hello {name}! I heard you like {hobby}. Can you tell me more about it?"
)
# Format with different values
formatted_prompt = template.format(name="Anil", hobby="cricket")
print(formatted_prompt)
print("*************************************")
result = model.invoke(formatted_prompt)
print(result.content)
We have already seen this pattern; the only change is that we are handling multiple variables in the prompt.
Implementation of Few-Shot Prompting with FewShotPromptTemplate :
from langchain_core.prompts import FewShotPromptTemplate
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI()
# Define examples
examples = [
{"input": "Explain AI", "output": "AI is the simulation of human intelligence in machines."},
{"input": "Explain Blockchain", "output": "Blockchain is a decentralized digital ledger."}
]
# Define an example template
example_template = PromptTemplate(
input_variables=["input", "output"],
template="Q: {input}\nA: {output}"
)
# Create Few-Shot PromptTemplate
few_shot_prompt = FewShotPromptTemplate(
examples=examples,
example_prompt=example_template,
prefix="Answer the following questions:",
suffix="Q: {question}\nA:",
input_variables=["question"]
)
# Format the prompt
formatted_prompt = few_shot_prompt.format(question="Explain Data Science")
print(formatted_prompt)
result = model.invoke(formatted_prompt)
print(result.content)
Output :
Answer the following questions:
Q: Explain AI
A: AI is the simulation of human intelligence in machines.
Q: Explain Blockchain
A: Blockchain is a decentralized digital ledger.
Q: Explain Data Science
A:
Data Science is the study of data, involving the collection, analysis,
interpretation, and presentation of large amounts of data to gain insights
and make decisions.
Implementing a chatbot in a simplified way :
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI()
chat_history = [
SystemMessage(content='You are a helpful AI assistant')
]
while True:
    user_input = input('You: ')
    if user_input == 'exit':
        break
    chat_history.append(HumanMessage(content=user_input))
    result = model.invoke(chat_history)
    chat_history.append(AIMessage(content=result.content))
    print("AI: ", result.content)
print(chat_history)
See how simple it is to implement a chatbot !
Note : We have covered the Models and Prompts concepts. We will dig deeper into these in future; to dig deep, we need the basics, hence we covered them here. We have hardcoded all the prompts in code, but in real applications they come at run time from a source. We will see this clearly.
What have we discussed so far in this blog ?
- LangChain
- Models
- LLM Model
- Chat Model
- Embedding Model
- Prompts
- Static Prompt
- Dynamic Prompt using PromptTemplate, MessagePlaceholder, AIMessages, HumanMessages, SystemMessages, FewShotPromptTemplate
We will continue with the remaining topics of LangChain. The pending topics are as below.
- Chains
- Output Parsers
- Indexes
- Data Loaders, Splitting, Embedding, Vector DB, Retriever
- Memory
- Agents
- Callbacks (Logging & Monitoring)
The next important concept is called Chains, Runnables, or LCEL.
3) Chains / Runnables / LCEL (LangChain Expression Language)
When we give a prompt to an LLM, it processes it and gives back a response in some format - say JSON in this case. Prompt, LLM, and Response are 3 different entities, connected using a concept called chains. Using chains, we integrate one step with another in a sequential/parallel manner. How do we represent a chain in LangChain ? Using the pipe symbol '|', which represents the connection of the different entities.
Example : Prompt | LLM | Output
Implementation of Simple Chain :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
load_dotenv()
model = ChatOpenAI()
prompt = PromptTemplate(
template= "Generate 2 important topics about {topic}",
input_variables=["topic"]
)
parser = StrOutputParser()
chain = prompt | model | parser
result = chain.invoke({"topic":"what is chains in langchain"})
print(result)
Note that we have used the StrOutputParser class and included it while creating the chain. The pipe symbol represents a chain in LangChain.
Implementation of a Sequential Chain :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
load_dotenv()
model = ChatOpenAI()
prompt_1 = PromptTemplate(
template= "Generate a detailed report about a {topic}",
input_variables=["topic"]
)
prompt_2 = PromptTemplate(
template= "create a top 5 interview questions {topic}",
input_variables=["topic"]
)
parser = StrOutputParser()
chain = prompt_1 | model | parser | prompt_2 | model | parser
result = chain.invoke({"topic":"challenges with deep neural network"})
print(result)
chain.get_graph().print_ascii()
Explanation :
- Instead of creating one prompt, we have created 2 prompts
- While creating the chain, we have sequentially chained prompt_1 followed by prompt_2
- During execution of the chain, the output from the first parser acts as the input context to prompt_2
Implementing a Parallel chain :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.schema.runnable import RunnableParallel
load_dotenv()
model = ChatOpenAI()
prompt_1 = PromptTemplate(
template= "Generate a detailed report about a \n {topic}",
input_variables=["topic"]
)
prompt_2 = PromptTemplate(
template= "create a top 5 interview questions \n {topic}",
input_variables=["topic"]
)
prompt_3 = PromptTemplate(
template= " merge the provided detailed report and 5 interview questions into a single document \n notes -> {notes} and quiz -> {quiz}",
input_variables=["notes", "quiz"]
)
parser = StrOutputParser()
parallel_chain = RunnableParallel({
"notes": prompt_1 | model | parser,
"quiz" : prompt_2 | model |parser
}
)
merge_chain = prompt_3 | model | parser
chain = parallel_chain | merge_chain
topic = """
Large Language Models (LLMs) are advanced artificial intelligence models designed to process and generate human-like text based on vast amounts of training data. These models, such as OpenAI’s GPT series, Google’s Gemini, and Meta’s LLaMA, use deep learning techniques, particularly transformer architectures, to understand and generate contextually relevant responses. LLMs power a wide range of applications, including chatbots, content creation, code generation, and research assistance. They excel in natural language understanding and generation, making them valuable for automating tasks that require linguistic intelligence. However, they also pose challenges, such as bias, misinformation, and high computational requirements, necessitating careful deployment and ethical considerations.
"""
result = chain.invoke({"topic":topic})
print(result)
chain.get_graph().print_ascii()
Explanation :
- We have created 3 prompts, each with a separate prompt template
- The 1st and 2nd prompts are straightforward, as we can see
- The 3rd prompt merges the outputs of the 1st and 2nd prompts into a single document
- We created the parallel_chain instance of the RunnableParallel class, which takes a dict
- It has 2 keys, i.e. notes and quiz, each with a separate chain as its value
- The values will be the parser outputs of those chains
- We created merge_chain = prompt_3 | model | parser
- Then the final chain, chain = parallel_chain | merge_chain
- The result will have the detailed report from the 1st prompt and the 5 interview questions from the 2nd prompt
The main takeaway from this implementation is using the RunnableParallel class to create parallelism.
Let's understand RunnableLambda :
RunnableLambda means : wrap my Python function so LangChain can use it in pipelines.
# 1. RunnableLambda
# A RunnableLambda allows wrapping a simple Python function into a
# LangChain-compatible Runnable.
from langchain_core.runnables import RunnableLambda
# Define a simple function
def reverse_string(s: str) -> str:
return s[::-1]
# Convert it into a Runnable
runnable = RunnableLambda(reverse_string)
# Execute
print(runnable.invoke("LangChain")) # Output: "niahCgnaL"
Explanation :
- Remember that we are talking about Chains, Runnables & LCEL. Hence everything must be a Runnable object to be used in runnable pipelines
- The above code shows how to convert a simple function into a Runnable so that we can use it in chains
- The RunnableLambda class helps us achieve this task
- We have a simple function to reverse a string, which we converted into a RunnableLambda and used to reverse the string "LangChain"
- We haven't used any model here. This example is all about converting a plain function into a Runnable
RunnablePassthrough : Generally in Python, if we don't want to define any code inside a function, we simply use 'pass'. Similarly, in chains we have a class called RunnablePassthrough, which returns its input as-is, as below. We will see why we need this at a later stage.
# 2. RunnablePassthrough
# A RunnablePassthrough is a basic implementation that returns
# the input as output without any modifications. It is useful when
# integrating components that do not require processing at a certain stage.
from langchain_core.runnables import RunnablePassthrough
runnable = RunnablePassthrough()
print(runnable.invoke("Hello, World!")) # Output: "Hello, World!"
RunnableParallel : This class is useful for running multiple Runnable components in parallel; it outputs a dictionary of results.
# 3. RunnableParallel
# RunnableParallel allows running multiple Runnable components in parallel,
# and it outputs a dictionary of results.
from langchain.schema.runnable import RunnableParallel, RunnableLambda, RunnablePassthrough
# Define multiple runnables
# uppercase_runnable = RunnableLambda(lambda x: x.upper())
# reverse_runnable = RunnableLambda(lambda x: x[::-1])
# Run them in parallel
parallel_runnable = RunnableParallel({
"uppercase": RunnableLambda(lambda x: x.upper()),
"reverse": RunnableLambda(lambda x: x[::-1]),
"same data": RunnablePassthrough()
})
print(parallel_runnable.invoke("LangChain"))
# Output: {'uppercase': 'LANGCHAIN', 'reverse': 'niahCgnaL', 'same data': 'LangChain'}
The program is self-explanatory : we use the RunnableParallel class to run a dictionary of runnables, each with its own implementation, and each output is associated with its key (uppercase, reverse, same data).
RunnableMap : applying a single operation to each element of an input list. We can achieve this using the RunnableLambda class itself.
# 4. RunnableMap
# A RunnableMap works like RunnableParallel but applies a single Runnable to each element of an input list.
from langchain_core.runnables import RunnableLambda
# Define a RunnableLambda that applies upper() to each element in a list
uppercase_runnable = RunnableLambda(lambda x: [word.upper() for word in x])
# Invoke with a list of strings
print(uppercase_runnable.invoke(["hello", "world"]))
# Output: ['HELLO', 'WORLD']
RunnableBranch :
# 5. RunnableBranch
# A RunnableBranch allows conditional branching, executing different Runnable components based on a condition.
from langchain_core.runnables import RunnableBranch,RunnableLambda
# Define different functions
uppercase = RunnableLambda(lambda x: x.upper())
reverse = RunnableLambda(lambda x: x[::-1])
default = RunnableLambda(lambda x: f"Unknown: {x}")
# Create a branch
branch = RunnableBranch(
(lambda x: "uppercase" in x, uppercase),
(lambda x: "reverse" in x, reverse),
default # Default branch
)
print(branch.invoke("uppercase me")) # Output: "UPPERCASE ME"
print(branch.invoke("reverse me")) # Output: "em esrever"
print(branch.invoke("something else")) # Output: "Unknown: something else"
Another example of RunnableBranch :
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from dotenv import load_dotenv
from langchain_core.runnables import RunnableSequence, RunnableParallel, RunnablePassthrough, RunnableBranch, RunnableLambda
load_dotenv()
prompt1 = PromptTemplate(
template='Write a detailed report on {topic}',
input_variables=['topic']
)
prompt2 = PromptTemplate(
template='Summarize the following text \n {text}',
input_variables=['text']
)
model = ChatOpenAI()
parser = StrOutputParser()
report_gen_chain = prompt1 | model | parser
branch_chain = RunnableBranch(
(lambda x: len(x.split())>300, prompt2 | model | parser),
RunnablePassthrough()
)
final_chain = RunnableSequence(report_gen_chain, branch_chain)
print(final_chain.invoke({'topic':'IPL 2025'}))
final_chain.get_graph().print_ascii()
Explanation :
- RunnableBranch evaluates its condition on the output of report_gen_chain.
- If the report generated from prompt1 is longer than 300 words (len(x.split()) > 300), the summarization chain (prompt2 | model | parser) runs.
- Otherwise, RunnablePassthrough returns the report unchanged.
3) Output Parsers
Output parsers instruct the LLM to write data in a specific format and then parse the model's raw response into that format. We have the below types of output parsers, which we can use extensively.
- String output parser
- JSON output parser
- Structured output parser
- Pydantic output parser
- CSV output parser
StrOutputParser :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
load_dotenv()
model = ChatOpenAI()
# 1st prompt -> detailed report
template1 = PromptTemplate(
template='Write a detailed report on {topic}',
input_variables=['topic']
)
# 2nd prompt -> summary
template2 = PromptTemplate(
template='Write a 5 line summary on the following text. \n {text}',
input_variables=['text']
)
parser = StrOutputParser()
chain = template1 | model | parser | template2 | model | parser
result = chain.invoke({'topic':'Generative AI'})
print(result)
JsonOutputParser :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
load_dotenv()
model = ChatOpenAI()
parser = JsonOutputParser()
template = PromptTemplate(
template='Give me 5 facts about {topic} \n {format_instruction}',
input_variables=['topic'],
partial_variables={'format_instruction': parser.get_format_instructions()}
)
chain = template | model | parser
result = chain.invoke({'topic':'Generative AI'})
print(result)
# 👉 In PromptTemplate, there are two types of variables:
# 1. input_variables
# Provided at runtime
# Example:
# chain.invoke({'topic': 'Generative AI'})
# 2. partial_variables
# Provided at template creation time
# Automatically injected
# User does NOT pass them later
# ✅ So this line means:
# 👉 “Before running the chain, always fill {format_instruction}
# with this value.”
Please see the comments in the above program for a detailed explanation of the code. The main takeaway is that a PromptTemplate has two kinds of variables: input_variables and partial_variables.
- input_variables come into the picture at runtime, as shown in the code
- partial_variables are injected at template-declaration time itself
- within the template, format_instruction is set based on the parser
- with JsonOutputParser, the output is parsed as a JSON/dict type
- with CommaSeparatedListOutputParser (the CSV parser), the output is parsed as a comma-separated list
parser = JsonOutputParser()
template = PromptTemplate(
template='Give me 5 facts about {topic} \n {format_instruction}',
input_variables=['topic'],
partial_variables={'format_instruction': parser.get_format_instructions()}
)
As per the above code, format_instruction is set based on the parser (parser.get_format_instructions()). In case we need our own instructions, we can subclass the parser and override get_format_instructions() with our own implementation.
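The partial-binding idea can be illustrated with plain Python's functools.partial. This is only an analogy for how partial_variables behave (not LangChain's actual implementation), and the function and instruction text are made up:

```python
from functools import partial

def render_prompt(format_instruction: str, topic: str) -> str:
    # Emulates a PromptTemplate: topic is the runtime "input variable",
    # format_instruction is bound once at "template creation" time.
    return f"Give me 5 facts about {topic}\n{format_instruction}"

# Bind the format instruction up front, like partial_variables
template = partial(render_prompt, "Return the answer as a JSON object.")

# Only the runtime variable is supplied at invocation, like input_variables
print(template("Generative AI"))
```

At invocation time the caller never sees format_instruction, exactly like partial_variables being "automatically injected".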
We will extensively use this JSON output parser while building Agents.
StructuredOutputParser :
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StructuredOutputParser, ResponseSchema
load_dotenv()
# Define the model
model = ChatOpenAI()
schema = [
ResponseSchema(name='fact_1', description='Fact 1 about the topic'),
ResponseSchema(name='fact_2', description='Fact 2 about the topic'),
ResponseSchema(name='fact_3', description='Fact 3 about the topic'),
]
parser = StructuredOutputParser.from_response_schemas(schema)
template = PromptTemplate(
template='Give 3 facts about {topic} \n {format_instruction}',
input_variables=['topic'],
partial_variables={'format_instruction':parser.get_format_instructions()}
)
chain = template | model | parser
result = chain.invoke({'topic':'Generative AI'})
print(result)
This is like defining a table schema: we create the schema and pass it while creating the parser instance, as shown in the above code. It produces output in JSON format, which we can use to query data from a DB, or dump into databases like MongoDB, Cassandra, Oracle etc.
PydanticOutputParser : This class is extensively used to generate output in a structured, validated format, ready to save/load in a DB in real time. We create a class called Person inheriting BaseModel, with fields name (str), age (float, must be greater than 18) and city (str), and pass this class as the pydantic_object while creating the PydanticOutputParser instance, as shown in the code below.
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
load_dotenv()
# Define the model
model = ChatOpenAI()
class Person(BaseModel):
name: str = Field(description='Name of the person')
age: float = Field(gt=18, description='Age of the person')
city: str = Field(description='Name of the city the person belongs to')
parser = PydanticOutputParser(pydantic_object=Person)
template = PromptTemplate(
template='Generate the name, age and city of a fictional {place} person \n {format_instruction}',
input_variables=['place'],
partial_variables={'format_instruction':parser.get_format_instructions()}
)
chain = template | model | parser
final_result = chain.invoke({'place':'Iran'})
print(final_result)
Mostly we will use either JSON or Pydantic output parsers in real time.
RAG - Retrieval-Augmented Generation
- Retrieval
- Augmentation
- Generation
Let us say our project's data lives in the following sources.
- DB, files
- Confluence, PPTs
- Excel (.xls, .xlsx)
- HTML
- JSON, web pages etc.
Retrieval mechanism
Step1 : Data extraction from source.
Step2 : Chunking
Step3 : Convert chunks into Embeddings
Step4 : Store Embeddings into vector DB
Augmentation mechanism : The user query is converted into embeddings (split into chunks first only if it is very long). These embeddings are used to search the vector DB for similar entries (this is similarity search; keyword search is also possible). The matched context, along with the user input, is sent to the LLM. This entire process is called Augmentation.
Generation mechanism : The LLM finally articulates the output. This is called Generation.
This entire process of Retrieval, Augmentation & Generation is called RAG.
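The whole flow above can be sketched end to end with a toy bag-of-words "embedding" and cosine similarity. Every document, function name, and query here is an illustrative stand-in; real systems use learned embedding models and a vector DB:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words term counts (stand-in for a real model)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-2: source data, already "chunked"
chunks = [
    "LangChain is a framework for building LLM applications.",
    "FAISS stores embeddings for fast similarity search.",
    "Chunking splits documents to fit model token limits.",
]
# Steps 3-4: embed each chunk and "store" it (a list plays the vector DB)
store = [(c, embed(c)) for c in chunks]

# Augmentation: embed the query, find the most similar chunk,
# and build a prompt combining context + question for the LLM
query = "What is LangChain?"
best = max(store, key=lambda item: cosine(embed(query), item[1]))
prompt = f"Context: {best[0]}\nQuestion: {query}"
print(prompt)
```

Generation would then be a single LLM call on `prompt`; everything before that call is the Retrieval + Augmentation half of RAG.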
Document Loaders : Document Loaders in RAG are utilities that help you load text data from various sources like PDF, CSV, URL etc. into a standard format (Document objects) for downstream processing like chunking, embedding, retrieval.
Why are document loaders important ?
- Unified format - all documents, no matter the source, are turned into Document objects.
- Metadata retention - you can retain source information like name, URL, author, page number etc.
- Flexible ingestion - load from local files, APIs, databases etc.
Example 1 : Text Loader
from langchain_community.document_loaders import TextLoader
# Load a simple text file
loader = TextLoader("example.txt")
documents = loader.load()
# Print loaded documents
for doc in documents:
print("Content:", doc.page_content)
print("Metadata:", doc.metadata)
- loader is an instance of the TextLoader class
- loader.load() loads the entire file
- content and metadata are printed separately
Example 2 : PDF Loader
from langchain_community.document_loaders import PyPDFLoader
# Load a PDF document
loader = PyPDFLoader("AttentionAllYouNeed.pdf")
documents = loader.load()
# Each page becomes a separate Document
for i, doc in enumerate(documents):
print(f"Page {i+1} Content:\n{doc.page_content[:100]}...")
print("Metadata:", doc.metadata)
print("-" * 50)
With PyPDFLoader, one Document object is created for each page in the PDF. The above .pdf file has 15 pages, so 15 Document objects are created.
As per the above code, the first 100 characters of each page's content are displayed in the output.
Example 3 : Load web page using WebBaseLoader
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://en.wikipedia.org/wiki/LangChain")
documents = loader.load()
for doc in documents:
print("Page Content (excerpt):", doc.page_content[:200])
print("Metadata:", doc.metadata)
Note : In real projects we often have to write custom logic, since many production systems won't use LangChain's loaders directly. Just note that Document Loaders are useful for loading data from various sources.
When we implement a project, we write custom logic to handle data extraction from all kinds of sources like Jira, Confluence, GitHub, DBs etc., and logic to handle all file formats and complex PDFs, with incremental data loading support.
Now, once data loading part is done, next step in RAG, especially retrieval part is chunking. For chunking we have Text Splitters in Lang Chain.
In a RAG system, data chunking is crucial because LLMs and embedding models have input token limits. Chunking ensures information is split meaningfully to preserve content and semantics while remaining within token constraints.
We have 3 types of splitting :
- Character based
- Word based
- Token based - subword tokenization; a library like tiktoken takes care of it
Chunking techniques
Fixed-size chunking (character based) :
def fixed_size_chunking(text, chunk_size=50):
chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
return chunks
# Sample text
text = "This is a simple example to demonstrate fixed size chunking. We split the text into chunks of equal length."
# Call the function
chunks = fixed_size_chunking(text, chunk_size=50)
# Display results
for i, chunk in enumerate(chunks):
print(f"Chunk {i+1}:")
print(chunk)
print("-" * 40)
This just prints chunks of size 50. But we lose the meaning of the context, because we are chunking character by character; this issue exists for all fixed-size chunking.
Fixed-size chunking is the most basic chunking technique. Because of this issue, in real projects we use semantic, LLM-based, parent-child, or document-based chunking.
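As a small step toward those context-preserving techniques, here is a sketch of sentence-based chunking, which packs whole sentences into chunks and never cuts a sentence in half (the helper name and sample text are made up for illustration):

```python
import re

def sentence_chunking(text: str, max_chars: int = 80) -> list:
    # Split on sentence boundaries, then greedily pack whole
    # sentences into chunks of at most max_chars characters
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

text = ("This is a simple example. It demonstrates sentence chunking. "
        "No sentence is ever cut in half.")
for i, chunk in enumerate(sentence_chunking(text, max_chars=60)):
    print(f"Chunk {i+1}: {chunk}")
```

Unlike the character-based version, a chunk boundary here can only fall between sentences, so each chunk stays readable on its own.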
Output :
Chunk 1:
This is a simple example to demonstrate fixed size
----------------------------------------
Chunk 2:
chunking. We split the text into chunks of equal
----------------------------------------
Chunk 3:
length.
----------------------------------------
Fixed-size chunking (word based) :
def fixed_word_chunking(text, chunk_size=10):
words = text.split()
chunks = [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]
return chunks
# Sample text
text = "This is a simple example to demonstrate fixed size chunking. We split the text into chunks of equal length."
# Call the function
chunks = fixed_word_chunking(text, chunk_size=10)
# Display results
for i, chunk in enumerate(chunks):
print(f"Chunk {i+1}:")
print(chunk)
print("-" * 40)
Output :
Chunk 1:
This is a simple example to demonstrate fixed size chunking.
----------------------------------------
Chunk 2:
We split the text into chunks of equal length.
----------------------------------------
Fixed-size chunking (Token based with tiktoken) :
import tiktoken
def fixed_token_chunking(text, chunk_size=10):
enc = tiktoken.get_encoding("cl100k_base") # Use encoding for OpenAI models
tokens = enc.encode(text)
chunks = [tokens[i:i+chunk_size] for i in range(0, len(tokens), chunk_size)]
return [enc.decode(chunk) for chunk in chunks]
# Sample text
text = "This is a simple example to demonstrate fixed size chunking. We split the text into chunks of equal length."
# Call the function
chunks = fixed_token_chunking(text, chunk_size=10)
# Display results
for i, chunk in enumerate(chunks):
print(f"Chunk {i+1}:")
print(chunk)
print("-" * 40)
For token-based chunking, we use a library called tiktoken. It has two methods: encode converts text into token IDs, and decode converts token IDs back into text.
cl100k_base is the encoding used by OpenAI models. The code is almost the same as before; we just add the encode and decode steps.
Output :
Chunk 1:
This is a simple example to demonstrate fixed size chunk
----------------------------------------
Chunk 2:
ing. We split the text into chunks of equal
----------------------------------------
Chunk 3:
length.
----------------------------------------
But in all the above methods, we still lose context at chunk boundaries. Hence overlap comes into the picture.
In sliding-window chunking, consecutive chunks overlap by a certain number of words or tokens. Purpose : preserve context between chunks.
Parameters :
chunk_size : 512 tokens
overlap : 50 - 100 tokens
Great for minimizing context loss during splitting.
Example : Sliding window chunking(word based)
def sliding_window_chunking(text, chunk_size=10, overlap=3):
words = text.split()
step = chunk_size - overlap
chunks = [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), step)]
return chunks
# Sample text
text = "This is a simple example to demonstrate sliding window chunking. It helps preserve context between chunks by overlapping."
# Call the function
chunks = sliding_window_chunking(text, chunk_size=10, overlap=3)
# Display results
for i, chunk in enumerate(chunks):
print(f"Chunk {i+1}:")
print(chunk)
print("-" * 40)
Output :
Chunk 1:
This is a simple example to demonstrate sliding window chunking.
----------------------------------------
Chunk 2:
sliding window chunking. It helps preserve context between chunks by
----------------------------------------
Chunk 3:
between chunks by overlapping.
----------------------------------------
Once chunking is done, we need to convert these chunks into embeddings. Let's see the embeddings process.
Implementing Embeddings :
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
# 1. Load a sample text file
loader = TextLoader("C:/Personal/2024/Learning/Generative AI/Agents_Practice/Langchain/5_Indexes/9_Document Loaders/example.txt")
documents = loader.load()
# 2. Split text into chunks
text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
# 3. Initialize OpenAI Embeddings
embedding_model = OpenAIEmbeddings()
# 4. Create a FAISS vector store from documents using embeddings
vectorstore = FAISS.from_documents(docs, embedding_model)
# 5. Confirm the vectorstore is built
print("✅ FAISS vector store created with", len(docs), "chunks.")
Output :
✅ FAISS vector store created with 2 chunks.
Instead of writing custom logic, the above code uses existing LangChain classes to chunk the data and convert the chunks into embeddings.
FAISS is one of the vector stores available in LangChain; we use it to store the embeddings. In real projects, a dedicated vector DB is used for embeddings, often alongside regular databases like MySQL or MongoDB for the rest of the data.
Note : This entire process of extracting data from a source, converting it into chunks, converting the chunks into embeddings, and storing those embeddings in a vector DB is the ingestion side of RAG. Some books call this step Indexing, and Retrieval and Indexing are often used interchangeably for it.
This is a basic RAG; we will see production-grade RAG soon.
Augmentation :
As part of augmentation process, we need to do similarity search for user query in vector DB as implemented in the below code.
query = "What is LangChain?"
results = vectorstore.similarity_search(query, k=1)
for i, doc in enumerate(results):
print(f"🔍 Result {i+1}:\n{doc.page_content}\n")
Output :
🔍 Result 1:
LangChain is a powerful framework for building applications with language models.
It provides abstractions and utilities to make LLM-powered apps easier to develop.
similarity_search() converts the query string into an embedding and then performs a similarity search in the vector DB. Keyword search can be used as well.
k=1 means it returns the top 1 result; with k=3, it would return the top 3 results.
Once similar embeddings are matched in the vector DB, both the query and the retrieved context are sent to the LLM for processing. This is very important to understand. If part of the retrieval should use similarity search and the rest keyword search, it is called Hybrid Search.
Hybrid Search = Semantic Search + Keyword Search
For keyword search, we have the BM25 algorithm.
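BM25 ranks documents by how often they contain the query's terms, weighted by how rare each term is in the corpus. A compact sketch using the usual default constants k1=1.5 and b=0.75, plus one simple way to blend BM25 with a made-up semantic score into a hybrid score (the documents and scores are illustrative):

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    # Score each document against the query with BM25
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)  # document frequency
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # rarity weight
            f = tf[term]  # term frequency in this document
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores

docs = [
    "LangChain builds LLM applications",
    "BM25 is a keyword ranking function",
    "Vector search uses embeddings",
]
scores = bm25_scores("keyword ranking", docs)
best = docs[scores.index(max(scores))]
print(best)

# Hybrid search: normalize BM25, then take a weighted sum
# with an illustrative semantic-similarity score per document
semantic = [0.2, 0.5, 0.9]
norm = [s / max(scores) for s in scores] if max(scores) > 0 else scores
hybrid = [0.5 * a + 0.5 * k for a, k in zip(semantic, norm)]
```

The 0.5/0.5 weights are a tunable design choice; production systems often use reciprocal rank fusion instead of a raw weighted sum.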
In real projects, we will use vector DBs like Pinecone, Qdrant, Milvus etc. We will see more details going forward.
Retrievers
Implementing Vector Store Retriever:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
# Step 1: Your source documents
documents = [
Document(page_content="LangChain helps developers build LLM applications easily."),
Document(page_content="Chroma is a vector database optimized for LLM-based search."),
Document(page_content="Embeddings convert text into high-dimensional vectors."),
Document(page_content="OpenAI provides powerful embedding models."),
]
# Step 2: Initialize embedding model
embedding_model = OpenAIEmbeddings()
# Step 3: Create Chroma vector store in memory
vectorstore = Chroma.from_documents(
documents=documents,
embedding=embedding_model,
collection_name="my_collection"
)
# Step 4: Convert vectorstore into a retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
query = "What is Chroma used for?"
results = retriever.invoke(query)
for i, doc in enumerate(results):
print(f"\n--- Result {i+1} ---")
print(doc.page_content)
Output :
--- Result 1 ---
Chroma is a vector database optimized for LLM-based search.
--- Result 2 ---
LangChain helps developers build LLM applications easily.
Note that we are still in the Retrieval & Augmentation phases; we haven't gone into the Generation phase, where the LLM is used.
Also, as we are in the learning phase, we are using fixed-size chunking and Chroma as the vector DB. In real projects, we use different mechanisms altogether, which we will see in future blogs.
Let's see a basic agent implementation :
from langchain.agents import initialize_agent, AgentType
from langchain.agents import Tool
from langchain_openai import ChatOpenAI
from langchain_community.utilities import SerpAPIWrapper
from dotenv import load_dotenv
load_dotenv()
# Define a tool (SerpAPI for web search)
search = SerpAPIWrapper()
tools = [
Tool(
name="Search",
func=search.run,
description="Useful for answering questions about current events",
),
]
# LLM
llm = ChatOpenAI(temperature=0)
# Initialize agent
agent = initialize_agent(
tools,
llm,
agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
verbose=True,
)
# Run
#response = agent.run("What's the latest news about generative AI?")
response = agent.run("""A company is designing a water tank shaped like a cylinder with a hemisphere on top (like a capsule). The cylindrical part has a height of 20 meters and a radius of 7 meters. The hemispherical dome on top has the same radius (7 meters).Compute the total volume of the tank.
If the tank is filled with water up to 80% of its capacity, calculate the volume of water stored.
Suppose the cost of painting the outer surface (only the curved cylinder + hemisphere, not the base) is $15 per square meter. Compute the total painting cost.
Provide both the exact symbolic answer (in terms of π) and the approximate numerical value rounded to two decimal places.""")
print(response)
In the above logic, we are using SerpAPIWrapper, which wraps the SerpAPI web search service. We can also implement a tool with custom logic, as below.
from langchain.agents import Tool, initialize_agent, AgentType
from langchain_openai import ChatOpenAI
# Define a custom tool
def multiply_numbers(query: str) -> str:
numbers = [int(x) for x in query.split() if x.isdigit()]
return str(numbers[0] * numbers[1]) if len(numbers) >= 2 else "Need 2 numbers."
tools = [
Tool(
name="MultiplyTool",
func=multiply_numbers,
description="Multiplies two numbers given in a query like '3 and 4'",
),
]
# LLM
llm = ChatOpenAI()
# Agent
agent = initialize_agent(
tools=tools,
llm=llm,
agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
verbose=True,
)
# Run
print(agent.run("multiply 3 4"))
Simply put, we have implemented a tool, given it a description, and then called this tool from inside the agent.
The agent's verbose output above helps us understand how an agent thinks, almost like a human being.
That's all for this blog. See you in next blog.
Thank you for reading this blog !