
(AI #21) Agentic AI Design Patterns

Agentic AI systems are evolving from simple prompt-response applications into autonomous systems capable of reasoning, planning, and taking actions using tools and external knowledge sources. Depending on the complexity of the workflow, these systems can be designed using either single-agent or multi-agent architectures. A single-agent system centralizes reasoning and decision-making within one intelligent agent, making it suitable for simpler workflows and lightweight automation. In contrast, multi-agent systems distribute responsibilities across specialized agents that collaborate to solve complex tasks more efficiently. Modern production-grade AI platforms increasingly adopt multi-agent and graph-based orchestration patterns to improve scalability, reliability, and observability.


Large Language Models (LLMs)

LLMs are AI models trained on vast amounts of text data to understand and generate human-like text. They power chatbots, code assistants, translation tools, content generation, and more. We have also discussed how input text is converted into tokens through tokenization and then into embeddings for further processing by the model. We have covered the Transformer architecture, encoding & decoding methods, and attention mechanisms such as masked multi-head attention.

We also need to discuss the limitations of LLMs:

  • May generate incorrect or misleading information (hallucination)
  • Lacks real-time knowledge unless connected to external tools
  • May produce biased outputs if the training data is biased
  • High computational cost for training and serving
  • Doesn't truly understand language the way humans do


Retrieval Augmented Generation (RAG)

We have discussed that, before constructing a RAG system, we need to prepare our knowledge base: extract data from the source, chunk it if needed using available chunking strategies, convert the chunks into embeddings, and finally store them in a vector store/database. This entire process is called Indexing. Once indexing is done, we can build the RAG system using intent validation, query expansion, query reformulation, pre/post filtering, semantic/keyword search, re-ranking, etc.
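As a quick illustration, here is a minimal sketch of that indexing stage in Python. This is an assumption-heavy toy, not a real pipeline: `fake_embed` stands in for a real embedding model, and a plain dict stands in for a real vector database.

```python
# Minimal sketch of the indexing stage: extract -> chunk -> embed -> store.
# fake_embed is a stand-in for a real embedding model; a plain dict is a
# stand-in for a real vector database.
import hashlib

def fake_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in for a real embedding model."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size chunking with overlap -- one of several chunking strategies."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def index_document(doc_id: str, text: str, vector_store: dict) -> None:
    """Chunk -> embed -> store: the 'Indexing' process described above."""
    for i, chunk in enumerate(chunk_text(text)):
        vector_store[f"{doc_id}-{i}"] = {"text": chunk, "vector": fake_embed(chunk)}

store: dict = {}
index_document("loan-policy", "Personal loans require a minimum income..." * 50, store)
```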

Now, let's start our journey into Agentic AI.


Agentic AI

Agentic AI systems are autonomous agents that perceive their environment, reason, make decisions, take actions using tools, and learn from outcomes to achieve goals with minimal human intervention.

Simply:

  • LLMs (Reasoning/Thinking power)
  • Tools (RAG, MCP)
  • Memory (Agentic AI memory - part of architecture)
  • Observability (Tracing entire Agentic AI execution)
  • Guardrails (Enable security)
Combined together, these are nothing but Agentic AI.


Typical Agent Loop

Goal (input) -> Planning -> Retrieve -> Act -> Observe -> Reflect (validation) -> Repeat until goal achieved

These seven steps are important to follow whether you build a single-agent or a multi-agent system.
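A minimal skeleton of this loop, assuming hypothetical `plan`, `retrieve`, `act`, and `reflect` callables (each typically backed by an LLM or a tool):

```python
# Skeleton of the seven-step agent loop. All helper callables here are
# hypothetical placeholders, not a specific framework's API.
from typing import Callable

def run_agent(
    goal: str,
    plan: Callable[[str], str],
    retrieve: Callable[[str], str],
    act: Callable[[str, str], str],
    reflect: Callable[[str, str], bool],
    max_iterations: int = 5,
) -> str:
    """Goal -> Plan -> Retrieve -> Act -> Observe -> Reflect -> Repeat."""
    result = ""
    for _ in range(max_iterations):          # cap iterations to avoid infinite loops
        step = plan(goal)                    # Planning: decide the next step
        context = retrieve(step)             # Retrieve: pull supporting knowledge
        result = act(step, context)          # Act: call a tool / LLM
        if reflect(goal, result):            # Observe + Reflect: validate result
            return result                    # goal achieved
    return result                            # best effort after max iterations
```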

  • Start small, define clear goals
  • Provide high-quality tools & data
  • Set guardrails & monitor closely
  • Iterate, learn & scale

LLM vs RAG vs Agentic AI

Please observe the image below carefully to understand the differences between LLM, RAG, and Agentic AI.




Design Patterns of Agentic AI

There are a lot of design patterns for building agentic AI systems, but the three below are proven patterns.
  • ReAct Pattern (Reason + Act in a loop)
  • Hierarchical Pattern (Delegate & Decompose, or Supervisor-Worker)
  • Planner - Executor - Reviewer Pattern (Plan, execute & self-critique)


ReAct Agent Pattern (Reason + Act in a loop)


The agent reasons about the current state, decides on an action, executes it in the environment, observes the result, and repeats until the goal is achieved.

THOUGHT  -> ACTION -> OBSERVATION -> Repeat till goal achieved.
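A rough sketch of this loop, assuming a hypothetical `llm` callable that replies with either an `ACTION:` line or an `ANSWER:` line (real frameworks parse model output far more robustly):

```python
# Minimal single-agent ReAct loop. The `llm` callable is a hypothetical
# placeholder that replies with "ACTION: <tool> <input>" or "ANSWER: <text>".
def react_agent(question: str, tools: dict, llm, max_steps: int = 8) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):                      # cap steps to avoid infinite loops
        response = llm(transcript)                  # THOUGHT: reason over the transcript
        transcript += response + "\n"
        if response.startswith("ANSWER:"):
            return response.removeprefix("ANSWER:").strip()
        if response.startswith("ACTION:"):
            parts = response.split(" ", 2)          # "ACTION:", tool name, tool input
            tool_name = parts[1]
            tool_input = parts[2] if len(parts) > 2 else ""
            observation = tools[tool_name](tool_input)      # ACTION: run the tool
            transcript += f"OBSERVATION: {observation}\n"   # feed result back
    return "Max steps reached without a final answer"
```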

This pattern is not feasible for complex agentic AI solutions; it suits simple workflows. If we try to fit a travel planner agent into this design pattern, we can't even fit all the required actions into one agent (remember, a ReAct agent is a single agent, and its execution is essentially a straight line). So we can't use the ReAct design pattern alone for building complex agentic AI systems.

ReAct is fundamentally a single-agent reasoning-and-tool-use design pattern where the agent iteratively thinks, acts, and observes. It works well for simple to moderately complex workflows. However, for large-scale production agentic AI systems, a pure single-agent ReAct architecture can become difficult to scale due to context growth, tool overload, latency, and reliability concerns. Modern systems therefore extend ReAct using graph-based orchestration, supervisor-worker multi-agent architectures, memory layers, and guardrails. Even in multi-agent systems, many individual agents still internally use the ReAct pattern.


Hierarchical Agent Design Pattern

Delegate & Decompose - Break down complex goals into subgoals and delegate them to specialized agents organized in a hierarchy.


A top-level manager agent receives a goal, decomposes it into subgoals, and delegates them to specialized sub-agents. Sub-agents may further decompose and delegate, forming a hierarchy. Results are aggregated bottom-up to produce the final response.
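A toy sketch of this delegate-and-decompose flow; `decompose` and the sub-agents are hypothetical placeholders:

```python
# Toy hierarchy: a manager decomposes the goal, delegates subgoals to named
# sub-agents, then aggregates results bottom-up. All callables here are
# hypothetical placeholders.
from typing import Callable

def manager_agent(
    goal: str,
    decompose: Callable[[str], dict[str, str]],          # goal -> {agent_name: subgoal}
    sub_agents: dict[str, Callable[[str], str]],
) -> str:
    subgoals = decompose(goal)                           # break the goal down
    results = {name: sub_agents[name](subgoal)           # delegate each subgoal
               for name, subgoal in subgoals.items()}
    return "\n".join(f"{name}: {r}" for name, r in results.items())  # aggregate
```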

The main drawback of this design pattern is a single point of failure: the manager is the bottleneck. One mitigation is to write intermediate data into shared memory instead of routing everything through the manager agent.

Note :

Assume we have a requirement where two sub-agents need to interact with each other. This is where the Agent-to-Agent (A2A) protocol comes into the picture. It is an additional integration for multi-agent systems. Google introduced this protocol on April 9th, 2025; it is useful for communication between local agents and agents residing in the cloud (AWS, GCP, Azure).


Planner - Executor - Reviewer Agentic AI Design Pattern

The Planner creates a plan to achieve the goal. The Executor carries out the plan using tools and data. The Reviewer evaluates the result, suggests improvements, and decides whether to approve or iterate.
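A compact sketch of this loop, assuming hypothetical `planner`, `executor`, and `reviewer` callables (each typically backed by an LLM):

```python
# Sketch of the Planner-Executor-Reviewer loop. planner, executor, and
# reviewer are hypothetical callables, not a specific framework's API.
def plan_execute_review(goal: str, planner, executor, reviewer,
                        max_iterations: int = 3) -> str:
    result, feedback = "", ""
    for _ in range(max_iterations):                  # bound the loop
        plan = planner(goal, feedback)               # Planner: (re)plan using feedback
        result = executor(plan)                      # Executor: run the plan with tools
        approved, feedback = reviewer(goal, result)  # Reviewer: critique the result
        if approved:
            return result
    return result                                    # or escalate to a human
```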



Agentic AI Memory

Agentic AI memory enables agentic AI systems to retain information, leverage past experiences, and continuously improve decision-making and task execution. In simple terms, we are making our agentic AI systems remember, learn, and act smarter over time.



Types of Agentic AI memory:

  • Short-Term memory (Working memory)
    • Holds the information in the current context or conversation
  • Long-Term memory (Episodic/Semantic memory)
    • Stores information across sessions, includes facts, preferences, interactions & experiences
    • Episodic memory - Past data
    • Semantic memory - Facts
  • User/Entity memory (Profile memory)
    • Stores knowledge specific to a user, entity or a domain
  • Procedural memory (Skill memory)
    • Stores procedures, workflows, and how-to knowledge
    • Example : skills like cycling or walking are stored permanently
  • Reflective memory (Insights/Lessons)
    • Stores warnings, feedback, and self-reflections
    • Stores lessons learned
    • Example : when a program failure happens, the fix for that failure is stored for future use
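A toy illustration of these memory types, using plain in-memory Python structures (a production system would typically back long-term memory with a vector DB or key-value store):

```python
# Toy illustration of short-term vs. long-term agent memory. In-memory
# structures stand in for real persistence layers.
from collections import deque

class AgentMemory:
    def __init__(self, short_term_size: int = 10):
        self.short_term = deque(maxlen=short_term_size)  # working memory: current conversation
        self.long_term: dict[str, list[str]] = {         # persists across sessions
            "episodic": [],    # past interactions / experiences
            "semantic": [],    # facts and preferences
            "procedural": [],  # workflows and how-to knowledge
            "reflective": [],  # lessons learned, e.g. fixes for past failures
        }

    def remember_turn(self, message: str) -> None:
        self.short_term.append(message)       # oldest turns drop off automatically

    def store(self, kind: str, item: str) -> None:
        self.long_term[kind].append(item)     # e.g. store("reflective", "retry API on 503")
```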


A2A protocol :

To establish a connection between two agents using A2A, we need an Agent Skill and an Agent Card. If we need to describe a person, we need their skills and some personal information, right? In a similar way, we prepare a couple of JSON files about an agent, called the Agent Skills and the Agent Card.


In the Agent-to-Agent protocol :

  • The Agent Card describes who an agent is and how to talk to it
  • The Agent Skill describes what the agent can do

These two JSON files need to be created once the agent code is ready, based on the skill set of that particular agent. Note that we are not going to create them manually: once your agent is ready, you can simply give it to an LLM, which can generate the Agent Card & Skills JSON files.
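For illustration only, here is roughly what such a file might contain, written as a Python dict and serialized to JSON. The field names below are assumptions for the sake of the example; consult the official A2A specification for the exact schema.

```python
# Hypothetical agent card for the loan example. Field names are illustrative
# only -- the real A2A schema is defined in the official specification.
import json

agent_card = {
    "name": "loan-eligibility-agent",
    "description": "Checks loan eligibility based on income and credit score",
    "url": "https://agents.example.com/loan-eligibility",  # where to reach the agent
    "skills": [
        {
            "id": "check_eligibility",
            "description": "Validates income, credit score, and requested amount",
        }
    ],
}

print(json.dumps(agent_card, indent=2))
```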


Common problems in Agentic AI Systems



Let's discuss the agentic AI design patterns in detail using a bank loan processing use case.


REACT AGENT

User input - "I want a personal loan of $25k. My annual income is $85k and my SSN is 123-456-789." Similarly, every day the bank will receive n number of such applications.

It is extremely important to understand that we shouldn't pass the above input straight to the processing layer. The first and foremost thing we need to do is enable guardrails for this user input. Let's see what that means.

Guardrails (refer to point #2 in the image below)

  • Input validation
    • Block disallowed patterns (e.g. prompt-injection attempts)
    • Input length check (<= 5000 chars)
  • Domain validation
    • Loan amount > 0 AND <= $1,000,000
    • Credit score between 300 and 850
    • Minimum income >= $12,000
  • Output sanitization
    • Mask sensitive data
    • SSN, passwords, keys, etc.
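A minimal sketch of these guardrails in Python, using the loan-domain thresholds above; `BLOCKED_PATTERNS` is an illustrative placeholder list:

```python
# Sketch of the three guardrail layers above. Thresholds come from this
# example; BLOCKED_PATTERNS is an illustrative placeholder.
import re

BLOCKED_PATTERNS = [r"(?i)ignore previous instructions"]  # example jailbreak pattern
SSN_RE = re.compile(r"\b\d{3}-\d{2,3}-\d{3,4}\b")

def validate_input(text: str) -> bool:
    if len(text) > 5000:                                   # input length check
        return False
    return not any(re.search(p, text) for p in BLOCKED_PATTERNS)

def validate_domain(amount: float, credit_score: int, income: float) -> bool:
    return (0 < amount <= 1_000_000                        # loan amount bounds
            and 300 <= credit_score <= 850                 # valid credit score range
            and income >= 12_000)                          # minimum income

def sanitize_output(text: str) -> str:
    return SSN_RE.sub("***-**-****", text)                 # mask SSNs before responding
```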



Only once the user input passes through the proper input guardrails should we allow the request into step 3 for processing.

Create your own user queries, both valid and invalid, and present them to the customer; only then will the client understand the value of having guardrails in our agentic AI system. Always create more queries and save them in a DB for future reference. All these queries depend on which guardrails you implement in your multi-agent system. Guardrails need to validate input in both a keyword way and a semantic way.

For user query validation, ask your customer for the relevant documentation and create a knowledge base out of it. Once this knowledge base/graph is ready, for every input query you search the knowledge base/graph for a particular keyword, or at least semantic similarity, based on the user input. Then validate the user input against this knowledge; only once all input guardrails have passed do you allow the input to be processed further. Otherwise, communicate the situation to the end user and ask them to re-validate their input. We can take the help of an LLM to create these knowledge graphs.

Once the guardrails step has passed, the next step is the ReAct agent, where we use a system prompt built with the Chain-of-Thought prompting technique, as mentioned in the image above.

The flow then moves through the Thought -> Action -> Observation -> Answer layers. The developer needs to decide what kind of systems and tools are required here during ACTION. All the policy-related documents and required data will be stored in the RAG system, specifically in a vector DB. The retrieval process then runs using keyword (BM25) + semantic techniques, followed by re-ranking; the user query + context pulled from the tools is sent to the LLM, which produces the final answer.
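A sketch of that retrieval step; `bm25_score`, `embed`, and `rerank` are placeholders for real components (a BM25 library, an embedding model, and a cross-encoder re-ranker), and the 50/50 fusion weights are an arbitrary choice:

```python
# Hybrid retrieval sketch: fuse BM25 keyword scores with semantic similarity,
# then re-rank the shortlist. bm25_score, embed, and rerank are hypothetical
# placeholders for real components.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query, documents, bm25_score, embed, rerank, top_k=5):
    query_vec = embed(query)
    scored = []
    for doc in documents:                                 # doc: {"text": ..., "vector": ...}
        keyword = bm25_score(query, doc["text"])          # keyword (BM25) signal
        semantic = cosine(query_vec, doc["vector"])       # semantic signal
        scored.append((0.5 * keyword + 0.5 * semantic, doc))  # simple score fusion
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = [doc for _, doc in scored[:top_k * 4]]   # shortlist for re-ranking
    return rerank(query, candidates)[:top_k]              # e.g. a cross-encoder
```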

Real-time data will be pulled via MCP tools; this is the responsibility of ACTION.

OBSERVATION verifies whether the results are grounded or not. These are nothing but evaluation metrics.

Once the LLM generates output, we need to apply output guardrails to mask sensitive data, etc. Then we produce the final response to the user.

Important Note :
  • Implement a fallback mechanism
  • Handle errors gracefully
  • Collect data wherever possible along this entire path and store it in a database
  • Set a maximum number of retries to avoid infinite loops
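A sketch of bounded retries with a fallback, so a failing tool call cannot loop forever; `call_tool` and `fallback_answer` are hypothetical placeholders:

```python
# Bounded retries with a fallback: the system degrades gracefully instead of
# failing or retrying forever. call_tool and fallback_answer are placeholders.
def call_with_retries(call_tool, fallback_answer, max_retries: int = 3):
    for attempt in range(1, max_retries + 1):
        try:
            return call_tool()
        except Exception as exc:                       # handle errors per attempt
            print(f"attempt {attempt} failed: {exc}")  # log to a DB in production
    return fallback_answer()                           # fallback instead of failing
```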


SUPERVISOR + WORKER multi-agent system



We will discuss the same example, i.e., the bank loan processing system. In the previous ReAct agent design pattern we had only one agent, but here we have multiple agents. We also have the guardrails layer here, which we have already discussed.

As you can see in the above image, the Supervisor Agent acts as an orchestrator; it is also called the Root Agent.
  • The first responsibility of the supervisor agent is to clearly understand the goal of the user
  • Orchestrates the workflow and assigns tasks to specialized workers
  • Monitors progress, aggregates results, and makes routing decisions
  • Handles escalations, guardrail checks, and the final decision
  • Maintains shared state across the system
Look at the sub-agents 3.1 through 3.6, which are specialized in their specific work. The supervisor agent must be aware of the specialization of the sub-agents.

The shared state block & supervisor decision logic handle the task of tracking sub-agent skills. Generally, we can create a skills.md file containing the details of each sub-agent's specialization. Based on the skills mentioned in skills.md, the supervisor decision logic assigns tasks to the sub-agents.

The formal way of doing this is by maintaining a proper agent skill registry in your company. You can put that information in GitHub or Confluence, or use a Google agent registry. Maintaining this information is extremely important: whenever you create an agent with some skills, you should register it in the agent registry. This provides information to other teams and programmers and avoids duplicate tools and agents.

After this, whenever a request comes in from a user, our supervisor agent will refer to the agent registry and select tools and agents from it.

Handshake/handoff between sub-agents happens based on the state information. Each agent is associated with tools, and tools are nothing but Python functions. These functions return the current state information to the SHARED STATE. The entire loan application state lives here and is accessible to all the sub-agents.
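A toy sketch of this routing-over-shared-state idea. The skill registry mirrors the skills.md concept, the worker agents are plain Python functions that read and write the shared state, and all names and values are illustrative:

```python
# Toy supervisor routing over a shared state. All names are illustrative.
skill_registry = {                 # task -> worker agent (the skills.md idea)
    "verify_documents": "document_agent",
    "check_credit": "credit_agent",
    "assess_risk": "risk_agent",
}

def document_agent(state: dict) -> dict:
    state["documents_ok"] = True                # pretend document verification
    return state

def credit_agent(state: dict) -> dict:
    state["credit_score"] = 720                 # pretend credit bureau lookup
    return state

def risk_agent(state: dict) -> dict:
    state["risk"] = "low"                       # pretend risk assessment
    return state

workers = {"document_agent": document_agent,
           "credit_agent": credit_agent,
           "risk_agent": risk_agent}

def supervisor(task: str, state: dict) -> dict:
    worker_name = skill_registry[task]          # route based on registered skills
    state["history"].append((task, worker_name))
    return workers[worker_name](state)          # worker updates the shared state

shared_state = {"application_id": "LN-001", "history": []}
for task in ["verify_documents", "check_credit", "assess_risk"]:
    shared_state = supervisor(task, shared_state)
```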
  


Planner-Executor-Reviewer loan processing system




Everything is the same except the flow of data from one step to another. Here the first step is the Planner, then the Executor, followed by the Reviewer.

Based on the user request, your agent (internally calling an LLM) needs to take care of planning.

Don't go with one single plan; always have a fallback plan as well. DO NOT let the system fail; instead, maintain a fallback.

We have to incorporate plan-B in the prompt itself during planning.
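For example, an illustrative planner prompt that asks the LLM for a primary plan and a plan-B in one shot (the wording here is an assumption, not a prescribed template):

```python
# Illustrative planner prompt that bakes a fallback (plan-B) into planning.
PLANNER_PROMPT = """You are a loan-processing planner.
Goal: {goal}

Produce:
1. PRIMARY PLAN: numbered steps using the available tools.
2. FALLBACK PLAN (plan-B): what to do if a step fails
   (e.g. credit bureau API is down -> use cached credit data
   and flag the application for manual review).
"""

def build_planner_prompt(goal: str) -> str:
    return PLANNER_PROMPT.format(goal=goal)
```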

The responsibility of the Executor is simply to execute things step by step: run the MCP server, use tools to get data from external sources, pull data from RAG, etc. Have a fallback mechanism here as well.

Finally, the Reviewer reviews the data and makes decisions from it.

Always include a maximum number of iterations. Otherwise the loop can run forever, which burns cost, adds latency, and causes every possible issue we can't even think of.

In case the agent is unable to make a decision, redirect those requests to a human-in-the-loop. In production, always try to minimize routing to the human-in-the-loop.


Conclusion : 

That's all for the theory. Feel free to download the code from the following repo: https://github.com/amathe1/AI-code/tree/main/8_AgenticAI_DesignPatterns


Thank you for reading this blog !

Arun Mathe
