Agentic AI systems are evolving from simple prompt-response applications into autonomous systems capable of reasoning, planning, and taking actions using tools and external knowledge sources. Depending on the complexity of the workflow, these systems can be designed using either single-agent or multi-agent architectures. A single-agent system centralizes reasoning and decision-making within one intelligent agent, making it suitable for simpler workflows and lightweight automation. In contrast, multi-agent systems distribute responsibilities across specialized agents that collaborate to solve complex tasks more efficiently. Modern production-grade AI platforms increasingly adopt multi-agent and graph-based orchestration patterns to improve scalability, reliability, and observability.
Large Language Models (LLMs)
LLMs are AI models trained on vast amounts of text data to understand and generate human-like text. They power chatbots, code assistants, translation tools, content generation, and more. We have already discussed how input text is split into tokens (tokenization) and then converted into embeddings for further processing by the Transformer architecture. We have also discussed the Transformer architecture itself, encoding and decoding methods, and attention mechanisms such as masked multi-head attention.
We also need to discuss the limitations of LLMs:
- May generate incorrect or misleading information (hallucination)
- Lack real-time knowledge unless connected to external tools
- Can produce biased outputs if the training data is biased
- High computational cost for training and serving
- Don't truly understand the way humans do
Retrieval Augmented Generation (RAG)
As discussed earlier, before building a RAG system we need to prepare the knowledge base: extract data from the source, chunk it if needed using one of the available chunking strategies, convert the chunks into embeddings, and finally store them in a vector store/database. This entire process is called indexing. Once indexing is done, we can build the RAG system using intent validation, query expansion, query reformulation, pre/post filtering, semantic/keyword search, reranking, etc.
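To make the indexing steps concrete, here is a minimal sketch in plain Python. It is only an illustration under loud assumptions: the "embedding" is a toy bag-of-words count rather than a real embedding model, the "vector store" is a Python list, and every function name (`chunk_text`, `embed`, `build_index`, `retrieve`) is made up for this example.

```python
import math
from collections import Counter

def chunk_text(text, chunk_size=12):
    """Split text into fixed-size word chunks (one simple chunking strategy)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def embed(text):
    """Toy embedding (assumption): a bag-of-words term-frequency vector."""
    return Counter(word.strip(".,?!").lower() for word in text.split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(documents, chunk_size=12):
    """Indexing: chunk each document, embed each chunk, store (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc, chunk_size)]

def retrieve(index, query, top_k=1):
    """Semantic-search stand-in: rank stored chunks by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

docs = [
    "Personal loans require proof of income and a credit check.",
    "Vector stores hold embeddings so similar chunks can be found quickly.",
]
index = build_index(docs)
print(retrieve(index, "Where are embeddings stored?"))
```

In a real system the `embed` function would call an embedding model and the list would be replaced by a vector database, but the order of the steps stays the same.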
Now, let's start our journey into Agentic AI.
Agentic AI
Agentic AI systems are autonomous agents that perceive their environment, reason, make decisions, take actions using tools, and learn from outcomes to achieve goals with minimal human intervention.
Put simply, an agentic AI system combines:
- LLMs (Reasoning/Thinking power)
- Tools (RAG, MCP)
- Memory (Agentic AI memory - part of architecture)
- Observability (Tracing entire Agentic AI execution)
- Guardrails (Enable security)
Typical Agent Loop
Goal(input) - Planning - Retrieve - Act - Observe - Reflect(validation) - Repeat until goal achieved
These seven steps are important to follow whether we are building a single-agent or a multi-agent system.
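The loop can be sketched in a few lines of Python. This is a rough illustration, not a reference implementation: `run_agent` and the toy `plan`, `retrieve`, `act`, and `reflect` helpers are hypothetical names, and plain functions stand in for the LLM and the tools.

```python
def run_agent(goal, plan, retrieve, act, reflect, max_iterations=5):
    """Run Goal -> Plan -> Retrieve -> Act -> Observe -> Reflect until the goal is met."""
    history = []
    for iteration in range(1, max_iterations + 1):
        step = plan(goal, history)            # Planning
        context = retrieve(step)              # Retrieve
        observation = act(step, context)      # Act, then Observe the result
        history.append((step, observation))
        if reflect(goal, history):            # Reflect (validation)
            return {"status": "done", "iterations": iteration, "history": history}
    # Repeat is bounded so the agent cannot loop forever
    return {"status": "gave_up", "iterations": max_iterations, "history": history}

# Toy stand-ins: the goal is to collect three facts about a loan application.
facts = {"income": 85000, "loan_amount": 25000, "credit_score": 720}

def plan(goal, history):
    done = {step for step, _ in history}
    return next(name for name in goal if name not in done)

def retrieve(step):
    return facts  # pretend knowledge-base lookup

def act(step, context):
    return context[step]

def reflect(goal, history):
    return {step for step, _ in history} == set(goal)

goal = ["income", "loan_amount", "credit_score"]
result = run_agent(goal, plan, retrieve, act, reflect)
print(result["status"], result["iterations"])
```

The `max_iterations` cap is a design choice worth keeping even in toy code: without it, a goal that can never be validated would keep the loop running indefinitely.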
- Start small, define clear goals
- Provide high-quality tools & data
- Set guardrails & monitor closely
- Iterate, learn & scale
- ReAct Pattern (Reason + Act in a loop)
- Hierarchical Pattern (Delegate & Decompose OR Supervisor-Worker)
- Planner - Executor - Reviewer Pattern (Plan, execute & self critique)
THOUGHT -> ACTION -> OBSERVATION -> Repeat till goal achieved.
This pattern is not feasible for complex agentic AI solutions; it suits simple workflows. If we try to fit a travel planner into this design pattern, we can't even fit all the actions into one agent (remember that a ReAct agent is a single agent that runs like a straight line). We can't rely on the ReAct design pattern alone for building complex agentic AI systems.
ReAct is fundamentally a single agent reasoning-and-tool-use design pattern where the agent iteratively thinks, acts and observes. It works well for simple to moderately complex workflows. However for large scale production agentic AI systems, a pure single agent ReAct architecture can become difficult to scale due to context growth, tool overload, latency, and reliability concerns. Modern systems therefore extend ReAct using graph based orchestration, supervisor-worker multi-agent architectures, memory layers, and guardrails. Even in multi agent systems, many individual agents still internally use the ReAct pattern.
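A small sketch of the THOUGHT -> ACTION -> OBSERVATION loop may help here. It is an assumption-heavy toy: `scripted_policy` is a hypothetical stand-in for the LLM's reasoning step, `calculator` is the only tool, and none of these names come from a real framework.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def calculator(expression):
    """Tool: evaluate 'a op b' for + - * / (deliberately tiny, no eval)."""
    left, op, right = expression.split()
    return OPS[op](float(left), float(right))

TOOLS = {"calculator": calculator}

def scripted_policy(question, scratchpad):
    """Hypothetical stand-in for the LLM: returns (thought, action, action_input)."""
    if not scratchpad:
        return ("I need the monthly income.", "calculator", "85000 / 12")
    last_observation = scratchpad[-1]["observation"]
    return ("I have the number.", "finish", f"{last_observation:.2f}")

def react(question, policy, tools, max_steps=4):
    """Single-agent ReAct loop with a step limit."""
    scratchpad = []
    for _ in range(max_steps):
        thought, action, action_input = policy(question, scratchpad)  # THOUGHT
        if action == "finish":
            return action_input, scratchpad
        observation = tools[action](action_input)                     # ACTION
        scratchpad.append({"thought": thought, "action": action,
                           "observation": observation})               # OBSERVATION
    raise RuntimeError("step limit reached without an answer")

answer, trace = react("What is the monthly income for $85k a year?", scripted_policy, TOOLS)
print(answer)
```

Notice that the scratchpad grows with every step. That growth is the context problem mentioned above: with many tools and long tasks, a single agent's scratchpad becomes hard to manage.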
Hierarchical Agent Design Pattern
Delegate & Decompose - Break down complex goals into subgoals and delegate to specialized agents organized in a hierarchy.
The main drawback of this design pattern is the single point of failure: the manager agent is the bottleneck. One way around it is to have agents write data to shared memory instead of routing everything through a manager agent.
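As a rough sketch of delegate and decompose, the supervisor below splits a loan review into two subtasks and hands each one to a specialized worker. The worker names and the thresholds (a credit score of 650, income of at least three times the loan) are invented for illustration and are not real lending rules.

```python
def credit_worker(application):
    """Specialized worker: checks the credit score (threshold is made up)."""
    return {"credit_ok": application["credit_score"] >= 650}

def income_worker(application):
    """Specialized worker: checks income against the loan size (ratio is made up)."""
    return {"income_ok": application["annual_income"] >= 3 * application["loan_amount"]}

WORKERS = {"credit_check": credit_worker, "income_check": income_worker}

def supervisor(application, workers=WORKERS):
    """Decompose the goal into subtasks, delegate each one, then aggregate."""
    subtasks = ["credit_check", "income_check"]     # decomposition (fixed here)
    results = {}
    for task in subtasks:
        results.update(workers[task](application))  # delegation
    results["approved"] = all(results.values())     # aggregation
    return results

application = {"credit_score": 720, "annual_income": 85000, "loan_amount": 25000}
print(supervisor(application))
```

Every call passes through `supervisor`, which is exactly why the manager becomes the bottleneck described above.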
Note:
Suppose two sub-agents need to interact with each other directly. This is where the Agent-to-Agent (A2A) protocol comes in; it is an additional integration layer for multi-agent systems. Google introduced the protocol on April 9, 2025, and it is useful for communication between local agents and agents hosted in the cloud (AWS, GCP, Azure).
Planner - Executor - Reviewer Agentic AI Design Pattern
The Planner creates a plan to achieve the goal. The Executor carries out the plan using tools and data. The Reviewer evaluates the result, suggests improvements, and decides whether to approve or iterate.
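The three roles can be sketched as three small functions plus a loop that feeds the Reviewer's feedback back to the Planner. Everything here is a toy assumption: in a real system each role would be an LLM-backed agent, and the "repayment schedule" rule is invented just to force one revision round.

```python
def planner(goal, feedback=None):
    """Plan: start with a draft step and add a revision step if there is feedback."""
    steps = ["draft summary"]
    if feedback:
        steps.append(f"revise: {feedback}")
    return steps

def executor(steps):
    """Execute: carry out the plan (here, just build a string)."""
    text = "Loan summary"
    if any(step.startswith("revise") for step in steps):
        text += " with repayment schedule"
    return text

def reviewer(result):
    """Review: approve, or return feedback for another iteration."""
    if "repayment schedule" in result:
        return True, None
    return False, "add repayment schedule"

def run(goal, max_rounds=3):
    feedback = None
    result, approved = None, False
    for round_number in range(1, max_rounds + 1):
        result = executor(planner(goal, feedback))
        approved, feedback = reviewer(result)
        if approved:
            break
    return {"result": result, "approved": approved, "rounds": round_number}

outcome = run("summarize the loan offer")
print(outcome)
```

The first round is rejected, the feedback reaches the Planner, and the second round is approved, which is the self-critique cycle this pattern is built around.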
Agentic AI Memory
Agentic AI memory enables agentic AI systems to retain information, leverage past experiences, and continuously improve decision-making and task execution. In simple terms, memory lets our agentic AI systems remember, learn, and act smarter over time.
Types of Agentic AI memory:
- Short-Term memory (Working memory)
  - Holds the information in the current context or conversation
- Long-Term memory (Episodic/Semantic memory)
  - Stores information across sessions, including facts, preferences, interactions, and experiences
  - Episodic memory: past interactions and experiences
  - Semantic memory: facts
- User/Entity memory (Profile memory)
  - Stores knowledge specific to a user, an entity, or a domain
- Procedural memory (Skill memory)
  - Stores procedures, workflows, and how-to knowledge
  - Example: like cycling or walking for a person, a skill learned once is stored permanently
- Reflective memory (Insights/Lessons)
  - Stores warnings, feedback, and self-reflections
  - Stores lessons learned
  - Example: when a program fails, the fix is stored so it can be reused if the same failure happens again
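One way to picture these memory types is as separate fields on a single object. The sketch below is an assumption, not a standard design: the class name `AgentMemory`, the three-turn short-term window, and the sample entries are all invented, and a production system would back the long-term fields with a database or vector store.

```python
from collections import deque
from dataclasses import dataclass, field

SHORT_TERM_TURNS = 3  # assumed size of the working-memory window

@dataclass
class AgentMemory:
    short_term: deque = field(default_factory=lambda: deque(maxlen=SHORT_TERM_TURNS))
    episodic: list = field(default_factory=list)    # long-term: past interactions
    semantic: dict = field(default_factory=dict)    # long-term: facts
    profile: dict = field(default_factory=dict)     # user/entity memory
    procedural: dict = field(default_factory=dict)  # workflows and how-to knowledge
    reflective: list = field(default_factory=list)  # lessons learned

    def observe(self, message):
        """Record a turn in working memory and in the long-term episodic log."""
        self.short_term.append(message)
        self.episodic.append(message)

    def lessons_for(self, keyword):
        """Look up stored lessons that mention a keyword."""
        return [lesson for lesson in self.reflective if keyword in lesson]

memory = AgentMemory()
for turn in ["hi", "I need a loan", "amount is $25k", "income is $85k"]:
    memory.observe(turn)
memory.semantic["max_loan_amount"] = 1_000_000
memory.profile["user_42"] = {"preferred_language": "en"}
memory.procedural["loan_review"] = ["validate input", "check credit", "decide"]
memory.reflective.append("timeout calling credit API: retry with backoff fixed it")

print(list(memory.short_term))        # only the three most recent turns
print(memory.lessons_for("timeout"))
```

The `deque` with `maxlen` drops the oldest turn automatically, which mirrors how short-term memory is limited to the current context while the episodic log keeps everything.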
A2A protocol:
To establish a connection between two agents using A2A, we need an Agent card and Agent skills. If we want to describe a person, we need their skills and some personal information, right? In the same way, we prepare a couple of JSON files about the agent, called the Agent card and Agent skills.
In the Agent-to-Agent protocol:
- The Agent card describes who an agent is and how to talk to it
- Agent skills describe what the agent can do
These two JSON files are created once the agent's code is ready, based on the skill set of that particular agent. Note that we don't have to write them by hand: once the agent is ready, we can give its code to an LLM and have it generate the Agent card and Agent skills JSON files.
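To give a feel for what these JSON documents contain, here is an illustrative sketch built as Python dictionaries. Treat it with caution: the field names follow my reading of the public A2A specification (where, as far as I know, skills are listed inside the card under `skills`) and may differ between versions, while the agent, its endpoint URL, and the helper `find_agents_with_skill` are entirely hypothetical.

```python
import json

# Agent skill: what the agent can do (all values are made up for illustration).
credit_check_skill = {
    "id": "credit_check",
    "name": "Credit check",
    "description": "Checks whether an applicant's credit score meets the lending threshold.",
    "tags": ["loan", "credit"],
    "examples": ["Is a credit score of 720 enough for a personal loan?"],
}

# Agent card: who the agent is and how to talk to it.
agent_card = {
    "name": "Loan Credit Agent",
    "description": "Worker agent that performs credit checks for loan applications.",
    "url": "https://agents.example.com/credit",  # hypothetical endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": False},
    "defaultInputModes": ["text"],
    "defaultOutputModes": ["text"],
    "skills": [credit_check_skill],
}

def find_agents_with_skill(cards, tag):
    """Return the names of agents that advertise a skill carrying the given tag."""
    return [card["name"] for card in cards
            if any(tag in skill["tags"] for skill in card["skills"])]

print(json.dumps(agent_card, indent=2))
print(find_agents_with_skill([agent_card], "credit"))
```

A client agent would read a card like this to decide whether the remote agent can help and where to send the request. Check the current A2A specification for the exact schema before relying on these field names.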
Common problems in Agentic AI Systems
Let's discuss agentic AI design patterns in detail using a bank loan processing use case.
REACT AGENT
User input: "I want a personal loan of $25k. My annual income is $85k and my SSN is 123-456-789." A bank receives many such applications every day.
It is extremely important that we do not pass the above input straight to the processing layer. The first thing we need to do is apply guardrails to the user input. Let's see what that means.
Guardrails (refer point#2 in the below image)
- Input validation
  - Blocked pattern data
  - Input length check (<= 5000 chars)
- Domain validation
  - Loan amount > 0 AND <= $1,000,000
  - Credit score between 300 and 850
  - Minimum income >= $12,000
- Output sanitization
  - Mask sensitive data
    - SSN, passwords, keys etc.
- Fallback mechanism implementation
  - Handling errors
  - Collect data wherever possible along this entire path and store it in a database
  - Set a maximum number of retries to avoid infinite loops
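The guardrail checks above translate fairly directly into code. The sketch below is a simplified illustration with several assumptions: the blocked pattern is just one example phrase, the upper loan limit is read as one million dollars, the SSN pattern is deliberately loose so that it also catches the 3-3-3 grouping used in the sample input, and all function names are made up.

```python
import re

MAX_INPUT_CHARS = 5000
BLOCKED_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)]
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2,3}-\d{3,4}\b")  # loose on purpose (assumption)

def validate_input(text):
    """Input validation: length limit and blocked patterns."""
    errors = []
    if len(text) > MAX_INPUT_CHARS:
        errors.append("input too long")
    if any(pattern.search(text) for pattern in BLOCKED_PATTERNS):
        errors.append("blocked pattern")
    return errors

def validate_domain(loan_amount, credit_score, annual_income):
    """Domain validation with the limits listed above."""
    errors = []
    if not (0 < loan_amount <= 1_000_000):
        errors.append("loan amount out of range")
    if not (300 <= credit_score <= 850):
        errors.append("credit score out of range")
    if annual_income < 12_000:
        errors.append("income below minimum")
    return errors

def mask_sensitive(text):
    """Output sanitization: mask anything that looks like an SSN."""
    return SSN_PATTERN.sub("[REDACTED_SSN]", text)

def call_with_retries(operation, max_retries=3):
    """Fallback: retry a bounded number of times, then return a fallback result."""
    last_error = None
    for _ in range(max_retries):
        try:
            return operation()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
    return {"status": "fallback", "error": str(last_error), "attempts": max_retries}

user_input = "I want a personal loan of $25k. My SSN is 123-456-789."
print(validate_input(user_input))
print(validate_domain(25_000, 720, 85_000))
print(mask_sensitive(user_input))
```

Passwords and keys would need their own patterns or a dedicated secret scanner; the SSN regex only shows the idea of masking before anything is logged or returned.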
- The first responsibility of the Supervisor agent is to clearly understand the user's goal
- Orchestrates workflow and assigns tasks to specialized workers
- Monitors progress, aggregates results, and makes routing decisions
- Handles escalations, guardrail checks, and final decision
- Maintains shared state across the system
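These responsibilities can be sketched as a supervisor that keeps one shared state dictionary and decides the next worker by looking at what the state is still missing. This is only a toy under stated assumptions: the worker names, the credit threshold of 650, and the `escalate_to_human` outcome are invented for the example.

```python
def intake(state):
    """Worker: pin down the user's goal."""
    state["goal"] = "personal_loan"
    return state

def credit_check(state):
    """Worker: check the credit score (threshold is made up)."""
    state["credit_ok"] = state["credit_score"] >= 650
    return state

def decide(state):
    """Worker: final decision, escalating when the check fails."""
    state["decision"] = "approve" if state["credit_ok"] else "escalate_to_human"
    return state

WORKERS = {"intake": intake, "credit_check": credit_check, "decide": decide}

def route(state):
    """Routing decision: pick the next worker from what the shared state lacks."""
    if "goal" not in state:
        return "intake"
    if "credit_ok" not in state:
        return "credit_check"
    if "decision" not in state:
        return "decide"
    return None  # nothing left to do

def supervise(state, max_steps=10):
    """Supervisor loop: route, delegate, and track progress over shared state."""
    trace = []
    for _ in range(max_steps):
        next_worker = route(state)
        if next_worker is None:
            break
        state = WORKERS[next_worker](state)
        trace.append(next_worker)
    return state, trace

final_state, trace = supervise({"credit_score": 720})
print(final_state["decision"], trace)
```

Because routing depends only on the shared state, new workers can be added by extending `route`, which is close in spirit to the graph-based orchestration mentioned earlier.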
Thank you for reading this blog !
Arun Mathe