
(AI Blog#15) Prompt Engineering

Prompt Engineering focuses on how we instruct an AI model, while Context Engineering determines what information the model receives. Although well-crafted prompts can guide responses, the quality and relevance of context ultimately drive meaningful, accurate outputs. In modern AI systems, true intelligence emerges from combining both: clear instructions supported by the right data.

Prompt Engineering :

Prompt Engineering is about how you write instructions to the model to get the desired output.

Focus will be on :

  • Wording of the input
  • Structure of instructions
  • Examples (few-shot learning)
  • Tone, format, constraints, etc.

Example : You are a helpful AI assistant. Summarize the following text in 3 bullet points.

Techniques :

  • Zero/Few shot prompting
  • Chain-of-thought prompting
  • Role prompting (ex: You are a dev-ops expert.)
  • Output formatting (JSON, table, etc.)

Goal : Get better responses by improving the prompt itself.


Context Engineering :

Context Engineering is about what information you provide to the model, not just what you ask. It focuses on feeding the right data into the prompt.

Example : Instead of just asking "summarize this document", you retrieve relevant documents from a vector DB, add chat history, add user preferences, then send : "User prefers short summaries..., here is the document..., previous conversation...."

Techniques :

  • RAG(Retrieval Augmented Generation)
  • Memory management(Short & Long term)
  • Context window optimization
  • Chunking and Ranking
  • Tool augmentation

Goal : Improve output by improving the data/context.
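The document-summary example above can be sketched as a small prompt-assembly function. This is a minimal sketch, assuming illustrative names (build_context_prompt, retrieved_docs, user_prefs are made up here, not from any library):

```python
# Sketch: assembling engineered context (retrieved data, memory, preferences)
# into a single prompt before the LLM call. All names are illustrative.

def build_context_prompt(question: str, retrieved_docs: list[str],
                         chat_history: list[str], user_prefs: str) -> str:
    """Combine retrieved documents, chat history, and preferences into one prompt."""
    docs = "\n".join(f"- {d}" for d in retrieved_docs)
    history = "\n".join(chat_history)
    return (
        f"USER PREFERENCES:\n{user_prefs}\n\n"
        f"RELEVANT DOCUMENTS:\n{docs}\n\n"
        f"PREVIOUS CONVERSATION:\n{history}\n\n"
        f"QUESTION:\n{question}"
    )

prompt = build_context_prompt(
    "Summarize this document.",
    retrieved_docs=["Doc A: quarterly revenue grew 12%."],
    chat_history=["User: keep answers short."],
    user_prefs="User prefers short summaries.",
)
print(prompt.splitlines()[0])  # → USER PREFERENCES:
```

The point is the ordering: stable context (preferences) first, retrieved data next, the question last, so the model sees all supporting information before the task.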

Prompt Engineering techniques help in communication between two different agents and also between an agent and an LLM.

Prompts are useful in all of the below communications:

  • Agent <--------> LLM
  • Agent <--------> RAG
  • Agent <--------> MCP

We have 2 types of techniques.

  • Prompt Techniques
  • Defence Prompt Techniques

Even in Prompt Techniques, we have basic and advanced techniques. We will talk about all of them.


Prompt Techniques : We have 10 types of Prompt techniques
  • Zero shot
  • Few shot
  • Role Based
  • CoT(Chain of Thoughts)
  • Context Aware
  • Prompt Chaining
  • SRL(Semantic Role Labelling)
  • ReACT
  • ToT(Tree of Thoughts)
  • Meta Prompting


1) Zero-Shot Prompting

Zero-Shot prompting asks the model to perform a task without any examples. It relies entirely on the model's pre-trained knowledge and instruction-following capabilities.

Key-principles

  • Be explicit about the task - ambiguity causes hallucinations
  • Assign a specific role to activate domain priors
  • Specify output format constraints upfront (JSON, markdown, text, etc.)
  • Use imperative verbs: "Classify", "Extract", "Summarize"
  • Temperature should be low (0.0-0.3)

Designing a 3-node LangGraph for Zero-Shot Prompting :


  • First node builds the prompt
  • Second node is for the LLM: it accepts the built prompt and generates a raw response
  • Third node formats the raw response from the LLM and produces the final response

Implementing Zero-Shot Prompting :

# ─────────────────────────────────────────────────────────
# Zero-Shot Prompting with LangGraph
# Pattern: Linear pipeline — Input → Prompt → LLM → Output
# ─────────────────────────────────────────────────────────

from typing import TypedDict, Optional
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate
import json
import os
from dotenv import load_dotenv
load_dotenv()

# ── State Schema ──────────────────────────────────────────
class ZeroShotState(TypedDict):
    task: str                    # Raw task description
    role: str                    # Persona to adopt
    output_format: str           # Expected format (json/markdown/text)
    constraints: list[str]       # Negative constraints
    built_prompt: str            # Constructed prompt
    raw_response: str            # LLM raw output
    final_output: dict | str     # Parsed final result
    error: Optional[str]         # Error tracking

# ── LLM Setup ─────────────────────────────────────────────
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.1,          # Low temp for deterministic zero-shot
    max_tokens=2048
)


# ── Node 1: Build Prompt ──────────────────────────────────
def build_prompt(state: ZeroShotState) -> ZeroShotState:
    """
    Constructs a structured zero-shot prompt using:
    - Role assignment (activates domain-specific priors)
    - Task decomposition  
    - Format specification
    - Negative constraints
    """
    constraints_text = "\n".join(
        [f"- Do NOT {c}" for c in state["constraints"]]
    ) if state["constraints"] else "None"
    # Example rendered constraints:
    #   - Do NOT include subjective opinions
    #   - Do NOT use vague categories

    format_instruction = {
        "json": "Respond ONLY with valid JSON. No markdown fences.",
        "markdown": "Use proper markdown with headers and lists.",
        "text": "Respond with clear, concise plain text.",
    }.get(state["output_format"], "Respond clearly.")

    prompt = f"""You are {state['role']}.

TASK:
{state['task']}

OUTPUT FORMAT:
{format_instruction}

CONSTRAINTS:
{constraints_text}

Begin your response now:"""

    return {**state, "built_prompt": prompt}


# ── Node 2: Call LLM ──────────────────────────────────────
def call_llm(state: ZeroShotState) -> ZeroShotState:
    """Sends prompt to LLM and captures raw response."""
    try:
        messages = [
            SystemMessage(content="Follow the user's instructions precisely."),
            HumanMessage(content=state["built_prompt"])
        ]
        response = llm.invoke(messages)
        return {**state, "raw_response": response.content, "error": None}
    except Exception as e:
        return {**state, "error": str(e), "raw_response": ""}


# ── Node 3: Format Output ─────────────────────────────────
def format_output(state: ZeroShotState) -> ZeroShotState:
    """Parses and validates the LLM response."""
    if state.get("error"):
        return {**state, "final_output": {"error": state["error"]}}

    raw = state["raw_response"].strip()

    if state["output_format"] == "json":
        try:
            parsed = json.loads(raw)
            return {**state, "final_output": parsed}
        except json.JSONDecodeError:
            # Strip markdown fences if the model added them
            clean = raw.replace("```json", "").replace("```", "").strip()
            try:
                parsed = json.loads(clean)
                return {**state, "final_output": parsed}
            except json.JSONDecodeError:
                # Still not valid JSON: surface the raw text instead of crashing
                return {**state, "final_output": {"error": "invalid JSON", "raw": raw}}

    return {**state, "final_output": raw}

# ── Graph Construction ────────────────────────────────────
def build_zero_shot_graph():
    builder = StateGraph(ZeroShotState)

    builder.add_node("build_prompt", build_prompt)
    builder.add_node("call_llm", call_llm)
    builder.add_node("format_output", format_output)

    builder.add_edge(START, "build_prompt")
    builder.add_edge("build_prompt", "call_llm")
    builder.add_edge("call_llm", "format_output")
    builder.add_edge("format_output", END)

    return builder.compile()

# ── Usage Example ─────────────────────────────────────────
from IPython.display import Image, display
graph = build_zero_shot_graph()

# View
display(Image(graph.get_graph().draw_mermaid_png()))



result = graph.invoke({
    "task": ("Classify this customer review sentiment and extract key themes: "
             "'The product arrived late but the quality was exceptional. Customer "
             "support resolved my issue quickly.'"),
    "role": "an expert sentiment analysis system for e-commerce",
    "output_format": "json",
    "constraints": ["include subjective opinions", "use vague categories"],
    "built_prompt": "",
    "raw_response": "",
    "final_output": {},
    "error": None
})
print(result["built_prompt"])
print("*"*100)
print(result["raw_response"])
print("*"*100)
print(result["final_output"])
# → {"sentiment": "mixed", "themes": ["delivery", "quality", "support"],
#    "scores": {"delivery": -0.7, "quality": 0.9, "support": 0.8}}

Output :


Please pause the blog here for some time: try to understand each line in the above program and execute it. Once you are familiar with the concept of Zero-Shot prompting, especially how to build a prompt, state constraints and format instructions clearly, and format the raw output from the LLM, the rest of the prompt techniques will be easy to understand.

  • First understand the concept of Zero-Shot prompting
  • Second, understand the design of state schema, LLM call, building prompt, defining constraints & formatting instructions etc.
  • Third, understand the design of 3 nodes for building prompt, LLM & formatting
  • Fourth, understand the creation of LangGraph by creating nodes and edges/conditional edges
  • Finally, understand how we are invoking LLM and generating required response.

While building prompt, below sequence is important :

prompt = f"""You are {state['role']}.

TASK:
{state['task']}

OUTPUT FORMAT:
{format_instruction}

CONSTRAINTS:
{constraints_text}

Begin your response now:"""


2) Few-Shot Prompting

This is the same as Zero-Shot prompting, except that we include a few examples in the prompt we feed to the LLM.

Let's understand the design, as shown in the above image : First, we collect a few examples related to the task we are asking the LLM to process; call this collection the Examples Bank. When we invoke the graph with a task, an example selector first looks into the Examples Bank for examples related to the user input. The most relevant examples are injected into the prompt, and only then is the LLM called with the user input plus these demonstrations. Because the prompt now contains worked examples the model can imitate, processing the user input becomes easier and accuracy increases.


Research shows that example quality matters more than quantity. Diverse, representative examples covering edge cases outperform many similar examples. The model implicitly learns the output schema, reasoning style, and domain vocabulary from the demonstrations.

In LangGraph, a dedicated example_selector node dynamically retrieves the most relevant examples from a vector store, enabling adaptive few-shot prompting at scale.

Best practices :

  • Use 3-8 examples
  • Ensure examples cover diverse edge cases
  • Order matters: Hardest examples last work best
  • Examples must follow the exact target format
  • Use semantic search to pick relevant examples dynamically
  • Include a mix of positive and negative examples

What is SemanticSimilarityExampleSelector ?

SemanticSimilarityExampleSelector is a dynamic few-shot example selector. Instead of hardcoding examples in your prompt, it stores a set of examples, converts them into embeddings, and at run time selects the most semantically similar examples to the user query.

In simple terms :
"Pick the most relevant examples based on meaning(not keywords)".

Why it matters, especially in LangGraph :
In LangGraph multi-agent systems, prompts are everything. Agents behave better when given relevant context/examples.

Without selector :
Static examples, irrelevant content, poor reasoning.

With selector :
Context adapts dynamically, better accuracy, smarter agents.

How it works internally :
  • Converts all examples into embeddings
  • Converts the user query into an embedding
  • Computes similarity (usually cosine similarity)
  • Picks the top-k closest examples
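The four steps above can be sketched with toy vectors. This is only an illustration of the ranking logic: real selectors use model embeddings and a vector store, and the 3-dimensional vectors below are made up:

```python
# Sketch of the selector's internals: embed, compare, pick top-k.
# The tiny 3-dimensional vectors stand in for real model embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

examples = {
    "What is AI?":        [0.9, 0.1, 0.0],
    "Capital of France?": [0.0, 0.2, 0.9],
    "What is ML?":        [0.8, 0.3, 0.1],
}
query_vec = [0.85, 0.2, 0.05]   # pretend embedding of "Explain deep learning"

# Rank all examples by similarity to the query, keep the closest two
top_k = sorted(examples, key=lambda e: cosine(query_vec, examples[e]),
               reverse=True)[:2]
print(top_k)  # → ['What is AI?', 'What is ML?']
```

Notice that the geography question is filtered out purely by vector distance, with no keyword matching involved.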

Implementation of Few-Shot Prompting :

# ─────────────────────────────────────────────────────────
# Few-Shot Prompting with Dynamic Example Selection
# Pattern: Vector store retrieval → Example injection → LLM
# ─────────────────────────────────────────────────────────

from typing import TypedDict, Optional
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_community.vectorstores import FAISS
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
import json
import os
from dotenv import load_dotenv
from IPython.display import Image, display
load_dotenv()

# ── Example Bank (Production: load from database) ─────────
EXAMPLE_BANK = [
    {
        "input": "Extract entities from: 'Apple Inc. CEO Tim Cook announced iPhone 16 in Cupertino'",
        "output": json.dumps({"persons": ["Tim Cook"], "organizations": ["Apple Inc."],
                       "products": ["iPhone 16"], "locations": ["Cupertino"]})
    },
    {
        "input": "Extract entities from: 'NASA astronaut Sally Ride flew the Space Shuttle'",
        "output": json.dumps({"persons": ["Sally Ride"], "organizations": ["NASA"],
                       "products": ["Space Shuttle"], "locations": []})
    },
    {
        "input": "Extract entities from: 'Microsoft released Windows 11 in Redmond, Washington'",
        "output": json.dumps({"persons": [], "organizations": ["Microsoft"],
                       "products": ["Windows 11"], "locations": ["Redmond", "Washington"]})
    },
    {
        "input": "Extract entities from: 'Elon Musk founded SpaceX and Tesla Motors'",
        "output": json.dumps({"persons": ["Elon Musk"], "organizations": ["SpaceX", "Tesla Motors"],
                       "products": [], "locations": []})
    },
]

# ── State Schema ──────────────────────────────────────────
class FewShotState(TypedDict):
    query: str
    num_examples: int            # How many examples to select
    selected_examples: list      # Retrieved examples
    built_prompt: str
    response: str
    parsed_result: dict


# ── Semantic Example Selector ─────────────────────────────
def create_example_selector(examples: list, k: int = 3):
    """Creates a semantic similarity-based example selector."""
    embeddings = OpenAIEmbeddings()
    return SemanticSimilarityExampleSelector.from_examples(
        examples=examples,
        embeddings=embeddings,
        vectorstore_cls=FAISS,
        k=k,
        input_keys=["input"]
    )


# ── Node 1: Dynamic Example Selection ────────────────────
def select_examples(state: FewShotState) -> FewShotState:
    """
    Uses semantic similarity to pick the most relevant examples
    for the current query from the example bank.
    This is CRITICAL for production — random examples perform poorly.
    """
    selector = create_example_selector(EXAMPLE_BANK, k=state["num_examples"])
    selected = selector.select_examples({"input": state["query"]})
    return {**state, "selected_examples": selected}

# ── Node 2: Build Few-Shot Prompt ─────────────────────────
def build_few_shot_prompt(state: FewShotState) -> FewShotState:
    """
    Assembles prompt with demonstrations using the
    Input: / Output: delimiter pattern for maximum clarity.
    """
    examples_text = ""
    for i, ex in enumerate(state["selected_examples"], 1):
        examples_text += f"""
Example {i}:
Input: {ex['input']}
Output: {ex['output']}
---"""
# Example 1:
# Input: What is AI?
# Output: Artificial Intelligence is...
# ---

# Example 2:
# Input: What is ML?
# Output: Machine Learning is...
# ---

    prompt = f"""You are a Named Entity Recognition (NER) system.
Extract all entities and return valid JSON with keys:
persons, organizations, products, locations.

Here are {len(state['selected_examples'])} demonstration examples:
{examples_text}

Now extract entities from the following input.
Return ONLY the JSON object, no explanation.

Input: {state['query']}
Output:"""

    return {**state, "built_prompt": prompt}


# ── Node 3: Call LLM and Parse ───────────────────────────
def call_and_parse(state: FewShotState) -> FewShotState:
    llm = ChatOpenAI(model="gpt-4o", temperature=0.0)
    response = llm.invoke([HumanMessage(content=state["built_prompt"])])
    raw = response.content.strip()
    # Strip markdown fences if the model added them despite the instructions
    clean = raw.replace("```json", "").replace("```", "").strip()
    parsed = json.loads(clean)
    return {**state, "response": raw, "parsed_result": parsed}


# ── Graph Construction ────────────────────────────────────
def build_few_shot_graph():
    builder = StateGraph(FewShotState)

    builder.add_node("select_examples", select_examples)
    builder.add_node("build_few_shot_prompt", build_few_shot_prompt)
    builder.add_node("call_and_parse", call_and_parse)

    builder.add_edge(START, "select_examples")
    builder.add_edge("select_examples", "build_few_shot_prompt")
    builder.add_edge("build_few_shot_prompt", "call_and_parse")
    builder.add_edge("call_and_parse", END)

    return builder.compile()


# ── Usage ─────────────────────────────────────────────────
graph = build_few_shot_graph()
# View
# display(Image(graph.get_graph().draw_mermaid_png()))

result = graph.invoke({
    "query": "Extract entities from: 'Google CEO Sundar Pichai announced Gemini AI in Mountain View'",
    "num_examples": 3,
    "selected_examples": [],
    "built_prompt": "",
    "response": "",
    "parsed_result": {}
})


print(result["built_prompt"])
print("*"*100)
print(result["parsed_result"])


Output :

In the above example, we converted the examples from the example bank into embeddings on the fly, but in a real project this data might be stored in a DB. We need to establish a connection to the DB, fetch the required data, convert it into embeddings, and use them in Few-Shot Prompting.

Hardcoding a large, fixed set of examples in the prompt is not recommended, as it inflates the prompt and increases latency, and static examples are often irrelevant to the query. Instead, select examples dynamically with embeddings via the SemanticSimilarityExampleSelector class.
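As a hypothetical sketch of the DB path (the ner_examples table and its columns are made up for illustration), the example bank could be loaded like this and then passed to create_example_selector:

```python
# Hypothetical sketch: loading the example bank from a database instead of
# hardcoding it. Table and column names are assumptions for illustration.
import json
import sqlite3

def load_example_bank(conn: sqlite3.Connection) -> list[dict]:
    """Fetch (input, output) example pairs stored by an ingestion job."""
    rows = conn.execute("SELECT input, output FROM ner_examples").fetchall()
    return [{"input": i, "output": o} for i, o in rows]

# Demo with an in-memory DB standing in for the real project database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ner_examples (input TEXT, output TEXT)")
conn.execute(
    "INSERT INTO ner_examples VALUES (?, ?)",
    ("Extract entities from: 'IBM opened an office in Bangalore'",
     json.dumps({"organizations": ["IBM"], "locations": ["Bangalore"]})),
)
conn.commit()

bank = load_example_bank(conn)
print(len(bank))  # → 1
# bank can now replace EXAMPLE_BANK in create_example_selector(bank, k=3)
```

The selector then embeds these rows exactly as it embedded the hardcoded list, so the rest of the graph is unchanged.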



3) Role-Based Prompting

Role-Based prompting assigns the LLM a specific expert identity via the system prompt. The model then reasons, prioritizes, and responds from within that role.

Why it works ?
LLMs are trained on text written by many kinds of experts. When you say 'You are a Doctor', the model activates patterns from medical literature - clinical reasoning, risk-first thinking, domain vocabulary - even though the underlying model weights never change. 

Key elements of a good role prompt :
  • Identity - You are a senior data scientist
  • Expertise-scope - Specializing in time series forecasting
  • Reasoning style - Always cite evidence before giving a recommendation
  • Tone/Constraints - Be concise; avoid jargon the user may not know.
  • Output format - End every answer with a confidence score 1-10

In LangGraph, each role becomes its own node with its own system prompt. A router node reads the user's intent and directs the message to the right expert. This gives you:
  • Clean separation of role logic
  • Easy to add/remove roles 
  • Traceable paths through the graph

One simple analogy to understand Role-Based prompting :

Assume we ask the LLM: "What is prompt engineering and how does it help in Agent-to-Agent interaction ?". The router detects that this question fits the "Researcher" role and selects the prompt associated with that role, which is then sent to the LLM so that the response is generated in accordance with the context of that particular role.


Implementing a Role-Based prompting :

# ── Imports ───────────────────────────────────────────────
from typing import TypedDict, Literal, Optional
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from IPython.display import Image, display
from dotenv import load_dotenv
import os

load_dotenv()


# ── State ─────────────────────────────────────────────────
class RoleState(TypedDict):
    user_message: str          # The original question from the user
    selected_role: Optional[str]  # Which role was picked: doctor / lawyer / scientist
    response: Optional[str]    # Final expert response

# ── LLM ───────────────────────────────────────────────────
llm = ChatOpenAI(model="gpt-4o", temperature=0.3)


# ── Role Prompts ──────────────────────────────────────────
# Each role has a carefully crafted system prompt that defines:
#   1. Identity       – who the model is
#   2. Expertise      – what domain it knows
#   3. Reasoning style – how it thinks
#   4. Tone / format  – how it responds

ROLE_PROMPTS = {

    "doctor": """
You are Dr. Priya, a senior general physician with 20 years of clinical experience.

Your reasoning style:
- Think in terms of differential diagnosis (what could this be?)
- Prioritise patient safety above all else
- Always recommend professional consultation for serious symptoms
- Use plain language — avoid heavy medical jargon unless explaining a term

Response format:
1. Possible explanations (brief)
2. What to watch out for (red flags)
3. Recommended next step

Note: You provide general health information, not a medical diagnosis.
""",

    "lawyer": """
You are Advocate Rajan, a corporate and contract law specialist with 15 years of practice.

Your reasoning style:
- Identify the legal issue precisely before advising
- Cite relevant principles or common legal frameworks (contract law, tort, IP, etc.)
- Flag risks and liabilities clearly
- Always remind the user that this is general legal information, not formal legal advice

Response format:
1. Legal issue identified
2. Relevant principle or law
3. Practical advice
4. Disclaimer
""",

    "scientist": """
You are Dr. Chen, a data scientist and machine learning researcher at a leading AI lab.

Your reasoning style:
- Think empirically: what does the data/evidence say?
- Frame problems as hypotheses to test
- Mention trade-offs, assumptions, and limitations
- Prefer precise language; define terms when necessary

Response format:
1. Problem framing
2. Evidence-based explanation
3. Recommendation with trade-offs
4. Confidence level (Low / Medium / High) and why
"""
}



# ── Node 1: Role Detector ─────────────────────────────────
# This node reads the user's message and decides which expert to route to.
# It uses the LLM itself to classify intent — more flexible than keyword matching.

def detect_role(state: RoleState) -> RoleState:
    """
    Classifies the user's question into one of three roles.
    Uses a strict system prompt so the LLM replies with just the role name.
    """
    classifier_prompt = """You are a routing assistant.
Read the user's question and reply with EXACTLY one word — nothing else:
  - doctor     (if the question is about health, symptoms, medicine, wellness)
  - lawyer     (if the question is about law, contracts, rights, legal disputes)
  - scientist  (if the question is about data, AI, science, research, statistics)

Reply with only the single word. No punctuation, no explanation."""

    messages = [
        SystemMessage(content=classifier_prompt),
        HumanMessage(content=state["user_message"])
    ]

    result = llm.invoke(messages)
    role = result.content.strip().lower()

    # Fallback if LLM returns something unexpected
    if role not in ("doctor", "lawyer", "scientist"):
        role = "scientist"

    print(f"[Router] Detected role: {role}")

    return {**state, "selected_role": role}


# ── Node 2a: Doctor Node ──────────────────────────────────
def doctor_node(state: RoleState) -> RoleState:
    """Responds as a medical professional."""
    messages = [
        SystemMessage(content=ROLE_PROMPTS["doctor"]),
        HumanMessage(content=state["user_message"])
    ]
    response = llm.invoke(messages)
    return {**state, "response": response.content}


# ── Node 2b: Lawyer Node ──────────────────────────────────
def lawyer_node(state: RoleState) -> RoleState:
    """Responds as a legal professional."""
    messages = [
        SystemMessage(content=ROLE_PROMPTS["lawyer"]),
        HumanMessage(content=state["user_message"])
    ]
    response = llm.invoke(messages)
    return {**state, "response": response.content}


# ── Node 2c: Scientist Node ───────────────────────────────
def scientist_node(state: RoleState) -> RoleState:
    """Responds as a data scientist / researcher."""
    messages = [
        SystemMessage(content=ROLE_PROMPTS["scientist"]),
        HumanMessage(content=state["user_message"])
    ]
    response = llm.invoke(messages)
    return {**state, "response": response.content}


# ── Router Function ───────────────────────────────────────
# Called after detect_role — reads selected_role and returns the node name

def route_to_expert(
    state: RoleState
) -> Literal["doctor", "lawyer", "scientist"]:
    return state["selected_role"]



# ── Graph Construction ─────────────────────────────────────
def build_role_graph():
    builder = StateGraph(RoleState)

    # Add nodes
    builder.add_node("detect_role", detect_role)
    builder.add_node("doctor", doctor_node)
    builder.add_node("lawyer", lawyer_node)
    builder.add_node("scientist", scientist_node)

    # Entry point
    builder.add_edge(START, "detect_role")

    # Conditional routing based on detected role
    builder.add_conditional_edges(
        "detect_role",
        route_to_expert,
        {
            "doctor": "doctor",
            "lawyer": "lawyer",
            "scientist": "scientist"
        }
    )

    # All expert nodes lead to END
    builder.add_edge("doctor", END)
    builder.add_edge("lawyer", END)
    builder.add_edge("scientist", END)

    return builder.compile()


graph = build_role_graph()
display(Image(graph.get_graph().draw_mermaid_png()))



# ── Helper: pretty print ──────────────────────────────────
def ask(question: str):
    print(f"\n{'='*60}")
    print(f"Question: {question}")
    print('='*60)

    result = graph.invoke({
        "user_message": question,
        "selected_role": None,
        "response": None
    })

    print(f"Role used : {result['selected_role'].upper()}")
    print(f"\nResponse:\n{result['response']}")


ask("I have had a persistent headache for 3 days and feel dizzy. "
    "What could be causing this?")


Output :



4) CoT(Chain-of-Thought)

Chain-of-Thought elicits explicit intermediate reasoning steps before the final answer. CoT enables models to decompose complex problems into tractable sub-problems.

The key insight: by verbalizing reasoning, the model allocates more "compute" to harder problems. This is like how humans solve math problems by writing out steps. CoT dramatically improves performance on tasks requiring multi-step logic, arithmetic, and causal reasoning.

In LangGraph, CoT is implemented as a two-phase graph:

  • First, a reasoning node generates the thought chain
  • Then, an answer node extracts the answer from that chain

This separation allows independent quality checks on each phase.

Implementation of Chain-of-Thought :

# ─────────────────────────────────────────────────────────
# Chain-of-Thought with LangGraph
# Pattern: Problem → Reasoning Chain → Answer Extraction
# ─────────────────────────────────────────────────────────

from typing import TypedDict, Optional, Literal
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from pydantic import BaseModel
import re
import json
import os
from dotenv import load_dotenv
from IPython.display import Image, display
load_dotenv()

llm = ChatOpenAI(model="gpt-4o", temperature=0.3)
llm_precise = ChatOpenAI(model="gpt-4o", temperature=0.0)


# ── Structured Output Schema ──────────────────────────────
class ReasoningOutput(BaseModel):
    steps: list[str]             # Individual reasoning steps
    intermediate_values: dict    # Key computed values
    reasoning_valid: bool        # Self-assessment of logic
    confidence: float            # 0.0 – 1.0

class CoTState(TypedDict):
    problem: str
    problem_type: str            # math / logic / analysis / planning
    reasoning_chain: str         # Full CoT text
    reasoning_steps: list[str]   # Parsed steps
    is_valid_reasoning: bool
    final_answer: str
    confidence: float
    retry_count: int

# ── Node 1: Generate Reasoning Chain ─────────────────────
def generate_reasoning(state: CoTState) -> CoTState:
    """
    The core CoT node. Uses structured prompting to elicit
    step-by-step reasoning before committing to an answer.
    """
    system = """You are an expert problem solver.
For every problem, you MUST:
1. Break it into numbered steps
2. Show your work at each step  
3. State intermediate conclusions clearly
4. Flag any assumptions you make
5. Write "REASONING COMPLETE" when done thinking

Format each step as:
Step N: [what you're doing]
→ [computation or logic]
→ [intermediate result]"""

    human = f"""Problem Type: {state['problem_type']}

Problem: {state['problem']}

Let's solve this step by step.
Think carefully before giving the final answer."""

    response = llm.invoke([
        SystemMessage(content=system),
        HumanMessage(content=human)
    ])
    chain = response.content

    # Parse numbered steps out of the reasoning chain.
    # Regex breakdown:
    #   Step \d+:    literal "Step", digits, colon: marks the start of each step
    #   .*?          non-greedy body: captures text up to the next boundary
    #   (?=Step \d+:|REASONING COMPLETE|$)
    #                lookahead boundary: stop at the next step header, the
    #                "REASONING COMPLETE" marker, or end of string (the
    #                boundary itself is not included in the match)
    #   re.DOTALL    lets '.' match newlines, so multi-line steps are captured
    steps = re.findall(r'Step \d+:.*?(?=Step \d+:|REASONING COMPLETE|$)', chain, re.DOTALL)

    steps = [s.strip() for s in steps if s.strip()]

    return {
        **state,
        "reasoning_chain": chain,
        "reasoning_steps": steps,
    }


# ── Node 2: Validate Reasoning ────────────────────────────
def validate_reasoning(state: CoTState) -> CoTState:
    """
    A second LLM call to verify the reasoning is logically sound.
    This is the 'critic' in a producer-critic architecture.
    """
    system = """You are a logical reasoning validator.
Check if reasoning is: complete, consistent, correct.
Respond with JSON only: {"valid": bool, "issues": [str], "confidence": float}"""

    human = f"""Problem: {state['problem']}
Reasoning: {state['reasoning_chain'][:2000]}

Is this reasoning logically valid and complete?"""

    response = llm_precise.invoke([
        SystemMessage(content=system),
        HumanMessage(content=human)
    ])
    try:
        validation = json.loads(response.content)
        return {
            **state,
            "is_valid_reasoning": validation.get("valid", True),
            "confidence": validation.get("confidence", 0.7)
        }
    except json.JSONDecodeError:
        # Model returned non-JSON (e.g. fenced output): fall back to defaults
        return {**state, "is_valid_reasoning": True, "confidence": 0.6}


# ── Node 3: Extract Final Answer ──────────────────────────
def extract_answer(state: CoTState) -> CoTState:
    """Extracts the clean final answer from the reasoning chain."""
    system = """Extract the single, precise final answer from the reasoning chain.
Be concise. State ONLY the answer. No explanation."""

    human = f"""Problem: {state['problem']}
Reasoning chain: {state['reasoning_chain']}

What is the final, definitive answer?"""

    response = llm_precise.invoke([
        SystemMessage(content=system),
        HumanMessage(content=human)
    ])
    return {**state, "final_answer": response.content.strip()}


# ── Node 4: Retry with Better Prompt ─────────────────────
def retry_reasoning(state: CoTState) -> CoTState:
    """Called when reasoning is invalid. Adds explicit structure."""
    structured_prompt = f"""The previous reasoning had issues.
Let's be more systematic.

Problem: {state['problem']}

Use this EXACT format:
GIVEN: [list all given information]
FIND: [what we need to determine]  
APPROACH: [which method/formula to use]
STEP 1: [first calculation]
STEP 2: [second calculation]
...
ANSWER: [final answer with units]"""

    response = llm.invoke([HumanMessage(content=structured_prompt)])
    return {
        **state,
        "reasoning_chain": response.content,
        "retry_count": state["retry_count"] + 1,
        "is_valid_reasoning": True  # Assume valid after retry
    }



# ── Conditional Routing ───────────────────────────────────
def route_after_validation(state: CoTState) -> Literal["extract_answer", "retry_reasoning"]:
    if state["is_valid_reasoning"] or state["retry_count"] >= 2:
        return "extract_answer"
    return "retry_reasoning"



# ── Graph Construction ────────────────────────────────────
def build_cot_graph():
    builder = StateGraph(CoTState)

    builder.add_node("generate_reasoning", generate_reasoning)
    builder.add_node("validate_reasoning", validate_reasoning)
    builder.add_node("extract_answer", extract_answer)
    builder.add_node("retry_reasoning", retry_reasoning)

    builder.add_edge(START, "generate_reasoning")
    builder.add_edge("generate_reasoning", "validate_reasoning")
    builder.add_conditional_edges("validate_reasoning", route_after_validation)
    builder.add_edge("retry_reasoning", "extract_answer")
    builder.add_edge("extract_answer", END)

    return builder.compile()



graph = build_cot_graph()

# View
display(Image(graph.get_graph().draw_mermaid_png()))

result = graph.invoke({
    "problem": "A train travels 120km in 2 hours, then 80km in 1.5 hours. What is its average speed for the entire journey?",
    "problem_type": "math",
    "reasoning_chain": "",
    "reasoning_steps": [],
    "is_valid_reasoning": False,
    "final_answer": "",
    "confidence": 0.0,
    "retry_count": 0
})

for i in result["reasoning_steps"]:
    print(i)
print("*"*100)

print(f"Answer: {result['final_answer']}")
print(f"Steps: {len(result['reasoning_steps'])}")
print(f"Confidence: {result['confidence']:.0%}")




Output :

Step 1: Calculate the total distance traveled by the train.

→ The train travels 120 km in the first part of the journey and 80 km in the second part.

→ Total distance = 120 km + 80 km

→ Total distance = 200 km

Step 2: Calculate the total time taken for the entire journey.

→ The train takes 2 hours for the first part and 1.5 hours for the second part.

→ Total time = 2 hours + 1.5 hours

→ Total time = 3.5 hours

Step 3: Calculate the average speed for the entire journey.

→ Average speed is given by the formula: Average speed = Total distance / Total time

→ Average speed = 200 km / 3.5 hours

→ Average speed = 57.14 km/h (rounded to two decimal places)


Intermediate Conclusion: The average speed of the train for the entire journey is 57.14 km/h.

****************************************************************************************************

Answer: 57.14 km/h

Steps: 3

Confidence: 95%


5) Context-Aware Prompting 

Context-aware prompting dynamically assembles relevant information - user profile, retrieved documents, conversation history - into the prompt at runtime.

The model's answer is only as good as the context it receives. Without context, it guesses. With context, it reasons from facts.

  • System role - who the model is, what it can do. Changes once per app.
  • User profile - name, plan, preferences, past behaviour. Changes once per session.
  • Retrieved facts - relevant docs and policies fetched for this query (RAG). Changes every query.
  • Conversation history - prior turns in this session. Changes every turn.
  • Current message - the actual user question. Changes every turn.
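Stripped of any framework, assembling these layers is just ordered message construction: three static-ish layers folded into the system message, then history, then the current question. A minimal sketch (all names and data below are placeholders, not part of any real API):

```python
# Minimal five-layer prompt assembly. The profile, docs, and
# history values here are hypothetical placeholder data.

def assemble_context(system_role: str, profile: dict, docs: str,
                     history: list[dict], user_message: str) -> list[dict]:
    """Builds the message list layer by layer:
    role -> profile -> retrieved facts -> history -> current question."""
    system = (
        f"{system_role}\n\n"                    # Layer 1: who the model is
        f"--- User Profile ---\n{profile}\n\n"  # Layer 2: per-session facts
        f"--- Retrieved Facts ---\n{docs}"      # Layer 3: per-query retrieval
    )
    messages = [{"role": "system", "content": system}]
    messages += history                         # Layer 4: prior turns
    messages.append({"role": "user", "content": user_message})  # Layer 5
    return messages

msgs = assemble_context(
    system_role="You are a support agent.",
    profile={"name": "Anil", "plan": "Pro"},
    docs="Refunds: 30 days on Pro.",
    history=[{"role": "user", "content": "Hi"},
             {"role": "assistant", "content": "Hello Anil!"}],
    user_message="Can I get a refund?",
)
print(len(msgs))  # 4: system + 2 history turns + current message
```

Only layer 1 is static; everything else is re-assembled per session, per query, or per turn.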

Why does LangGraph fit perfectly?

Each layer of context is a node that enriches the state before the LLM call. The state object carries everything forward - nothing is passed as raw arguments. This makes the pipeline easy to test and debug.

Scenario for the example below :

A customer support bot for a financial SaaS product called CloudBase. It knows the user's plan, fetches the right policy doc, and maintains conversation memory - so it never asks for what it already knows.

Implementation of Context-Aware prompting :

# ── Imports ───────────────────────────────────────────────
from typing import TypedDict, Optional, List
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from IPython.display import Image, display
from dotenv import load_dotenv
import os

load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

llm = ChatOpenAI(model="gpt-4o", temperature=0.2)


# ── State ─────────────────────────────────────────────────
# The state carries ALL context through every node.
# Each node reads what it needs and writes what it produces.

class SupportState(TypedDict):
    # Input
    user_id: str                      # Who is asking
    user_message: str                 # What they asked

    # Context built up by nodes
    user_profile: Optional[dict]      # Name, plan, preferences
    retrieved_docs: Optional[str]     # Relevant policy / FAQ text
    conversation_history: List[dict]  # Previous turns [{role, content}]

    # Output
    response: Optional[str]           # Final answer



# ── Fake Data Sources ─────────────────────────────────────
# In production these would be real DB / vector store calls.
# Here we simulate them with dicts.

USER_DB = {
    "user_001": {
        "name": "Anil",
        "plan": "Pro",
        "seats": 10,
        "joined": "2023-06",
        "preferred_language": "English",
        "open_tickets": 1,
    },
    "user_002": {
        "name": "Priya",
        "plan": "Starter",
        "seats": 3,
        "joined": "2024-01",
        "preferred_language": "English",
        "open_tickets": 0,
    },
}

# Simulated knowledge base — key = topic keyword, value = policy snippet
KNOWLEDGE_BASE = {
    "refund": """
CloudBase Refund Policy:
- Pro plan: Full refund within 30 days of billing.
- Starter plan: Refund only within 7 days.
- Annual plans: Prorated refund after 30 days.
- To request: email billing@cloudbase.io with your invoice number.
""",
    "upgrade": """
CloudBase Upgrade Policy:
- Upgrades take effect immediately.
- You are billed the prorated difference for the current billing cycle.
- To upgrade: go to Settings > Billing > Change Plan.
- Downgrading takes effect at the next billing cycle.
""",
    "seats": """
CloudBase Seat Management:
- Pro plan: up to 50 seats. Starter: up to 5 seats.
- Adding seats: Settings > Team > Add Member.
- Each seat is billed at your plan's per-seat rate.
- Removing seats takes effect at the next billing cycle.
""",
    "api": """
CloudBase API Access:
- API available on Pro plan and above.
- Rate limit: 1000 requests/hour on Pro, 5000 on Enterprise.
- API keys: Settings > Developer > API Keys.
- Docs: docs.cloudbase.io/api
""",
    "default": """
CloudBase General Support:
- Support hours: Monday-Friday, 9am-6pm IST.
- Email: support@cloudbase.io
- Response time: Pro plan < 4 hours, Starter < 24 hours.
"""
}


# ── Node 1: load_user_profile ─────────────────────────────
# Fetches the user record from the database.
# This gives the LLM personal context — it can say "Hi Anil" and
# know the user is on the Pro plan without ever asking.

def load_user_profile(state: SupportState) -> SupportState:
    user_id = state["user_id"]
    profile = USER_DB.get(user_id, {
        "name": "Valued Customer",
        "plan": "Unknown",
        "seats": 0
    })
    print(f"[Node 1] Loaded profile for {profile['name']} ({profile['plan']} plan)")
    return {**state, "user_profile": profile}


# ── Node 2: retrieve_context ──────────────────────────────
# Simulates a RAG (Retrieval-Augmented Generation) lookup.
# In production: embed the query, vector-search a knowledge base,
# return the top-k chunks.
# Here: keyword match against our small KNOWLEDGE_BASE dict.

def retrieve_context(state: SupportState) -> SupportState:
    query = state["user_message"].lower()

    # Find the most relevant knowledge base entry
    matched_doc = KNOWLEDGE_BASE["default"]
    for keyword, doc in KNOWLEDGE_BASE.items():
        if keyword in query:
            matched_doc = doc
            print(f"[Node 2] Retrieved doc for topic: '{keyword}'")
            break
    else:
        print("[Node 2] No specific topic found — using general support doc")

    return {**state, "retrieved_docs": matched_doc}



# ── Node 3: generate_response ─────────────────────────────
# This is where context-aware prompting happens.
# We assemble ALL context layers into a single prompt:
#   1. System role (who the bot is)
#   2. User profile (personalised facts)
#   3. Retrieved docs (relevant policy/FAQ)
#   4. Conversation history (prior turns)
#   5. Current message (the actual question)

def generate_response(state: SupportState) -> SupportState:
    profile = state["user_profile"]
    docs     = state["retrieved_docs"]
    history  = state.get("conversation_history", [])
    question = state["user_message"]

    # ── Layer 1: System role ──────────────────────────────
    system_prompt = """You are Aria, a friendly and knowledgeable customer support agent for CloudBase.

Your behaviour:
- Always address the user by their first name
- Tailor your answer to their specific plan (they shouldn't have to repeat it)
- Base your answer strictly on the provided policy documentation
- If the policy does not cover their question, say so honestly
- Be concise — 3-5 sentences maximum unless steps are needed
- Never make up features, prices, or policies
"""

    # ── Layer 2: User profile context ────────────────────
    profile_context = f"""--- User Profile ---
Name: {profile['name']}
Current plan: {profile['plan']}
Seats: {profile['seats']}
Member since: {profile.get('joined', 'N/A')}
Open support tickets: {profile.get('open_tickets', 0)}
"""

    # ── Layer 3: Retrieved documentation ─────────────────
    docs_context = f"""--- Relevant Policy Documentation ---
{docs.strip()}
"""

    # ── Assemble system message (layers 1 + 2 + 3) ───────
    full_system = system_prompt + "\n" + profile_context + "\n" + docs_context

    # ── Layer 4: Conversation history ─────────────────────
    # Reconstruct prior turns as LangChain message objects
    messages = [SystemMessage(content=full_system)]

    for turn in history:
        if turn["role"] == "user":
            messages.append(HumanMessage(content=turn["content"]))
        elif turn["role"] == "assistant":
            messages.append(AIMessage(content=turn["content"]))

    # ── Layer 5: Current user message ─────────────────────
    messages.append(HumanMessage(content=question))

    print(f"[Node 3] Sending {len(messages)} messages to LLM "
          f"({len(history)} history turns + current message)")

    # ── LLM call ──────────────────────────────────────────
    result = llm.invoke(messages)

    return {**state, "response": result.content}



# ── Graph Construction ─────────────────────────────────────
def build_support_graph():
    builder = StateGraph(SupportState)

    builder.add_node("load_user_profile", load_user_profile)
    builder.add_node("retrieve_context",  retrieve_context)
    builder.add_node("generate_response", generate_response)

    builder.add_edge(START,               "load_user_profile")
    builder.add_edge("load_user_profile", "retrieve_context")
    builder.add_edge("retrieve_context",  "generate_response")
    builder.add_edge("generate_response", END)

    return builder.compile()


graph = build_support_graph()
display(Image(graph.get_graph().draw_mermaid_png()))





# ── Multi-turn helper ─────────────────────────────────────
# Simulates a real conversation by accumulating history between turns.
# Each turn adds the question + answer to conversation_history,
# so the LLM remembers what was said.

class SupportSession:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.history = []  # Grows with each turn

    def ask(self, question: str) -> str:
        print(f"\n{'='*60}")
        print(f"User: {question}")
        print(f"History turns so far: {len(self.history)}")
        print('='*60)

        result = graph.invoke({
            "user_id": self.user_id,
            "user_message": question,
            "user_profile": None,
            "retrieved_docs": None,
            "conversation_history": self.history,
            "response": None,
        })

        answer = result["response"]

        # Accumulate history for next turn
        self.history.append({"role": "user",      "content": question})
        self.history.append({"role": "assistant", "content": answer})

        print(f"\nAria: {answer}")
        return answer



# ── Demo: Anil (Pro plan) — 3-turn conversation ───────────
# Watch how:
#  Turn 1 — Aria knows Anil's name and plan without being told
#  Turn 2 — Aria retrieves the correct refund policy for Pro plan
#  Turn 3 — Aria remembers from Turn 2 without the user repeating

session_anil = SupportSession(user_id="user_001")

session_anil.ask("Hi, I have a question about my account.")




============================================================
User: Hi, I have a question about my account.
History turns so far: 0
============================================================
[Node 1] Loaded profile for Anil (Pro plan)
[Node 2] No specific topic found — using general support doc
[Node 3] Sending 2 messages to LLM (0 history turns + current message)

Aria: Hi Anil, I'd be happy to help with your account question.
Could you please provide a bit more detail about what you need assistance with?






session_anil.ask("Can I get a refund for this month's payment?")


============================================================
User: Can I get a refund for this month's payment?
History turns so far: 6
============================================================
[Node 1] Loaded profile for Anil (Pro plan)
[Node 2] Retrieved doc for topic: 'refund'
[Node 3] Sending 8 messages to LLM (6 history turns + current message)

Aria: Since you're on the Pro plan, Anil, you are eligible for a full refund if you
request it within 30 days of billing. If you are within this timeframe,
please email billing@cloudbase.io with your invoice number to initiate
the refund process. Let me know if you need further assistance!





# Turn 3: References the previous answer without re-stating it
session_anil.ask("What email should I send that refund request to?")

============================================================
User: What email should I send that refund request to?
History turns so far: 2
============================================================
[Node 1] Loaded profile for Anil (Pro plan)
[Node 2] Retrieved doc for topic: 'refund'
[Node 3] Sending 4 messages to LLM (2 history turns + current message)

Aria: Anil, for a refund request on your Pro plan, you should email
billing@cloudbase.io with your invoice number. If it's within 30 days of billing,
you are eligible for a full refund. Let me know if there's anything else you need!



# ── Demo: Priya (Starter plan) — same question, different answer ──
# Because the user profile is different, the refund policy window
# will be 7 days (Starter) not 30 days (Pro) — same graph, same question.

session_priya = SupportSession(user_id="user_002")

session_priya.ask("Can I get a refund for this month's payment?")


============================================================
User: Can I get a refund for this month's payment?
History turns so far: 0
============================================================
[Node 1] Loaded profile for Priya (Starter plan)
[Node 2] Retrieved doc for topic: 'refund'
[Node 3] Sending 2 messages to LLM (0 history turns + current message)

Aria: Hi Priya, since you're on the Starter plan, you are eligible for a refund
only if you request it within 7 days of the billing date. If you're within this
timeframe, please email billing@cloudbase.io with your invoice number to initiate
the refund process. If it's been more than 7 days, unfortunately,
a refund isn't possible according to our policy.



# ── Demo: API question routed to correct doc ───────────────
session_anil.ask("How do I get my API key?")

============================================================
User: How do I get my API key?
History turns so far: 4
============================================================
[Node 1] Loaded profile for Anil (Pro plan)
[Node 2] Retrieved doc for topic: 'api'
[Node 3] Sending 6 messages to LLM (4 history turns + current message)

Aria: To get your API key, Anil, you need to navigate to the Settings section of your
CloudBase account. From there, go to Developer and then API Keys.
You'll find the option to generate or view your API key.
If you have any trouble accessing it, feel free to let me know!



Explanation :

  • We simulated a database for the example above, i.e. USER_DB
  • We also created a knowledge base to provide context, i.e. KNOWLEDGE_BASE


Key design principles :

  • Context belongs in state, not in function arguments. Every node reads from and writes to SupportState. Nothing is passed as a raw string between nodes. This makes each node independently testable.
  • Separate retrieval from generation. retrieve_context and generate_response are different nodes. You can swap in a real vector store without touching the LLM node.
  • History is explicit. Conversation history is a list in state. You control what goes in, what gets trimmed, and how it's formatted. No magic.
  • The system prompt is assembled at runtime. It is not static - it is built fresh each turn from the live profile, freshly retrieved docs, and current history. This is what makes it truly context aware.

Extending this for a production-grade use case:

  • Replace retrieve_context with a real FAISS / Chroma / Pinecone backend
  • Add a classify_intent node before retrieval to pick the right index
  • Add a trim_history node to keep history under the token limit
  • Add a check_sentiment node to escalate angry users to a human agent
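For illustration, the trim_history idea can be sketched as a node that caps the number of stored turns. This is a hypothetical helper, not part of the example above: the MAX_TURNS limit and turn-based (rather than token-based) counting are simplifying assumptions.

```python
# Hypothetical trim_history node: keeps only the most recent turns so
# the prompt stays bounded. A production version would count tokens
# (e.g. with a tokenizer) instead of turns.

MAX_TURNS = 6  # assumed limit: 3 user/assistant exchanges

def trim_history(state: dict) -> dict:
    history = state.get("conversation_history", [])
    if len(history) > MAX_TURNS:
        # Keep the most recent MAX_TURNS entries; drop the oldest first.
        history = history[-MAX_TURNS:]
    return {**state, "conversation_history": history}

state = {"conversation_history": [
    {"role": "user", "content": f"turn {i}"} for i in range(10)
]}
trimmed = trim_history(state)
print(len(trimmed["conversation_history"]))  # 6
```

Wired in as a node between load_user_profile and generate_response, it keeps long sessions from overflowing the context window.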


6) Prompt Chaining

Prompt Chaining breaks a complex task into a sequence of focused LLM calls, where each step's output becomes the next step's structured input.

Why does a single prompt fail for complex tasks?

Asking one prompt to research + draft + edit + quality-check all at once overloads the model's attention. It trades depth for breadth and produces mediocre work at every stage.

Chaining fixes this by giving each LLM call ONE clear job with one clear output format.

3 rules of a good prompt chain :

  • One job per step - each prompt does exactly one thing
  • Structured handoff - the output of step N is explicitly formatted as the input for step N + 1
  • Gate before continuing - a quality check can loop back rather than pass bad output forward

Scenario in this implementation :

A 4-step blog post pipeline:

research_topic -----> write_draft ------> quality_check -------> format_output

(revise: quality_check ----> write_draft, when quality_score < threshold)

The quality gate is the key difference from a plain sequential pipeline. If the draft is weak, the critique is injected back into write_draft as context.

Implementation of Prompt Chaining :

# ── Imports ───────────────────────────────────────────────
from typing import TypedDict, Optional, Literal
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from IPython.display import Image, display
from dotenv import load_dotenv
import json, os

load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

llm = ChatOpenAI(model="gpt-4o", temperature=0.4)


# ── State ─────────────────────────────────────────────────
# State accumulates as the chain progresses.
# Each node reads upstream fields and writes its own output field.
# Nothing is lost — every intermediate result is inspectable.

class BlogState(TypedDict):
    # Input
    topic: str                   # The blog topic the user provides
    target_audience: str         # Who the article is for
    word_count: int              # Approximate target word count

    # Chain outputs — each node fills one of these
    research_notes: Optional[str]    # Step 1 output: structured facts
    draft: Optional[str]             # Step 2 output: full article draft
    quality_score: Optional[int]     # Step 3 output: 1-10 score
    quality_critique: Optional[str]  # Step 3 output: detailed critique
    final_article: Optional[str]     # Step 4 output: formatted article

    # Control
    revision_count: int          # How many times write_draft has run
    max_revisions: int           # Safety ceiling to prevent infinite loops


# ── Step 1: research_topic ────────────────────────────────
# SINGLE JOB: Gather facts about the topic.
# OUTPUT FORMAT: Numbered list of specific, verifiable facts.
#
# Key design choice: we ask for a structured list, NOT prose.
# Prose from step 1 would be hard for step 2 to consume cleanly.
# A numbered list is easy to inject directly into the draft prompt.

def research_topic(state: BlogState) -> BlogState:
    print(f"\n[Step 1] Researching: '{state['topic']}'")

    prompt = f"""You are a research assistant. Your ONLY job is to gather facts.

Topic: {state['topic']}
Target audience: {state['target_audience']}

Produce a numbered list of exactly 8 specific, interesting facts about this topic.
Rules:
- Each fact must be on its own line, numbered 1-8
- Each fact must be concrete and specific (no vague generalisations)
- Include at least 2 surprising or counterintuitive facts
- No prose, no headings, no introduction — just the numbered list
"""

    response = llm.invoke([HumanMessage(content=prompt)])
    notes = response.content.strip()

    print(f"[Step 1] Research complete — {len(notes.splitlines())} facts gathered")
    return {**state, "research_notes": notes}


# ── Step 2: write_draft ───────────────────────────────────
# SINGLE JOB: Write a blog post draft using the research notes.
# INPUT: research_notes (structured facts from step 1)
#        quality_critique (injected on revision rounds)
#
# Key design choice: when revising, we inject the FULL critique
# from the quality check. This is the "chain feedback loop" —
# the output of a downstream step flows back into an upstream step.

def write_draft(state: BlogState) -> BlogState:
    revision = state["revision_count"]
    print(f"\n[Step 2] Writing draft (revision {revision + 1})")

    # On revision rounds, inject the critique as a correction instruction
    critique_section = ""
    if state.get("quality_critique") and revision > 0:
        critique_section = f"""
IMPORTANT — This is revision #{revision}. Your previous draft was rejected.
Here is the critique you MUST address in this revision:

{state['quality_critique']}

Fix ALL the issues above while keeping the good parts.
"""

    prompt = f"""You are an expert blog writer. Your ONLY job is to write a draft.

Topic: {state['topic']}
Target audience: {state['target_audience']}
Target word count: approximately {state['word_count']} words

Research notes to base the article on:
{state['research_notes']}
{critique_section}
Write a complete, engaging blog post. Include:
- A compelling headline
- An opening hook that grabs attention
- Clear sections with subheadings
- A strong conclusion with a takeaway

Use ALL 8 research facts naturally within the article.
Do not add a word count label. Write the article only.
"""

    response = llm.invoke([HumanMessage(content=prompt)])
    draft = response.content.strip()
    word_count = len(draft.split())

    print(f"[Step 2] Draft written — {word_count} words")
    return {
        **state,
        "draft": draft,
        "revision_count": revision + 1
    }



# ── Step 3: quality_check ─────────────────────────────────
# SINGLE JOB: Score the draft and produce a detailed critique.
# OUTPUT FORMAT: JSON with score + critique (so the router can parse it cleanly)
#
# Key design choice: we ask for JSON output so the score is
# machine-readable. The router reads state["quality_score"] to decide
# whether to loop back or continue. No string parsing needed.

def quality_check(state: BlogState) -> BlogState:
    print(f"\n[Step 3] Running quality check on draft")

    prompt = f"""You are a senior editor. Your ONLY job is to evaluate this draft.

Topic: {state['topic']}
Target audience: {state['target_audience']}

Draft to evaluate:
{state['draft']}

Score this draft on these 4 dimensions (1-10 each):
1. Clarity — Is it easy to read and understand?
2. Engagement — Does it hold attention from start to finish?
3. Accuracy — Are the facts used correctly and specifically?
4. Structure — Does it have a clear flow with good headings?

Reply with ONLY this JSON — nothing else, no markdown code fences:
{{
  "clarity": <1-10>,
  "engagement": <1-10>,
  "accuracy": <1-10>,
  "structure": <1-10>,
  "overall": <average of above, rounded to nearest int>,
  "critique": "<specific, actionable feedback — what to fix and how, 3-5 sentences>",
  "strengths": "<what is already working well, 1-2 sentences>"
}}
"""

    response = llm.invoke([HumanMessage(content=prompt)])

    try:
        result = json.loads(response.content.strip())
        score   = int(result["overall"])
        critique = (
            f"Scores — Clarity: {result['clarity']}/10, "
            f"Engagement: {result['engagement']}/10, "
            f"Accuracy: {result['accuracy']}/10, "
            f"Structure: {result['structure']}/10\n\n"
            f"Strengths: {result['strengths']}\n\n"
            f"What to fix: {result['critique']}"
        )
    except (json.JSONDecodeError, KeyError):
        # Fallback if LLM doesn't return clean JSON
        score   = 6
        critique = response.content.strip()

    print(f"[Step 3] Quality score: {score}/10")
    return {
        **state,
        "quality_score": score,
        "quality_critique": critique
    }



# ── Step 4: format_output ─────────────────────────────────
# SINGLE JOB: Final polish and formatting.
# INPUT: The approved draft (score >= threshold)
#
# This step adds SEO metadata, fixes minor phrasing, and
# produces the publication-ready version.
# It does NOT rewrite — that's write_draft's job.

def format_output(state: BlogState) -> BlogState:
    print(f"\n[Step 4] Formatting final output")

    prompt = f"""You are a content editor doing final polish. Your ONLY job is formatting.

Take this approved draft and produce the publication-ready version:

{state['draft']}

Your tasks (do NOT rewrite or change substance):
1. Fix any minor grammatical or punctuation issues
2. Ensure all subheadings use consistent formatting (## for H2)
3. Add a meta description (1 sentence, max 160 chars) at the very top labelled: Meta:
4. Add suggested tags at the bottom labelled: Tags: (comma-separated, 4-6 tags)
5. Add estimated read time at the top labelled: Read time: (e.g. "4 min read")

Output the complete formatted article. Nothing else.
"""

    response = llm.invoke([HumanMessage(content=prompt)])
    final = response.content.strip()

    print(f"[Step 4] Article ready — {len(final.split())} words")
    return {**state, "final_article": final}



# ── Router: should_revise ─────────────────────────────────
# This is the quality gate — the heart of what makes this
# a chain rather than a simple pipeline.
#
# Decision logic:
#   score >= 7  → pass → format_output
#   score < 7   → fail → write_draft (with critique injected)
#   max revisions hit → force pass to avoid infinite loop

QUALITY_THRESHOLD = 7  # Minimum score to pass quality gate

def should_revise(
    state: BlogState
) -> Literal["write_draft", "format_output"]:
    score    = state.get("quality_score", 0)
    revisions = state.get("revision_count", 0)
    max_rev  = state.get("max_revisions", 2)

    if revisions >= max_rev:
        print(f"[Router] Max revisions ({max_rev}) reached — forcing pass")
        return "format_output"

    if score >= QUALITY_THRESHOLD:
        print(f"[Router] Score {score}/10 >= {QUALITY_THRESHOLD} — PASS → format_output")
        return "format_output"
    else:
        print(f"[Router] Score {score}/10 < {QUALITY_THRESHOLD} — FAIL → revise")
        return "write_draft"


# ── Graph Construction ─────────────────────────────────────
def build_blog_chain():
    builder = StateGraph(BlogState)

    # Register all nodes
    builder.add_node("research_topic", research_topic)
    builder.add_node("write_draft",    write_draft)
    builder.add_node("quality_check",  quality_check)
    builder.add_node("format_output",  format_output)

    # Linear chain: START → research → draft → quality
    builder.add_edge(START,            "research_topic")
    builder.add_edge("research_topic", "write_draft")
    builder.add_edge("write_draft",    "quality_check")

    # Quality gate: conditional branch after quality_check
    builder.add_conditional_edges(
        "quality_check",
        should_revise,
        {
            "write_draft":   "write_draft",   # loop back with critique
            "format_output": "format_output"  # continue to finish
        }
    )

    builder.add_edge("format_output", END)

    return builder.compile()


graph = build_blog_chain()
display(Image(graph.get_graph().draw_mermaid_png()))



# ── Run the chain ─────────────────────────────────────────
result = graph.invoke({
    "topic":           "Why sleep is more important than most people realise",
    "target_audience": "busy professionals aged 25-40",
    "word_count":      600,
    "research_notes":  None,
    "draft":           None,
    "quality_score":   None,
    "quality_critique":None,
    "final_article":   None,
    "revision_count":  0,
    "max_revisions":   2,
})

print("\n" + "="*60)
print(f"Total revisions: {result['revision_count']}")
print(f"Final quality score: {result['quality_score']}/10")
print("="*60)
print(f"Final Response : {result['final_article']}")

Output :



# ── Inspect intermediate outputs ──────────────────────────
# One of the biggest advantages of chaining:
# every step's output is preserved in state for inspection.

print("RESEARCH NOTES (Step 1 output)")
print("-" * 40)
print(result["research_notes"])


print("QUALITY CRITIQUE (Step 3 output)")
print("-" * 40)
print(result["quality_critique"])

Output :


print("FINAL ARTICLE (Step 4 output)")
print("-" * 40)
print(result["final_article"])

Output :


# ── Try a different topic ─────────────────────────────────
result2 = graph.invoke({
    "topic":           "The surprising history of the QWERTY keyboard layout",
    "target_audience": "general tech enthusiasts",
    "word_count":      500,
    "research_notes":  None,
    "draft":           None,
    "quality_score":   None,
    "quality_critique":None,
    "final_article":   None,
    "revision_count":  0,
    "max_revisions":   2,
})

print(f"\nRevisions taken: {result2['revision_count']}")
print(f"Final score: {result2['quality_score']}/10")
print("\nFINAL ARTICLE:")
print(result2["final_article"])


7) Semantic Role Labelling (SRL)

Semantic Role Labelling is a classic Natural Language Processing (NLP) technique, related to tasks such as named entity recognition. It identifies the predicate-argument structure of a sentence.

What is Semantic Role Labelling ?

SRL answers the question: Who did what to whom, where, when, why, and how?

The code is fairly large and this technique is rarely used in practice; if you are interested, please feel free to download the code from my GitHub repo : https://github.com/amathe1/AI-code/blob/main/3_Prompt_Engineering/07_Semantic-Role-Labeling.ipynb


Just observe the output :


Note : This concept is mainly used in NLP (not in Agentic AI). Notice that it captures the semantic structure of the entire sentence.
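If you don't want to run the full notebook, here is a hand-labelled sketch of what SRL produces for a single sentence. The roles follow the PropBank convention (ARG0 = agent, ARG1 = patient, ARGM-TMP = temporal modifier); the labels below are illustrative and written by hand, not generated by a model.

```python
# Illustrative, hand-labelled SRL output for one sentence
# (PropBank-style roles; not produced by a model)
sentence = "Mary sold the old bike to John yesterday."

roles = {
    "V":        "sold",          # the predicate
    "ARG0":     "Mary",          # who did it (agent)
    "ARG1":     "the old bike",  # what was affected (patient)
    "ARG2":     "to John",       # to whom (recipient)
    "ARGM-TMP": "yesterday",     # when (temporal modifier)
}

for role, span in roles.items():
    print(f"{role:9} -> {span}")
```

Every token of the sentence is assigned a role relative to the verb, which is exactly the "who did what to whom, when" structure described above.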


8) ReACT  (Reason + Action)

ReACT is one of the most powerful frameworks in Agentic AI. It is very similar to CoT (Chain-of-Thought); the key difference is that CoT has no capability to query external tools, while ReACT can query external tools as well.

ReACT is a paradigm which interleaves Thought -> Action -> Observation cycles until reaching a final answer. 

Unlike pure CoT which reasons in isolation, ReACT can query external tools - web search, code execution, databases, APIs - and incorporate the results into subsequent reasoning. This prevents hallucination by grounding claims in verified information.

LangGraph is an ideal framework for ReACT because its support for cycles, tool integration, and conditional routing maps exactly to the Thought - Action - Observation loop.

The ReACT loop :

  • Thought - Reason about what information is needed.
  • Action - Call a tool with specific parameters.
  • Observation - Incorporate the tool result into context.
  • Repeat - Until there is enough information to answer confidently.
  • Final Answer - Synthesize all observations.
  • Max steps - Always set a ceiling to prevent infinite loops.
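In the original ReAct paper this loop was a plain text protocol, with the model emitting literal `Thought:` / `Action:` / `Observation:` lines; modern tool-calling APIs replace the `Action:` lines with structured tool calls. The trace and the one-step parser below are purely illustrative (the tool names match the ones used later in this post):

```python
import re

# An illustrative classic-format ReAct trace (text protocol).
trace = """Thought: I need the square root of 1764 before I can check the weather.
Action: python_repl[math.sqrt(1764)]
Observation: 42.0
Thought: Now I need the weather in Tokyo.
Action: get_weather[Tokyo]
Observation: Weather in Tokyo: 18°C, cloudy
Final Answer: The square root of 1764 is 42; Tokyo is currently 18°C and cloudy."""

# A minimal controller step: find the next "Action:" line and split it
# into a tool name and its argument.
match = re.search(r"Action: (\w+)\[(.+?)\]", trace)
tool, arg = match.group(1), match.group(2)
print(tool, arg)   # python_repl math.sqrt(1764)
```

A text-protocol agent runs exactly this extraction on every model turn, executes the tool, appends an `Observation:` line, and calls the model again, which is what the LangGraph implementation below does with structured messages instead of regexes.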

Implementation of ReACT prompting :

# ─────────────────────────────────────────────────────────
# ReAct Agent with LangGraph
# Pattern: Cyclical Thought → Action → Observation loop
# ─────────────────────────────────────────────────────────

from typing import TypedDict, Optional, Literal, Annotated
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
from langgraph.prebuilt import ToolNode
import operator, json, requests
from dotenv import load_dotenv
from IPython.display import Image, display
from langchain_community.tools.tavily_search import TavilySearchResults
from serpapi import GoogleSearch
load_dotenv()

import os

os.environ["TAVILY_API_KEY"]=os.getenv("TAVILY_API_KEY")
os.environ["OPENAI_API_KEY"]=os.getenv("OPENAI_API_KEY")
os.environ["SERPAPI_API_KEY"]=os.getenv("SERPAPI_API_KEY")


# ── Tool Definitions ──────────────────────────────────────
@tool
def web_search(query: str) -> str:
    """Search the web for current information about a topic."""
    tavily = TavilySearchResults(max_results=3)
    results = tavily.invoke(query)
    return str(results)

@tool
def python_repl(code: str) -> str:
    """Execute Python code and return the output. Use for ALL calculations.
    Always call this for any math. The result of the last expression is auto-captured.
    """
    import io, contextlib, ast, math
    output = io.StringIO()
    globs = {"__builtins__": __builtins__, "math": math}
    try:
        with contextlib.redirect_stdout(output):
            try:
                # Capture bare expressions (e.g. math.sqrt(1764))
                tree = ast.parse(code, mode="eval")
                result = eval(compile(tree, "<string>", "eval"), globs)
                if result is not None:
                    print(result)
            except SyntaxError:
                # Multi-line / statement code
                exec(code, globs)
        captured = output.getvalue().strip()
        return captured if captured else "Code executed (no output)"
    except Exception as e:
        return f"Error: {e}"

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
   
    params = {
        "engine": "google",
        "q": f"weather in {city}",
        "api_key": os.getenv("SERPAPI_API_KEY")
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    weather = results.get("answer_box", {})
   
    if not weather:
        return f"Weather data not found for {city}"

    return (
        f"Weather in {city}: "
        f"{weather.get('temperature')}°C, "
        f"{weather.get('weather')}, "
        f"Humidity: {weather.get('humidity')}"
    )



TOOLS = [web_search, python_repl, get_weather]


# ── State Schema ──────────────────────────────────────────
class ReActState(TypedDict):
    messages: Annotated[list, operator.add]  # Message history
    question: str
    step_count: int
    max_steps: int
    tool_calls_made: list[str]
    final_answer: Optional[str]


# ── LLM with Tool Binding ─────────────────────────────────
llm = ChatOpenAI(model="gpt-4o", temperature=0.0)
llm_with_tools = llm.bind_tools(TOOLS)


from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

# ── Node 1: Reasoning (Thought + Action Selection) ────────
def reason_and_act(state: ReActState) -> ReActState:
    """
    The core ReAct node. The LLM sees the full message history
    and decides whether to:
    1. Call a tool
    2. Provide a final answer
    """

    system_prompt = """You are a ReAct agent. You MUST follow this strict protocol:

STRICT RULES — NO EXCEPTIONS:
1. NEVER compute math in your head. ALWAYS call python_repl for any calculation.
2. NEVER guess weather. ALWAYS call get_weather for any weather question.
3. NEVER call web_search unless explicitly asked for web/news info.
4. You may call MULTIPLE tools in a single step if needed.
5. After ALL tool results are back and you have everything you need, respond with:
   Final Answer: <your complete answer here>
6. DO NOT write "Action: ..." in text. Use actual tool calls instead.
7. DO NOT answer until you have real tool output for every sub-question.
"""

    # ✅ Use proper message objects
    messages = [
        SystemMessage(content=system_prompt),
        *state["messages"]
    ]

    # ✅ If at max steps, force a final answer without tools
    at_limit = state.get("step_count", 0) >= state.get("max_steps", 10) - 1
    if at_limit:
        from langchain_openai import ChatOpenAI
        llm_no_tools = ChatOpenAI(model="gpt-4o", temperature=0.0)
        messages_forced = messages + [
            SystemMessage(content="You have reached the maximum number of steps. Summarize what you know and provide a Final Answer now.")
        ]
        response = llm_no_tools.invoke(messages_forced)
    else:
        response = llm_with_tools.invoke(messages)

    # ✅ Safe tool call extraction (new format)
    tool_calls = []
    if hasattr(response, "tool_calls") and response.tool_calls:
        tool_calls = [tc.get("name") for tc in response.tool_calls]

    # ✅ Return only the DELTA (new message) — LangGraph reducer adds it automatically
    return {
        "messages": [response],
        "step_count": state.get("step_count", 0) + 1,
        "tool_calls_made": state.get("tool_calls_made", []) + tool_calls,
    }


# ── Node 2: Tool Execution (using prebuilt ToolNode) ──────
from langchain_core.messages import ToolMessage

def execute_tools(state: ReActState) -> ReActState:
    last_message = state["messages"][-1]

    if not hasattr(last_message, "tool_calls") or not last_message.tool_calls:
        return state

    tool_outputs = []

    for tc in last_message.tool_calls:
        tool_name = tc["name"]
        tool_args = tc.get("args", {})

        if tool_name == "python_repl":
            result = python_repl.invoke(tool_args)
        elif tool_name == "get_weather":
            result = get_weather.invoke(tool_args)
        elif tool_name == "web_search":
            result = web_search.invoke(tool_args)
        else:
            result = "Unknown tool"

        tool_outputs.append(
            ToolMessage(
                content=str(result),
                tool_call_id=tc["id"]
            )
        )

    # ✅ Return only the DELTA — LangGraph reducer appends automatically
    return {
        "messages": tool_outputs
    }


# ── Conditional Router ────────────────────────────────────
def should_continue(state: ReActState):
    # ✅ Guard: stop if max steps reached
    if state.get("step_count", 0) >= state.get("max_steps", 10):
        return "end"

    last_message = state["messages"][-1]

    # ✅ If LLM called tools → go to tool node
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"

    # ✅ Stop if final answer or no more tool calls
    return "end"


# ── Graph Construction ────────────────────────────────────
def build_react_graph():
    builder = StateGraph(ReActState)

    builder.add_node("reason_and_act", reason_and_act)
    builder.add_node("tools", execute_tools)

    builder.add_edge(START, "reason_and_act")
    builder.add_conditional_edges(
        "reason_and_act",
        should_continue,
        {"tools": "tools", "end": END}
    )
    builder.add_edge("tools", "reason_and_act")   # Observation → back to reasoning

    return builder.compile()


# ── Usage ─────────────────────────────────────────────────
graph = build_react_graph()

# View
display(Image(graph.get_graph().draw_mermaid_png()))



result = graph.invoke({
    "messages": [HumanMessage(content="What is the square root of 1764, and what's the weather in that many degrees Celsius in Tokyo?")],
    "question": "Math + weather query",
    "step_count": 0,
    "max_steps": 2,
    "tool_calls_made": [],
    "final_answer": None
})

print(f"Steps taken: {result['step_count']}")
print(f"Tools used: {result['tool_calls_made']}")
print(f"Final: {result['messages'][-1].content}")

Output :

Steps taken: 2
Tools used: ['python_repl', 'get_weather']
Final: The square root of 1764 is 42. The weather in Tokyo is currently 52°C with rain and 96% humidity.
Final Answer: The square root of 1764 is 42. The weather in Tokyo at 42°C is not available, but currently, it is 52°C with rain and 96% humidity.

Question : What is the square root of 1764, and what's the weather in that many degrees Celsius in Tokyo?


The LLM routes the request to tools based on the user query; the tools process it and send results back to the LLM, and the LLM then articulates the final result. If no tool call happens, the response is routed straight to END.


9) Tree-of-Thought (ToT)

Tree-of-Thoughts frames problem-solving as a search over a tree of reasoning states. Unlike linear CoT, ToT maintains multiple candidate thoughts at each step, evaluates their promise, and uses search algorithms (BFS/DFS/best-first) to explore the thought space.

This mirrors human problem-solving: we consider multiple approaches, evaluate them, backtrack when stuck, and systematically explore alternatives. ToT is especially powerful for tasks requiring exploration - game playing, creative writing, proof construction, and multi-step planning.

In LangGraph, the tree structure is maintained in state as a list of nodes, with a BFS/DFS expansion node, an evaluation node that scores each thought, and a pruning step that keeps only the best branches. 
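The LangGraph example in this post keeps a single level of the tree (generate → evaluate → prune once). A deeper best-first search over thoughts can be sketched as below; the `expand` and `score` functions here are deterministic stubs standing in for LLM calls, so only the search mechanics are shown:

```python
import heapq

def expand(path):
    # Propose candidate next thoughts for a partial solution (stub for an LLM call).
    return [path + [c] for c in ("A", "B", "C")]

def score(path):
    # Score a partial solution 0.0-1.0 (stub: prefer paths with more 'B's).
    return path.count("B") / max(len(path), 1)

def best_first_tot(depth=3, beam=2):
    frontier = [(-0.0, [])]                     # max-heap via negated scores
    for _ in range(depth):
        candidates = []
        for _, path in frontier:
            for child in expand(path):          # expansion step
                heapq.heappush(candidates, (-score(child), child))
        # Pruning step: keep only the `beam` most promising branches
        frontier = [heapq.heappop(candidates) for _ in range(beam)]
    return frontier[0][1]                       # best complete path

print(best_first_tot())   # ['B', 'B', 'B']
```

Swapping the stubs for the `generate_thoughts` and `evaluate_thoughts` LLM calls below turns this skeleton into a genuine multi-level ToT search; the `beam` parameter controls how aggressively weak branches are pruned.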

Problem : Plan a 3 day trip.

How ToT works here :

  • generate_thoughts - LLM proposes 3 destination ideas
  • evaluate_thoughts - LLM scores each on budget, weather, activities
  • prune_and_select - Keep only the best scoring destination
  • finalize - LLM builds a full itinerary from the winner

Implementation of ToT :

from typing import TypedDict, Optional, Literal
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from IPython.display import Image, display
from dotenv import load_dotenv
import json, os

load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

llm = ChatOpenAI(model="gpt-4o", temperature=0.7)


# ── State ─────────────────────────────────────────────────
class ToTState(TypedDict):
    problem: str                      # The user's problem
    thoughts: list[dict]              # Generated candidate ideas
    best_thought: Optional[dict]      # Highest-scoring candidate
    best_solution: Optional[str]      # Final developed answer


# ── Node 1: Generate Thoughts ─────────────────────────────
# Asks the LLM to propose 3 completely different candidate ideas.
# Each idea is one branch of the tree.

def generate_thoughts(state: ToTState) -> ToTState:
    print("[Node 1] Generating 3 candidate thoughts...")

    prompt = f"""Problem: {state['problem']}

Propose exactly 3 DIFFERENT high-level approaches or ideas to solve this.
Each should explore a clearly distinct angle.

Reply with ONLY a JSON list, no markdown fences:
["idea one", "idea two", "idea three"]"""

    response = llm.invoke([HumanMessage(content=prompt)])

    try:
        ideas = json.loads(response.content.strip())
    except json.JSONDecodeError:
        # Fallback: split on newlines if JSON fails
        ideas = [line.strip('- "') for line in response.content.strip().splitlines() if line.strip()]
    ideas = ideas[:3]

    thoughts = [
        {"id": i, "idea": idea, "score": 0.0}
        for i, idea in enumerate(ideas)
    ]

    for t in thoughts:
        print(f"  Thought {t['id']}: {t['idea']}")

    return {**state, "thoughts": thoughts}

    # ✅ Step 1: Try to parse JSON
    # ideas = json.loads(response.content.strip())

    # 👉 It assumes the response looks like this:

    # [
    # "Build a chatbot",
    # "Create a recommendation system",
    # "Develop a fraud detection model"
    # ]
    # Result:
    # ideas = [
    # "Build a chatbot",
    # "Create a recommendation system",
    # "Develop a fraud detection model"
    # ]
    # ❌ Step 2: If JSON fails → fallback logic
    # except json.JSONDecodeError:
    #     ideas = [line.strip('- "') for line in response.content.strip().splitlines() if line.strip()]

    # 👉 This handles messy LLM output like:

    # - Build a chatbot
    # - Create a recommendation system
    # - Develop a fraud detection model
    # What happens here:
    # 1. splitlines()
    # [
    # "- Build a chatbot",
    # "- Create a recommendation system",
    # "- Develop a fraud detection model"
    # ]
    # 2. Clean each line:
    # line.strip('- "')

    # 👉 Removes:

    # -
    # spaces
    # quotes
    # Final result:
    # ideas = [
    # "Build a chatbot",
    # "Create a recommendation system",
    # "Develop a fraud detection model"
    # ]
    # 🔹 Step 3: Take only top 3 ideas
    # ideas = ideas[:3]

    # 👉 Even if you had 10 ideas:

    # ["A", "B", "C", "D", "E"]

    # 👉 It becomes:

    # ["A", "B", "C"]
    # 🔹 Step 4: Convert into structured thoughts
    # thoughts = [
    #     {"id": i, "idea": idea, "score": 0.0}
    #     for i, idea in enumerate(ideas)
    # ]
    # Input:
    # ideas = ["A", "B", "C"]
    # Output:
    # thoughts = [
    # {"id": 0, "idea": "A", "score": 0.0},
    # {"id": 1, "idea": "B", "score": 0.0},
    # {"id": 2, "idea": "C", "score": 0.0}
    # ]

    # 👉 Adds:

    # id → index
    # idea → content
    # score → initialized to 0
    # 🔹 Step 5: Print each thought
    # for t in thoughts:
    #     print(f"  Thought {t['id']}: {t['idea']}")
    # Output:
    # Thought 0: Build a chatbot
    # Thought 1: Create a recommendation system
    # Thought 2: Develop a fraud detection model
    # 🔹 Step 6: Return updated state
    # return {**state, "thoughts": thoughts}

    # 👉 This means:

    # Keep everything in state
    # Add/update "thoughts"
    # Example:
    # state = {
    # "user_input": "Give me project ideas"
    # }
    # Output:
    # {
    # "user_input": "Give me project ideas",
    # "thoughts": [
    #     {"id": 0, "idea": "...", "score": 0.0},
    #     {"id": 1, "idea": "...", "score": 0.0},
    #     {"id": 2, "idea": "...", "score": 0.0}
    # ]
    # }


# ── Node 2: Evaluate Thoughts ─────────────────────────────
# Scores every candidate idea in a single LLM call.
# Returns a score 0.0–1.0 for each thought.

def evaluate_thoughts(state: ToTState) -> ToTState:
    print("\n[Node 2] Evaluating and scoring each thought...")

    ideas_text = "\n".join(
        f"{t['id']}. {t['idea']}" for t in state["thoughts"]
    )
    # 1️⃣ Input: state["thoughts"]

    # Assume this is your data:

    # state["thoughts"] = [
    #     {"id": 0, "idea": "Build a chatbot", "score": 0.0},
    #     {"id": 1, "idea": "Create a recommendation system", "score": 0.0},
    #     {"id": 2, "idea": "Develop a fraud detection model", "score": 0.0}
    # ]
    # 2️⃣ Loop + Format each item
    # f"{t['id']}. {t['idea']}" for t in state["thoughts"]

    # 👉 For each thought t, it creates a string like:

    # 0. Build a chatbot
    # 1. Create a recommendation system
    # 2. Develop a fraud detection model
    # 3️⃣ Join with newline \n
    # "\n".join(...)

    # 👉 Combines all lines into one string separated by new lines

    # 🔹 Final Output
    # ideas_text = """0. Build a chatbot
    # 1. Create a recommendation system
    # 2. Develop a fraud detection model"""

    prompt = f"""Problem: {state['problem']}

Evaluate each idea below. Score each from 0.0 to 1.0 based on:
- Feasibility (can it actually work?)
- Quality (how good is the outcome likely to be?)
- Fit (how well does it match the problem?)

Ideas:
{ideas_text}

Reply with ONLY a JSON object mapping id to score, no markdown fences:
{{"0": 0.7, "1": 0.9, "2": 0.5}}"""

    response = llm.invoke([HumanMessage(content=prompt)])

    try:
        scores = json.loads(response.content.strip())
    except json.JSONDecodeError:
        scores = {str(t["id"]): 0.5 for t in state["thoughts"]}

    scored_thoughts = [
        {**t, "score": float(scores.get(str(t["id"]), 0.5))}
        for t in state["thoughts"]
    ]

    for t in scored_thoughts:
        print(f"  Score {t['score']:.2f} → {t['idea']}")

    return {**state, "thoughts": scored_thoughts}

    # ✅ Step 1: Try to parse scores from response
    # scores = json.loads(response.content.strip())

    # 👉 Expected LLM output:

    # {
    # "0": 0.9,
    # "1": 0.7,
    # "2": 0.4
    # }
    # Result:
    # scores = {
    # "0": 0.9,
    # "1": 0.7,
    # "2": 0.4
    # }
    # ❌ Step 2: If JSON fails → fallback
    # except json.JSONDecodeError:
    #     scores = {str(t["id"]): 0.5 for t in state["thoughts"]}

    # 👉 If LLM gives messy output like:

    # Idea 1 is good
    # Idea 2 is average

    # Then fallback creates:

    # scores = {
    # "0": 0.5,
    # "1": 0.5,
    # "2": 0.5
    # }

    # 👉 Every idea gets a default score = 0.5

    # 🔹 Step 3: Merge scores with thoughts
    # scored_thoughts = [
    #     {**t, "score": float(scores.get(str(t["id"]), 0.5))}
    #     for t in state["thoughts"]
    # ]
    # 🔸 Input thoughts:
    # state["thoughts"] = [
    # {"id": 0, "idea": "Build chatbot"},
    # {"id": 1, "idea": "Recommendation system"},
    # {"id": 2, "idea": "Fraud detection"}
    # ]
    # 🔸 What happens here?

    # For each thought t:

    # Example for id = 0:
    # scores.get("0", 0.5) → 0.9

    # Then:

    # {**t, "score": 0.9}

    # 👉 {**t} means:

    # Copy existing dictionary
    # Add/update "score"
    # 🔸 Final Output:
    # scored_thoughts = [
    # {"id": 0, "idea": "Build chatbot", "score": 0.9},
    # {"id": 1, "idea": "Recommendation system", "score": 0.7},
    # {"id": 2, "idea": "Fraud detection", "score": 0.4}
    # ]
    # 🔹 Step 4: Print results
    # for t in scored_thoughts:
    #     print(f"  Score {t['score']:.2f} → {t['idea']}")
    # Output:
    # Score 0.90 → Build chatbot
    # Score 0.70 → Recommendation system
    # Score 0.40 → Fraud detection


# ── Node 3: Prune and Select ──────────────────────────────
# Sorts all thoughts by score and keeps only the best one.
# This is the "pruning" step — weak branches are discarded.

def prune_and_select(state: ToTState) -> ToTState:
    print("\n[Node 3] Pruning — keeping best thought...")

    best = max(state["thoughts"], key=lambda t: t["score"])
    print(f"  Winner: (score={best['score']:.2f}) {best['idea']}")

    return {**state, "best_thought": best}


# ── Node 4: Finalize ──────────────────────────────────────
# Takes the winning thought and develops it into a full,
# detailed solution. This is the synthesis step.

def finalize(state: ToTState) -> ToTState:
    print("\n[Node 4] Developing the winning thought into a full solution...")

    prompt = f"""Problem: {state['problem']}

The best approach identified is:
{state['best_thought']['idea']}

Now develop this into a complete, detailed, practical solution.
Be specific and actionable."""

    response = llm.invoke([HumanMessage(content=prompt)])
    return {**state, "best_solution": response.content.strip()}


# ── Graph Construction ─────────────────────────────────────
def build_tot_graph():
    builder = StateGraph(ToTState)

    builder.add_node("generate_thoughts", generate_thoughts)
    builder.add_node("evaluate_thoughts", evaluate_thoughts)
    builder.add_node("prune_and_select",  prune_and_select)
    builder.add_node("finalize",          finalize)

    builder.add_edge(START,              "generate_thoughts")
    builder.add_edge("generate_thoughts","evaluate_thoughts")
    builder.add_edge("evaluate_thoughts","prune_and_select")
    builder.add_edge("prune_and_select", "finalize")
    builder.add_edge("finalize",          END)

    return builder.compile()


graph = build_tot_graph()
display(Image(graph.get_graph().draw_mermaid_png()))



# ── Run: Trip Planning ────────────────────────────────────
result = graph.invoke({
    "problem":      "Plan an exciting 3-day trip for a solo traveller on a medium budget",
    "thoughts":     [],
    "best_thought": None,
    "best_solution":None,
})

print("\n" + "="*60)
print(f"Winning idea: {result['best_thought']['idea']}")
print(f"Score: {result['best_thought']['score']:.2f}")
print("="*60)
print("\nFULL PLAN:")
print(result["best_solution"])

Output :


# ── Run: Business Strategy ────────────────────────────────
result2 = graph.invoke({
    "problem":      "Suggest a strategy for a small bakery to increase revenue",
    "thoughts":     [],
    "best_thought": None,
    "best_solution":None,
})

print("\n" + "="*60)
print("All thoughts considered:")
for t in result2["thoughts"]:
    print(f"  {t['score']:.2f}  {t['idea']}")
print(f"\nWinner: {result2['best_thought']['idea']}")
print("="*60)
print(result2["best_solution"])

Output :


What does each node do ?



10) Meta Prompting

Meta Prompting uses an LLM to design, test, and refine prompts automatically. Instead of a human writing prompts by hand through trial and error, the model reasons about what makes a good prompt, generates one, and rewrites the prompt based on what went wrong.

The key insight:

LLMs know a lot about what makes a prompt effective - they have processed millions of examples of good and bad instructions. Meta prompting exploits this knowledge to automate prompt engineering.

Three roles in a meta prompting system :

  • Generator - writes a candidate prompt for the task
  • Executor - runs the candidate prompt on a real test input
  • Evaluator - scores the output and produces a critique that drives refinement


What meta prompting is NOT :
 

Scenario in this example : we want a system prompt that classifies customer support emails into four categories (billing, technical, shipping, other). The graph generates a candidate prompt, runs it on a sample email, scores the result, and refines the prompt until it scores well or hits the iteration ceiling.


Implementation of Meta Prompting :

from typing import TypedDict, Optional, Literal
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from IPython.display import Image, display
from dotenv import load_dotenv
import json, os

load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

llm = ChatOpenAI(model="gpt-4o", temperature=0.3)

# ── State ─────────────────────────────────────────────────
class MetaState(TypedDict):
    task_goal: str            # What we want the prompt to accomplish
    test_input: str           # A sample input to test the prompt on
    expected_output: str      # The correct answer for the test input
    current_prompt: Optional[str]   # The candidate prompt being tested
    llm_output: Optional[str]       # What the prompt produced
    score: Optional[int]            # Quality score 1–10
    critique: Optional[str]         # What was wrong
    iteration: int                  # How many refinement rounds done
    max_iterations: int             # Safety ceiling
    final_prompt: Optional[str]     # The approved prompt


# ── Node 1: generate_prompt ───────────────────────────────
# The LLM reads the task goal and writes a complete system prompt.
# This is the "meta" step — the model is prompting itself.

def generate_prompt(state: MetaState) -> MetaState:
    print(f"\n[Node 1] Generating prompt (iteration {state['iteration'] + 1})")

    response = llm.invoke([
        SystemMessage(content="You are a prompt engineering expert. "
                               "Write clear, precise system prompts for LLM classification tasks. "
                               "Output ONLY the prompt text. No explanation."),
        HumanMessage(content=f"Write a system prompt for this task:\n{state['task_goal']}\n\n"
                              f"The model must output ONLY a single category word.")
    ])

    prompt = response.content.strip()
    print(f"[Node 1] Prompt generated ({len(prompt)} chars)")
    return {**state, "current_prompt": prompt}


# ── Node 2: execute_prompt ────────────────────────────────
# Run the generated prompt on the test input.
# Uses temp=0.0 for deterministic, repeatable results.

def execute_prompt(state: MetaState) -> MetaState:
    print(f"[Node 2] Running prompt on test input...")

    task_llm = ChatOpenAI(model="gpt-4o", temperature=0.0)

    response = task_llm.invoke([
        SystemMessage(content=state["current_prompt"]),
        HumanMessage(content=state["test_input"])
    ])

    output = response.content.strip().lower().split()[0].rstrip(".,;:")
    print(f"[Node 2] Output: '{output}'  |  Expected: '{state['expected_output']}'")
    return {**state, "llm_output": output}


# ── Node 3: score_output ──────────────────────────────────
# The LLM scores the output quality.
# A structured JSON response gives a clean machine-readable score.

def score_output(state: MetaState) -> MetaState:
    print(f"[Node 3] Scoring output quality...")

    is_correct = state["llm_output"] == state["expected_output"]

    response = llm.invoke([HumanMessage(content=f"""
A prompt was tested on this input:
Input:    {state['test_input']}
Expected: {state['expected_output']}
Got:      {state['llm_output']}
Correct:  {is_correct}

The prompt used:
{state['current_prompt']}

Score this prompt 1-10. Consider: correctness, clarity, output format.
Reply ONLY with JSON (no markdown fences):
{{"score": <int>, "critique": "<one sentence on what to fix>"}}
""")])

    try:
        result = json.loads(response.content.strip())
        score   = int(result["score"])
        critique = result["critique"]
    except (json.JSONDecodeError, KeyError):
        score    = 9 if is_correct else 3
        critique = "Output format issue" if not is_correct else ""

    print(f"[Node 3] Score: {score}/10 — {critique}")
    return {**state, "score": score, "critique": critique}


# ── Node 4: refine_prompt ─────────────────────────────────
# When the score is too low, rewrite the prompt using the critique.
# Targeted fix — not a random rewrite.

def refine_prompt(state: MetaState) -> MetaState:
    print(f"[Node 4] Refining prompt based on critique...")

    response = llm.invoke([
        SystemMessage(content="You are a prompt engineer. Fix the issues described. "
                               "Output ONLY the revised prompt. No explanation."),
        HumanMessage(content=f"Current prompt (has issues):\n{state['current_prompt']}\n\n"
                              f"Critique — what to fix:\n{state['critique']}\n\n"
                              f"Rewrite the prompt to fix these issues exactly.")
    ])

    refined = response.content.strip()
    print(f"[Node 4] Refined prompt ready")
    return {
        **state,
        "current_prompt": refined,
        "iteration": state["iteration"] + 1
    }


# ── Router ────────────────────────────────────────────────
SCORE_THRESHOLD = 7

def should_refine(state: MetaState) -> Literal["refine_prompt", "__end__"]:
    if state["iteration"] >= state["max_iterations"]:
        print(f"[Router] Max iterations reached — accepting prompt")
        return "__end__"
    if state["score"] >= SCORE_THRESHOLD:
        print(f"[Router] Score {state['score']}/10 — ACCEPTED")
        return "__end__"
    print(f"[Router] Score {state['score']}/10 — REFINING")
    return "refine_prompt"


# ── Save final prompt node ────────────────────────────────
def save_final(state: MetaState) -> MetaState:
    print(f"\n[Done] Final score: {state['score']}/10 after {state['iteration']+1} iteration(s)")
    return {**state, "final_prompt": state["current_prompt"]}



# ── Graph Construction ─────────────────────────────────────
def build_meta_graph():
    builder = StateGraph(MetaState)

    builder.add_node("generate_prompt", generate_prompt)
    builder.add_node("execute_prompt",  execute_prompt)
    builder.add_node("score_output",    score_output)
    builder.add_node("refine_prompt",   refine_prompt)
    builder.add_node("save_final",      save_final)

    # Linear start
    builder.add_edge(START,             "generate_prompt")
    builder.add_edge("generate_prompt", "execute_prompt")
    builder.add_edge("execute_prompt",  "score_output")

    # Quality gate
    builder.add_conditional_edges(
        "score_output",
        should_refine,
        {"refine_prompt": "refine_prompt", "__end__": "save_final"}
    )

    # Refinement loops back to execute (not generate)
    builder.add_edge("refine_prompt", "execute_prompt")
    builder.add_edge("save_final",    END)

    return builder.compile()


graph = build_meta_graph()
display(Image(graph.get_graph().draw_mermaid_png()))



# ── Run ────────────────────────────────────────────────────
result = graph.invoke({
    "task_goal": (
        "Classify customer support emails into one of four categories: "
        "billing (payments, invoices, refunds), "
        "technical (bugs, errors, login issues), "
        "shipping (delivery, tracking, damaged items), "
        "other (general questions). "
        "Output ONLY the single category word."
    ),
    "test_input":      "I was charged twice for my subscription this month.",
    "expected_output": "billing",
    "current_prompt":  None,
    "llm_output":      None,
    "score":           None,
    "critique":        None,
    "iteration":       0,
    "max_iterations":  3,
    "final_prompt":    None,
})

print("\n" + "="*60)
print("WINNING PROMPT:")
print("="*60)
print(result["final_prompt"])
print(f"\nFinal score : {result['score']}/10")
print(f"Iterations  : {result['iteration'] + 1}")
print(f"Test output : '{result['llm_output']}' (expected '{result['expected_output']}')")

Output :


# ── Use the final prompt in production ────────────────────
# The winning prompt is now ready to deploy as a system message.

task_llm = ChatOpenAI(model="gpt-4o", temperature=0.0)

test_emails = [
    "My package has been stuck in transit for 5 days.",
    "The app keeps crashing when I open the dashboard.",
    "Can you explain the difference between your plans?",
]

print("PRODUCTION TEST WITH WINNING PROMPT")
print("-" * 40)
for email in test_emails:
    r = task_llm.invoke([
        SystemMessage(content=result["final_prompt"]),
        HumanMessage(content=email)
    ])
    category = r.content.strip().lower().split()[0].rstrip(".,;:")
    print(f"{category:12} | {email}")


Output :
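Even with a strict "output ONLY the category word" prompt, model replies can carry punctuation, casing, or stray words, which is why the loop above normalizes `r.content` before printing. That one-liner can be factored into a small defensive helper; this is a sketch, and the `CATEGORIES` set plus the fallback to `"other"` are assumptions layered on top of the original pipeline.

```python
# Hypothetical helper: map a raw LLM reply onto one of the four expected
# categories, falling back to "other" for anything unexpected or empty.
CATEGORIES = {"billing", "technical", "shipping", "other"}

def normalize_category(raw: str) -> str:
    words = raw.strip().lower().split()
    first_word = words[0].rstrip(".,;:!\"'") if words else ""
    return first_word if first_word in CATEGORIES else "other"
```

Inside the loop this would replace the inline expression: `category = normalize_category(r.content)`. Falling back to `"other"` keeps a misbehaving reply from producing an unknown label downstream.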




Please refer to the table below to decide which prompt technique to use.



Conclusion :
That's all about prompting techniques. We will look into defensive techniques in prompt engineering in the next blog.

Thank you for reading this blog !

Arun Mathe
