
(AI Blog#18) RAG - Retrieval Augmented Generation

In my previous blog, I discussed the complete indexing part: how to extract data from multiple file sources, chunking, embedding, and the vector store/DB. This is a very important step because it prepares the knowledge base that injects our company-specific, non-confidential information during the 'retrieval' step of RAG when processing a user query. Only once the knowledge base is ready can we move on to the actual RAG implementation, which is what this blog covers. If you are planning to implement complete RAG, I recommend reading the blog below before going through this one.

https://arunsdatasphere.blogspot.com/2026/04/ai-blog17-rag-preparing-knowledge-base.html


RAG (Retrieval Augmented Generation)  

RAG is a technique where an AI model first retrieves relevant information from an external knowledge source (like a vector database) and then uses it to generate more accurate and context-aware responses.


Look at the image above to understand the order of components in RAG. Below is the sequence of components to follow while developing a RAG system:

  • Query Reformulation
  • Query Expansion
  • Intent Retrieval/Validation
  • Pre Filter
  • Post Filter
  • Hybrid Search
    • Semantic/Vector search
    • BM25 (Keyword search)
    • Semantic + BM25 (Hybrid search) - RRF
  • Re-ranking (very important - mandatory technique)
    • LLM Score based re-ranking
    • Pair wise re-ranking
    • List wise re-ranking
    • Query aware re-ranking
    • Hybrid re-ranking
  • Evaluation metrics
    • Precision
    • Recall
    • NDCG
    • Faithfulness
    • Answer relevancy
    • Context precision
    • Context recall

We will discuss every technique mentioned in the list above. Together, these form Retrieval + Augmentation + Generation.


Let us assume the user query below:

"How to cancel my order and what is the refund policy for electronic items?" When we submit this query to RAG, it enters a step called Retrieval. But retrieval does not immediately embed the query and check the similarity of its chunks against the chunks in the vector DB.

Internally we need to do following steps:

  • Query reformulation
  • Query expansion
  • Intent validation


Query Reformulation

Assume my question is just "AI". How am I going to interpret this question? If I ask an AI just 'AI', is there any meaning? Maybe the correct search strings are 'What is AI?' or 'What are the applications of AI?'; then we will get the correct response. For a moment, ignore AI and RAG: even if you ask a person simply 'AI', do you think the other person will understand what you are exactly talking about? No, right? This is when Query Reformulation is required. As part of Query Reformulation, we validate whether the query is appropriate or not. If the query is not appropriate, then we need to reformulate it so the AI understands what the user is asking about.

Example :

The intended question is "How to cancel my order?" If a user asks just "Order Cancel", it is a vague query. Whenever users ask these kinds of questions, we need to articulate the query in a meaningful way. This is called Query Reformulation.


Query Expansion

Assume:

  • User-1 is using the application for the first time
  • User-2 is using it for the 50th time
  • User-3 is using it for the 1000th time
User-3 has more context about the application he is using, right? But User-1 doesn't know the application yet.

Now, if User-1 asks "How to cancel my order?", this query will be converted into 'n' queries like query1, query2, query3 and query4. This is called Query Expansion.

The question could be complex, or the user could be a first-time user. In both cases, we need to expand the query and form multiple sub-queries to build the full context of what that particular user is asking about.


Intent Validation

Let us assume you created this bot/agent for an e-commerce application, but the user asks a question about healthcare. The intent of the bot is not related to healthcare, right? Here we need to validate the query: if the intent is appropriate, we take the user query to the next level; otherwise we need to inform the user about the actual intention of the bot/agent, with something like "This bot is mainly meant for e-commerce applications!" (some sort of response to let the user understand what this bot is about).

To make it simple, in the retrieval step we need to enable the above 3 reforms. If we skip them, we will get irrelevant output.

We can fix these issues in 2 ways :

  • Either we maintain metadata while creating the knowledge base; using that metadata we can take care of Query Reformulation, Query Expansion & Intent Validation
  • Otherwise, we take the help of an LLM


Pre Filter 

Let us say the user's intent is to understand the 'refund policy', and he entered a query to get this information. Do we need to search the entire data, or only refund-policy-related information? It is smarter to search only 'refund policy' related information, right? This is called Pre Filter.

Post Filter

This comes after the search is completed. We may still have to filter the context returned by the (pre-filtered) search. This is called Post Filter.

One line summary :

  • Pre-Filtering narrows what you search
  • Post-Filtering fixes what you found.


Hybrid search

Hybrid search is a combination of Semantic/Vector search + Keyword search. 

In healthcare, we should not assume anything; we need to give exact results with the correct keywords. We need keyword search in such cases.

One more example: a user is looking for the 'Refund Policy', specifically the 'Electronics Refund Policy'. In this situation, semantic search can handle 'Refund Policy', but to pin down 'Electronics Refund Policy' we need keyword search.

Hybrid would be something like: 

  • 70% Semantic search & 30% Keyword search
  • 50% Semantic search & 50% Keyword search
  • 30% Semantic search & 70% Keyword search

It all depends on the use case.
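These splits can be expressed as a weighted sum of the two scores. Below is a minimal sketch (the function name is my own): note that BM25 and cosine-similarity scores live on different scales, so both must be normalized to [0, 1] before mixing; rank-based fusion like RRF, discussed later in this post, sidesteps that problem entirely.

```python
def hybrid_score(semantic_score, keyword_score, alpha=0.7):
    # alpha is the semantic weight: 0.7 means 70% semantic / 30% keyword.
    # Both inputs are assumed to be normalized to [0, 1] before mixing.
    return alpha * semantic_score + (1 - alpha) * keyword_score
```

For example, `hybrid_score(0.8, 0.4)` gives 0.7*0.8 + 0.3*0.4 = 0.68; lowering alpha shifts weight toward the keyword match.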

Assume we got the top 10 results from hybrid search. The next question is whether these top 10 results are arranged in descending order of similarity score; they should be arranged from highest to lowest. This happens during Re-Ranking.

Note that all the above steps happen during the Retrieval step. Once they are done, the relevant context is submitted to the LLM along with the user query, and this is called Generation.


Modern RAG Flow :

  • User Query
  • Query understanding
    • Reformulation
    • Expansion
    • Intent validation (if irrelevant, inform the user)
  • Pre-filtering (optional but common)
    • Metadata filters (category, tenant, language etc. - search only the required data)
  • Hybrid search
    • Semantic/vector search (Pinecone etc.)
    • Keyword/BM25 search
  • Post-filtering (if required - fix what you found)
  • Re-ranking
    • Improves top-k quality significantly
  • Context selection/compression
    • Remove redundancy
    • Fit within token limits (often missed in production flows - avoids blowing the context window)
  • Prompt construction
    • Combine user query + retrieved context
    • Instructions / system prompt
  • LLM Generation
  • Post-processing
    • Format output
    • Guardrails (hallucination checks, safety)
  • Evaluation & feedback loop
    • Logging
    • Metrics (Precision, Recall, Faithfulness)
    • Continuous improvement
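The context selection/compression step in the flow above can be as simple as deduplicating and trimming to a budget. A minimal sketch (the helper name is my own; a real pipeline would count tokens with the model's tokenizer instead of characters):

```python
def compress_context(docs, max_chars=2000):
    # Drop duplicate chunks (preserving order), then stop once the budget is full
    seen, selected, used = set(), [], 0
    for doc in docs:
        if doc in seen:
            continue
        seen.add(doc)
        if used + len(doc) > max_chars:
            break
        selected.append(doc)
        used += len(doc)
    return selected
```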


Retrieval Strategies 

This is the foundation layer of RAG quality. If retrieval is weak, no reranking or generation can fully fix it. 

Let's see 3 core Retrieval strategies:

  • Query Formulation
    • Transforming the user's raw query into a better search query
    • User query - reformulated query - retrieval
  • Query Expansion
  • Intent Validation
1) Query Formulation:

The main problem is that users don't speak database language. They generally write their query in natural language. We need to interpret it.

Example:

Imagine asking a librarian for "books about things going wrong in chips" when the catalog is indexed under "Semiconductor failure mechanisms". The librarian reformulates your question before searching. That's Query Formulation. An LLM can take care of the reformulation.

Types of query formulation:

  • Semantic rewriting
  • Keyword enrichment
  • Domain normalization
  • Clarification-based reformulation

It bridges the gap between user language and document language.


2) Query Expansion

Generating multiple related queries from the user query.

  • User query - N queries - Retrieval - Merge results

Example: "What causes diabetes ?"

But the document might use:

  • Blood sugar disorder
  • Insulin resistance
  • Glucose imbalance
Here the single query misses the context.

Solution:
Expand "What causes diabetes" into
  • Causes of diabetes
  • Insulin resistance explanation
  • Blood sugar disorder causes 

The disadvantage is cost, since we use an LLM for this feature. More tokens, more money.


3) Intent Validation

Check whether the query is:

  • Relevant
  • Valid
  • Safe
  • In-domain
User query - Validate - Proceed/ Reject/ Redirect

Example :

A user asks "What is the capital of France?" when our domain is medical.


Full retrieval pipeline:

User Query - Intent Validation - Query formulation - Query Expansion - Vector/Hybrid retrieval - Context - LLM - Answer

Implementation of Retrieval Strategies :

import chromadb
from sentence_transformers import SentenceTransformer
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()

llm = ChatOpenAI(model = "gpt-4o-mini", temperature=0.3)

# Initialize embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Create Chroma client
client = chromadb.Client()

collection = client.create_collection(name="rag-final")

# Sample documents
docs = [
    "RAG stands for Retrieval Augmented Generation.",
    "ChromaDB is a vector database used for storing embeddings.",
    "Query expansion improves recall in retrieval systems.",
    "Intent validation ensures the query is relevant to the system."
]

embeddings = [embedding_model.encode(doc).tolist() for doc in docs]

# Add documents
collection.add(
    documents=docs,
    embeddings=embeddings,
    ids=[f"id_{i}" for i in range(len(docs))]
)

def reformulate_query_llm(query):
    prompt = f"""
Rewrite the following query to make it clearer for semantic search.

Query: {query}
Rewritten Query:
"""

    response = llm.invoke(prompt)
       
    return response.content.strip()


def expand_query_llm(query):
    prompt = f"""
Generate 4 different search queries similar to the given query.

Query: {query}

Return as a Python list.
"""

    response = llm.invoke(prompt)

    # Safe fallback parsing: literal_eval avoids executing arbitrary LLM output
    import ast
    try:
        queries = ast.literal_eval(response.content)
        if isinstance(queries, list):
            return queries
    except (ValueError, SyntaxError):
        pass

    return [query]


def validate_intent_llm(query):
    prompt = f"""
You are an intent classifier.

Classify if the user query is related to:
- RAG
- retrieval systems
- embeddings
- vector databases

Return ONLY "YES" or "NO".

Query: {query}
"""

    response = llm.invoke(prompt)

    return "YES" in response.content.upper()


def retrieve_docs(queries, top_k=2):
    results_all = []

    for q in queries:
        emb = embedding_model.encode(q).tolist()

        results = collection.query(
            query_embeddings=[emb],
            n_results=top_k
        )

        results_all.extend(results["documents"][0])

    return list(set(results_all))




def generate_answer_llm(query, context):
    context_text = "\n".join(context)

    prompt = f"""
Answer the question using the context below.

Context:
{context_text}

Question:
{query}
"""

    response = llm.invoke(prompt)

    return response.content


# Full Pipeline

def rag_pipeline_llm(user_query):

    # 1. Intent Validation
    if not validate_intent_llm(user_query):
        return "❌ Query is not relevant to the system."

    # 2. Reformulation
    refined_query = reformulate_query_llm(user_query)

    # 3. Expansion
    expanded_queries = expand_query_llm(refined_query)

    # 4. Retrieval
    docs = retrieve_docs(expanded_queries)

    # 5. Generation
    answer = generate_answer_llm(user_query, docs)

    return {
        "original_query": user_query,
        "refined_query": refined_query,
        "expanded_queries": expanded_queries,
        "retrieved_docs": docs,
        "answer": answer
    }

result = rag_pipeline_llm("what is query expansion in rag")

print(result)



Output :
{'original_query': 'what is query expansion in rag',
'refined_query': 'What is the concept of query expansion in retrieval-augmented
generation (RAG)?',
'expanded_queries': ['What is the concept of query expansion in retrieval-augmented
generation (RAG)?'],
'retrieved_docs': ['RAG stands for Retrieval Augmented Generation.',
'Query expansion improves recall in retrieval systems.'],
'answer': "Query expansion in Retrieval Augmented Generation (RAG) refers to the
process of enhancing the original search query by adding additional terms or phrases.
This technique aims to improve the recall of the retrieval system, allowing it to
fetch more relevant documents or information that may not have been captured by the
initial query. By broadening the scope of the search, query expansion helps ensure
that the generated responses are more comprehensive and relevant to the user's needs.
"}


Pre/Post Filtering

Pre-Filtering:

Suppose the user is looking for information about the "sick leave policy" and the query is "How many sick leaves are allowed per quarter?". Clearly this is related to the HR policy on sick leave, but if you search the entire data (all vectors) in the vector DB, latency will suffer and it takes a good amount of time. Hence we apply a filter before the search, routing it to search only HR-policy-related chunks in the vector DB. On top of this, if your vector DB is cloud hosted, searching everything will cost a fortune. We need to keep all these considerations in mind while designing the RAG pipeline. This is called Pre-Filtering.

Always apply constraints like:

  • Domain
  • Category
  • Time
  • Source
  • Metadata
The main idea is to narrow down the amount of data to search by using the metadata we collected during the data extraction phase, applying it effectively as a filter.

Example :

User query - How to reduce investment risk ?

Pre-Filter:

{
    "domain": "finance"
}

Apply the above filter and search only finance-related data.

We can use either an LLM or metadata to enable this feature. That's the reason you should categorize metadata while creating your knowledge base itself, as part of indexing (to be precise, during data extraction). If needed, sit with an SME on your organization's data to implement this feature.

Even if a complex query needs answers from 2 or more domains, it is still easy to handle because we have already segregated the metadata by different constraints.

There is also something called a Golden Data Set, which we need to create during (or maybe before) data extraction, depending on the use case. It is a warehouse of questions and related answers drawn from our knowledge base. We prepare it per chunk during data extraction, for example "Create 10 different questions per chunk", repeat this for the entire knowledge base, and keep the result as the Golden Data Set. It is very useful during evaluation: once the LLM generates a response after Generation, we compare it against this data set to decide whether the LLM is producing grounded responses or not.
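A minimal sketch of building such a golden data set. The function names, the prompt wording, and the `ask_llm` callable are my own assumptions; in practice `ask_llm` could wrap the same `ChatOpenAI` client used elsewhere in this post, e.g. `lambda p: llm.invoke(p).content`.

```python
import json

def questions_prompt(chunk, n=10):
    # Prompt template asking the LLM for n questions answerable from one chunk
    return (
        f"Generate {n} different questions that this text can answer.\n"
        f"Return a JSON list of strings.\n\nText:\n{chunk}"
    )

def parse_questions(raw):
    # LLMs sometimes return invalid JSON; fall back to an empty list
    try:
        questions = json.loads(raw)
        if isinstance(questions, list):
            return questions
    except (json.JSONDecodeError, TypeError):
        pass
    return []

def build_golden_dataset(chunks, ask_llm, n=10):
    # ask_llm: callable(prompt) -> raw string response from the model
    golden = []
    for chunk in chunks:
        raw = ask_llm(questions_prompt(chunk, n))
        golden.append({"chunk": chunk, "questions": parse_questions(raw)})
    return golden
```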

Post-Filtering:

  • Even after retrieval, top-k results ≠ fully relevant
  • Some results are partially relevant, noisy, or misleading
So we evaluate the retrieved documents and remove the bad ones.

Implementing Pre-Post Filtering :

import os
import json
import chromadb
from sentence_transformers import SentenceTransformer
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

# OpenAI Chat Model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Chroma client
client = chromadb.Client()
collection = client.create_collection(name="multi_domain_rag")


# ==========================================
# 2. SAMPLE DOCUMENTS (GENERIC DOMAINS)
# ==========================================
documents = [
    "Diabetes is a chronic disease that affects blood sugar levels.",
    "Hypertension increases the risk of heart disease and stroke.",
    "Stock markets fluctuate based on economic conditions.",
    "Diversification reduces investment risk in finance.",
    "Artificial Intelligence enables machines to learn from data.",
    "Cloud computing provides scalable infrastructure over the internet.",
    "Neural networks are used in deep learning applications."
]

metadatas = [
    {"domain": "healthcare", "type": "disease"},
    {"domain": "healthcare", "type": "risk"},
    {"domain": "finance", "type": "market"},
    {"domain": "finance", "type": "investment"},
    {"domain": "technology", "type": "ai"},
    {"domain": "technology", "type": "cloud"},
    {"domain": "technology", "type": "ml"}
]

# Add to Chroma
embeddings = [embedding_model.encode(doc).tolist() for doc in documents]

collection.add(
    documents=documents,
    embeddings=embeddings,
    metadatas=metadatas,
    ids=[f"id_{i}" for i in range(len(documents))]
)



# ==========================================
# 3. FILTER VALIDATION + BUILDING
# ==========================================
VALID_DOMAINS = ["healthcare", "finance", "technology"]
VALID_TYPES = ["disease", "risk", "market", "investment", "ai", "cloud", "ml"]

def clean_filters(filters):
    cleaned = {}

    if filters.get("domain") in VALID_DOMAINS:
        cleaned["domain"] = filters["domain"]

    if filters.get("type") in VALID_TYPES:
        cleaned["type"] = filters["type"]

    return cleaned


def build_chroma_filter(filters):
    conditions = []

    for k, v in filters.items():
        if v:
            conditions.append({k: v})

    if not conditions:
        return None

    if len(conditions) == 1:
        return conditions[0]

    return {"$and": conditions}



# ==========================================
# 4. PRE-FILTERING (LLM)
# ==========================================
def detect_filters_llm(query):
    prompt = f"""
    Extract metadata filters from query.

    Allowed values:
    domain: healthcare, finance, technology
    type: disease, risk, market, investment, ai, cloud, ml

    Return ONLY JSON:
    {{
      "domain": "...",
      "type": "..."
    }}

    Query: {query}
    """

    response = llm.invoke(prompt)

    try:
        filters = json.loads(response.content)
        return filters
    except (json.JSONDecodeError, TypeError):
        return {}



# ==========================================
# 5. POST-FILTERING (LLM)
# ==========================================
def post_filter_docs(query, docs):
    filtered_docs = []

    for doc in docs:
        prompt = f"""
        Check if this document is relevant.

        Query: {query}
        Document: {doc}

        Answer YES or NO only.
        """

        response = llm.invoke(prompt)

        if "YES" in response.content.upper():
            filtered_docs.append(doc)

    return filtered_docs


# ==========================================
# 6. RETRIEVAL (FIXED)
# ==========================================
def hybrid_retrieval(query):

    # Step 1: LLM filter extraction
    raw_filters = detect_filters_llm(query)
    print("🧠 Raw filters:", raw_filters)

    # Step 2: Clean filters
    cleaned_filters = clean_filters(raw_filters)
    print("🧹 Cleaned filters:", cleaned_filters)

    # Step 3: Convert to Chroma format
    chroma_filter = build_chroma_filter(cleaned_filters)
    print("🔎 Chroma filter:", chroma_filter)

    # Step 4: Vector search
    query_embedding = embedding_model.encode(query).tolist()

    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=10,
        where=chroma_filter
    )

    docs = results["documents"][0]
    print("📄 Retrieved:", docs)

    # Step 5: Post-filter
    refined_docs = post_filter_docs(query, docs)
    print("✅ After post-filter:", refined_docs)

    return refined_docs[:5]



# ==========================================
# 7. GENERATION
# ==========================================
def generate_answer(query, context_docs):
    context = "\n".join(context_docs)

    prompt = f"""
    Answer ONLY using the context.

    Context:
    {context}

    Question: {query}
    """

    response = llm.invoke(prompt)
    return response.content


# ==========================================
# 8. FULL PIPELINE
# ==========================================
def rag_pipeline(query):

    context_docs = hybrid_retrieval(query)

    if not context_docs:
        return {"error": "No relevant documents found"}

    answer = generate_answer(query, context_docs)

    return {
        "query": query,
        "context": context_docs,
        "answer": answer
    }



# ==========================================
# 9. TEST
# ==========================================
if __name__ == "__main__":

    queries = [
        "What is diabetes?",
        "How to reduce financial risk?",
        "Explain neural networks"
    ]

    for q in queries:
        print("\n============================")
        print("Query:", q)

        result = rag_pipeline(q)

        print("\n🎯 FINAL OUTPUT:")
        print(json.dumps(result, indent=2))


Vector store, Chroma DB schema :

collection.add(
    documents=documents,
    embeddings=embeddings,
    metadatas=metadatas,
    ids=[f"id_{i}" for i in range(len(documents))]
)


Let's refresh what we have learnt so far:

We have discussed the following retrieval strategies:

  • Intent validation
  • Query reformulation
  • Query expansion
  • Pre-Filter
  • Post-Filter
From here, we are going to discuss:

  • Search strategies
    • Semantic search(vector)
    • Keyword search(BM25 - Best Match version 25)
    • Hybrid search (Semantic + BM25) - RRF
  • Re-Ranking


Hybrid search strategy

The hybrid search strategy is one of the most important ideas in modern RAG systems. Hybrid search is where "retrieval" becomes more powerful. It is the combination of keyword search (BM25) and semantic search (embeddings).

Pipeline view:

User query - BM25(keyword search) - Semantic search(vector)  - Fusion(RRF) 

Problem with BM25:

  • BM25 relies on exact words
  • Ex: the query is "Heart attack causes" but the document says "myocardial infarction reasons"
  • BM25 fails (no exact match)
Problem with semantic search:
  • Semantic search understands meaning but can miss exact keywords
  • Ex: for "Python list append syntax", semantic search may return "How to modify arrays in programming", which is too generic
Best solution: combine semantic search and keyword search (BM25)


RRF (Reciprocal Rank Fusion)

RRF combines rankings from multiple retrievers.

  • Key idea: instead of combining scores, combine ranks
  • Formula: Score = 1/(k + rank), where
    • rank = position of the document in each result list
    • k = smoothing constant, usually 60; it dampens the influence of the very top ranks (it is not a percentage split between semantic and BM25)
Analogy : Imagine 2 judges ranking contestants

Judge1 (BM25):
  • A - 1st
  • B - 2nd
  • C - 3rd
Judge2 (Semantic):
  • B - 1st
  • A - 2nd
  • C - 3rd
RRF combines rankings:
  • A good in both
  • B strong in one
  • C moderate
Final decision = a balanced answer

Why is RRF powerful? It doesn't depend on score scales, it is robust to noisy rankings, and it works across different retrievers.
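The formula can be checked on a toy version of the judges analogy. A minimal sketch (ranks here start at 1; the exact convention varies by implementation, and the lists are made up for illustration):

```python
def rrf(rank_lists, k=60):
    # Sum 1/(k + rank) for each document across all ranked lists
    scores = {}
    for rlist in rank_lists:
        for rank, doc in enumerate(rlist, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["A", "B", "C"]       # keyword search order
semantic_ranking = ["B", "C", "A"]   # vector search order

fused = rrf([bm25_ranking, semantic_ranking])
# B (ranks 2 and 1) edges out A (ranks 1 and 3): ["B", "A", "C"]
```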

Common Mistakes

❌ Using only vector search

❌ Ignoring keyword matching

❌ Not using fusion (RRF)

❌ Combining scores incorrectly


Full Retrieval Stack (Modern RAG) :

User Query

   ↓

Intent Validation

   ↓

Query Formulation

   ↓

Query Expansion

   ↓

Pre-filters

   ↓

Hybrid Search (BM25 + Semantic)

   ↓

RRF Fusion

   ↓

Post-filtering

   ↓

Reranking

   ↓

Final Context

   ↓

LLM Answer



Implementation of Hybrid Search :

# ==========================================
# 1. SETUP
# ==========================================
import os
import json
import numpy as np
import chromadb
from dotenv import load_dotenv
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
from langchain_openai import ChatOpenAI

load_dotenv()
assert os.getenv("OPENAI_API_KEY")

# OpenAI Chat Model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Embedding Model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")


# ==========================================
# 2. DATA
# ==========================================
documents = [
    "Diabetes is a chronic disease affecting blood sugar levels.",
    "Hypertension increases risk of heart disease.",
    "Stock markets fluctuate due to economic conditions.",
    "Diversification reduces investment risk.",
    "Neural networks are key to deep learning.",
    "Cloud computing provides scalable infrastructure."
]

# BM25 setup
tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)

# Chroma setup
client = chromadb.Client()
collection = client.create_collection("hybrid_rag_full")

embeddings = [embedding_model.encode(doc).tolist() for doc in documents]
collection.add(
    documents=documents,
    embeddings=embeddings,
    ids=[str(i) for i in range(len(documents))]
)


# ==========================================
# 3. QUERY EXPANSION (LLM)
# ==========================================
def expand_query(query):
    prompt = f"""
    Generate 3 alternative search queries.
    Return JSON list.

    Query: {query}
    """
    response = llm.invoke(prompt)

    try:
        return json.loads(response.content)
    except (json.JSONDecodeError, TypeError):
        return [query]


# ==========================================
# 4. BM25 RETRIEVAL
# ==========================================
def bm25_retrieve(query, top_k=3):
    scores = bm25.get_scores(query.lower().split())
    ranked = np.argsort(scores)[::-1]
    return [documents[i] for i in ranked[:top_k]]


# ==========================================
# 5. VECTOR RETRIEVAL
# ==========================================
def vector_retrieve(query, top_k=3):
    q_emb = embedding_model.encode(query).tolist()
    results = collection.query(query_embeddings=[q_emb], n_results=top_k)
    return results["documents"][0]


# ==========================================
# 6. RRF FUSION
# ==========================================
def rrf(rank_lists, k=60):
    scores = {}

    for rlist in rank_lists:
        for rank, doc in enumerate(rlist):
            scores[doc] = scores.get(doc, 0) + 1 / (k + rank)

    return sorted(scores, key=scores.get, reverse=True)


# ==========================================
# 7. HYBRID RETRIEVAL
# ==========================================
def hybrid_retrieval(query):

    # Step 1: Expand queries
    queries = expand_query(query)
    queries.append(query)

    all_rank_lists = []

    for q in queries:
        bm25_docs = bm25_retrieve(q)
        vec_docs = vector_retrieve(q)

        all_rank_lists.append(bm25_docs)
        all_rank_lists.append(vec_docs)

    # Step 2: Fuse rankings
    fused_docs = rrf(all_rank_lists)

    return fused_docs[:5]


# ==========================================
# 8. LLM RERANKING
# ==========================================
def rerank_llm(query, docs):
    scored_docs = []

    for doc in docs:
        prompt = f"""
        Score relevance from 0 to 1.

        Query: {query}
        Document: {doc}
        """

        try:
            score = float(llm.invoke(prompt).content.strip())
        except ValueError:
            score = 0.0

        scored_docs.append((doc, score))

    ranked = sorted(scored_docs, key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked]


# ==========================================
# 9. FINAL GENERATION
# ==========================================
def generate_answer(query, docs):
    context = "\n".join(docs)

    prompt = f"""
    Answer ONLY using the context.

    Context:
    {context}

    Question: {query}
    """

    response = llm.invoke(prompt)
    return response.content


# ==========================================
# 10. FULL PIPELINE
# ==========================================
def hybrid_rag_pipeline(query):

    print("🔍 Query:", query)

    # Step 1: Hybrid Retrieval
    retrieved_docs = hybrid_retrieval(query)
    print("📄 Retrieved:", retrieved_docs)

    # Step 2: Rerank
    final_docs = rerank_llm(query, retrieved_docs)
    print("⭐ Reranked:", final_docs)

    # Step 3: Generate Answer
    answer = generate_answer(query, final_docs[:3])

    return {
        "query": query,
        "retrieved_docs": retrieved_docs,
        "final_docs": final_docs[:3],
        "answer": answer
    }


# ==========================================
# 11. TEST
# ==========================================
if __name__ == "__main__":

    queries = [
        "How to reduce investment risk?",
        "What is diabetes?",
        "Explain neural networks"
    ]

    for q in queries:
        print("\n========================")
        result = hybrid_rag_pipeline(q)
        print(json.dumps(result, indent=2))



Explanation :

  • Observe that we are using LLM for retrieval techniques

# ==========================================
# 4. BM25 RETRIEVAL
# ==========================================
def bm25_retrieve(query, top_k=3):
    scores = bm25.get_scores(query.lower().split())
    ranked = np.argsort(scores)[::-1]
    return [documents[i] for i in ranked[:top_k]]

  • BM25 is a keyword search: it takes the query as input and returns the top 3 results based on score
  • We tokenize the query and get a BM25 score for each document against those tokens
  • We sort the documents by score and store the ranked order in a variable called ranked
  • We return the top-k documents

# ==========================================
# 5. VECTOR RETRIEVAL
# ==========================================
def vector_retrieve(query, top_k=3):
    q_emb = embedding_model.encode(query).tolist()
    results = collection.query(query_embeddings=[q_emb], n_results=top_k)
    return results["documents"][0]

  • Semantic retrieval
  • It also accepts the user query and top_k as input
  • We compute the embedding of the query
  • We query Chroma DB with that embedding and return the top-k documents

# ==========================================
# 6. RRF FUSION
# ==========================================
def rrf(rank_lists, k=60):
    scores = {}

    for rlist in rank_lists:
        for rank, doc in enumerate(rlist):
            scores[doc] = scores.get(doc, 0) + 1 / (k + rank)

    return sorted(scores, key=scores.get, reverse=True)

# ==========================================
# 7. HYBRID RETRIEVAL
# ==========================================
def hybrid_retrieval(query):

    # Step 1: Expand queries
    queries = expand_query(query)
    queries.append(query)

    all_rank_lists = []

    for q in queries:
        bm25_docs = bm25_retrieve(q)
        vec_docs = vector_retrieve(q)

        all_rank_lists.append(bm25_docs)
        all_rank_lists.append(vec_docs)

    # Step 2: Fuse rankings
    fused_docs = rrf(all_rank_lists)

    return fused_docs[:5]

  • hybrid_retrieval gets sub-queries from expand_query() and stores them in the variable queries
  • The original user query is also appended to the same variable
  • An empty list all_rank_lists is created to store the search results
  • We loop over all the queries
    • Running BM25 and semantic search for each query
    • Appending each result list to all_rank_lists
  • We pass these ranked lists into RRF
  • We return the top 5 fused results

Finally, we test the pipeline with a few queries.

# ==========================================
# 11. TEST
# ==========================================
if __name__ == "__main__":

    queries = [
        "How to reduce investment risk?",
        "What is diabetes?",
        "Explain neural networks"
    ]

    for q in queries:
        print("\n========================")
        result = hybrid_rag_pipeline(q)
        print(json.dumps(result, indent=2))


Important point : if you observe the above code carefully, especially the BM25 search, you will notice we are not using the vector DB for this type of search.

BM25 doesn't require a vector DB. It is a standalone retrieval system based on an inverted index, while a vector DB enables semantic retrieval. In production RAG systems, both are often combined (hybrid search) to balance precision and recall.
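To make that concrete, here is a minimal pure-Python sketch of BM25 scoring — a toy version of what libraries such as rank_bm25 implement. The whitespace tokenization and the defaults k1=1.5, b=0.75 are illustrative assumptions, and the example documents are hypothetical:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query -- no vector DB involved."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)

    # document frequency: in how many documents does each term appear?
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # standard BM25 weighting: idf * length-normalized, saturated tf
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores

docs = [
    "refund policy for electronic items",   # matches both query terms
    "refund timelines for orders",          # matches one term
    "weather forecast for today",           # matches none
]
print(bm25_scores("refund policy", docs))
```

In a real system the tf/df statistics come from an inverted index built at ingestion time, which is exactly why BM25 needs no embeddings at all.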


Re-ranking strategies

Re-ranking in a RAG pipeline is where you take the initially retrieved documents and reorder them using a more accurate model so that the most relevant context goes first. 


Where does it sit in RAG ?

User Query

   ↓

Retriever (BM25 / Vector / Hybrid)  → gets top N (e.g., 20)

   ↓

Reranker (cross-encoder / LLM scoring) → reorders those 20

   ↓

Top-K selection (e.g., 5)

   ↓

LLM (final answer generation)


How does reranking work ?

Instead of scoring documents independently, a reranker:

  • Looks at the (query, document) pair together
  • Assigns a relevance score
Example:

Query : How to finetune LLM ?

Doc A : "Steps to train Neural Networks"
Doc B : "Fine tuning GPT models using LoRA"

  • Retriever might rank Doc A higher
  • Re-ranker correctly boosts Doc B to top


Without re-ranking :

  • LLM gets noisy context
  • Hallucinations increase
  • Answer quality drops

With re-ranking :

  • Better context precision
  • Lower token waste
  • More accurate responses


When to use which kind of LLM-based re-ranking?

  • Simple system - LLM Score Based
  • Small dataset - Pair wise
  • Production RAG - List wise
  • Ambiguous queries - Query-aware 
  • Enterprise system - Hybrid (LLM Based)


Implementation of re-ranking :

import os
import json
import re
import chromadb
import numpy as np
from dotenv import load_dotenv
from sentence_transformers import SentenceTransformer
from langchain_openai import ChatOpenAI

load_dotenv()
assert os.getenv("OPENAI_API_KEY")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

client = chromadb.Client()
collection = client.create_collection("rerank_demo")

documents = [
    "Diabetes affects blood sugar levels.",
    "Hypertension increases heart disease risk.",
    "Diversification reduces investment risk.",
    "Neural networks are used in deep learning.",
    "Cloud computing provides scalable infrastructure."
]

collection.add(
    documents=documents,
    embeddings=[embedding_model.encode(d).tolist() for d in documents],
    ids=[str(i) for i in range(len(documents))]
)


def retrieve(query, k=5):
    emb = embedding_model.encode(query).tolist()
    results = collection.query(query_embeddings=[emb], n_results=k)
    return results["documents"][0]


LLM Score Based re-ranking

# 🔍 1. Purpose of This Function

# 👉 Extract a numeric score (float) from a string

# Why needed?

# LLMs often return messy outputs like:

# "Score: 0.85 because it is relevant"

# 👉 You only need:

# 0.85

# 🔧 2. Line-by-Line Explanation

# 🔹 Step 1: Function Definition

# def extract_score(text):

# 👉 Input:

# text → string (LLM response)

# Example:

# text = "Score: 0.85 because it is relevant"

# 🔹 Step 2: Regex Search

# match = re.search(r"\d*\.?\d+", text)

# What is happening?

# 👉 Searching for the first number inside the text

# 🔍 Understanding the Regex Pattern

# r"\d*\.?\d+"

# Break it down:

# Part  Meaning
# \d*   0 or more digits
# \.?   optional decimal point
# \d+   at least one digit

# ✅ Matches Examples

# Input                Match
# "0.85"               0.85
# "Score: 1"            1
# "0.5 relevance"      0.5
# "Score = 10"         10

def extract_score(text):
    match = re.search(r"\d*\.?\d+", text)
    return float(match.group()) if match else 0.0


def rerank_score(query, docs):
    scored = []

    for doc in docs:
        prompt = f"""
        Score relevance from 0 to 1.
        Return only number.

        Query: {query}
        Document: {doc}
        """

        score = extract_score(llm.invoke(prompt).content)
        scored.append((doc, score))

    return [d for d, _ in sorted(scored, key=lambda x: x[1], reverse=True)]



Pairwise re-ranking :

Compare two documents at a time and choose the better one. Note that this makes n(n-1)/2 LLM calls for n documents, which is why it suits small candidate sets.

def pairwise_compare(query, doc1, doc2):
    prompt = f"""
    Which document is more relevant?

    Query: {query}

    A: {doc1}
    B: {doc2}

    Answer A or B.
    """

    return llm.invoke(prompt).content.strip()


def rerank_pairwise(query, docs):
    scores = {doc: 0 for doc in docs}

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            result = pairwise_compare(query, docs[i], docs[j])

            # match only a leading "A" so that an answer like
            # "B, because A is off-topic" is not miscounted
            if result.upper().startswith("A"):
                scores[docs[i]] += 1
            else:
                scores[docs[j]] += 1

    return sorted(scores, key=scores.get, reverse=True)



List wise re-ranking :

LLM ranks all docs at once

def rerank_listwise(query, docs):
    docs_text = "\n".join([f"{i}: {d}" for i, d in enumerate(docs)])

    prompt = f"""
    Rank documents by relevance.

    Query: {query}

    Documents:
    {docs_text}

    Return ordered indices as JSON list.
    """

    response = llm.invoke(prompt).content

    try:
        # LLMs sometimes wrap JSON in markdown fences; strip them before parsing
        cleaned = re.sub(r"```(?:json)?", "", response).strip()
        order = json.loads(cleaned)
        return [docs[i] for i in order]
    except Exception:
        return docs




Query aware re-ranking :

Extract intent - rerank accordingly

def extract_intent(query):
    prompt = f"""
    Extract intent (short phrase).

    Query: {query}
    """
    return llm.invoke(prompt).content


def rerank_query_aware(query, docs):
    intent = extract_intent(query)

    scored = []
    for doc in docs:
        prompt = f"""
        Score relevance (0-1)

        Intent: {intent}
        Document: {doc}
        """
        score = extract_score(llm.invoke(prompt).content)
        scored.append((doc, score))

    return [d for d, _ in sorted(scored, key=lambda x: x[1], reverse=True)]



Hybrid re-ranking :

Combine embedding similarity with an LLM relevance score

def rerank_hybrid(query, docs):
    q_emb = embedding_model.encode(query)

    scored = []
    for doc in docs:
        d_emb = embedding_model.encode(doc)

        sim = np.dot(q_emb, d_emb) / (np.linalg.norm(q_emb) * np.linalg.norm(d_emb))

        prompt = f"""
        Score relevance (0-1)

        Query: {query}
        Document: {doc}
        """

        llm_score = extract_score(llm.invoke(prompt).content)

        final_score = 0.5 * sim + 0.5 * llm_score
        scored.append((doc, final_score))

    return [d for d, _ in sorted(scored, key=lambda x: x[1], reverse=True)]



FULL PIPELINE (END-TO-END)

def rag_with_reranking(query, method="listwise"):

    # Step 1: Retrieval
    docs = retrieve(query)
    print("📄 Retrieved:", docs)

    # Step 2: Reranking
    if method == "score":
        ranked = rerank_score(query, docs)
    elif method == "pairwise":
        ranked = rerank_pairwise(query, docs)
    elif method == "listwise":
        ranked = rerank_listwise(query, docs)
    elif method == "query":
        ranked = rerank_query_aware(query, docs)
    else:
        ranked = rerank_hybrid(query, docs)

    print("⭐ Reranked:", ranked)

    # Step 3: Generation
    context = "\n".join(ranked[:3])

    prompt = f"""
    Answer using context only.

    Context:
    {context}

    Query:
    {query}
    """

    answer = llm.invoke(prompt).content

    return {
        "query": query,
        "top_docs": ranked[:3],
        "answer": answer
    }


if __name__ == "__main__":
    result = rag_with_reranking(
        "How to reduce investment risk?",
        method="listwise"
    )

    import json
    print(json.dumps(result, indent=2))


Output :


Important points to remember :

  • We have seen the entire RAG pipeline i.e. Retrieval + Augmentation + Generation
  • Discussed all the steps involved in Retrieval, Augmentation and Generation
  • One myth people always assume is that RAG is complete here, BUT NO
  • We should evaluate it with proper metrics to confirm that we are getting the best results
  • Most AI engineers fail here when asked to explain this part
  • Let us see what it is

Evaluation Metrics

Evaluation Metrics are classified into two categories

  • Retrieval Metrics
    • Precision@K
    • Recall@K
    • MRR (Mean Reciprocal Rank)
    • NDCG@K
    • Context Relevance
  • Generation Metrics
    • Faithfulness
    • Answer Relevancy
    • Groundedness

Being production grade AI engineers, we should be in a position to explain all these techniques. Remember that retrieval metrics evaluate the retrieval step, while generation metrics evaluate the generation step.

There are frameworks like RAGAS, TruLens etc. that perform these same steps, but it is better to learn the hard way by implementing them ourselves. Let's see how these metrics work.


Retrieval Metrics 

  • We will apply below techniques after retrieval (post re-ranking)

1) Precision@K 

What is K ? If the end user asks for the top 3 / 5 / 10 results, that number is K (this K value is measured after re-ranking)

User question : What is the eligibility criteria for a home loan ?

Result : 



Formula for precision is as below:

Precision@K = relevant documents in top K / K, so Precision@5 = 3 / 5 = 0.6

Means, 60% of the retrieved documents are useful.

Industry standards of Precision@K :

  • 0.8 - 1.0  - Excellent
  • 0.6 - 0.8  - Acceptable
  • < 0.6  - Poor (the system is performing very poorly)

Then what is the improvement technique ?

  • Apply pre-filters (metadata filters, product type filters etc.)
  • Improve re-ranking
  • Reduce the K value - instead of 5, go with 3 (this is controlled by the end user)


2) Recall@K

User question : What is the eligibility criteria for a home loan ?

Ground Truth is (Total relevance documents are 5) : Get them from Golden Data Set

  • Salary
  • Credit score
  • Age
  • Employment Type
  • Existing loans

But our system retrieved only 3 relevant documents, while the ground truth has 5.

Formula for Recall@K = relevant documents in top K / total no. of relevant documents = 3 / 5 = 0.6

Means, we retrieved 60% of the relevant documents.

Industry standards of Recall@K :

  • 0.8 - 1.0  - Excellent
  • 0.6 - 0.8  - Acceptable
  • < 0.6  - Poor (the system is performing very poorly)

Then what is the improvement technique ?

  • Increase K value
  • Use Hybrid search (BM25 + Semantic)
  • Add Query Expansion


3) Mean Reciprocal Rank (MRR)

User question : What is EMI ?

Result :


Formula for MRR = 1 / rank of the first relevant result = 1 / 2 = 0.5

Means, the first relevant document appeared at rank 2.

Interpretation of MRR - the correct answer is not always the immediate top result, but ideally it should be at rank 1.

Industry standards of MRR :

  • > 0.8   - Correct answer is usually at rank-1
  • 0.5 - 0.8  - OK
  • < 0.5  - Poor Ranking

Then what is the improvement technique ?

  • Improve the reranking strategy
    • Try multiple reranking mechanisms and compare their outputs
  • Tune your embedding model
    • We use embeddings at 2 places
      • 1st while creating the knowledge base
      • 2nd while processing the user query
    • Try one model at both places (e.g. OpenAI's small embedding model) and observe the results
    • Try another model at both places and compare against the first
    • Keep experimenting until you get good results, then finalize that model
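Since MRR does not appear in the implementation shown later, here is a minimal sketch of how it can be computed; the function names and the binary-relevance assumption are mine:

```python
def mrr_single(retrieved, relevant):
    """Reciprocal rank of the first relevant document (0 if none is found)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1 / rank
    return 0.0

def mean_reciprocal_rank(all_retrieved, all_relevant):
    """Average reciprocal rank over a set of queries -- the 'Mean' in MRR."""
    rr = [mrr_single(r, rel) for r, rel in zip(all_retrieved, all_relevant)]
    return sum(rr) / len(rr)

# First relevant doc at rank 2 → reciprocal rank 0.5, as in the example above
print(mrr_single(["doc_irrelevant", "doc_emi_definition"], {"doc_emi_definition"}))  # → 0.5
```

In practice you would compute mean_reciprocal_rank over your whole golden dataset, not a single query.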


4) NDCG@K

User question : Best way to reduce home loan interest ?

Result :



Step1 : Calculate DCG@K


Step2 : Calculate IDCG@K



Step3 : Calculate NDCG

                    NDCG@K = DCG@K / IDCG@K = 3.76 / 4.76 = 0.79


Interpretation : 79% ranking is decent but not optimal

Industry standards of NDCG :

  • > 0.9   - Near perfect ranking
  • 0.7 - 0.9  - Good ranking
  • < 0.7  -  Ranking problem

How to improve this quality ?

  • Improve the reranking strategy
  • Improve relevance labelling

How to assign relevance labels ?

  • Write an appropriate prompt that attaches a relevance label to each document after reranking

This is how relevance labelling works in NDCG.


5) Context Relevancy

User question :  How to improve credit score ?

Result :



Formula for context relevancy = relevant chunks / total number of chunks = 3 / 5 = 0.6

Interpretation : 40% of the retrieved context is noise


Industry standards of Context Relevancy:

  • > 0.8   - Clean context
  • 0.6 - 0.8  - Some noise is associated
  • < 0.6  -  Noisy retrieval

How to improve the quality of context relevancy ?

  • Select the best chunking strategy
  • Add semantic / pre-filters (in metadata)
  • Use an appropriate reranking strategy


Important points about evaluation techniques in RAG :

  • The above 5 evaluation techniques relate to the retrieval process in RAG
  • We should calculate these metrics after retrieval (post reranking) to confirm that we built a good RAG system; it also helps to show the metrics to the client
  • If we are not clear on these 5 techniques, then we are not building a production grade RAG - it will be just a toy project


Generation Metrics  

  • Faithfulness
  • Answer Relevancy
  • Groundedness


1) Faithfulness 

User question : How to improve my credit score ?

Context that we got as part of Retrieval + Augmentation is :

  • Pay EMI on time
  • Reduce credit card utilization

LLM Response : 

Pay EMIs on time, reduce credit card utilization and invest in gold

"Pay EMI on time" & "reduce credit card utilization" come from the context, BUT "invest in gold" is generated by the LLM.


Formula for Faithfulness = supported claims / total claims = 2 / 3 = 0.67

Means, 33% of the answer is hallucinated. If we present 33% hallucinated data to the customer, they won't be happy.

Benchmarks :

  • > 0.9  - Very safe
  • 0.7 - 0.9 - Minor Issues
  • < 0.7 - hallucinated data

How to prevent hallucination :

  • Write a strict prompt, e.g.
    • You answer only from the context
    • DO NOT generate hallucinated answers
  • Reduce the temperature value
    • Go towards more deterministic answers (< 0.5)


2) Answer Relevancy 

User question : How to improve my credit score ?

Answer : Credit score is calculated using your financial history 

Formula for Answer Relevancy = Similarity (User Query, LLM response)

Benchmarks :

  • > 0.85  - Strong Alignment
  • 0.6 - 0.85 - Partial
  • < 0.6 - Wrong Answer
How to increase Answer Relevancy ?

  • Improve Query Reformulation
  • Query Intent - Fail Fast (understand the intent; only if it is meaningful, move to the next step)
  • Improve prompt instructions according to the user query
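Following the formula above, answer relevancy can be approximated as the cosine similarity between the query embedding and the answer embedding. A minimal NumPy sketch — in the blog's setup the two vectors would come from embedding_model.encode(query) and embedding_model.encode(answer), but the toy vectors below are illustrative:

```python
import numpy as np

def answer_relevancy(query_emb, answer_emb):
    """Cosine similarity between the query and answer embedding vectors."""
    query_emb = np.asarray(query_emb, dtype=float)
    answer_emb = np.asarray(answer_emb, dtype=float)
    denom = np.linalg.norm(query_emb) * np.linalg.norm(answer_emb)
    return float(np.dot(query_emb, answer_emb) / denom) if denom else 0.0

# Toy vectors: same direction → similarity 1.0, orthogonal → 0.0
print(answer_relevancy([1.0, 0.0], [2.0, 0.0]))  # → 1.0
print(answer_relevancy([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

The implementation later in this blog instead asks the LLM to produce the score directly; both are valid ways to estimate the same metric.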



3) Groundedness 

User question : How to improve my credit score ?

Context : 

  • Pay EMIs on time
  • Reduce the utilization of credit card
LLM response :
  • Pay EMI, Reduce utilization of credit card, avoid loans & invest in stocks
Formula for Groundedness = Grounded Tokens / Total number of tokens = 6/10 = 0.6

(Assume, total tokens = 10 & Grounded tokens = 6)

Interpretation : 40% of answer is not supported

Benchmarks :

  • > 0.9 - Fully grounded
  • 0.7 - 0.9 - Mostly grounded
  • < 0.7 - Unsafe

How to increase Groundedness :

  • Force context only answers in prompt
  • Add Retrieval citations (Citations will be produced by LLM)
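The token-level formula above can be sketched as a simple overlap count. The function names and the punctuation-stripping tokenizer are mine, and real systems match claims or spans rather than raw tokens, so treat this as illustrative only:

```python
def _tokens(text):
    # lowercase and strip basic punctuation -- a deliberate simplification
    return [t.strip(".,!?") for t in text.lower().split()]

def groundedness(answer, context):
    """Fraction of answer tokens that also appear in the retrieved context."""
    context_tokens = set(_tokens(context))
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    grounded = sum(1 for tok in answer_tokens if tok in context_tokens)
    return grounded / len(answer_tokens)

context = "Pay EMIs on time. Reduce the utilization of credit card."
answer = "Pay EMIs on time and invest in stocks"
print(groundedness(answer, context))  # → 0.5 (4 of 8 answer tokens are grounded)
```

Here "and invest in stocks" has no support in the context, which is exactly the unsupported 40-50% the metric is designed to surface.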


Implementation :

import os
import json
import numpy as np
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from sentence_transformers import SentenceTransformer
import chromadb

load_dotenv()
assert os.getenv("OPENAI_API_KEY")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

client = chromadb.Client()
collection = client.create_collection("rag_eval")

documents = [
    "Diabetes affects blood sugar levels.",
    "Hypertension increases heart disease risk.",
    "Diversification reduces investment risk.",
    "Neural networks are used in deep learning."
]

collection.add(
    documents=documents,
    embeddings=[embedding_model.encode(d).tolist() for d in documents],
    ids=[str(i) for i in range(len(documents))]
)

# GOLDEN DATASET
golden_data = [
    {
        "query": "What is diabetes?",
        "relevant_docs": ["Diabetes affects blood sugar levels."],
        "answer": "Diabetes affects blood sugar levels."
    },
    {
        "query": "How to reduce investment risk?",
        "relevant_docs": ["Diversification reduces investment risk."],
        "answer": "Diversification reduces investment risk."
    }
]

def retrieve(query, k=2):
    emb = embedding_model.encode(query).tolist()
    results = collection.query(query_embeddings=[emb], n_results=k)
    return results["documents"][0]

def precision_at_k(retrieved, relevant, k):
    retrieved_k = retrieved[:k]
    rel = sum([1 for doc in retrieved_k if doc in relevant])
    return rel / k

def recall_at_k(retrieved, relevant, k):
    retrieved_k = retrieved[:k]
    rel = sum([1 for doc in retrieved_k if doc in relevant])
    return rel / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    dcg = 0
    for i, doc in enumerate(retrieved[:k]):
        if doc in relevant:
            dcg += 1 / np.log2(i + 2)

    idcg = sum([1 / np.log2(i + 2) for i in range(min(len(relevant), k))])

    return dcg / idcg if idcg > 0 else 0

import re

def extract_score(text):
    try:
        # Extract first float number
        match = re.search(r"\d*\.?\d+", text)
        if match:
            return float(match.group())
    except Exception:
        pass

    return 0.0  # fallback

def faithfulness(query, answer, context):
    prompt = f"""
    Score how faithful the Answer is to the Context, from 0 to 1.

    Return ONLY a number.

    Context:
    {context}

    Answer:
    {answer}
    """

    response = llm.invoke(prompt).content
    return extract_score(response)

def answer_relevancy(query, answer):
    prompt = f"""
    Score how relevant the Answer is to the Query, from 0 to 1.

    Return ONLY a number.

    Query:
    {query}

    Answer:
    {answer}
    """

    response = llm.invoke(prompt).content
    return extract_score(response)

def context_precision(query, context):
    prompt = f"""
    Score how relevant the Context is to the Query, from 0 to 1.

    Return ONLY a number.

    Query:
    {query}

    Context:
    {context}
    """

    response = llm.invoke(prompt).content
    return extract_score(response)

def context_recall(query, context, golden_answer):
    prompt = f"""
    Score how well the Context covers the Expected Answer, from 0 to 1.

    Return ONLY a number.

    Context:
    {context}

    Expected Answer:
    {golden_answer}
    """

    response = llm.invoke(prompt).content
    return extract_score(response)

def generate_answer(query, context):
    prompt = f"""
    Answer using context only.

    Context:
    {context}

    Query:
    {query}
    """

    return llm.invoke(prompt).content


def evaluate_rag(golden_data):

    results = []

    for item in golden_data:

        query = item["query"]
        relevant_docs = item["relevant_docs"]
        golden_answer = item["answer"]

        retrieved_docs = retrieve(query, k=2)
        context = "\n".join(retrieved_docs)

        answer = generate_answer(query, context)

        # Retrieval Metrics
        p = precision_at_k(retrieved_docs, relevant_docs, k=2)
        r = recall_at_k(retrieved_docs, relevant_docs, k=2)
        ndcg = ndcg_at_k(retrieved_docs, relevant_docs, k=2)

        # Generation Metrics
        faith = faithfulness(query, answer, context)
        ans_rel = answer_relevancy(query, answer)

        # Context Metrics
        ctx_p = context_precision(query, context)
        ctx_r = context_recall(query, context, golden_answer)

        results.append({
            "query": query,
            "precision@k": p,
            "recall@k": r,
            "ndcg@k": ndcg,
            "faithfulness": faith,
            "answer_relevancy": ans_rel,
            "context_precision": ctx_p,
            "context_recall": ctx_r
        })

    return results


if __name__ == "__main__":
    results = evaluate_rag(golden_data)

    import json
    print(json.dumps(results, indent=2))


Output :

[
  {
    "query": "What is diabetes?",
    "precision@k": 0.5,
    "recall@k": 1.0,
    "ndcg@k": 1.0,
    "faithfulness": 0.8,
    "answer_relevancy": 0.8,
    "context_precision": 0.3,
    "context_recall": 0.5
  },
  {
    "query": "How to reduce investment risk?",
    "precision@k": 0.5,
    "recall@k": 1.0,
    "ndcg@k": 1.0,
    "faithfulness": 1.0,
    "answer_relevancy": 0.8,
    "context_precision": 0.8,
    "context_recall": 1.0
  }
]

Conclusion :

  • That's all about RAG and RAG metrics
  • I will see you guys in my next MCP blog !
  • Automated Frameworks
    • Ragas
      • Documentation for Ragas is available at : https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/
    • TruLens
      • Documentation for TruLens is available at : https://www.trulens.org/


LLM Fine Tuning : 

  • PEFT - Parameter-Efficient Fine Tuning
    • LoRA
    • QLoRA
  • FFT - Full Fine Tuning

Note that agents, RAG and LLM fine tuning are different things. RAG we discussed above; LLM fine tuning we are yet to discuss.


Please find some advanced reranking techniques as below :







Thank you for reading this blog !

Arun Mathe
