
(AI Blog#2) Building a Real-World Neural Network: A Practical Use Case Explained

This blog paints a clear picture of what happens inside a Neural Network (NN). But before going through NNs, we need some knowledge of a few basic concepts in Calculus (Maths) & the architecture of a Neural Network.

Note : 
I recommend reading the following blog (link mentioned below) first, and then start reading this blog.
Let's start with Derivatives.

Derivatives : 
                Derivatives are a core concept of calculus (maths). They answer one question : "How fast is something changing?" 

Why do derivatives appear in Machine Learning ?
  • Machine Learning uses Math as its foundation.
  • In ML, derivatives help answer : If I slightly change an ML model parameter, how will the error change ? (ignore what "error" means in this context; just understand that if we change some value/parameter in an ML model, how does it impact something else ?)
  • This is exactly what we need to 'train' ML models 
  • We use this in a concept called "Backpropagation", which we are going to discuss in this blog
Steps involved  :
  1. Model makes a prediction
  2. Calculate loss (error)
  3. Use derivatives to find :
    1. Which weight caused more error ?
    2. How much should each weight change ?
Simple example :
  • Think of standing on a hill ⛰️ and you want to reach the lowest point:
    • Derivative tells:
      • Which direction is downhill
      • How steep it is
  • ML does the same
    • Hill = loss function
    • Lowest point = Best model
    • Derivative =  Direction to update weights
Note : Don't worry if you don't yet understand anything except what a derivative is ! This entire blog talks about it. I am 100% sure that by the end of this blog each and every doubt will be clarified. This concept is a bit complex and needs multiple rounds of reading to digest.
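The hill analogy can be sketched in a few lines of plain Python (a toy example, not PyTorch): we repeatedly step opposite to the derivative of f(x) = x² and end up near its lowest point, x = 0.

```python
# Toy gradient descent on f(x) = x**2, whose derivative is f'(x) = 2x.
# The derivative tells us the downhill direction and the steepness.
def f_prime(x):
    return 2 * x

x = 5.0       # starting point on the "hill"
lr = 0.1      # learning rate: size of each downhill step
for _ in range(50):
    x = x - lr * f_prime(x)   # move opposite to the slope

print(x)      # very close to 0.0, the lowest point of the hill
```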


Formulas we need to learn for Derivatives :

  • d/dx(xⁿ) = n·xⁿ⁻¹
    • This is called Power rule
    • Examples :
      • d/dx(x⁵) = 5x⁴ 
      • d/dx(x³) = 3x²
      • d/dx(x²) = 2x 
      • d/dx(x)  = 1  
      • d/dx(x⁻²) = -2x⁻³
      • d/dx(√x) = (1/2)x⁻¹ᐟ² = 1/(2√x)
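A quick way to convince yourself of the power rule is a numerical check (a small sketch; `numerical_derivative` is just a helper name I made up):

```python
# Check d/dx(x**5) = 5*x**4 at x = 2 with a central finite difference.
def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
approx = numerical_derivative(lambda t: t**5, x)
exact = 5 * x**4          # power rule
print(approx, exact)      # both are approximately 80.0
```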

  • d/dx(uv) = u · dv/dx + v · du/dx
    • This is called the Product rule
    • Example :
      • Consider u = x² and v = x³ and apply the formula
      • d/dx (x² · x³) = x²(3x²) + x³(2x) = 3x⁴ + 2x⁴ = 5x⁴
      • Check : x² · x³ = x⁵, and the power rule gives d/dx(x⁵) = 5x⁴ as well
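The product rule can be sanity-checked numerically at a sample point (a sketch; the helper function name is my own):

```python
# Product rule check: d/dx(x**2 * x**3) should equal 5*x**4.
def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.5
approx = numerical_derivative(lambda t: (t**2) * (t**3), x)
exact = 5 * x**4          # result from the product rule above
print(approx, exact)      # both are approximately 25.3125
```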

  • d/dx(g(x)ⁿ) = n·g(x)ⁿ⁻¹·g'(x)
    • This is called Chain rule
    • Example :
      •  d/dx(3x² + 5)⁴ = 4(3x² + 5)³ · d/dx(3x² + 5) = 4(3x² + 5)³ · 6x = 24x(3x² + 5)³
      • We might be tempted to apply "d/dx(xⁿ) = n·xⁿ⁻¹" BUT 
        • this example is a function raised to a power, g(x)ⁿ, instead of a simple x. Isn't it ?
        • hence we need to apply the chain rule and calculate both n·g(x)ⁿ⁻¹ & g'(x)
        • We need to be clear on when to apply the Power rule & the Chain rule
          • The Power rule comes into the picture when the base is just x, not a function
          • The Chain rule comes into the picture when the base is a function like g(x) or f(x)
          • Let's say we have d/dx(3x⁴ + 2x² − 7). The chain rule is not needed here. This is a polynomial (a sum of terms), hence we differentiate term by term and apply the power rule to each term as below.
            • d/dx(3x⁴ + 2x² − 7) = d/dx(3x⁴) + d/dx(2x²) − d/dx(7) = 12x³ + 4x − 0 = 12x³ + 4x
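The chain-rule example can also be checked numerically at a sample point (again a sketch with a made-up helper name):

```python
# Chain rule check: d/dx(3x**2 + 5)**4 should equal 24x(3x**2 + 5)**3.
def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.0
approx = numerical_derivative(lambda t: (3 * t**2 + 5) ** 4, x)
exact = 24 * x * (3 * x**2 + 5) ** 3   # 24 * 1 * 8**3 = 12288
print(approx, exact)
```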

  • d²f/dx² = d/dx ( df/dx )
    • It means taking the derivative of a derivative
    • Example :
      • Function : f(x) = 4x³
      • First derivative : d/dx (4x³) = 12x²
      • Second derivative : d/dx (12x²) = 24x
      • Therefore : d²/dx² (4x³) = 24x 

  • ∂²f / ∂x ∂y 
    • First, ∂f/∂x → differentiate w.r.t x (keep y constant)
    • Then ∂/∂y ( ∂f/∂x ) → differentiate the result w.r.t y (keep x constant)
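A concrete example (my own choice of f) makes the mixed partial less abstract: for f(x, y) = x²y, ∂f/∂x = 2xy, and differentiating that w.r.t y gives ∂²f/∂x∂y = 2x. A numerical check:

```python
# Mixed partial ∂²f/∂x∂y for f(x, y) = x**2 * y.
# Analytically: ∂f/∂x = 2xy (treat y as constant), then ∂/∂y gives 2x.
def f(x, y):
    return x**2 * y

def d2f_dx_dy(x, y, h=1e-4):
    dfdx = lambda yy: (f(x + h, yy) - f(x - h, yy)) / (2 * h)  # ∂f/∂x at fixed y
    return (dfdx(y + h) - dfdx(y - h)) / (2 * h)               # then w.r.t. y

x, y = 3.0, 7.0
mixed = d2f_dx_dy(x, y)
print(mixed, 2 * x)   # both are approximately 6.0
```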
I believe, you are clear about derivatives and above formulas. Incase if you still not confident on using above formulas then I recommend you to practice more examples online to have some command on these concepts. 

Now, let's understand what PyTorch is, and one of its features called "Autograd". 

PyTorch :
  • PyTorch is an open-source deep learning framework developed by Meta that is used to build, train and deploy neural networks easily
  • It is a Python library that helps you write deep learning models
  • It provides
    • Tensors(like NumPy but faster + GPU support)
    • Automatic differentiation
    • Neural Network building blocks
    • Tools to train models
Note : I am preparing the corresponding Python code in Google Colab notebooks, to showcase :
  • How to enable GPU for PyTorch
  • How to import torch module
  • How to create a Tensor
  • How to enable Autograd etc
I will attach the programs here in some time.

Autograd :
  • We have seen derivatives and their usage in Neural Networks, right ? That's conceptual.
  • Autograd helps us achieve the same functionality programmatically using the PyTorch module
  • It is PyTorch's automatic differentiation engine
  • It automatically calculates gradients of tensors during backpropagation.
  • In simple words, Autograd tracks all operations on tensors and computes derivatives automatically.
  • We will see what gradient, tensor and backpropagation mean in this blog
We will use this Autograd functionality in LLMs as well, during a concept called Backward propagation. 
  • In our previous blog https://arunsdatasphere.blogspot.com/2026/01/deep-learning-and-neural-networks.html
    • We have learnt what is a weight and bias of a neuron in Neural Network
    • Using this backward propagation, we will adjust the values of weights and biases
    • Initially the values of weights and biases are random; after backward propagation these values are adjusted, and we do this using the Autograd functionality
    • To understand the Autograd functionality, we have to know derivatives. That's the reason I explained derivatives first and then started Autograd.
Look at this equation : 
  • Y = 2x + 3
    • In PyTorch, Autograd automatically calculates "Gradients" (i.e., derivatives)
    • The mathematical representation of "Gradients" is derivatives
    • Where exactly are we going to calculate "Gradients" ? In the "Training Phase" of the ML model
    • Autograd is nothing but an automatic differentiation engine; internally it creates a computation graph (like the DAG in Spark)
    • What is meant by a computation graph and how does it work ? :
      • Assume x = 2
      • Y = 2x + 3
        • Now x is multiplied with 2 
          • Step1 : 2 * 2 (2x)
          • Step2 : 4 added with 3 (2x + 3)
          • Step3 : 7
          • Y value is 7
        • Internally, Autograd will create a computation graph that includes the above steps. We call it a graph; it is a step-by-step process
Let's see the same in code :

Example 1 :

import torch

# Enabling autograd using flag requires_grad=True
x = torch.tensor(2.0, requires_grad=True)

# 2x +3
y = 2*x + 3

print(y)
# Output
# tensor(7., grad_fn=<AddBackward0>)

# It starts backward propagation and goes until the leaf node, which is the starting point of the computational graph
y.backward()

# print x.grad to see the gradient
# Please note, unless you execute y.backward(), backward propagation won't start
print(x.grad)
# Output : tensor(2.) because derivative of (2x + 3) is 2

Example 2 : We need to apply the chain rule here, as it is a function raised to a power, g(x)ⁿ, i.e. d/dx(2x+3)²

import torch

x1 = torch.tensor(3.0, requires_grad=True)
y = (2*x1 + 3)**2
print(y) # tensor(81., grad_fn=<PowBackward0>)
y.backward() # backward propagation initiated
print(x1.grad) # Printing gradient of x1 : tensor(36.)
x1.grad.zero_()
# if we don't call x1.grad.zero_() then
# gradients will be accumulated each time you run backward()

"""
Example:
Find d/dx(2x+3)²

We apply the formula:

d/dx[g(x)ⁿ] = n[g(x)]ⁿ⁻¹ · g'(x)

Step 1: Identify inner function
g(x) = 2x + 3

Step 2: Identify power
n = 2

Step 3: Apply formula

d/dx(2x+3)²
= 2(2x+3)²⁻¹ · d/dx(2x+3)

= 2(2x+3)¹ · 2

Step 4: Simplify

= 4(2x+3)

Final Answer:
d/dx(2x+3)² = 4(2x+3)

"""

One quick question : 
  • Is this backward propagation happening at neuron level or layer level or entire Neural Network ?
    • backward propagation updates all weights & biases across every layer of the neural network, not just the output layer
    • See below image for visual clarity

What is a Regression Vs Classification problem ?
  • Let us consider that we have 2 input values x1, x2 and a target value y
    • If our target contains continuous (numeric) data, then it is a Regression problem
    • If our target consists of categorical data, then it is a Classification problem
  • Examples :
    • Predicting the salaries of employees is a Regression problem (because salaries are continuous values; they can vary without limit)
    • Predicting an email is spam or not is a Classification problem(because it has only 2 categories, spam or not spam)

Let us solve a regression problem to understand the entire flow of a Neural Network.

Problem statement : Predict house prices (in $1000) based on below 2 features
  • Features are input values :
    • x1 = House Size
    • x2 = No. of Bedrooms
  • Output
    • y = House Price (the output, which is continuous data, hence regression)

Now, let's remember this generic formula :
  • Wᵏᵢⱼ where 
    • k = index of the hidden layer we are hitting
    • i = input neuron
    • j = hidden neuron
  • remember this formula to notate weight values for all the above connections 
  • See below image, assigned weights for all connections as per above formula

Understanding Non-linearity and activation function :
  • We are going to deal with complex neural networks, which are represented with curved lines in n-dimensional space; this is called non-linearity.
  • For non-linearity, we have different activation functions available like
    • ReLu - most popular
    • Sigmoid - binary classification
    • Tanh - centered data
    • Softmax - multiclass output
  • In simple words :
    • After computing Z = w1x1 + w2x2 + b (as per above neural network diagram)
    • We apply activation function, a = f(Z), where f() is activation function to add non-linearity 
    • Which helps model to learn complex relationships
  • We use ReLu (Rectified linear unit)
    • ReLu(x) = max(0, x)
    • Means if x <= 0 then ReLu is 0 else if x > 0 then ReLu is x
  • Real world analogy
    • For electrical switch :
      • if INPUT is <= 0 ; OFF
      • if INPUT is > 0 ; ON
  • ReLU (Rectified Linear Unit) outputs zero for negative inputs and passes positive values as-is, making it fast, efficient, and widely used in deep learning.
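ReLU and its derivative are two of the shortest functions in deep learning; this small sketch is all we need for the backpropagation section later:

```python
# ReLU and its derivative (the derivative is used during backpropagation).
def relu(z):
    return max(0.0, z)

def relu_derivative(z):
    return 1.0 if z > 0 else 0.0

print(relu(-2.0), relu(1.85))                        # 0.0 1.85
print(relu_derivative(-2.0), relu_derivative(0.25))  # 0.0 1.0
```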
Understanding Loss function :
  • A loss function measures how wrong the model's prediction is compared with the actual answer
    • Loss = Error
  • Just as there are multiple activation functions, there are multiple loss functions
    • We use MSE (Mean Squared Error)
      • For a single record, MSE = (ypred - ytrue)^2 ; where
        • ypred is the predicted value of Y
        • ytrue is the actual value of Y
      • we square the difference to avoid negative values (and larger errors are penalized more heavily)
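In code, the per-record squared-error loss is one line. Plugging in the numbers worked out later in this blog (prediction 1.585 against the true price 250, both in $1000s) shows how large the initial loss is:

```python
# Per-record squared-error loss, Loss = (y_pred - y_true)**2.
def squared_error(y_pred, y_true):
    return (y_pred - y_true) ** 2

print(round(squared_error(1.585, 250), 2))   # 61710.01 -- a huge initial loss
```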

Lets take sample data : 
  • This problem will help us understand how to approach any Neural Network problem
  • Lets take 2 records of sample data and solve this problem using NN
  • At the end of model, we will calculate predicted value which is Y^ (we call as Y hat)
  Record    X1 (Size of House)    X2 (No. of Bedrooms)    Y (Price)    Y^ (?)
  R1        1.5                   3                       250          ?
  R2        2.0                   4                       320          ?

Next step is, whenever we are going to execute a Regression or Classification problem, we have to execute some steps.
  • Forward propagation (involves 2 steps as below)
    • Linear transformation : use the formula Z = w1x1 + w2x2 + b
    • And then apply the activation function
      • We use the ReLu activation function, ReLu(Z)
    • The output of Forward propagation is the predicted value, Y^ (Y hat)
  • Calculate Loss 
    • The Loss depends on whether the problem is Regression or Classification
    • We are dealing with Regression, and we discussed the MSE type of loss above
    • Loss = (ypred - ytrue)^2
  • Backward propagation
    • Internally we are calculating the gradient here (derivatives as discussed above)
  • Adjust the weights and biases by using below formulas
    • New Weight : W_new = W_old − η · ( ∂L / ∂W_old )
    • New Bias : B_new = B_old − η · ( ∂L / ∂B_old )
    • Where η is the learning rate and ∂L/∂W_old is the derivative of the Loss with respect to the old weight
Now, let's apply the above steps to the sample data we considered. We take the first record (R1) for this example. Note, we are not using R2 here.

Step1 : Forward propagation 
  • We need to apply linear transformation, which is Z = w1x1+w2x2+b
    • Understand what data we have at this point, and what we need
    • We have input values for record, R1 (x1 = 1.5, x2 = 3)
    • Now we need to define weights & biases as random values to start this process
    • Below are weights & biases for input layer to hidden layer
      • Below are random weights (randomly taken, could be any values)
        • [W11  W12] = [0.5   0.3]
        • [W21  W22] = [-0.2  0.4]
      • Biases (these are input to 1st hidden layer)
        • [b1] = [0.1]
        • [b2] = [0.2]
    • Below are weights & biases for hidden layer to output layer
      • Weights are
        • [W1   W2] = [0.7   0.6]
      • Biases are
        • b3 = 0.3
    • Now observe all the random values like input values, weights and biases are ready
    • Neural Network looks as below at this stage

    • Lets calculate the forward propagation
      • x1 = 1.5, x2 = 3
    • Lets calculate Z = w1x1 + w2x2 + b
      • But we have 2 hidden nodes right ? Hence we need to calculate 2 equations
      • For h1, Z1 = W11*x1 + W21*x2 + b1 = (0.5 * 1.5) + (-0.2 * 3) + 0.1 = 0.25
      • For h2, Z2 = W12 * x1 + W22 * x2 + b2 = (0.3 * 1.5) + (0.4 * 3) + 0.2 =  1.85
    • Now, we have to apply activation function for both Z1, Z2 (adding non linearity)
      • We are using ReLu activation function where ReLu(Z) = max(0, Z)
      • ReLu of :
        • Z1 is ReLu(0.25) = 0.25
        • Z2, ReLu(1.85) = 1.85
    • Understand that we have completed the (input layer → hidden layer) portion now, and we have to repeat the same process for (hidden layer → output layer)
      • Y^(Y hat) = W1*h1 + W2*h2 + b3 (we have only one neuron at o/p layer)
      • W1 & W2 represents 2nd random weights that we considered above, 
      • [W1   W2] = [0.7   0.6] & h1, h2 are ReLu(Z1), ReLu(Z2) i.e [0.25, 1.85]
      • Y^(Y hat) = (0.7 * 0.25) + (0.6 * 1.85) + 0.3 = 1.585
      • Y^ = 1.585
    • In the problem statement, we stated that the house price is in $1000s, hence multiplying Y^ by 1000 gives the price in dollars
    • Y^ = 1.585 * 1000 = $1,585 
      • Note that for record R1, the actual price of the house is 250 (in $1000s), i.e., $250,000
    • We completed step1 now
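The whole forward pass above fits in a few lines of plain Python (same weights and inputs as the worked example):

```python
# Forward pass for record R1 with the random weights chosen above.
x1, x2 = 1.5, 3.0

# input -> hidden weights and biases
W11, W12, W21, W22 = 0.5, 0.3, -0.2, 0.4
b1, b2 = 0.1, 0.2
# hidden -> output weights and bias
W1, W2, b3 = 0.7, 0.6, 0.3

relu = lambda z: max(0.0, z)

z1 = W11 * x1 + W21 * x2 + b1       # 0.25
z2 = W12 * x1 + W22 * x2 + b2       # 1.85
h1, h2 = relu(z1), relu(z2)
y_hat = W1 * h1 + W2 * h2 + b3      # 1.585
print(round(z1, 2), round(z2, 2), round(y_hat, 3))
```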

Step2 : Calculate Loss 
  • Loss = (Y^ - Y)**2 = (1.585 - 250)**2 = 61,710.0122
    • You may wonder whether we should have used Y^ = 1585 here. No; that was only for our understanding. The loss must compare Y^ and Y in the same units ($1000s), and Y = 250 is already in $1000s. If we used Y^ = 1585 (dollars), we would have to multiply 250 by 1000 as well.
  • This is the actual gap between predicted price and actual price !


Step3 : Backward propagation
  • This step is very important
    • Initially, we considered random values for weights, biases for all connections and neurons and calculated Loss
    • Now we have to find the derivative of the Loss with respect to each and every weight and bias in the entire neural network
    • This is how we minimize the loss
    • This is a very tedious process, and this is what happens in the core of a Neural Network with hundreds or thousands of neurons across multiple hidden layers
    • This is why we need GPUs and TPUs for AI/ML : huge numbers of mathematical calculations happen in parallel
    • PLEASE TRY TO UNDERSTAND THIS ENTIRE PROCESS CAREFULLY TO HAVE A SOUND KNOWLEDGE OF NEURAL NETWORKS.


  • Now we need to find the derivatives in backward propagation
  • As per above diagram, backward propagation start from Loss and adjust each value accordingly using derivatives in reverse order
  • Now lets calculate the gradients/derivatives from Loss --> Y^ (Y hat)
    • Nothing but finding the derivative of Loss based on the derivative of Y^ 
  • Formula for Loss calculation is, Loss = (Y^ - Y)**2 
    • Derivative of Loss with respect to Y^ : d(Loss)/dY^
    • d(Loss)/dY^ = d/dY^((Y^ - Y)²)   (replaced Loss with the Loss formula above)
    • We need to apply the chain rule, d/dx(g(x)ⁿ) = n·g(x)ⁿ⁻¹·g'(x)
    • Hence d/dY^((Y^ - Y)²) = 2(Y^ - Y) · d/dY^(Y^ - Y) = 2(Y^ - Y) · 1 = 2(Y^ - Y)
      • we are differentiating w.r.t Y^, hence d/dY^(Y^ - Y) = 1 - 0 = 1
    • Now the derivative of Loss w.r.t Y^ = 2(Y^ - Y) = 2(1.585 - 250) = -496.83
      • as we calculated, Y^ = 1.585 & Y = 250 
      • The gradient from Loss to Y^ is -496.83 (this is dL/dY^)
  • Now lets calculate the gradients/derivatives for output layer
    • We need to calculate 3 things
      • dL/dw1
      • dL/dw2
      • dL/db3
    • But Loss is not directly connected to W1, W2, b3, right ? There is an intermediate variable, y^. Hence the paths are :
      • Loss --> y^ --> W1
      • Loss --> y^ --> W2
      • Loss --> y^ --> b3
    • Hence chain rule applies here as follows : (as we calculated above dL/dy^ = -496.83)
      • dL/dw1 = dL/dy^ * dy^/dw1 
      • dL/dw2 = dL/dy^ * dy^/dw2
      • dL/db3 = dL/dy^ * dy^/db3
    • So we need to calculate 
      • dy^/dw1
      • dy^/dw2
      • dy^/db3
    • But during Forward propagation, y^ = w1h1+w2h2+b3
        • dy^/dw1 = 0.25 (see below steps)
          • dy^/dw1 = d/dw1(w1h1 + w2h2 + b3)
          • = d/dw1(w1h1) + d/dw1(w2h2) + d/dw1(b3)
          • = d/dw1(w1h1)   (the other terms don't contain w1, so their derivatives are 0)
          • = h1, and h1 is 0.25, hence dy^/dw1 = 0.25
        • dy^/dw2 = 1.85 (see below steps)
          • dy^/dw2 = d/dw2(w1h1 + w2h2 + b3)
          • = d/dw2(w1h1) + d/dw2(w2h2) + d/dw2(b3)
          • = d/dw2(w2h2)
          • = h2, and h2 is 1.85
        • dy^/db3 = 1
          • dy^/db3 = d/db3(w1h1 + w2h2 + b3)
          • = d/db3(b3)
          • = 1
    • Finally :
      • dL/dw1 = dL/dy^ * dy^/dw1 = -496.83 * 0.25 = -124.21
      • dL/dw2 = dL/dy^ * dy^/dw2 = -496.83 * 1.85 = -919.14
      • dL/db3 = dL/dy^ * dy^/db3 = -496.83 * 1 = -496.83
  • Until now, we calculated gradients until hidden layer and we need to calculate derivatives for rest of the weights, biases in the neural network
  • Lets calculate for h1, h2
    • dL/dh1 = dL/dy^ * dy^/dh1 = -496.83 * 0.7 = -347.78
    • dL/dh2 = dL/dy^ * dy^/dh2 = -496.83 * w2 = -496.83 * 0.6 = -298.10
    • Note dL/dy^ = -496.83
      • dy^/dh1= d/dh1(y^)
      • =d/dh1(w1h1+w2h2+b3)
      • = d/dh1(w1h1)   (ignore the other terms; they don't contain h1, so their derivatives are zero)
      • =w1
      • =0.7
    • as per NN, path is
      • Loss --> y^ --> h1
      • Loss --> y^ --> h2
  • Remember, we applied ReLu for h1, h2 during forward propagation. Hence we need to find out derivatives for ReLu as well.
    • if Z > 0  : derivative  = 1
    • if Z <= 0 : derivative = 0
    • dL/dz = dL/dh * dh/dz
    • and we have to calculate it for both z1, z2
    • Note : 
      • z1 = 0.25, which is > 0, so the ReLu derivative is 1
      • z2 = 1.85, which is > 0, so the ReLu derivative is 1
    • Now 
      • dL/dz1 = dL/dh1 * dh1/dz1 = -347.78 * 1 = -347.78
      • dL/dz2 = dL/dh2 * dh2/dz2 = -298.10 * 1 = -298.10
      • We already know the values of dL/dh1 & dL/dh2, i.e. (-347.78 & -298.10)
      • We need to calculate :
        • dh1/dz1 = ReLu'(z1) = 1 (since z1 = 0.25 > 0)
        • dh2/dz2 = ReLu'(z2) = 1 (since z2 = 1.85 > 0)
  • Final step
    • Remember generic formula, Z = w1x1 + w2x2 + b
    • AND
      • z1 = w11x1+w21x2+b1
      • z2 = w12x1+w22x2+b2
    • For w11
      • dL/dw11 = dL/dz1 * dz1/dw11 (we calculated dL/dz1 = -347.78)
      • Need to calculate dz1/dw11 = d/dw11(z1)=d/dw11(w11x1+w21x2+b1)=x1=1.5
      • Now dL/dw11 = dL/dz1 * dz1/dw11 = -347.78 * 1.5 = -521.67
    • For w21
      • dL/dw21 = dL/dz1 * dz1/dw21 (we calculated dL/dz1 = -347.78)
      • Need to calculate dz1/dw21=d/dw21(z1)=d/dw21(w11x1+w21x2+b1)= x2= 3
      • Now dL/dw21 = dL/dz1 * dz1/dw21 = -347.78 * 3 = -1043.34
    • For b1
      • dL/db1 = dL/dz1 * dz1/db1 (but dL/dz1 is -347.78)
      • Need to calculate dz1/db1 = d/db1(z1)=d/db1(w11x1+w21x2+b1)=1
      • Now dL/db1 = dL/dz1 * dz1/db1 = -347.78 * 1 = -347.78
    • We completed all values for z1, need to calculate z2
    • For W12
      • dL/dw12 = dL/dz2 * dz2/dw12 (we calculated dL/dz2 = -298.10)
      • dz2/dw12 = d/dw12(z2) = d/dw12(w12x1+w22x2+b2) = x1 = 1.5
      • Now dL/dw12 = dL/dz2 * dz2/dw12 = -298.10 * 1.5 = -447.15
    • For W22
      • dz2/dw22 = d/dw22(z2) = d/dw22(w12x1+w22x2+b2) = x2 = 3
      • Now dL/dw22 = dL/dz2 * dz2/dw22 = -298.10 * 3 = -894.30
    • For b2
      • dL/db2 = dL/dz2 * dz2/db2= (-298.10 * dz2/db2)=(-298.10 * 1)=-298.10
      • dz2/db2 =d/db2(z2)=d/db2(w12x1+w22x2+b2)=1
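All of the hand-computed gradients above can be reproduced in a few lines of plain Python (same numbers, up to small rounding differences):

```python
# Backward pass for record R1, using the forward-pass values from above.
x1, x2 = 1.5, 3.0
h1, h2 = 0.25, 1.85        # ReLU outputs from the forward pass
W1, W2 = 0.7, 0.6          # hidden -> output weights
y_hat, y_true = 1.585, 250.0

dL_dyhat = 2 * (y_hat - y_true)        # -496.83
# output layer
dL_dW1, dL_dW2, dL_db3 = dL_dyhat * h1, dL_dyhat * h2, dL_dyhat
# into the hidden layer (ReLU derivative is 1 since z1, z2 > 0)
dL_dz1, dL_dz2 = dL_dyhat * W1, dL_dyhat * W2
# input-layer weights and biases
dL_dW11, dL_dW21, dL_db1 = dL_dz1 * x1, dL_dz1 * x2, dL_dz1
dL_dW12, dL_dW22, dL_db2 = dL_dz2 * x1, dL_dz2 * x2, dL_dz2

print(round(dL_dW1, 2), round(dL_dW11, 2), round(dL_dW22, 2))
# -124.21 -521.67 -894.29
```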

Step4 : Adjust weights & Biases
  • W_new = W_old − η · ( ∂L / ∂W_old )
  • Lets assume η = 0.001
  • Lets calculate W1_new
    • W1_new = 0.7 - (0.001) * dL/dw1 = 0.7 - (0.001) * (-124.21) = 0.8242
    • The old W1 value was 0.7, and the weight is adjusted to 0.8242
    • This is how weights and biases will be adjusted in neural network
  • Similarly, we have to calculate W2_new, W11_new, W12_new, W21_new, W22_new, b1_new, b2_new and b3_new (note that h1 and h2 are activations, not parameters; they are recomputed in the next forward pass, not updated)
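The update rule in code, using the gradient dL/dW1 = −124.21 computed above and η = 0.001:

```python
# Gradient-descent update: W_new = W_old - lr * dL/dW_old
lr = 0.001                 # learning rate (eta)
W1_old, dL_dW1 = 0.7, -124.21
W1_new = W1_old - lr * dL_dW1
print(round(W1_new, 4))    # 0.8242
```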

This entire process is just one iteration of the neural network; these iterations continue for n number of times, and the old values are adjusted after every iteration. Programmatically, this is why we call x1.grad.zero_() between iterations when using tensors via PyTorch: it clears the accumulated gradients before the next backward pass.

Kindly note that this entire process is repeated until y^ (the predicted output) is very close to y (the original output). Not necessarily equal, but we want to predict a value that is close to the actual value, isn't it ?

Consider a 2D graph, representing Loss in y-axis and number of iterations in x-axis, as the number of iterations increase, loss will start decreasing and we have to iterate until loss is bare minimal. That's the expectation. Anyways, we don't do all this manually BUT THIS IS WHAT WILL HAPPEN IN NEURAL NETWORK.
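Putting all four steps in a loop shows exactly that picture. This is a sketch with the same starting weights as above; note I use a smaller learning rate (0.0001, my own choice) and no input normalization, so it is only for illustration, but the loss visibly shrinks over the first iterations:

```python
# Repeated forward pass -> loss -> backward pass -> update for record R1.
x1, x2, y_true = 1.5, 3.0, 250.0
W11, W12, W21, W22, b1, b2 = 0.5, 0.3, -0.2, 0.4, 0.1, 0.2
W1, W2, b3 = 0.7, 0.6, 0.3
lr = 0.0001

relu = lambda z: max(0.0, z)
drelu = lambda z: 1.0 if z > 0 else 0.0

losses = []
for step in range(5):
    # Step 1: forward propagation
    z1 = W11 * x1 + W21 * x2 + b1
    z2 = W12 * x1 + W22 * x2 + b2
    h1, h2 = relu(z1), relu(z2)
    y_hat = W1 * h1 + W2 * h2 + b3
    # Step 2: loss
    losses.append((y_hat - y_true) ** 2)
    # Step 3: backward propagation (chain rule, exactly as derived above)
    d_yhat = 2 * (y_hat - y_true)
    dW1, dW2, db3 = d_yhat * h1, d_yhat * h2, d_yhat
    dz1 = d_yhat * W1 * drelu(z1)
    dz2 = d_yhat * W2 * drelu(z2)
    # Step 4: update weights and biases
    W1 -= lr * dW1; W2 -= lr * dW2; b3 -= lr * db3
    W11 -= lr * dz1 * x1; W21 -= lr * dz1 * x2; b1 -= lr * dz1
    W12 -= lr * dz2 * x1; W22 -= lr * dz2 * x2; b2 -= lr * dz2

print([round(l, 1) for l in losses])   # the loss decreases every step
```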


How does ML model decides to stop this continuous loop ?
  • First way : The manual way is defining the epochs; let's say if epochs=50, then this loop will stop after 50 iterations
  • Second way : Automatic stopping, e.g., early stopping based on validation loss. A separate library called Optuna (which integrates with PyTorch) can also stop/prune unpromising training runs automatically

BTW, a simplified way of the algorithm mentioned in this current blog is written in the below blog:
https://arunsdatasphere.blogspot.com/2026/01/ai-blog3-deep-learning-foundations.html

Please read Gradient Descent section, especially check images and hand written graphs.


That's all for this blog! Have a good day.


Thanks,
Arun Mathe
Email ID : arunkumar.mathe@gmail.com
