I was curious to learn Artificial Intelligence and was wondering where best to start, and then realized that Deep Learning and Neural Networks are the heart of AI. So I started diving into AI from this point. Starting today, I will write a continuous series of blogs on AI, especially Gen AI & Agentic AI. If you are interested in these topics, please watch this space.
What is Artificial Intelligence, Machine Learning & Deep Learning ?
AI can be described as the effort to automate intellectual tasks normally performed by humans. Is this really possible ? For example, when we see an image with our eyes, we identify it within a fraction of a second. Isn't it ? Can a computer do the same within the same time limit ? That's the power we are talking about. To be honest, things are far more advanced than we usually think when it comes to AI.
BTW, starting from this blog, this is not just a technical journal; we talk about internals here. We will see the modules, the programming involved, the concepts and connected information to gain sound knowledge of AI and its related topics.
Machine Learning :
A Machine Learning system is trained rather than explicitly programmed. In classical programming, we define a set of rules in the program, and based on the data that comes in, the program executes and gives us some results. But it won't perform anything it wasn't programmed for, right ? We need to handle every possible scenario while programming, so that when a particular piece of data or activity hits that code, it executes based on the conditions we defined and does what it is intended to do. Correct ? But in Machine Learning this is not the situation! We feed a huge amount of data to train the ML model, and it learns what it needs to learn from that data. We also feed it some answers so that our ML model can come up with results. We will see more in-depth information about this shortly. Please see the diagram below.
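The contrast can be sketched in a few lines of Python (a toy illustration with made-up numbers, not a real ML library): the classical version has the rule written by hand, while the "trained" version recovers the rule from the data and answers we feed it.

```python
# Classical programming : the rule (multiply by 2) is written by hand.
def classical_double(x):
    return x * 2

# "Machine learning" (toy) : given data and answers, estimate the rule.
# Here we fit y = w * x by closed-form least squares on the examples we feed in.
def learn_weight(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

data = [1, 2, 3, 4]
answers = [2, 4, 6, 8]           # the "answers" fed alongside the data
w = learn_weight(data, answers)  # the model learns w = 2.0 from the data

print(classical_double(10))      # 20 : the rule was programmed
print(w * 10)                    # 20.0 : the rule was learned
```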
Circles in the hidden layers are artificial neurons, where the actual processing happens. A neuron transforms input data into output through a huge number of computations and calculations. Each hidden layer has multiple neurons, depending on the complexity of the data. The layers inject both linearity and non-linearity into the data, which we will see further in this blog.
Starting from the input layer, each neuron is connected to every neuron in the next layer. Likewise, every neuron in each layer receives input from every neuron in its previous layer.
Also note that every connection has a number associated with it, as shown in the diagram above. These numbers are called WEIGHTS. For example, the weight from input X(1) to the 1st neuron in the hidden layer is 0.2. Each neuron also has a number inside it, called the BIAS. Together, weights and biases are called the parameters of the neural network.
How many neurons should be present in each layer, and how many such layers should exist, are controlled by hyperparameters of the NN. These are tunable based on the situation.
When we say an NN is trained, it means the NN has reached a point where it has certain weights and biases, learned from the input data in each layer, that can finally transform the input into a meaningful output. Training involves adjusting these weights and biases based on the error signals.
Try to digest the sentences below carefully; read them with intensity until they stick in your mind (if multiple reads are needed, please do; you have to remember this for a lifetime):
- In the input layer, X(1), X(2), X(3) are the input parameters
- At the start, before the first hidden layer, we allocate random numbers to these neurons as weights & biases, and these are adjusted down the line in each iteration based on the errors. We ask the program to initialize these numbers randomly.
- I will provide a detailed example to give a clear understanding; it is a long process, so I will write it at the end of this blog or in a new blog.
- Let's say, for example, the predicted value is 100 but the actual value is 120, so the error is (100 - 120 = -20). There is an algorithm called backpropagation which adjusts the weights based on this error, and this repeats in the next iteration of processing. I know it is still not clear. Don't worry! Every single doubt will be clarified slowly. Just keep moving forward.
- So, all the numbers in the picture above are learned and adjusted accordingly
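A minimal sketch of that adjustment, assuming a single weight, a squared-error loss, and a made-up learning rate (real backpropagation does this across every weight and bias in every layer):

```python
# Toy sketch : one weight, one input, one gradient-descent update.
# Prediction : y_hat = w * x ; loss : squared error (y_hat - y)**2
x, y = 10.0, 120.0       # input and actual (target) value
w = 10.0                 # current weight, so the prediction is 100
lr = 0.002               # learning rate (step size), a made-up hyperparameter

y_hat = w * x            # 100.0, the predicted value
error = y_hat - y        # -20.0, the same gap as in the example above
grad = 2 * error * x     # d(loss)/dw = 2 * (y_hat - y) * x = -400.0
w = w - lr * grad        # the update moves w from 10.0 to 10.8

print(w * x)             # new prediction 108.0, closer to the target 120
```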
Neural Network Parameter calculation :
- The NN below has 5 input nodes, 3 hidden layers with 5 neurons in each layer, and an output layer.
- No. of parameters in a layer is equal to
- = (neurons in previous layer * neurons in current layer) + neurons in current layer
- = number of connections + biases
- = weights + biases
- Total number of parameters = Sum of parameters in each layer
- For example, the NN below has 69 parameters
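The formula above can be applied mechanically in code. The layer sizes here are an assumption read from the description (5 inputs, three hidden layers of 5 neurons, 1 output neuron); the exact total depends on the diagram, so treat the printed number as an illustration of the formula rather than the diagram's figure.

```python
def layer_params(prev, curr):
    # (neurons in previous layer * neurons in current layer) + biases
    return prev * curr + curr

def total_params(sizes):
    # Sum the per-layer counts over consecutive layer pairs.
    return sum(layer_params(p, c) for p, c in zip(sizes, sizes[1:]))

# Assumed sizes : 5 inputs, three hidden layers of 5, 1 output neuron
sizes = [5, 5, 5, 5, 1]
print(total_params(sizes))   # (5*5+5)*3 + (5*1+1) = 96
```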
Alright, let's see what a Tensor is.
Tensor :
- A Tensor is a data container, a mathematical object that represents data in N dimensions
- Tensors are used for data processing (we will see this in programming; I will add Colab notes with proper descriptions)
- It is a structured way to store numbers for computers to process
- Core properties of a Tensor are as below :
- Rank (Order) : Number of dimensions
- Shape : Tuple indicating size along each dimension (Matrix : 3 * 4 means 3 rows, 4 columns)
- Data Type : Type of elements (float32, int, etc.)
1) Scalars (Rank-0 Tensor) : A scalar is a single value, represented by a single real or complex number.
- Magnitude only - has size but no directional component
- Rank 0 tensor - In tensor algebra, a scalar is a rank-0 tensor
- Examples : batting average, number of goals scored etc.
2) Vectors (Rank-1 Tensor) : A vector is an ordered collection of numbers that represents both magnitude and direction in space.
- It is multi-valued - represented by an ordered list of real or complex numbers
- Has both magnitude and direction
- Coordinate dependent - Components change with coordinate system transformation
- Rank-1 tensor - In tensor algebra, vector is a rank-1 tensor
- Note : 2nd image below is a 3D vector
- Notation - Tuple/List : (x1, x2, x3 ....xn) OR [x1, x2, x3, ..xn]
3) Matrices (Rank-2 Tensor) : A matrix is a collection of vectors.
- It has both row-wise and column-wise organization
- Values change based on the choice of coordinate basis
- In tensor algebra, matrices are rank-2 tensors
Types of matrices : Just refresh your cache with below information, I know you know these :)
- Rectangular matrix
- Square matrix
- Diagonal matrix
- Identity matrix
- Upper triangular matrix
- Lower triangular matrix
4) Rank-3 Tensors :
- Cube-valued - represented by a 3-dimensional array of numbers
- Multi-directional structure - has depth, height and width
- Coordinate invariant - its values don't change with coordinate system transformations
5) Rank-4 Tensors : Image data
- A batch of 128 color images could be stored in a tensor of shape (128, 256, 256, 3)
- 128 images
- 256 * 256 pixels
- 3 is the color depth (remember R G B - the depth of red, green and blue in any image ?)
6) Rank-5 Tensors : Video data
- Video data is one of the few types of real-world data for which you will need a rank-5 tensor
- A 60 second, 144 * 256 YouTube video clip sampled at 4 frames per second would have 240 frames
- A batch of 4 such video clips would be stored in a tensor of shape (4, 240, 144, 256, 3)
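The ranks and shapes discussed above can be checked directly in NumPy, where `ndim` is the rank and `shape` is the size along each dimension (uint8 is used here only to keep the big example arrays light in memory):

```python
import numpy as np

scalar = np.array(5.0)                                    # rank-0 : magnitude only
vector = np.array([1.0, 2.0, 3.0])                        # rank-1 : an ordered list
matrix = np.zeros((3, 4))                                 # rank-2 : 3 rows, 4 columns
images = np.zeros((128, 256, 256, 3), dtype=np.uint8)     # rank-4 : batch of color images
videos = np.zeros((4, 240, 144, 256, 3), dtype=np.uint8)  # rank-5 : batch of video clips

for t in (scalar, vector, matrix, images, videos):
    print(t.ndim, t.shape)   # ndim is the rank, shape is the size per dimension
```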
- The diagram below shows what happens inside a neuron
- x1, x2, x3, ..., xn are the inputs; together they form a vector of 'n' values
- Each input has a weight on the neuron
- Sigma is the summation function : Sigma = x1*w1 + x2*w2 + x3*w3 + ... + xn*wn
- The bias is added to the output of sigma; the bias is another number
- The activation function (f) adds non-linearity to the neuron by applying a non-linear function such as a sigmoid or ReLU
- Each neuron outputs another number, and similar processing happens in each neuron across the hidden layers
- Final predicted output is : the summation of all inputs * weights, plus the bias, with the activation function applied on this result to add non-linearity
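The steps above can be sketched as one neuron in plain Python, assuming made-up inputs, weights, and bias, with a sigmoid as the activation:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum (the sigma step) : x1*w1 + x2*w2 + ... + xn*wn
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation adds the non-linearity ; here a sigmoid squashes z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

x = [1.0, 2.0, 3.0]      # inputs, a vector of n values (made-up numbers)
w = [0.2, -0.1, 0.4]     # one weight per input (made-up numbers)
b = 0.5                  # the bias, another number
out = neuron(x, w, b)    # a single number, this neuron's output
print(out)
```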
- A linear transformation scales : if a vector, say (1, 2) in 2-dimensional space, is multiplied by a constant, its magnitude increases (or shrinks) but it stays on the same line
- A non-linear transformation changes/bends the shape of the original linear mapping
- For example, consider the simple formula y = mx + b, where m is the slope and b is the intercept; for any m and b this is a straight line, i.e. a linear relationship between x and y
- Scaling m only tilts the line; it is still straight. Non-linearity appears only when a non-linear function is applied to the output, for example f(mx + b)
- Linear data :
- Here the relationship between the input and the output is linear
- Real time examples :
- House price vs. house size
- If we add more space to a house, the price also increases
- If we draw a line between size and price, it is a straight line. Isn't it ? This is called linearity, or linear data.
- Non linear data :
- Deep Learning is meant for non-linear data, not for linear data. Linear data we can handle using classical Machine Learning as well.
- Real time examples :
- Let us consider : the age, salary, discount, and brand purchase history of a customer in an online store like Amazon.
- You might think that all the above variables work independently, correct ?
- But do you think one parameter is dependent on another ?
- Now understand :
- Low Salary + High Discount ==> He might buy the product
- High Salary + Low Discount ==> He might buy the product
- Med Salary + Medium Discount ==> May or May not buy
- This is not a straight line, is it ? Sometimes salary is low, medium, or high, and similarly for all other params
- This is called non-linear data
- Vector operations
- Matrix operations
- U, V are 2 different vectors; when we add them, the corresponding values in each vector are added to create a new vector
- The bias is added to this new vector (the bias is also a vector, which we add to the final summation vector)
- Scalar multiplication - c * v = [c*v1, c*v2, ...]
- We are multiplying a scalar with a vector
- This is a linear transformation
- If we multiply by a negative scalar, say -2, then the vector traverses in the reverse direction because of the '-' sign
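Vector addition and scalar multiplication can be sketched in plain Python (toy numbers):

```python
def vec_add(u, v):
    # Add the corresponding components : [u1+v1, u2+v2, ...]
    return [a + b for a, b in zip(u, v)]

def scalar_mul(c, v):
    # Scale every component : c * v = [c*v1, c*v2, ...]
    return [c * x for x in v]

u, v = [1, 2, 3], [4, 5, 6]
print(vec_add(u, v))       # [5, 7, 9]
print(scalar_mul(-2, v))   # [-8, -10, -12] : doubled length, reversed direction
```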
- This is the most important operation that we want to learn
- We have 2 kinds of vector multiplications
- Dot Product
- Cross Product
- From Neural Networks perspective, we are interested in "Dot Product"
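A minimal dot product in plain Python; note that a neuron's weighted sum is exactly this operation applied to the weight and input vectors:

```python
def dot(u, v):
    # Sum of the element-wise products : u1*v1 + u2*v2 + ... + un*vn
    return sum(a * b for a, b in zip(u, v))

print(dot([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32
# A neuron's weighted sum is exactly this : dot(weights, inputs) + bias
```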
- Please see the images below for information on matrix transformations; it is easier to understand via images than via text
- Rank = No. of linearly independent rows or columns in a matrix
- Rank tells you how much unique information a matrix contains
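NumPy can compute the rank directly; here A's second row is a multiple of its first, so A carries only one row's worth of unique information (made-up matrices for illustration):

```python
import numpy as np

A = np.array([[1, 2],
              [2, 4]])    # second row = 2 * first row : only 1 independent row
B = np.array([[1, 0],
              [0, 1]])    # the rows are independent

print(np.linalg.matrix_rank(A))   # 1
print(np.linalg.matrix_rank(B))   # 2
```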
- Reshaping a tensor means rearranging its rows and columns to match a target shape.
- Naturally, the reshaped tensor has the same total number of coefficients as the initial tensor.
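A quick reshape sketch in NumPy, showing that the total coefficient count is preserved:

```python
import numpy as np

t = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3), 6 coefficients
r = t.reshape(3, 2)                # same 6 coefficients, new arrangement
f = t.reshape(6)                   # flattened into a rank-1 tensor

print(r.shape, f.shape)            # (3, 2) (6,)
print(t.size == r.size == f.size)  # True : the coefficient count is preserved
```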
- Please see the images below for the limitations of linear transformations and the need for non-linear transformations
- ReLU is a famous non-linear transformation which we use in Deep Learning and Neural Networks
- Read the activation function formulas from the images below and memorize them
- Sigmoid
- Softmax
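Minimal sketches of the three activation functions named above (the standard formulas, in plain Python):

```python
import math

def relu(x):
    # ReLU : pass positive values through, clamp negatives to zero
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid : squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Softmax : turns a list of scores into probabilities that sum to 1
    m = max(xs)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

print(relu(-3.0), relu(2.5))             # 0.0 2.5
print(sigmoid(0.0))                      # 0.5
print(round(sum(softmax([1.0, 2.0, 3.0])), 6))   # 1.0, a probability distribution
```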