(AI Blog#11) LLM - Coding Attention Mechanism

The attention mechanism is the heart of an LLM; roughly 70% of the transformer architecture revolves around it. There are four categories of attention mechanism:

- Simplified attention mechanism
- Attention mechanism with trainable weights
- Causal attention mechanism
- Multi-head attention mechanism

Note: once we are familiar with these attention mechanisms, we will understand not just the GPT model but also other LLM architectures such as DeepSeek R1, R2, etc. It is extremely important to have a commanding knowledge of one LLM framework (in our case, GPT) in order to understand the changes in other models.

Background on the attention mechanism: before discussing the attention mechanism itself, let us first look at the problems that motivated this concept.

RNN (Recurrent Neural Network) - We have discussed RNNs before; they introduce a very important concept called MEMORY, which is the hidden state of the RNN. The hidden state maintains previous context information. Each RNN cell receives an input...
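The hidden-state "memory" described above can be sketched in a few lines. This is a minimal, illustrative RNN cell (not the post's own code): the function name `rnn_cell` and the weight shapes are assumptions chosen for clarity. Each step mixes the current input with the previous hidden state, so context flows forward through the sequence.

```python
import numpy as np

def rnn_cell(x_t, h_prev, W_xh, W_hh, b_h):
    # New hidden state = tanh(input projection + previous-context projection + bias).
    # h_prev is the "memory" that carries information from earlier tokens.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 3, 5

# Illustrative random weights (in practice these are learned).
W_xh = rng.normal(size=(input_dim, hidden_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)  # memory starts empty
for t in range(seq_len):
    x_t = rng.normal(size=input_dim)   # stand-in for a token embedding
    h = rnn_cell(x_t, h, W_xh, W_hh, b_h)  # h now summarizes tokens 0..t

print(h.shape)  # (3,) - one fixed-size vector summarizes the whole prefix
```

Note how the entire history is squeezed into a single fixed-size vector `h`; that bottleneck is exactly the limitation attention was designed to remove.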