(AI #22) LLM Fine-Tuning Techniques

LLM fine-tuning is all about taking a pre-trained LLM and training it further on your own domain- or task-specific data so that it becomes specialized for your use case.


In fine-tuning, the model may not have seen this type of complex data during pre-training, and we would like to tune it on this data. We are not switching to a different model here; we keep the same model, which currently gives lower accuracy on this data, and train it further on this complex dataset to make it much more accurate for our task.

Generally, prompt engineering relies on the existing knowledge of the LLM, and we use RAG (Retrieval-Augmented Generation) to get better accuracy on our project-specific data. If the problem goes beyond this, we end up fine-tuning the model itself. Domains such as drug discovery and oil and gas use model fine-tuning because their data is rare and highly specialized.

What if model fine-tuning is still not enough? Then we obviously have to go for developing a new ML model.

Sometimes, for some use cases, a combination of a fine-tuned model + prompt engineering + RAG + Agentic AI orchestration helps achieve the expected accuracy.


Fine-Tuning Decision Framework:


Please observe the decision framework above carefully to understand when to go for fine-tuning.


So far, we have talked about what fine-tuning is and when to go for it. Let us deep dive into it.


Understanding what happens when we update model weights:

The model's weights shift from general knowledge towards task expertise.


How does it work?

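  • Feed an instruction (an example from the fine-tuning dataset) into the model
  • The model predicts the response token by token (during training, each prediction is conditioned on the previous tokens of the desired response, which is known as teacher forcing)
  • Compare the predicted tokens with the desired response
  • Compute the cross-entropy loss and backpropagate
  • Update the weights to make the desired response more likely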
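The steps above can be sketched in plain Python. This is a deliberately tiny, hypothetical stand-in for an LLM: the "model" is just a bigram table `logits[prev][next]` over a five-token vocabulary, so it only looks at the previous token, whereas a real LLM uses a transformer over the whole prefix and a framework such as PyTorch computes the gradients automatically. The names `VOCAB` and `train_step` and the example tokens are illustrative assumptions. The loop is the same one described above: feed the prompt plus the desired response, compute cross-entropy only on the response tokens, and take a gradient step that makes the desired response more likely.

```python
import math

# Hypothetical toy vocabulary; a real LLM has tens of thousands of tokens.
VOCAB = ["<s>", "translate", "hello", "bonjour", "</s>"]
IDX = {tok: i for i, tok in enumerate(VOCAB)}
V = len(VOCAB)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(logits, prompt, response, lr=0.5):
    """One teacher-forced fine-tuning step on a bigram logit table; returns mean loss."""
    tokens = [IDX[t] for t in prompt + response]
    start = len(prompt)  # only response tokens contribute to the loss
    grads = [[0.0] * V for _ in range(V)]
    loss = 0.0
    for pos in range(start, len(tokens)):
        prev, target = tokens[pos - 1], tokens[pos]
        probs = softmax(logits[prev])    # model's prediction for the next token
        loss -= math.log(probs[target])  # cross-entropy against the desired token
        for j in range(V):               # dLoss/dlogit_j = p_j - 1[j == target]
            grads[prev][j] += probs[j] - (1.0 if j == target else 0.0)
    n = len(tokens) - start
    for i in range(V):                   # gradient descent: update the weights
        for j in range(V):
            logits[i][j] -= lr * grads[i][j] / n
    return loss / n


# "Pre-trained" weights: all zeros, i.e. uniform next-token predictions.
logits = [[0.0] * V for _ in range(V)]
prompt = ["<s>", "translate", "hello"]
response = ["bonjour", "</s>"]
losses = [train_step(logits, prompt, response) for _ in range(20)]
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
# The first loss is ln(5), about 1.6094 (uniform over 5 tokens); later losses are lower.
```

The falling loss is the "shift from general knowledge towards task expertise" in miniature: the rows of the table used by the response tokens move to put more probability on "bonjour" and "</s>".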


Fine-Tuning Techniques:

PEFT (Parameter-Efficient Fine-Tuning) - How many parameters do we actually need to update?

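  • Instead of updating every weight in the model, we update only a small number of parameters (a subset of the existing ones, or a few newly added ones) to reach the expected accuracy. This is the basis of PEFT.
  • Because so few parameters are trained, a small GPU, even one in a local laptop, can be enough for fine-tuning at this basic level.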


As we can see in the above image, full fine-tuning is not always feasible for small and mid-sized companies. As the image shows, even a small model with 7B parameters needs about 112 GB of GPU memory for full fine-tuning; now think about the latest frontier models such as Claude, which are much larger still (their exact parameter counts are not public). That is why the techniques below were developed.
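The 112 GB figure is consistent with a common rule of thumb for full fine-tuning with mixed precision and the Adam optimizer: about 16 bytes of GPU memory per parameter. The per-item breakdown below is that rule of thumb, assumed rather than read off the image, and it ignores activations, batch size and framework overhead, so real usage is higher. `full_finetune_memory_gb` is a hypothetical helper for illustration.

```python
# Bytes per parameter for full fine-tuning with mixed precision and Adam
# (a common rule of thumb; activations and framework overhead are NOT included).
BYTES_PER_PARAM = {
    "fp16/bf16 weights": 2,
    "fp16/bf16 gradients": 2,
    "fp32 master copy of weights": 4,
    "Adam first moment (fp32)": 4,
    "Adam second moment (fp32)": 4,
}


def full_finetune_memory_gb(num_params: float) -> float:
    """Estimated GPU memory (decimal GB) to fully fine-tune num_params parameters."""
    total_bytes = num_params * sum(BYTES_PER_PARAM.values())
    return total_bytes / 1e9


print(full_finetune_memory_gb(7e9))  # 7e9 params * 16 bytes = 112.0 GB
```

For comparison, merely holding the same weights in 16-bit precision takes 7e9 × 2 bytes = 14 GB. Most of the full fine-tuning cost comes from gradients and optimizer states kept for every parameter, which is exactly what PEFT avoids for the frozen weights.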


LoRA: Low-Rank Adaptation

The Key Insight - Weight updates during fine-tuning are (approximately) LOW-RANK, so we can decompose each update into the product of two much smaller matrices.
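To see why this saves so much, write the update of a d × k weight matrix W as ΔW = B A, where B is d × r and A is r × k with a small rank r. Training B and A needs r(d + k) parameters instead of d·k. The sketch below counts both for one hypothetical 4096 × 4096 projection (a size typical of 7B-class models) with an assumed rank r = 8; `lora_param_counts` is an illustrative helper, not a library function.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full-update params, LoRA params) for one d x k weight matrix."""
    full = d * k          # updating W directly: every entry is trainable
    lora = d * r + r * k  # training B (d x r) and A (r x k) instead
    return full, lora


full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, f"{100 * lora / full:.2f}%")  # 16777216 65536 0.39%
```

The saving grows with the matrix size, because d·k grows quadratically while r(d + k) grows only linearly for a fixed rank.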

LoRA is a parameter-efficient fine-tuning technique that freezes the base LLM weights and trains only LOW-RANK adapter matrices, reducing training cost and GPU memory.
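A minimal sketch of a LoRA-wrapped linear layer, in plain Python for illustration (the class name `LoRALinear` and the toy matrices are hypothetical). It follows the formulation from the LoRA paper: the frozen weight W is left untouched, the output is W x + (alpha / r) B A x, A starts small and random, and B starts at zero, so the wrapped layer initially behaves exactly like the pre-trained one. During fine-tuning only A and B would receive gradient updates.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Sketch of a LoRA layer: h = W x + (alpha / r) * B (A x), with W frozen."""

    def __init__(self, W, r=2, alpha=4, seed=0):
        rng = random.Random(seed)
        d, k = len(W), len(W[0])
        self.W = W  # frozen pre-trained weights: never updated
        # Trainable adapters: A (r x k) small random, B (d x r) zero-initialized.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                   # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # low-rank trainable path
        return [b + self.scale * dl for b, dl in zip(base, delta)]

    def merged_weight(self):
        """W + (alpha / r) * B @ A: the adapter folded back into W after training."""
        r = len(self.A)
        return [
            [w + self.scale * sum(self.B[i][t] * self.A[t][j] for t in range(r))
             for j, w in enumerate(row)]
            for i, row in enumerate(self.W)
        ]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # hypothetical 2 x 3 "pre-trained" weight
layer = LoRALinear(W, r=1, alpha=2)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))  # [7.0, -1.0]: B is zero, so the output equals W x
```

Because the adapter is just a matrix product, B A can be folded into W after training (`merged_weight`), so the fine-tuned model needs no extra computation at inference time.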


QLoRA: Quantization + LoRA

QLoRA extends LoRA by combining low-rank adapters with a frozen base model whose weights are quantized to 4 bits, enabling efficient fine-tuning of LLMs on LOW-memory GPUs.
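A simplified sketch of the quantization half of QLoRA. QLoRA itself stores the frozen base weights in a 4-bit NormalFloat (NF4) data type with double quantization; to keep this self-contained, the code below swaps in plain symmetric absmax 4-bit integer quantization with one scale per small block, which is a simpler, different scheme. The helper names and example weights are hypothetical. The idea carries over: the base weights sit in memory as 4-bit codes plus scales and are dequantized when computing W x, while the LoRA matrices A and B stay in higher precision and are the only parameters trained.

```python
def quantize_int4(values, block_size=4):
    """Symmetric absmax quantization to integers in [-7, 7], one scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) / 7 or 1.0  # all-zero block: any scale works
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_int4(codes, scales, block_size=4):
    """Recover approximate weights: code * scale of the block it belongs to."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def weight_memory_gb(num_params, bits_per_param):
    """Memory for the base weights alone (ignores scales, adapters, activations)."""
    return num_params * bits_per_param / 8 / 1e9


weights = [0.12, -0.5, 0.33, 0.07, 1.4, -0.9, 0.0, 0.25]  # hypothetical weights
codes, scales = quantize_int4(weights)
restored = dequantize_int4(codes, scales)
print(codes)
print(max(abs(w - r) for w, r in zip(weights, restored)))  # small rounding error
print(weight_memory_gb(7e9, 16), weight_memory_gb(7e9, 4))  # 14.0 3.5
```

Storing the 7B base weights in 4 bits instead of 16 cuts them from about 14 GB to about 3.5 GB (plus a little for the per-block scales), which is why QLoRA fits on much smaller GPUs than full fine-tuning.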


Thank you for reading this blog!

Arun Mathe
