Optimizing the Fine-tuning Process of Large Language Models
Abstract
We present an optimized fine-tuning process for large language models (LLMs) that combines Low-Rank Adaptation (LoRA) with quantization. Traditional full fine-tuning is computationally expensive and requires significant GPU memory, which limits its accessibility. In our approach, we first quantize the LLaMA-2 7B model and then apply LoRA fine-tuning to the quantized model. We demonstrate that combining quantization with LoRA significantly reduces GPU memory requirements while maintaining model performance. Through rigorous experiments, we fine-tuned the LLaMA-2 7B model on the CodeAlpaca-20k dataset using only 10.8 GB of GPU memory, compared to the 112 GB required by traditional full fine-tuning. We further developed an inference system around the optimized fine-tuned model for practical deployment.
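To make the quantize-then-LoRA pipeline concrete, the following is a minimal sketch assuming the common Hugging Face transformers, peft, bitsandbytes, and datasets stack. The model ID, dataset ID, and LoRA hyperparameters (r, lora_alpha, target modules) are illustrative assumptions, not the paper's exact configuration.

    # Minimal sketch: quantize LLaMA-2 7B at load time, then attach LoRA adapters.
    # Assumes transformers, peft, bitsandbytes, and datasets are installed;
    # identifiers and hyperparameters below are assumptions for illustration.
    import torch
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Step 1: load the base model quantized to 4-bit NF4.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",  # assumed model ID
        quantization_config=bnb_config,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    model = prepare_model_for_kbit_training(model)

    # Step 2: attach low-rank adapters; only these small matrices are trained,
    # while the quantized base weights stay frozen.
    lora_config = LoraConfig(
        r=16,                                   # assumed rank
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],    # assumed attention projections
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()

    # Step 3: load the instruction-tuning data (assumed Hub dataset ID).
    dataset = load_dataset("sahil2801/CodeAlpaca-20k", split="train")

Because only the adapter matrices carry gradients and optimizer state, the memory footprint is dominated by the 4-bit base weights, which is what makes the reported single-GPU budget plausible.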
License
©2025 Jahangirnagar University Journal of Electronics and Computer Science. All rights reserved. However, permission is granted to quote from any article of the journal, and to photocopy any part of, or the full text of, an article for education and/or research purposes, for individuals, institutions, and libraries, with an appropriate citation in the references and/or a customary acknowledgement of the journal.