Optimizing the Fine-tuning Process of Large Language Models

Authors

  • Mahbub Islam Mahim, Department of Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka-1342
  • Dr. Jugal Krishna Das, Department of Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka-1342

Abstract

We present an optimized fine-tuning process for large language models (LLMs) that combines Low-Rank Adaptation (LoRA) with quantization. Traditional full fine-tuning is computationally expensive and demands substantial GPU memory, which limits its accessibility. In our approach, we first quantize the LLaMA-2 7B model and then apply LoRA fine-tuning to the quantized model. We demonstrate that combining quantization with LoRA significantly reduces GPU memory requirements while maintaining model performance. Through rigorous experiments, we fine-tuned the LLaMA-2 7B model on the CodeAlpaca-20k dataset using only 10.8 GB of GPU memory, compared to the 112 GB required by traditional full fine-tuning. We further developed an inference system based on this optimized fine-tuned model for practical deployment.
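
For readers who want a concrete picture of the quantize-then-LoRA recipe described above, the sketch below shows one common way to set it up with the Hugging Face transformers, peft, bitsandbytes, and datasets libraries. The choice of toolchain, the 4-bit NF4 quantization settings, the LoRA hyperparameters (rank, alpha, dropout, target modules), and the `sahil2801/CodeAlpaca-20k` dataset identifier are illustrative assumptions, not details reported in the abstract.

```python
# Minimal sketch: quantize LLaMA-2 7B, then attach LoRA adapters for fine-tuning.
# All library choices and hyperparameters here are assumptions for illustration,
# not values taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset

model_id = "meta-llama/Llama-2-7b-hf"

# Step 1: load the base model with 4-bit NF4 quantization (bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Step 2: add low-rank adapters; only these small matrices are trained,
# while the quantized base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Step 3: load an instruction dataset such as CodeAlpaca-20k and fine-tune
# with a standard supervised training loop (e.g., transformers Trainer).
dataset = load_dataset("sahil2801/CodeAlpaca-20k", split="train")
```

Because the base weights are stored in 4-bit precision and only the adapter matrices receive gradients, peak GPU memory stays far below what full-precision fine-tuning of all 7B parameters would require, which is the effect the abstract quantifies (10.8 GB versus 112 GB).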

Published

2025-06-20

How to Cite

Mahim, M. I., & Das, J. K. (2025). Optimizing the Fine-tuning Process of Large Language Models. Jahangirnagar University Journal of Electronics and Computer Science, 16. Retrieved from https://ecs.ju-journal.org/jujecs/article/view/38

Section

Articles