Date of Award

6-13-2024

Document Type

Thesis

Publisher

Santa Clara : Santa Clara University, 2024

Department

Computer Science and Engineering

First Advisor

Younghyun Cho

Abstract

This senior design project explores the application of Bayesian optimization-based auto-tuning to low-rank adaptation (LoRA) fine-tuning of large language models (LLMs), demonstrating how such fine-tuning methods can reduce training time and cost, albeit with a slight trade-off in accuracy. However, little is known about the optimal hyperparameters for LoRA and its variants. This project addresses that gap by analyzing data gathered from auto-tuning LoRA hyperparameters to determine optimal parameter configurations for a model’s accuracy and training efficiency.

The team implemented a pipeline that combines several technologies. The core of the Bayesian optimization-based auto-tuning is GPTune, a performance auto-tuner built on Gaussian process regression. The team selected Llama3 as the base model for testing because of its strong performance and relevance. The model is fine-tuned with the QLoRA framework and evaluated on training time and loss, and GPTune uses Bayesian optimization to efficiently explore the Pareto front of training time versus loss across varying QLoRA hyperparameters.
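
As a rough illustration of the kind of tuning loop the pipeline performs, the sketch below exposes a few QLoRA hyperparameters (rank, scaling factor, dropout, learning rate, with assumed search ranges) to a Bayesian optimizer. It is a minimal sketch, not the thesis implementation: the actual pipeline uses GPTune’s multi-objective tuning to trace the Pareto front of training time and loss, whereas here scikit-optimize’s gp_minimize and a weighted scalarization stand in, and finetune_and_eval is a hypothetical placeholder for a real QLoRA fine-tuning run on Llama3.

```python
# Illustrative sketch only: GPTune's multi-objective Bayesian optimization is
# replaced by scikit-optimize's gp_minimize with a weighted scalarization of
# (training time, loss). finetune_and_eval is a hypothetical placeholder.
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
from peft import LoraConfig

# QLoRA hyperparameters exposed to the tuner (assumed search ranges).
space = [
    Integer(4, 64, name="lora_r"),        # LoRA rank
    Integer(8, 128, name="lora_alpha"),   # LoRA scaling factor
    Real(0.0, 0.2, name="lora_dropout"),  # adapter dropout
    Real(1e-5, 5e-4, prior="log-uniform", name="learning_rate"),
]

def finetune_and_eval(config: LoraConfig, learning_rate: float):
    """Placeholder for one QLoRA fine-tuning run of the base model.

    In the real pipeline this would quantize the base model to 4 bits, attach
    the LoRA adapters described by `config`, train, and return the wall-clock
    training time (seconds) and the final evaluation loss.
    """
    # Synthetic stand-in values so the sketch runs end to end.
    train_time = 60.0 * config.r                               # larger ranks cost more time
    eval_loss = 2.0 / (1.0 + config.r * learning_rate * 1e3)   # toy loss model
    return train_time, eval_loss

@use_named_args(space)
def objective(lora_r, lora_alpha, lora_dropout, learning_rate):
    config = LoraConfig(
        r=int(lora_r),
        lora_alpha=int(lora_alpha),
        lora_dropout=float(lora_dropout),
        target_modules=["q_proj", "v_proj"],  # assumed attention projections
        task_type="CAUSAL_LM",
    )
    train_time, eval_loss = finetune_and_eval(config, float(learning_rate))
    # Weighted scalarization of the two objectives; GPTune instead explores
    # the full Pareto front of (train_time, eval_loss).
    return eval_loss + 1e-4 * train_time

result = gp_minimize(objective, space, n_calls=20, random_state=0)
print("best scalarized score:", result.fun)
print("best hyperparameters:", result.x)
```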

This research contributes to work on LLMs by demonstrating the efficacy of Bayesian optimization-based auto-tuning for LoRA fine-tuning. By applying these techniques, we have shown that hyperparameters can be optimized more efficiently, reducing computational resource requirements and improving model performance. Our work provides a proof of concept for integrating advanced auto-tuning frameworks such as GPTune with existing fine-tuning tools, paving the way for more accessible and cost-effective use of LLMs in a variety of applications.
