<aside> This survey is in progress. Please feel free to let me know if you have any suggestions.
</aside>
Fine-tuning is a necessary step for large language models (LLMs) to achieve good performance on downstream tasks. However, fine-tuning all the parameters of an LLM consumes a great deal of compute and energy. Therefore, many parameter-efficient fine-tuning (PEFT) methods have been proposed, e.g., Adapter Tuning [1] and Low-Rank Adaptation (LoRA) [2].
Compared with earlier PEFT methods, LoRA has become increasingly popular because it is lightweight to train and adds no overhead at inference. In this post, we introduce several research works in this direction.
The LoRA method is based on the assumption that the change in parameters during fine-tuning has a low "intrinsic rank". Thus, the fine-tuned parameters can be represented as follows:
$$ W = W_0 + \Delta W = W_0 + BA $$
where $W$ denotes the parameters after fine-tuning, $W_0$ the parameters before fine-tuning, and $\Delta W = BA$ the low-rank decomposition of the parameter update. In this decomposition, $B \in \mathbb{R}^{d\times r}$ and $A \in \mathbb{R}^{r\times d}$, where $d$ is the hidden size and $r \ll d$. During fine-tuning, $W_0$ is frozen and only $A$ and $B$ are updated. This process is illustrated in the following figure:
*(Figure: the LoRA decomposition, where the frozen $W_0$ is augmented with the trainable low-rank matrices $A$ and $B$.)*
As shown in the figure, $B$ is initialized to zero and $A$ is initialized from a Gaussian distribution, so that $\Delta W = BA$ is zero at the start of training.
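To make this concrete, here is a minimal PyTorch-style sketch of a linear layer augmented with LoRA. The class name `LoRALinear` and the default rank `r` are illustrative choices, not part of any official implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained weight W_0 plus a trainable low-rank update BA."""
    def __init__(self, d: int, r: int = 8):
        super().__init__()
        # W_0: in practice this is copied from the pretrained model and kept frozen.
        self.weight = nn.Parameter(torch.randn(d, d), requires_grad=False)
        # A is initialized from a Gaussian, B is initialized to zero,
        # so the update Delta W = B @ A starts at zero.
        self.lora_A = nn.Parameter(torch.randn(r, d) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W_0^T + x (B A)^T; only A and B receive gradients.
        return x @ self.weight.T + x @ (self.lora_B @ self.lora_A).T
```

For example, `LoRALinear(d=768, r=8)` trains only $2 \times 768 \times 8$ parameters instead of $768^2$.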
After training, the learned low-rank update $\Delta W = BA$ can be merged directly into the original weights. Thus, inference with a LoRA-tuned model is identical to inference with the original LLM.
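As a rough sketch (assuming a layer like the `LoRALinear` above), the merge step simply adds $BA$ to the frozen weight, after which the extra matrices can be discarded:

```python
@torch.no_grad()
def merge_lora(layer: LoRALinear) -> None:
    # Fold the learned update into the frozen weight: W = W_0 + B @ A.
    # After merging, the forward pass costs the same as the original layer.
    layer.weight.add_(layer.lora_B @ layer.lora_A)
```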