Why You Need RAG, Not Finetuning, to Introduce New Facts to an LLM
Using finetuning is often an XY problem
To be helpful in a real business, an AI needs to learn new information, just like a human would. The standard way to “teach” a machine learning system new facts is to train it, but training a completely new LLM is prohibitively expensive, so almost everyone uses the few foundation models already trained by a handful of companies. Then people learn that OpenAI lets them finetune these models.
The definition of finetuning sounds like exactly what we need:
In deep learning, fine-tuning is an approach to transfer learning in which the weights of a pre-trained model are trained on new data.
https://en.wikipedia.org/wiki/Fine-tuning_(deep_learning)
But this is a trap.
Finetuning is useful for some things. According to the OpenAI docs:
Fine-tuning lets you get more out of the models available through the API by providing:
Higher quality results than prompt design
Ability to train on more examples than can fit in a prompt
Token savings due to shorter prompts
Lower latency requests
https://platform.openai.com/docs/guides/fine-tuning
But “teaching the model new facts” is not among them. Finetuning uses orders of magnitude fewer examples than the original training, so the new information carries too little weight to reliably change what the model knows. (I cannot find the source now, but I have seen claims that finetuning with new facts may even make the model hallucinate more, because it becomes less sure about its ground truths.)
The way to make an LLM use new information when answering questions is to inject it into the prompt together with the question itself. This sounds a bit paradoxical, as if we were giving the LLM the answer we want to get from it. But we are not: we only give it the background information and ask it to produce an answer based on that.
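A minimal sketch of this prompt injection, assuming a retrieval step has already produced some background text (the function name, prompt wording, and example context are illustrative, not any particular library's API):

```python
def build_grounded_prompt(context: str, question: str) -> str:
    """Inject retrieved background information into the prompt so the
    LLM answers from it rather than from its frozen training data."""
    return (
        "Answer the question using only the background information below.\n\n"
        f"Background information:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical context produced by the Retrieval part of a RAG system.
retrieved_context = "Acme Corp's refund window is 30 days from delivery."
prompt = build_grounded_prompt(
    retrieved_context,
    "How long do customers have to request a refund?",
)
print(prompt)  # this string would be sent to the LLM as the user prompt
```

Note that the context goes in before the question: we hand the model the facts, not the answer, and the instruction line constrains it to answer from those facts.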
This is called Retrieval Augmented Generation, RAG for short. It is a system built from two parts: the Retrieval part, which finds the relevant information, and the LLM part, which generates the answer. The most popular technique for the Retrieval part uses embeddings and vector databases, but it is important to understand that it is just a search system. There are many ways to build such search systems; for me the most intriguing ones involve using the LLM itself to come up with the search terms.
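To make the Retrieval part concrete, here is a toy sketch of similarity search: bag-of-words vectors and an in-memory list stand in for a real embedding model and vector database, and the documents and function names are made up for illustration. The shape of the idea is the same, though: embed the question, embed the documents, and return the closest matches.

```python
import re
from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real system would call an
    # embedding model and store the vectors in a vector database.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the question; keep the top k.
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "Acme Corp refunds are accepted within 30 days of delivery.",
    "The Acme office cafeteria serves lunch from noon to two.",
]
print(retrieve("Are refunds accepted after delivery?", docs))
```

The retrieved text is then injected into the prompt as described above. Swapping the toy `embed` for a real embedding model is what turns this sketch into the popular embeddings-plus-vector-database setup, but the system remains, at its core, search.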
Using finetuning for teaching new facts is an example of an XY problem.
By the way, there are competing names for the technique used in RAG systems: in-context querying and grounding. I really like grounding: it is short, intuitive, and less intimidating than Retrieval Augmented Generation.
Update: There is a much more detailed article about when you need finetuning and when you need RAG at: Fine tuning is for form, not facts