Reasoning machines
Facts and LLMs
LLMs can recall a lot of facts. You can ask them about the capital of France or the role of Interleukin-16 and they seem to come up with the right answers. But there are problems with treating them as omniscient:
They might hallucinate. The linked Wikipedia article lists many examples, some with serious consequences for the users who trusted them. For me this has improved much with the recent model updates, but I can still easily trick GPT-4 into hallucinating by asking it in Polish (curiously, it correctly identifies the collection that this ballad was part of, but then goes on to completely invent the story), even though asking the same question in English in most cases results in it admitting that it does not know the subject (by the way, the poem does exist: LILIJE BALLADA).
They cannot know anything that was not in their training set - and in particular, information from:
texts written after their training cut-off date
private sources
obscure and low-quality sources that did not get into their training data
You can fight hallucinations by verifying with a second language model (assuming they don't hallucinate in the same way - I don't know how common that could be) or by doing web searches. But if the information is missing from the training set, then the LLM can only reason about it if we bring that information to the model - by using Retrieval Augmented Generation (RAG). By the way, it is tempting to think that you could use finetuning to reach a similar goal - but this is not the case. RAG can also mitigate hallucinations (citation needed).
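The core mechanics of RAG can be sketched in a few lines: retrieve relevant documents, then prepend them to the prompt so the model reasons over supplied facts instead of its parametric memory. The retriever below is a hypothetical keyword-overlap ranker, and the "generation" step is shown only as prompt construction - a minimal sketch, not a real pipeline.

```python
# Minimal RAG sketch. The retriever is a toy keyword-overlap ranker;
# real systems would use embeddings, BM25, or a search engine.
def retrieve(query, documents, k=1):
    """Rank documents by how many words they share with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, documents):
    """Prepend retrieved context so the model answers from supplied facts."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

docs = [
    "Lilije is a ballad by Adam Mickiewicz from the collection Ballady i romanse.",
    "Paris is the capital of France.",
]
prompt = build_prompt("Which collection contains the ballad Lilije?", docs)
```

In a real system, `prompt` would then be sent to the LLM; the point is that the facts travel inside the prompt, not inside the model weights.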
The currently fashionable way to design the Retrieve part of RAG is to use vector databases and embeddings - but it is not the only way. To me the most intriguing idea is using the LLM to find out what background information it needs and request it from a retrieval system, using the ReAct technique.
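The ReAct idea can be sketched as a loop: the model emits a Thought and an Action, the system executes the action against a retrieval backend, and the observation is fed back until the model emits an answer. Everything here is a stand-in: `fake_model` is a scripted replacement for an LLM and `KNOWLEDGE` plays the retrieval system.

```python
# ReAct-style loop sketch. fake_model is a scripted stand-in for an LLM,
# and KNOWLEDGE is a toy lookup table playing the retrieval system.
KNOWLEDGE = {"Lilije": "A ballad by Adam Mickiewicz, part of Ballady i romanse."}

def fake_model(transcript):
    """Stand-in model: first asks for a lookup, then answers from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I need background facts.\nAction: lookup[Lilije]"
    facts = transcript.split("Observation: ")[-1]
    return f"Answer: {facts}"

def react(question, max_steps=3):
    """Alternate model steps and retrieval until an answer is produced."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_model(transcript)
        transcript += "\n" + step
        if step.startswith("Answer:"):
            return step
        if "Action: lookup[" in step:
            key = step.split("lookup[")[1].rstrip("]")
            transcript += f"\nObservation: {KNOWLEDGE.get(key, 'not found')}"
    return "Answer: unknown"
```

The loop structure is the interesting part: the model decides *what* to retrieve, rather than the system guessing up front which documents are relevant.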
A software engineering perspective suggests dividing AI into three components: learning, information storage and reasoning. Currently, learning is a separate process, but large language models perform both information storage and reasoning from it. However, LLMs have limited capacity for information storage, so we supplement them with specialized systems and create RAG systems. This is a suboptimal design with overlapping responsibilities. What happens if we feed the LLM facts that contradict what it learned during training? It would be much cleaner to provide all the necessary facts to a language model and let it focus on reasoning only. By assigning only one responsibility to LLMs, we would follow the Single-responsibility principle and make it easier to update, evaluate, and analyze the limitations of both the reasoning and the information storage components.
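The separation argued for above can be sketched as an interface: a hypothetical "pure reasoner" that answers only from facts handed to it, next to a separate store that owns information storage. Both components are illustrative stand-ins, not real systems - the point is only that each has a single responsibility.

```python
# Sketch of the single-responsibility split: FactStore owns storage,
# pure_reasoner only reasons over facts it is explicitly given.
class FactStore:
    """Owns information storage; can be updated without retraining anything."""
    def __init__(self):
        self._facts = {}

    def put(self, key, value):
        self._facts[key] = value

    def lookup(self, key):
        return self._facts.get(key)

def pure_reasoner(question, facts):
    """Toy reasoner: answers only from supplied facts, never from memory."""
    topic = question.rstrip("?").split()[-1].lower()
    for fact in facts:
        if topic in fact.lower():
            return fact
    return "insufficient facts"

store = FactStore()
store.put("france", "Paris is the capital of France.")
answer = pure_reasoner("What is the capital of France?", [store.lookup("france")])
```

Updating a fact now means one `put` call to the store; the reasoner never needs to change, and conflicts between "trained" and "provided" facts cannot arise because the reasoner has no trained facts at all.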
LLMs without storage of facts would be pure reasoning machines.
By the way - two interesting papers on the limitations of transformer-based LLMs: Faith and Fate: Limits of Transformers on Compositionality, and Neural Networks and the Chomsky Hierarchy.

