Inference time
There is an argument that transformer-based LLMs are limited because they always perform the same amount of computation for each task. This contrasts with human thinking, but it isn’t entirely true. While it’s accurate that computing each token involves the same sequence of matrix multiplications, the number of tokens a model outputs is not fixed. Famously, the Chain of Thought (CoT) technique induces LLMs to output additional sentences and thus perform more computation during inference. Interestingly, the logic of these additional sentences might not be the key factor; it could simply be that giving the LLM more space for thought improves results. For instance, in the paper “Let's Think Dot by Dot: Hidden Computation in Transformer Language Models”, researchers found that asking the LLM to output additional, meaningless dot tokens also improved performance!
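As a loose illustration (not the exact experimental setup from the paper), here is how these prompting styles differ in how much inference-time computation they buy; `generate` is just a stand-in for whatever LLM call you use:

```python
# Sketch: three ways of giving the model more (or less) inference-time computation.
# `generate` is a placeholder for a real LLM completion call; it only echoes the
# prompt length so the example runs without an API key.

def generate(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g. an API client)."""
    return f"[model output for prompt of {len(prompt)} chars]"

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

# 1. Direct answer: one short completion, minimal computation.
direct = generate(question + "\nAnswer with a single number.")

# 2. Chain of Thought: every extra reasoning token is an extra forward pass.
cot = generate(question + "\nLet's think step by step.")

# 3. 'Thinking dots' (in the spirit of "Let's Think Dot by Dot"): meaningless
#    filler tokens still give the model more computation per answer token.
dots = generate(question + "\n" + "." * 100 + "\nAnswer:")

for name, out in [("direct", direct), ("cot", cot), ("dots", dots)]:
    print(name, "->", out)
```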
LLMs and Turing completeness
This topic came to mind after the release of o1, as many are curious about its architecture and how it encourages the model to think more deeply about challenging questions. When someone asked if transformer-based LLMs can compute any computable function, I got nerd sniped. If special prompting could enable LLMs to compute any recursive function, it could potentially solve many of our current problems. There is a paper that, on a first reading, suggests this is possible: GPT is becoming a Turing machine: Here are some ways to program it. This led me to revisit the transformer architecture and confirm my intuition that these models are finite state automata: they can output infinite strings, but they don’t read them back - the next token they compute relies only on their internal state, i.e. the part of the output that still fits in the context window. The technique the paper proposes works only as long as the computation fits in the context.
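To make the finite-state intuition concrete, here is a toy autoregressive loop of my own (not from the paper): the next token is a function of at most the last WINDOW tokens, so the generator has only finitely many distinct states and its output is eventually periodic.

```python
# Toy illustration of why a fixed-context transformer is a finite state automaton:
# the next token depends only on the last WINDOW tokens, i.e. on a bounded state.

WINDOW = 4          # context window length (tiny, for illustration)
VOCAB = ["a", "b"]  # toy vocabulary

def next_token(context: tuple[str, ...]) -> str:
    """Stand-in for a transformer forward pass: any deterministic function
    of the (truncated) context. Here: emit 'b' if the window is all 'a', else 'a'."""
    return "b" if all(t == "a" for t in context) else "a"

tokens = ["a"]
for _ in range(20):
    state = tuple(tokens[-WINDOW:])   # the ONLY thing the model sees
    tokens.append(next_token(state))

print("".join(tokens))
# There are at most len(VOCAB) ** WINDOW distinct states, so the output is
# eventually periodic - the generator cannot simulate an unbounded tape.
```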
Curiously, many people in the Twitter thread where this question was posed seemed to believe that LLMs can in fact compute any recursive function. Someone linked to Attention is Turing Complete - but the catch? It uses transformers with infinite precision - come on!
What is a good abstraction?
Now we have yet another paper in the same vein: Autoregressive Large Language Models are Computationally Universal. This paper, like “GPT is becoming a Turing machine,” postulates unbounded context. While practical implementations of transformers have finite context, our computers are also finite, yet we still consider the infinite Turing Machine a good abstraction for these finite systems. This is why I don’t entirely dismiss the ideas from articles like these - but I doubt they will lead to practical solutions. My focus here is not the theoretical limitations of LLMs, but rather what can be achieved in practice. In practice, the length of the context is limited by the available memory. But even if it weren’t - even if we could add memory whenever a problem requires it - extending the context length seems a far more complex operation than simply adding more tape to a Turing Machine.
It might be practical to induce LLMs to think more on harder problems - maybe with thinking tags and additional CoT threads as in o1, or with just ‘thinking dots’. However, the techniques introduced in the papers that present LLMs as universal computers seem impractical. They are very low level, akin to assembler, with no clear path to making them higher level - due to memory limitations.
Next in LLM reasoning
IMHO Turing Completeness inside LLMs is nerd-bait - it sounds interesting but leads to dead ends. But there are many other paths to explore:
instead of making LLMs Turing Complete and adjusting inference time to the difficulty of the problem, we can run a Monte Carlo search over LLM-generated solutions (looping around the LLM) - see the sketch after this list
prompting LLMs to generate programs that do the looping and search
integrating these two techniques into … something
working with sampling - see Entropix
eliminating semantic leaking
or doing all of the above and more
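As a taste of the first item, here is a rough sketch of the simplest form of such a search - best-of-N sampling with an external verifier. The generate and score functions are placeholders I made up, not any particular library’s API:

```python
import random

# Sketch of 'looping around the LLM': sample many candidate solutions and keep
# the best one according to an external check. generate() and score() are
# placeholders for a real LLM call and a real verifier (tests, a checker, etc.).

def generate(prompt: str, temperature: float) -> str:
    """Stand-in for a sampled LLM completion (temperature unused in this stub)."""
    return f"candidate-{random.randint(0, 999)}"

def score(candidate: str) -> float:
    """Stand-in for a verifier: run tests, check constraints, or ask another model."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    candidates = [generate(prompt, temperature=0.8) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Write a function that parses ISO 8601 dates."))
```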
To be continued.