Inference time
There is an argument that transformer-based LLMs are limited because they always perform the same amount of computation for each task. This contrasts with human thinking, but it isn’t entirely true. While it’s accurate that computing each token involves the same sequence of matrix multiplications, the number of tokens a model outputs is not fixed. Famously, the Chain of Thought (CoT) technique induces LLMs to output additional sentences and thus perform more computation during inference. Interestingly, the logic of these additional sentences might not be the key factor; it could simply be that giving the LLM more space for thought improves results. For instance, in the paper “Let's Think Dot by Dot: Hidden Computation in Transformer Language Models”, researchers found that asking the LLM to output additional, meaningless dot tokens also improved performance!
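As a loose illustration (not the exact experimental setup from the paper), here is how these prompting styles differ in how much inference-time computation they buy; `generate` is just a stand-in for whatever LLM call you use:

```python
# Sketch: three ways of giving the model more (or less) inference-time computation.
# `generate` is a placeholder for a real LLM completion call; it only echoes the
# prompt length so the example runs without an API key.

def generate(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g. an API client)."""
    return f"[model output for prompt of {len(prompt)} chars]"

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

# 1. Direct answer: one short completion, minimal computation.
direct = generate(question + "\nAnswer with a single number.")

# 2. Chain of Thought: every extra reasoning token is an extra forward pass.
cot = generate(question + "\nLet's think step by step.")

# 3. 'Thinking dots' (in the spirit of "Let's Think Dot by Dot"): meaningless
#    filler tokens still give the model more computation per answer token.
dots = generate(question + "\n" + "." * 100 + "\nAnswer:")

for name, out in [("direct", direct), ("cot", cot), ("dots", dots)]:
    print(name, "->", out)
```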
LLMs and Turing completeness
This topic came to mind after the release of o1, as many are curious about its architecture and how it encourages the model to think more deeply about challenging questions. When someone asked if transformer-based LLMs can compute any computable function, I got nerd sniped. If special prompting could enable LLMs to compute any recursive function, it could potentially solve many of our current problems. There is a paper that, on a first reading, suggests this is possible: GPT is becoming a Turing machine: Here are some ways to program it. This led me to revisit the transformer architecture and confirm my intuition that these models are finite state automata: they can output infinite strings, but they don’t read them back - the next token they compute relies only on their internal state, i.e. the part of the output that still fits in the context window. The technique the paper proposes works only as long as the computation fits in the context.
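To make the finite-state intuition concrete, here is a toy autoregressive loop of my own (not from the paper): the next token is a function of at most the last WINDOW tokens, so the generator has only finitely many distinct states and its output is eventually periodic.

```python
# Toy illustration of why a fixed-context transformer is a finite state automaton:
# the next token depends only on the last WINDOW tokens, i.e. on a bounded state.

WINDOW = 4          # context window length (tiny, for illustration)
VOCAB = ["a", "b"]  # toy vocabulary

def next_token(context: tuple[str, ...]) -> str:
    """Stand-in for a transformer forward pass: any deterministic function
    of the (truncated) context. Here: emit 'b' if the window is all 'a', else 'a'."""
    return "b" if all(t == "a" for t in context) else "a"

tokens = ["a"]
for _ in range(20):
    state = tuple(tokens[-WINDOW:])   # the ONLY thing the model sees
    tokens.append(next_token(state))

print("".join(tokens))
# There are at most len(VOCAB) ** WINDOW distinct states, so the output is
# eventually periodic - the generator cannot simulate an unbounded tape.
```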
Curiously, many people in the Twitter thread where this question was posed seemed to believe that LLMs can in fact compute any recursive function. Someone linked to Attention is Turing Complete - but the catch? It uses transformers with infinite precision - come on!
What is a good abstraction?
Now we have yet another paper in the same vein: Autoregressive Large Language Models are Computationally Universal. This paper, like “GPT is becoming a Turing machine,” postulates unbounded context. While practical implementations of transformers have finite context, our computers are also finite, yet we still consider the infinite Turing Machine a good abstraction for these finite systems. This is why I don’t entirely dismiss the ideas from articles like these - but I doubt they will lead to practical solutions. My focus here is not the theoretical limitations of LLMs, but rather what can be achieved in practice. In practice, the length of the context is limited by the available memory. But even if it weren’t - even if we could add memory whenever a problem requires it - extending the context length seems a far more complex operation than simply adding more tape to a Turing Machine.
It might be practical to induce LLMs to think more on harder problems - maybe with thinking tags and additional CoT threads as in o1, or with just ‘thinking dots’. However, the techniques introduced in the papers that present LLMs as universal computers seem impractical. They are very low level, akin to assembler, with no clear path to making them higher level - due to memory limitations.
Next in LLM reasoning
IMHO Turing Completeness inside LLMs is nerd-bait - it sounds interesting but leads to dead ends. But there are many other paths to explore:
instead of making LLMs Turing Complete and adjusting inference time to the difficulty of the problem, we can run a Monte Carlo search over LLM-generated solutions (looping around the LLM) - see the sketch after this list
prompting LLMs to generate programs that do the looping and search
integrating these two techniques into … something
working with sampling - see Entropix
eliminating semantic leaking
or doing all of the above and more
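As a taste of the first item, here is a rough sketch of the simplest form of such a search - best-of-N sampling with an external verifier. The generate and score functions are placeholders I made up, not any particular library’s API:

```python
import random

# Sketch of 'looping around the LLM': sample many candidate solutions and keep
# the best one according to an external check. generate() and score() are
# placeholders for a real LLM call and a real verifier (tests, a checker, etc.).

def generate(prompt: str, temperature: float) -> str:
    """Stand-in for a sampled LLM completion (temperature unused in this stub)."""
    return f"candidate-{random.randint(0, 999)}"

def score(candidate: str) -> float:
    """Stand-in for a verifier: run tests, check constraints, or ask another model."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    candidates = [generate(prompt, temperature=0.8) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Write a function that parses ISO 8601 dates."))
```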
To be continued.