I can see how this is an appealing idea, but I don't think it's correct at all.
The reason is Turing equivalence. Every sufficiently flexible computational scheme has the same theoretical computational power; some are a lot more efficient than others, but that's all. So as a practical matter, one is really making a statement about not-too-crazily-bad efficiency. If something takes exponential time while something else takes linear time (for example, computing powers via repeated addition rather than via multiplication or exponentiation), then past modest sizes the slow scheme is out of reach as a practical matter. But an enormous amount is within reach, and usually you don't have O(n) turn into O(e^n).
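To make that gap concrete, here's a toy sketch of my own (not from the original discussion): the same function, b**n, computed with two different choices of primitive operation, counting how many primitive steps each needs.

```python
# Toy illustration: identical input-output behavior, wildly different cost
# depending on which operations the scheme treats as primitive.

def pow_by_unit_addition(b: int, n: int) -> tuple[int, int]:
    """Compute b**n allowing only 'add 1' as a primitive; return (value, ops)."""
    ops = 0
    result = 1
    for _ in range(n):
        acc = 0
        for _ in range(b):
            for _ in range(result):   # add `result` to acc one unit at a time
                acc += 1
                ops += 1
        result = acc
    return result, ops                # ops grows like b**n: exponential in n

def pow_by_multiplication(b: int, n: int) -> tuple[int, int]:
    """Compute b**n with multiplication as a primitive; return (value, ops)."""
    ops = 0
    result = 1
    for _ in range(n):
        result *= b
        ops += 1
    return result, ops                # ops == n: linear in n

print(pow_by_unit_addition(2, 10))    # (1024, 2046)
print(pow_by_multiplication(2, 10))   # (1024, 10)
```

The specific numbers don't matter; the point is that the same function can be cheap or hopeless depending on the scheme, which is exactly why "different architecture" tells you little about what's computable in practice.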
This means we should not be surprised that, given a lot of outputs from one function, you can get pretty close to it with a radically different architecture, provided you throw enough computation at the problem and can figure out how. There may be some similarity (because of regularities in the world, for instance) between representations at certain levels of abstraction (and yes, the purpose of all the layers is in a sense to allow "abstraction", albeit of a different sort than the kind we use consciously), but even that doesn't mean the representations are used the same way. And neural networks are universal function approximators; this was one of the earliest theoretical results about multilayer perceptrons, the precursors to most of the architecture in today's deep learning systems.
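For illustration, here's a minimal sketch of my own (assuming NumPy; the target function and network sizes are arbitrary, and this is a crude variant of the classical result where the hidden weights are random and only the output layer is fitted by least squares): a single hidden layer of tanh units approximating a function it was never designed around.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target function: nothing "neural" about it.
def f(x):
    return np.sin(3 * x)

# One hidden layer of tanh units with random, frozen weights and biases.
H = 200
W = rng.normal(scale=3.0, size=H)        # input-to-hidden weights (1-D input)
b = rng.uniform(-6, 6, size=H)           # hidden biases

def hidden(x):
    return np.tanh(np.outer(x, W) + b)   # shape: (len(x), H)

# Fit only the output weights by least squares on sampled outputs.
x_train = np.linspace(-2, 2, 400)
beta, *_ = np.linalg.lstsq(hidden(x_train), f(x_train), rcond=None)

# Measure how close the network gets on a denser grid.
x_test = np.linspace(-2, 2, 2000)
err = np.max(np.abs(hidden(x_test) @ beta - f(x_test)))
print(f"max |network - target| on [-2, 2]: {err:.3g}")   # typically small
```

Nothing in the hidden layer "knows" anything about sines; the approximation comes from having enough flexible pieces and fitting them to observed outputs, which is the whole point.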
So we have to take very seriously the idea that their spooky accuracy is because we have built a universal function approximator for an incredibly complicated function: a function well beyond our own capacity to evaluate explicitly, even though we implement it with our brains. We finally have enough outputs, and have tuned the degree of generalization well enough, for appropriately constructed models to approximate it within the amount of computation we are willing to throw at the problem. (We're willing to throw a lot.)
If you ask whether the architecture of the human brain (especially the cortical areas involved in different aspects of language) looks like transformer architecture, the answer is: not really! Do transformer operations model how receptive fields and outputs vary between cortical layers? Nope! Are the connectivity patterns similar? Nope! There are some vague hints of possibly similar operations in various parts of the brain (e.g. attention looks suspiciously like activity patterns in parts of the hippocampus, but not other parts!), but overall LLM architecture is very different from brain architecture, so there's no real reason to expect that it should structurally come up with the same function in the same way.
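For readers who haven't looked inside a transformer, the "attention" operation being loosely compared to hippocampal activity here is, at its core, just the standard scaled dot-product computation below (a single-head NumPy sketch of the textbook formula, not anything specific to one model):

```python
import numpy as np

def attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. Returns a weighted mix of the rows of V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V                              # each output: a blend of values

x = np.random.default_rng(1).normal(size=(5, 8))    # 5 "tokens", 8 dimensions
print(attention(x, x, x).shape)                     # (5, 8)
```

Each output row is a weighted average of value vectors, with weights set by query-key similarity; that superficial "content-addressed recall" flavor is the hint people point at, and it's a long way from a model of cortical circuitry.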
And because we're not aware of the mechanics of the computation behind our own cognition, all we really have to go on is the similarity of the output. If LLMs made human-like mistakes when they do make mistakes, that would be a bit of additional evidence that there is some "model of cognition" beyond just a "function approximation of language output". But when LLMs get confused, they sound, at best, like a student who hasn't paid attention, remembers a few key words, and is randomly stringing together stuff they've heard in a vain attempt not to fail their quiz. If you try to use psychological priming techniques on them, they mostly don't work. (I've tried.)
So the failure modes don't match, even though the output matches pretty well when there isn't a failure. That's basically what you'd expect from good function approximation that doesn't actually capture the underlying cognition.
Instead, it is more conservative to simply view LLMs as being able to do computations in linguistic space. This is really amazing: it's an immensely high-dimensional space, and it lies on an incredibly complex manifold inside a much, much higher-dimensional space of possible language outputs. But one should be very skeptical of the idea that an LLM models human cognition beyond exactly what it's been trained to do: it will tell you where humans constrain their language output (by producing points on that manifold that are in-bounds and reasonably probable extensions of the lower-content points it starts from), but how it does that is largely quite different. The one exception is that when we are extremely explicit about the cognitive steps we're taking and write them down, there is a direct parallel between our cognition and our language, so computing on the language mirrors the cognition. But that's not because the model captures cognition in general; it's because we already encoded the cognition in the language.
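To make "reasonably probable extensions" concrete in the most stripped-down way I can, here's a toy character-level bigram model of my own; real LLMs work over tokens with a learned, vastly richer conditional distribution, but the extend-a-prefix-with-likely-continuations loop has the same shape.

```python
from collections import Counter, defaultdict
import random

corpus = "the cat sat on the mat. the dog sat on the log."
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1                     # which character tends to follow which

def extend(prefix: str, n: int = 20) -> str:
    out = prefix
    for _ in range(n):
        nxt = counts[out[-1]]
        if not nxt:
            break
        chars, freqs = zip(*nxt.items())
        out += random.choices(chars, weights=freqs)[0]   # sample a likely next char
    return out

random.seed(0)
print(extend("the "))   # stays in-distribution for this tiny corpus, nothing more
```

Even this trivial model produces "in-bounds" continuations of its training distribution without anything you'd want to call cognition, which is the sense in which output similarity alone is weak evidence.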
They're absolutely fantastic, but it is doubtful that they're a useful model of human cognition in any deep sense.