Rex Kerr
2 min read · Jun 3, 2023


We talk about our model of the world all the time. The world-model is therefore embedded in the correlational structure of language. GPT-4 is computationally sophisticated enough to pull this out.

I don't understand why this is hard to understand. It can't possibly be any other way. Of course we use language in a way that reflects the operation of the world! Of course at some level of sophistication we'll reveal this with a decoder model! That GPT-3 and Bard partly, and GPT-4 quite convincingly, manage to do this is surprising only in the sense of "oh, we got that far already".

(Honestly, multi-digit multiplication is far more surprising to me.)
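
To make "correlational structure" concrete, here's a toy sketch of my own (not anything GPT-4 actually does): even bare next-word counts over a handful of made-up sentences pick up a physical regularity like "dropped things go down". Scale that idea up by many orders of magnitude in data and model capacity, and a world model falling out of text prediction stops being mysterious.

```python
from collections import Counter

# Toy corpus: a few invented sentences about things falling.
# (Illustrative only -- real models train on vastly more text.)
corpus = [
    "she dropped the cup and it fell down onto the floor",
    "he dropped the ball and it fell down the stairs",
    "the apple fell down from the tree",
    "the rock fell down into the canyon",
    "the balloon drifted up into the sky",
]

# Count which word follows "fell" across the corpus.
following = Counter()
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        if prev == "fell":
            following[nxt] += 1

print(following)  # Counter({'down': 4}) -- gravity shows up as a statistic
```

A model that predicts the next word well has no choice but to encode this kind of regularity; the "world model" is just what that encoding looks like at scale.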

Because we talk about everything, the world model should cover everything. But because it's implicit in the structure of language, it's not very efficient, and because it's language-based, it's vulnerable to linguistic distractors that a person wouldn't be fooled by (unless they were a confused student doing their own linguistic computations--"this word says radius and r means radius, so I'll plug the number in here and hope for the best"--rather than relying on their own world model). For instance, when talking about a variety of things that have radii (e.g. a rotating cylinder in orbit, which carries both an orbital radius and a cylinder radius), I've repeatedly gotten GPT-4 to mess up; a sketch of the kind of slip I mean follows below.
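
Here is roughly the kind of confusion I mean, as a worked sketch. The scenario and numbers are mine, not a transcript of any actual prompt: a spinning habitat cylinder in low Earth orbit. The centripetal acceleration a = ω²r needs the cylinder's own radius; grabbing whichever "radius" happens to be nearby in the text gives nonsense.

```python
import math

# Hypothetical numbers for a rotating habitat cylinder in low Earth orbit.
# (Illustrative values I'm assuming, not from the original post.)
orbital_radius_m = 6.771e6    # ~400 km altitude measured from Earth's center
cylinder_radius_m = 250.0     # radius of the spinning habitat itself
spin_period_s = 30.0          # one rotation of the cylinder every 30 s

omega = 2 * math.pi / spin_period_s   # angular speed of the spin, rad/s

# Correct: artificial gravity at the rim uses the *cylinder's* radius.
artificial_g = omega**2 * cylinder_radius_m
print(f"a = {artificial_g:.1f} m/s^2")      # ~11 m/s^2, plausible

# Distractor: "r means radius, so plug in the radius I was given" --
# grabbing the orbital radius instead gives an absurd answer.
wrong_g = omega**2 * orbital_radius_m
print(f"wrong a = {wrong_g:.2e} m/s^2")     # ~3e5 m/s^2, nonsense
```

A person leaning on a world model notices immediately that about 3e5 m/s^2 cannot be anyone's artificial gravity; a purely linguistic "r means radius" shortcut doesn't notice anything.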

And that's exactly what you'd expect with a world model implicit in the structure of language: if your attention modules don't get aligned just so, the computation is performed on the wrong bits of language and the result is nonsense.

But otherwise, because you talk about the world, your language outputs are only sensible if what you say obeys the world-model implicit in the structure of (human-generated) language.
