Rex Kerr
1 min read · Apr 10, 2023


No, it certainly isn't useful computationally. That's kind of my point. Given the universal approximation theorem, we have no a priori reason to believe that similarity of outputs tells us anything profound about the similarity of computation. All we can tell for certain is that the outputs lie in the correct range.

If we were wondering whether something even could be computed, then the fact that we can compute it is a very convincing demonstration. Otherwise, although you might, if you squint hard at a GPT transformer module, see some vague shadows of the transformations that happen within a cortical column and between cortical regions, it's far from clear that the nature of the computation is close enough to be instructive.

In some ways it doesn't matter--7+7+7+7 and 7*4 and exp(log(7)+log(4)) all represent the same transformation. But I'm very wary of overinterpreting anything about the internals of LLMs with respect to human cognition. I agree that at least bidirectional inspiration between ML and neuroscience has been and will continue to be very important, and that model-based methods for understanding circuit function are likely to be necessary. I just think that the danger of overinterpretation is substantial.
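To make that point concrete, here is a minimal Python sketch (my own illustration, not anything from the texts discussed) of three procedures that reach the same answer by very different internal routes. Matching outputs alone cannot tell you which route was taken:

```python
import math

def by_repeated_addition(a: int, b: int) -> float:
    """Add a to itself b times: 7+7+7+7."""
    total = 0
    for _ in range(b):
        total += a
    return float(total)

def by_multiplication(a: int, b: int) -> float:
    """Direct product: 7*4."""
    return float(a * b)

def by_logs(a: int, b: int) -> float:
    """Exponentiate the sum of logs: exp(log(7) + log(4))."""
    return math.exp(math.log(a) + math.log(b))

# All three agree on the output (up to floating-point rounding)...
print(by_repeated_addition(7, 4))  # 28.0
print(by_multiplication(7, 4))     # 28.0
print(by_logs(7, 4))               # ~28.0
# ...but inspecting the outputs tells you nothing about which
# internal procedure produced them.
```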

(Also, I didn't call Terry's book "old" in any important sense. It takes the reader right up to the present day of 2018. It's just that it was released prior to a lot of interesting developments in LLMs, so it obviously couldn't be expected to cover the details.)
