This certainly warrants skepticism, not least because you can find supposed experts saying mutually contradictory things about it!
I suppose whether what LLMs can already do has any claim to fall under the "understanding" label (and/or "semantics") eventually boils down to a choice about what we want to call "understanding".
Without going into detail: it's in applying both p-zombies and the Chinese Room to this question that I find them essentially the same. In their original forms, yes, they're making rather different points.
I am predisposed to an experimentalist-centric view of understanding. Suppose I went into a lab looking for correlates of understanding, and I found neural activity that was observably different when a human or animal, trained on a task with a complex context-dependent stimulus, governed its behavior according to the appropriate complex rules, versus when the subject got the same stimulus but failed to govern its behavior accordingly. Suppose further that by stimulating those neurons I could provoke the same type of complex behavior without needing to deliver the stimulus at all. Then I would think I was getting at the mechanism of "understanding". If the subject were a human who reported, "Oh, I just realized that...", that would be even better, but I don't think it's necessary in order to call this a model system for understanding.
We could call it something other than "understanding", but I wouldn't see the point. Whether I had the core of the understanding mechanism or not would be debatable, but it's at least mechanistically intertwined with the process we identify at a high level as "understanding".
It's not pure behaviorism, because it's not just checking that the output looks superficially like what we expect, but also that the mechanism is running the way it (almost surely) must if it is the thing we're hoping to study. That is an extremely powerful additional constraint.
But we already have that additional constraint with LLMs. (Cf. the Anthropic paper I linked a couple of messages ago.)
More than anything else, this is what makes me hesitate to judge how applicable the label is. With humans, of course, we know that understanding-as-we-mean-it is there to be found, and because of our shared evolutionary history we expect that we'll likely find it in animals too. That helps a lot in knowing which label to (at least tentatively) apply. With AI, in contrast, we know the underlying substrate is really different.
Most likely, I think, we'll find that there are essential similarities and essential differences, even with current architectures, and eventually we'll develop language that lets us express those similarities and differences more precisely.
But how we choose our words then will depend on the degree of similarity and difference, and at this point I just can't see how we can get much closer than, "Wow, that's remarkable; I really hope we understand this better soon."