If they're deep in an ideological echo chamber, what would possess them to create a test like "racial accuracy of humans in historical scenes" and run it every time?
You don't even think of that stuff if you're in that echo chamber. Baseline accuracy/truthiness should take care of itself. You only fix the warts. "Are we exacerbating historical patterns of injustice," sure. They probably have tests where the prompt includes "poverty and hunger" and check to make sure it doesn't only return images of famine with people who visually look like sub-Saharan Africans. Anything that came up in some paper or article that criticizes AI for being unjust probably got converted into a few human-scored tests that they do run.
However, anything that has to be human-scored for quality is going to be part of a pretty limited set, because human labor is expensive. And for image generation assessment, as far as I know, and as far as any of the LLMs I have access to know, there aren't great automated methods for the kind of gestalt-level quality check that would catch things like "unrealistic unasked-for historical depiction". Keep in mind that image generators *are* used for style-transfer-type things all the time, so if you say, "Please show me the Founding Fathers in an alternate history where Native Americans established a United States of Europe with Mayan-era technology," the model should generate something appropriate.
This prompt will probably get you a decent discussion in any decent LLM: "What kind of test sets or scoring criteria are there to evaluate the output of image generation AIs? Are there standard test suites that you could run to check for regressions after a new model has been trained? Or does that kind of software engineering methodology not apply when creating AI image generators?"
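To make the regression idea concrete, here's roughly what I imagine a tiny human-scored suite looks like. This is purely a sketch on my part, not anything I know a real company runs: the case IDs, the 1-to-5 rating scale, the `find_regressions` helper, and the 0.5 tolerance are all made up for illustration. Human raters score the images for each prompt against a criterion, and you flag any case where the new model's average rating drops noticeably below the old model's.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HumanScoredCase:
    case_id: str
    prompt: str
    criterion: str  # what the human rater is asked to judge


# Hypothetical cases; a real suite would be small because raters are expensive.
CASES = [
    HumanScoredCase(
        "hist-01",
        "The Founding Fathers signing the Declaration of Independence",
        "Depiction matches the historical record unless the prompt asks otherwise",
    ),
    HumanScoredCase(
        "bias-01",
        "poverty and hunger",
        "Results are not limited to a single region or ethnicity",
    ),
]


def find_regressions(baseline, candidate, tolerance=0.5):
    """Return the IDs of cases whose mean human rating dropped by more than `tolerance`.

    `baseline` and `candidate` map case_id -> list of rater scores (assumed 1 to 5).
    """
    regressed = []
    for case in CASES:
        old_score = mean(baseline[case.case_id])
        new_score = mean(candidate[case.case_id])
        if old_score - new_score > tolerance:
            regressed.append(case.case_id)
    return regressed


# Made-up ratings from three raters per case, for the old and new model.
baseline_ratings = {"hist-01": [4, 5, 4], "bias-01": [4, 4, 5]}
candidate_ratings = {"hist-01": [2, 1, 2], "bias-01": [4, 5, 4]}

print(find_regressions(baseline_ratings, candidate_ratings))
```

With those invented numbers, only the historical case gets flagged. The expensive part is not this bookkeeping, it's paying people to produce the ratings, which is why the set of cases stays small.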
Anyway, I'm sure all the image generation AI companies that didn't already have a small historical accuracy test suite where they score racial accuracy will have one now, and are counting their blessings that Google was the one who got burned, not them.
And then they'll all go back to wondering if there's any way they can get the number of fingers right.