Rex Kerr
Jul 17, 2022


Huh! This would be an interesting thing to research.

I guess the simplest hypothesis is that everyone has ears, but the task is sometimes a tricky one, so performance is likely to be limited by experience.

In a white-majority society, you would expect everyone to have a good deal of experience with what the voices of white people sound like. In a society where people in practice tend to associate more with people of their own racial category than with others (certainly true on average in the U.S.), you would furthermore expect each racial group to have a good deal of experience with the voices of its own members.

So you'd expect that black people would on average be best positioned to distinguish black from white (but maybe not Hispanic or Asian) and white people would on average be worst positioned to distinguish anyone from anyone.

If this is true, you have a pretty strong prediction not only for who can distinguish which contrasts well, but also that people from highly diverse communities will all score more similarly to one another.

I don't suppose it would be possible to get data on that.

There's also the question of why the determination can be made. One possibility is that it's some physical difference, in which case reliability should, in the black vs white case, vary with skin tone. Light-skinned people would not only look whiter, they'd sound whiter (on average; without knowing how many genetic variations were responsible for the differences, we wouldn't know how smoothly the trait would blend).

Alternatively, if it's subcultural micro-accents, then you'd expect that people from diverse and well-integrated regions would be hard for everyone to tell apart, whereas in communities with strong racial segregation black people might have an easy time telling who was socially considered black, regardless of skin tone. (White people would have an easier time on that group too, but would presumably still fare poorly because of the lack of experience.) If regional dialect differences were too large, though, everyone would be in the same boat, because nobody would really have enough experience with all of the regional dialects, assuming you recorded and tested people from all regions.

If we could find a way to get data on the first question (who can distinguish whom), I think we'd likely have a good hypothesis for this one too.

But how to get the data?

I don't know of any existing speech dataset that would have enough metadata for this to work. Common Voice is closest, but doesn't ask about race or geographical location, so it wouldn't be suitable.

So maybe for now, absent a research grant, we just have to wonder about it.

The phenomenon itself has been researched (albeit somewhat sparsely over a period of decades, with a lot of the work in the late 70s/early 80s). For instance, people can tell black vs white at better than chance levels just from hearing the sound /a/ (https://pubs.asha.org/doi/abs/10.1044/jshr.3704.738), and both computers and humans can distinguish Asian and white Brits at better than 90% rates (https://app.dimensions.ai/details/publication/pub.1053179963?or_subset_publication_citations=pub.1045317898).
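For a sense of what "better than chance" means, here is a rough sketch of an exact one-sided binomial test. It assumes a two-way forced choice where pure guessing is right 50% of the time. The function name `binomial_p_value` and the counts are made up for illustration and are not taken from either study.

```python
from math import comb


def binomial_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of getting at least `correct` right out of `trials`
    if every answer were an independent guess with success rate `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical counts (an assumption, not real data): 60 correct out of 100.
p = binomial_p_value(60, 100)
print(f"p = {p:.4f}")
```

A small p-value only says that listeners are probably not guessing. It says nothing about whether they are picking up on a physical difference or on an accent.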
