This is a really good analysis!

Aug 11, 2022

This is a really good analysis! Very refreshing after the typical fare of completely unsupported dire-sounding statements countered by shrill invective also devoid of evidence. Gathering evidence is hard work, but extremely important if you want to be informed!

There are a few additional points that I think are worth reflecting on.

The first is that you cite meta-analyses of studies, many of which are flawed, and probably flawed in the same direction (selection bias is especially common). This means that we shouldn't take the precise numbers too seriously: averaging studies shrinks random error, but it only cancels bias if the studies are biased in different directions, and because of methodological constraints (e.g. you have to find people to ask them things, and they don't have to reply even if you find them) a lot of the biases are likely to be shared across studies. So, for instance, you report the 1% regret number, which is fair to do, as that is our best estimate, but we should interpret the number as "it's really low" not "it's about 1%". (Also, comparing to regret rates for other surgeries doesn't make much sense, because cognitive effects such as those behind the "sunk cost fallacy" influence how much regret people experience; the more important thing is that the absolute number is quite low, since if it were high it would be clear that something was badly awry.)
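
To make the shared-bias point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name, the "true value" of 0.05, the shared bias of -0.02, and the per-study noise are arbitrary numbers, not estimates taken from any study. It only shows that pooling more studies averages away independent noise while leaving a shared bias untouched.

```python
import random
import statistics


def simulate_pooled_estimate(true_value, shared_bias, n_studies,
                             independent_sd, seed=0):
    """Average n_studies simulated estimates (hypothetical model).

    Each study reports: true_value + shared_bias + its own random noise.
    The shared bias is identical in every study, so averaging cannot
    remove it; only the independent noise shrinks as n_studies grows.
    """
    rng = random.Random(seed)
    estimates = [
        true_value + shared_bias + rng.gauss(0.0, independent_sd)
        for _ in range(n_studies)
    ]
    return statistics.fmean(estimates)


if __name__ == "__main__":
    # Arbitrary illustrative numbers, not real data.
    true_value = 0.05
    for n in (5, 50, 500):
        pooled = simulate_pooled_estimate(
            true_value, shared_bias=-0.02, n_studies=n, independent_sd=0.01
        )
        print(f"{n:4d} studies: pooled estimate {pooled:.4f} "
              f"(true value {true_value})")
```

As the number of studies grows, the pooled estimate in this toy model settles near the true value plus the shared bias rather than near the true value itself, which is why I'd read the pooled number as "low" rather than as a precise figure.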

The second is that although you do report effectiveness for treating gender dysphoria and regret about surgery separately, which is excellent, I think it's important to understand how the two relate to each other in terms of medical policy. In general, medical treatments exist to improve people's conditions, not to make them say they are happy with their care. This is a classic problem with, for instance, determining quality of care: when you ask people if they are satisfied with their doctor, the answer depends mostly on personability, not quality of care (these are old studies from the 80s). This difference leads to two types of problems. First, personable doctors can get away with delivering quite shoddy care, yet doctors who deliver poor care should be corrected, because we want good outcomes, not just people who feel satisfied with their poor situation. Second, throwing away the importance of personability and focusing solely on outcome metrics would be a mistake, because it is also important that people feel that they are being treated well (e.g. it helps compliance with treatment plans and makes people more likely to seek necessary care in the first place). So, on the issue of gender reassignment surgery (GRS), the "we actually improve gender dysphoria" result is the one that tells us that this should be covered as standard medical care. The "regret is low" result tells us that existing procedures and safeguards are reasonably adequate (though the one case of regret explained as "forced by partner" is rather horrifying).

The third issue is that we can't look at studies like these and assume that we're done. GRS has been really hard to obtain historically (when most studies were done). This means that there has been an incredibly strong selection for people who really, really want GRS. The existing studies suggest that more people would benefit from GRS than have been able to have it so far...but, as we expand our capacity to perform the surgery, the selection for those people who are most desperate for it will necessarily relax (indeed, this is the entire point of making it easier to get!). And once you start selecting less stringently, the effectiveness and/or regret levels are likely to change. We need to keep monitoring and keep studying, so that as we make care more available we can tell whether, in addition to treating the people who really benefit, we're also starting to treat a sub-population that doesn't actually benefit but couldn't get care before because they lacked the necessary commitment. Maybe no such population exists, but we should set ourselves up to be able to detect it. (Wiepjes et al. (2018) is, admirably, exactly this sort of study, and shows that, at least with the kind of health care and social attitudes you get in the Netherlands, raising the rates of GRS from 0.0009% to 0.026% retains high levels of satisfaction...but this doesn't tell us that this will keep being true if we go to 0.3% or more.)

The fourth issue is that the amount of long-term follow-up is really minimal in most places, and when it's done, the studies tend to have really strong selection bias. (E.g. "let's find people who happily identify as trans, and ask them if they are happy with being trans".) We really need to get more serious about long-term follow-up; this is an issue with medical care more generally, not just GRS. Just as an example of how this can affect things, if you select people who identify as trans and ask about detransitioning, you get low rates, and most of those who said they did detransition to some extent say that they did so because of societal pressures (>80%) and not because it was wrong for them (<20%). But if you select people who identify as having detransitioned and ask, they report that transition was wrong for them (>60% IIRC) and that it wasn't societal pressure (<30% IIRC). I can dig out the studies if you're interested, but the point is that we need to get way more serious about following individual people over time so we can get a better idea of how common the different individual experiences are. It's kind of shocking that by far our best data comes from the Netherlands of all places, with a population of only 17 million people.

The fifth issue is that meta-analyses obscure the fact that not all gender-affirming care is equal (at least if you only report the bottom-line numbers from the meta-analysis). Whatever they're doing in the Netherlands seems to work really well: Wiepjes et al. (2018) is a huge and well-designed study and documents very low rates of regret (~0.6% for transition to trans womanhood). Imbimbo et al. (2009) is considerably smaller, though not tiny, and also documents low rates of regret, but at ~6% they are significantly higher (p < 10^-9) than in Wiepjes. This suggests that we ought not look at these results and say: regret low, we're good, done. Rather, it says that either populations matter or details matter. We should figure out the causes of the differing results so we can get the best outcomes possible, not just average them all together and call it good.
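
For anyone who wants to check this kind of comparison themselves, here is a minimal sketch of a two-proportion z-test in Python using only the standard library. The function name is my own, and the counts in the usage example are placeholders chosen only to resemble rates of roughly 0.6% and 6%; they are not the actual sample sizes from Wiepjes et al. or Imbimbo et al., which you would need to take from the papers. With small event counts the normal approximation is rough, so an exact test (e.g. Fisher's) may be the better choice in practice.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Uses the pooled-proportion standard error and the normal
    approximation. Returns (z, p_value).
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # PLACEHOLDER counts for illustration only, not the studies' real data.
    z, p = two_proportion_z_test(events_a=6, n_a=1000,
                                 events_b=60, n_b=1000)
    print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

The useful thing about running the test rather than eyeballing the two percentages is that it tells you whether the gap is bigger than sampling noise alone would produce; it says nothing about why the gap exists, which is the part that still needs figuring out.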

Anyway--kudos for actually digging into the research here. I dearly wish people did this more often! If they did, I think three highly valuable outcomes would obtain: (1) there would be more discussions based on facts instead of invective-laden shouting matches and we might be able to make progress in establishing a consensus view instead of splitting into polarized subgroups; (2) the value of careful research would be highlighted, increasing the chance that we can get funding for improved research which is literally the only way we can get reliable information about a complex issue that is of critical importance to the health of a non-negligible number of people; and (3) it would help refocus attention on the issue of gender dysphoria, which is serious, and for which long-term treatment is still far from perfect even if GRS does help.

Written by Rex Kerr
