This is fantastic!
The one thing I would add is that you seem to be doing the standard meta-analysis-type thing of trying to average across studies to get a "true rate" of regret.
But I think the premise that there is a fixed underlying rate is flawed. Surely the rate depends on all sorts of factors that vary: the quality of the surgery (which has generally been getting better over time); the stringency of pre-surgery or pre-hormonal screening (which has generally been getting less stringent but better-sculpted; I don't know what to think about this, except for the Amsterdam clinic, which seems to have maintained equal quality control); reduced stigma around being trans, along with different degrees of stigma in different locations, which ought to result in people with more or less severe gender dysphoria seeking gender-affirming care (and that could have an impact on detransition rates); and so on.
Thus, when I see both McCallion and De Castro, my tentative conclusion is that the apparent p < 10^-15 significance is real, that there are genuinely different rates of detransition, and averaging isn't really helpful unless you're assigning people at random to gender identity clinics.
(And, likewise, I also think the default assumption should be that the GnRHa vs GAH difference in McCallion is a real effect.)
(Also, unless you have reason to believe that being "lost to follow-up" is uncorrelated with detransitioning, it's not fair to just exclude those "lost to follow-up". Arguably emigration is uncorrelated, but unless the other 51 left before any treatment was given, we ought to consider the full range of possibilities, from none of them having detransitioned to all of them. Thus, Butler, for instance, would more fairly be given a range of (58/1108, 109/1108) = (5.2%, 9.8%).)
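To make the arithmetic behind that range concrete, here is a minimal sketch of the bounding calculation. The function and parameter names are my own, and it assumes the 51 lost to follow-up are already included in the denominator of 1108, so the lower bound treats none of them as detransitioners and the upper bound treats all of them as detransitioners.

```python
def detransition_bounds(observed, lost, total):
    """Return (lower, upper) bounds on the detransition rate.

    observed: number known to have detransitioned
    lost:     number lost to follow-up with unknown outcome
              (assumed to be counted within `total`)
    total:    everyone who started treatment
    """
    if observed + lost > total:
        raise ValueError("observed + lost cannot exceed total")
    lower = observed / total            # none of the lost detransitioned
    upper = (observed + lost) / total   # all of the lost detransitioned
    return lower, upper


# Counts taken from the Butler example above: 58 observed, 51 unknown, 1108 total.
lo, hi = detransition_bounds(observed=58, lost=51, total=1108)
print(f"{lo:.1%} to {hi:.1%}")  # prints "5.2% to 9.8%"
```

This is only the worst-case envelope; it says nothing about where in the range the truth is likely to sit, which would take an actual model of why people drop out.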
It's a rather fiddly technical point, but important, I think, in the context of the claim of inadequate psychological screening and/or other safeguards.
If one wanted to argue that screening/selection is adequate, one would need to (1) bound the detransition rate given observed practices in clinics that have been studied, (2) argue that practices in clinic(s) under scrutiny are equal to or better than that, and (3) argue that this rate is acceptable.
Otherwise, one is left with the rather less useful but still helpful "gosh, it's pretty darned low, maybe a percent or so with surgery and a bit more than that with hormones".
Anyway, this is really really good work! Better than most of the meta-analyses I've seen.