Er, the motivation is to understand if there are systematic differences?
We need more of this kind of thing, not less. If we can identify differences in competence between large groups, we have a chance of being able to take compensatory measures.
The study is large enough that at least the death-at-90-days stat is pretty secure; unless they failed to adjust for prognosis at the outset (they say they did), the 95% confidence interval is far enough clear of "same odds" (an odds ratio of 1) that this ought to be solid. (The odds ratios are better by 10-25% with female doctors, depending on what you look at; the comments make it sound like night and day, but the effect is moderate.)
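For anyone who wants to see what "the interval is clear of same odds" means mechanically, here is a rough sketch. The counts are made up purely for illustration and are not the study's data, and this is the plain unadjusted Wald interval for a 2x2 table, not the adjusted model the authors would have used. The function names are mine.

```python
import math

def odds_ratio_ci(deaths_a, survivors_a, deaths_b, survivors_b, z=1.96):
    """Unadjusted odds ratio (group A vs. group B) with a Wald interval.

    z=1.96 gives an approximate 95% confidence interval.
    """
    odds_ratio = (deaths_a * survivors_b) / (survivors_a * deaths_b)
    # Standard error of log(odds ratio) for a 2x2 table.
    se = math.sqrt(1 / deaths_a + 1 / survivors_a + 1 / deaths_b + 1 / survivors_b)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

def excludes_same_odds(lower, upper):
    """True if the interval does not contain an odds ratio of 1."""
    return lower > 1 or upper < 1

# HYPOTHETICAL counts, same proportions at two sample sizes.
small = odds_ratio_ci(100, 900, 120, 880)
large = odds_ratio_ci(10_000, 90_000, 12_000, 88_000)

for label, (or_, lo, hi) in (("small", small), ("large", large)):
    print(f"{label}: OR={or_:.3f}, 95% CI=({lo:.3f}, {hi:.3f}), "
          f"excludes 1: {excludes_same_odds(lo, hi)}")
```

Same odds ratio in both cases, but only the larger sample pins it down tightly enough to rule out "no difference". That is the sense in which a moderate effect can still be solid.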