Why aren't tokens simply sampled with probabilities that follow their relative predicted likelihood? The predictions are noisy, so you might do better by not following them exactly, but the whole top-k to top-p to min-p progression is just a series of small steps in the oh-wait-it-was-dumb-to-throw-away-our-estimates direction.
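To make the comparison concrete, here is a rough sketch of the three truncation rules next to plain proportional sampling. It assumes the commonly described definitions (top-k keeps the k most likely tokens, top-p keeps the smallest set reaching cumulative mass p, min-p keeps tokens at or above a fraction of the top probability). The function names and the toy distribution are made up for illustration, and real samplers usually work on logits and may handle ties differently.

```python
import random

def renormalize(probs):
    # Rescale the surviving probabilities so they sum to 1.
    total = sum(probs)
    return [p / total for p in probs]

def ranked(probs):
    # Token indices ordered from most to least likely.
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

def top_k(probs, k):
    # Keep a fixed number of tokens, ignoring how likely they are.
    keep = set(ranked(probs)[:k])
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])

def top_p(probs, p):
    # Keep the smallest set of top tokens whose cumulative mass reaches p.
    keep, mass = set(), 0.0
    for i in ranked(probs):
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return renormalize([q if i in keep else 0.0 for i, q in enumerate(probs)])

def min_p(probs, ratio):
    # Keep tokens at least `ratio` times as likely as the best token.
    threshold = ratio * max(probs)
    return renormalize([q if q >= threshold else 0.0 for q in probs])

def sample(probs, rng=random):
    # Plain proportional sampling: follow the (possibly truncated) estimates.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy next-token distribution (hypothetical values).
probs = [0.5, 0.25, 0.125, 0.075, 0.05]
rng = random.Random(0)
for name, dist in [
    ("pure", probs),
    ("top-k (k=2)", top_k(probs, 2)),
    ("top-p (p=0.8)", top_p(probs, 0.8)),
    ("min-p (0.2)", min_p(probs, 0.2)),
]:
    print(name, [round(q, 3) for q in dist], "sampled:", sample(dist, rng))
```

Each rule throws away a tail and then samples proportionally from what is left. The only thing that changes along the progression is how much the cutoff depends on the predicted probabilities: not at all for top-k, through cumulative mass for top-p, and relative to the top token's confidence for min-p.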