Selection bias in Large Language Models has emerged as a fundamental obstacle to reliability, fairness, and robustness. Defined operationally as systematic decision changes under equivalence-preserving input perturbations such as option permutation, label renaming, candidate-order swapping, and evidence relocation, the phenomenon is examined across four representative task families: multiple-choice question answering, in-context classification, LLM-as-a-Judge evaluation, and long-context or retrieval-augmented generation. Selection bias is first analyzed through a causal chain linking biased behavior to training-data priors, architectural asymmetries, and post-training amplification. Existing mitigation methods are then synthesized into an intervention-level taxonomy spanning inference-time calibration and prompt optimization, architecture-level modification, and training-level debiasing. The evaluation landscape is consolidated by summarizing commonly used metrics, benchmark families, and application settings, and the lack of standardized, cross-task-comparable protocols is identified as a central bottleneck. Selection bias is best understood as a failure of invariance under non-semantic reformatting, and mitigating it is essential for trustworthy, robust, and selection-invariant language models.
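
As a minimal illustrative sketch of the operational definition above (not a method from the surveyed literature), the snippet below applies one equivalence-preserving perturbation, option permutation, to a multiple-choice question and measures how often the model's selected answer changes in content rather than merely in position; `query_model` is a hypothetical placeholder for any LLM call that returns an option label.

```python
import itertools


def permutation_flip_rate(query_model, question, options):
    """Estimate selection bias as the share of option permutations under which
    the model's chosen option changes in content. An equivalence-preserving
    perturbation should leave the selected content invariant.

    query_model(prompt) -> a label string such as "A", "B", ... (assumed API).
    """
    labels = [chr(ord("A") + i) for i in range(len(options))]
    baseline, flips, total = None, 0, 0
    for perm in itertools.permutations(range(len(options))):
        prompt = question + "\n" + "\n".join(
            f"{labels[pos]}. {options[orig]}" for pos, orig in enumerate(perm)
        )
        predicted_label = query_model(prompt).strip()
        # Map the predicted label back to the original option index so that
        # decisions are compared by content rather than by position.
        chosen_original_idx = perm[labels.index(predicted_label)]
        if baseline is None:
            baseline = chosen_original_idx
        elif chosen_original_idx != baseline:
            flips += 1
        total += 1
    # The first permutation only establishes the baseline, hence total - 1.
    return flips / max(total - 1, 1)


if __name__ == "__main__":
    # Toy "model" that always answers "A" regardless of content: a maximally
    # position-biased selector, so the flip rate should be high.
    biased_model = lambda prompt: "A"
    rate = permutation_flip_rate(
        biased_model,
        "Which planet is known as the Red Planet?",
        ["Venus", "Mars", "Jupiter", "Mercury"],
    )
    print(f"Decision-flip rate under option permutation: {rate:.2f}")
```

Analogous probes can be built for the other perturbations named above (label renaming, candidate-order swapping, evidence relocation) by substituting the corresponding input transformation while keeping the content-level comparison unchanged.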