Early 2026 has witnessed significant volatility in the oil market, raising concerns about an energy crisis in the coming months. Against this backdrop, large-scale LLM inference continues to consume substantial power in data centers, making inference efficiency increasingly important for the energy sustainability of the AI economy. Semi-structured N:M sparsity, most notably 2:4 (50%) and potentially 1:4 (75%), offers a hardware-friendly path to lower compute and energy cost, and is supported in modern GPU designs. Yet existing training methods for 2:4 sparsity (e.g., STE-based approaches) often incur large accuracy drops relative to dense baselines, and practical support for 1:4 remains limited in current software stacks. As a result, attention has shifted toward quantization and mixture-of-experts, leaving high-sparsity N:M pre-training underexplored. Here we introduce a paradigm shift: we treat neural networks as complex systems whose sparse connectivity can be trained using network-science principles formalized by Cannistraci–Hebb sparse-to-sparse training (CHT), coupled with a tailored optimizer. We propose CHTsNM, a sparse-to-sparse training framework centered on Topology-Aware Newton–Schulz (TANS) optimization. TANS makes Newton–Schulz-style matrix updates compatible with dynamically changing semi-structured sparse topologies via active-mask projection, active-support RMS matching, and refresh-aware ramping after topology updates. CHTsNM further incorporates two lightweight mechanisms: Contextually Modulated LoRA (CoMoLoRA) for input-adaptive low-rank residual compensation, and Motif Pattern Revisitation (MPR) to improve exploration of legal row-wise N:M patterns. Across four LLaMA pre-training benchmarks, CHTsNM with 2:4 sparsity achieves performance close to dense baselines on most tasks and yields sparse-over-dense gains on 8 tasks. At 1:4 sparsity, CHTsNM approaches dense performance, though it does not yet consistently surpass it. For hardware evaluation, we report measured speedups for native 2:4 execution on current NVIDIA GPUs, and provide a clearly labeled CSR sparse-GEMM surrogate analysis to estimate the acceleration potential of 1:4. Overall, although 1:4 execution is not yet implemented in hardware, our results identify 1:4 sparse pre-training as a promising direction and establish TANS sparse-to-sparse optimization as a practical step toward future high-sparsity N:M accelerators.
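To make the core ingredients concrete, the minimal PyTorch sketch below illustrates (i) row-wise N:M mask construction and (ii) a Newton–Schulz-style update projected onto the active mask, with RMS matching over the active support and a scalar ramp factor applied after topology refreshes. This is an illustrative simplification under our own assumptions, not the exact CHTsNM algorithm: the function names (`nm_mask`, `newton_schulz_orth`, `tans_style_step`), the cubic Newton–Schulz coefficients, the choice of RMS target, and the ramp schedule are placeholders, and the CHT topology-update rule, CoMoLoRA, and MPR are not shown.

```python
import torch

def nm_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Row-wise N:M mask: keep the n largest-magnitude entries in each
    contiguous group of m along the last dimension (standard N:M pruning)."""
    rows, cols = weight.shape
    assert cols % m == 0, "last dimension must be divisible by m"
    groups = weight.abs().reshape(rows, cols // m, m)
    idx = groups.topk(n, dim=-1).indices                     # top-n per group
    mask = torch.zeros_like(groups).scatter_(-1, idx, 1.0)   # 1 = kept weight
    return mask.reshape(rows, cols)

def newton_schulz_orth(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Cubic Newton–Schulz iteration pushing g toward its orthogonal factor.
    Coefficients 1.5 / -0.5 are the textbook ones; TANS may use a different
    polynomial."""
    x = g / (g.norm() + 1e-7)            # Frobenius norm bounds the spectral norm
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

def tans_style_step(weight, grad, mask, lr=1e-2, ramp=1.0):
    """Hypothetical sparse-aware update in the spirit of TANS: orthogonalize
    the gradient, project onto the active N:M support, rescale its RMS over
    the active entries, and damp by a post-refresh ramp factor."""
    update = newton_schulz_orth(grad)
    update = update * mask                                    # active-mask projection
    active = mask.bool()
    rms_target = grad[active].pow(2).mean().sqrt()            # assumed RMS target
    rms_now = update[active].pow(2).mean().sqrt() + 1e-12
    update = update * (rms_target / rms_now)                  # active-support RMS matching
    weight = weight - lr * ramp * update                      # refresh-aware ramping
    return weight * mask                                      # keep weights N:M sparse

# Example usage (hypothetical shapes):
w = torch.randn(8, 16)
mask = nm_mask(w, n=2, m=4)               # 2:4 (50%); nm_mask(w, 1, 4) gives 1:4 (75%)
w = tans_style_step(w * mask, torch.randn_like(w), mask, lr=1e-2, ramp=0.5)
```

In this sketch the mask is fixed; in CHTsNM the N:M support itself evolves during training under the CHT link-regrowth rule, and the ramp factor would be scheduled to rise from a small value back to 1 after each topology refresh.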