Background: Nano-UAVs weighing under 50 g have become useful IoT platforms for GPS-denied navigation, but fitting a neural network into their sub-512 kB memory and sub-100 mW power budgets remains an open engineering problem. PULP-Dronet v3 tackles this with depthwise separable (D+P) blocks and a channel-reduction factor γ; even so, its most compressed variant (γ = /8, 1.1M MACs) loses 6 percentage points of collision accuracy versus the full model. Methods: We replace the 5×5 first convolution with a 3×3 depthwise + 1×1 pointwise pair, and retrain with cosine-annealing learning-rate scheduling and per-epoch color-jitter augmentation. Results: At γ = /4 the model has 6,409 parameters, needs only 540K MACs, and scores 83.97% collision accuracy with 0.372 steering RMSE on the official benchmark: +2.97 pp over the same-γ baseline with 4.4× less compute. The full γ = /1 model (12M MACs) reaches 84%; ours nearly matches it with 22× fewer operations. Conclusions: Factorizing the stem and adjusting the training recipe recovers most of the accuracy lost to aggressive channel reduction, without adding inference cost.
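The MAC savings from the stem factorization can be sanity-checked with a quick count. The sketch below is illustrative only: the input/output shapes (a 200×200 single-channel input downsampled by stride 2, 32 output channels) are assumptions for the example, not the paper's exact configuration.

```python
# Hedged sketch: MAC count of a standard 5x5 stem convolution versus the
# 3x3 depthwise + 1x1 pointwise factorization described in Methods.

def conv_macs(h_out, w_out, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h_out x w_out map."""
    return h_out * w_out * c_in * c_out * k * k

def dw_pw_macs(h_out, w_out, c_in, c_out, k_dw):
    """MACs for a k_dw x k_dw depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = h_out * w_out * c_in * k_dw * k_dw
    pointwise = h_out * w_out * c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    # Assumed shapes: 200x200 grayscale input, stride 2 -> 100x100 output map.
    h, w, c_in, c_out = 100, 100, 1, 32
    std = conv_macs(h, w, c_in, c_out, 5)
    fact = dw_pw_macs(h, w, c_in, c_out, 3)
    print(f"standard 5x5: {std} MACs, dw3x3 + pw1x1: {fact} MACs, "
          f"ratio {std / fact:.2f}x")
```

Under these assumed shapes the factorized stem uses roughly an order of magnitude fewer MACs than the 5×5 convolution, which is why the swap reduces compute without touching the rest of the network.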