Preprint
Article

This version is not peer-reviewed.

Investigating the Refactoring Capabilities of Small Open-Weight Language Models

Submitted: 03 March 2026

Posted: 04 March 2026


Abstract
Refactoring is essential for developing maintainable software. Large Language Models are now widely used in software engineering, but compared with well-established applications such as code generation, reliable refactoring remains relatively underexplored. In this paper, we perform a broad analysis of the refactoring capabilities of small open-weight language models (SLMs), evaluating 12 models on 3,453 Python programs. Our study focuses on the two defining properties of refactoring: behavior preservation and code quality improvement. We assess these properties using unit tests and a range of code metrics. Across models ranging from 0.5B to 8B parameters, most models improve code quality. Larger models are more reliable, preserving behavior more consistently. Reasoning models tend to make more substantial changes while refactoring. Allowing models to generate reasoning traces improves performance, but only for models larger than 4B parameters; for smaller models, reasoning in fact reduces refactoring reliability. The difficulty of the underlying task also affects refactoring performance, with more complex tasks associated with higher failure rates. Our results indicate that current open SLMs, especially larger ones with reasoning capabilities, can support refactoring tasks, but they are best used with human oversight.
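The two properties the abstract names can be illustrated with a minimal sketch. The helper names (`behavior_preserved`, `complexity_proxy`), the toy `clamp` example, and the branch-counting metric below are all illustrative assumptions, not the paper's actual harness or metrics; the paper's evaluation uses its own unit-test suites and code metrics.

```python
import ast

def behavior_preserved(f, g, cases):
    """Illustrative check: both versions must agree on every test case.
    (A stand-in for running the paper's unit-test suites.)"""
    return all(f(*args) == g(*args) for args in cases)

def complexity_proxy(source: str) -> int:
    """Crude quality metric: count branching nodes in the AST.
    (A stand-in for the paper's unspecified code metrics.)"""
    tree = ast.parse(source)
    return sum(isinstance(node, (ast.If, ast.For, ast.While, ast.BoolOp))
               for node in ast.walk(tree))

# Hypothetical example: a nested-branch function and a refactored version.
ORIGINAL = """
def clamp(x, lo, hi):
    if x < lo:
        return lo
    else:
        if x > hi:
            return hi
        else:
            return x
"""

REFACTORED = """
def clamp(x, lo, hi):
    return max(lo, min(x, hi))
"""

def load(src):
    # Execute the source in a fresh namespace and pull out the function.
    ns = {}
    exec(src, ns)
    return ns["clamp"]

cases = [(-5, 0, 10), (5, 0, 10), (15, 0, 10)]
preserved = behavior_preserved(load(ORIGINAL), load(REFACTORED), cases)
improved = complexity_proxy(REFACTORED) < complexity_proxy(ORIGINAL)
print(preserved, improved)  # → True True
```

A refactoring counts as successful under this sketch only when both conditions hold: the refactored code passes every behavioral check *and* scores better on the quality metric, mirroring the two criteria evaluated in the study.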
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

