3. Methodologies
The recent surge in automated feature engineering research reflects a shift from static, human-designed heuristics to models that can reason, adapt, and learn which transformations or selections to apply. Across the analyzed studies, we identify three primary methodological families of feature optimization: (i) reinforcement learning for feature selection, (ii) reinforcement learning for feature generation, and (iii) LLM-guided feature optimization. These categories differ not only in architecture but also in how they represent decision processes within the feature space.
The first group frames feature selection as a sequential decision-making problem solved with reinforcement learning (RL). The objective is to train agents to decide which features to retain or eliminate so as to improve downstream model performance. In the multi-agent reinforcement learning framework introduced by Liu et al. [2], each feature is treated as an autonomous agent that receives feedback through task accuracy, redundancy penalties, and other reward signals. Gao et al. [10] extend this by incorporating external guidance from KBest filters and decision trees. Distinguishing "hesitant" from "assertive" agents improves interpretability and convergence, because the hesitant agents receive targeted guidance in the early training phases. Li et al. [1] instead use combinatorial multi-armed bandits (CMAB), which avoids the complexity of full reinforcement learning. This formulation enables fast, scalable feature selection with a built-in balance between mutual-information-based relevance and redundancy.
The second group of works constructs new features through mathematical transformations rather than selecting existing ones. These methods rest on the premise that latent structure in the data can be exposed through new representations produced by operations such as addition, logarithmic transformation, or feature interaction. GRFG, a group-wise feature generator, uses three sequential RL agents that select two feature groups and an operation to apply between them, with cosine similarity and mutual information guiding these choices. InHRecon, introduced by Zhang et al. [11], establishes a hierarchical agent framework that modularizes transformation, feature targeting, and operation selection. It uses H-statistics to check that the generated interactions reflect genuine second-order effects, which improves interpretability.
The third class uses large language models (LLMs), rather than purely numerical models, for semantic reasoning and text-informed feature engineering. LLMs play more than one role in these systems. In Large Language Model based Feature Generation [12] and Text-Informed Feature Generation [3], the LLM acts mainly as a feature generator that proposes transformations based on prompts, metadata, or retrieved knowledge. Proto-RM [4] instead uses LLMs within the reward modeling pipeline, learning preference-aligned scoring from a small amount of human feedback.
These methodological clusters provide a more useful framework for comparison than individual paper summaries. Each category makes trade-offs in generalization, interpretability, computational cost, and practical applicability. RL-based selectors are flexible and lightweight but sensitive to reward shaping. Generative agents capture feature interactions effectively but are computationally expensive. LLM-guided models incorporate domain reasoning but face challenges in reproducibility and bias mitigation.
3.1. Reinforcement Learning Strategies for Dynamic Feature Selection
Reinforcement-learning-based automated feature selection is a natural progression from static filter or wrapper methods to dynamic, feedback-driven selection. The core idea is to cast feature subset selection as a sequential decision-making process in which the learner observes a state (the currently selected feature subset) and receives delayed rewards (such as downstream accuracy) once the subset is evaluated. Three distinct methodologies have emerged in this area, each addressing the challenges of reward sparsity, interpretability, and scalability.
Mutual information (MI) is frequently used to assess both relevance and redundancy. Relevance rewards features that are informative about the target label, whereas redundancy penalizes overlapping features within the selected subset. These metrics guide the learning process toward informative and diverse subsets, balancing model performance and interpretability.
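To make the relevance and redundancy terms concrete, the following is a minimal sketch under stated assumptions rather than the formulation used in any of the surveyed papers. It assumes discrete features, defines relevance as the mean MI between each selected feature and the label, defines redundancy as the mean MI over all pairs of selected features, and scores a subset as relevance minus redundancy (an mRMR-style choice). All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def relevance(features, label):
    """Assumed definition: mean MI between each selected feature and the label."""
    return sum(mutual_information(f, label) for f in features) / len(features)


def redundancy(features):
    """Assumed definition: mean MI over all pairs of selected features."""
    pairs = list(combinations(features, 2))
    if not pairs:
        return 0.0
    return sum(mutual_information(a, b) for a, b in pairs) / len(pairs)


# Toy data: f1 determines the label, f2 duplicates f1, f3 is independent of both.
f1 = [0, 0, 1, 1, 0, 0, 1, 1]
f2 = list(f1)
f3 = [0, 1, 0, 1, 0, 1, 0, 1]
label = list(f1)

score_dup = relevance([f1, f2], label) - redundancy([f1, f2])
score_div = relevance([f1, f3], label) - redundancy([f1, f3])
```

In this toy case the duplicated pair is fully penalized by the redundancy term, so the subset containing the independent feature receives the higher score.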
The first representative work models each feature as an independent agent in a multi-agent reinforcement learning (MARL) setting. Each agent makes a binary decision, select or drop, and learns its policy from a shared environment that evaluates the combined subset on a downstream task. This structure is naturally parallelizable and promotes distributed exploration, but it suffers from delayed and sparse rewards, especially in the early period of training. The authors aim to reduce feature redundancy through group-level normalisation and correlation-aware reward components.
Upper Confidence Bound (UCB): The CMAB approach uses UCB to balance exploration and exploitation. The reward estimate for feature arm $i$ at time $t$ is computed as [1]:
$$\mathrm{UCB}_i(t) = \hat{\mu}_i + \sqrt{\frac{2 \ln t}{n_i}}$$
Here, $\hat{\mu}_i$ is the mean reward of arm $i$, $n_i$ is the number of times it has been selected, and $t$ is the total number of time steps. This formulation encourages selection of promising but under-explored features.
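As an illustration of how such a score can drive subset selection, the sketch below assumes the standard UCB1 rule (empirical mean plus an exploration bonus of sqrt(2 ln t / n)) and a simple top-k combinatorial step. The function names, the constant in the bonus, and the arm statistics are assumptions made for this example, not details taken from [1].

```python
import math


def ucb_scores(mean_rewards, counts, t):
    """UCB1 score per arm: empirical mean plus an exploration bonus.

    Arms that have never been selected get an infinite score so they are tried first.
    """
    scores = []
    for mu, n in zip(mean_rewards, counts):
        if n == 0:
            scores.append(math.inf)
        else:
            scores.append(mu + math.sqrt(2.0 * math.log(t) / n))
    return scores


def select_top_k(scores, k):
    """Combinatorial step (assumed): keep the k arms with the highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical statistics for four feature arms after t = 20 rounds.
mean_rewards = [0.60, 0.55, 0.30, 0.58]
counts = [10, 2, 6, 2]
scores = ucb_scores(mean_rewards, counts, t=20)
chosen = select_top_k(scores, k=2)
```

Here arm 0 has the highest mean reward, yet arms 1 and 3 are chosen because they have been selected only twice and therefore carry a larger exploration bonus.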
A second approach addresses the limitations of pure MARL by interacting with classical models that add supervision from outside the RL loop. This design labels agents as "assertive" or "hesitant" according to how confident they are in their decisions. Decision trees and statistical filters are commonly used to guide the hesitant agents. This speeds up convergence, improves selection quality, and stabilizes learning, especially when data is scarce. The system can switch between supervised guidance and autonomous policy learning on the fly, combining the flexibility of modern RL with the interpretability of classical methods.
In contrast to the agent-intensive formulations above, a third approach treats each feature as an independent arm and simplifies the decision framework through combinatorial multi-armed bandits (CMAB). Unlike full reinforcement learning approaches, these techniques do not model environment states or long-term policy trajectories. CMAB optimizes feature subset selection using short-term reward estimates, commonly obtained by sampling from Beta distributions or by computing upper confidence bounds. CMAB models are significantly faster to train, more interpretable, and more robust in low-resource or latency-sensitive settings, and they simplify tuning and reduce the effort spent on reward design. Their lack of state transitions, however, makes them less expressive than deep RL in sequential decision-making scenarios. CMAB remains a viable option for tabular datasets when feature independence is assumed or tolerated, despite its inability to model inter-feature dependence across time steps. This difference in expressiveness and efficiency largely explains the design disparities between MARL and bandit-based techniques, which vary significantly in agent architecture, reward granularity, and computational demands.
Table 1 presents a comparative summary.
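The Beta-sampling variant of the bandit formulation can be sketched as follows. This is a minimal illustration, assuming a Beta(successes + 1, failures + 1) posterior per feature arm, a top-k selection rule, and a binary subset-level reward credited to every chosen arm. The toy environment, in which a subset is rewarded only when it contains feature 0, is purely hypothetical.

```python
import random


def thompson_select(successes, failures, k, rng):
    """Sample one value per arm from its Beta posterior and keep the top k arms."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    ranked = sorted(range(len(samples)), key=lambda i: samples[i], reverse=True)
    return sorted(ranked[:k])


def update(successes, failures, chosen, reward):
    """Credit a binary reward (1 = subset helped, 0 = it did not) to every chosen arm."""
    for i in chosen:
        if reward:
            successes[i] += 1
        else:
            failures[i] += 1


rng = random.Random(0)
successes = [0, 0, 0, 0]
failures = [0, 0, 0, 0]
rounds = 200
for _ in range(rounds):
    chosen = thompson_select(successes, failures, k=2, rng=rng)
    reward = 1 if 0 in chosen else 0  # hypothetical environment
    update(successes, failures, chosen, reward)
```

No state or policy trajectory is tracked: each round only reads and updates per-arm counts, which is what keeps this family of methods lightweight.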
3.2. Structured Feature Construction via Reinforcement Learning
Feature selection techniques concentrate on finding an appropriate subset of existing features, whereas feature generation methods produce new attributes through mathematical or statistical transformations. Reinforcement learning (RL) offers a systematic way for agents to choose among operations such as addition, multiplication, or logarithmic transformation. These methods preserve interpretability while improving model performance, because each generated feature can be traced back to its original components.
Group-wise Reinforced Feature Generation (GRFG) [11] uses three cascaded reinforcement learning agents. The first two agents each choose a group of original features, and the third agent chooses a transformation operation (e.g., addition, logarithmic transformation, or cross-product). GRFG is distinctive in that it combines two feature groups rather than two individual features, so a single step produces many new features. This arrangement improves the quality of reward feedback and accelerates training. Cosine similarity and mutual information help the agents maintain both relevance and diversity.
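The group-wise crossing step can be illustrated with a short sketch. It covers only the mechanical part, applying one binary operation to every pair of features drawn from two groups, and replaces the three learned agent policies with fixed, hand-picked choices. The operation table, function name, and column names are hypothetical, and unary operations such as the logarithm are omitted.

```python
from itertools import product

# Binary operations the third agent could choose from (an illustrative subset).
BINARY_OPS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def cross_groups(group_a, group_b, op_name):
    """Apply one binary operation to every (feature in A, feature in B) pair.

    Each group maps a feature name to its column (a list of floats). The result
    maps a traceable name such as "multiply(height,age)" to the new column.
    """
    op = BINARY_OPS[op_name]
    new_features = {}
    for (name_a, col_a), (name_b, col_b) in product(group_a.items(), group_b.items()):
        new_name = f"{op_name}({name_a},{name_b})"
        new_features[new_name] = [op(x, y) for x, y in zip(col_a, col_b)]
    return new_features


# Fixed choices standing in for the three agents' decisions.
group_a = {"height": [1.0, 2.0], "weight": [3.0, 4.0]}
group_b = {"age": [10.0, 20.0]}
generated = cross_groups(group_a, group_b, "multiply")
```

A single step over groups of sizes 2 and 1 yields two new features, and each generated name records the operation and the original features it came from.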
InHRecon employs a hierarchical reinforcement learning architecture that splits feature creation into three sequential decisions: selecting the transformation operation, identifying the first feature, and then determining the second feature. The approach checks whether features are numerical or categorical so that only type-appropriate operations are applied, and it uses H-statistics to measure the strength of second-order interactions. These design choices are most effective where domain expertise is limited and complex relationships are difficult to identify manually. By evaluating both the validity of transformations and the quality of interactions, InHRecon seeks a balance between interpretability and generalization.
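Feature Interaction Strength: To measure the second-order interaction between features $f_j$ and $f_k$, InHRecon uses Friedman's H-statistic [12]:
$$H^2_{jk} = \frac{\sum_{i=1}^{n} \left[ PD_{jk}\big(x_j^{(i)}, x_k^{(i)}\big) - PD_j\big(x_j^{(i)}\big) - PD_k\big(x_k^{(i)}\big) \right]^2}{\sum_{i=1}^{n} PD_{jk}^2\big(x_j^{(i)}, x_k^{(i)}\big)}$$
Here, $PD_j$ and $PD_k$ are the centered partial dependence functions of the individual features, $PD_{jk}$ is their centered joint partial dependence, and the sums run over the $n$ samples. The statistic measures how much of the variation explained jointly by the two features is due to their interaction rather than to their separate effects. This helps ensure that the generated features reflect real relationships instead of noise.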
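A small worked example may help show how the statistic behaves. The sketch below computes the squared H-statistic for a model with exactly two inputs, using centered empirical partial dependence evaluated at the data points. It is a simplified illustration under these assumptions, not the InHRecon implementation, and the function names and toy models are hypothetical.

```python
def centered(values):
    """Subtract the mean so that each partial dependence function has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def h_squared(model, x1, x2):
    """Squared H-statistic for a two-input model, evaluated on the samples (x1, x2)."""
    n = len(x1)
    # With only two inputs, the joint partial dependence is the model output itself.
    pd_joint = centered([model(a, b) for a, b in zip(x1, x2)])
    # One-feature partial dependence: average the model over the other feature's values.
    pd_1 = centered([sum(model(a, b) for b in x2) / n for a in x1])
    pd_2 = centered([sum(model(a, b) for a in x1) / n for b in x2])
    numerator = sum((j - p - q) ** 2 for j, p, q in zip(pd_joint, pd_1, pd_2))
    denominator = sum(j ** 2 for j in pd_joint)
    return numerator / denominator if denominator > 0 else 0.0


# Toy samples forming a centered 2 x 2 grid.
x1 = [-1.0, -1.0, 1.0, 1.0]
x2 = [-1.0, 1.0, -1.0, 1.0]

h_additive = h_squared(lambda a, b: 2 * a + 3 * b, x1, x2)  # no interaction
h_product = h_squared(lambda a, b: a * b, x1, x2)           # pure interaction
```

On this grid the additive model yields 0 and the pure product yields 1, the two extremes of the statistic.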
GRFG and InHRecon both aim to improve downstream model performance by constructing features that are structured and interpretable. However, because they differ in how they organize agents, pass guidance signals, and provide feedback, they exhibit different trade-offs, as illustrated in Table 2.
In GRFG, the term cascaded agents refers to a sequential reinforcement learning setup where each agent makes a decision that feeds into the next stage of the feature generation pipeline.
3.3. Language Guided Feature Engineering with External Knowledge
Whereas reinforcement learning methods explore the feature space through reward-driven search, large language models (LLMs) bring a fundamentally different capability: semantic reasoning. Instead of relying only on numerical signals or task-specific rewards, LLM-guided systems use language, metadata, and external knowledge to propose, score, or rank new features. These models are well suited to attaching meaning to tabular data and encoding expert-level logic. Three recent studies show LLMs in three different roles: feature construction (LFG), knowledge-informed augmentation (TIFG), and preference-based reward modeling (Proto-RM).
The LFG paradigm treats each feature transformation as a reasoning task. LLM agents use Tree-of-Thought (ToT) prompting to propose candidate features, explain their reasoning, and revise their outputs based on subsequent feedback. A Monte Carlo Tree Search (MCTS) component helps select the best reasoning paths according to model performance metrics such as F1 score or accuracy. This method combines symbolic reasoning with probabilistic search, so that LLMs can explore a range of transformations (for example, log(income) or weight × age) without producing ideas that are redundant or nonsensical. LFG works with many downstream models, such as KNN, MLP, and Random Forest, and requires little labeled supervision.
TIFG [3] builds on the idea that real-world features can carry hidden meaning in their names or in dataset metadata. It uses Retrieval-Augmented Generation (RAG) to search Wikipedia or other sources for task-related concepts, then prompts the LLM to compose features such as "density = population / land area" or "BMI = weight / height^2". This method works best for financial and healthcare datasets, where domain knowledge is not encoded in the raw numbers. TIFG supports multiple forms of reasoning and justification, producing features that are both interpretable and novel.
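Once such a formula has been proposed, applying it to tabular rows is straightforward. The sketch below covers only that final step, with the two example formulas written by hand; the retrieval and LLM prompting stages are not modeled, and the column names and units are assumptions made for illustration.

```python
def add_derived_features(row, formulas):
    """Return a copy of `row` extended with each named formula applied to it."""
    enriched = dict(row)
    for name, formula in formulas.items():
        enriched[name] = formula(row)
    return enriched


# Hand-written stand-ins for formulas an LLM might propose from retrieved text.
health_formulas = {"bmi": lambda r: r["weight_kg"] / r["height_m"] ** 2}
region_formulas = {"density": lambda r: r["population"] / r["land_area_km2"]}

patient = add_derived_features({"weight_kg": 70.0, "height_m": 1.75}, health_formulas)
region = add_derived_features({"population": 1_000_000, "land_area_km2": 500.0}, region_formulas)
```

Keeping each formula under a descriptive name preserves the link between a generated feature and the domain concept that motivated it.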
Proto-RM [4] shifts the focus from generating features to evaluating and aligning them. It trains a prototypical reward model that builds prototype vectors from a small number of human comparisons (chosen versus rejected features or outputs). New examples are then scored by their similarity to the learned prototypes. This model improves reward accuracy in low-label settings, especially for LLM output alignment, preference modeling, or safety tuning in chat systems.
Prototype-Guided Loss Function: Proto-RM optimizes a composite loss to align model outputs with human preference [4]:
$$\mathcal{L} = \mathcal{L}_{\text{pref}} + \lambda \, \mathcal{L}_{\text{div}}$$
The term $\mathcal{L}_{\text{pref}}$ in this formula encourages the model to assign higher scores to preferred outputs (those favored by human preference rankings or matching correct classifications) and is usually based on pairwise comparisons. By encouraging variance among the learned prototype representations, the term $\mathcal{L}_{\text{div}}$ helps avoid mode collapse and improves generalization to a variety of input sources. The trade-off between alignment accuracy and representation spread is controlled by the hyperparameter $\lambda$.
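The structure of such a composite objective can be sketched in a few lines. The exact terms used by Proto-RM are not given here, so the example assumes a Bradley-Terry style pairwise preference loss and a diversity term defined as the negative mean squared distance between prototypes. Both choices, along with the function names and the weight `lam`, are illustrative assumptions rather than the method of [4].

```python
import math


def pairwise_preference_loss(score_chosen, score_rejected):
    """-log(sigmoid(chosen - rejected)): small when the chosen output scores higher."""
    margin = score_chosen - score_rejected
    return math.log1p(math.exp(-margin))


def diversity_loss(prototypes):
    """Assumed diversity term: negative mean squared distance between prototype pairs."""
    pairs = [(p, q) for i, p in enumerate(prototypes) for q in prototypes[i + 1:]]
    if not pairs:
        return 0.0
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in pairs)
    return -total / len(pairs)


def composite_loss(score_pairs, prototypes, lam):
    """Mean preference loss plus lam times the diversity term."""
    pref = sum(pairwise_preference_loss(c, r) for c, r in score_pairs) / len(score_pairs)
    return pref + lam * diversity_loss(prototypes)


score_pairs = [(2.0, 0.0)]  # (score of chosen output, score of rejected output)
loss_collapsed = composite_loss(score_pairs, [[0.0, 0.0], [0.0, 0.0]], lam=0.1)
loss_spread = composite_loss(score_pairs, [[0.0, 0.0], [3.0, 4.0]], lam=0.1)
```

With identical preference scores, the spread-out prototypes give the lower loss, which is how the second term discourages collapse; a larger `lam` weights that effect more heavily against alignment accuracy.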
Table 3. Comparative Analysis of LLM-Guided Feature Generation and Evaluation Frameworks.
| Method | Core Functionality | Reasoning Strategy | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| LFG | Semantic feature construction through iterative reasoning | Tree-of-Thought prompting combined with Monte Carlo Tree Search | Enables structured transformations with minimal labeled supervision; supports multi-step reasoning chains | Highly sensitive to prompt phrasing and search depth; computationally intensive |
| TIFG | Context-aware feature synthesis using external corpora | Retrieval-Augmented Generation (RAG) | Produces domain-grounded features aligned with task context; leverages large-scale external knowledge | Heavily dependent on retrieval quality; results may lack reproducibility across datasets |
| Proto-RM | Reward-guided evaluation of LLM outputs via human preference alignment | Prototype-based Reward Modeling using contrastive supervision | Promotes efficient, interpretable learning of human-aligned reward signals; applicable for fine-tuning LLM behavior | Requires curated preference comparisons; limited direct use for generative feature construction |