Preprint
Article

This version is not peer-reviewed.

A Generalized Logistic‐Logit Function and Its Application to Multi‐Layer Perceptron and Neuron Segmentation

Submitted:

30 April 2026

Posted:

06 May 2026


Abstract
Logistic and logit functions play important roles in modern science, serving as foundational tools in various applications including artificial neural networks (ANNs). While existing functions can produce distinct logistic and logit curves, no single, unified framework has been developed to generate both. We introduce a generalized logistic–logit function (CMG-GLLF) to fill this gap. CMG-GLLF provides four interpretable and trainable parameters that allow explicit control over curve type and steepness, asymmetry, and the upper and lower limits of the x- and y-axes. CMG-GLLF's potential is explored in basic machine intelligence tasks. As a proof of concept of how this function can improve deep learning performance, we propose a trainable input feature modulator (IFM) that learns the parameters of a CMG-GLLF for each input layer node during backpropagation in a multi-layer perceptron (MLP), which is a fundamental building block of many complex network architectures. Compared to various other learnable functions, and across 3 different optimizers, CMG-GLLF yields superior MLP accuracy and stable training behavior on CIFAR-10 and CIFAR-100 image classification, at the cost of increased computational time. We therefore identify limitations to address in future studies, notably the need to derive an explicit mathematical expression for the logit phase, which could: (i) mitigate numerical instability in more complex architectures (e.g., CNNs) while reducing computational overhead, and (ii) enable a systematic evaluation of CMG as an activation function across all layers. Furthermore, CMG-GLLF adopted as a data transformation function enhances the accuracy of affinity-graph-based neuron segmentation. CMG-GLLF combines in a single framework the ability of the logistic and logit functions to modulate signals or variables, covering a full spectrum of attenuation and amplification transformations.
CMG-GLLF is flexible and trainable, has the potential to advance machine learning models, and can inspire further applications to data analysis challenges in other domains of science.

1. Introduction

Logistic and logit functions play pivotal roles in various fields such as economics, medicine and computer science (Kwasnicki, 2013; Ramos, 2013; Boateng and Abaye, 2019; Dubey et al., 2022). The logistic function is a fundamental mathematical tool widely employed across diverse fields due to its unique S-shaped curve and bounded output. In statistics and machine learning, it forms the core of logistic regression, enabling effective modeling of binary outcomes and probabilistic predictions (Cramer, 2003; Hosmer Jr et al., 2013). In biology, the logistic function is used to describe growth phenomena under resource constraints, capturing the transition from exponential growth to saturation (Marciniak-Czochra, 2003; Wu et al., 2020). Its smooth, differentiable nature also makes it indispensable in artificial neural networks, where it serves as a nonlinear activation function facilitating gradient-based learning (Dubey et al., 2022). Meanwhile, the logit function, the inverse of the logistic function, maps probabilities from the interval (0,1) onto the entire real line by transforming a probability p into its log-odds. This transformation is central to logistic regression: by modeling the log-odds in terms of predictors, the logit function captures how a linear combination of predictors affects the probability of an event occurring (Hosmer Jr et al., 2013). The versatility of the logistic and logit functions in modeling growth, decision-making, classification, and intelligent systems underscores their enduring significance in both theoretical and applied research.
Nevertheless, the standard logistic and logit functions, expressed as
$$ \mathrm{logistic}(x) = \frac{1}{1 + e^{-x}} $$
$$ \mathrm{logit}(x) = \ln\left(\frac{x}{1 - x}\right) $$
have limited flexibility, which restricts their ability to capture complex real-world phenomena, particularly in cases of imbalanced, skewed, or asymmetric growth and response behaviors. To overcome these shortcomings, researchers have introduced generalized logistic (Richards curve) and generalized logit functions with additional tunable parameters, thereby extending their adaptability to a wider range of scientific applications (Richards, 1959; Prasetyo et al., 2020) (see Supplementary Material 1 for a detailed description). Yet, these approaches still face important limitations:
(1) Lack of exact boundaries. Generalized logistic and logit functions do not provide exact reachable lower and upper bounds on either the x- or y-axis, posing difficulties in applications that require strict input/output boundaries and precise mappings. For example, when modeling the relationship between project time t and completion rate r: at the start (t=0), r=0%, and values below this have no practical meaning; at the deadline (t=T), r=100%, and values beyond this point are impossible.
(2) Limited shape control. The steepness and asymmetry of generalized logit functions are determined only by the relative relationship between two parameters, instead of using separate parameters that independently control these curve characteristics.
To address these issues, we introduce the Cannistraci–Muscoloni-Gu generalized logistic–logit function (CMG-GLLF, denoted as CMG below for simplicity), which not only fills these gaps but also provides the first unified framework for generating both generalized logistic and logit curves. Derived from the generalized logistic function (Richards, 1959), CMG offers:
(1) Explicit control over exact reachable lower and upper bounds on both the x- and y-axes.
(2) Independent control of steepness and asymmetry through the inflection rate and the deviate inflection point.
(3) An approximation algorithm that derives the logit curve by inverting the generalized logistic curve, introducing more flexibility into the logit curve.
(4) A unifying inflection rate parameter that enables smooth transitions between step functions, logistic functions, linear functions, and constant functions.
We explore CMG’s potential in machine learning through two main applications. First, CMG can be adopted as an input feature modulator (IFM), assigning each input feature a CMG curve with learnable parameters (updated by gradient descent during backpropagation) to enhance deep learning performance with a negligible parameter increase. Meticulously testing CMG on large neural network architectures would require a computational budget that we do not have available. Therefore, instead of performing a few unreliable tests on many large architectures, we opted to conduct, as a proof of concept, a deep and extensive study on the multi-layer perceptron (MLP), because the MLP is a fundamental building block of many artificial neural network structures. To design a challenging stress test that properly probes the advantage of CMG, we selected tasks that are non-trivial for an MLP: CIFAR-10 and CIFAR-100 image classification. This allows us to compare CMG as IFM fairly against direct input without IFM and against many other learnable functions.
In addition to performance, practical aspects such as numerical stability, computational overhead and their variation across different neural network architectures are important when introducing new functional modules into neural networks. In particular, the implicit approximation procedure required for the logit phase of CMG may introduce additional computational cost and raise concerns about training stability. Therefore, we included a systematic evaluation of the numerical stability and computational overhead of CMG as IFM in the MLP under three different optimizers, and we added a preliminary analysis on a simple convolutional neural network (CNN).
CMG as IFM increased the performance of the MLP while retaining numerical stability, but at the cost of increased computational time. Hence, we also identified limitations to address in future studies, most notably the need to derive an explicit mathematical expression for the logit phase. Such an expression could (i) mitigate numerical instability in more complex architectures such as CNNs while reducing computational overhead, and (ii) enable a systematic evaluation of CMG as an activation function across all layers.
Second, we demonstrate that CMG can improve the accuracy of an affinity-graph-based neuron segmentation algorithm by transforming affinity graphs with CMG mappings (Funke et al., 2019). Finally, since CMG is interpretable and explainable, we analyze the type of signal transformation which results from CMG application.
Overall, CMG provides a powerful new family of input feature modulators and data transformation tools for machine learning. Owing to its high flexibility, optimized CMG curves can be tailored to diverse tasks, and future studies might investigate them as a means of accelerating the deployment of machine learning models in both research and industry.

2. Results and Discussion

2.1. CMG

The expression of CMG is given by
$$ \mathrm{CMG}(x) = \begin{cases} y_L + \dfrac{y_R - y_L}{1 + \dfrac{x_{max} - x}{x - x_{min}} \, e^{\left(2 - \frac{1}{\mu}\right)\left(\frac{x - x_{min}}{x_{max} - x_{min}} - I\right)}}, & \text{if } 0 \le \mu \le 0.5 \\ f^{-1}(x, 1 - \mu), & \text{if } 0.5 < \mu \le 1 \end{cases} $$
The text below and Figure 1 explain how the different parameters affect the shape of the CMG curve.
  • $x_{min}$ and $x_{max}$ are the minimum and maximum of the x values
  • $y_L$ and $y_R$ are the y values corresponding to $x_{min}$ and $x_{max}$
  • $x_I$ is the deviate inflection point, which determines the asymmetry of the curve, and $I$ controls its relative position on the x-axis, specifically
$$ x_I = x_{min} + I \, (x_{max} - x_{min}) $$
when $0 < I < 0.5$, the curve is left-skewed
when $I = 0.5$, the curve is symmetric about the deviate inflection point
when $0.5 < I < 1$, the curve is right-skewed
  • $\mu$ is the inflection rate, which controls the steepness and the type of the CMG curve, specifically:
When $\mu = 0$, the curve is a step function
$$ \mathrm{CMG}(x) = \begin{cases} y_L, & \text{if } x < x_I \\ y_R, & \text{if } x > x_I \\ \max(y_L, y_R), & \text{if } x = x_I \end{cases} $$
When $0 < \mu < 0.5$, it is a logistic curve, and a lower $\mu$ results in a steeper curve
When $\mu = 0.5$, it is a linear curve
$$ \mathrm{CMG}(x) = y_L + \frac{(x - x_{min})(y_R - y_L)}{x_{max} - x_{min}} $$
When $0.5 < \mu < 1$, it is a logit curve derived from an approximation algorithm that inverts the logistic curve, and a lower $\mu$ results in a steeper curve.
When $\mu = 1$, it becomes a constant function defined as
$$ \mathrm{CMG}(x) = y_I = y_L + I \, (y_R - y_L) $$
For simplicity and brevity, we report here only the function and comment on its formulation. Readers interested in the detailed theoretical derivation of CMG and of the approximation algorithm are referred to Supplementary Material 2.
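The piecewise behaviour described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the logistic phase has the form y_L + (y_R - y_L) / (1 + Q e^z), with Q = (x_max - x)/(x - x_min) and z = (2 - 1/μ)((x - x_min)/(x_max - x_min) - I), and that the logit phase inverts the logistic curve of rate 1 - μ with the roles of the two axes swapped. Plain bisection stands in for the approximation algorithm of Supplementary Material 2, and the names `cmg_logistic` and `cmg` are hypothetical.

```python
import math

def cmg_logistic(x, mu, I, x_min=0.0, x_max=1.0, y_l=0.0, y_r=1.0):
    """Logistic-phase CMG for 0 < mu <= 0.5 (assumed form, see the text above)."""
    if x <= x_min:                      # the lower bound is reached exactly
        return y_l
    if x >= x_max:                      # the upper bound is reached exactly
        return y_r
    u = (x - x_min) / (x_max - x_min)   # normalised position in (0, 1)
    q = (x_max - x) / (x - x_min)       # ratio that enforces the exact bounds
    z = min((2.0 - 1.0 / mu) * (u - I), 700.0)  # cap avoids overflow in exp
    return y_l + (y_r - y_l) / (1.0 + q * math.exp(z))

def cmg(x, mu, I, x_min=0.0, x_max=1.0, y_l=0.0, y_r=1.0, iters=60):
    """CMG for 0 <= mu <= 1, assuming x_min < x_max and y_l < y_r."""
    if mu == 0.0:                       # step function located at x_I
        x_i = x_min + I * (x_max - x_min)
        if x == x_i:
            return max(y_l, y_r)
        return y_l if x < x_i else y_r
    if mu == 1.0:                       # constant function y_I
        return y_l + I * (y_r - y_l)
    if mu <= 0.5:                       # logistic phase (linear at mu = 0.5)
        return cmg_logistic(x, mu, I, x_min, x_max, y_l, y_r)
    # Logit phase: bisection (a stand-in for the paper's approximation
    # algorithm) on the monotone logistic curve with rate 1 - mu.
    lo, hi = y_l, y_r
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cmg_logistic(mid, 1.0 - mu, I, y_l, y_r, x_min, x_max) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions, `cmg(0.75, 0.1, 0.5)` amplifies the input more than `cmg(0.75, 0.3, 0.5)`, `cmg(x, 0.5, I)` returns `x` up to rounding on the unit square, and `cmg(0.75, 0.8, 0.5)` attenuates the input towards y_I.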

2.2. CMG as MLP Neural Network Input Feature Modulator (IFM)

The goal of this section is to provide evidence that CMG is trainable. To this end, we show that applying CMG as an input feature modulator for an MLP in image classification substantially improves both accuracy and learning speed. The motivation for designing an IFM is the hypothesis that task-oriented learning of the input feature value distribution during end-to-end training provides a better representation of the input data and thereby improves the model’s performance. Specifically, we assign each input feature a learnable CMG curve during end-to-end training to modulate (amplify or attenuate) the feature values. During training, μ and I are learned within their definition range (0<μ<1, 0<I<1). This means that the MLP has 2*Ni additional parameters to train, where Ni is the number of nodes in the input layer. The upper and lower limits of the x- and y-axes are data-specific: the lower bound for both x and y is the minimum value in the input data batch, and the upper bound is the maximum value in the batch. This ensures that only the value distribution is reshaped while the original feature range is kept invariant.
We focus on the MLP for image classification rather than on more recent and complex neural network architectures because the MLP is one of the simplest network forms and is therefore well suited to evaluating the net effect of the IFM, avoiding interference from other performance-enhancing mechanisms such as convolutional layers (Rumelhart et al., 1986; Lecun et al., 1998). In addition, we conducted an extensive set of experiments to evaluate multiple CMG training scenarios and to compare them with other transformation functions; therefore, we opted for the MLP to moderate GPU computing costs.
We train the CMG on the MLP as a stress test on two image classification tasks that are difficult for a basic neural network architecture: CIFAR-10 (in which the task is to classify a 32x32 color image into one of 10 object categories) and CIFAR-100 (similar to CIFAR-10 but with 100 object categories, which is more challenging) (Krizhevsky, 2009). For CIFAR-10 we adopt an MLP with 2 hidden layers of size {1024,512}, following Wesselink et al. (2024). This means that for CIFAR-10 the IFM costs a 0.17% (2*Ni/total number of parameters = 2*3072/ ((3072 * 1024) + 1024+(1024 * 512) + 512+(512 * 10) + 10+2*3072)) parameter increase. For CIFAR-100 we double the hidden layer dimensions to enable the network to deal with a more complex task (Wesselink et al., 2024). This means that for CIFAR-100 the IFM costs a 0.07% (2*Ni/total number of parameters = 2*3072/ ((3072 * 2048) + 2048 +(2048 * 1024) + 1024+(1024 * 100) + 100 + 2*3072)) parameter increase. Figure 2A depicts the MLP with the input feature modulator in the input layer. We also extended the number of training epochs used by Wesselink et al. (2024) from 100 to 150 for a deeper investigation of the IFM’s potential. The training configurations are described in Section 4.3.
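The parameter-overhead arithmetic quoted above can be reproduced with a few lines of Python. The helper names `mlp_param_count` and `ifm_overhead` are illustrative and not taken from the authors' repository; the layer sizes are those stated in the text.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def ifm_overhead(layer_sizes, params_per_feature=2):
    """Fraction of all trainable parameters that belongs to the IFM."""
    ifm_params = params_per_feature * layer_sizes[0]   # mu and I per input node
    return ifm_params / (mlp_param_count(layer_sizes) + ifm_params)

cifar10_mlp = [3072, 1024, 512, 10]
cifar100_mlp = [3072, 2048, 1024, 100]

print(f"CIFAR-10:  {ifm_overhead(cifar10_mlp):.2%}")   # 0.17%
print(f"CIFAR-100: {ifm_overhead(cifar100_mlp):.2%}")  # 0.07%
```

The input size 3072 corresponds to a flattened 32x32 image with 3 color channels.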
In the experiments, the highest accuracy over all epochs is taken to measure classification performance, and the area across epochs (AAE) proposed by Zhang et al. (2024) is used to measure learning speed (the calculation is detailed in Section 4.6).
In ANN training, a learning rate scheduler is widely applied to gradually decay the learning rate as training proceeds, which enables a smooth transition from exploration at the beginning to exploitation at the end. We investigate 3 learning rate (LR) schedulers with different decay patterns to determine which one works best on MLPs with the CMG input feature modulator (CMG) and without IFM (vanilla MLP). We train the MLPs using the SGD optimizer, one of the most popular optimizers in deep learning. The linear scheduler decays linearly, the sub-linear scheduler decays faster at first and then gradually reduces its decay speed, and the supra-linear scheduler has the inverse tendency of the sub-linear scheduler. After 90% of the total epochs, the learning rates of the 3 schedulers converge to the same value and remain constant for the rest of training. The detailed formulas for the schedulers are provided in Section 4.2. Figure 2B depicts the learning rate decay of the 3 schedulers over 150 epochs with $LR_{init} = 0.01$. We evaluated the 3 schedulers on CIFAR-100 (the most complex task in this study), and the results in Figure 2C reveal that the supra-linear scheduler yields the best accuracy and AAE for both CMG and the vanilla MLP. Thus, the supra-linear scheduler is employed in all the following experiments. Furthermore, to show that the performance boost brought by CMG does not simply come from the increased number of parameters in the network, we tested 4 other learnable functions with 2 tunable parameters as counterparts for a fair comparison using the SGD optimizer (detailed in Supplementary Material 4); CMG was the best performing. To probe this further, we also tested whether CMG can outperform learnable functions with more than 2 parameters by comparing it against SReLU, a 4-parameter learnable function (Jin et al., 2015) (detailed in Supplementary Material 5).
Figures 2D and 2E show the test-set accuracy across epochs of the MLP modulated by CMG, the MLP modulated by the linear transformation (the best-performing counterpart function), and the vanilla MLP, for CIFAR-10 and CIFAR-100 respectively. Across both datasets:
  • Adopting CMG as IFM greatly improves both accuracy and learning speed compared to the vanilla MLP. On CIFAR-10 (Figure 2D) and CIFAR-100 (Figure 2E), we achieve absolute accuracy improvements of 5.04% and 6.38% over the vanilla MLP. Adopting CMG also increases AAE by 3.93% and 5.81% on CIFAR-10 and CIFAR-100 respectively.
  • Among the five different 2-parameter IFM functions, CMG achieves both the best accuracy and the highest learning speed, followed by the linear transformation function (Supplementary Material 4).
  • Compared against SReLU, a 4-parameter IFM function, CMG has a higher learning speed on both datasets, and it achieves superior accuracy on CIFAR-10 and comparable accuracy on CIFAR-100 (Supplementary Material 5).
To show the IFM’s numerical stability during training, we train the vanilla MLP, the linear IFM and CMG using 2 further optimizers: AdamW and the recently proposed Muon, which has demonstrated strong empirical performance and improved convergence in training modern neural networks (Loshchilov and Hutter, 2019; Jordan, 2024). The test accuracies are shown in Figure 3. Regardless of the optimizer, CMG consistently outperforms the linear IFM and the vanilla MLP on both CIFAR-10 and CIFAR-100. Supplementary Table S1 records the occurrence of non-finite loss (NF Loss), non-finite gradient (NF Grad) and non-finite parameter (NF Param) events during training. No unstable training events occurred in any experiment, indicating that training with CMG as IFM is stable regardless of the optimizer used.
Because training CMG as IFM requires an approximation algorithm to calculate the logit-phase CMG values in the forward pass, we also evaluated the computational overhead, including peak GPU memory usage and training time. As shown in Supplementary Table S2, CMG requires more GPU memory and training time than the linear IFM and the vanilla MLP. This is the current limitation of CMG as IFM and is commented on in the Discussion section.
Furthermore, to gain a better understanding of how CMG improves learning in the MLP, we visualize CIFAR-10 test set images before and after CMG modulation in Figure 4, using the learned IFM with the best accuracy. Taking the categories “frog” and “bird” as examples, the images show a clearer brightness contrast after CMG modulation. For example, important features such as the eye of the frog and the crest of the bird are more highlighted after modulation. To validate this observation quantitatively, we calculate for each image the luma of each pixel (Guidance for operational practices in HDR television production, 2022), a proxy for brightness in which higher values are brighter. The extent of brightness contrast is then measured as the root mean square (RMS) of the lumas in the image (Peli, 1990). The RMS Contrast (Luma) bar plot in Figure 4 shows that the brightness of pixels is more dispersed after CMG modulation, indicating that the IFM yields a stronger brightness contrast. The methods used to calculate luma and RMS contrast are described in Section 4.5.

2.3. Preliminary Results of CMG as Input Element Modulator in CNN

We also include a preliminary analysis of CMG working as an element modulator of the input channels of a CNN architecture with 3 convolutional layers and 1 fully connected layer, to investigate how the behavior of CMG might change when it is applied to more complex structures. This means that we assign a trainable CMG curve as modulator to each element of the input image of the CNN. Supplementary Figure 3 shows the test accuracy on the CIFAR-10 and CIFAR-100 datasets, illustrating that CMG trained within the CNN framework has lower accuracy than the vanilla CNN. Supplementary Table S3 reports the numerical stability results, which show that instability occurs during training for CMG. Interestingly, the performance of the linear IFM is higher than that of the vanilla CNN, and the linear IFM does not display any numerical instability during training. These results indicate that, while the idea of the IFM is valid in general, the CMG solution needs adjustments, to be investigated in future studies, before it can be applied to architectures more complex than the MLP.

2.4. Improving Affinity-Graph-Based Neuron Image Segmentation Algorithm with CMG

In this section, we demonstrate the effectiveness of CMG in improving the accuracy of an algorithm that segments neurons in brain electron microscopy (EM) images. We use the re-aligned, augmented Drosophila brain EM 3D image from the CREMI challenge as the benchmark dataset and the CREMI score as the evaluation metric (detailed in Section 4.6) (Funke et al., 2016). The segmentation pipeline proposed by Funke et al. (2019) is adopted as the SOTA method (see Figure 5A). Specifically, the pipeline first predicts an affinity graph for the 3D brain image, which estimates the probability that each pair of adjacent voxels in the image belongs to the same neuron segment (referred to here as the original affinity graph). We can enhance the sharpness of the affinity graph weights by increasing their contrast with the CMG, which acts as a soft thresholding transformation. An aggregation algorithm then merges the voxels into neuron segments based on the affinity graph weight values; this step involves 2 segmentation parameters: a merge function (MF) and a thresholding parameter.
The motivation for applying CMG as a data transformation tool is to adjust the weight value distribution of the original affinity graph, producing transformed affinity graphs, with the aim of improving the accuracy of the final segmentation. A grid search is performed over a range of CMG parameters to determine the optimal combination of inflection rate (μ) and inflection point (I) for enhancing segmentation performance. The upper and lower limits are data-specific and are fixed at the minimum and maximum affinity values in the affinity graph, so that only the distribution of values is reshaped while the affinity value range is kept invariant. For each affinity graph, transformed or original, a grid search over 5 different merge functions and 9 thresholds is applied to produce the segmentations. Table 1 shows the grid-search space for the neuron segmentation experiments. For the original and the transformed affinity graphs, the segmentation that yields the lowest CREMI score is used to represent the performance of the state-of-the-art (SOTA) method and of the CMG-enhanced SOTA method (denoted here as CMG-enhanced) respectively.
Compared to the SOTA method proposed by Funke et al. (2019), the CMG-enhanced method shows a clear improvement in neuron segmentation quality. As shown in Figures 5B and 5C, when μ=0.4 and I=0.3, the segmentation produced from the transformed affinity graph achieves a CREMI score that is 9.0% lower than that obtained from the original affinity graph.
Additionally, we conduct a qualitative evaluation of the neuron segmentations produced by the different methods (Figure 5D). The middle panel shows a neuron segment in the CREMI benchmark. In the CMG-enhanced segmentation, the corresponding neuron is made of 2 segments, with the pink one below preserving most of the structures in the benchmark. In the SOTA segmentation, the neuron is more fractured: it is split into 3 parts, which causes more split errors.

3. Discussion

We developed the Cannistraci–Muscoloni-Gu generalized logistic–logit function (CMG-GLLF), the first unified framework capable of generating both logistic and logit curves. Its tunable parameters enable flexible control over curve type, steepness, asymmetry, and the upper and lower limits on both the x- and y-axes.
By assigning each feature a learnable CMG curve as an input feature modulator (IFM) for the MLP during backpropagation, we greatly improved the performance of fully connected MLPs with a negligible parameter increase (0.17% in CIFAR-10 and 0.07% in CIFAR-100), achieving superior accuracy and stable training across 3 different optimizers: AdamW, SGD and Muon. Moreover, the CMG curve enhanced the accuracy of a neuron image segmentation algorithm by transforming its intermediate product, the affinity graph. These experiments demonstrate that CMG, as a simple yet flexible computational tool, holds promise for advancing machine learning models.
Despite these advantages, CMG has certain limitations. First, interpreting the deviate inflection point in CMG presents challenges. In classical logistic curves, the inflection point marks the location of the maximum growth rate. However, because CMG introduces the parameter Q to control the bounds on both axes, the inflection point is deviated, which means that it aligns with the maximum growth rate only when I=0.5. Although our definition of the deviate inflection point remains a useful reference for curve behavior, it introduces deviations that should be carefully considered when applying CMG to tasks in which the position of the maximum growth rate is critical.
Second, while CMG proved numerically stable as IFM on the MLP architecture regardless of the adopted optimizer, it requires more GPU memory and training time than the vanilla MLP. Furthermore, evidence of numerical instability appeared on more complex architectures such as the CNN. These issues are associated with the approximation algorithm that inverts the logistic curve, which requires the implicit function theorem to calculate approximate gradients and thus leads to computational overhead and a risk of numerical instability during backpropagation. To address this issue, we conducted a new study, currently available as a pre-print (Gu and Cannistraci, 2026), in which we present the derivation of a new explicit, fully differentiable CMG expression. We derive it by approximating the inverse of the logistic phase using Newton’s method with one iteration step. Applying the explicit CMG to the MLP and CNN adopted in the current article shows that the issues of numerical instability and computational overhead are addressed, with CMG performing as the best IFM not only on this CNN but also on VGG-16 (Simonyan and Zisserman, 2015). We further adopted CMG as a hidden layer activation function in both classical MLPs and physics-informed neural networks (PINN) (Raissi et al., 2019), improving the performance in PINN by an order of magnitude compared to the benchmark activation function, which illustrates the wider applicability of CMG in the field of deep learning. Furthermore, because CMG parameters can be learned during training, the distribution of their values can be used as a new tool for the explainability of MLPs and PINNs (Gu and Cannistraci, 2026).
In future studies, CMG’s potential should be investigated on more diverse deep learning architectures and applications. Leveraging CMG as a learnable activation function in neural networks could become a key factor for their explainability.

4. Materials and Methods

4.1. Datasets

The CIFAR-10 and CIFAR-100 datasets are adopted as benchmark image classification datasets for evaluating the performance of the input feature modulator (Krizhevsky, 2009). Both consist of 32×32 color images. CIFAR-10 contains 60,000 images evenly distributed across 10 classes, while CIFAR-100 contains the same number of images divided into 100 classes with 600 images per class. Each dataset is split into 50,000 training and 10,000 test images.
For neuron segmentation, the publicly available Drosophila melanogaster brain electron microscopy (EM) image dataset from the MICCAI 2016 CREMI Challenge is adopted to show the effectiveness of CMG in improving the accuracy of an automatic neuron segmentation algorithm (Funke et al., 2019). Our experiment uses the re-aligned, augmented CREMI training set B provided in (Funke et al., 2019) as benchmark. The ground-truth labeling is an instance segmentation of neurons in 3D images, where voxels sharing the same neuron label belong to the same neuron segment.
The code for reproducing the results in this article can be found on GitHub: https://github.com/biomedical-cybernetics/CMG-GLLF.git

4.2. Learning Rate Schedulers

For a deeper investigation of how different learning rate decay patterns affect the training of an MLP with an input feature modulator, we designed 3 learning rate schedulers: the linear scheduler decays linearly, the sub-linear scheduler decays faster at first and then gradually reduces its decay speed, and the supra-linear scheduler has the inverse tendency of the sub-linear scheduler. Their expressions are detailed below:
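Denoting $LR_{current} = LR(t)$ as the current learning rate, $LR_{init}$ as the initial learning rate, $T$ as the total number of epochs, and $t$ as the current epoch
  • Linear scheduler
$$ LR(t) = \begin{cases} LR_{init} \cdot \left(0.01 + 0.99 \cdot \left(1 - \frac{t}{0.9 \cdot T}\right)\right), & \text{if } 0 \le t \le 0.9 \cdot T \\ LR_{init} \cdot 0.01, & \text{if } 0.9 \cdot T < t \le T \end{cases} $$
  • Sub-linear scheduler
$$ LR(t) = \begin{cases} LR_{init} \cdot 0.01^{\frac{t}{0.9 \cdot T}}, & \text{if } 0 \le t \le 0.9 \cdot T \\ LR_{init} \cdot 0.01, & \text{if } 0.9 \cdot T < t \le T \end{cases} $$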
  • Supra-linear scheduler
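$$ LR(t) = \begin{cases} LR_{init} \cdot \left(0.01 + 0.99 \cdot \sqrt{1 - \frac{t}{0.9 \cdot T}}\right), & \text{if } 0 \le t \le 0.9 \cdot T \\ LR_{init} \cdot 0.01, & \text{if } 0.9 \cdot T < t \le T \end{cases} $$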
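As an illustration, the three decay patterns can be sketched as one Python function. This is a minimal sketch under assumptions rather than the authors' code: the linear and sub-linear branches follow the description of a decay from $LR_{init}$ to $0.01 \cdot LR_{init}$ over the first 90% of the epochs, while the square-root profile used for the supra-linear branch is an assumption, since any concave decay with the same endpoints would fit the description. The name `lr_schedule` and its keyword arguments are hypothetical.

```python
import math

def lr_schedule(t, T, lr_init, kind="supra-linear", floor=0.01, decay_frac=0.9):
    """Learning rate at epoch t for a run of T epochs."""
    t_end = decay_frac * T               # decay stops after 90% of the epochs
    if t > t_end:
        return lr_init * floor           # constant tail shared by all schedulers
    s = t / t_end                        # training progress in [0, 1]
    if kind == "linear":
        factor = floor + (1.0 - floor) * (1.0 - s)
    elif kind == "sub-linear":
        factor = floor ** s              # fast decay first, slower later
    elif kind == "supra-linear":
        # Assumed concave profile: slow decay first, faster later.
        factor = floor + (1.0 - floor) * math.sqrt(1.0 - s)
    else:
        raise ValueError(f"unknown scheduler kind: {kind}")
    return lr_init * factor

for kind in ("linear", "sub-linear", "supra-linear"):
    print(kind, [round(lr_schedule(t, 150, 0.01, kind), 5) for t in (0, 60, 135, 149)])
```

All three variants start at `lr_init` and meet at `0.01 * lr_init` once 90% of the epochs have elapsed; they differ only in how quickly they get there.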

4.3. IFM Training Setups

To evaluate the performance of the IFM, we trained MLPs on 2 popular benchmark image classification tasks: CIFAR-10 and CIFAR-100. To analyze the performance and stability of CMG under different optimizers, we trained the vanilla MLP, the linear IFM, and CMG using three optimizers: SGD, AdamW, and Muon. SGD with momentum served as the classical baseline optimizer. AdamW was included as a widely used adaptive optimizer with decoupled weight decay. Muon was included as a recent optimizer designed for stronger performance and improved convergence on modern architectures. For all optimizer settings, the same MLP backbone, scheduler, batch-size search, epoch number, and random seeds were used within each dataset (Table 2).
To provide a preliminary assessment beyond the MLP, we also tested CMG as an IFM on a simple CNN. The CNN consisted of three convolutional blocks with output channels [32,64,128], followed by adaptive global average pooling and a final linear classifier. We trained the simple CNN on both CIFAR-10 and CIFAR-100 with Muon as the optimizer. The training hyper-parameter settings, including scheduler, learning rate search, batch-size search, epoch number, and random seeds, are the same as for the MLP in Table 2.
To encourage exploration at the beginning of training, the initial CMG parameters are sampled from uniform distributions: the μ values are sampled from $U(0.25, 0.75)$ and the I values from $U(0, 1)$.

4.4. Calculation of CMG Gradients

When we train CMG modulator during backpropagation, for logistic-phase CMG, the gradients for x, μ and I can be easily calculated from the mathematical expression, but for logit-phase it needs some tweaks to calculate the approximate gradient because there lacks explicit expression for this phase. Given f marking the expression of logistic-phase CMG.
Since CMG logistic-phase meets following conditions:
  • bijective (one-to-one correspondence)
  • differentiable everywhere
Thus, we can utilize inverse function rule (Marsden and Weinstein, 1981) to calculate the approximate gradient of x in logit-phase, which is
[ f 1 ] ' ( x ) = 1 f ' ( f 1 ( x ) )
Where f’ can be easily derived and logit-phase value f-1(x) can be obtained from the approximation algorithm.
For the approximate gradient with respect to μ in the logit phase, let y = f(x, μ, I) and F(x, μ, I, y) = f(x, μ, I) − y = 0. F satisfies the following conditions:
  • there exists at least one point satisfying F(x, μ, I, y) = f(x, μ, I) − y = 0, which follows directly from the definition
  • F is continuously differentiable with respect to all variables: x, μ, I, y
  • the partial derivative of F with respect to x is nonzero, that is, $\frac{\partial F}{\partial x} = \frac{\partial f}{\partial x} \neq 0$
(see Supplementary Material 3 for the proof)
We can therefore derive the approximate gradient with respect to μ in the logit phase using the implicit function theorem (Krantz and Parks, 2013). Specifically, we have
$$\frac{dx}{d\mu} = -\frac{\partial F / \partial \mu}{\partial F / \partial x} = -\frac{\partial f / \partial \mu}{\partial f / \partial x}$$
Since the partial derivatives of f with respect to μ and x are easy to calculate, the right-hand side of the above equation has an explicit expression. If we then swap the roles of the variables, replacing $x = f^{-1}(y, \mu, I)$ with $y = f^{-1}(x, \mu, I)$, whose value is obtained from the approximation algorithm, we obtain the approximate derivative of the logit-phase CMG with respect to μ, $\frac{dy}{d\mu}$. The derivative with respect to I follows from the same procedure.
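A small numerical check of the implicit-function-theorem gradient is sketched below. Because the CMG expression is not reproduced here, a sigmoid with a single steepness parameter k stands in for f(x, μ, I), with k playing the role of μ, and bisection stands in for the approximation algorithm. All names are illustrative. In the code, y denotes the input to the inverse and x its output, matching the notation before the variables are swapped.

```python
import math

def f(x, k):
    """Stand-in parametric logistic function (NOT the CMG expression)."""
    return 1.0 / (1.0 + math.exp(-k * x))

def df_dx(x, k):
    s = f(x, k)
    return k * s * (1.0 - s)

def df_dk(x, k):
    s = f(x, k)
    return x * s * (1.0 - s)

def f_inverse(y, k, lo=-50.0, hi=50.0, iters=200):
    """Solve f(x, k) = y for x by bisection (f is increasing in x for k > 0)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid, k) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def inverse_grad_wrt_k(y, k):
    """Implicit function theorem: dx/dk = -(df/dk) / (df/dx) at x = f^{-1}(y, k)."""
    x = f_inverse(y, k)
    return -df_dk(x, k) / df_dx(x, k)

y, k = 0.8, 2.0
ift_grad = inverse_grad_wrt_k(y, k)

# Central finite difference of the numerically inverted function, for comparison.
eps = 1e-5
fd_grad = (f_inverse(y, k + eps) - f_inverse(y, k - eps)) / (2.0 * eps)
```

The two estimates should agree closely. The implicit-function form needs only one inverse evaluation per gradient, whereas the finite difference needs two per parameter.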

4.5. Image Modulation and Contrast Analysis

To evaluate the effects of CMG modulation, we compare each original image to its CMG-transformed counterpart using both qualitative and quantitative metrics. We compute luma (a perceptual proxy for brightness) by converting the RGB image to grayscale using the standard BT.601 weights:
$$Y' = 0.299\,R' + 0.587\,G' + 0.114\,B'$$
where $R'$, $G'$, and $B'$ are the gamma-encoded sRGB channels (International Telecommunication Union, 2011; Guidance for operational practices in HDR television production, 2022).
The extent of brightness contrast is quantified as the root mean square (RMS) of luma (Peli, 1990):
$$\text{RMS contrast} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y'_i - \bar{Y'}\right)^2}$$
where N is the number of pixels and $\bar{Y'}$ is the mean luma of all pixels in the image. A higher RMS contrast value implies larger differences between pixel intensities in the image.
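The luma conversion and RMS contrast can be sketched in a few lines. This is a minimal illustration that assumes an image is given as a flat list of (R', G', B') tuples with gamma-encoded values in [0, 1]; the function names and the toy images are not from the original work.

```python
import math

def luma_bt601(r, g, b):
    """BT.601 luma from gamma-encoded R', G', B' values."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def rms_contrast(pixels):
    """RMS contrast: standard deviation of luma over all pixels.

    pixels: list of (R', G', B') tuples (assumed layout for this sketch).
    """
    lumas = [luma_bt601(r, g, b) for r, g, b in pixels]
    mean = sum(lumas) / len(lumas)
    return math.sqrt(sum((y - mean) ** 2 for y in lumas) / len(lumas))

# Toy 4-pixel images: uniform gray versus alternating black and white.
flat = [(0.5, 0.5, 0.5)] * 4
checker = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]

flat_contrast = rms_contrast(flat)
checker_contrast = rms_contrast(checker)
```

The uniform image has essentially zero contrast, while the alternating image reaches 0.5, the maximum for values in [0, 1].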

4.6. Quantitative Evaluation Metrics

When evaluating the performance of MLPs in the IFM experiments, accuracy is used to measure classification performance and the area across the epochs (AAE) is used to measure learning speed. AAE is the average performance of an algorithm up to a specific epoch, calculated by dividing the cumulative sum of its accuracy by the number of epochs. It is bounded in [0, 1] and indicates the algorithm’s learning speed (Zhang et al., 2024).
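As a small worked example of AAE, the sketch below averages per-epoch accuracy up to a chosen epoch. The function name, the `up_to` argument, and the two accuracy histories are made up for illustration.

```python
def area_across_epochs(accuracies, up_to=None):
    """Mean accuracy over the first `up_to` epochs (all epochs if None).

    accuracies: list of per-epoch accuracies in [0, 1].
    """
    history = list(accuracies if up_to is None else accuracies[:up_to])
    if not history:
        raise ValueError("need at least one epoch")
    return sum(history) / len(history)

# Two hypothetical runs that reach the same final accuracy at different speeds.
fast = [0.40, 0.50, 0.55, 0.56]
slow = [0.10, 0.30, 0.50, 0.56]

aae_fast = area_across_epochs(fast)
aae_slow = area_across_epochs(slow)
```

Both runs end at the same accuracy, but the faster learner has the higher AAE, which is why the metric is read as a measure of learning speed.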
For the quantitative evaluation of neuron segmentation, we use the CREMI score (lower is better). It is calculated as the geometric mean of the variation of information (VOI) and the adapted Rand error (ARAND) between the predicted segmentation and the ground truth (Arganda-Carreras et al., 2015).
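The combination step of the CREMI score can be sketched as below. The sketch assumes VOI and ARAND have already been computed elsewhere and are passed in as non-negative scalars; the function name and the input values are hypothetical.

```python
import math

def cremi_score(voi, arand):
    """Geometric mean of VOI and ARAND (lower is better).

    Both inputs are assumed to be precomputed, non-negative scalars.
    """
    if voi < 0 or arand < 0:
        raise ValueError("VOI and ARAND must be non-negative")
    return math.sqrt(voi * arand)

# Hypothetical metric values, for illustration only.
score = cremi_score(voi=0.4, arand=0.1)
```

Because the two metrics are multiplied, a segmentation must do well on both to obtain a low score.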
To evaluate numerical stability during training, we recorded the total number of non-finite loss, gradient, and parameter events. A non-finite loss event indicates that the loss became NaN or Inf. A non-finite gradient event indicates the occurrence of NaN or Inf values in the gradients, and a non-finite parameter event indicates NaN or Inf values in the model parameters. To evaluate computational overhead, we recorded peak GPU memory consumption and training time. For fair comparison of overhead across methods, peak GPU memory and training time were measured under a fixed benchmark with batch size 64 and learning rate 0.001.
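The stability counters can be illustrated with a framework-independent sketch. It assumes one record per training step holding the loss, a flat list of gradient values, and a flat list of parameter values, and it counts at most one event of each kind per step. The record layout and this per-step granularity are assumptions for the example.

```python
import math

def has_non_finite(values):
    """True if any value is NaN or +/-Inf."""
    return any(not math.isfinite(v) for v in values)

def count_non_finite_events(steps):
    """Count steps with a non-finite loss, gradient, or parameter.

    steps: iterable of dicts with keys 'loss' (float), 'grads' (list of
    floats), and 'params' (list of floats); hypothetical record layout.
    """
    counts = {"loss": 0, "grad": 0, "param": 0}
    for step in steps:
        counts["loss"] += int(not math.isfinite(step["loss"]))
        counts["grad"] += int(has_non_finite(step["grads"]))
        counts["param"] += int(has_non_finite(step["params"]))
    return counts

# Toy training log with one event of each kind.
log = [
    {"loss": 2.3, "grads": [0.1, -0.2], "params": [1.0, 0.5]},
    {"loss": float("nan"), "grads": [float("inf"), 0.0], "params": [1.0, 0.5]},
    {"loss": 1.9, "grads": [0.1, 0.1], "params": [float("nan"), 0.5]},
]
counts = count_non_finite_events(log)
```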

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org.

Author Contributions

C.V.C. conceived the study. C.V.C. and A.M. invented the generalized logistic-logit function, and W.G. improved its mathematical formalization and derivation. C.V.C. invented the input feature modulator, and W.G. and Y.Z. implemented it. W.G. implemented the neuron segmentation experiments under C.V.C.'s guidance. W.G., Y.Z., and C.V.C. analyzed the results. W.G. produced the figures under C.V.C.'s guidance. W.G. wrote the manuscript under C.V.C.'s guidance, with input and corrections from all the other authors. C.V.C. led, directed, and supervised the study.

Funding

This work was supported by the Zhou Yahui Chair Professorship award of Tsinghua University (to C.V.C.) and the National High-Level Talent Program of the Ministry of Science and Technology of China (grant number 20241710001, to C.V.C.).

Data Availability Statement

The code for reproducing the results of this study is available on GitHub at https://github.com/biomedical-cybernetics/CMG-GLLF.git.

Acknowledgments

We thank Yuchi Liu, Mo Yang, Lixia Huang, and Weijie Guan for administrative support at THBI. We also thank Hanming Li for reviewing this work and for his kind suggestions. This work was supported by the Zhou Yahui Chair Professorship award of Tsinghua University (to C.V.C.) and the National High-Level Talent Program of the Ministry of Science and Technology of China (grant number 20241710001, to C.V.C.). The OpenAI ChatGPT 5.1 model, available at https://chatgpt.com/, was used to refine the language of this manuscript.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Arganda-Carreras, I., Turaga, S. C., Berger, D. R., Cireşan, D., Giusti, A., Gambardella, L. M., et al. (2015). Crowdsourcing the creation of image segmentation algorithms for connectomics. Front. Neuroanat. 9.
  2. Boateng, E. Y., and Abaye, D. A. (2019). A Review of the Logistic Regression Model with Emphasis on Medical Research. J. Data Anal. Inf. Process. 07, 190.
  3. Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., and Ronneberger, O. (2016). “3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, eds. S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, and W. Wells (Cham: Springer International Publishing), 424–432.
  4. Cramer, J. S. (2003). The Origins of Logistic Regression. SSRN Electron. J.
  5. Dubey, S. R., Singh, S. K., and Chaudhuri, B. B. (2022). Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 503, 92–108.
  6. Funke, J., Saalfeld, S., Bock, D., Turaga, S., and Perlman, E. (2016). CREMI MICCAI Challenge on Circuit Reconstruction from Electron Microscopy Images. https://cremi.org/.
  7. Funke, J., Tschopp, F., Grisaitis, W., Sheridan, A., Singh, C., Saalfeld, S., et al. (2019). Large Scale Image Segmentation with Structured Loss Based Deep Learning for Connectome Reconstruction. IEEE Trans. Pattern Anal. Mach. Intell. 41, 1669–1680.
  8. Gu, W., and Cannistraci, C. V. (2026). Explainability of node modulation in classical and physics-informed neural networks via a generalized activation function. (in preparation).
  9. Guidance for operational practices in HDR television production (2022). International Telecommunication Union, Radiocommunication Sector (ITU-R). Available at: https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BT.2408-5-2022-PDF-E.pdf? (Accessed October 2, 2025).
  10. Hosmer Jr, D. W., Lemeshow, S., and Sturdivant, R. X. (2013). Applied logistic regression. John Wiley & Sons.
  11. International Telecommunication Union (2011). Recommendation ITU-R BT.601-7: Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios. Available at: https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.601-7-201103-I!!PDF-E.pdf.
  12. Jin, X., Xu, C., Feng, J., Wei, Y., Xiong, J., and Yan, S. (2015). Deep Learning with S-shaped Rectified Linear Activation Units.
  13. Jordan, K. (2024). Muon: An optimizer for hidden layers in neural networks. Available at: https://kellerjordan.github.io/posts/muon/ (Accessed April 24, 2026).
  14. Krantz, S. G., and Parks, H. R. (2013). The Implicit Function Theorem. Available at: https://link.springer.com/book/10.1007/978-1-4614-5981-1 (Accessed October 2, 2025).
  15. Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images.
  16. Kwasnicki, W. (2013). Logistic growth of the global economy and competitiveness of nations. Technol. Forecast. Soc. Change 80, 50–76.
  17. Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324.
  18. Loshchilov, I., and Hutter, F. (2019). Decoupled Weight Decay Regularization.
  19. Marciniak-Czochra, A. (2003). Logistic equations in tumour growth modelling. Int. J. Appl. Math. Comput. Sci. 13, 317–325.
  20. Marsden, J. E., and Weinstein, A. (1981). Calculus Unlimited. Benjamin/Cummings Publishing Company.
  21. Peli, E. (1990). Contrast in complex images. J. Opt. Soc. Am. A 7, 2032–2040.
  22. Prasetyo, R. B., Kuswanto, H., Iriawan, N., and Ulama, B. S. S. (2020). Binomial Regression Models with a Flexible Generalized Logit Link Function. Symmetry 12, 221.
  23. Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707.
  24. Ramos, R. A. (2013). Logistic function as a forecasting model: It's application to business and economics. Int. J. Eng. 2, 2305–8269.
  25. Richards, F. J. (1959). A Flexible Growth Function for Empirical Use. J. Exp. Bot. 10, 290–300.
  26. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature 323, 533–536.
  27. Simonyan, K., and Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition.
  28. Wesselink, W., Grooten, B., Xiao, Q., Campos, C. de, and Pechenizkiy, M. (2024). Nerva: a Truly Sparse Implementation of Neural Networks.
  29. Wu, K., Darcet, D., Wang, Q., and Sornette, D. (2020). Generalized logistic growth modeling of the COVID-19 outbreak: comparing the dynamics in the 29 provinces in China and in the rest of the world. Nonlinear Dyn. 101, 1561–1581.
  30. Zhang, Y., Zhao, J., Wu, W., Muscoloni, A., and Cannistraci, C. V. (2024). Epitopological Learning and Cannistraci-Hebb Network Shape Intelligence Brain-Inspired Theory for Ultra-Sparse Advantage in Deep Learning. ICLR.
Figure 1. CMG curve. The figure shows how the inflection parameter μ and the deviate inflection point I affect the shape of the CMG curve. When μ = 0, CMG is a step function and the discontinuity occurs at the inflection point; when 0 < μ < 0.5, it is a generalized logistic function, and a lower μ results in a steeper curve; when μ = 0.5, CMG becomes a linear function; when 0.5 < μ < 1, it is a generalized logit function (inverse logistic function), and the higher μ is, the more gradual the curve becomes; when μ = 1, the function value remains constant at the inflection value. I determines the asymmetry of the curve, specifically the proportion of the x range at which the deviate inflection point occurs. In this figure, the other parameters are set as $x_{min} = y_L = 0$ and $x_{max} = y_R = 1$.
Figure 2. CMG as input feature modulator for MLP with SGD. (A) Illustration of an MLP in which the features at the input layer are modulated by CMG. (B) Comparison of the learning rate decay of different schedulers. (C) Evaluation results of different schedulers. In the legend, each trial’s accuracy (ACC) and area across the epochs (AAE) are shown. The accuracy of each trial is also shown beside the curve; the same applies to D and E. (D) Test accuracy curves of the MLP modulated by CMG, the MLP modulated by the linear function (the best-performing counterpart learnable function), and the vanilla MLP on CIFAR10. (E) Test accuracy curves of the MLP modulated by CMG, the MLP modulated by the linear function (the best-performing counterpart learnable function), and the vanilla MLP on CIFAR100.
Figure 3. Test accuracy of vanilla MLP, linear and CMG on different optimizers. (A) Accuracy results of different methods on CIFAR10 (B) Accuracy results of different methods on CIFAR100.
Figure 4. Comparison between images before and after CMG modulation. (Leftmost) The original image in CIFAR10 test set (Middle) Image after CMG modulation (Right) Bar plot comparing the RMS contrast between original image and CMG-modulated image.
Figure 5. CMG transformation improves the EM image neuron segmentation algorithm. (A) The working pipeline of the neuron segmentation experiment. First, the 3D brain EM image volume is fed into a 3D U-Net (Çiçek et al., 2016), which predicts the affinity values (black arrows between smaller cubes) between neighboring voxels in the volume. Affinity values in the original affinity graph (middle left) are then transformed by the CMG function for enhancement, and the new values are indicated by blue arrows in the transformed affinity graph (middle right). Finally, the transformed affinity graphs are segmented using the pipeline proposed by Funke et al. (2019) to produce the predicted neuron segments (rightmost). (B) CREMI score (lower is better) evaluation on CREMI training set B. Each line represents one inflection point (I) and the x-axis shows the μ values. SOTA (black dot) and the best CMG-enhanced result (μ = 0.4 and I = 0.3) are highlighted with black and green arrows, respectively. (C) Bar plot comparing the CREMI score (lower is better) between the best CMG-enhanced result (μ = 0.4 and I = 0.3) and SOTA. (D) Segmentation quality comparison between CMG-enhanced, CREMI benchmark, and SOTA.
Table 1. Grid-search space for neuron segmentation experiments.
Parameter | Values
Inflection rates (μ) | 0.1 – 0.9 in increments of 0.1
Deviate inflection points (I) | 0.1 – 0.9 in increments of 0.1
Merge functions (MF) | {median_aff_histograms, 85_aff_histograms, median_aff, 85_aff, max_10}
Thresholds | 0.1 – 0.9 in increments of 0.1
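As a rough illustration, the search space in Table 1 can be enumerated as follows. The sketch assumes the four settings are combined as a full Cartesian product, which the table does not state explicitly; the variable names are illustrative.

```python
from itertools import product

# 0.1, 0.2, ..., 0.9 (shared by mu, I, and the thresholds in Table 1)
steps = [round(0.1 * i, 1) for i in range(1, 10)]

merge_functions = [
    "median_aff_histograms",
    "85_aff_histograms",
    "median_aff",
    "85_aff",
    "max_10",
]

# Each entry is (mu, I, merge_function, threshold); full product is an assumption.
grid = list(product(steps, steps, merge_functions, steps))
num_configurations = len(grid)
```

Under this assumption the grid contains 9 × 9 × 5 × 9 = 3645 configurations.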
Table 2. Configuration of MLP training.
Setting | Values
Optimizer | AdamW, SGD, Muon
Batch sizes | {32, 64, 128}
Initial learning rates | AdamW: {0.003, 0.001, 0.0003}; SGD, Muon: {0.025, 0.01, 0.001}
Dropout rate | 0.3
Learning rate scheduler | Supra-linear scheduler
Seeds | {0, 1, 2}
MLP hidden layer size | CIFAR10: (512), 1024; CIFAR100: (1024), 2048
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
