Fine-Tuning LLMs for Real-Time Fuzzy Insulin Control in Type I Diabetes

Jordan Kralev

doi:10.20944/preprints202604.1316.v1

Submitted: 17 April 2026

Posted: 20 April 2026


Abstract

This paper proposes a novel fine-tuning framework that adapts a large language model (LLM) to real-time fuzzy insulin control for type 1 diabetes mellitus (T1DM). The method combines a Hovorka-based glucose-insulin model, fuzzy membership encoding of glucose and insulin states, and a causal language model trained through low-rank adaptation to map recent physiological history to insulin dosing decisions. The central idea is to represent glucose and insulin variables through linguistic fuzzy terms, such as hypoglycemia, target range, hyperglycemia, and zero, low, or high insulin dose, and to embed these terms directly into the language model's token space. This enables the model to act as a sequence-aware fuzzy controller while preserving interpretability through membership functions and defuzzification of the output logits into a crisp insulin dose. The proposed controller is trained in closed loop using virtual patients generated from the Hovorka model, with the objective of minimizing glucose deviation from a clinically relevant target. Additional validation in the UVa/Padova simulator demonstrates that the learned policy transfers to a standardized benchmark environment and achieves strong time-in-range performance with low hypoglycemic risk. The study shows that a fine-tuned language model can be repurposed as a real-time biomedical decision-support component when its inputs and outputs are structured through fuzzy logic. This hybrid framework offers a promising direction for interpretable and adaptive artificial pancreas control, combining physiological modeling, linguistic reasoning, and modern language-model adaptation in a unified closed-loop system.

Keywords: fine-tuning; fuzzy insulin control; type 1 diabetes; large language models; artificial pancreas

Subject: Medicine and Pharmacology - Endocrinology and Metabolism

1. Introduction

Type 1 diabetes (T1D) is a chronic autoimmune disease characterized by T-cell-mediated destruction of the insulin-producing β cells in the pancreatic islets, which results in insulin deficiency [1,2,3]. T1D subjects therefore require insulin substitution therapy, with subcutaneous or intravenous administration of insulin substitutes such as short-acting Lispro [4]. The healthy blood glucose range is between 90 and 180 mg/dL, with an optimal target of 105 mg/dL. If glucose rises above 300 mg/dL, a hyperglycemic episode sets in, which, when prolonged or frequently repeated, can lead to diabetic ketoacidosis or a hyperosmolar hyperglycemic state. If glucose falls below 70 mg/dL, a hypoglycemic episode sets in, which, if not treated by carbohydrate ingestion, can lead to unconsciousness or seizures. Chronic consequences of unregulated glucose include retinopathy, nephropathy, neuropathy, cardiovascular disease, cerebrovascular disease, and peripheral vascular disease. In T1D, glucagon secretion from the insulin-activated pancreatic alpha cells is also affected [5,6]. This additional metabolic disturbance manifests as delayed or absent glucagon secretion during hypoglycemia, or as counter-regulatory glucagon secretion at the onset of a hyperglycemic period.

With advances in portable, battery-powered wearable subcutaneous glucose sensors [7,8] and insulin pumps [9], many automatic insulin delivery systems, called artificial pancreas (AP) systems, are under development, for example OpenAPS [10], MiniMed [11], Nightscout [12], and INCA [13]. While intravenous insulin delivery would guarantee minimal glucose deviations [14], the subcutaneous pathway is more sluggish [15] and poses a significant control challenge [16,17,18]. A key requirement for an AP system is aperiodic behaviour, because insulin cannot be removed once it has been administered. This matters because a hyperglycemic peak is usually followed by a hypoglycemic period caused by over-regulation. The U.S. Food and Drug Administration (FDA) accepted the UVa/Padova large-scale metabolic simulator [5] for T1D treatment as a substitute for pre-clinical animal studies, so that novel control algorithms can be benchmarked in a standardized way. The simulator offers a population of ten adult, adolescent, and child virtual patient models, as well as the capability to design meal administration scenarios. Once a T1D controller is submitted to the FDA for approval, internal testing with a population of more than a hundred virtual patients is carried out.

There are many control approaches for AP systems with announced or unannounced meals [19,20,21]. The authors of [22] use a temporal cost function with discount factors reflecting inter-subject variability. An event-triggered model predictive controller is proposed in [23], and zone-oriented model predictive control can be found in [24]. Multi-model PID control tuned with a genetic algorithm and a fuzzy gain-scheduling strategy is reported in [25]. Machine learning (ML) in AP systems [26] can detect anomalous patterns. A robust μ-synthesis technique is applied in [27], $H_{\infty}$ control in [28,29], and a multi-objective $H_{2} / H_{\infty}$ design in [30].

Fuzzy logic methods have a long history of success in control theory [31]. The design of a fuzzy controller aims to incorporate expert linguistic knowledge about system behaviour, defined as logical expressions over propositions whose degree of validity varies continuously between true and false. A fuzzy inference system is built by establishing membership functions for the input and output variables, declaring fuzzy rules, and choosing a defuzzification algorithm [32,33]. Numerous attempts to control blood glucose with nonlinear fuzzy controllers can be found, ranging from very basic approaches [34], through modifications such as the personalization factor introduced in [35], to more advanced type-2 fuzzy control that aims to manage the uncertainty arising from inter-subject variability [36].

The connection between fuzzy set theory and natural language processing (NLP) was surveyed in [37]. Classical NLP tasks include text classification, question answering, translation, summarization, text generation, and fill-mask [38]. The works [39,40] examine a fuzzy-theoretic interpretation of the similarity score between word embedding vectors for fuzzy control, and [41] uses fuzzy scores for top-k selection during retrieval-augmented translation (RAT). Fuzzy reasoning is applied as a mechanism for sentiment classification in [42], and fuzzy decision making is employed for text-to-SQL generator models in [43]. However, as [44] shows, fuzzy logic alone cannot model the full complexity of natural language, nor can it serve as a reasonable model of semantics without further development.

There are two groups of methods in NLP. The first group analyses morphology, syntax, and semantics using conventional pattern recognition and formal grammar techniques, leading to knowledge representation as graphs. The second group solves the language modelling problem by collecting large text corpora with the sole aim of training large-scale machine learning models. Transformer-based models [45] have had a significant impact on natural language processing systems, enabling improvements over the preceding encoder/decoder recurrent networks used for language-to-language translation. The transformer performs block correlation analysis over a window of input tokens without relying on recurrent connections. This improves training performance for the next-token prediction task and also significantly reduces inference time.

The three levels in the construction of LLMs for practical uses [46] are:

Pre-training on a large corpus of text data from diverse sources;
Fine-tuning of the model to a target application domain with application specific data;
Prompt adaptation providing context basis.

Technological advances in single instruction, multiple data (SIMD) processors, mainly employed in graphics processing units (GPUs), such as increased video memory, reduced execution time, and a richer instruction set, allow optimization of models with hundreds of billions of parameters. Still, full pre-training of an LLM is slow and costly: the bigger the corpus and the larger the number of parameters, the more iterations are required. Therefore, the majority of LLM applications operate at the level of prompt adaptation, where various techniques are possible, such as experiments with the prompt structure or the system expression, or optimization of a fixed prompt prefix. A notable technique is retrieval-augmented generation, where, given an input query, a similarity search is performed in an external dataset to obtain up-to-date information, which is then fed into the LLM for summarization with respect to the input query. Since an LLM operates as a model of a conditional probability distribution over input sequences, the aim of prompt engineering is to "fine-tune" this conditional distribution.

As a middle ground between full pre-training and prompt adaptation, parametric fine-tuning of an LLM over a small number of parameters is possible. This technique stems from the property that, when LLMs are applied to a specific knowledge domain, their intrinsic dimension is drastically reduced [46,47]. Therefore, tuning in a low-dimensional projection of the full parameter space can be as effective as training the model over the full parameter space. The fine-tuning setup is the same as the full-scale training setup, except that the model is augmented with trainable adapter blocks while the parameters of the original model are kept fixed.

The ability of LLMs to generate fast and correct predictions over very large and diverse data stimulates LLM applications outside NLP studies. The place of LLMs in control theory was recently reviewed in [48], drawing on a corpus of 260 references. There, direct embodiments of LLMs in control systems are seen, among others, at the higher stratification layers, for example to dynamically suggest parameters for a feedback loop controller such as PID or LQR, to generate sequences of action trajectories, or to explore the design space of a system architecture. However, direct substitution of a classical feedback controller with an LLM for real-time decision making is not advised.

In the present study we aim to fine-tune a small LLM to operate as an actual feedback loop controller for a metabolic process, namely glucose regulation in T1D. Such execution is possible because the decision time requirement for AP systems is moderate, between 1 and 5 minutes, which is enough for a compact LLM with several billion parameters running on a mid-range GPU. Moreover, multiple requests to the model can be batched centrally for processing on a cloud GPU, similar to a scenario with one medic for multiple patients.

The key problem in using an LLM directly in a control feedback loop lies in the mismatched signal domains. While physical systems transform physical quantities, LLMs transform high-dimensional representations of an input token stream. The representation of physical quantities in the LLM input space can be naively approached by directly substituting a decimal string of digits into the LLM prompt. However, LLMs perform poorly on purely mathematical evaluations, especially smaller models below 10 billion parameters. The LLM treats such a decimal string as a general character sequence and performs correlation analysis over it instead of arithmetic with quantities.

In this study, we propose that token meaning in the LLM embedding space is dominated by vector direction, so that a given physical quantity can be represented as a modulation of the amplitude of a selected carrier token. Accordingly, a perturbation of the model vocabulary embedding vectors is introduced to smoothly vary the degree of membership of physical quantities in a collection of fuzzy sets mapped to this vocabulary. The benefit of such a representation is that a single modulated carrier token represents a numerical quantity, instead of several tokens corresponding to a string of decimal digits. Furthermore, to improve training convergence, a linear combination of embedding vectors is assigned to a physical quantity by using fuzzy set membership for modulation. The contributions of the present work can be summarized as:

1. Using a fine-tuned LLM to execute real-time decisions for insulin regulation in an artificial pancreas system.
2. Introduction of a novel technique to seamlessly represent quantified data in the LLM embedding space as amplitude-modulated carrier tokens.
3. Closed-loop fine-tuning of the fuzzy LLM controller with multiple randomized metabolic models of T1D virtual patients.
4. Validation of the fine-tuned fuzzy LLM controller in the UVa/Padova simulator for 10 virtual patients from the adult population.

The paper is organised as follows. Section 2 briefly reviews the Hovorka T1D model, fuzzy control theory applied to blood glucose regulation, and the architecture of the TinyLlama LLM used for the experiments. Section 3 presents the fuzzy embedding approach for LLMs and gives the equations of the closed-loop system and the training objective. Section 4 briefly presents the fine-tuning setup with code snippets. Section 5 summarizes the results from fine-tuning and UVa/Padova simulation.

2. Preliminaries

2.1. Hovorka Model

There are several widely used metabolic models in the field of AP systems. In this article we use the Hovorka model [49], as it is the one most commonly adopted in T1D studies for insulin control predictions with short-acting Lispro replacement therapy. The Hovorka model describes glucose-insulin interaction with a two-compartment pharmacokinetic model incorporating gut absorption dynamics following carbohydrate intake, subcutaneous insulin absorption dynamics, insulin interaction with plasma glucose, and the rate of endogenous glucose production. The two compartments in the model are the subcutaneous fluid and the blood plasma.

The state-space equations of the Hovorka model include the glucose metabolism model

\begin{matrix} {\dot{Q}}_{1} (t) = EGP_{0} (1 - x_{3} (t)) + U_{G} (t) - F_{R} (t) - \left( x_{1} (t) + \frac{F_{01}^{c} (t)}{V_{G} G (t)} \right) Q_{1} (t) + k_{12} Q_{2} (t) \\ {\dot{Q}}_{2} (t) = x_{1} (t) Q_{1} (t) - (k_{12} + x_{2} (t)) Q_{2} (t) \end{matrix},

(1)

where $Q_{1}$ and $Q_{2}$, measured in mmol, are the glucose amounts in the accessible and non-accessible compartments. The accessible compartment is the subcutaneous fluid, where measurements can be taken with a portable glucose sensor, while the non-accessible compartment is the blood plasma, where a blood sample is required to perform a measurement. The kinetic rate $k_{12}$ characterizes the transfer between the two compartments. The constant $EGP_{0}$ represents endogenous glucose production extrapolated to zero insulin, which is present even without carbohydrate ingestion. Therefore $EGP = EGP_{0} (1 - x_{3} (t))$ is the insulin-dependent glucose production, where one of the insulin actions, $x_{3} (t)$, dampens the glucose release from tissues. The measurable glucose concentration $G$ in mmol/l from the sensor is calculated by

G (t) = \frac{Q_{1} (t)}{V_{G}},

(2)

where $V_{G}$ is the glucose distribution volume in the accessible compartment. Glucose readings are expressed either in mmol/l or in mg/dl, using a conversion factor of 18 mg/dl per mmol/l.

The term $U_{G}$ represents the ingested carbohydrates absorbed in the gut; some versions of the model detail the gut absorption further if required. Here

U_{G} (t) = \sum_{m = 1}^{N_{m}} \frac{D_{G,m} A_{G} τ_{m} e^{- τ_{m} / t_{max,G}}}{t_{max,G}^{2}},

(3)

where $D_{G,m}$ is the carbohydrate amount in mmol ingested with meal $m \in \{1, \dots, N_{m}\}$, $A_{G}$ is the carbohydrate absorption ratio (bioavailability), $t_{max,G}$ is the time to peak of carbohydrate concentration in tissues, and $τ_{m}$ is the relative meal time defined as

τ_{m} = t - t_{m}, t > t_{m} .

(4)
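The meal input in (3) and (4) can be evaluated directly. The following is a minimal sketch, not the implementation used in the study: the function name `meal_glucose_input` and the default values of `a_g` and `t_max_g` are assumptions chosen for illustration, and the values in Table 1 should be used in practice.

```python
import math


def meal_glucose_input(t, meals, a_g=0.8, t_max_g=40.0):
    """Gut absorption rate U_G(t) in mmol/min, following (3) and (4).

    meals   : list of (t_m, d_g) pairs, meal time in min and carbohydrate
              amount in mmol.
    a_g     : bioavailability A_G (illustrative default, an assumption).
    t_max_g : time to peak t_max,G in min (illustrative default, an assumption).
    """
    total = 0.0
    for t_m, d_g in meals:
        tau = t - t_m  # relative meal time, equation (4)
        if tau > 0.0:  # a meal contributes only after it has been ingested
            total += d_g * a_g * tau * math.exp(-tau / t_max_g) / t_max_g ** 2
    return total


if __name__ == "__main__":
    meals = [(10.0, 300.0)]  # one hypothetical meal of 300 mmol at t = 10 min
    for minute in (0, 30, 50, 120):
        print(minute, round(meal_glucose_input(float(minute), meals), 4))
```

Each meal term peaks at $τ_{m} = t_{max,G}$, where it equals $D_{G,m} A_{G} / (e \, t_{max,G})$, which gives a quick sanity check for any implementation.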

The presented glucose metabolism model also includes a renal excretion component for extreme hyperglycaemia,

F_{R} (t) = 0.003 (G (t) - 9) V_{G}

(5)

which is non-zero for $G \geq 9$ mmol/l. The total non-insulin-dependent glucose flux (for example in the CNS), corrected for the ambient glucose concentration, is

F_{01}^{c} (t) = \begin{cases} F_{01}, & G (t) \geq 4.5 \\ F_{01} G (t) / 4.5, & \text{otherwise} \end{cases}

(6)

The insulin absorption subsystem is again modelled as a two-compartment kinetic system

\begin{matrix} {\dot{S}}_{1} (t) = u (t) - \frac{S_{1} (t)}{t_{max,I}} \\ {\dot{S}}_{2} (t) = \frac{S_{1} (t)}{t_{max,I}} - \frac{S_{2} (t)}{t_{max,I}} \end{matrix}

(7)

where $S_{1}$ and $S_{2}$ are the amounts of short-acting insulin in mU in the two subcutaneous compartments, $u (t)$ is the insulin administration rate in mU/min, and $t_{max,I}$ in min is the time to maximum insulin absorption. The plasma insulin concentration $I (t)$ in mU/L is obtained from

\dot{I} (t) = \frac{S_{2} (t)}{t_{max,I} V_{I}} - k_{e} I (t),

(8)

where $V_{I}$ is the insulin distribution volume and $k_{e}$ is the insulin elimination rate.

The insulin action subsystem is modelled as

{\dot{x}}_{i} (t) = - k_{a,i} x_{i} (t) + k_{b,i} I (t), \quad i = 1, 2, 3

(9)

where $x_{1}$, $x_{2}$ and $x_{3}$ represent the effects of insulin on glucose distribution/transport, glucose disposal, and endogenous glucose production, as evident from the equations above, $k_{a,i}$ are deactivation rate constants, and $k_{b,i}$ are activation rate constants given by

\begin{matrix} k_{b, 1} = S_{I}^{T} k_{a, 1} \\ k_{b, 2} = S_{I}^{D} k_{a, 2} \\ k_{b, 3} = S_{I}^{E} k_{a, 3} \end{matrix}

(10)

where $S_{I}^{T}$ is the insulin sensitivity of glucose transport from interstitial to plasma, $S_{I}^{D}$ is the insulin sensitivity of glucose elimination from plasma into tissues, and $S_{I}^{E}$ is the insulin sensitivity of endogenous glucose production.

The original paper of Hovorka [49] divides the model parameters into tunable and constant ones, because [50] shows that some of the parameters are likely unidentifiable from data. The tunable parameters, on the other hand, retain the ability to represent the wide range of glucose variations observed in type 1 diabetes. The tunable model parameters are $S_{I}^{T}$, $S_{I}^{D}$, $S_{I}^{E}$, $EGP_{0}$, $F_{01}$ and $t_{max,I}$. The numerical values of the model parameters are summarized in Table 1.
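To make the structure of the state equations concrete, the sketch below evaluates the right-hand side of the model and advances it with a forward Euler step. It is an illustration under stated assumptions, not the simulator used in the study: the class `HovorkaParams`, the functions `hovorka_rhs` and `euler_step`, the 70 kg body weight and all default parameter values are placeholders chosen here, and Table 1 remains the authoritative source. The insulin action states are integrated with $k_{a,i}$ acting as a decay rate, in line with their description as deactivation constants.

```python
from dataclasses import dataclass


@dataclass
class HovorkaParams:
    # Illustrative values for a 70 kg subject (assumptions, see Table 1).
    k12: float = 0.066                      # transfer rate, 1/min
    ka: tuple = (0.006, 0.06, 0.03)         # deactivation rates k_a,i, 1/min
    si: tuple = (51.2e-4, 8.2e-4, 520e-4)   # S_I^T, S_I^D, S_I^E
    ke: float = 0.138                       # insulin elimination rate, 1/min
    vi: float = 0.12 * 70.0                 # insulin distribution volume, L
    vg: float = 0.16 * 70.0                 # glucose distribution volume, L
    egp0: float = 0.0161 * 70.0             # EGP_0, mmol/min
    f01: float = 0.0097 * 70.0              # F_01, mmol/min
    t_max_i: float = 55.0                   # time to maximum absorption, min


def hovorka_rhs(state, u, u_g, p):
    """Time derivative of [Q1, Q2, S1, S2, I, x1, x2, x3].

    u   : insulin infusion rate in mU/min.
    u_g : gut glucose absorption rate U_G(t) in mmol/min.
    """
    q1, q2, s1, s2, ins, x1, x2, x3 = state
    g = q1 / p.vg                                             # (2)
    f_r = 0.003 * (g - 9.0) * p.vg if g >= 9.0 else 0.0      # (5)
    f01c = p.f01 if g >= 4.5 else p.f01 * g / 4.5            # (6)
    # (1): since Q1 = V_G * G, the term F01c / (V_G G) * Q1 reduces to F01c.
    dq1 = p.egp0 * (1.0 - x3) + u_g - f_r - x1 * q1 - f01c + p.k12 * q2
    dq2 = x1 * q1 - (p.k12 + x2) * q2
    ds1 = u - s1 / p.t_max_i                                  # (7)
    ds2 = (s1 - s2) / p.t_max_i
    dins = s2 / (p.t_max_i * p.vi) - p.ke * ins               # (8)
    # (9) with k_b,i = S_I,i * k_a,i from (10)
    dx = [-ka * x + s * ka * ins for ka, s, x in zip(p.ka, p.si, (x1, x2, x3))]
    return [dq1, dq2, ds1, ds2, dins] + dx


def euler_step(state, u, u_g, p, dt=1.0):
    """One forward Euler step of length dt minutes."""
    rhs = hovorka_rhs(state, u, u_g, p)
    return [s + dt * d for s, d in zip(state, rhs)]


if __name__ == "__main__":
    params = HovorkaParams()
    x = [5.0 * params.vg, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    for _ in range(60):
        x = euler_step(x, u=10.0, u_g=0.0, p=params)
    print("glucose after 60 min (mmol/l):", x[0] / params.vg)
```

Forward Euler with a one-minute step is used only because it is the shortest integrator to write down; a stiff or adaptive solver may be preferable for long simulations.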

2.2. Fuzzy Logic Control Basics

Because this paper offers a new way of implementing a fuzzy inference controller by embedding linguistic variables in the LLM, we briefly review the principles behind a Mamdani-type fuzzy controller. The Mamdani-type controller is known to be a universal fuzzy controller, as proved in [51,52]. The control action is obtained as the state feedback

u (t) = g (x (t)) = \sum_{l = 1}^{m} g_{l} (x (t)) μ_{l} (x (t))

(11)

where the nonlinear function $g (\cdot)$ over the n-dimensional state space $R^{n}$ is defined through the application of m fuzzy inference rules

R^{l} : \text{IF } x_{1} \text{ is } F_{1}^{l} \text{ AND } \dots \text{ AND } x_{n} \text{ is } F_{n}^{l} \text{ THEN } u (t) = g_{l} (x (t)), \quad l = 1, \dots, m

(12)

Here $S_{l} = \prod_{i = 1}^{n} F_{i}^{l}$ defines the l-th fuzzy set, $g_{l}$ is the l-th local control law, and $μ_{l} (\cdot)$ is the normalized membership function of the inferred fuzzy set $S_{l}$, for example

μ_{l} (x) = \min_{i} \{ μ_{i}^{l} (x_{i}) \mid i = 1, \dots, n \}

(13)

where $μ_{i}^{l} (\cdot)$ is the membership function describing the fuzzy set $F_{i}^{l}$. With such inference, the control signal $u (t)$ is obtained as a fuzzy variable with different membership values in the m output sets, as Section 2.2 shows.

Equivalently, for a finite-dimensional system, the state can be reconstructed from n past values of the output and the input, so that

x (t) ≅ {(G (t), G (t - T_{s}), \dots, G (t - n T_{s}), u (t), u (t - T_{s}), \dots, u (t - n T_{s}))}^{T}

(14)

To examine the insulin delivery system we define three fuzzy sets for the glucose level, corresponding to the hypoglycemic (90-105 mg/dl), target (90-180 mg/dl) and hyperglycemic (150-300 mg/dl) regions (Figure 1), with

\begin{matrix} G_{hypo} : μ_{hypo} (G) = \max (\min (- \frac{18 G - 105}{15}, 1), 0) \\ G_{target} : μ_{target} (G) = \max (\min (\min (\frac{18 G - 90}{15}, \frac{180 - 18 G}{75}), 1), 0) \\ G_{hyper} : μ_{hyper} (G) = \max (\min (\frac{18 G - 150}{150}, 1), 0) \end{matrix}

(15)

The insulin dose fuzzy sets describe the possible per-minute infusion per kg of body weight as zero, low, and high dose (Figure 2), with

\begin{matrix} U_{z} : μ_{z} (u) = max (min (- \frac{u}{0.01}, 1), 0) \\ U_{l} : μ_{l} (u) = max (min (min (\frac{u}{0.01}, \frac{0.1 - u}{0.99}), 1), 0) \\ U_{h} : μ_{h} (u) = max (min (\frac{u - 0.1}{1.4}, 1), 0) \end{matrix}

(16)

Therefore, a simplified version of the insulin controller is defined as

\begin{matrix} R^{1} : \text{IF } G (t) \text{ is } G_{hypo} \text{ THEN } u (t) \text{ is } U_{z} \\ R^{2} : \text{IF } G (t) \text{ is } G_{target} \text{ THEN } u (t) \text{ is } U_{l} \\ R^{3} : \text{IF } G (t) \text{ is } G_{hyper} \text{ THEN } u (t) \text{ is } U_{h} \end{matrix}

(17)

The control signal fuzzy set $U_{out} = \sum_{i} U_{i}$ is then defined by

U_{out} : μ_{out} (u) = μ_{z} (u) μ_{hypo} (G (t)) + μ_{l} (u) μ_{target} (G (t)) + μ_{h} (u) μ_{hyper} (G (t))

(18)

A centroid defuzzification procedure is applied to obtain the crisp value of the control signal

u (t) = BW \frac{\int_{0}^{u_{max}} u μ_{out} (u) d u}{\int_{0}^{u_{max}} μ_{out} (u) d u}

(19)

or, after discretization with step h such that $u_{i} \in \{0, h, 2 h, 3 h, \dots, u_{max}\}$,

u (t) = BW \frac{\sum_{i} u_{i} μ_{out} (u_{i})}{\sum_{i} μ_{out} (u_{i})}

(20)
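The rule base (17), the aggregation (18) and the discretized centroid (20) can be chained into a few lines of code. The sketch below is illustrative only: the membership constants are taken as printed in (15) and (16), while the function names, the body weight `bw`, the upper bound `u_max`, the step `h` and the zero-dose fallback when no output set is active are assumptions made for this example.

```python
def clip01(v):
    """Clamp a value to the interval [0, 1]."""
    return max(min(v, 1.0), 0.0)


def glucose_memberships(g_mmol):
    """Memberships (hypo, target, hyper) of a glucose level in mmol/l, see (15)."""
    g = 18.0 * g_mmol  # conversion to mg/dl
    hypo = clip01(-(g - 105.0) / 15.0)
    target = clip01(min((g - 90.0) / 15.0, (180.0 - g) / 75.0))
    hyper = clip01((g - 150.0) / 150.0)
    return hypo, target, hyper


def dose_memberships(u):
    """Memberships (zero, low, high) of a dose u, constants as printed in (16)."""
    zero = clip01(-u / 0.01)
    low = clip01(min(u / 0.01, (0.1 - u) / 0.99))
    high = clip01((u - 0.1) / 1.4)
    return zero, low, high


def fuzzy_dose(g_mmol, bw=70.0, u_max=1.5, h=0.001):
    """Crisp dose from rules (17), aggregation (18) and discrete centroid (20).

    bw, u_max and h are assumptions of this sketch.
    """
    weights = glucose_memberships(g_mmol)
    steps = int(round(u_max / h))
    num = 0.0
    den = 0.0
    for i in range(steps + 1):
        u = i * h
        mu_out = sum(a * b for a, b in zip(dose_memberships(u), weights))  # (18)
        num += u * mu_out
        den += mu_out
    # Fallback (assumption): no active output set means no insulin.
    return bw * num / den if den > 0.0 else 0.0


if __name__ == "__main__":
    for mg_dl in (80.0, 105.0, 160.0, 300.0):
        print(mg_dl, fuzzy_dose(mg_dl / 18.0))
```

With the constants as printed, $μ_{z}$ is zero for every non-negative dose, so a purely hypoglycemic reading leaves the aggregated set empty and the sketch returns the fallback dose of zero.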

2.3. Large Language Models

The paper [45] defines the transformer architecture as an improvement over the preceding encoder/decoder recurrent networks used for language-to-language translation. The transformer performs block correlation analysis over a window of input tokens without relying on recurrent connections, which greatly improves training performance for the next-token prediction task. Here we focus on a few architectural decisions with reference to the Llama family of LLMs, which are decoder-only models.

The input to language processing is a token sequence $w_{tok} = (w_{1}, w_{2}, \dots, w_{T})$. Depending on the language model architecture, the tokens can be whole words, word pieces, or even sentences formed in natural language. Tokenization is followed by the mapping of each token $w_{i}$ into an embedding vector $e_{i}$ by the embedding layer of the LLM, without positional information. For the TinyLlama model the embedding dimension is $d = 2048$ and the token dictionary size is $N_{tok} = 32000$. The resulting sequence after embedding is

e = (e_{1}, e_{2}, \dots, e_{T}) \in R^{d \times T} .

(21)

Such a high-dimensional representation is key to distinguishing the meanings of various tokens and to allowing freedom in the internal transformations of the model. A general observation is that tokens with more distinct or opposite meanings map to embedding vectors with larger cosine distance, while tokens with similar meanings cluster together in $R^{d}$.

At the core of LLM performance is the self-attention mechanism, which works by analogy with a database query over an index key. After producing the query, key and value mappings of each input sequence token

\begin{matrix} q_{m} = f_{q} (e_{m}, m) \\ k_{n} = f_{k} (e_{n}, n) \\ v_{n} = f_{v} (e_{n}, n) \end{matrix},

(22)

the output of the l-th hidden model layer in the transformer LLM is a function over its input sequence

h_{l} (e) = (h_{l, 1}, h_{l, 2}, \dots, h_{l, T})

(23)

where each component $h_{l,m}$ is computed as a weighted sum of the values $v_{n}$,

h_{l, m} (e) = \sum_{n = 1}^{T} a_{m, n} v_{n},

(24)

and the weight $a_{m,n}$ of each value reflects the degree of matching between the query $q_{m}$ and the key $k_{n}$, expressed through the scalar product $〈 q_{m}, k_{n} 〉$ as

a_{m,n} = f_{softmax} (〈 q_{m}, k_{1} 〉, \dots, 〈 q_{m}, k_{T} 〉) = \frac{e^{〈 q_{m}, k_{n} 〉}}{\sum_{i = 1}^{T} e^{〈 q_{m}, k_{i} 〉}} .

(25)

The scalar product between the query and key sequences is how information is transferred between tokens at different positions in the LLM. As can be seen in (22), the maps $f_{q}$, $f_{k}$ and $f_{v}$ must incorporate the position information of the m-th and n-th tokens in the sequence, so that attention scores increase for keys $k_{n}$ situated closer to the m-th query position $q_{m}$. Therefore, the scalar product in the attention score calculation must explicitly encode the position information

〈 q_{m}, k_{n} 〉 = g (q_{m}, k_{n}, m - n)

(26)

A distinguishing characteristic of Llama models [53] is the application of a rotation matrix transformation $R_{Θ, m}^{d}$ to the embedding vectors to reflect positional information [54] as

{\tilde{e}}_{m} = R_{Θ, m}^{d} e_{m},

(27)

where the d-dimensional vector space is decomposed into a direct sum of $d / 2$ two-dimensional subspaces and a two-dimensional rotation is applied in each of them

R_{Θ, m}^{d} = diag \left( \begin{pmatrix} \cos m θ_{1} & - \sin m θ_{1} \\ \sin m θ_{1} & \cos m θ_{1} \end{pmatrix}, \dots, \begin{pmatrix} \cos m θ_{d / 2} & - \sin m θ_{d / 2} \\ \sin m θ_{d / 2} & \cos m θ_{d / 2} \end{pmatrix} \right),

(28)

where $Θ = \{θ_{i} = 10000^{- 2 (i - 1) / d}, i = 1, \dots, d / 2\}$. Therefore, the components of the input embedding vector are rotated by angles proportional to their position in the sequence, at a rate that depends on their dimension. The transformation functions of a hidden layer become

f_{q} (e_{m}, m) = R_{Θ, m}^{d} W_{q} e_{m}, \quad f_{k} (e_{n}, n) = R_{Θ, n}^{d} W_{k} e_{n}, \quad f_{v} (e_{n}, n) = R_{Θ, n}^{d} W_{v} e_{n}

(29)

and the scalar product function is

〈 q_{m}, k_{n} 〉 = \frac{{(R_{Θ, m}^{d} W_{q} e_{m})}^{T} (R_{Θ, n}^{d} W_{k} e_{n})}{\sqrt{d}} = \frac{e_{m}^{T} W_{q}^{T} R_{Θ, n - m}^{d} W_{k} e_{n}}{\sqrt{d}}

(30)

with guaranteed relative position sensitivity. In practical implementations of Llama models, where multiple hidden layers $h_{l,m}$ are stacked together, the rotational encoding is applied before the first hidden layer and is followed by a normalization layer [55], which is related to training performance. Another feature of LLMs is multi-head attention, where the input embedding space is decomposed into a few lower-dimensional components and attention weights are applied to each component separately, followed by concatenation of the results.
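The relative-position property of the rotary encoding can be checked numerically. The sketch below is a toy illustration, not the Llama implementation: the helper names `rope` and `dot` are hypothetical, the projection matrices $W_{q}$ and $W_{k}$ are taken as identity for brevity, and the scaling by $\sqrt{d}$ is omitted.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of vec by pos * theta_i, as in (28)."""
    d = len(vec)
    out = []
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)  # theta_{i+1} in the indexing of the text
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = vec[2 * i], vec[2 * i + 1]
        out += [c * a - s * b, s * a + c * b]
    return out


def dot(a, b):
    """Plain scalar product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 2.0]
    k = [1.0, 0.4, -0.7, 0.1]
    # Both position pairs have the same offset, so the scores coincide.
    print(dot(rope(q, 3), rope(k, 5)))
    print(dot(rope(q, 10), rope(k, 12)))
```

Because every block is a plane rotation, the score depends on the two positions only through their difference, and the norm of each vector is preserved.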

In our application we consider the causal language modelling framework, where every processed token from the input window is allowed to attend only to the preceding ones through the application of a causal mask over the attention weights. The mask is an upper triangular matrix with entries

u_{i, j} = \begin{cases} 0, & i \geq j \\ - \infty, & i < j \end{cases}

(31)
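The weighted sum (24), the softmax weights (25) and the causal mask (31) fit together as in the following sketch. It is a single-head toy example in plain Python under stated assumptions: the function name `causal_attention` is hypothetical, the queries, keys and values are passed in directly rather than computed from embeddings, and no scaling or positional encoding is applied.

```python
import math


def causal_attention(q, k, v):
    """Single-head attention with a causal mask.

    q, k, v: lists of T vectors. Returns (outputs, weights), where
    weights[m][n] is a_{m,n} from (25) after adding the mask (31).
    """
    t_len = len(q)
    outputs, weights = [], []
    for m in range(t_len):
        scores = []
        for n in range(t_len):
            mask = 0.0 if m >= n else -math.inf  # future positions are blocked
            scores.append(sum(a * b for a, b in zip(q[m], k[n])) + mask)
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        a_m = [e / z for e in exps]
        weights.append(a_m)
        outputs.append([sum(a_m[n] * v[n][j] for n in range(t_len))
                        for j in range(len(v[0]))])  # weighted sum (24)
    return outputs, weights


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    out, w = causal_attention(q, k, v)
    for row in w:
        print([round(x, 3) for x in row])
```

The first token can attend only to itself, so its output equals its own value vector; later rows spread their weight over all earlier positions.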

The final layer of the transformer is a projection layer $L_{tok} : R^{d} \to R^{N_{tok}}$ mapping the embedding space into token scores. Composing $L_{tok}$ with the transformations in the embedding space performed by each hidden layer $h_{l} : R^{d} \to R^{d}$ over the positionally encoded input sequence $\tilde{e}$ gives

y = (L_{t o k} \circ h_{n_{l}} \circ h_{n_{l} - 1} \circ \dots \circ h_{1}) (\tilde{e}),

(32)

where

y = (y_{1}, \dots, y_{T}) \in R^{N_{tok} \times T}

(33)

is the output sequence of scores over the model vocabulary. Each output vector $y_{i} \in R^{N_{tok}}$ of the output sequence y contains class scores $y_{i,j}$ over the model vocabulary of $N_{tok}$ tokens, which allows classification, for example by

y_{i, tok} = {argmax}_{j} (y_{i, j}) \in [1, N_{tok}],

(34)

which returns the index of the token where output scores are maximized.

LLMs are trained over large datasets of plain text, programming code, and conversational data by maximizing the probability $p (w_{T + 1} | w_{1 : T})$ of correctly predicting the next token in a sequence given the window of previous tokens, which corresponds to maximizing the score $y_{T, w_{T + 1}}$. In practice this is done by minimizing the cross-entropy loss function

l (y_{i}, w_{i + 1}) = - \log \frac{e^{y_{i, w_{i + 1}}}}{\sum_{j = 1}^{N_{tok}} e^{y_{i, j}}}

(35)

between the output scores $y_{i}$ and the target token index $w_{i + 1}$, obtained from a version of the input sequence $w_{tok}$ shifted one step ahead.
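The loss (35) and the one-step shift between scores and targets can be written out as follows. This is a minimal sketch in plain Python with hypothetical function names; training frameworks provide batched, differentiable versions of the same computation.

```python
import math


def cross_entropy(logits, target):
    """Loss (35) for one position: negative log-softmax score of the target."""
    top = max(logits)  # log-sum-exp computed stably
    log_z = top + math.log(sum(math.exp(y - top) for y in logits))
    return log_z - logits[target]


def next_token_loss(logit_seq, tokens):
    """Mean loss over a sequence, with targets shifted one step ahead.

    logit_seq[i] holds the scores y_i produced at position i and is compared
    with the following token tokens[i + 1]; the last position has no target.
    """
    pairs = list(zip(logit_seq[:-1], tokens[1:]))
    return sum(cross_entropy(y, w) for y, w in pairs) / len(pairs)


if __name__ == "__main__":
    tokens = [2, 0, 1]  # toy sequence over a vocabulary of 3 tokens
    logits = [[0.1, 2.0, -1.0], [0.0, 0.5, 3.0], [1.0, 1.0, 1.0]]
    print(next_token_loss(logits, tokens))
```

For uniform scores the loss equals the logarithm of the vocabulary size, which is the usual reference value for an untrained model.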

In generation mode, the LLM decoder operates in an autoregressive manner. After initialization with a starting sequence $w_{tok}$, at each generation step the predicted token ${\hat{w}}_{T + 1} = y_{T, tok}$ is appended to the input sequence. The process is repeated either for a predefined maximal number of steps or until a special end-of-string token is produced. Therefore, in the following analysis we primarily consider the last output of the sequence, $y_{T} = y_{- 1} \in R^{N_{tok}}$, which can be expressed as

y_{- 1} = F_{LLM} (e), \quad F_{LLM} : R^{d \times T} \to R^{N_{tok}}

(36)

3. Proposed LLM-Fuzzy Framework

The key problem in using an LLM directly in a control feedback loop is the mismatch of signal domains. While the control plant operates as a functional mapping between signal spaces

G = F_{Hov} (u), \quad u : R \to R, \quad G : R \to R,

(37)

the LLM operates as a functional mapping over a high-dimensional space such as $R^{d \times T}$ of word embedding sequences of length T, as in (36). We are looking for a natural connection between physical signals and the high-dimensional LLM-specific embeddings.

A trivial approach would be to convert the numerical signal level into a string of tokens recognized by the model, which for $G (t)$ would look like

w_{G} (t) = (w_{d_{1}}, w_{d_{2}}, w_{d_{3}}, \dots, w_{d_{p}}, w_{d_{- 1}}, w_{d_{- 2}}, \dots, w_{d_{- q}}),

(38)

where

G (t) \approx \overline{d_{1} d_{2} \dots d_{p} . d_{- 1} d_{- 2} \dots d_{- q}}

(39)

is the decimal representation of the signal level $G (t)$ using p digits for the integer part and q digits for the fractional part, and $w_{d_{i}}$ is the LLM token corresponding to the digit $d_{i}$. The token sequence $w_{G} (t)$, after conversion to a sequence of embedding vectors $e_{G, i} (t)$, can be fed into the LLM prompt template. However, LLMs have a known limitation in performing precise mathematical calculations. Processing the tokenized decimal representation $w_{G} (t)$ through the model treats the represented quantity $G (t)$ no differently from any other character sequence and calculates attention scores $a_{m, n}$ to perform a weighted average of the value sequence $v_{n}$. This is not conventional algebraic processing of $G (t)$ but rather a correlation analysis on a string of digits. So if we want the model to calculate the control action

u (t) = \overline{d_{u, 1} \dots d_{u, p} . d_{u, - 1} \dots d_{u, - q}}

(40)

it will be generated digit by digit in a probabilistic manner by sampling from the distribution

p (d_{u, i} | d_{u, i - 1} \dots, d_{1}, \dots, d_{p}, d_{- 1}, \dots, d_{- q}) .

(41)
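To see how many tokens the decimal representation (38) and (39) consumes, the following sketch splits a reading into its digit symbols. The function name `decimal_digits` and the choice of one token per digit are assumptions for illustration; a real tokenizer may merge or split digits differently.

```python
def decimal_digits(value, p=3, q=2):
    """Digits d_1..d_p and d_-1..d_-q of a non-negative value, as in (39).

    Returns (integer_digits, fractional_digits) as lists of characters.
    """
    scaled = round(value * 10 ** q)  # fixed-point value with q decimals
    if not 0 <= scaled < 10 ** (p + q):
        raise ValueError("value does not fit into p integer and q fractional digits")
    text = str(scaled).zfill(p + q)
    return list(text[:p]), list(text[p:])


if __name__ == "__main__":
    integer_part, fractional_part = decimal_digits(5.83, p=2, q=2)
    print(integer_part, fractional_part)
    print("tokens per reading:", len(integer_part) + len(fractional_part))
```

Every reading costs p + q digit tokens in this scheme, and each digit of the control action has to be sampled separately as in (41).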

Alternatively, in this study we propose that token meaning is primarily encoded in the direction of the embedding vector, obtained as

{\hat{e}}_{i} = e_{i} / ∥ e_{i} ∥,

(42)

where $∥ e_{i} ∥$ is the 2-norm of the embedding vector. Therefore, embedding vectors $k_{1} {\hat{e}}_{i}$ and $k_{2} {\hat{e}}_{i}$, which point in the same direction but have different lengths, still identify the same token from the input vocabulary in the maximum likelihood sense. Formally, for every embedding vector $e_{i} \in R^{d}$ there exists $ϵ (e_{i}) > 0$ such that for all $0 \leq δ < ϵ (e_{i})$ the minimum

\min_{j} ∥ (1 - δ) e_{i} - e_{j} ∥, \quad j \in 0 \dots N_{tok}

(43)

is attained for $j = i$, i.e. no other token embedding vector is closer to the rescaled vector $(1 - δ) e_{i}$. Therefore, in such a framework, the instantaneous amplitude of an observable signal $G (t)$ can be encoded by proportionally scaling the embedding $e_{i}$ of a selected token from the vocabulary as

e_{G} (t) = \left( 1 - \frac{G_{max} - G (t)}{G_{max} - G_{min}} ϵ (e_{i}) \right) e_{i} .

(44)

As can be seen, this representation requires a single carrier token

e_{i}

to represent the instantaneous level of a signal, compared to the multi-token decimal representation (38). Such an encoding introduces a perturbation bounded by

ϵ (e_{i})

in the pre-learned embeddings, so fine-tuning of the modified model is required.
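
The scaling rule of Eq. (44) can be illustrated with a minimal sketch. The three-token vocabulary, the bound eps and the glucose range below are made-up values chosen for illustration only; they are not taken from the model or from the paper.

```python
import math

def scale_factor(G, G_min, G_max, eps):
    """Scale of Eq. (44): equals 1 at G = G_max and 1 - eps at G = G_min."""
    return 1.0 - (G_max - G) / (G_max - G_min) * eps

def nearest_token(vec, vocab):
    """Index of the vocabulary embedding closest to vec in the 2-norm."""
    dists = [math.dist(vec, e) for e in vocab]
    return dists.index(min(dists))

# Toy vocabulary in R^2 (hypothetical values, not real LLM embeddings).
vocab = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
carrier = 0                   # index of the carrier token e_i
eps = 0.2                     # assumed perturbation bound eps(e_i)
G_min, G_max = 40.0, 400.0    # assumed glucose range in mg/dL

for G in (40.0, 105.0, 400.0):
    k = scale_factor(G, G_min, G_max, eps)
    e_G = tuple(k * c for c in vocab[carrier])
    # The rescaled vector should still decode to the carrier token, Eq. (43).
    assert nearest_token(e_G, vocab) == carrier
    print(G, round(k, 4))
```

In this toy setting the rescaled carrier stays closest to its own vocabulary entry over the whole glucose range, which is the property that Eq. (43) requires.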

To generalize further, let

e_{i_{1}}, e_{i_{2}}, \dots, e_{i_{n}} \in R^{d}

be selected embedding vectors. Then there exist corresponding perturbation bounds

ϵ (e_{i_{k}}) > 0

such that for any

δ_{k} < ϵ (e_{i_{k}})

the linear combination of these vectors with weights

1 - δ_{k}

, when compared to the embedding vectors of the learned vocabulary through

min_{j} ∥\sum_{k = 1}^{n} (1 - δ_{k}) e_{i_{k}} - e_{j}∥, j \in 0 \dots N_{t o k}

(45)

attains its minimum for

j \in {i_{1}, i_{2}, \dots, i_{n}}

. Therefore, the linear combination of embedding vectors remains closer to one of its constituent vectors than to any of the other learned embeddings.

The question remains of how to select a carrier token

w_{c t, G}

for a given measurable quantity. A natural choice is to use the name (e.g. blood glucose) or the physical unit (e.g. mg/dL) of the quantity as the carrier token. The benefit of this choice is that it conditions the probability distribution pre-learned by the model,

p (w_{T + 1} | w_{1 : T}, w_{c t, G})

, on application-specific knowledge from the physical domain to which the quantity belongs. The expected outcome is increased scores for output tokens from the same physical domain. If

w_{c t, u}

is the carrier token of the LLM generated output signal

u (t)

(e.g. insulin dose) then we assume

p (w_{c t, u} | w_{1 : T}, w_{c t, G}) > p (w_{c t, u} | w_{1 : T}),

(46)

meaning that output scores of

w_{c t, u}

will increase if we use a contextual input carrier token. Of course, such a correlation is not guaranteed, but it can be checked in initial experiments with the model when selecting input and output carrier tokens. Higher initial scores of the output token accelerate the subsequent fine-tuning of the model, since fewer training steps are needed in the direction of that token.

3.1. Fuzzy Embeddings

Fuzzy logic is a well-founded control framework that encapsulates expert knowledge expressed in relative linguistic terms and yields non-linear controllers. In the proposed Fuzzy-LLM framework, the linguistic terms describing the fuzzy sets allow a fine-grained selection of multiple meaningful carrier tokens and also give a natural rule for scaling the corresponding embedding vectors: by the calculated membership values rather than by absolute physical values.

The first step is to encode fuzzy linguistic terms

{l_{1}, l_{2}, \dots, l_{n}}

for the input and output system variables into carrier token sequences recognizable by the LLM as

w_{t o k, l_{i}} = (w_{1}, w_{2}, \dots, w_{T_{l_{i}}})

. The selected linguistic terms and their carrier encodings for the fuzzy sets described in Section 2.2 are given in Table 2. Note that some of the terms are encoded with a single token, while others need 2, 4, or 5 tokens, depending on the model vocabulary. Modifying or extending the model vocabulary is also possible but requires a dedicated fine-tuning process.

The next step in preparing the model input is the calculation of the embedding vectors for each token

e_{l_{i}} = (e_{1}, e_{2}, \dots, e_{T_{l_{i}}}) \in R^{d \times T_{l_{i}}}

. However, in the proposed approach the sequences of embedding vectors corresponding to each input fuzzy set are scaled by the corresponding membership functions to produce

e_{g l u} (G) = μ_{h y p o} (G) e_{h y p o} + μ_{t a r g e t} (G) e_{t a r g e t} + μ_{h y p e r} (G) e_{h y p e r} = E_{c, g l u} {\vec{μ}}_{g} (G),

(47)

where

E_{c, g l u} = (e_{h y p o}, e_{t a r g e t}, e_{h y p e r}) \in R^{d \times 3}

is the matrix of carrier token embeddings used to capture the glucose level and

{\vec{μ}}_{g} = {(μ_{h y p o}, μ_{t a r g e t}, μ_{h y p e r})}^{T}

. The embedding sequence

e_{g l u} (G) \in R^{d \times 5}

is a fuzzified representation of the numerical glucose concentration in the model embedding space: a weighted sum of the embedding vectors of the input terms, where the weights are obtained from the fuzzy set membership functions.
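
A minimal sketch of Eq. (47) follows. The trapezoidal membership functions and their breakpoints are assumptions made for illustration (the sets actually used are those of Section 2.2), and each term is represented by a single made-up vector in R^4 instead of a padded multi-token sequence.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def memberships(G):
    """Memberships (hypo, target, hyper) with hypothetical breakpoints in mg/dL."""
    return [
        trapezoid(G, -1.0, 0.0, 70.0, 90.0),         # hypo (left shoulder)
        trapezoid(G, 70.0, 90.0, 140.0, 180.0),      # target
        trapezoid(G, 140.0, 180.0, 1000.0, 1001.0),  # hyper (right shoulder)
    ]

def fuzzy_embedding(G, E):
    """Eq. (47): sum over terms j of mu_j(G) * e_j, with E a list of carrier vectors."""
    mu = memberships(G)
    d = len(E[0])
    return [sum(mu[j] * E[j][i] for j in range(len(E))) for i in range(d)]

# Toy carrier embeddings, one vector per linguistic term (made-up values).
E_glu = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]

# A reading halfway between the assumed "hypo" and "target" sets.
print(fuzzy_embedding(80.0, E_glu))
```

With these assumed sets, a glucose value on the boundary between two terms yields an embedding that blends the two carrier vectors in proportion to their membership values.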

Similarly, the output insulin dose can also be represented as a weighted sum in the input embedding space

e_{i n s} (u) = μ_{z} (u) e_{z e r o} + μ_{l} (u) e_{l o w} + μ_{h} (u) e_{h i g h} = E_{c, i n s} {\vec{μ}}_{i} (u),

(48)

where

E_{c, i n s} = (e_{z e r o}, e_{l o w}, e_{h i g h}) \in R^{d \times 3}

is the matrix of carrier token embeddings used to capture the insulin dose and

{\vec{μ}}_{i} = {(μ_{z}, μ_{l}, μ_{h})}^{T}

. However, because each insulin dose term is encoded with a single token, we have

e_{i n s} (u) \in R^{d}

.

3.2. LLM for Fuzzy Inference

Let us start by defining the input sequence

w_{t o k}

Listing 1. Tiny Llama prompt for fuzzy decision making.

In this prompt we have placeholders <Gi> and <Ui> for the past 10 measured glucose concentrations and the past applied insulin dosages, and <U> for the final dose, which will be predicted by the model. These placeholders are not actually filled in textual form, because the proposed fuzzy embedding scheme offers no reverse textual mapping and works directly in embedding space. Therefore, we map the sections of the prompt in Listing 1 into embedding space, where the placeholders <Gi> and <Ui> are filled by concatenation with the fuzzy embeddings of the past glucose and insulin dose values.

e_{i n} (\vec{G}, \vec{u}) = (e_{p r e, 1}, e_{g l u, 10} (\vec{G}), e_{p r e, 2}, e_{i n s, 10} (\vec{u}), e_{p o s t}) \in R^{d \times T},

(49)

where

e_{p r e, 1}

are the prompt embeddings from the beginning of the prompt up to the placeholder for <G10>,

\vec{G}

and

\vec{u}

are the vectors of the past 10 glucose measurements and insulin dosages. The glucose sequence is

e_{g l u, 10} (\vec{G}) = (e_{g l u} (G (t - 9 T_{S})), e_{g l u} (G (t - 8 T_{S})), \dots, e_{g l u} (G (t))),

(50)

where

T_{S}

is the sampling period of the AP controller. The

e_{p r e, 2}

are the prompt embeddings after the placeholder for <G1> up to the placeholder for <U10>, the insulin sequence is

e_{i n s, 10} (\vec{u}) = (e_{i n s} (u (t - 10 T_{S})), e_{i n s} (u (t - 9 T_{S})), \dots, e_{i n s} (u (t - T_{S}))),

(51)

and

e_{p o s t}

are the prompt embeddings after the placeholder for <U1> up to the placeholder for <U>. The overall prompt can be expressed as the composition of a fixed and a variable part as

e_{i n} = P_{g l u} e_{g l u, 10} + P_{i n s} e_{i n s, 10} + e_{f i x},

(52)

where

P_{g l u}, P_{i n s} : R^{d \times 10} \to R^{d \times T}

are linear projection operators with appropriate tensor representations.
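
The equivalence between the concatenated form (49) and the additive form (52) can be checked on a toy example. The sketch below assumes one token per fixed prompt segment and one embedding per history sample, with made-up values in R^2; `place` is a hypothetical helper that plays the role of the projection operators.

```python
def assemble(pre1, glu, pre2, ins, post):
    """Eq. (49): concatenate fixed prompt segments with the fuzzy embeddings."""
    return pre1 + glu + pre2 + ins + post

def place(segment, offset, total_len, d):
    """Projection in the sense of Eq. (52): put a segment at `offset` in a zero sequence."""
    out = [[0.0] * d for _ in range(total_len)]
    for i, vec in enumerate(segment):
        out[offset + i] = list(vec)
    return out

def add_sequences(*seqs):
    """Element-wise sum of equally long sequences of vectors."""
    return [[sum(c) for c in zip(*vectors)] for vectors in zip(*seqs)]

d = 2
pre1 = [[1.0, 1.0]]                         # fixed text before the glucose history
glu = [[0.1 * i, 0.0] for i in range(10)]   # 10 fuzzy glucose embeddings
pre2 = [[2.0, 2.0]]                         # fixed text between the two histories
ins = [[0.0, 0.05 * i] for i in range(10)]  # 10 fuzzy insulin embeddings
post = [[3.0, 3.0]]                         # fixed text before the <U> slot

e_in = assemble(pre1, glu, pre2, ins, post)
T = len(e_in)

# Offsets of each segment inside the full sequence.
o_glu = len(pre1)
o_pre2 = o_glu + len(glu)
o_ins = o_pre2 + len(pre2)
o_post = o_ins + len(ins)

e_fix = add_sequences(place(pre1, 0, T, d), place(pre2, o_pre2, T, d),
                      place(post, o_post, T, d))
e_sum = add_sequences(place(glu, o_glu, T, d), place(ins, o_ins, T, d), e_fix)
assert e_sum == e_in   # both forms give the same input sequence
```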

The output of the model after processing the input embedding sequence

e_{i n} (\vec{G}, \vec{u})

is

y (\vec{G}, \vec{u}) = (L_{t o k} \circ h_{n_{l}} \circ h_{n_{l} - 1} \circ \dots \circ h_{1}) ({\tilde{e}}_{i n} (\vec{G}, \vec{u})) .

(53)

The score vector corresponding to the prediction for token <U> is

y_{- 1} (\vec{G}, \vec{u}) = F_{L L M} (e_{i n} (\vec{G}, \vec{u})) \in R^{N_{t o k}},

(54)

which is the last column from the sequence matrix

y (\vec{G}, \vec{u})

. The membership values of this prediction in the output fuzzy sets are obtained by projecting onto the carrier token space

{\bar{y}}_{- 1} = I_{u} y_{- 1}

(55)

where

I_{u} = (\begin{matrix} 0_{1, N_{z} - 1} & 1 & 0_{1, N_{t o k} - N_{z} - 1} \\ 0_{1, N_{l} - 1} & 1 & 0_{1, N_{t o k} - N_{l} - 1} \\ 0_{1, N_{h} - 1} & 1 & 0_{1, N_{t o k} - N_{h} - 1} \end{matrix}) \in R^{3 \times N_{t o k}},

(56)

where

N_{z} = 5225

,

N_{l} = 4482

and

N_{h} = 1880

are the token indices for the output fuzzy terms according to Table 2. After the softmax operation

f_{s o f t m a x}

output scores are converted to normalized values

p_{u} = f_{s o f t m a x} ({\bar{y}}_{- 1}) = \frac{1}{e^{y_{- 1, N_{z}}} + e^{y_{- 1, N_{l}}} + e^{y_{- 1, N_{h}}}} {(e^{y_{- 1, N_{z}}}, e^{y_{- 1, N_{l}}}, e^{y_{- 1, N_{h}}})}^{T} = {(p_{z}, p_{l}, p_{h})}^{T},

(57)

which can also be interpreted as the conditional probability distribution

p (| e_{z}, e_{l}, e_{h})

. The resulting output fuzzy set membership function becomes

U_{o u t} : μ_{u} (w) = μ_{z} (w) p_{z} + μ_{l} (w) p_{l} + μ_{h} (w) p_{h} = p_{u} . {\vec{μ}}_{i} (w),

(58)

which allows the computation of the new insulin dose

u (t)

with (19) and (20).
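
The chain from selected logits to a crisp dose can be sketched as follows. This is an illustration under stated assumptions: the vocabulary size, the token indices and the triangular output sets on [0, u_max] are hypothetical, and the centroid rule is the one written out in Eq. (61).

```python
import math

def softmax(scores):
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def tri(w, a, b, c):
    """Triangular membership with support (a, c) and peak at b."""
    if w <= a or w >= c:
        return 0.0
    return (w - a) / (b - a) if w <= b else (c - w) / (c - b)

def trapz(y, x):
    """Trapezoidal rule for samples y on grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))

def defuzzify(logits, term_ids, u_max=1.0, n=201):
    p = softmax([logits[i] for i in term_ids])     # projection (55) and softmax (57)
    w = [u_max * i / (n - 1) for i in range(n)]    # dose grid on [0, u_max]
    mu = [p[0] * tri(x, -0.5 * u_max, 0.0, 0.5 * u_max)     # assumed "zero" set
          + p[1] * tri(x, 0.0, 0.5 * u_max, u_max)          # assumed "low" set
          + p[2] * tri(x, 0.5 * u_max, u_max, 1.5 * u_max)  # assumed "high" set
          for x in w]                              # aggregated membership, Eq. (58)
    num = trapz([x * m for x, m in zip(w, mu)], w)
    den = trapz(mu, w)
    return num / (den + 1e-8)                      # centroid of the aggregated set

term_ids = (2, 5, 7)        # hypothetical positions of zero/low/high in a toy vocabulary
logits = [0.0] * 8          # toy score vector for the last position
print(defuzzify(logits, term_ids))

logits[7] = 10.0            # strong preference for the "high" term
print(defuzzify(logits, term_ids))
```

With equal scores the three assumed sets are weighted equally and the dose falls in the middle of the range; raising the score of the "high" token moves the centroid toward u_max.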

As can be seen, the proposed representation of numerical signal values as linear combinations of embedding vectors matches the natural internal representation of linguistic information in the language model. However, the functional relationship between the scaling of the input fuzzy embeddings and the output class scores is not established a priori and must be learned by the model through a suitable training loop.

3.3. Closed-Loop System with LLM

The components of the closed-loop system with the Fuzzy-LLM controller are summarized in Figure 3. As shown, the Hovorka metabolic model for type 1 diabetes calculates a glucose concentration signal

G (t)

as a dynamic function of the insulin dose

u (t)

and the ingested meal. Let the state of the Hovorka model be given by the state vector

x = {(Q_{1}, Q_{2}, x_{1}, x_{2}, x_{3}, S_{1}, S_{2}, I)}^{T},

(59)

then the dynamics of the metabolic model are represented in state-space form as

\begin{matrix} \dot{x} (t) = H (t, x (t), u) \\ G (t) = Q_{1} (t) / V_{G} = C_{Q} x (t), \end{matrix}

(60)

where

C_{Q} = (1 / V_{G}, 0_{1, 7})

.

The full controller expression is obtained by composing the fuzzy embedding operation

e_{g l u} (•)

, the tapped delay operator, concatenation with the fixed prompt, application of the LLM function

F_{L L M} (•)

, the token selection projection

I_{u}

, the softmax operation

f_{s o f t m a x} (•)

, output membership function generation, and defuzzification. This can be expressed in compact form as

u (t) = \frac{\int_{0}^{u_{m a x}} w p_{u} . {\vec{μ}}_{i} (w) d w}{\int_{0}^{u_{m a x}} p_{u} . {\vec{μ}}_{i} (w) d w},

(61)

p_{u} = f_{s o f t m a x} (I_{u} F_{L L M} (P_{g l u} τ_{10} (E_{c, g l u} {\vec{μ}}_{g} (G (t))) + P_{i n s} τ_{10} (E_{c, i n s} {\vec{μ}}_{i} (u (t))) + e_{f i x}, π))

(62)

where

τ_{k} (•)

is the tapped delay operator over the embedding vector space

τ_{k} (e (t)) = (e (t - k T_{S}), e (t - (k - 1) T_{S}), \dots, e (t - T_{S})),

(63)

and

π = {π_{i} | i = 1 \dots N_{p a r}}

is the vector of tunable parameters of the LoRA adapter of the LLM.
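
A tapped delay line of fixed length is simple to sketch with a bounded queue. The class name and the initial fill value below are illustrative choices, not part of the described implementation, which shifts tensors instead.

```python
from collections import deque

class TappedDelay:
    """Keeps the k most recent samples, oldest first."""

    def __init__(self, k, fill):
        self.buf = deque([fill] * k, maxlen=k)

    def push(self, value):
        # Appending to a full deque drops the oldest sample automatically.
        self.buf.append(value)
        return list(self.buf)

history = TappedDelay(k=3, fill=0.0)
for sample in (1.0, 2.0, 3.0, 4.0):
    window = history.push(sample)
print(window)   # the three most recent samples, oldest first
```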

3.4. Training Objective

It is known that optimal BG levels are around

G_{r e f} = 105

mg/dL. Therefore, we define the cost as the worst-case quadratic deviation from the target over the population at the current time instant t:

J (π) = max_{s_{i} \in S} \frac{1}{2} {(G (t + k T_{S}, π (t), s_{i}) - G_{r e f})}^{2},

(64)

where

s_{i} \in S

is a subject from the examined T1D population, characterized in general by specific initial state conditions, specific meal times and meal amounts, and specific parameter settings of the glucose metabolic model, and

k = c o n s t

is a fixed extrapolation horizon. If we aim to minimize

J (π)

using a gradient descent algorithm we calculate

π (t) = π (t - T_{S}) - α (t) \frac{\partial J}{\partial π} |_{π (t - T_{S})},

(65)

where the gradient of

J (π)

with respect to each of tunable parameters

π_{i}

is

\frac{\partial J}{\partial π_{i}} = (G_{k} (π_{i}, s^{*} (t)) - G_{r e f}) \frac{\partial G_{k}}{\partial π_{i}}, G_{k} = G (t + k T_{S}),

(66)

where

s^{*} (t) \in S

is the subject from the population for which the maximum quadratic deviation is obtained at time instant t. Therefore, the subject with the worst deviation from the target drives the fine-tuning of the LLM at the current time instant.
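
The worst-case update of Eqs. (64) to (66) can be illustrated on a deliberately simple toy problem. Here each subject is a hypothetical pair (g0, s) whose extrapolated glucose responds linearly to a single scalar parameter, G = g0 - s * pi; this stands in for the closed loop and is not the Hovorka model.

```python
G_REF = 105.0   # target glucose in mg/dL

def worst_case_step(pi, subjects, lr):
    """One gradient step on J = max_i 0.5 * (G_i(pi) - G_REF)^2."""
    devs = [(g0 - s * pi) - G_REF for g0, s in subjects]
    worst = max(range(len(subjects)), key=lambda i: devs[i] ** 2)
    loss = 0.5 * devs[worst] ** 2
    # Eq. (66): only the worst subject contributes; dG/dpi = -s for this toy model.
    grad = devs[worst] * (-subjects[worst][1])
    return pi - lr * grad, loss, worst

subjects = [(150.0, 10.0), (200.0, 20.0)]   # made-up (g0, s) pairs
pi, losses = 0.0, []
for _ in range(20):
    pi, loss, worst = worst_case_step(pi, subjects, lr=0.001)
    losses.append(loss)
print(round(pi, 3), round(losses[0], 1), round(losses[-1], 3))
```

In this toy problem the subject that drives the update changes once the initially worst subject has been brought close to the target, after which the parameter settles near the value that balances the two deviations.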

For a fixed

u (t) = u_{0}

the solution of the state equation for k steps forward in time with

T_{S}

as the sample period can be approximated with the Euler formula

x_{k} = x_{k - 1} + T_{S} H (t_{k}, x_{k - 1}, u_{k}),

(67)

where

x_{k} = x (t + k T_{S})

,

x_{k - 1} = x (t + (k - 1) T_{S})

,

t_{k} = t + k T_{S}

,

u_{k} = u (t + k T_{S})

and

t_{k - 1} = t + (k - 1) T_{S}

. The derivative of extrapolated

G_{k}

with respect to parameter

π_{i}

becomes

\frac{\partial G_{k}}{\partial π_{i}} = C_{Q} \frac{\partial x_{k}}{\partial u_{0}} \frac{\partial u_{0}}{\partial π_{i}},

(68)

exposing the sensitivity of the extrapolated glucose to the current control action and the sensitivity of the current control action to the LLM parameters.

Without loss of generality, taking

t = 0

for current time instant,

t > 0

for future (extrapolated) time instants, and assuming

u_{k} = u_{0}

for

k > 0

, the derivative of the extrapolated state k steps ahead with respect to the current input signal

u_{0}

is

\frac{\partial x_{k}}{\partial u} = (1 + T_{S} \frac{\partial H}{\partial x} |_{p_{0}}) \frac{\partial x_{k - 1}}{\partial u} |_{p_{0}} + T_{S} \frac{\partial H}{\partial u} |_{p_{0}},

(69)

where

p_{0} = (t_{k}, x_{k - 1}, u_{0})

. This recursive relation expands to

\frac{\partial x_{k}}{\partial u} |_{p_{0}} = {(1 + T_{S} \frac{\partial H}{\partial x} |_{p_{0}})}^{k} \frac{\partial x_{0}}{\partial u} |_{p_{0}} + \sum_{i = 0}^{k - 1} {(1 + T_{S} \frac{\partial H}{\partial x} |_{p_{0}})}^{i} T_{S} \frac{\partial H}{\partial u} |_{p_{0}},

(70)

but because

\partial x_{0} / \partial u_{0} = 0

this reduces to

\frac{\partial x_{k}}{\partial u} |_{p_{0}} = \sum_{i = 0}^{k - 1} {(1 + T_{S} \frac{\partial H}{\partial x} |_{p_{0}})}^{i} T_{S} \frac{\partial H}{\partial u} |_{p_{0}} .

(71)
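
The recursion (69) and its closed form (71) can be checked numerically on a scalar linear system, where the Jacobians are constants. The model x' = a x + b u and all numerical values below are hypothetical and serve only to verify the algebra against a finite difference.

```python
def simulate(x0, u, a, b, Ts, k):
    """k explicit Euler steps of x' = a*x + b*u with constant input u."""
    x = x0
    for _ in range(k):
        x = x + Ts * (a * x + b * u)
    return x

def sens_recursive(a, b, Ts, k):
    """Eq. (69) with dH/dx = a, dH/du = b and zero initial sensitivity."""
    s = 0.0
    for _ in range(k):
        s = (1.0 + Ts * a) * s + Ts * b
    return s

def sens_sum(a, b, Ts, k):
    """Closed form of Eq. (71)."""
    return sum((1.0 + Ts * a) ** i * Ts * b for i in range(k))

a, b, Ts, k = -0.1, -0.5, 1.0, 12
x0, u, h = 5.0, 1.0, 1e-3

fd = (simulate(x0, u + h, a, b, Ts, k) - simulate(x0, u, a, b, Ts, k)) / h
print(sens_recursive(a, b, Ts, k), sens_sum(a, b, Ts, k), fd)
```

For a linear system the finite difference agrees with both expressions up to rounding, since the extrapolated state is affine in the constant input.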

On the other hand, the control derivative term in (68) is

\frac{\partial u}{\partial π_{i}} = \frac{\int_{0}^{u_{m a x}} w \frac{\partial p_{u}}{\partial π_{i}} . {\vec{μ}}_{i} (w) d w \int_{0}^{u_{m a x}} p_{u} . {\vec{μ}}_{i} (w) d w - \int_{0}^{u_{m a x}} w p_{u} . {\vec{μ}}_{i} (w) d w \int_{0}^{u_{m a x}} \frac{\partial p_{u}}{\partial π_{i}} . {\vec{μ}}_{i} (w) d w}{{(\int_{0}^{u_{m a x}} p_{u} . {\vec{μ}}_{i} (w) d w)}^{2}}

(72)

where

\frac{\partial p_{u}}{\partial π_{i}} = f_{s o f t m a x}^{'} (I_{u} \frac{\partial F_{L L M}}{\partial π_{i}} |_{e_{i n} (t), π_{i} (t)}),

(73)

where

f_{s o f t m a x}^{'} (•)

is the derivative of the softmax function and

e_{i n} (t) = P_{g l u} τ_{10} (E_{c, g l u} {\vec{μ}}_{g} (G (t))) + P_{i n s} τ_{10} (E_{c, i n s} {\vec{μ}}_{i} (u (t))) + e_{f i x}

(74)

is the prompt embedding, which is a function of the historical glucose and insulin values.

Therefore, the parameters of the LLM

π_{i}

are tuned only with respect to current control signal

u (t)

to optimize its impact on the extrapolated glucose concentration for

k > 0

steps ahead of the current time t. For k we can take a longer horizon of 1 to 5 hours, which is compatible with the residual insulin action, to boost the sensitivity of

G_{k}

to

u (t)

. Otherwise, if k is small this sensitivity may vanish and compromise training convergence.
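
Why the sensitivity can vanish for a short horizon is easy to see on a toy compartment chain, in which the input reaches the output only through intermediate states. The three-state linear model below is hypothetical (it is not the Hovorka model); it applies the Euler sensitivity recursion of Eq. (69) with constant Jacobians A and B.

```python
def output_sensitivity(A, B, Ts, k, out_index):
    """Sensitivity of one state to a constant input after k Euler steps."""
    n = len(B)
    s = [0.0] * n
    for _ in range(k):
        As = [sum(A[i][j] * s[j] for j in range(n)) for i in range(n)]
        s = [s[i] + Ts * (As[i] + B[i]) for i in range(n)]
    return s[out_index]

# Chain: input -> x1 -> x2 -> x3, where x3 plays the role of the measured output.
A = [[-1.0, 0.0, 0.0],
     [1.0, -1.0, 0.0],
     [0.0, -1.0, 0.0]]
B = [1.0, 0.0, 0.0]
Ts = 0.1

for k in (1, 2, 3, 10, 50):
    print(k, output_sensitivity(A, B, Ts, k, out_index=2))
```

In this chain the output sensitivity is exactly zero for the first two Euler steps and grows in magnitude only for longer horizons, which mirrors the motivation for choosing k on the scale of the insulin action time.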

An important note on the cost

J (π)

is that it is atypical for LLM fine-tuning, which usually targets token classification and therefore employs a cross-entropy loss. In contrast, in our application we optimize closed-loop performance with respect to the signal levels inferred from the LLM. We also experimented with a hybrid loss that accounts for the suggested fuzzy rule classification scheme through an added cross-entropy term; however, the closed-loop performance was worse in that case.

4. Fine-Tuning Implementation

In this section we give the specifics of the training loop implementation. To set up the training, several components must interact. First, the Hovorka metabolic model has to be implemented in vectorized form on the GPU to allow parallel simulation of multiple virtual patients. Then the input prompt for the LLM needs to be constructed according to the proposed fuzzy embedding scheme, and an appropriate adapter needs to be initialized for LLM fine-tuning. The software components from the training phase are then reused during model inference. For inference, a server module is initialized to allow remote calculation of the insulin dosage for multiple patients, which is used for the simulation with the UVa/Padova simulator. Note: some of the squeeze and repeat operations are omitted from the provided code snippets for clarity.

4.1. Metabolic Model in GPU

In the Hovorka model, virtual subjects are parametrized by the body weight parameter

B W

, which is initialized as a torch vector. The initial state is also initialized and repeated for the number of virtual subjects, which determines the batch size. A predefined vector of three meal portions is scheduled for 8, 12 and 19 o’clock on a daily basis (Listing 2).

Listing 2. Virtual patients parametrization.

BW = torch.tensor([70., 75. ...])
state = torch.tensor(init_state).repeat(batch_size,1)
# (ingestion time in min, digested CHO in mmol)
meals = [(8*60,250), (12*60,250), (19*60,250)]

The model derivatives are calculated according to the equations presented in Section 2.1. Since the model parameters are one-dimensional torch tensors of length batch_size, the vectorized model state becomes a two-dimensional torch tensor state of dimension batch_size×Nstate. Internally, each state is assigned to a variable used to calculate the derivatives along the batch dimension. The derivatives are then stacked in a tensor along the state dimension (Listing 3).

In Listing 4 we show the Euler integration of the Hovorka model derivatives. The current state is updated with the latest calculated control signal, whereas the state extrapolation is carried out for a constant input signal. Note that the tensor state is detached from the torch autograd graph at each iteration.

Listing 3. Model derivative calculation.

def ap_model(t,state,uin):
    VG = 0.16*BW
    ...
    Q1 = state[:,0]
    Q2 = state[:,1]
    ...
    G = Q1/VG # mmol/L
    ...
    Q1dot = EGP0*(1.0-x3) + UG - FR - (x1+F01c/(VG*G))*Q1 + k12*Q2
    ...
    dstate = torch.hstack([
        Q1dot.unsqueeze(1),
        Q2dot.unsqueeze(1),...])
    return (dstate,G)

Listing 4. Hovorka model integration.

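# One Euler step of the simulated state with the latest control action
(dstate,G) = ap_model(tt,state,u_new)
state = state + dstate*Ts
# Extrapolate Textrap steps ahead with the input held constant
state_extrap = state
for k in range(0,Textrap):
    (dstate,Gk) = ap_model(tt,state_extrap,u_new)
    state_extrap = state_extrap + dstate*Ts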

4.2. LoRA Configuration

For efficient training we employ the low-rank adaptation (LoRA) framework, in which the pre-trained linear weights

W \in R^{d \times d}

of the LLM layers are not modified. Only a low dimensional correction

Δ W = B A

is applied where

B \in R^{d \times r}

,

A \in R^{r \times d}

and

r ≪ d

. This reduces the number of trainable parameters by a factor of more than 100, which has an effect on the required memory and model evaluation time. The initialization of the LoRA adapter is given in Listing 5. There we set

r = 16

and a scaling parameter of 32 for

Δ W

, introduce 10% random dropout during training, and apply the adapter to the query and value matrices of the self-attention layers. Using these settings we reduce the trainable parameters from 1,102,301,184 to 2,252,800.

Listing 5. LLM Initialization with LoRA Adapter.

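from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

llm_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
llm = AutoModelForCausalLM.from_pretrained(llm_name,device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(llm_name)
# Rank-16 adapter; the remaining arguments are elided in the original listing
lora_config = LoraConfig(r=16,lora_alpha=32,...)
llm = get_peft_model(llm, lora_config)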
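
The quoted number of trainable parameters can be reproduced with a short calculation. The layer dimensions below are assumptions taken from the publicly available TinyLlama-1.1B configuration (hidden size 2048, 22 decoder layers, grouped-query attention with 4 key/value heads of dimension 64); they are not stated in this paper.

```python
def lora_params(d_in, d_out, r):
    """Parameters added by one LoRA pair: A is r x d_in and B is d_out x r."""
    return r * (d_in + d_out)

hidden = 2048    # assumed TinyLlama hidden size
kv_dim = 4 * 64  # assumed value projection width (4 KV heads x 64)
layers = 22      # assumed number of decoder layers
r = 16           # LoRA rank used in Listing 5

per_layer = lora_params(hidden, hidden, r) + lora_params(hidden, kv_dim, r)
total = layers * per_layer
print(per_layer, total)
```

Under these assumptions the query and value adapters contribute 102,400 parameters per layer, and the total over all layers matches the 2,252,800 trainable parameters reported above, which is about 0.2% of the 1,102,301,184 parameters of the adapted model.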

4.3. Fuzzy Embeddings

Listing 6 shows how we generate the input embedding vector for a given glucose measurement. After the fuzzy variables are defined in the input_terms array, their indices in the LLM vocabulary are obtained with the tokenizer. Since in our case several tokens may correspond to one fuzzy variable, padding is added to align the lengths of the representations. Then, using the embed_tokens layer of the LLM, the embedding vectors of the selected vocabulary tokens are obtained in input_terms_vec0. Finally, the fuzzy membership functions are applied to the actual glucose measurements contained in the yt tensor, multiplied by the embedding vectors, and aggregated into a single embedding.

Listing 6. LLM embedding of glucose concentration.

input_terms = ['hypoglycemia','in range','hyperglycemia']
input_terms_tok = tokenizer(input_terms,padding=True,...)...
input_terms_vec0 = llm.model.embed_tokens(input_terms_tok)
# Membership-weighted sum of the term embeddings, Eq. (47);
# the parentheses keep the three terms in one expression
input_terms_vec = (hypo_glucose(yt)*input_terms_vec0[0,:,:]
    + target_glucose(yt)*input_terms_vec0[1,:,:]
    + hyper_glucose(yt)*input_terms_vec0[2,:,:])

4.4. Inference and Defuzzification

The input prompt combined_embeds to the LLM is a concatenation of parts, some of which are fixed pre-generated embeddings such as vect_msg1, while others contain the fuzzy representation of the signals in embedding space. The last column of the model output is extracted; its components are the next-token prediction scores. The PyTorch gather function extracts only the scores matching the token indices of the output fuzzy variables in the LLM vocabulary. The extracted scores are normalized with softmax and then multiplied by the discretized output membership function values in mf_vals through the PyTorch einsum function. Defuzzification is performed by weighted integration with the PyTorch trapz function.

Listing 7. Calculating new control action.

combined_embeds = torch.cat([vect_msg1,input_terms_vec,...],...
outputs = llm(inputs_embeds=combined_embeds,use_cache=False)
output_logits = outputs.logits[:,-1]
zero_dose_p = output_logits.gather(1,output_terms_tok[0,0])
low_dose_p = output_logits.gather(1,output_terms_tok[1,0])
high_dose_p = output_logits.gather(1,output_terms_tok[2,0])
mf_block_norm = softmax(torch.cat([zero_dose_p,low_dose_p,high_dose_p],...
agg_mu = torch.einsum('br,rx->bx', mf_block_norm, mf_vals)
u_new = BW*(torch.trapz(agg_mu * x_vals.unsqueeze(0), x_vals, dim=1)
/ (torch.trapz(agg_mu, x_vals, dim=1) + 1e-8))

4.5. Training Loop

A custom training loop is implemented in PyTorch (Listing 8) using the AdamW optimizer, a variant of stochastic gradient optimization [56]. The learning rate is set to

2 \times 10^{- 5}

. The loss function is computed at every training step as the quadratic deviation of the extrapolated glucose concentration from the target glucose level of 105 mg/dL for a given insulin dosage. During training, multiple virtual patients are propagated through the closed-loop system in parallel, and the maximal quadratic deviation over the population is taken as the loss. The parameter gradients are therefore obtained for the subject with the maximal deviation. The tensors yt and ut contain the historical values of glucose and injected insulin. They are updated at each training step by shifting the previous values left and concatenating the current values, i.e. implementing a tapped delay line. The historical value tensors are detached from the automatic gradient computation graph at the beginning of every iteration.

Listing 8. Training Loop.

for t in range(0,num_training_steps):
    ...
    optimizer.zero_grad()
    # Target of 105 mg/dL converted to mmol/L (divide by 18); max picks the worst subject
    total_loss = ((Gk - 105/18)*(Gk - 105/18)).max()
    total_loss.backward()
    optimizer.step()
    yt = torch.hstack([yt[:,input_toks_per_term:],G])
    ut = torch.hstack([ut[:,output_toks_per_term:],u_per_kg])

4.6. Deployment

Running a simulation even with a small LLM such as TinyLlama requires significant computational resources because of the large number of parameters. Therefore, a feasible setup is to install the Fuzzy-LLM controller on a computation server with GPU capability. For our experiments we set up the model on an NVIDIA RTX A6000 GPU with 48 GB of video RAM, although a lower-grade GPU would also be suitable for this specific model. The architecture in Figure 4 can handle simulations of multiple virtual patients by independently keeping the controller state of each one: the previous glucose values and insulin dosages for the tapped delay line, and patient-specific parameters such as identification and body weight. The setup allows parallel evaluation of multiple virtual patients by batching them together. An HTTP REST API endpoint is the interface to the centralized controller. In the UVa/Padova simulation, a stub controller is implemented as an Interpreted MATLAB Function whose only purpose is to collect the current input signals and send them to the server in blocking mode, waiting for the response. The response contains the suggested insulin dosage.
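
The server side of such a setup can be sketched with the Python standard library. Everything below is an assumption made for illustration: the JSON field names, the port, and in particular `compute_dose`, which is a placeholder stub standing in for the Fuzzy-LLM forward pass and always returns zero.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

HISTORY = 10    # length of the tapped delay line
patients = {}   # patient id -> body weight, glucose history, dose history

def compute_dose(state):
    """Placeholder for the Fuzzy-LLM forward pass (hypothetical stub)."""
    return 0.0

def handle_measurement(payload):
    """Update the per-patient controller state and return the suggested dose."""
    pid = payload["patient_id"]
    st = patients.setdefault(pid, {"bw": payload["body_weight"], "G": [], "u": []})
    st["G"] = (st["G"] + [payload["glucose"]])[-HISTORY:]
    dose = compute_dose(st)
    st["u"] = (st["u"] + [dose])[-HISTORY:]
    return {"patient_id": pid, "insulin_dose": dose}

class DoseHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(handle_measurement(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Blocking server loop; call explicitly to start the endpoint."""
    HTTPServer(("", port), DoseHandler).serve_forever()
```

Keeping the state update in a plain function separates the per-patient bookkeeping from the transport layer, so a batched model call could later replace the stub without changing the client in the simulator.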

5. Results

5.1. Fine-Tuning Performance

Figure 5 shows blood glucose trajectories for multiple virtual subjects over 48 hours during training, with the curves tightly clustered around the target region near 100 mg/dL for much of the simulation. In the initial 15-20 hours of training we see postprandial hyperglycemic peaks above 180 mg/dL after meal events, as well as downward dips toward the hypoglycemic threshold of 70 mg/dL. The transient spikes after meals become less disruptive with training, even though they are not fully eliminated. During later training phases the blood glucose looks smoother around the 90-110 mg/dL region, with fewer deep drops and less persistent overshoot after peaks. Therefore, the fine-tuning improves both the robustness and the timing of insulin dosing. Insulin is delivered earlier or more appropriately relative to glucose rises, so the closed loop compensates faster without overcorrecting.

Despite subject-to-subject variability, the trajectories remain bounded and eventually exhibit a recurrent pattern around the target range, which suggests the controller is learning a reasonable closed-loop policy. As training progresses, the glucose trajectories of the virtual subjects become more tightly organized around the target region, with the population mean settling near the desired range and the spread across subjects gradually shrinking. Early in training, the curves show larger oscillations and taller post-meal excursions, including some pronounced hyperglycemic peaks. Later in training, the responses remain dynamic around meals, but the baseline is better regulated and the inter-subject variability is visibly reduced.

The insulin dosage plot (Figure 6) shows a clear progression with training toward more structured and repeatable control actions. During early training phases, doses vary more smoothly and stay relatively modest, but later the controller produces sharper, better-timed micro-boluses that align with glucose disturbances. This suggests the model is learning when to hold back insulin during stable periods and when to intervene decisively after meal-related rises or persistent hyperglycemia. The learned policy remains individualized across virtual subjects while still converging to a common dosing pattern. The peaks become more pronounced in the later segments, but they are not random spikes; they appear at consistent times and with similar magnitudes across the batch, which indicates the controller is capturing the recurring meal structure in the simulation.

The training loss curve (Figure 7) shows a decreasing trend over time, which indicates that the controller is progressively learning a better mapping from the fuzzified glucose history to the insulin action. The large spikes at the beginning are consistent with the model still exploring poor dosing decisions and encountering large glucose deviations, so the objective is temporarily high. As training continues, the baseline loss drops close to zero for long intervals, suggesting that the learned policy is increasingly able to keep the predicted glucose near the target level. The remaining isolated spikes are also informative: they likely correspond to harder scenarios, such as meal-induced disturbances or subjects with stronger variability, where the controller must react more aggressively. Importantly, these spikes become less frequent and the low-loss segments become longer, which implies improved stability and better generalization across virtual subjects.

Figure 8 should be interpreted as a monitoring signal rather than an optimization objective. The cross-entropy here measures how well the current input-side glucose membership terms, such as hypo, target, and hyper, align with the target insulin classes zero, low, and high. Because it is not directly minimized, the curve does not need to decrease monotonically, and the visible fluctuations are expected. The overall level stays in a fairly narrow band for much of training, which suggests that the linguistic mapping between glucose patterns and insulin categories remains reasonably stable. The spikes indicate moments where the current fuzzy representation is less consistent with the target insulin label, likely due to harder meal-driven transitions or borderline glucose states. In other words, these peaks show mismatch or ambiguity in the class correspondence, indicating that a trivial dosing strategy (high insulin when glucose is high) would not be sufficient for good performance under the quadratic cost. Much of the trace remains centered around a moderate entropy level. That means the fuzzy tokenization still carries useful information for classification, even as the controller is being trained by the separate closed-loop loss.

Figure 9 shows how the insulin-dose membership levels evolve during training for the three output classes: zero dose, low dose, and high dose. The high-dose membership remains dominant for long intervals, especially after the first several hours, which indicates that the controller often interprets the current glucose context as requiring an active corrective response. At the same time, the zero-dose curve is strongest mainly in the early part of the horizon and then tends to remain near zero except for brief impulses, suggesting that the policy becomes less conservative once the training loop learns the recurrent hyperglycemic patterns.

The low-dose membership acts as an intermediate channel and stays active more continuously than the zero-dose class, but with smaller amplitude than the high-dose class. This is a useful sign because it means the model is not collapsing into a purely binary strategy; instead, it preserves graded control behavior where modest corrections are still available when the glucose state is near the transition region. The short upward excursions in the low- and zero-dose curves likely correspond to ambiguous or boundary cases where the fuzzy representation allows multiple insulin actions to remain partially plausible.

Another clear trend is that the high-dose membership becomes more stable and repeatedly saturates at a high level later in training. That pattern suggests the learned controller has become more confident in associating many of the observed glucose histories with stronger insulin action, which is consistent with meal-driven hyperglycemic excursions in the simulation. The frequent sharp drops and recoveries across all three curves reflect switching behavior in a fuzzy controller, where the output is not a single crisp class but a soft membership distribution over insulin options.

5.2. Simulation with UVa/Padova

The controller was also evaluated in the environment of the UVa/Padova simulator, which is a state-of-the-art tool for proving the feasibility of closed-loop controllers for type 1 diabetes. The controller is applied to the provided simulator group of 10 adult subjects, plus 1 subject representing an average adult population. The selected scenario is a 31 h period with three main meals. The timings of the meals were set to 7:00, 14:00 and 21:00 with the amount of 40 g of carbohydrates and meal duration of 15 min. Detailed results are presented for all subjects in Table 3.

Several well-recognized metrics in the AP field are presented in Table 3. The UVa/Padova results show that the Fuzzy-LLM controller trained on the Hovorka model achieves generally good glycemic regulation across the 10 adult virtual subjects. The final blood glucose for the averaged subject is 138 mg/dL, with 97% time in range and essentially no time below 50 mg/dL or above 300 mg/dL, which indicates a strong safety profile and effective prevention of severe hypo- and hyperglycemia. A key strength of the results is the consistency across subjects: most cases stay close to the target zone, with only moderate excursions above 180 mg/dL in a few harder scenarios, such as subjects 5 and 7. Note that subject 7 poses a significant challenge for many AP controllers, since it is an adult with a body weight of 46 kg. Even there, the controller still avoids dangerous extremes, and the risk indices remain low to moderate, with average LBGI and HBGI both around 2. This suggests that the controller is conservative enough to remain safe, while still responsive enough to keep the majority of the trajectory within the clinically desirable range.

The CVGA-related metrics, together with the CVGA plot in Figure 12, also support this interpretation. The average RoC is only 0.6, which implies relatively smooth glucose evolution rather than unstable oscillatory behavior, and the A+B metric is high at 75%, showing that most trajectories lie in the safer CVGA zones. The E+F value is 0% for all subjects, confirming that the controller successfully avoids the most dangerous regions of the clinical risk map.
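Since RoC is defined in Table 3 as the standard deviation of the BG rate of change, it can be sketched in a few lines of Python. The function name `roc_std`, the finite-difference approximation, the 1-minute default sampling period, and the use of the population standard deviation are assumptions made for this illustration.

```python
import statistics

def roc_std(bg_trace, sample_minutes=1.0):
    # Finite-difference rate of change between consecutive samples (mg/dL per min).
    rates = [(b - a) / sample_minutes for a, b in zip(bg_trace, bg_trace[1:])]
    # Population standard deviation of the rate of change.
    return statistics.pstdev(rates)

smooth = roc_std([100.0, 101.0, 102.0, 103.0])       # constant slope
oscillating = roc_std([100.0, 102.0, 100.0, 102.0])  # alternating slope
```

A trace with a constant slope has zero RoC under this definition, whereas an oscillating trace yields a larger value, which is why a low RoC is read as smooth glucose evolution.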

The population trace of the BG variation for the selected scenario is presented in Figure 10, where the mean BG of the population is plotted along with the minimal, maximal, and standard deviation bounds. All curves are in the acceptable range. Figure 11 presents the hourly glucose risk indices, where HBGI is plotted as a positive number and LBGI as a negative number, along with their standard deviations over the examined population.

The glucose density function (Figure 13) is sharply concentrated in the clinically relevant region between the two green thresholds, with the dominant mass sitting between roughly 110 and 190 mg/dL. This indicates that most simulated glucose values remain near the target and upper target range, rather than spreading broadly across extreme hypo- or hyperglycemic regions. The tall, narrow peak near 115 mg/dL suggests a strong clustering around the preferred operating point of the controller. The annotated percentages show that only a very small fraction of samples lies below the lower bound, while the majority, about 91%, lies in the central band, and a smaller but non-negligible tail, about 8%, extends into the hyperglycemic side. That shape is consistent with a controller that is generally effective but still allows occasional postprandial overshoots. In other words, the learned policy maintains most trajectories inside or close to the target zone, but does not completely eliminate excursions after meals or during more difficult subject-specific dynamics.

Figure 12. CVGA analysis.

6. Conclusions

The results show that the proposed Fuzzy-LLM controller can regulate glucose effectively in both the internal Hovorka-based training environment and the external UVa/Padova benchmark. In the training simulations, glucose trajectories remain largely within the clinically acceptable zone, while the UVa/Padova validation confirms that this behavior transfers to a standardized simulator with multiple virtual adult subjects. This is important because it indicates that the controller is not simply overfit to one model instance, but can generalize across a broader population of patient dynamics.

A major strength of the approach is that it combines the interpretability of fuzzy logic with the sequence-handling capacity of a fine-tuned language model. The fuzzy membership representation provides a natural bridge between physiological variables and the LLM token space, while the closed-loop training objective aligns the learned policy with actual glycemic regulation rather than isolated classification accuracy. The figures and tables support this claim: glucose remains mostly in range, severe hypo- and hyperglycemia are avoided, and the risk indices stay low across the evaluated subjects.

At the same time, the results also show that the task is not trivial. Some subjects still experience moderate postprandial excursions, which is expected in a setting with meal disturbances and inter-subject variability. This suggests that the controller is robust but still conservative, especially when it must trade off between aggressive correction and safety. The cross-entropy diagnostic and insulin membership evolution also indicate that the fuzzy representation remains meaningful throughout training, even though the main optimization target is the closed-loop glucose error.

From a control perspective, the UVa/Padova outcomes are especially encouraging because they are obtained under a benchmark environment commonly used to assess artificial pancreas algorithms. The fact that the controller avoids dangerous regions in the control-variability grid and keeps the time in range high supports the viability of the proposed Fuzzy-LLM framework as a real-time decision-support strategy. This can be seen as evidence that language-model fine-tuning can be repurposed beyond text generation, and it opens the possibility of structured biomedical control when the input and output signals are carefully embedded.

In conclusion, the study suggests a promising direction for closed-loop glucose regulation by combining Hovorka-based simulation, fuzzy membership encoding, and LLM fine-tuning. Future work should focus on longer simulation horizons, broader meal variability, additional safety constraints, and eventually evaluation on more extensive datasets.

Funding

The present study is carried out within the project Infrastructure for Fine-tuning Pre-trained Large Language Models, Grant Agreement No. ПВУ – 55 from 12.12.2024 /BG-RRP-2.017-0030-C01/.

Data Availability Statement

Due to privacy or ethical restrictions no data is available.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

American Diabetes Association. Standards of Care in Diabetes to Guide Prevention, Diagnosis, and Treatment for People Living with Diabetes; American Diabetes Association: Arlington, VA, USA, 2023. [Google Scholar]
National Institute for Health and Care Excellence. Type 1 Diabetes in Adults: Diagnosis and Management; National Institute for Health and Care Excellence: London, UK, 2017. [Google Scholar]
Nakrani, M.N.; Wineland, R.H.; Anjum, F. Physiology, Glucose Metabolism. In StatPearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, 2023; Available online: https://www.ncbi.nlm.nih.gov/books/NBK560599/ (accessed on 1 June 2023).
Melo, K.F.S.; Bahia, L.R.; Pasinato, B.; Porfirio, G.J.M.; Martimbianco, A.L.; Riera, R.; Calliari, L.E.P.; Minicucci, W.J.; Turatti, L.A.A.; Pedrosa, H.C.; et al. Short-acting insulin analogues versus regular human insulin on postprandial glucose and hypoglycemia in type 1 diabetes mellitus: A systematic review and meta-analysis. Diabetol Metab. Syndr. 2019, 11. [Google Scholar] [CrossRef] [PubMed]
Man, C.D.; Micheletto, F.; Lv, D.; Breton, M.; Kovatchev, B.; Cobelli, C. The UVA/PADOVA Type 1 Diabetes Simulator: New Features. J Diabetes Sci Technol. 2014, 26–34. [Google Scholar] [CrossRef] [PubMed]
Fushimi, E.; De Battista, H.; Garelli, F. A Dual-Hormone Multicontroller for Artificial Pancreas Systems. IEEE J. Biomed. Health Inform. 2022, 26, 4743–4750. [Google Scholar] [CrossRef] [PubMed]
Peng, Z.; Xie, X.; Tan, Q.; Kang, H.; Cui, J.; Zhang, X.; Li, W.; Feng, G. Blood glucose sensors and recent advances: A review. J. Innov. Opt. Health Sci. 2022, 15. [Google Scholar] [CrossRef]
Huyett, L.M.; Dassau, E.; Zisser, H.C.; Doyle, F.J. Glucose Sensor Dynamics and the Artificial Pancreas. IEEE Control. Syst. Mag. 2018, 38, 30–46. [Google Scholar] [CrossRef]
Berget, C.; Messer, L.H.; Forlenza, G.P. A Clinical Overview of Insulin Pump Therapy for the Management of Diabetes: Past, Present, and Future of Intensive Therapy. Diabetes Spectr. 2019, 32, 194–204. [Google Scholar] [CrossRef]
Lewis, D.; Leibrand, S. Real-World Use of Open Source Artificial Pancreas Systems. J. Diabetes Sci. Technol. 2016. [Google Scholar] [CrossRef]
Knebel, T.; Neumiller, J.J. Medtronic MiniMed 670G Hybrid Closed-Loop System. Clin. Diabetes 2019, 37, 94–95. [Google Scholar] [CrossRef]
Kublin, O.; Stępień, M. The Nightscout system—Description of the system and its evaluation in scientific publications. Pediatr. Endocrinol. Diabetes Metab. 2020, 26, 140–143. [Google Scholar] [CrossRef]
Gomez, E.J.; Pérez, M.E.H.; Vering, T.; Cros, M.R.; Bott, O.; García-Sáez, G.; Pretschner, P.; Brugués, E.; Schnell, O.; Patte, C.; et al. The INCA System: A Further Step Towards a Telemedical Artificial Pancreas. IEEE Trans. Inf. Technol. Biomed. 2008, 12, 470–479. [Google Scholar] [CrossRef]
Pfeiffer, E.F. Artificial pancreas: State of the Art. Int. J. Artif. Organs 1988, 11, 13–26. [Google Scholar] [CrossRef] [PubMed]
Bondia, J.; Romero-Vivó, S.; Ricarte, B.; Díez, J.L. Insulin Estimation and Prediction. IEEE Control. Syst. Mag. 2018, 38, 47–66. [Google Scholar] [CrossRef]
Seron, M.M.; Braslavsky, J.H.; Goodwin, G.C. Fundamental Limitations in Filtering and Control; Springer: London, UK, 1997; ISBN 9781447112440. [Google Scholar]
Ramkissoon, C.M.; Aufderheide, B.; Bequette, B.W.; Vehi, J. A Review of Safety and Hazards Associated With the Artificial Pancreas. IEEE Rev. Biomed. Eng. 2017, 10, 44–62. [Google Scholar] [CrossRef]
Borri, A.; Cacace, F.; Gaetano, A.; Germani, A.; Manes, C.; Palumbo, P.; Panunzi, S.; Pepe, P. Observers for Nonlinear Time-Delay Systems with Application to the Artificial Pancreas. IEEE Control. Syst. Mag. 2017, 37, 33–49. [Google Scholar]
Sanz, R.; Garcia, P.; Diez, J.-L.; Bondia, J. Artificial Pancreas System With Unannounced Meals Based on a Disturbance Observer and Feedforward Compensation. IEEE Trans. Control. Syst. Technol. 2021, 29, 454–460. [Google Scholar] [CrossRef]
Turksoy, K.; Samadi, S.; Feng, J.; Littlejohn, E.; Quinn, L.; Cinar, A. Meal Detection in Patients With Type 1 Diabetes: A New Module for the Multivariable Adaptive Artificial Pancreas Control System. IEEE J. Biomed. Health Inform. 2016, 20, 47–54. [Google Scholar] [CrossRef]
Paoletti, N.; Liu, K.S.; Chen, H.; Smolka, S.A.; Lin, S. Data-Driven Robust Control for a Closed-Loop Artificial Pancreas. IEEE/Acm Trans. Comput. Biol. Bioinform. 2020, 17, 1981–1993. [Google Scholar] [CrossRef]
Lee, S.; Kim, J.; Park, S.W.; Jin, S.-M.; Park, S.-M. Toward a Fully Automated Artificial Pancreas System Using a Bioinspired Reinforcement Learning Design: In Silico Validation. IEEE J. Biomed. Health Inform. 2021, 25, 536–546. [Google Scholar] [CrossRef] [PubMed]
Chakrabarty, A.; Zavitsanou, S.; Doyle, F.J.; Dassau, E. Event-Triggered Model Predictive Control for Embedded Artificial Pancreas Systems. IEEE Trans. Biomed. Eng. 2018, 65, 575–586. [Google Scholar] [CrossRef]
Chakrabarty, A.; Healey, E.; Shi, D.; Zavitsanou, S.; Doyle, F.J.; Dassau, E. Embedded Model Predictive Control for a Wearable Artificial Pancreas. IEEE Trans. Control. Syst. Technol. 2020, 28, 2600–2607. [Google Scholar] [CrossRef]
Batmani, Y.; Khodakaramzadeh, S.; Moradi, P. Automatic Artificial Pancreas Systems Using an Intelligent Multiple-Model PID Strategy. IEEE J. Biomed. Health Inform. 2022, 26, 1708–1717. [Google Scholar] [CrossRef] [PubMed]
Meneghetti, L.; Terzi, M.; Favero, S.; Susto, G.A.; Cobelli, C. Data-Driven Anomaly Recognition for Unsupervised Model-Free Fault Detection in Artificial Pancreas. IEEE Trans. Control. Syst. Technol. 2020, 28, 33–47. [Google Scholar] [CrossRef]
Ruiz-Velázquez, E.; García-Rodríguez, J.; Quiroz, G.; Femat, R. Robust μ-synthesis: Towards a unified glucose control in adults, adolescents and children with T1DM. J. Frankl. Inst. 2020, 357, 9633–9653. [Google Scholar] [CrossRef]
Cassany, L.; Gucik-Derigny, D.; Cieslak, J.; Henry, D.; Franco, R.; Ferreira de Loza, A.; Ríos, H.; Olcomendy, L.; Pirog, A.; Bornat, Y.; Renaud, S.; Catargi, B. A Robust H-∞ Control Approach for Blood Glucose Regulation in Type-1 Diabetes. IFAC-PapersOnLine 2021, 54, 460–465. [Google Scholar] [CrossRef]
Cassany, L.; Gucik-Derigny, D.; Cieslak, J.; Henry, D.; Franco, R.; De Loza, A.F.; Rios, H.; Olçomendy, L.; Pirog, A.; Bornat, Y.; et al. A Robust Control solution for Glycaemia Regulation of Type-1 Diabetes Mellitus. In 2021 European Control Conference (ECC); IEEE: Delft, Netherlands, 2021; pp. 327–332. [Google Scholar] [CrossRef]
Mandal, S.; Sutradhar, A. Robust controller for artificial pancreas for patients with type-1 diabetes. Res. Biomed. Eng. 2023, 39, 437–450. [Google Scholar] [CrossRef]
Zadeh, L.A. Fuzzy Logic. Computer 1988, 21(4), 83–93. [Google Scholar] [CrossRef]
Lee, C.C. Fuzzy Logic in Control Systems: Fuzzy Logic Controller. I. IEEE Transactions on Systems, Man, and Cybernetics 1990, 20(2), 404–18. [Google Scholar] [CrossRef]
Mamdani, E.H.; Assilian, S. An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller. International Journal of Man-Machine Studies 1975, 7(1), 1–13. [Google Scholar] [CrossRef]
Grant, P. A New Approach to Diabetic Control: Fuzzy Logic and Insulin Pump Technology. Medical Engineering & Physics 2007, 29(7). [Google Scholar] [CrossRef]
Mauseth, R.; Wang, Y.; Dassau, E. Proposed Clinical Application for Tuning Fuzzy Logic Controller of Artificial Pancreas Utilizing a Personalization Factor. Journal of Diabetes Science and Technology 2010, 4(4). [Google Scholar] [CrossRef] [PubMed]
Yan, S.-R.; Alattas, K.A.; Bakouri, M.; Alanazi, A.K.; Mohammadzadeh, A.; Mobayen, S.; Zhilenkov, A.; Guo, W. Generalized Type-2 Fuzzy Control for Type-I Diabetes: Analytical Robust System. Mathematics 2022, 10(690). [Google Scholar] [CrossRef]
Liu, M.; Zhang, H.; Xu, Z.; Ding, K. The fusion of fuzzy theories and natural language processing: A state-of-the-art survey. Appl. Soft Comput. 2024. [Google Scholar] [CrossRef]
Zhang, H.; Shang, J. Natural Language Processing and Applications; Tsinghua University Press, 2025; ISBN 9789819797387. [Google Scholar]
Adel, N.; Crockett, K.; Carvalho, J.P.; Cross, V. Fuzzy Influence in Fuzzy Semantic Similarity Measures. In 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE); IEEE, 2021; pp. 1–7. [Google Scholar]
Adel, N.; Crockett, K.; Livesey, D.; Carvalho, J.P. An interval type-2 fuzzy ontological similarity measure. IEEE Access 2022, 10, 81506–81521.
Hoang, C.; Sachan, D.; Mathur, P. Improving retrieval augmented neural machine translation by controlling source and fuzzy-match interactions. Findings of the Association for Computational Linguistics: EACL 2023 2023, 289–295. [Google Scholar]
Yan, R.; Yu, Y.; Qiu, D. Emotion-enhanced classification based on fuzzy reasoning. International Journal of Machine Learning and Cybernetics 13(3), 839–850. [CrossRef]
Li, Q.; Li, L.; Li, Q.; Zhong, J. A comprehensive exploration on spider with fuzzy decision text-to-SQL model. IEEE Transactions on Industrial Informatics 2019, 16(4), 2542–2550. [Google Scholar] [CrossRef]
Novák, V. Fuzzy logic in natural language processing. 2017 IEEE international conference on fuzzy systems (FUZZ-IEEE); 2017; pp. 1–6. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; et al. Attention Is All You Need. arXiv. 2023.
Hu, E.J.; Shen, Y.; Wallis, P.; et al. LoRA: Low-Rank Adaptation of Large Language Models. arXiv. 2021.
Aghajanyan, A.; Zettlemoyer, L.; Gupta, S. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning. arXiv. 2020.
Nosrati, K.; Tepljakov, A.; Belikov, J.; Petlenkov, E. When control meets large language models: From words to dynamics. arXiv. 2026.
Hovorka, R.; Canonico, V.; Chassin, L.J.; Haueter, U.; Massi-Benedetti, M.; Federici, M.O.; Wilinska, M.E. Nonlinear model predictive control of glucose concentration in subjects with type 1 diabetes. Physiol. Meas. 2004, 25, 905. [Google Scholar] [CrossRef] [PubMed]
Carson, E.R.; Cobelli, C.; Finkelstein, L. The mathematical modeling of metabolic and endocrine systems: model formulation, identification, and validation; Wiley: New York, 1983; Available online: https://api.semanticscholar.org/CorpusID:83410068.
Cao, S.G.; Rees, N.W.; Feng, G. Analysis and design of fuzzy control systems using dynamic fuzzy global models. Fuzzy Sets and Systems 1995, 75(1), 47–62. [Google Scholar] [CrossRef]
Cao, S.G.; Rees, N.W.; Feng, G. Mamdani-type fuzzy controllers are universal fuzzy controllers. Fuzzy Sets and Systems 2001, 123(3), 359–367. [Google Scholar] [CrossRef]
Touvron, H.; Lavril, T.; Izacard, G.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv. 2023.
Su, J.; Lu, Y.; Pan, S.; et al. RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv. 2023.
Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer Normalization. 2016.
Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv. 2017.

Figure 1. Membership functions for glucose concentration fuzzy sets.

Figure 2. Membership functions for insulin dosage fuzzy sets.

Figure 3. Closed-loop system diagram.

Figure 4. Centralized server deployment of Fuzzy-LLM controller.

Figure 5. Blood Glucose for multiple virtual subjects during training.

Figure 6. Insulin dosage for multiple virtual subjects during training.

Figure 7. Loss function during training.

Figure 8. Cross entropy during training. (Cross entropy is reported only as an auxiliary diagnostic of the consistency between glucose membership patterns and insulin class labels; it is not used as the training loss.)

Figure 9. Insulin dose membership function levels during training.

Figure 10. Glucose trace for the 10-adult population from the UVa/Padova simulator. The green line represents the average glucose, the orange line represents the $\pm 1$ standard deviation interval, and the red line is the minimal and maximal values from the envelope.

Figure 11. Glucose risk index calculated for each hour with $\pm 1$ standard deviation confidence.

Figure 13. Glucose density function.

Table 1. Model parameters.

Parameter	Symbol	Unit	Value
Glucose distrib. volume	$V_{G}$	L	$0.16\,BW$
Insulin distrib. volume	$V_{I}$	L	$0.12\,BW$
Non-insulin glucose flux	$F_{01}$	mmol/min	$0.0097\,BW$
Transfer rate from $Q_{2}$ to $Q_{1}$	$k_{12}$	1/min	0.0066
Deactivation rate	$k_{a,1}$	1/min	0.006
Deactivation rate	$k_{a,2}$	1/min	0.06
Deactivation rate	$k_{a,3}$	1/min	0.03
Insulin sens. of glucose transport	$S_{I}^{T}$	L/min/mU	$51.2 \times 10^{-4}$
Insulin sens. of glucose distribution	$S_{I}^{D}$	L/min/mU	$8.2 \times 10^{-4}$
Insulin sens. of EGP	$S_{I}^{E}$	L/min/mU	$520 \times 10^{-4}$
EGP at 0 insulin	$EGP_{0}$	mmol/min	$0.0161\,BW$
Carbohydrate bioavailability	$A_{G}$	-	0.8
Time to max carbohydrate	$t_{max,G}$	min	40
Time to max insulin	$t_{max,I}$	min	55
Insulin elimination from plasma	$k_{e}$	1/min	0.138

Table 2. Fuzzy Carrier Tokenization with Padding.

Fuzzy Set	Term	Token IDs	Tokens	Embedding
$G_{hypo}$	hypoglycemia	10163, 468, 368, 19335, 423	hyp-og-ly-cem-ia	$e_{hypo} \in \mathbb{R}^{2048 \times 5}$
$G_{target}$	in range	297, 3464, 2, 2, 2	in-range	$e_{target} \in \mathbb{R}^{2048 \times 5}$
$G_{hyper}$	hyperglycemia	11266, 16808, 19335, 423, 2	hyper-gly-cem-ia	$e_{hyper} \in \mathbb{R}^{2048 \times 5}$
$U_{z}$	zero	5225	zero	$e_{zero} \in \mathbb{R}^{2048}$
$U_{l}$	low	4482	low	$e_{low} \in \mathbb{R}^{2048}$
$U_{h}$	high	1880	high	$e_{high} \in \mathbb{R}^{2048}$

Token ID 2 is the padding token in the TinyLlama tokenizer.

Table 3. Result summary from the UVa/Padova simulation with the $\mu$-controller for 10 subjects from the adult population.

ID	BG	$T_{< 90}$	$T_{r e f}$	$T_{> 180}$	LBGI	HBGI	BGRI	RoC	A+B
1	143	0	97	3	0	3	3	0.5	63
2	131	0	100	0	0	1	1	0.3	90
3	142	0	100	0	0	2	2	0.5	62
4	140	0	99	1	0	2	2	0.5	64
5	158	0	73	26	0	5	5	0.7	32
6	133	0	95	5	0	2	2	0.5	79
7	168	0	60	40	0	7	7	0.9	27
8	112	3	96	0	1	1	2	0.7	80
9	135	0	98	2	0	2	2	0.5	78
10	125	0	100	0	0	1	1	0.4	86
AVG	138	0	97	3	0	2	2	0.6	75

Columns: ID, subject identification; BG, blood glucose concentration in mg/dL; $T_{<50}$, time below 50 mg/dL (%); $T_{<90}$, time below 90 mg/dL (%); $T_{ref}$, time in range (%); $T_{>180}$, time above 180 mg/dL (%); $T_{>300}$, time above 300 mg/dL (%); LBGI, low blood glucose index; HBGI, high blood glucose index; BGRI, blood glucose risk index; RoC, standard deviation of BG rate of change; A+B, % of time in A and B zones from CVGA; E+F, % of time in E and F zones from CVGA.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
