2.2. The Digital Twin for Fish Feeding Management
The first step in constructing the digital twin for fish feeding management is to understand the aquaculture processes used to operate a fish farm. Cage culturing begins with the deployment of net cages for fish breeding. As shown in
Figure 3, the experimental net cages in this study are located offshore of Pingtung and Penghu, Taiwan. The cage off Pingtung is a large submersible cage whose inner pipe has a circular perimeter of 102 meters; in contrast, the net cages in Penghu are relatively small and of two types: a circular cage whose inner pipe has a perimeter of about 30 meters, and a square cage of 10 × 10 × 8 m. The cages in Pingtung are in open sea areas with deeper water and stronger ocean currents, which helps prevent environmental pollution and fish diseases. The tested offshore net cage in Pingtung can yield an annual output of cultured snubnose pompano of up to 200 tons per cage per year. However, the risk and operating cost of aquaculture with large offshore net cages are relatively high compared with the small net cages in Penghu. Thus, the sensors and automatic machines of the IoT systems should be tailored to each environment to help the farmer operate the aquaculture processes.
Studying fish growth models is crucial in the context of aquaculture processes. Fish growth, defined as the change in body mass over time, results from the interplay of two opposing processes – one that increases body mass and another that decreases it. This growth phenomenon is intricately linked to weight and time, allowing growth data to be constructed directly or indirectly using these parameters [
29]. From an energetic perspective, Ursin's model [
30] characterizes fish growth as the difference between anabolism (the building-up phase) and catabolism (the breakdown phase). This energetic concept is reflected in the energy budget presented in
Figure 8, and Ursin's bioenergetic model can be expressed as
\[ \frac{dw}{dt} = H\,w^{m} - k\,w^{n}, \tag{1} \]
where w represents the weight of the fish at time t, H w^m is the energy absorbed in the entire anabolism, and H and m represent the anabolism coefficient and the slope relating anabolism to fish weight, respectively. On the other hand, k w^n is the energy lost in the whole catabolism, where k and n are the catabolism coefficient and the slope relating catabolism to fish weight, respectively. Yang [
31] defined a detailed fish growth model derived from Ursin's model that takes into account the effects of environmental factors, such as water quality, on growth. The model includes the effects of parameters such as water temperature (T), body size and weight (w), un-ionized ammonia (UIA), dissolved oxygen (DO), photoperiod (p), and relative feeding ratio (f). Thus, the growth rate of a fish is again described as the difference between anabolism and catabolism:
\[ \frac{dw}{dt} = b(T, DO, UIA, p)\, f\, w^{m} - k(T)\, w^{n}, \tag{2} \]
where b(·) (g^{1−m} day^{−1}) and f are the coefficients of anabolism; m is the exponent of body weight for net anabolism; k(T) (g^{1−n} day^{−1}) is the coefficient of fasting catabolism; and n is the exponent of body weight for fasting catabolism. Although more factors should be considered to fully realize the fish growth model, Equation (2) lists the most important factors for measuring the fish growth rate. In [32], based on (2), Chahid et al. proposed fish growth trajectory tracking using Q-learning [33] to determine a fish feeding policy, achieving 1.7% and 6.6% relative trajectory tracking errors of the average total weight of Nile tilapia (O. niloticus) in land-based tanks and floating cages, respectively.
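To make the role of each term in (1) and (2) concrete, the following Python sketch integrates a simplified growth-rate model with Euler steps. The coefficient values and the environmental scaling function are illustrative assumptions, not calibrated parameters from this study.

```python
import numpy as np

def growth_rate(w, f, T, DO, UIA, p,
                b0=0.8, m=0.67, k0=0.03, n=0.81):
    """Simplified growth rate dw/dt = anabolism - catabolism (cf. Eq. (2)).

    All coefficients are illustrative placeholders; a real deployment would
    calibrate them for the target species (e.g., snubnose pompano).
    """
    # Hypothetical environmental scaling of the anabolism term:
    # penalize low dissolved oxygen, high un-ionized ammonia, short photoperiod.
    env = max(0.0, min(1.0, DO / 6.0)) * np.exp(-0.5 * UIA) * (p / 12.0)
    anabolism = b0 * env * f * w ** m                      # energy gained
    catabolism = k0 * np.exp(0.05 * (T - 20.0)) * w ** n   # fasting losses
    return anabolism - catabolism

def simulate(w0, days, feed_ratio, env):
    """Euler integration of the growth model over `days` days."""
    w = w0
    trajectory = [w0]
    for _ in range(days):
        w += growth_rate(w, feed_ratio, **env)
        trajectory.append(w)
    return trajectory

if __name__ == "__main__":
    env = dict(T=27.0, DO=6.5, UIA=0.02, p=12.0)
    print(simulate(w0=50.0, days=30, feed_ratio=0.03, env=env)[-1])
```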
Figure 4.
The wireless communication based AIoT system for automatic data collection from net cages; the collected data are further input to our digital twins for intelligent fish feeding control.
The fish model mentioned in (2) was initially developed within the context of a semi-intensive aquaculture pond to determine the nutrient requirements for Nile tilapia [
31]. This implies that in such a setting, fish growth and production are partially controlled, with aquatic species' growth relying on a combination of natural feeds and aquafeeds. It is evident that the mathematical fish model can be adapted for net cage culturing by substituting the parameters specific to Nile tilapia with those corresponding to the target aquatic species [
34]. By incorporating a knowledge base pertaining to the target aquatic species to define the parameters in the growth model, we can create a synthetic training dataset that accounts for the uncertainties and measurement noise present in real aquaculture environments. This dataset is then utilized to train a transformer-based deep learning model [
35], which predicts fish growth rates based on environmental factors and feeding.
Furthermore, the fish growth model is fine-tuned by incorporating test data obtained from our experimental net cages using the deployed IoT sensors. While this approach is one possible way to design our fish feeding Digital Twin (DT) object, it remains somewhat detached from real-world aquaculture processes. In this study, we aim to design our DT architecture based on the physical processes, as defined by aquaculture experts, using a simplified version of (2).
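As a rough illustration of how such a synthetic training set could be generated from the parameterized growth model, the sketch below perturbs the model inputs and outputs with noise. The sampling ranges, noise levels, and feature layout are assumptions, the growth-model function is passed in by the caller (e.g., the `growth_rate` sketch above), and the transformer regressor [35] itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def synth_growth_samples(growth_fn, n_samples=10_000):
    """Draw random environment/feeding conditions, evaluate a growth model,
    and add measurement-like noise to mimic real cage data (illustrative)."""
    T   = rng.uniform(18.0, 32.0, n_samples)    # water temperature (C)
    DO  = rng.uniform(3.0, 9.0, n_samples)      # dissolved oxygen (mg/L)
    UIA = rng.uniform(0.0, 0.1, n_samples)      # un-ionized ammonia (mg/L)
    p   = rng.uniform(10.0, 14.0, n_samples)    # photoperiod (h)
    f   = rng.uniform(0.0, 0.05, n_samples)     # relative feeding ratio
    w   = rng.uniform(20.0, 600.0, n_samples)   # body weight (g)

    dwdt = np.array([growth_fn(wi, fi, Ti, DOi, UIAi, pi)
                     for wi, fi, Ti, DOi, UIAi, pi in zip(w, f, T, DO, UIA, p)])

    X = np.column_stack([T, DO, UIA, p, f, w])
    X += rng.normal(0.0, 0.01 * np.abs(X) + 1e-6)   # sensor noise
    y = dwdt + rng.normal(0.0, 0.05, n_samples)     # label noise
    return X.astype(np.float32), y.astype(np.float32)
```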
Figure 4 illustrates the proposed AIoT architecture, which collects water quality data, optical stereo RGB videos, and sonar images from our experimental net cages. The digital twin is deployed on our cloud servers. In practice, as depicted in
Figure 5, the hardware architecture of the proposed data collection system is further divided into three distinct sub-systems. These include an AI buoy for water quality inspection, as well as two lightweight broadband carriers for operating optical stereo cameras and sonar image cameras, respectively. Our AI buoy system primarily consists of a solar panel, a control box, two lifebuoys, a steel skeleton, and sensors for collecting data on water flow, temperature (T), dissolved oxygen (DO), and salinity. Additionally, three server-side AI programs are also established on the shore server within the cloud [
13].
For each data collection sub-system, we have installed solar panels to recharge the battery set, ensuring a continuous power supply for the sensors, wireless base station, and communication hub. As a result, these systems promote environmental, social, and governance (ESG) standards without introducing any additional carbon footprint in fish production.
Figure 5.
The hardware architecture of the proposed data collection system is divided into three different sub-systems: (a) the AI buoy for water quality prediction; (b) the lightweight broadband carrier for operating optical stereo camera; (c) the lightweight broadband carrier for operating sonar image camera.
Figure 6.
The sequence diagram of the underwater video surveillance system for fish metrics estimation.
During the daily fish feeding process, our two-mode underwater video surveillance system is activated to capture surveillance videos, which are then automatically transmitted to the cloud for fish metrics estimation via our broadband wireless communication system. Additionally, a software daemon is implemented in the cloud to monitor incoming surveillance videos, store them in the network-attached storage (NAS), and activate pre-trained deep learning models for fish count, length, and weight estimation [
17,
21].
Figure 6 illustrates the sequence diagram of the underwater video surveillance system, which provides fish metrics data to the digital twin for further decision-making in fish feeding management.
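A minimal sketch of such a cloud-side daemon is shown below; the directory paths, polling scheme, and estimator interface are illustrative assumptions rather than the deployed implementation.

```python
import shutil
import time
from pathlib import Path

INCOMING = Path("/data/incoming_videos")   # upload target of the wireless link (assumed path)
NAS_DIR  = Path("/nas/cage_videos")        # network-attached storage mount (assumed path)

def run_estimators(video_path: Path) -> dict:
    """Placeholder for the pre-trained fish count/length/weight models [17,21]."""
    return {"video": video_path.name, "count": None, "length_cm": None, "weight_g": None}

def watch(poll_seconds: int = 30):
    """Poll for new surveillance videos, archive them to the NAS, and trigger estimation."""
    seen = set()
    while True:
        for video in sorted(INCOMING.glob("*.mp4")):
            if video.name in seen:
                continue
            archived = NAS_DIR / video.name
            shutil.move(str(video), archived)     # store the raw video on the NAS
            metrics = run_estimators(archived)    # fish metrics for the digital twin
            print(metrics)                        # in practice: write to the cloud database
            seen.add(video.name)
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch()
```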
Figure 7 showcases the fish metrics estimation models that were proposed in our previous work [
17]. Instead of introducing new deep learning models to complete the AI functions of the fish metrics estimation Digital Twin (DT) object, our current study focuses on integrating mature prediction models into our DT objects. It is worth noting that the performance of prediction models can be enhanced by replacing them with new state-of-the-art models. The cloud infrastructure supports a server-side maintenance scheme, offering plug-in services to front-end users. Consequently, our DT objects are autonomous and possess self-learning capabilities, making them adaptable and evolvable.
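One way to realize this plug-in style of server-side maintenance is to let each DT object hold a reference to its current prediction model and swap it at runtime without changing the front end; the class and method names below are hypothetical.

```python
from typing import Callable, Dict

class DigitalTwinObject:
    """A DT object whose prediction model can be replaced at runtime (sketch)."""

    def __init__(self, name: str, model: Callable):
        self.name = name
        self._model = model          # current prediction model (e.g., fish length estimator)

    def predict(self, *inputs):
        return self._model(*inputs)

    def upgrade_model(self, new_model: Callable):
        """Server-side maintenance: plug in a newer state-of-the-art model."""
        self._model = new_model

# A registry of DT objects served from the cloud to front-end users.
registry: Dict[str, DigitalTwinObject] = {
    "fish_length": DigitalTwinObject("fish_length", model=lambda stereo_pair: 0.0),
}
```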
Figure 7.
The stereo image based fish metrics estimation model proposed in our previous work [
17].
Figure 8.
The MLP based regression model for fish food prediction.
The next step of our digital twin approach is to estimate the amount of food for the daily fish feeding. At time t, a conventional fish farm determines the amount of fish food F_t for the daily feeding process by the following equation:
\[ F_t = f \cdot N_t \cdot \bar{w}_t, \tag{3} \]
where f is the relative feeding rate, N_t is the number of fish, and \bar{w}_t is the average fish weight. In practice, the value of f is set to about 0.3 when the fish are kept in a well-controlled fish pond; however, its value obviously depends on the actual environmental factors. For example, low temperature decreases the appetite of the fish, and thus the resulting value of f should be decreased to reduce the food residual. We cannot find a complete model in the literature to determine the optimal value of f. Instead of using a function to determine f, we adopt a data-driven approach to predict the daily amount of fish food using an artificial neural network (ANN) regression model, namely a simple multi-layer perceptron (MLP) regression model [36]. As shown in
Figure 8, the inputs to the food prediction model include water quality data and fish metrics estimation results, e.g., the fish count and average weight; the output is the predicted food amount. A training dataset of about 200 records collected from real fish farms is used to train the model, and the testing accuracy reaches 98%. The fish food prediction DT is thus designed to predict the fish food amount based on human experience.
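A minimal sketch of such an MLP-based food-amount regressor, here using scikit-learn rather than our deployed implementation, is shown below; the feature layout, network size, and file name are assumptions, and the conventional baseline of Equation (3) is noted in a comment for comparison.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Assumed record layout: [temperature, dissolved_oxygen, salinity, fish_count, avg_weight_g]
# Target: daily food amount (kg). Roughly 200 such records were collected on-farm.
X = np.loadtxt("feeding_records.csv", delimiter=",", usecols=range(5))
y = np.loadtxt("feeding_records.csv", delimiter=",", usecols=5)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Conventional baseline (Eq. (3)): F_t = f * N_t * w_bar_t, with f set to about 0.3
# for a well-controlled pond. The MLP instead learns the mapping directly from
# water quality and fish metrics.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=42),
)
model.fit(X_train, y_train)
print("R^2 on held-out records:", model.score(X_test, y_test))
```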
Figure 9.
Flowchart of fish feeding procedure.
To carry out the fish feeding process, the predicted amount of fish food is dispensed into the food bucket of our intelligent feeding machine.
Figure 9 provides a flowchart depicting the fish feeding procedure, which operates during actual feeding times. It collaborates with various digital twin (DT) objects, integrating multifunctional AI capabilities, thus rendering the framework versatile for feeding management. We have integrated six DT objects deployed within the AI-empowered cloud to oversee fish feeding activities alongside the smart fish feeder machine.
Initially, the suitability of water quality conditions for feeding is assessed. The Water Quality DT is then invoked to evaluate the current water quality condition and provides feedback, which serves as the basis for initiating feeding. Once water quality approval is obtained, the smart feeding machine is controlled to release the initial batch of fish food. Additionally, other DT objects, such as the Fish Count Estimator, Fish Size Estimator, and Fish Weight Estimator, perform their respective AI functions as indicated by their names. All estimations generated by these three DTs are stored in the cloud's database for further data analytics, including growth predictions and optimizations.
As the water quality condition meets the criteria, the smart feeder dispenses an initial amount of bait. When the food pellets are released from the smart feeder machine, the fish retrieve them from the water's surface, producing visible water splashes that become more pronounced as feeding progresses. Leveraging this feeding behavior, the area with significant surface motion indicates the current position of the fish within the cage. To ensure accurate food placement, we have integrated a Fish Feeding Position DT, which detects the current feeding area or zone and adjusts the position of the smart feeder's gun barrel based on the estimated feeding location. This ensures that the feeder dispenses feed at the current feeding spot.
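As a simple illustration of this idea, the sketch below estimates the feeding zone as the centroid of a binary surface-splash motion mask and maps it to a pan angle for the feeder's gun barrel; the camera geometry, field of view, and actuator interface are hypothetical.

```python
import numpy as np

def feeding_zone_centroid(motion_mask: np.ndarray):
    """Return (row, col) centroid of the binary surface-splash mask, or None."""
    ys, xs = np.nonzero(motion_mask)
    if xs.size == 0:
        return None
    return float(ys.mean()), float(xs.mean())

def barrel_pan_angle(col: float, image_width: int, fov_deg: float = 60.0) -> float:
    """Map the horizontal splash position to a pan angle (hypothetical pinhole model)."""
    return (col / image_width - 0.5) * fov_deg

# Example: a synthetic 480x640 motion mask with splashes on the right-hand side.
mask = np.zeros((480, 640), dtype=bool)
mask[200:260, 500:560] = True
centroid = feeding_zone_centroid(mask)
if centroid is not None:
    print("pan angle:", barrel_pan_angle(centroid[1], image_width=640))
```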
Additionally, the smart fish feeder is equipped with actuators for control and can receive information to assist with feeding, such as determining the feeding amount. Another DT object, known as the Fish Feeding Intensity Evaluator [
12], is employed to assess the fish's feeding intensity, categorized into four levels: strong, medium, weak, and none. As the feeding intensity decreases from strong to weak, the amount of feed dispensed is reduced. When the observed fish feeding intensity reaches none, feeding ceases. The total feeding amount is also monitored and stored in the database, along with fish count information and average weight.
To minimize food wastage, the predicted food amount F_t is divided into n parts for fish feeding when we specify n intervals to complete the daily feeding process using the smart feeding machine. Essentially, for each part, our fish feeding control module issues a command to the feeding machine to dispense the quantity of food F_t/n into the target cage. However, this approach can lead to a high probability of food wastage when the current fish feeding intensity level is none or weak. To address this challenge, we employ a Markov decision process (MDP) to model the decision-making process regarding the amount of fish food to be used for each part of the feeding. The MDP model for fish feeding management comprises the following components.
State: S is a finite set of states that describes the status of the fish feeding process. Each state contains two parameters: the part number i and the accumulated amount of food dispensed so far. For the daily feeding time t, the accumulated food amount and the part number i are bounded by the predicted food amount F_t and the number of parts n, respectively. In other words, the states define the possible responses of the fish feeding machine to the possible input actions. To simplify the state set of the MDP into a lookup table, we discretize the continuous accumulated food amount into m levels. Each state in the finite set S is thus defined as the pair (food amount level, part number).
Action: A is the set of possible control actions, which activate the feeding machine to dispense the relative feeding rate f of the scheduled food at state s. To simplify the control of our feeding machine, the relative feeding rate is quantized into two levels, i.e., 0 and 1, corresponding to two operation modes, i.e., turning the feeding machine off or on, so that zero food or the scheduled food amount is sent into the cage, respectively. In addition, the parameter e, i.e., the food eaten ratio, is quantized into four levels corresponding to the four fish feeding intensity levels, i.e., 'Strong', 'Medium', 'Weak', and 'None'. In practice, the value of e is set based on the predicted results of the fish feeding intensity evaluation DT. For each action a, the MDP model results in a state transition.
Reward: r(s, s′) is the reward received after transitioning from state s to state s′ upon applying action a. In this work, the optimal Q-learning policy based on the MDP tracks a desired fish feeding trajectory for aquaculture cages while penalizing feed that is dispensed but not eaten, as measured by the food eaten ratio (e). We formulate the reward function r(s, a) for offshore cages as a regularized combination of the trajectory tracking error and the food eaten ratio:
\[ r(s, a) = -\,\varepsilon_{track}(s) \;-\; \lambda\,(1 - e)\, f_r, \tag{4} \]
where ε_track(s) denotes the deviation of the current feeding state from the desired fish feeding trajectory, f_r is the feeding ratio, and λ is a positive regularization term to assess the feeding input preference. The feeding ratio is defined as
\[ f_r = \frac{F_a}{F_s}, \tag{5} \]
where F_a is the amount of food sent into the cage by action a and F_s is the scheduled food amount. The regularization term is tuned empirically to achieve a good compromise, over the admissible space of policies, between feeding that leads to good growth performance and the feeding residues. For fish feeding management, this reward formulation (4) minimizes the food consumption and penalizes the food residue with respect to an optimal fish appetite profile; a minimal sketch of this state–action–reward encoding is given below.
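The sketch below enumerates the discretized state and action spaces described above and codes one illustrative form of the reward in (4); the level counts (m, n, c, d) and the penalty weight are placeholders rather than the values used in our experiments.

```python
import itertools

# Discretization levels (placeholders): m food-accumulation levels, n feeding parts,
# c feeding-ratio levels (off/on), d feeding-intensity levels.
m, n = 5, 4
c, d = 2, 4
INTENSITY = ["Strong", "Medium", "Weak", "None"]  # food-eaten-ratio levels

# State s = (food_level, part_number): m * n states.
states = list(itertools.product(range(m), range(n)))
# Action a = (feed_on, intensity_level): c * d actions.
actions = list(itertools.product(range(c), range(d)))

def reward(tracking_error, eaten_ratio, feeding_ratio, lam=0.1):
    """Illustrative reward in the spirit of Eq. (4): penalize deviation from the
    desired feeding trajectory and feed that is dispensed but not eaten.
    `tracking_error` is a nonnegative deviation supplied by the feeding DT."""
    return -(tracking_error + lam * (1.0 - eaten_ratio) * feeding_ratio)

print(len(states), "states,", len(actions), "actions")
```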
Figure 10.
Q-learning feeding strategy framework.
To solve the MDP fish feeding management problem, we propose a Q-learning algorithm based on the temporal difference method that learns the Q(s, a) function from raw experience to search for the optimal control policy via reinforcement learning (RL), as shown in Figure 10. Once the function is learned, the optimal action a can be selected using the Q(s, a) function, which can further be represented as a Q-table by discretizing the state and action parameters. As mentioned above, the state space S is quantized into m × n states, where m and n define the number of levels used to accumulate the dispensed food and the number of parts into which the predicted food amount is divided, respectively. Similarly, the action space consists of c × d actions, where c and d define the number of quantization levels of the feeding ratio and the number of levels describing the fish feeding intensity, respectively. At each feeding part t, the fish feeding DT learns the fish feeding intensity level from the aquaculture environment's response, and the temporal difference (TD) method, which uses sampled experiences, updates the action-value function:
\[ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right], \tag{6} \]
where α is the learning rate and γ is the discount factor. Q-learning is an important breakthrough in RL and has been proven efficient. This approach takes into account the indirect future reward to which a particular action leads, thereby developing strategies that maximize the reward over a series of actions. The learned action-value function Q directly approximates the optimal action-value function independently of the policy being followed. The policy determines which state–action pairs are visited and is continuously updated to guarantee finding the optimal policy. The algorithm in procedural form is defined as follows:
Algorithm 1: Q-Learning Policy Control Algorithm
Initialization: Q(s, a) arbitrarily for all s ∈ S, a ∈ A, and Q(terminal, ·) = 0
Repeat (for each episode):
  Initialize s
  Repeat (for each step of the episode):
    Choose a from s using a policy derived from Q (e.g., ε-greedy)
    Take action a, observe r, s′
    Q(s, a) ← Q(s, a) + α[r + γ max_{a′} Q(s′, a′) − Q(s, a)]
    s ← s′
  Until s is terminal
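A compact tabular implementation of Algorithm 1, reusing the discrete state and action sets from the previous sketch, might look as follows; the environment step function is supplied externally (e.g., by the feeding DT and the intensity evaluator), and the hyperparameters are illustrative.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration (illustrative)

Q = defaultdict(float)                   # Q-table: (state, action) -> value

def choose_action(state, actions):
    """Epsilon-greedy action selection over the discrete action set."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning(env_step, initial_state, actions, episodes=500, max_steps=50):
    """env_step(state, action) -> (reward, next_state, done); supplied by the feeding DT."""
    for _ in range(episodes):
        s = initial_state
        for _ in range(max_steps):
            a = choose_action(s, actions)
            r, s_next, done = env_step(s, a)
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            # Temporal-difference update of Eq. (6).
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            s = s_next
            if done:
                break
    return Q
```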
The Q-value, denoted as Q(s, a), represents the weighted sum of future rewards associated with a particular state s and action a. At each time step, the agent selects the action with the highest Q-value for the current state. Whenever an action is executed in the environment, the resulting new state and its corresponding reward are recorded in the Q-table, along with information about the previous state and action. For decision-making, a random batch of stored experiences, consisting of (state, action, reward, next_state) tuples, is retrieved and used for training. The Q-value Q(s, a) is computed for the chosen action and updated using Equation (6), where α represents the learning rate, controlling the extent of adaptation to new information. The variable r denotes the reward obtained from action a, while max_{a′} Q(s′, a′) signifies the maximum predicted reward for the new state s′, considering all possible actions a′. The Q-learning framework is employed to seek the optimal policy that maximizes the expected reward value. The weights and biases of the agent's neural network are fine-tuned using gradient descent to align the network's output with the updated Q-values. The optimal feeding policy is explored by selecting the action with the highest reward value for a given state. This mechanism allows the agent to undergo training with a randomly sampled set of past experiences at each time step.