Research on Measurement Method of Yak Body Size and Weight Based on Convolutional Neural Network and Binocular Vision

To address the labor-intensive and time-consuming process of measuring yak body size and weight in the yak breeding industry of Qinghai Province, this study proposes a non-contact method for measuring yak body size and weight. Key technologies based on semantic segmentation, binocular ranging and neural network algorithms were studied to support the development of the yak breeding industry in Qinghai Province. The main conclusions are as follows. (1) A yak foreground image extraction model based on the U-net algorithm was implemented; experiments on 2263 yak images verified that the accuracy of the model in yak image extraction is over 97%. (2) An algorithm for estimating yak body size based on binocular vision was developed, which combines the extraction of body-size measurement points with the depth image. The final test shows that the average estimation error of body height and body oblique length is 2.6%, and the average estimation error of chest depth is 5.94%. (3) A yak weight prediction model was studied, using the body height, body oblique length and chest depth obtained by binocular vision to estimate yak weight; two algorithms were used to establish the prediction model, with average weight estimation errors of 10.78% and 13.01%, respectively.


Introduction
Pastoral areas account for 96% of the total area of Qinghai Province, and the yaks in the province account for about 34% of the world's yak population; Yushu Prefecture and Guoluo Prefecture have the largest numbers of yaks [1]. As a cattle breed unique to the alpine region of the Qinghai-Tibet Plateau, the yak mainly lives in areas of the plateau in China with an altitude of over 3,000 meters and an average temperature below zero. It is the main livestock and economic source of herdsmen in the plateau region [2]. Qinghai Province has the largest number of yaks in the world. After long-term reproduction and growth in naturally closed environments in various regions of the province, breed resources such as the Qinghai plateau yak, Huanhu yak and Xueduo yak have formed, with high consistency in body shape and economic traits and stable genetic performance [3]. In the protection and utilization of yak resources, the weight of the yak is the basis for tasks such as the investigation of breed resources, breed selection and mating, sale, and the calculation of rations and dosages [4]. Because yaks are large and strong, manual measurement requires several people working together to record a full set of body size data for a single yak [5].
In 2003, Teng Guanghui and others began to use visual methods to measure the weight and body size of pigs [6]. In 2006, Huang Junran of Hebei Agricultural University measured the body size of dairy cows using reference object calibration, and designed and implemented a management system for dairy cow images and information [7]. In 2008, Bewley used images of lactating cows to evaluate their body condition: images were taken as the cows passed through the weighing station, and the USBCS and UKBCS body condition evaluation systems were used for the assessment [8]. In 2011, Tasdemir used digital image analysis to measure the body size of Holstein cows, and established a fuzzy-rule-based weight prediction model for Holstein cows through regression analysis of the measured data [9]. In 2013, in order to study the comparison of dairy [...] Technology designed and implemented a sheep physical sign monitoring system using the algorithms in the OpenCV vision library [11]. In 2015, Yukako Kuzuhara et al. used an ASUS Xtion Pro sensor to measure the back posture of dairy cows. By measuring and recording six characteristic positions on the cow, they applied linear regression to evaluate the cow's body condition, which verified that a three-dimensional camera system can be applied to cow body measurement and performance analysis [12]. In 2016, Doeschl et al. recorded the growth of pigs with a visual image analysis system based on a single camera, and established the relationship between pig body size parameters and time [13]. In 2021, Sun Zijie used machine vision to estimate yak weight, adopting traditional image processing methods to obtain yak foreground images [14].

The flow chart of the measurement method in this paper is shown in Figure 1.
This study can save time and labor costs, guide herders to manage yaks reasonably, and provide a guarantee for the sustainable development of alpine pastoral areas.

At present, many scholars have applied semantic segmentation algorithms to animal image segmentation [15]. For non-contact measurement of yak weight and body size, the yak must first be separated from the background in the collected images, after which the weight and body size are estimated with binocular vision. The U-net algorithm first observes the overall image to roughly determine the location of the target area [16], and then considers more detailed information for further judgment, which makes the segmentation results more accurate [17]. Therefore, the U-net algorithm is used for yak image segmentation.

The U-net algorithm was adopted for semantic segmentation of yak images into two categories, foreground and background [18]. The images were manually annotated with Labelme to separate the yak foreground from the background, producing a JSON file in COCO dataset format. The image shown in Figure 3 was obtained after program processing. The data set was augmented, increasing the sample size to 3,409, to improve generalization ability. The resulting data set was randomly divided into a training set, a test set, and a validation set in a ratio of 7:2:1 and input into the U-net network [19]. A model pre-trained on the VOCdevkit data set was adopted, and training was iterated 200 times.
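The 7:2:1 random split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_dataset`, the fixed seed, and the file stems `yak_0000` and so on are hypothetical, and only the sample count (3,409) and the ratio come from the text.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items and split them into train/test/validation subsets.

    ratios gives the relative sizes (train, test, validation). Any
    remainder from integer division ends up in the validation subset.
    """
    items = list(items)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)

    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_test = len(items) * ratios[1] // total

    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val


# Hypothetical image identifiers; only the count matches the paper.
image_ids = [f"yak_{i:04d}" for i in range(3409)]
train_ids, test_ids, val_ids = split_dataset(image_ids)
print(len(train_ids), len(test_ids), len(val_ids))  # 2386 681 342
```

Fixing the seed keeps the three subsets stable across runs, which matters when results from repeated training runs are compared.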

The stochastic gradient descent method was used, with the learning rate set to 0.0001, Batch_size to 8, and base_size to 521521, to obtain the final trained model [20].

[...] increase the receptive field and improve the most obvious characteristics of the image through maximum pooling [21,22]. In this way, the down-sampling process is completed, [...]

[...] as the corresponding matching point can be found on the image of the right camera [32].
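Once a point in the left image has been matched to its counterpart in the right image, its depth follows from the standard rectified-stereo relation Z = f·B/d, where d is the horizontal disparity. The sketch below assumes rectified cameras, a focal length expressed in pixels, and a baseline in millimetres; the function names, the principal point (cx, cy), and all numeric values are hypothetical and are not taken from the paper's calibration data.

```python
import math


def depth_from_disparity(x_left, x_right, focal_px, baseline_mm):
    """Depth (mm) of a matched point in a rectified stereo pair."""
    disparity = x_left - x_right  # horizontal shift in pixels
    if disparity <= 0:
        raise ValueError("disparity must be positive for a valid match")
    return focal_px * baseline_mm / disparity


def pixel_to_camera(u, v, depth_mm, focal_px, cx, cy):
    """Back-project pixel (u, v) at a known depth to camera coordinates (mm)."""
    x = (u - cx) * depth_mm / focal_px
    y = (v - cy) * depth_mm / focal_px
    return (x, y, depth_mm)


# Hypothetical values: f = 1000 px, B = 60 mm, disparity = 20 px.
z = depth_from_disparity(520, 500, focal_px=1000, baseline_mm=60)
print(z)  # 3000.0

# Distance between two measurement points lying at the same depth.
top = pixel_to_camera(400, 300, z, focal_px=1000, cx=640, cy=360)
bottom = pixel_to_camera(400, 700, z, focal_px=1000, cx=640, cy=360)
print(math.dist(top, bottom))  # 1200.0
```

A body dimension such as body height can then be estimated as the Euclidean distance between two back-projected measurement points, which is what the last two lines illustrate.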

Before binocular ranging, the cameras must be calibrated to obtain their intrinsic and extrinsic parameters [33]. We use Zhang Zhengyou's chessboard calibration method [34] to obtain the focal length f and baseline distance B of the binocular camera, as well as the translation vector T and rotation vector R between the two cameras. Table 2 shows the data obtained by binocular calibration.

There is a high correlation between the weight and body size of yak [35]. In this study, the body height, body oblique length and chest depth of yak were selected for modeling with methods such as linear regression [36] and support vector regression [37].

[...] Table 3. It can be seen that the body size data has a high correlation with weight. In the data set, W, tg, tx, and xw indicate the weight, the body height, the body oblique length, and the chest circumference, respectively.

In the linear regression model, each sample has n features and each feature has its own weight value; the model is the sum of the products of the features and their weights, plus an offset value [38]. The formula is as follows:

(1)

If $w_0 = b$ and $x_0 = 1$, then we obtain:

With $m$ samples, the data can be represented as a matrix:

Among them, the weight $w$ is represented as:

The model function of the support vector regression model is also a linear function, but
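For reference, the linear regression model described in this section is conventionally written as shown here. The symbols $f$, $X$, and $y$, the layout of the design matrix, and the closed-form least-squares estimate are assumptions drawn from the standard formulation.

```latex
% Linear model with n features and an offset b
\[ f(x) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b \]

% Absorbing the offset by setting w_0 = b and x_0 = 1
\[ f(x) = \sum_{i=0}^{n} w_i x_i = w^{\mathsf{T}} x \]

% Stacking m samples row-wise into a design matrix
\[
X = \begin{pmatrix}
1 & x_{11} & \cdots & x_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{m1} & \cdots & x_{mn}
\end{pmatrix},
\qquad f(X) = X w
\]

% Weight vector and its least-squares estimate (assumes X^T X is invertible)
\[
w = (w_0, w_1, \ldots, w_n)^{\mathsf{T}},
\qquad \hat{w} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} y
\]
```

In the setting of this study, n = 3 (body height, body oblique length, and the chest measurement) and y would hold the measured weights.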

Semantic segmentation is mainly evaluated with three indexes:

PA is the pixel accuracy, that is, the ratio of the number of correctly classified pixels to the number of all pixels.

CPA is the category pixel accuracy, that is, the proportion of pixels predicted as category i that really belong to category i.

MPA is the mean pixel accuracy, that is, the average over all categories of the ratio between the number of correctly classified pixels in a category and the number of all pixels in that category.

MIoU is the mean intersection over union, that is, the average over all categories of the IoU (the ratio of the intersection to the union of the model's predicted results and the true values for a given category).

In this study, the training was repeated many times, and the resulting model prediction accuracy is shown in Table 4.
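The definitions above can be made concrete with a small sketch that computes each index from a confusion matrix. It follows the verbal definitions given here rather than any recovered formula; the function name, the matrix layout (rows are true categories, columns are predicted categories), and the example counts are hypothetical, and the code assumes every category appears at least once in both the labels and the predictions.

```python
def segmentation_metrics(conf):
    """Compute PA, per-category CPA, MPA and MIoU from a confusion matrix.

    conf[i][j] is the number of pixels whose true category is i and
    whose predicted category is j.
    """
    k = len(conf)
    total = sum(sum(row) for row in conf)
    correct = [conf[i][i] for i in range(k)]
    true_counts = [sum(conf[i]) for i in range(k)]  # pixels truly in i
    pred_counts = [sum(conf[r][j] for r in range(k)) for j in range(k)]

    pa = sum(correct) / total
    # CPA: share of pixels predicted as i that really belong to i.
    cpa = [correct[i] / pred_counts[i] for i in range(k)]
    # MPA: mean share of each category's true pixels classified correctly.
    mpa = sum(correct[i] / true_counts[i] for i in range(k)) / k
    # IoU: intersection over union of prediction and ground truth.
    iou = [correct[i] / (true_counts[i] + pred_counts[i] - correct[i])
           for i in range(k)]
    miou = sum(iou) / k
    return {"PA": pa, "CPA": cpa, "MPA": mpa, "MIoU": miou}


# Hypothetical 2-category example: index 0 = background, index 1 = yak.
example = [[50, 10],
           [5, 35]]
metrics = segmentation_metrics(example)
print(metrics["PA"])  # 0.85
```

For the two-category case used in this study, k = 2, so MPA and MIoU are simply the averages of the background and yak values.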
Table 4. Evaluation of Model Prediction Accuracy.

The data of 10 yaks were randomly selected to show the results, as shown in Table 6.

Table 7 shows the average error and fitting equation of the four models on 90 groups of test data.
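An average estimation error of this kind is commonly computed as the mean relative error between estimated and measured values. The sketch below assumes that definition, which the text does not state explicitly; the function name and the sample values are hypothetical and are not data from this study.

```python
def mean_relative_error(estimated, measured):
    """Mean absolute relative error, in percent, of estimates vs. measurements."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(e - m) / m for e, m in zip(estimated, measured)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical weights in kg: per-animal errors are 5%, 4% and 0%.
measured_kg = [200.0, 250.0, 300.0]
estimated_kg = [210.0, 240.0, 300.0]
error_pct = mean_relative_error(estimated_kg, measured_kg)
print(round(error_pct, 2))
```

The same function applies unchanged to body height, body oblique length, or chest measurements, since the error is relative to each measured value.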

In this paper, the images taken by binocular cameras are used as experimental [...]

The following abbreviations are used in this manuscript: