Estimating Full Regional Skeletal Muscle Fibre 2 Curvature from b-Mode Ultrasound Images Using 3 Convolutional-Deconvolutional Neural Networks 4

Direct measurement of strain within muscle is important for understanding muscle 8 function in health and disease. Current technology (kinematics, dynamometry, electromyography) 9 provides limited ability to measure strain within muscle. Regional fiber orientation and length are 10 related with active/passive strain within muscle. Currently, ultrasound imaging provides the only 11 non-invasive means of observing regional fiber orientation within muscle during dynamic tasks. 12 Previous attempts to automatically estimate fiber orientation from ultrasound are not adequate, 13 often requiring manual region selection, feature engineering, providing low-resolution estimations 14 (one angle per muscle), and deep muscles are often not attempted. Here, we propose 15 deconvolutional neural networks (DCNN) for estimating fiber orientation at the pixel-level. 16 Dynamic ultrasound images sequences of the calf muscles were acquired (25Hz) from 8 healthy 17 volunteers (4 male, ages: 25–36, median 30). A combination of expert annotation and 18 interpolation/extrapolation provided labels of regional fiber orientation for each image. We then 19 trained DCNNs both with and without dropout using leave one out cross-validation. Our results 20 demonstrated robust estimation of regional fiber orientation to within 5° error, which was 21 comparable to previous methods. The methods presented here provide new potential to study 22 muscle in disease and health. 23


Introduction
This paper presents a novel application of deconvolutional neural networks for estimating full regional skeletal muscle fiber orientation directly from standard frame rate (25Hz) b-mode ultrasound images.In recent years, ultrasound has become a valuable and ubiquitous clinical and research tool for understanding the changes which take place within muscle in ageing, disease, atrophy, and exercise.Ultrasound has been proposed [1] as a non-invasive alternative to intramuscular electromyography (iEMG) for measuring twitch frequency, useful for early the diagnosis of motor neuron disease (MND).Ultrasound has also recently demonstrated application to rehabilitative biofeedback [2].Other computational techniques have been developed for muscleultrasound analysis which would allow estimation of changes in muscle length during contraction [3], and changes in fiber orientation and length [4]- [8].Regional muscle fiber orientation (curvature) and length change when muscle is under active (generated internally through contraction) and/or passive strain (generated externally through joint movements or external pressure) [9], [10].Muscle fiber orientation is one of the main identifying features of muscle state [11].Below, we review the previous computational methods which attempt to automatically estimate fascicle orientation from ultrasound images of skeletal muscle.We do not consider feature tracking based approaches [8], [12], since within ultrasound feature tracking is an ill-posed problem, leading to tracking drift and feature dropout due low signal to noise ration and out of plane feature motion [13].
Previous attempts to automatically estimate fiber orientation directly from b-mode ultrasound images are inadequate, typically requiring manual region identification, a priori feature engineering, presumptuous problem approach (e.g.fascicles and connective tissues appear as straight lines), and/or typically providing low resolution estimates (i.e. one angle for an entire muscle region).
Further, to our knowledge, nobody has attempted the deep (more challenging) muscles.All previous methods depend on some parameterization of features such as Gabor wavelets [4], [5], and/or edge detectors and vessel enhancement filters [4]- [8], [14]- [16].In all cases, parameters are empirically chosen and/or are based on assumptions about how the descriptive features will present within an image.In the studies we are aware of, parameters are chosen presumptuously or empirically rather than tuned with a multiple-fold cross-validation set, and since the authors unanimously write that results are sensitive to parameter changes, the assumption is that real world performance may not be as optimistic as reported.
Zhou and colleagues [17] developed a method based on the Revoting Hough Transform (RVHT) which provides an estimate of the overall fiber orientation in a single muscle, the gastrocnemius in the calf.Based upon the incorrect assumption that the fiber paths are straight lines, they use the RVHT to detect an empirically predetermined number of lines within a manually defined region of interest and then take the median orientation of all detected lines as the overall orientation of the fibers.The approach of Zhou and colleagues is fundamentally limited to detecting straight lines, whereas muscle fibers do not always present that way.When observing muscle fibers using ultrasound, there are many other features which can appear as straight lines (blood vessels, noise/dropout, artefacts/reflections, skin/fat layers, connective tissues, etc…).These fundamental facts are a strong limiting factor to the potential of methods such as the RVHT and the Radon Transform [7].
Rana and colleagues [5] introduced a method which potentially allowed estimation of local orientation, although after suggesting this, the mean of local orientations is used to produce another method which provides an estimate of the overall fiber orientation.Local orientations were identified by convolving a bank of Gabor Wavelet filters with a processed version of the image.They compare their proposed method with manual annotations from 10 experts and the Radon Transform, which like the Hough Transform (or RVHT) is a method which can be used for detecting straight lines in images.Both the Radon Transform and Wavelet methods require a preprocessing step described as vessel enhancement filtering, which enhances anisotropic features within the image.The vessel enhancement method is parametric, where parameters were chosen empirically with no crossvalidation or evaluation of results over varying parameters.Their results showed that the Radon Transform was not significantly different to the expert annotations, whereas the Wavelet method was significantly different.Other than significance values, the only accessible results they report are mean differences (1.41°) between the Wavelet method and the expert annotations of real images, and mean differences between both the Radon and Wavelet method applied to synthetic images with known orientations (< 0.06°).Although the Wavelet method performed comparatively poorly, the authors rightly suggest that this approach has potential to allow tracking of fiber paths throughout the muscle, whereas Radon and Hough Transform methods do not.
Namburete and colleagues [6] expanded upon the methods of Rana and colleagues [5], developing the hypothesis that regional fiber orientation can be tracked using local orientated Gabor Wavelets and vessel enhancement filters, for the first time going beyond linear approximations to the overall orientation.The proposed extension is to convolve the local orientations with a median filter, which effectively smooths the local orientations.The fiber trajectories are then tracked between the muscle boundaries on a continuous coordinate system with linear (following the dominant orientation in a local 15 × 15-pixel region centered on the current fiber track) steps of 15 pixels.Finally, curvature is quantified using the Frenet-Serret curvature formula [18].Namburete's method differs significantly from all preceding methods because it provides an estimation of local fiber orientation, rather than a more global estimate.However, the authors do not provide an estimate of the error for example when comparing to expert annotations; instead they evaluate errors on synthetic images with known orientations, which are trivial.Further, they do not apply this method to the deeper muscles, which are even more challenging.
Several important problems have been identified as unaddressed from a review of previous methods for estimating full regional fiber orientation in multiple layers of muscle; most commonly, the lack of error evaluation on real data, estimating overall fiber orientation and not local orientation, limited application to superficial muscles only, and presumptuous justification for chosen parameters.We propose to address these problems by introducing advanced machine learning methods and cross-validation against expert annotation of real data.In recent years the popularity of machine learning (in particular neural networks) has surged since a number of successive algorithmic, methodological, and computational hardware developments were introduced [19]- [23].
The application of machine learning to estimating local fiber orientations has thus far not been considered.In recent years, the development of deconvolutional neural networks (DCNN) allows robust regression (heatmaps [24]- [26]) or classification (semantic segmentation [27]) of pixel level labels in full resolution images, without any feature engineering or preprocessing (i.e.all parameters are learned from data, and cross-validated against held-out validation and test sets).Therefore, we expect DCNNs to perform well at predicting fiber orientation heat maps from ultrasound images of muscle.For comparison we implement a version of the wavelet method described by Namburete, and colleagues [6], since no data are presented on the accuracy of their method on real ultrasound images.

Deep Learning Software
All neural network software was developed from scratch by the authors using C/C++ and CUDA-C (Nvidia Corporation, Santa Clara, California).No libraries other than the standard CUDA libraries (runtime version 8.0 cuda.h,cuda_runtime.h,curand.h,curand_kernel.h,cuda_occupancy.h,and device_functions.h)and the C++ 11 standard library were used.All DCNNs were trained on an Intel Xeon CPU E5-2697 v3 (2.60GHz), 64GB (2133 MHz), Nvidia with a TitanX (Maxwell) GPU.

Data Acquisition
Ultrasound image data were recorded at 25Hz (AlokaSSD-5000 PHD, 7.5MHz) from the calf muscles (medial gastrocnemius and soleus) of 8 healthy volunteers (4 male, ages: 25-36, median 30) during dynamic maximum isometric voluntary contractions.Images represented a 6cm × 5cm (horizontal × vertical axes) cross-section of the calf muscles.Volunteers lay prone on a physio bed with their right ankle strapped to a immobilised pedal.Volunteers were asked to push their toes down against the pedal as hard as they could.A dynamometer (Cybex) recorded the torque at 100Hz at the ankle during the contraction.Matlab (Matlab, R2013a, The MathWorks Inc., Natick, MA) was used to acquire ultrasound frames and a hardware trigger was used to initiate recording at the start of each trial.

Generating Ground Truth
Following data collection, we extracted all frames beginning one frame before contraction started, and ending with the frame when contraction peaked, where initial and peak contraction were identified manually.This resulted in a total of 504 images containing spatiotemporal variations in both fiber orientation and muscle thickness.Two experts were asked to manually identify fiber paths in all muscles/compartments in all images by marking a series of multiple-point lines/curves (typically 20-30 per muscle/compartment).The same experts were asked to identify the boundaries of the medial gastrocnemius (superficial muscle) and the boundaries and internal compartments (where visible) of the Soleus (deep muscle) by marking multiple-points lines/curves around the boundaries and visible intramuscular compartments.
To create labels for each pixel, first a blank image (matrix of zeros with equal dimensions to the ultrasound image) was created.Then, the angles along each of the annotated fiber lines were calculated, and for each pixel under the lines the calculated angle was stored.The result was an image of mostly zeroes and a few angled lines, where the value of the non-zero pixels represents the local angle of the line intersecting each pixel.Then, within each muscle/compartment (defined by the experts), nearest neighbor interpolation was used to fill the gaps between lines, followed by nearest neighbor extrapolation to fill the muscle/compartment.Between muscles/compartments, nearest neighbor interpolation was also used to fill the gaps.All other pixels (outside muscle, e.g.skin) were set to zero.Following, additional data were created to introduce variation in the data/labels and help prevent over-fitting, by randomly (uniform) rotating each image and corresponding labels/angles between -5° and 5°, thus doubling the size dataset to 1008.

Wavelet Filter Bank Method
The wavelet method was implemented faithful to the original paper [6].A bank of 180 Gabor wavelet filters was generated spanning 0°-179° in 1° increments using, where k is the kernel size (k = 20), d is a damping term (d = 51.243),α is the orientation, and f is the spatial frequency (f = 7).Initially, all parameters were set as originally described in [6], and after a first analysis, some parameters (f and k) were varied (f = 9, f = 11, f = 13, k = f*3-1) to give some scope to the potential of this method on real data.Vessel enhancement was performed using a Matlab version of the Frangi multiscale vessel enhancement filter [28] which can be found here [29].
Following [6], filtering was performed at 3 resolutions, 2, 3, and 4. Following a first analysis, a second analysis was done at resolutions, 4, 6, and 8, for all variants of the Gabor wavelet filters described above.
To analyse an image, the image was filtered with the vessel enhancement filter and then convolved with the entire wavelet filter bank.At each spatial location in the image, we take the α corresponding to the filter with the highest convolution in that location.The result is a map of local orientations.Following [6], the image was then convolved with a 35 × 35 median filter resulting in the final map of local fiber orientations.

Deconvolution Neural Network Method
Our primary concern when deciding on a DCNN architecture (number of layer and units per  ) and max-pooling (Pool.)layers, followed by a series of fully connected (FC.) layers, followed by a series of max-un-pooling (UnPool.) and deconvolutional (DeConv.).The green, blue and red arrows and numbers represent the vertical and horizontal axes, and the number of filters (or depth/color channels with respect to the input/output layers) respectively in each layer.This network has 1,286,656 units (neurons), and 4,983,552 trainable weights/biases (synapses).

DCNN Cross-Validation and Regularization
To optimize and test generalization (estimated real-world performance), leave one person out cross-validation was performed on a DCNN with a small amount of L2 weight decay, 5e -4 , to prevent the weights from growing too large.The weights of the DCNN were initialized using Xavier initialization [30].To train the DCNN we used a learning rate of 2e 2 and weight gradient momentum

Results
Following model optimization of the wavelet and DCNN methods, both methods were compared against the ground truth labels using a range of error measures, thus making the work accessible and comparable to others.In sections 3.1 and 3.2 results are presented respectively for the wavelet method and the DCNN method.With respect to data ranges, within the GM muscle, the maximum range of angles within a single muscle region was 34.84°, whereas the maximum range of angles over all participants was 76.58°.Within the Sol muscle (compartment 1), the maximum range of angles within a single muscle region was 35.02°, whereas the maximum range of angles over all participants was 64.92°.Within the Sol muscle (compartment 2), the maximum range of angles within a single muscle region was 65.15°, whereas the maximum range of angles over all participants was 79.05°.

GM
Sol (1 st compartment) Sol (2 nd compartment)  The wavelet method was evaluated using the optimal parameters (within the range of parameters investigated) for each muscle/compartment, separately.To find the optimal parameters within each muscle, the local orientation over each pixel identified by the wavelet method was compared to each pixel of the ground truth by root mean square error (RMSE) (see fig. 3).Within the parameters we investigated, for the GM muscle, smaller wavelet filters resulted in smaller RMSE for the higher-resolution Frangi filter (res= [4,6,8]), conversely larger wavelet filters results in smaller RMSE for the lower-resolution Frangi filter.Within the parameters we investigated, the first compartment of the Sol muscle yields best results with the higher-resolution Frangi filter and larger wavelet filters.Conversely, within the parameters we investigated, the second compartment of the Sol muscle yields best results for smaller wavelet filters and lower-resolution Frangi filter.The models with the lowest RMSE were compared to ground truth using mean difference (MD) and mean absolute error (MAE), to be compared with the DCNN method (section 3.2).
For the optimal models within each muscle/compartment, the deviations of predicted local fibre orientations from ground truth were much too high to be useful in any real context, particularly in the Sol (compartment 1) muscle, which reports an optimal RMSE of over 26° (see fig. 3).Discrepancies were much lower in the GM and Sol (compartment 2) muscles at 11.3° and 7.1°, respectively.Visual comparisons were also made by comparing the wavelet analysis, presented as a heat-map, to the ground truth, also presented as a heat-map (see fig. 4).Visual comparison revealed quite sporadic agreement between wavelet predictions and ground truth.Visual comparison also revealed that any agreement between wavelet predictions and ground truth was not only heavily dependent on the presence of well-defined muscle fibre tracts, but also on the ability of the Frangi filter to extract those features.
Application of the wavelet method was computationally expensive using the Matlab framework, with full regional analysis (whole image including all muscles) taking between one and two hours to process all 504 images for a single set of parameters.We note that this can be significantly improved upon with the use of graphics processing units (GPUs) and/or a full compiled language (such as C/C++).

DCNN Method
The DCNN method was evaluated using the optimal models as identified by cross-validation (see sec.2.6).Following, the output heat-map from the DCNN for all 504 images was compared to the ground truth using mean difference (MD) and mean absolute error (MAE), to be compared with the wavelet method (see sec.3.1).Visual comparisons were also made by comparing the DCNN predictions, presented as a heat-map, to the ground truth, also presented as a heat-map (see fig. 5).
The most immediate result of visual comparisons was the ability of the DCNN to 'color-code' the correct regions; the GM and Sol (compartment 1) muscles typically present with opposing fibre orientations, with negative angles in the Sol muscle and positive angles in the GM muscle.In almost all images the DCNN was able to identify the Sol and GM muscles simply by predicting negative and positive angles respectively for each compartment.Furthermore, this was broadly true for entire muscle regions, even where there were no visually identifiable fibre tracts (see fig. 5).   2  23.09° ± 13.61° 12.14° ± 8.75° RMSE 3  26.80°14.97° Sol (2nd Compartment) MD 1  1.61° ± 6.92° 11.79° ± 10.86° MAE 2  5.03° ± 5.01° 13.33° ± 8.90° RMSE 3  7.11° 16.03° 1 mean difference. 2mean absolute error. 3root mean square error.

Discussion
In this paper we reviewed the state of the art computational methods for estimating regional fibre orientation from ultrasound images of skeletal muscle.We found that the previous approaches were fundamentally lacking in their comparison to real ultrasound data, thus rendering them incomparable unless implemented.We also found that previous methods were targeted at estimating global/overall muscle fibre orientation by using some line detection technique such as the Hough or Radon transform [5], [7], [14], [17].Fibres are known to curve within a muscle, and here we have also provided data in support of this, based on expert identification of fibre tracts within 3 muscles/compartments over 8 people, which show how fibre orientation is different within a muscle/compartment (see paragraph 1 of Results section).
In review of previous methods we found an approach which demonstrated potential to provide estimates of local orientation [5], [6].The authors did not present any comparable data to assess the performance of this method on real ultrasound data, only reporting that there were significant differences between their method and expert annotations of fibre orientation in real ultrasound images.The authors also highlight weaknesses in their approach such as the need for manual region identification, where the region selected should contain visibly well-defined fibre tracts.We recognized the potential for this method to fail in the presence of noise, artefacts, other quasi-line structures (blood vessels, connective tissues, etc.), and other muscles where fibres do not present as clearly, such as the deep muscles.
We proposed the application of a deep learning method known as deconvolutional neural networks (DCNN).We identified that such a method is capable of robust regression (heatmaps [24]- [26]) or classification (semantic segmentation [27]) of pixel level labels in full resolution images.The advantage of a deep learning approach is that the model can learn to identify fibres in well-defined regions, interpolate between well-defined regions and extrapolate from regions to fill the muscle/compartment with predictions of local fibre orientation.Deep learning models can also learn to ignore regions with spurious (non-fibrous) features.After training a DCNN using 8-fold crossvalidation and early stopping using a validation set, we compared the results to the wavelet method applied to the same data.Results were compared within specific muscle regions (identified by two experts) since large variations were expected due to the challenging textural appearance of the deep muscles (Sol).Results revealed that both methods showed large deviations from the ground truth, however the DCNN method gave much more accurate predictions for the deep muscles than the wavelet method.Those predictions were also much more consistent as identified by standard deviations of the error metrics (see table.1).Regardless of the similarity in performance, we suggest that the DCNN method could be improved with more sophisticated regularization (e.g.dropout and residual connections), or with additional labelled data.In contrast, although figure 3 indicates that there is scope to tune the wavelet method, it is much harder to do so in such a way which generalizes and deals with all of the problems we identified as well as the original authors (image artefacts, spurious fibre features, noise, reflections, etc.).

Conclusions
In this paper have we provided the first comprehensive analysis of an existing and a novel computational method for estimating full regional fibre orientation from ultrasound images of human skeletal muscle.We have proposed a novel application of deep learning to a long-standing and challenging problem, and demonstrated state of the art results.We present analyses in a form which is comparable to any future developments, and we also publish our ultrasound and ground truth data to support this end.The application of DCNNs to this problem has opened up new potential to hi-resolution analysis of skeletal muscle, from prediction of motion maps to segmentation of muscles and other structures of interest.This paper provides further evidence that deep learning methods can surpass state of the art performance, even when there is not an abundance of labeled data available.With additional data we propose that this project could easily be extended successfully, and this preliminary muscle analysis step could very likely form part of a skeletal muscle analysis system which accurately predicts the passive and active muscle forces non-invasively directly from single ultrasound images and sequences.Such a contribution could enable early diagnoses of diseases such as MND, and would enable personalized musculoskeletal medical diagnosis, monitoring, treatment targeting, and care.

Figure 1 .
Figure 1.Image annotation.(a) shows the raw ultrasound image, (b) shows expert annotated line traces of visible fibre paths, (c) shows nearest neighbor region filling of annotated fiber paths within annotated muscles/compartments, (d) shows line traces over the filled regions.The colors in (b)-(d) represent the local fibre orientation in degrees (see colour bar in (c)).
layer) was having a large enough model to learn the training set.Our secondary concern was training time, since we planned to execute an 8-fold cross-validation.Addressing the latter concern, both input images and labels were down-sampled (bilinear interpolation) to 128 × 128 pixels.Addressing the former concern, the DCNN implemented 16 convolutional filters in the input layer, with 4 additional convolutional layers, each with double the number of filters in the preceding convolutional layer.Between convolutional layers, the spatial dimensions were down-sampled (maxpooling) by a factor of 2. A dense fully connected layer of 512 nodes was fully connected to an initial up-sampling (2 × 2 max-un-pooling) layer, followed by a deconvolution layer.An additional 4 upsampling with deconvolutional layers following, completes the network architecture, with the final layer being a full 128 × 128 resolution regression map.Every layer consisted of rectified linear units (ReLU), with the exception of the final (output) layer, which consisted of linear units.The DCNN was trained by minimizing mean square error (MSE) between the labels and the output layer using online stochastic gradient descent (batch size of 1 sample).

Figure 2 .
Figure 2. DCNN schematic.On the far left the grayscale input image is connected to a series of convolutional (Conv.)and max-pooling (Pool.)layers, followed by a series of fully connected (FC.)

of 9 .
5e -1 , for a maximum of 200 full batch iterations (approx.200,000 weight updates).All learning parameters were chosen empirically on a small fraction of the training data.The DCNN was trained 8 distinct times (one for each person), each time reinitializing the network and leaving a different person out for testing, and randomly splitting the remaining data (7 people) into validation and training sets (10% and 90% respectively).Prior to training, each pixel of the input images were normalized over the training set to zero mean and unit variance, and each pixel of the output images were simply divided by a constant (45) so that each pixel fell between -1 and +1.Validation and test errors were observed periodically, and the test error recorded at the lowest validation error (early stopping method) gave results for the validation set and the test set (estimated real-world generalization error) for the held-out person.Test results for all held-out participants were combined to reveal the performance of the DCNN.

Figure 3 .
Figure 3. Wavelet parameter tuning.This graphic depicts how the error changes with respect to the parameters of the Frangi filter (res), and the Gabor wavelet filters (k).

Figure 4 .
Figure 4. Representative wavelet visual result.(a) shows the raw image, (b) shows the vesselenhanced image, (c) shows the ground truth, (d) shows the result of Gabor wavelet convolution with (b), (e) shows the result of a 35 × 35 median filter applied to (d), (f) shows line traces of (e).It is clear, when comparing (e) to (c), that the heatmaps partial match within the GM region, but almost not at all in the Sol region.The line trace visually confirms that the fibre traces are roughly aligned with the visible fibres in the raw image in the GM muscle, and the second compartment of the Sol muscle only.

Figure 5 .
Figure 5. Representative DCNN visual result.(a) shows the ground truth, (b) shows the output from the DCNN, (c) shows line traces of (b).Compare (b) to (a), and notice that the heatmaps differ around the deep aponeurosis (boundary between Sol and GM).The line traces confirm this, by depicting visually accurate fibre traces throughout each muscle/compartment until the deep aponeurosis, where curvature starts to increase rapidly.