The first three steps are illustrated in Figure 1, and will be described in the following subsections.
Figure 1.
Example output obtained after each step of the proposed method on a test image. Each curve is shown in a different color.
3.1. Guidewire Segmentation
This step uses a deep CNN to obtain a good guidewire segmentation. The quality of the segmentation will be reflected in the quality of the obtained localization result, which is why we aim for the best possible segmentation.
For that reason, we introduce a novel segmentation method called MSLNet, described below.
3.1.1. Proposed MSLNet Segmentation Architecture
The proposed guidewire segmentation architecture is illustrated in Figure 2. It contains a ResNet (or other type of) feature extractor and two convolution filters, of size and of size , with in our experiments.
Figure 2.
Diagram of the proposed MSLNet guidewire segmentation architecture.
The MSLNet segmentation method consists of the following steps:
- (1) From an image of size , the ResNet is used as an encoder to extract a feature map of size with .
- (2) An initial segmentation is obtained from the feature map using the convolution kernel , which produces a map of size . Each z²-dimensional vector from this map is reshaped to a z × z patch and placed at the corresponding location in a grid of patches, which together form the initial segmentation of size .
- (3) From the feature map , a coarse segmentation of size is also obtained using the convolution kernel .
- (4) The final segmentation is obtained by multiplying the initial segmentation elementwise with the indicator function applied to the coarse segmentation, after the latter is resized to be z times larger in each direction, without interpolation.
The whole process is summarized in Algorithm 1 below, where the number of channels used in this paper is .
Observe that this approach requires the input image dimensions to be divisible by z. If that is not the case, the image is padded with zeros to make its dimensions divisible by z.
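The assembly in steps (2) to (4) and the zero padding can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in the paper: the function names, the row-major ordering of the entries inside each z²-dimensional vector, the placement of the padding at the bottom and right, and the threshold of 0 on the coarse map are all assumptions.

```python
import numpy as np

def pad_to_multiple(image, z):
    """Zero-pad a 2D image (bottom/right, an assumption) so both sides are divisible by z."""
    h, w = image.shape
    return np.pad(image, ((0, (-h) % z), (0, (-w) % z)))

def patches_to_image(patch_map, z):
    """Rearrange a (z*z, h, w) map into an (h*z, w*z) image made of z-by-z patches."""
    _, h, w = patch_map.shape
    grid = patch_map.reshape(z, z, h, w)          # (row in patch, col in patch, h, w)
    return grid.transpose(2, 0, 3, 1).reshape(h * z, w * z)

def upsample_nearest(coarse, z):
    """Make a 2D map z times larger in each direction, without interpolation."""
    return np.repeat(np.repeat(coarse, z, axis=0), z, axis=1)

def assemble_segmentation(patch_map, coarse, z, threshold=0.0):
    """Keep the fine prediction only inside the patches selected by the coarse map."""
    gate = upsample_nearest(coarse, z) > threshold    # indicator function
    return patches_to_image(patch_map, z) * gate
```

Patches whose coarse value does not pass the threshold are set to zero, so the fine prediction only matters in the selected patches.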
It is worth noting that this architecture directly predicts the segmentation from the encoded representation, without many decoder layers and without skip connections. This reduces the number of trainable parameters and the depth of the CNN, but introduces some overfitting issues, which are addressed by the coarse segmentation branch.
This approach can be thought of as a Marginal Space Learning (MSL) approach [18,19], where the marginal space is the space of coarse segmentations, whose dimension is z² times smaller than that of the final segmentation space. Only the patches corresponding to locations selected by the coarse segmentation are expanded to a fine segmentation; the rest are set to zero. This is the reason why this approach is called MSLNet.
3.2. Training the MSLNet
Training is done end-to-end using a two-term loss function that encourages a good coarse segmentation and a good final segmentation. This is in contrast with [10], where the coarse segmentation and the UNet are trained separately.
The trainable parameters consist of the ResNet feature extractor parameters and the two convolution kernels.
Given a training example with input and target binary segmentation , the coarse target is first constructed as a binary indicator, for the grid of patches, of whether each patch contains at least one guidewire pixel.
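The construction of the coarse target can be illustrated with a short NumPy sketch. It assumes that the patches are the non-overlapping z × z blocks used by the architecture and that the target has already been padded so that its dimensions are divisible by z; the function name is hypothetical.

```python
import numpy as np

def coarse_target(target, z):
    """Return an (H/z, W/z) binary map that is 1 where a z-by-z patch
    of the binary target contains at least one positive pixel."""
    h, w = target.shape
    assert h % z == 0 and w % z == 0, "pad the target so its sides are divisible by z"
    blocks = target.reshape(h // z, z, w // z, z)   # split rows and columns into blocks
    return (blocks.max(axis=(1, 3)) > 0).astype(np.uint8)
```

For example, a 4 × 4 target with a single positive pixel gives, for z = 2, a 2 × 2 coarse target with a single positive entry.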
After constructing , the training loss function for an observation has two parts, the coarse segmentation loss and the fine segmentation loss , where is the ResNet feature extractor and '*' is the convolution operator.
Inspired by [3], who combine the Dice and BCE losses, the coarse segmentation loss is the sum of the Dice loss and the weighted BCE loss. The Dice loss is
where the sums are taken over the coarse pixels, the function is the sigmoid, and is a tuning parameter ( in our experiments).
The weighted binary cross-entropy (BCE) loss is
where are the positive pixels of the coarse target and are the negative ones.
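To make the structure of the coarse loss concrete, the following NumPy sketch combines a soft Dice loss with a class-balanced BCE loss. It is an illustration under assumptions, not the exact formulas of the paper: the Dice tuning parameter is assumed to be a smoothing constant `eps` added to the numerator and denominator, and the BCE weighting is assumed to average the positive and negative pixels separately before summing.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dice_loss(logits, target, eps=1.0):
    """Soft Dice loss; eps is an assumed smoothing (tuning) constant."""
    p = sigmoid(logits)
    inter = np.sum(p * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(target) + eps)

def weighted_bce_loss(logits, target):
    """BCE averaged separately over positive and negative pixels (assumed weighting)."""
    p = np.clip(sigmoid(logits), 1e-7, 1.0 - 1e-7)   # avoid log(0)
    pos, neg = target > 0, target == 0
    loss_pos = -np.log(p[pos]).mean() if pos.any() else 0.0
    loss_neg = -np.log(1.0 - p[neg]).mean() if neg.any() else 0.0
    return loss_pos + loss_neg

def coarse_loss(logits, target, eps=1.0):
    """Sum of the Dice loss and the weighted BCE loss."""
    return dice_loss(logits, target, eps) + weighted_bce_loss(logits, target)
```

Averaging the two classes separately keeps the few positive coarse pixels from being dominated by the many negative ones, which is the purpose of the weighting.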
The fine segmentation loss is also the sum of the Dice loss and the weighted binary cross-entropy (BCE) loss:
where here and with as defined in Equation (1).
By restricting the fine segmentation loss only to patches where , we make sure that the training data is more balanced, since in this case the percentage of foreground pixels is about , as opposed to the entire image, where the percentage of foreground pixels is about .
However, due to inaccuracies in the annotation, the BCE fine segmentation loss might not be the best choice, because it is not very robust to labeling noise. For that reason, we also experimented with replacing with the Lorenz loss [20]:
where is the ReLU and are the same as for Equation (7). This loss is more robust to labeling noise because it penalizes a mistake less than the BCE loss.
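The difference in penalty can be illustrated numerically. The sketch below assumes that the Lorenz loss takes the form log(1 + ReLU(1 − u)²), where u is the signed margin (the prediction multiplied by a label in {−1, +1}), and compares it with the BCE loss written in the same margin form, log(1 + exp(−u)). Both the formula and the margin convention are assumptions made for illustration and may differ from the equation used in the paper.

```python
import math

def lorenz_loss(u):
    """Assumed form of the Lorenz loss: log(1 + ReLU(1 - u)^2)."""
    r = max(0.0, 1.0 - u)          # ReLU(1 - u)
    return math.log1p(r * r)

def logistic_loss(u):
    """BCE loss written as a function of the signed margin u."""
    return math.log1p(math.exp(-u))

for u in (-10.0, -2.0, 0.0, 2.0):
    print(f"u={u:6.1f}  lorenz={lorenz_loss(u):.3f}  bce={logistic_loss(u):.3f}")
```

Under this assumed form, a badly misclassified pixel (large negative u) incurs a penalty that grows logarithmically rather than linearly, so a mislabeled pixel has less influence on training.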
3.3. Initial Curve Extraction
To extract the initial curves, the thresholded segmentation result is processed using the thinning morphological operation so that each pixel of the output has a small number of neighbors, which enables the extraction of the initial curves as pixel chains. Thinning [21] is an iterative morphological algorithm that is applied to a binary image until convergence and aims to find the centerline of a strip of pixels. In our experiments, we used MATLAB's bwmorph with the thinning option and scikit-image's thin, with identical results. We also experimented with two other related morphological operations, skeletonization and medial axis, but observed that thinning obtained slightly better results.
To extract the pixel chains as curves, first the 8-neighbor graph is constructed, with the vertex set V consisting of the positive pixels of the thinned segmentation. On the thinned segmentation result, most nodes of this graph have degree 2 and some have degree 3. Nodes with degree greater than 3 are very rare.
The rest of the curve extraction is described in Algorithm 2 below.
Lines 6-17 extract the initial curves as maximal chains C containing a node i of degree 2. Observe that, because it is a chain, each curve C induces an ordering of its nodes that is unique up to reversal.
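The chain extraction can be sketched in plain Python as follows. This is a simplified illustration and not a transcription of Algorithm 2: it assumes that the thinned segmentation is given as a set of (row, column) pixel coordinates, that every chain is grown from a node of degree 2, and that nodes of degree 3 or more act as junctions that terminate a chain. All function names are hypothetical.

```python
def neighbors(p, pixels):
    """8-neighbors of pixel p that belong to the thinned segmentation."""
    r, c = p
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and (r + dr, c + dc) in pixels]

def walk(prev, cur, pixels, degree, visited):
    """Follow the chain from cur, away from prev, until an end point or a junction."""
    path = []
    while cur not in visited and degree[cur] <= 2:
        visited.add(cur)
        path.append(cur)
        nxt = [q for q in neighbors(cur, pixels) if q != prev]
        if not nxt:                      # end point (degree 1)
            break
        prev, cur = cur, nxt[0]
    return path

def extract_chains(pixels):
    """Return the maximal chains that contain at least one node of degree 2."""
    pixels = set(pixels)
    degree = {p: len(neighbors(p, pixels)) for p in pixels}
    visited, chains = set(), []
    for start in sorted(pixels):
        if degree[start] != 2 or start in visited:
            continue
        visited.add(start)
        a, b = neighbors(start, pixels)
        back = walk(start, a, pixels, degree, visited)
        forward = walk(start, b, pixels, degree, visited)
        chains.append(back[::-1] + [start] + forward)
    return chains
```

Each returned chain is an ordered list of pixels; reversing the list gives the only other valid ordering of the same curve.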
3.4. Perceptual Curve Grouping
Perceptual curve grouping takes the curves extracted in Section 3.3 and merges them into longer curves using a continuation measure. When two curves are merged, the pixel ordering of one of them might need to be reversed to obtain a consistent ordering for the merged curve. The whole perceptual grouping algorithm is described in Algorithm 3, and its components are described below.
In Algorithm 3, end curve directions are estimated using PCA for each curve, and are used for the curve continuation measure.
Thus, for n curves, there are 2n PCA models, two for each curve: one is built from the first k points of the curve, as illustrated in Figure 3, and the other is built from the last k points. If the curve is less than k points long, the PCA models are estimated from all curve points. We used in experiments.
Figure 3.
The end curve models are PCAs constructed from the first and last pixels of each curve (shown as blue ellipses) and aligned to point outwards from the curve.
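The directions are then aligned in step 6 to point outwards from the curve by making them point towards the respective end of the curve. To align a direction with a given mean to point towards an end point, first the inner product between the direction and the vector from the mean to the end point is computed. If it is positive, then the direction is already aligned. If it is negative, then the direction is reversed.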
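The estimation and alignment of an end direction can be sketched with NumPy as below. The sketch assumes that the direction is the first principal component of the selected end points (computed here with an SVD of the centered points) and that the alignment uses the sign of the inner product with the vector from the mean to the end point; the function name and its signature are hypothetical.

```python
import numpy as np

def end_direction(points, k, at_start):
    """Unit principal direction of the first (or last) k points of a curve,
    flipped if needed so that it points outwards, towards that end of the curve."""
    pts = np.asarray(points, dtype=float)
    seg = pts[:k] if at_start else pts[-k:]     # all points if the curve is shorter than k
    end = pts[0] if at_start else pts[-1]
    mean = seg.mean(axis=0)
    _, _, vt = np.linalg.svd(seg - mean)        # rows of vt are principal directions
    v = vt[0]
    if np.dot(v, end - mean) < 0:               # pointing inwards: reverse it
        v = -v
    return v
```

For a horizontal curve running from left to right, the direction at the first end points left and the direction at the last end points right.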
The point-direction pairs are checked in line 10 for being within a distance range and for angle alignment. The angle alignment checks that the angles between and , and between and , are less than , corresponding to in line 10 of Algorithm 3.
For the pairs that pass the check, a continuation measure is computed as based on fitting a degree 3 polynomial , as specified in alg:poly and illustrated in Figure 4.
Figure 4.
A degree 3 polynomial is fitted between the points and on the coordinate system with the x-axis connecting the two points.
For that, a coordinate system is constructed, centered at with x-axis towards , thus the x-axis is and the y-axis is .
Then a degree three polynomial is fitted analytically to go through and be tangent to as described in alg:poly. One can easily check that , so the continuation matrix M is symmetric.
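The analytic fit can be illustrated as follows. In the local coordinate system with the origin at the first point and the x-axis towards the second point, at distance d, a cubic y(x) = a x³ + b x² + c x that passes through both points with slopes s0 at x = 0 and s1 at x = d has c = s0, a = (s0 + s1)/d², and b = −(2 s0 + s1)/d. The Python sketch below implements this under the assumption that the end directions are given as 2D vectors that are not perpendicular to the line connecting the two points; it returns only the coefficients and does not compute the continuation measure. The function name is hypothetical.

```python
import math

def fit_cubic(p, u, q, w):
    """Fit y = a*x^3 + b*x^2 + c*x in the frame centered at p with x-axis towards q,
    tangent to direction u at p and to direction w at q. Returns (a, b, c, d)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)                 # distance between the two points
    e1 = (dx / d, dy / d)                  # local x-axis
    e2 = (-e1[1], e1[0])                   # local y-axis (x-axis rotated by 90 degrees)

    def slope(v):
        # Slope of direction v in the local frame; v and -v give the same slope.
        return (v[0] * e2[0] + v[1] * e2[1]) / (v[0] * e1[0] + v[1] * e1[1])

    s0, s1 = slope(u), slope(w)
    a = (s0 + s1) / d ** 2
    b = -(2.0 * s0 + s1) / d
    c = s0
    return a, b, c, d
```

When both directions lie along the connecting line, all coefficients are zero and the fitted curve is the straight segment between the two points.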
The curve ends are matched using the Hungarian algorithm [22] and the matches with cost are discarded.
The matches are validated so that only pairs such that i is matched to j and j is matched to i are kept, as described in alg:validate. This step is essential, since the curve merging step would fail without it.
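The validation rule can be sketched in a few lines of Python. The sketch assumes that the matches are stored in a dictionary that maps the index of a curve end to the index of the curve end it was matched to; this representation and the function name are assumptions made for illustration.

```python
def validate_matches(match):
    """Keep only the pairs (i, j), i < j, where i is matched to j and j is matched to i."""
    return sorted((i, j) for i, j in match.items()
                  if i < j and match.get(j) == i)

# Ends 0 and 3 point to each other; 1 -> 2 -> 5 -> 4 is not mutual anywhere.
print(validate_matches({0: 3, 3: 0, 1: 2, 2: 5, 5: 4}))
```

Keeping only mutual matches guarantees that every curve end is attached to at most one other end, which the merging step relies on.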
Then the curves are merged based on the validated endpoint matches, as described in Algorithm 4. The function reverses the points of a curve C.
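The role of the reversal when two curves are merged can be illustrated with the sketch below. It is not a transcription of Algorithm 4: it handles a single validated match between two curves, represents each curve as an ordered list of points, and identifies the matched end of each curve by the hypothetical labels "first" and "last". The function names are also hypothetical.

```python
def reverse(curve):
    """Return the points of a curve in reverse order."""
    return curve[::-1]

def merge_curves(a, a_end, b, b_end):
    """Join curves a and b at their matched ends so that the result is consistently ordered.
    a_end and b_end are 'first' or 'last' and name the matched end of each curve."""
    if a_end == "first":      # a must finish at its matched end
        a = reverse(a)
    if b_end == "last":       # b must begin at its matched end
        b = reverse(b)
    return a + b
```

For instance, if the last end of one curve is matched to the last end of another, the second curve is reversed before the two lists are concatenated.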