Line Extraction and Player Tracking in Tennis Videos

Abstract: Sports video analysis is an interesting and emerging field in computer vision. It has become a very popular subject among researchers over the last decade because of the need for automated classification of events such as offsides, in/out situations, goals and so on. The most important problem in sports video analysis is the detection and tracking of the ball and players. Extraction of court lines is another important problem, since it is necessary for classifying the aforementioned events. In this study, building on some of the previous work on the subject, we develop a method for tracking the ball and players in tennis videos where the camera position and orientation are assumed to be fixed.


Introduction
Assuming a fixed camera (intrinsic and extrinsic parameters are constant in all frames), the method presented here consists of four major stages:
- The background extraction stage finds the background frame of a given video. The background frame should contain all static objects and nothing else, so an ideal background frame contains neither the ball nor the players. In this stage we use the median operator over time to compute a representative value for each pixel in the video. Since the players and the ball are moving in most of the frames, the median operator should give a satisfactory result.
- The line extraction stage takes the background image obtained in the previous stage and extracts the court lines from it. Although previous work on the subject relies heavily on the Hough transform for line extraction [4], we use morphological operations and thresholding because of their speed advantage over the Hough transform [5].
- The third stage is player segmentation. For each frame, the background image is subtracted from the frame, and a series of morphological operations is applied to the resulting image. This yields an image containing only the player and some noise. The noise is eliminated by simple thresholding and the resulting image is binarized [5].
- The last stage performs player tracking on the binarized image. We use a bounding box to locate the player, and information from the previous frames is used to compute the player's current position. Some constraints on size and position are applied to the bounding box.
Although this subject is becoming more popular over time, there have not been any attempts to build a test database of tennis videos. To test our method, we therefore put together two videos that contain difficult scenes: different illumination in different parts of the court, varying intensity in line regions, player shadows, moving people (such as ball boys), etc.

Background Extraction
Background extraction is the first logical step in tracking moving objects. Its aim is to recover the static objects in the scene so that, after subtracting the background image from each frame, the moving objects in the frame can be found. The efficiency of different background extraction techniques is discussed in [1]. To extract a background from a series of frames, we proceed as follows: for each pixel of the background image, the median is taken over the corresponding pixels of every n-th frame in the sequence. This can be thought of as a median filter over the time dimension. The technique yields good results if the moving objects in the scene do not stay at the same position for a long time. Since, in tennis videos, the players do not change their positions much during a serve, this results in
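The temporal median described in this section can be sketched as follows. This is a minimal, dependency-free illustration rather than the implementation used in the study: frames are assumed to be greyscale images stored as lists of rows, and the function name `median_background` and its `step` parameter (the "every n-th frame" of the text) are hypothetical.

```python
from statistics import median


def median_background(frames, step=1):
    """Estimate a static background as the per-pixel temporal median.

    frames: list of equally sized greyscale frames (lists of rows).
    step:   use every step-th frame of the sequence (assumed parameter).
    """
    sampled = frames[::step]
    height, width = len(sampled[0]), len(sampled[0][0])
    return [
        [median(frame[y][x] for frame in sampled) for x in range(width)]
        for y in range(height)
    ]


# Toy sequence: a constant background of 10 with one bright "player"
# pixel (200) that moves one column to the right in every frame.
frames = []
for t in range(5):
    frame = [[10] * 5 for _ in range(3)]
    frame[1][t] = 200
    frames.append(frame)

background = median_background(frames)
```

Because the bright pixel occupies each position in only one of the five frames, the median at every position is the background value. An object that stays in place for more than half of the sampled frames would instead survive in the estimate.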

Line Extraction
Before explaining how the line extraction phase is accomplished, we need to explain some of the basic morphological tools that we used in the process.

3.1-Mathematical Morphology and Basic Morphological Operations
Mathematical morphology (MM) is a powerful image analysis framework based on geometry, nowadays fully developed for both binary and greyscale images. Its popularity in the image processing community is due mainly to its rigorous mathematical foundation and its inherent ability to exploit the spatial relationships of pixels. The morphological framework provides a rich set of tools for tasks ranging from the simplest to the most demanding: noise reduction, edge detection, segmentation, texture and shape analysis, etc. As a methodology, it has been applied to almost all domains dealing with digital image processing. Consequently, it was only a matter of time before attempts were made to extend the same concepts to colour and, more generally, multispectral images [6][7][8][9][10]. Let $f: E \to T$ be a greyscale image, where $E$ is the discrete coordinate grid and $T$ is the set of possible grey values. In the present case, a multispectral image of $n$ channels is considered as $f: E \to T^n$, with $f_i$ denoting its $i$-th channel.
In short, MM studies the transformations of an image when it interacts, through operators, with a matching pattern $B$ called the structuring element (SE). In defining the two basic operations, dilation and erosion, we consider $B$ as a subset of $E$. In their standard form for a flat SE, the two operations are defined respectively as:
\[ \delta_B(f)(x) = \sup_{b \in B} f(x - b), \qquad \varepsilon_B(f)(x) = \inf_{b \in B} f(x + b). \]
A multitude of operators is then derived from dilation and erosion, such as opening and closing, which are used extensively for smoothing and are defined respectively as:
\[ \gamma_B(f) = \delta_B(\varepsilon_B(f)), \qquad \varphi_B(f) = \varepsilon_B(\delta_B(f)). \]
In general, we use the opening and closing transforms to isolate bright (opening) and dark (closing) structures in images, where bright/dark means brighter/darker than the surrounding features in the image.
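The four operators can be illustrated on a one-dimensional signal. The sketch below assumes a flat, symmetric structuring element of odd width `size` and clips the window at the signal borders; the function names are chosen for illustration only.

```python
def erode(signal, size):
    """Flat erosion: minimum over a centred window of `size` samples."""
    r = size // 2
    n = len(signal)
    return [min(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilate(signal, size):
    """Flat dilation: maximum over a centred window of `size` samples."""
    r = size // 2
    n = len(signal)
    return [max(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def opening(signal, size):
    """Erosion followed by dilation: removes narrow bright structures."""
    return dilate(erode(signal, size), size)


def closing(signal, size):
    """Dilation followed by erosion: fills narrow dark structures."""
    return erode(dilate(signal, size), size)


# A one-sample peak (9) and a three-sample plateau (5) on a flat base.
signal = [1, 1, 9, 1, 1, 5, 5, 5, 1, 1]
opened = opening(signal, 3)

# A one-sample dark gap inside a bright region.
closed = closing([5, 5, 1, 5, 5], 3)
```

With a window of three samples, the opening removes the one-sample peak but keeps the three-sample plateau, and the closing fills the one-sample gap.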

3.2-The Top-Hat operator
The top-hat operator comes from the theory of mathematical morphology and allows intensity peaks to be extracted from an image.
This filter can be used under the assumption that the linear features to be detected are brighter than their surroundings. Its main advantage is that it detects a local excess of brightness even if the surroundings are not uniform. Moreover, the size or width of the bright structures to be extracted can be adjusted very easily. The principle is based on subtracting from an image its "opening". The opening consists of an erosion followed by a dilation, with the size of the structuring element determined by the width of the line to be detected. The extraction of lines thus reduces to a careful choice of this structuring element.

[Figure: top-hat on a one-dimensional signal. Panels: original signal; opening applied to the signal; subtraction (original - opening).]
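The subtraction of a signal from its opening can be reproduced on a small one-dimensional example. The sketch assumes a flat, symmetric window of odd width clipped at the borders; `top_hat` and the helper `_window_filter` are illustrative names, not part of the original method.

```python
def _window_filter(signal, size, reducer):
    """Apply `reducer` (min or max) over a centred window of `size` samples."""
    r = size // 2
    n = len(signal)
    return [reducer(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def opening(signal, size):
    """Erosion (min) followed by dilation (max) with the same window."""
    eroded = _window_filter(signal, size, min)
    return _window_filter(eroded, size, max)


def top_hat(signal, size):
    """White top-hat: the signal minus its opening."""
    return [s - o for s, o in zip(signal, opening(signal, size))]


# Non-uniform surroundings (level 10 on the left, 20 on the right) with a
# two-sample-wide bright "line" of value 40 at positions 4 and 5.
signal = [10, 10, 10, 10, 40, 40, 20, 20, 20, 20]
peaks = top_hat(signal, 5)
```

The window of five samples is wider than the two-sample line, so the opening removes the line while following both background levels, and only the line remains after the subtraction.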

3.3-Line Extraction Method
Given the background image, we begin by transforming it from the RGB color space to the HSV color space. The aim is to separate the luminance of the colors present in the frame from their chromaticity. Since the correlation between the channels in RGB space is too high, we need another space to work with, and the V channel of HSV space serves this purpose. We then apply a top-hat operator with a disk-shaped structuring element to the V channel of the background image. This brightens the lines in the image and can be thought of as an enhancement stage. Denote the resulting image as E. Applying weighted mean thresholding to E, we obtain a binary image B, which will be used later as a mask.
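The colour conversion and the thresholding can be sketched as below. The V channel of HSV is the largest of the R, G and B components. The text does not define the weighting used in the thresholding, so the sketch assumes a threshold of `weight` times the image mean; `weight`, its value of 1.2 and both function names are hypothetical. For brevity the threshold is applied directly to the V channel here, whereas the method applies it to the top-hat output E.

```python
def value_channel(rgb_image):
    """V channel of HSV: the largest of the R, G and B components."""
    return [[max(pixel) for pixel in row] for row in rgb_image]


def mean_threshold(image, weight=1.0):
    """Binarize at weight * mean(image); `weight` is an assumed parameter."""
    pixels = [p for row in image for p in row]
    threshold = weight * sum(pixels) / len(pixels)
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Toy image: green court pixels on the left, a near-white line on the right.
rgb = [
    [(30, 90, 40), (32, 95, 41), (240, 245, 238)],
    [(28, 88, 39), (31, 92, 40), (236, 242, 235)],
]
v = value_channel(rgb)
mask = mean_threshold(v, weight=1.2)
```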
The next step is to extract the horizontal lines. We apply gray-opening-by-reconstruction (GOBR) to E with a 20x20 horizontal structuring element. We then perform an opening with a structuring element of 150 horizontal pixels and another with a structuring element of 3 vertical pixels. The result contains the horizontal lines that are longer than 150 pixels and 3 pixels wide.
Vertical lines are detected similarly: we apply GOBR to E, then perform an opening with a structuring element of 75 vertical pixels and another with a structuring element of 3 horizontal pixels.
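The effect of the long directional openings can be sketched with a line-shaped structuring element applied row by row or column by column. This illustration covers only the directional opening, not the GOBR step, and it assumes non-negative pixel values and counts only windows that fit entirely inside the image; the function names and the toy length of 4 pixels (instead of 150 or 75) are chosen for the example.

```python
def open_1d(values, length):
    """Opening of a 1-D signal with a flat line of `length` samples."""
    n = len(values)
    # Erosion: minimum of every window of `length` samples that fits.
    eroded = [min(values[i:i + length]) for i in range(n - length + 1)]
    # Dilation with the reflected window: each sample takes the largest
    # eroded value among the windows covering it (0 if no window fits).
    opened = []
    for j in range(n):
        covering = eroded[max(0, j - length + 1):j + 1]
        opened.append(max(covering, default=0))
    return opened


def open_horizontal(image, length):
    """Keep bright horizontal runs of at least `length` pixels."""
    return [open_1d(row, length) for row in image]


def open_vertical(image, length):
    """Keep bright vertical runs of at least `length` pixels."""
    columns = [open_1d(list(col), length) for col in zip(*image)]
    return [list(row) for row in zip(*columns)]


# One full horizontal line (row 1), one full vertical line (column 2)
# and a short bright fragment at the right of row 2.
image = [
    [0, 0, 9, 0, 0, 0],
    [9, 9, 9, 9, 9, 9],
    [0, 0, 9, 0, 9, 9],
    [0, 0, 9, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],
]
horizontal = open_horizontal(image, 4)
vertical = open_vertical(image, 4)
```

In this toy image the horizontal opening keeps only row 1 and the vertical opening keeps only column 2; the short fragment is removed by both.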
The vertical and horizontal lines are summed into an image called L, which is then processed as follows. To get rid of unwanted lines, we apply another weighted thresholding; then, using a closing operator, we connect the disconnected line segments. Using B as a mask, we apply a binary reconstruction by dilation and then perform two openings as a 'cleaning' step: one with a 3x3 vertical SE and one with a 3x3 horizontal SE.
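Binary reconstruction by dilation can be sketched as a flood fill that grows the marker inside the mask until nothing changes. The sketch assumes that the processed line image acts as the marker and B as the mask, and that pixels are 8-connected; both are assumptions, since the text does not state them, and the function name is illustrative.

```python
from collections import deque


def reconstruct(marker, mask):
    """Binary reconstruction by dilation of `marker` under `mask`.

    Keeps every 8-connected component of `mask` that contains at least
    one marker pixel. Both inputs are equally sized 0/1 images.
    """
    height, width = len(mask), len(mask[0])
    result = [[0] * width for _ in range(height)]
    queue = deque()
    # Seed with the marker pixels that lie inside the mask.
    for y in range(height):
        for x in range(width):
            if marker[y][x] and mask[y][x]:
                result[y][x] = 1
                queue.append((y, x))
    # Grow the seeds to neighbouring mask pixels until stable.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and mask[ny][nx] and not result[ny][nx]):
                    result[ny][nx] = 1
                    queue.append((ny, nx))
    return result


mask = [
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
]
marker = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
restored = reconstruct(marker, mask)
```

Only the mask component touched by the marker is restored in full; the two other components are discarded.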
To sum up these steps:
1. Transform the background image to HSV and take the V channel. Call it V.
2. Apply a top-hat with a 7x7 disk SE to V. Call it E.
3. Threshold E. Call it B.

Player Segmentation and Tracking
The player segmentation stage starts by subtracting the background frame from the current frame to obtain a foreground frame (Fg). An opening operator with a 3x9 structuring element is then applied to Fg. The result is thresholded (the mean of the image is calculated and pixels brighter than the mean are eliminated). The result for one frame is given in Fig. 4(a,b,c,d).
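The subtraction step and the bounding box used to locate the player can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the method above: the absolute difference is used for the subtraction, the 3x9 opening is omitted, the toy example binarizes at a fixed value of 100 and keeps the brighter pixels, and the size constraint is reduced to hypothetical `min_height` and `min_width` parameters.

```python
def foreground(frame, background):
    """Absolute difference between a frame and the background image."""
    return [
        [abs(f - b) for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def bounding_box(binary, min_height=1, min_width=1):
    """Return (top, left, bottom, right) of the set pixels, or None."""
    coords = [(y, x) for y, row in enumerate(binary)
              for x, value in enumerate(row) if value]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    top, bottom, left, right = min(ys), max(ys), min(xs), max(xs)
    # Hypothetical size constraint: reject boxes that are too small.
    if bottom - top + 1 < min_height or right - left + 1 < min_width:
        return None
    return top, left, bottom, right


# Toy data: a uniform background and a two-pixel-tall bright "player".
background = [[50] * 5 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][3] = 200
frame[2][3] = 200

fg = foreground(frame, background)
binary = [[1 if p > 100 else 0 for p in row] for row in fg]
box = bounding_box(binary, min_height=2)
```

A tracker could compare `box` with the box found in the previous frame and reject candidates that jump too far, in the spirit of the position constraint mentioned in the introduction.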

Conclusion
We have presented a scheme for detecting players and lines in tennis videos. The method uses common image processing tools such as the median filter, thresholding,