Garments Texture Design Class Identification Using Deep Convolutional Neural Network

S.M. Sofiqul Islam 1,†,‡,*, Emon Kumar Dey 2,‡, Md. Nurul Ahad Tawhid 3,‡ and B. M. Mainul Hossain 4,‡

1 Affiliation 1; bit0310@iit.du.ac.bd
2 Affiliation 2; emonkd@iit.du.ac.bd
3 Affiliation 3; tawhid@iit.du.ac.bd
4 Affiliation 4; raju@du.ac.bd
* Correspondence: bit0310@iit.du.ac.bd; Tel.: +880-152-140-9419
† Current address: Institute of Information Technology, University of Dhaka, Dhaka 1000, Bangladesh
‡ These authors contributed equally to this work.

Version March 21, 2016 submitted to Computers

Abstract: Automatic identification of garments design classes for recommending fashion trends is important nowadays because of the rapid growth of online shopping. By learning the properties of images efficiently, a machine can classify them more accurately. Several methods based on hand-engineered feature coding exist for identifying garments design classes, but most of the time those methods do not achieve good results. Recently, deep Convolutional Neural Networks (CNNs) have shown better performance on various object recognition tasks. A deep CNN uses multiple levels of representation and abstraction that help a machine understand data (images, sound, and text) more accurately. In this paper, we apply deep CNNs to identify garments design classes. To evaluate the performance, we use two well-known CNN models, AlexNet and VGGNet, on two different datasets. We also propose a new CNN model based on AlexNet and obtain better results than the existing state-of-the-art by a significant margin.


Introduction
Forecasting fashion trends has great business value in the fashion industry. A fashion forecaster predicts the colours, fabrics and styles for the upcoming seasons. One of the main resources for identifying trends is to analyse the choices of customers. Online shopping is popular nowadays.
Customers select products from web pages according to their choice, and that can help to predict the direction of trends. If a retailer knows the popular design styles of clothing products, it can increase the production of those styles to achieve more profit. Therefore, if a system can classify garments products according to different styles, textures, sizes, etc., it can automatically suggest products to customers based on their choices. In this paper, we have developed a system which can classify clothes according to texture.
Effective design classification based on textures (local spatial variations of intensity or colour in images) has been an important topic of interest in the past decades. A successful classification, detection or segmentation requires an efficient description of image textures. To fulfil this purpose, many well-known hand-engineered feature extraction methods exist, such as CENsus Transform hiSTogram (CENTRIST) [1], Local Binary Pattern (LBP) [2], Gabor [3], Histogram of Oriented Gradients (HOG) [4], GIST [5], etc. LBP gained popularity because of its computational simplicity and good accuracy, but it is very sensitive in uniform and near-uniform regions. Local Ternary Pattern (LTP) [6] and Completed Local Binary Pattern (CLBP) [7] handle this issue more accurately; between these two methods, CLBP is the better choice because it is rotation invariant [8]. CENTRIST [1] gained popularity by incorporating a Spatial Pyramid (SP) structure, and most recently Completed CENTRIST (cCENTRIST) and Ternary CENTRIST (tCENTRIST) [8] achieved high accuracies for garments design classification. Although several hand-engineered feature extraction approaches exist for garments design classification, deep learning [9] is rarely used in this field. Our goal is to apply an appropriate deep learning model and measure its performance on texture-based garments design identification.
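To make the uniform-region sensitivity concrete, the basic 3 × 3 LBP coding can be sketched in a few lines. This is a minimal illustration with made-up pixel values, not part of our experimental pipeline: each neighbour is thresholded against the centre pixel and the resulting bits are packed into an 8-bit code.

```python
def lbp_code(patch):
    """Basic 3x3 LBP: threshold the 8 neighbours against the centre
    pixel and pack the bits (clockwise from the top-left) into a byte."""
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2],
                  patch[1][2], patch[2][2], patch[2][1],
                  patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << bit
    return code

print(lbp_code([[60, 80, 20],
                [55, 50, 10],
                [90, 40, 70]]))   # -> 211
# In a flat region every neighbour ties with the centre, so tiny noise
# can flip many bits at once: this is the uniform-region sensitivity.
print(lbp_code([[7, 7, 7], [7, 7, 7], [7, 7, 7]]))   # -> 255
```

Raising the centre of the flat patch by a single grey level flips every bit, mapping the same region to code 0 instead of 255; this instability is what LTP and CLBP address.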
In recent years, deep learning has become popular in the fields of machine learning and computer vision. Using large architectures with numerous features, many deep learning models achieve high performance in object detection [10], text classification [11], image classification [12], face verification [13], gender classification [14], scene classification [15], digit and traffic sign recognition [16], etc. Some of the available deep learning models are AlexNet [12], VGGNet [17], Berkeley-trained models [18], the Places-CNN model [15], Fully Convolutional Semantic Segmentation Models (FCN-Xs) [19], CNN models for Salient Object Subitizing [20], Places CNDS models for scene recognition [21], models for age and gender classification [22], the GoogLeNet model [23], etc.

The rest of the paper is structured as follows. Section 2 discusses the background studies, Section 3 describes the working method, Section 4 presents the experimental results and Section 5 concludes the work.

Background Studies
In this section, we discuss some existing garments segmentation and classification strategies based on existing descriptors. We also describe some deep learning models that are used for several applications in computer vision.

Garment Product Segmentation and Identification
Yamaguchi et al. [24] proposed a method for clothing parsing. For this work, they created the Fashionista dataset, which consists of 158,235 images, from which they selected 685 images for training and testing. They identified 14 different parts of a body and different clothing regions; for pose detection, they used the method described in [25]. In [26], they dealt with the clothing parsing problem using a retrieval-based approach focused on pre-trained global clothing models, local clothing models and a transferred parse, and found that their proposed final parse achieves better parsing accuracy. In the approach of Manfredi et al. [27], HOG features [28] are computed from each cell with the orientations grouped into nine bins; the projection histogram and HOG features are concatenated and a multiclass linear support vector machine is used for training. Serra et al. [29] did a similar type of work, where the authors used a conditional random field (CRF) to segment outfits.
Vittayakorn et al. [30] used five different features (colour, texture, shape, parse and style descriptors) to identify three visual trends, namely floral print, pastel colour and neon colour, from the runway to street fashion. However, using more colour and design classes would be more beneficial in this field. Kalantidis et al. [31] proposed a method for identifying relevant products. From an input image, they first estimate the pose of the person and then segment the clothing areas such as shirts, tops, jeans, etc. Finally, they applied an image retrieval technique, which is 50 times faster than [24], to identify similar clothes for each class.
Gallagher et al. [32] used the grab cut algorithm [33] for clothing part segmentation to identify a person. Using this method, the authors successfully extracted the region of interest (ROI), which is only the torso region. Bourdev et al. [34] proposed a new method for identifying attributes and the type of cloth from an input image; here the attributes are gender, hair style and types of clothes such as t-shirts, pants, jeans, shorts, etc. For this work, they created a dataset consisting of 8000 images of people with annotated attributes.

Texture Based Classification
Nowadays, garments design classification based on texture has become more popular, and there are several well-known existing methods such as Histogram of Oriented Gradients (HOG) [4], Local Binary Pattern (LBP) features [2], wavelet transforms [35], Noise Adaptive Binary Pattern (NABP) [36], Gabor filters [3] and the Scale-Invariant Feature Transform (SIFT) [37]. LBP, which was proposed for describing the local structure of an image, became popular because of its computational simplicity. It has been used in several areas such as facial image analysis, including face detection [38], face recognition and facial expression analysis [39]; demographic (gender, race, age, etc.) classification [40], [41]; moving object detection [42], etc. However, LBP is very sensitive in uniform and near-uniform regions. In the last few years, much research has gone into modifications of LBP, such as derivative-based LBP [43], dominant LBP [44], rotation-invariant LBP [45] and center-symmetric LBP [46], to improve the performance.
Tan and Triggs [6] proposed a new texture-based method, Local Ternary Patterns (LTP), which can tolerate noise up to a certain level. They used a fixed threshold (±5) to make LTP more discriminant and less sensitive to noise in uniform regions. Besides this, several methods handle noise in other application areas, such as the method described by Jun et al. [47], who proposed the Local Gradient Pattern (LGP) for texture-based face detection. This method is a variant of LBP and uses an adaptive threshold for code generation.
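The ternary coding of Tan and Triggs can be sketched as follows. This toy example (illustrative pixel values) uses the fixed ±5 threshold mentioned above and the common practice of splitting the ternary pattern into an "upper" and a "lower" binary pattern:

```python
def ltp_codes(patch, t=5):
    """Local Ternary Pattern on a 3x3 patch: each neighbour is coded
    +1 / 0 / -1 relative to the centre with a tolerance band of +-t,
    then split into the usual 'upper' and 'lower' binary patterns."""
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2],
                  patch[1][2], patch[2][2], patch[2][1],
                  patch[2][0], patch[1][0]]
    upper = lower = 0
    for bit, n in enumerate(neighbours):
        if n >= c + t:        # clearly brighter -> +1 (upper pattern)
            upper |= 1 << bit
        elif n <= c - t:      # clearly darker  -> -1 (lower pattern)
            lower |= 1 << bit
        # values within [c-t, c+t] stay 0 in both: the noise tolerance
    return upper, lower

print(ltp_codes([[52, 48, 90],
                 [47, 50, 10],
                 [55, 53, 51]]))   # -> (68, 8)
```

Note that the five neighbours within ±5 of the centre contribute to neither pattern, which is exactly how LTP avoids the bit-flipping that plain LBP suffers in near-uniform regions.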
Guo et al. [7] proposed a new method, the Completed Local Binary Pattern (CLBP), which incorporates sign, magnitude and centre pixel information. This method is also rotation invariant and capable of handling fluctuations of intensity.

Deep Learning Models

We use the CNN models AlexNet and VGG_S because of their computational simplicity and good performance in several areas, and because they work well on unsupervised datasets. These two models can handle the over-fitting problem when working with a large dataset by using data augmentation techniques.
Besides, these two models use a recently developed regularization method called "Dropout" that has proven very effective, and both achieved significant results on challenging image recognition and object detection benchmarks. Brief descriptions of these two models, alongside our proposed model, are given in the following sub-sections.
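For reference, "inverted" dropout (zero a unit with probability p during training and rescale the survivors so the expected activation is unchanged, then pass activations through untouched at test time) can be sketched as follows. This is a generic illustration, not the exact implementation inside the CNN frameworks we use:

```python
import random

def dropout(activations, p=0.5, train=True, rng=None):
    """Inverted dropout: at training time, zero each unit with
    probability p and scale the survivors by 1/(1-p); at test time
    the input passes through unchanged."""
    if not train:
        return list(activations)
    rng = rng or random.Random(0)   # fixed seed only for this illustration
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]

x = [1.0] * 8
y = dropout(x, p=0.5)   # roughly half the units become 0.0, the rest 2.0
```

Because the surviving units are rescaled at training time, no rescaling is needed at test time, which is why the test-time path is a simple pass-through.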

AlexNet Model

Krizhevsky et al. [12] proposed a deep convolutional neural network model which is known as the AlexNet model. It uses three types of layers (convolution layers, pooling layers and fully-connected (FC) layers), and the full architecture is created by combining them. In this architecture, there are eight learned layers in total: five convolutional layers and three fully-connected layers. The convolution layer is the core building block; each convolution layer consists of some learnable filters, and the filter sizes differ from one layer to another. The full AlexNet architecture is shown in Figure 2.
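The layer-size arithmetic behind such a stack can be checked with the standard convolution output formula, floor((n - k + 2p)/s) + 1. The sketch below walks through the commonly cited AlexNet hyper-parameters (227 × 227 input; 96/256/384/384/256 filters); it only computes spatial sizes and is not a training implementation:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer:
    floor((size - kernel + 2*pad) / stride) + 1."""
    return (size - kernel + 2 * pad) // stride + 1

# Five conv layers (with three max-pools) feeding three FC layers.
s = 227
s = conv_out(s, 11, stride=4)   # conv1, 96 filters  -> 55
s = conv_out(s, 3, stride=2)    # max-pool           -> 27
s = conv_out(s, 5, pad=2)       # conv2, 256 filters -> 27
s = conv_out(s, 3, stride=2)    # max-pool           -> 13
s = conv_out(s, 3, pad=1)       # conv3, 384 filters -> 13
s = conv_out(s, 3, pad=1)       # conv4, 384 filters -> 13
s = conv_out(s, 3, pad=1)       # conv5, 256 filters -> 13
s = conv_out(s, 3, stride=2)    # max-pool           -> 6
flat = s * s * 256              # 9216 inputs feed the first FC layer
print(s, flat)                  # -> 6 9216
```

The same arithmetic explains the stride and pooling differences between AlexNet and the VGG variants discussed next: changing a stride or pooling size changes every downstream layer size.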

VGGNet Model
Chatfield et al. [17] proposed, based on the Caffe toolkit [51], three different deep CNN architectures, VGG_F, VGG_M and VGG_S, each of which explores a different speed/accuracy trade-off: (1) VGG_F: this CNN architecture is almost similar to AlexNet, but VGG_F contains a smaller number of filters and a smaller stride in some convolutional layers.
(2) VGG_M: a medium-size CNN which is very similar to the one proposed by Zeiler et al. [52]. The first convolution layer has a smaller stride and pooling size, and the 4th convolution layer uses a smaller number of filters to balance the computational speed.
The main difference between the AlexNet and VGG_S models is that VGG_S has a smaller stride in some convolutional layers and a larger pooling size attached to the 1st and 5th convolutional layers. Here, fully-connected layers 6 and 7 are regularized using Dropout [62] and the last layer acts as a multi-way soft-max classifier. Recent works show that data augmentation also helps to improve classification performance [12]. The full architecture of our proposed model is shown in Figure 4.

Experimental Result
This section describes the experimental details and is divided into two sub-sections. The first sub-section discusses the implementation environment and the next one shows the results.

Implementation Environment
For this work, we set up our experimental environment by following a straightforward process.
We fine-tune the CaffeNet [51] model and use the Ubuntu 12.04 operating system. We use a high-speed GPU for this work because, when working with large datasets and complex deep neural networks, a CPU is nearly ten times slower than a GPU. We use an NVIDIA GeForce GTX 950 4GB GPU and an Intel Core i7 processor for faster training and testing.
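Fine-tuning of this kind is driven in Caffe by a solver definition. The fragment below is a hypothetical example of such a configuration; the paths and all numeric values (learning rate, step size, iteration counts) are illustrative assumptions, not the exact settings of our experiments:

```
net: "models/garments/train_val.prototxt"   # hypothetical path
test_iter: 100
test_interval: 500
base_lr: 0.001          # small learning rate, typical when fine-tuning
lr_policy: "step"
gamma: 0.1
stepsize: 5000
momentum: 0.9
weight_decay: 0.0005
max_iter: 20000
snapshot: 5000
snapshot_prefix: "snapshots/garments"       # hypothetical path
solver_mode: GPU
```

A small base learning rate is the usual choice when fine-tuning a pre-trained model such as CaffeNet, so that the transferred weights are adjusted rather than overwritten.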

Experimental Result and Discussion
We use two deep convolutional neural network models alongside our proposed model on the Fashion and Clothing Attribute datasets to compare their performance with some existing well-known hand-engineered feature extraction approaches for garments design classification. In this work, we use different numbers of training, validation and testing samples from these two datasets, as shown in Table 1. The training and testing accuracies of AlexNet, VGG_S and the proposed model for these two datasets are provided in Tables 2, 3, 4 and 5. These accuracies are calculated based on the training and validation samples per class used for each dataset. From Table 3 and Figure 7, we find that in most cases VGG_S performs better than the AlexNet model.
For the Clothing Attribute dataset, we achieve a maximum of 75.6% accuracy with AlexNet and 76.8% with VGG_S, but our proposed model achieves 77.8% accuracy. For the Fashion dataset with 5 classes, Table 5 and Figure 8 show that we achieve 81.8% accuracy with AlexNet and 82.9% with VGG_S, while our proposed model achieves 84.5% accuracy. From these tables, it can be concluded that using more training samples produces higher accuracy.
Table 6 shows the experimental results using seven different hand-engineered feature extraction methods (HOG, GIST, LGP, CENTRIST, tCENTRIST, cCENTRIST and NABP). For these methods, a Support Vector Machine (SVM) is used for classification. Using our proposed model, we achieve 77.8% accuracy on the Clothing Attribute dataset with 6 different classes and 84.5% accuracy on the Fashion dataset containing 5 texture design categories. When a database contains more generic properties for every class, a deep network can extract the generic features easily and accurately. As mentioned, we manually categorized the clothing products and used only a small number of classes; for this reason, the classes contain less generic properties most of the time. The additional FC layer helps our model to learn the features from these datasets more accurately.
We hope that this research will help future researchers to choose an appropriate deep learning model for garments texture design classification. In the future, we plan to improve the results by adopting more sophisticated strategies in this field.

Figure 1. Basic steps of our working procedure.
Figure 2. The full architecture of the AlexNet model.

Figure 3. The full architecture of the VGG_S model.

(3) VGG_S: this architecture is relatively slower than VGG_F and VGG_M and is a simplified version of the accurate model in the OverFeat framework [53], which has six convolutional layers. The VGG_S model takes the first five layers from the original model, has a smaller number of filters in the 5th layer, and has a larger pooling size in the 1st and 5th convolutional layers than VGG_M. As depicted in Figure 3, the VGG_S model contains five convolution layers, with a smaller number of filters in the 5th layer, and three fully-connected layers. There are another two models based on VGGNet, namely VGG-VD16 and VGG-VD19. We use the VGG_S model to classify the garments design classes in our work.

Figure 5 .
Figure 5. Some example images from the Fashion dataset: the rows represent the jeans, leather, print, single colour and stripe categories respectively.

Figure 6 .
Figure 6. Examples from the Clothing Attribute dataset: columns 1 to 6 show examples of Floral, Graphics, Plaid, Solid Color, Spotted and Striped garments respectively.

Figure 7 .
Figure 7. Comparison between AlexNet, VGG_S and our proposed model on the Clothing Attribute dataset.
Conclusion

In this paper, we have used deep CNN models to identify garments design classes and compared the results with several hand-engineered feature extraction methods. Using two different datasets, we show that a deep convolutional neural network with five convolutional layers and four fully-connected layers performs better than some existing deep convolutional models as well as several hand-engineered feature extraction methods. The convolutional and FC layers used in a deep CNN represent the features more elaborately, making them stronger than any of the hand-engineered feature extraction techniques.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 27 July 2016 doi:10.20944/preprints201607.0085.v1
Manfredi et al. [27] proposed a new approach for automatic garments segmentation and classification, classifying garments into nine different classes such as skirts, shirts, dresses, etc. For this work, the authors used a projection histogram to extract specific garments, dividing the whole image into 117 cells and grouping them into 3 × 3 blocks.

These descriptors are based on the Completed Local Binary Pattern (CLBP), Local Ternary Pattern (LTP) and CENsus Transform hiSTogram (CENTRIST). The authors applied the two descriptors to two different publicly available databases and achieved nearly 3% higher accuracy than the existing state-of-the-art methods. Above, we have discussed some supervised feature extraction methods applied to various garments design classification areas. In [49], the Apparel Classification with Style (ACS), Clothing Attribute (CA) and Colorful-Fashion (CF) datasets were used, yielding 50.2% and 74.5% accuracy for clothing style classification and the Clothing Attribute dataset respectively. Hu et al. [50] used deep convolutional neural networks for high-resolution remote sensing (HRRS) scene classification. For this work, they proposed two models for extracting CNN features from different layers; the authors also used a convolutional feature coding scheme to aggregate the dense convolutional features into a global representation.
The CENsus Transform hiSTogram (CENTRIST) [1] is very similar to LBP and mainly works as a visual descriptor for recognizing scene categories. CENTRIST uses a spatial representation based on the Spatial Pyramid Matching (SPM) scheme [48] to capture the global structure of images, which is a collection of orderless feature histograms computed over cells. CENTRIST uses 31 blocks in total to avoid artefacts; the blocks belong to 3 levels, with one block from level 0, five blocks from level 1 and 25 blocks from level 2. Dey et al. [8] proposed two new descriptors for garments design class identification, namely Completed CENTRIST (cCENTRIST) and Ternary CENTRIST (tCENTRIST).

AU-aware Deep Networks (AUDN) were proposed by constructing a deep architecture for facial expression recognition. To extract high-level features from each AU-aware receptive field (AURF), the authors used restricted Boltzmann machines (RBMs). This technique was applied to three expression databases, namely CK+, MMI and SFEW, and achieved better or at least competitive results. However, the method fails on several kinds of challenging images (e.g., subjects with high expression non-uniformity, moustaches, or accessories such as glasses). Krizhevsky et al. [12] proposed a new CNN architecture which achieved top-1 and top-5 error rates of 37.5% and 17.0% on the test data. However, an open issue remains: if a single convolutional layer is removed, the network's performance degrades. The authors did not use any unsupervised pre-training data, to simplify the work, but it could be more helpful if the computational power and the size of the network were increased. Dey et al. [8] used a deep learning model for texture-based garments design classification. In their experiment, using the Berkeley-trained model [18], they obtained 73.54% accuracy on the Clothing Attribute dataset. However, they claimed that accuracy might be improved by changing layers and other related issues. Zhou et al. [15] proposed a technique which extracts the difference between the density and diversity of image datasets; the authors used CNNs to learn deep features for scene recognition tasks. For this dataset, VGG S-16 models achieved 88.8% accuracy in top-5 val/test. However, there are some difficulties, such as variability in camera poses, decoration styles or the objects that appear in the scene. Lao et al. [49] used convolutional neural networks for fashion class identification, dividing their work into four parts: multiclass classification of clothing type; clothing attribute classification; clothing retrieval of nearest neighbours; and clothing object detection.

Table 1. Training and validation samples for the Clothing Attribute and Fashion datasets. For the Clothing Attribute dataset, we use different numbers of training and validation samples (10, 20 or 30 images per class), with the rest of the images used for testing.
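The per-class sampling described above can be sketched as follows. This is a hypothetical helper with toy file names; our experiments used the standard Caffe tooling rather than this exact code:

```python
import random

def split_per_class(images_by_class, n_train, n_val, seed=0):
    """Pick n_train training and n_val validation images per class;
    everything left over in that class becomes the test set."""
    rng = random.Random(seed)
    train, val, test = {}, {}, {}
    for label, images in images_by_class.items():
        shuffled = images[:]
        rng.shuffle(shuffled)
        train[label] = shuffled[:n_train]
        val[label] = shuffled[n_train:n_train + n_val]
        test[label] = shuffled[n_train + n_val:]
    return train, val, test

# Toy example: 40 "images" in one class, split 10 train / 10 val / 20 test.
data = {"stripe": [f"img_{i}.jpg" for i in range(40)]}
tr, va, te = split_per_class(data, 10, 10)
```

Shuffling before slicing avoids any ordering bias in the source folders, and fixing the seed keeps a given split reproducible across runs.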

Table 2. Recognition rate (%) in the training phase of the Clothing Attribute dataset (CAD).

Table 3. Recognition rate (%) in the testing phase of the Clothing Attribute dataset (CAD).

Table 4. Recognition rate (%) in the training phase of the Fashion dataset.

Table 5. Recognition rate (%) in the testing phase of the Fashion dataset.

Figure 8. Comparison between AlexNet, VGG_S and our proposed model on the Fashion dataset.

Table 6. Experimental results of different methods for the Clothing Attribute dataset.

Table 7. Experimental results of different methods for the Fashion dataset.
Table 6 also shows the results of three deep learning models (Berkeley, AlexNet and VGG_S) along with our proposed model for the Clothing Attribute dataset. From this table, it is clear that the performance of the deep learning models is better than any hand-engineered feature extraction method. From Table 7, we can see that our proposed method also performs better on the Fashion dataset, although AlexNet and VGG_S show slightly lower accuracy than tCENTRIST, cCENTRIST and NABP.