Written by Long Yung, Software Engineer
Multi-label, or open-class, classification has long been a difficult problem in machine learning. Now, with the rise of neural networks, do these problems become even more complicated and harder to solve in deep learning?
In this article, I would like to share some of my ideas for dealing with the multi-label problem with deep learning, as applied in a product that recognizes textiles and leathers with different patterns and materials.
Convolutional Neural Network
There is no doubt that the most common neural network used in computer vision is the convolutional neural network (CNN). Its most important feature is that its filters can be visualized and interpreted, which lets us see what the network has learnt during training, e.g. Deep Dream. This greatly helps when classifying different kinds of objects. Facial recognition, object detection and other vision-related tasks benefit from it, and in many of these fields performance is close to perfect. However, there is a limitation: such a classifier tends to recognize only one feature at a time. For example, suppose we had ideally trained a network, with a sigmoid function as the last layer, to recognize leather and leopard pattern with 100% accuracy. Would it be able to give us 50/50 as the result when we input leopard-pattern leather?
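To make the question concrete, here is a minimal NumPy sketch of how the two output functions behave. The logits are made-up values for the two classes [leather, leopard pattern], not the output of any trained network. It only shows a property of the functions themselves: softmax outputs always sum to 1, so "both strongly present" and "no evidence for either" both come out as 50/50, while sigmoid scores each class independently.

```python
import numpy as np

def softmax(z):
    """Softmax over a 1-D array of logits (shifted for numerical stability)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    """Element-wise logistic sigmoid."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logits for the classes [leather, leopard pattern].
logits_both = np.array([4.0, 4.0])     # strong evidence for both classes
logits_neither = np.array([0.0, 0.0])  # no evidence for either class

softmax_both = softmax(logits_both)
softmax_neither = softmax(logits_neither)
sigmoid_both = sigmoid(logits_both)
sigmoid_neither = sigmoid(logits_neither)

print("softmax, both present:   ", softmax_both)
print("softmax, neither present:", softmax_neither)
print("sigmoid, both present:   ", sigmoid_both)
print("sigmoid, neither present:", sigmoid_neither)
```

Whether a real network would actually produce such clean logits for a combination it has never seen is a separate question, and that is the concern raised above.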
The Embedding Layer
The previous case shows that we are stuck in a dilemma. We rely on the sigmoid or softmax function to build the “deep” classifier, yet these functions are the bottleneck for a multi-label classifier. To solve the problem, we give up on adjusting the threshold value of softmax or sigmoid to fit the result. Instead, we move our focus to the layer just before it, the fully connected (dense) layer, which is sometimes called the embedding layer.
The simplest CNN design is a sequence of an input layer, convolution layers, a fully connected layer and then the softmax/sigmoid layer. In the past, the fully connected layer was treated as a “hidden” layer, because the meaning of its weights is hard to interpret once the layer has more than one neuron. However, with the help of t-SNE, we can see that the high-dimensional information from this hidden layer is quite useful, especially for the case mentioned above.
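The layer sequence described above can be sketched as a toy forward pass in NumPy. Everything here is an assumption for illustration: the weights are random and untrained, and the sizes (an 8x8 single-channel image, one 3x3 filter, a 16-dimensional dense layer, 2 classes) are arbitrary. The point is only that the same pass gives us two outputs, the class probabilities and the dense-layer activations that we treat as the embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(image, kernel):
    """Naive single-channel 'valid' cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical, untrained weights (random values, for illustration only).
kernel = rng.normal(size=(3, 3))     # one 3x3 convolution filter
W_embed = rng.normal(size=(36, 16))  # flattened 6x6 feature map -> 16-d dense layer
W_out = rng.normal(size=(16, 2))     # dense layer -> 2 class scores

def forward(image):
    """Return (embedding, class probabilities) for one 8x8 image."""
    features = relu(conv2d_valid(image, kernel)).ravel()
    embedding = relu(features @ W_embed)  # the "hidden" dense layer
    probs = softmax(embedding @ W_out)    # the final softmax layer
    return embedding, probs

image = rng.random((8, 8))
embedding, probs = forward(image)
print("embedding shape:", embedding.shape)
print("class probabilities:", probs)
```

In a real framework, the equivalent step would be to read the activations of the last dense layer of the trained model instead of its final output.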
In fact, the output of this layer for a single input is meaningless on its own. But when we compare it with the outputs for other inputs, it becomes valuable. The distance between two points indicates how similar the two inputs are: the closer they are to each other, the more similar they are. By making use of this property, we can deal with the multi-label problem more easily.
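As a rough illustration of the distance idea, the sketch below ranks a few reference embeddings by their Euclidean distance to a query. The 3-dimensional vectors and their labels are invented for this example; real embeddings would come from the dense layer of a trained network and have many more dimensions. Euclidean distance is also just one possible choice of metric.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def rank_by_distance(query, references):
    """Return reference names sorted by distance to the query, closest first."""
    return sorted(references,
                  key=lambda name: euclidean_distance(query, references[name]))

# Hypothetical embeddings (made-up values, not from a real model).
references = {
    "plain leather": np.array([1.0, 0.0, 0.0]),
    "leopard fabric": np.array([0.0, 1.0, 0.0]),
    "striped cotton": np.array([0.0, 0.0, 1.0]),
}
query = np.array([0.7, 0.7, 0.0])  # stands in for a leopard-pattern leather sample

ranking = rank_by_distance(query, references)
for name in ranking:
    print(name, round(euclidean_distance(query, references[name]), 3))
```

Here the query sits close to both the leather and the leopard references and far from the unrelated one, so both labels can be read off from its neighbours without forcing a single class out of the final layer.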