Face Detection: Retina FaceNet

I have been studying face detection algorithms recently and have tried out several face detection frameworks along the way, so I would like to share what I learned here.

RetinaFace works much like an ordinary object detection algorithm: a set of prior boxes is pre-defined and distributed across the entire image. The network judges each prior box to decide whether it contains a face, adjusts its position, and assigns it a confidence score.

From these prior boxes RetinaFace obtains not only the face position but also five key points (landmarks) for each face.

In other words, the RetinaFace pipeline pre-sets prior boxes on the image; the network's predictions then decide whether each prior box contains a face and adjust it to produce the predicted box together with the five facial key points.
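As a minimal sketch of how such prior boxes can be laid out on the image, the snippet below generates two square anchors per grid cell on three feature-map levels. The strides and anchor sizes here are illustrative assumptions, not necessarily the exact settings of any specific implementation:

import itertools
import torch

def make_priors(image_size=(640, 640), strides=(8, 16, 32),
                anchor_sizes=((16, 32), (64, 128), (256, 512))):
    """Generate prior boxes (cx, cy, w, h), normalized to [0, 1].

    Two square anchors per grid cell, on three feature-map levels.
    The strides and sizes are assumptions used for illustration.
    """
    priors = []
    for stride, sizes in zip(strides, anchor_sizes):
        fm_h = image_size[0] // stride
        fm_w = image_size[1] // stride
        for i, j in itertools.product(range(fm_h), range(fm_w)):
            cx = (j + 0.5) * stride / image_size[1]
            cy = (i + 0.5) * stride / image_size[0]
            for s in sizes:
                priors.append([cx, cy, s / image_size[1], s / image_size[0]])
    return torch.tensor(priors)  # shape: (num_priors, 4)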

Backbone feature extraction network

  • MobileNet and ResNet can both serve as the backbone.
  • In the backbone network (for example MobileNetV1), feature extraction is performed repeatedly: each stage compresses the height and width while expanding the number of channels (downsampling), as in the toy sketch below.
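A toy illustration of this trade between spatial size and channel count (the shapes and channel counts are assumed, not the real backbone):

import torch
import torch.nn as nn

# A toy downsampling stage: a stride-2 convolution halves H and W
# while the channel count grows. Shapes here are illustrative only.
stage = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 32, 160, 160)   # (batch, channels, height, width)
print(stage(x).shape)              # torch.Size([1, 64, 80, 80])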

MobileNet

The MobileNet network was proposed by the Google team in 2017. It is a lightweight CNN aimed at mobile and embedded devices: it greatly reduces the number of model parameters and the computational cost with only a slight decrease in accuracy.

Enhanced feature extraction network: FPN and SSH

FPN takes the effective feature layers from the backbone, adjusts their channels to build feature maps for fusion, then upsamples each deeper layer and merges it with the effective feature layer above it.

The idea of SSH is simple: it uses three parallel branches, stacking 3 x 3 convolutions to approximate the receptive fields of 5 x 5 and 7 x 7 convolutions.


Retina head

The network outputs feature grids of different sizes for detecting targets of different scales. By default there are 2 prior boxes per grid cell; these prior boxes are adjusted by the network's predictions to obtain the target bounding boxes.

  • Face classification determines whether a prior box contains a face, i.e. whether the prior box contains the target. A 1 x 1 convolution adjusts the number of SSH output channels to num_anchors x 2, representing the probability that each prior box contains a face. The 2 deserves a short explanation: normally a single probability would suffice, but here two values are used per prior box; one is the score for "no face" and the other the score for "face", and a softmax over the two gives the face confidence.
  • Face box regression adjusts the center, width and height of the prior box, which requires four parameters per box. A 1 x 1 convolution adjusts the number of SSH output channels to num_anchors x 4, representing the adjustment parameters of each prior box.
  • Facial landmark regression adjusts the prior box to obtain the facial key points. Each key point needs two adjustment parameters and there are five key points per face. A 1 x 1 convolution adjusts the SSH output channels to num_anchors x 10 (num_anchors x 5 x 2), representing the key-point adjustments for each prior box: 5 is the number of key points and 2 is the x and y offset of each key point. A sketch of these three heads follows this list.
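As a minimal sketch of the three heads described above (1 x 1 convolutions applied to the SSH output), with inchannels=64 and num_anchors=2 assumed for illustration:

import torch
import torch.nn as nn

class ClassHead(nn.Module):
    # num_anchors x 2 channels: "no face" / "face" score per prior box
    def __init__(self, inchannels=64, num_anchors=2):
        super(ClassHead, self).__init__()
        self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 2, kernel_size=1)

    def forward(self, x):
        out = self.conv1x1(x).permute(0, 2, 3, 1).contiguous()
        return out.view(out.shape[0], -1, 2)

class BboxHead(nn.Module):
    # num_anchors x 4 channels: center and size offsets per prior box
    def __init__(self, inchannels=64, num_anchors=2):
        super(BboxHead, self).__init__()
        self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 4, kernel_size=1)

    def forward(self, x):
        out = self.conv1x1(x).permute(0, 2, 3, 1).contiguous()
        return out.view(out.shape[0], -1, 4)

class LandmarkHead(nn.Module):
    # num_anchors x 10 channels: (x, y) offsets for 5 key points per prior box
    def __init__(self, inchannels=64, num_anchors=2):
        super(LandmarkHead, self).__init__()
        self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 10, kernel_size=1)

    def forward(self, x):
        out = self.conv1x1(x).permute(0, 2, 3, 1).contiguous()
        return out.view(out.shape[0], -1, 10)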

FPN

import torch.nn as nn
import torch.nn.functional as F

# conv_bn and conv_bn1X1 are the conv + BatchNorm (+ LeakyReLU) helper blocks
# defined elsewhere in the repository.

class FPN(nn.Module):
    def __init__(self, in_channels_list, out_channels):
        super(FPN, self).__init__()
        leaky = 0
        if out_channels <= 64:
            leaky = 0.1

        # Use 1x1 convolutions to adjust the channels of the 3 effective
        # feature layers; the number of output channels is out_channels (64)
        self.output1 = conv_bn1X1(in_channels_list[0], out_channels, stride=1, leaky=leaky)
        self.output2 = conv_bn1X1(in_channels_list[1], out_channels, stride=1, leaky=leaky)
        self.output3 = conv_bn1X1(in_channels_list[2], out_channels, stride=1, leaky=leaky)

        self.merge1 = conv_bn(out_channels, out_channels, leaky=leaky)
        self.merge2 = conv_bn(out_channels, out_channels, leaky=leaky)

    def forward(self, input):
        # names = list(input.keys())
        input = list(input.values())

        # Adjust the channels of the three effective feature layers
        output1 = self.output1(input[0])
        output2 = self.output2(input[1])
        output3 = self.output3(input[2])

        # Upsample the smallest feature layer to get up3
        up3 = F.interpolate(output3, size=[output2.size(2), output2.size(3)], mode="nearest")
        # Add the upsampled layer to the intermediate effective feature layer
        output2 = output2 + up3
        # 3x3 convolution with out_channels channels for feature integration
        output2 = self.merge2(output2)

        # Repeat the same steps for the largest feature layer
        up2 = F.interpolate(output2, size=[output1.size(2), output1.size(3)], mode="nearest")
        output1 = output1 + up2
        output1 = self.merge1(output1)

        out = [output1, output2, output3]
        return out
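As a quick usage sketch, assuming the conv_bn / conv_bn1X1 helper blocks from the repository are available, and assuming a MobileNetV1 backbone that returns 64, 128 and 256 channels at strides 8, 16 and 32 for a 640 x 640 input (these channel counts are an assumption for illustration):

import torch
from collections import OrderedDict

# Hypothetical backbone feature maps for a 640 x 640 input
feats = OrderedDict([
    ("stage1", torch.randn(1, 64, 80, 80)),
    ("stage2", torch.randn(1, 128, 40, 40)),
    ("stage3", torch.randn(1, 256, 20, 20)),
])

fpn = FPN(in_channels_list=[64, 128, 256], out_channels=64)
out1, out2, out3 = fpn(feats)
print(out1.shape, out2.shape, out3.shape)
# torch.Size([1, 64, 80, 80]) torch.Size([1, 64, 40, 40]) torch.Size([1, 64, 20, 20])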

SSH

import torch
import torch.nn as nn
import torch.nn.functional as F

# conv_bn and conv_bn_no_relu are the conv + BatchNorm (+ LeakyReLU) helper
# blocks defined elsewhere in the repository.

class SSH(nn.Module):
    def __init__(self, in_channel, out_channel):
        super(SSH, self).__init__()
        assert out_channel % 4 == 0
        leaky = 0
        if out_channel <= 64:
            leaky = 0.1
        # 3x3 branch
        self.conv3X3 = conv_bn_no_relu(in_channel, out_channel // 2, stride=1)

        # Use 2 stacked 3x3 convolutions instead of a 5x5 convolution
        self.conv5X5_1 = conv_bn(in_channel, out_channel // 4, stride=1, leaky=leaky)
        self.conv5X5_2 = conv_bn_no_relu(out_channel // 4, out_channel // 4, stride=1)
        # Use 3 stacked 3x3 convolutions instead of a 7x7 convolution
        # (the first 3x3 is shared with the 5x5 branch)
        self.conv7X7_2 = conv_bn(out_channel // 4, out_channel // 4, stride=1, leaky=leaky)
        self.conv7x7_3 = conv_bn_no_relu(out_channel // 4, out_channel // 4, stride=1)

    def forward(self, input):
        conv3X3 = self.conv3X3(input)

        conv5X5_1 = self.conv5X5_1(input)
        conv5X5 = self.conv5X5_2(conv5X5_1)

        conv7X7_2 = self.conv7X7_2(conv5X5_1)
        conv7X7 = self.conv7x7_3(conv7X7_2)

        # Stack the three branches along the channel dimension
        out = torch.cat([conv3X3, conv5X5, conv7X7], dim=1)
        out = F.relu(out)
        return out

Prior box adjustment
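The four regression parameters predicted by the head shift the prior box's center and scale its width and height. Below is a minimal sketch of the usual SSD-style decoding; the variance values (0.1, 0.2) are an assumption taken from common RetinaFace implementations:

import torch

def decode(loc, priors, variances=(0.1, 0.2)):
    """Apply predicted offsets (dx, dy, dw, dh) to prior boxes (cx, cy, w, h).

    Returns boxes as (x1, y1, x2, y2). The variance values are an assumption
    taken from common RetinaFace implementations.
    """
    centers = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
    sizes = priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])
    boxes = torch.cat([centers - sizes / 2, centers + sizes / 2], dim=1)
    return boxes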

Depthwise separable convolution

The advantage of depthwise separable convolution is that it reduces the number of parameters and therefore the computational cost. It often appears in lightweight network structures designed for mobile or embedded devices. A depthwise separable convolution is composed of a DW (depthwise) convolution followed by a PW (pointwise) convolution.


Here we explain how depthwise separable convolution reduces parameters by comparing it with an ordinary convolution; a concrete parameter count follows the list below. For an ordinary convolution:

  • each convolution kernel has as many channels as the input has channels;
  • the number of output channels equals the number of convolution kernels.
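As a concrete count (the 3 x 3 kernel and the 32 -> 64 channel change are assumed numbers, used only for illustration):

# Ordinary convolution: every kernel spans all input channels.
k, c_in, c_out = 3, 32, 64
ordinary_params = k * k * c_in * c_out   # 3 * 3 * 32 * 64 = 18432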

DW (Depthwise Conv)

Let's look at the DW part first. Here each convolution kernel has a single channel, and each kernel is applied to exactly one input channel. As a result, the number of output channels equals both the number of kernels and the number of input channels.

To sum up briefly, there are two points:

  • each convolution kernel has 1 channel;
  • the number of input channels, the number of kernels, and the number of output channels are all equal.

PW (Pointwise Conv)

The PW convolution is just an ordinary convolution with a 1 x 1 kernel: the kernel depth equals the number of input channels, and the number of kernels equals the number of output channels.
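Putting DW and PW together, here is a minimal sketch using the same assumed numbers as above (32 -> 64 channels, 3 x 3 kernel):

import torch.nn as nn

c_in, c_out, k = 32, 64, 3

depthwise_separable = nn.Sequential(
    # DW: groups=c_in gives each kernel a single channel, one per input channel
    nn.Conv2d(c_in, c_in, kernel_size=k, padding=1, groups=c_in, bias=False),
    # PW: 1x1 convolution mixes channels and sets the output channel count
    nn.Conv2d(c_in, c_out, kernel_size=1, bias=False),
)

dw_pw_params = k * k * c_in + c_in * c_out   # 288 + 2048 = 2336
# versus 18432 for the ordinary convolution above: roughly an 8x reduction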
