Face Detection: RetinaFace

RetinaFace face detection algorithm

Preface

I have been studying face detection algorithms recently and have tried out several face detection frameworks, so I am sharing what I learned here.

RetinaFace works much like an ordinary anchor-based object detector. A set of prior boxes (anchors) is laid out in advance so that they cover the entire image. The network then judges each prior box to decide whether it contains a face, gives it a confidence score, and adjusts its position.

Besides the face box, RetinaFace also predicts five facial key points for each face from its prior box: the two eyes, the tip of the nose, and the two corners of the mouth.

In short, implementing RetinaFace means laying out prior boxes over the image and using the network's predictions to decide whether each prior box contains a face and to adjust it into a predicted face box plus five facial key points.

Backbone feature extraction network

MobileNet and ResNet
The backbone network (for example MobileNetV1 or ResNet) extracts features step by step. Each downsampling stage compresses the height and width of the feature map while increasing its depth (the number of channels). RetinaFace takes three of these feature layers, at different scales, as its effective feature layers.
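To make the downsampling concrete, here is a toy sketch, assuming PyTorch. `down` and `ToyBackbone` are hypothetical names, plain stride-2 convolutions stand in for the real MobileNetV1 layers, and the channel widths 64/128/256 are an assumption modeled on a MobileNetV1-0.25 configuration. The point is only the shape pattern: each stage halves the height and width and grows the channels, and the three returned layers (strides 8, 16 and 32) play the role of the effective feature layers that the FPN code below consumes.

```python
from collections import OrderedDict

import torch
import torch.nn as nn


def down(in_ch: int, out_ch: int) -> nn.Sequential:
    # One downsampling stage: a stride-2 3x3 conv halves height and width
    # while increasing the number of channels.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )


class ToyBackbone(nn.Module):
    """Hypothetical stand-in for MobileNetV1: five stride-2 stages (stride 32 overall)."""

    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(down(3, 8), down(8, 16), down(16, 64))  # stride 8
        self.stage2 = down(64, 128)   # stride 16
        self.stage3 = down(128, 256)  # stride 32

    def forward(self, x):
        c1 = self.stem(x)
        c2 = self.stage2(c1)
        c3 = self.stage3(c2)
        # Three effective feature layers, largest (shallowest) first,
        # returned as a dict like the FPN's forward() expects
        return OrderedDict([("stage1", c1), ("stage2", c2), ("stage3", c3)])


feats = ToyBackbone()(torch.randn(1, 3, 640, 640))
for name, f in feats.items():
    print(name, tuple(f.shape))
# stage1 (1, 64, 80, 80)
# stage2 (1, 128, 40, 40)
# stage3 (1, 256, 20, 20)
```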

MobileNet

The MobileNet network was proposed by a Google team in 2017. It is a lightweight CNN designed for mobile and embedded devices, and it greatly reduces the number of parameters and the amount of computation at the cost of only a slight drop in accuracy.
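MobileNetV1 replaces most standard convolutions with depthwise separable blocks, and the FPN and SSH code later in this article calls the helpers `conv_bn`, `conv_bn_no_relu` and `conv_bn1X1` without defining them. The following is a minimal sketch of those helpers plus an illustrative MobileNet-style `conv_dw` block, assuming PyTorch. The signatures are inferred from how the article's code calls them; the exact layer choices (BatchNorm, LeakyReLU, no bias) are assumptions, not a copy of any specific repository.

```python
import torch
import torch.nn as nn


def conv_bn(inp, oup, stride=1, leaky=0):
    # 3x3 convolution + BatchNorm + LeakyReLU (padding 1 keeps the size when stride=1)
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.LeakyReLU(negative_slope=leaky, inplace=True),
    )


def conv_bn_no_relu(inp, oup, stride):
    # 3x3 convolution + BatchNorm, no activation (used at the end of SSH branches)
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
    )


def conv_bn1X1(inp, oup, stride, leaky=0):
    # 1x1 convolution + BatchNorm + LeakyReLU (used by FPN to align channel counts)
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, stride, padding=0, bias=False),
        nn.BatchNorm2d(oup),
        nn.LeakyReLU(negative_slope=leaky, inplace=True),
    )


def conv_dw(inp, oup, stride, leaky=0.1):
    # MobileNet-style block: 3x3 depthwise conv, then 1x1 pointwise conv
    return nn.Sequential(
        nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
        nn.BatchNorm2d(inp),
        nn.LeakyReLU(negative_slope=leaky, inplace=True),
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.LeakyReLU(negative_slope=leaky, inplace=True),
    )


x = torch.randn(1, 32, 80, 80)
print(tuple(conv_dw(32, 64, stride=2)(x).shape))  # (1, 64, 40, 40)
```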

Enhanced feature extraction network: FPN and SSH

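The FPN first applies 1 x 1 convolutions so that the three effective feature layers have the same number of channels. It then upsamples each deeper (smaller) layer, adds it to the next shallower (larger) effective feature layer, and integrates the sum with a 3 x 3 convolution, so the larger layers also receive information from the deeper ones.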

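The idea of SSH is very simple. It uses three parallel branches and stacks 3 x 3 convolutions to stand in for larger kernels: one branch is a single 3 x 3 convolution, one stacks two 3 x 3 convolutions in place of a 5 x 5 convolution, and one stacks three 3 x 3 convolutions in place of a 7 x 7 convolution. The outputs of the three branches are concatenated along the channel dimension.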
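A quick arithmetic check of the SSH idea, as a sketch: the helper names are hypothetical, and the count assumes stride 1, no bias, and the same number of channels C in and out of every layer (the real SSH branches use different channel counts). Stacked 3 x 3 convolutions reach the same receptive field as one larger kernel with fewer weights.

```python
def stacked_3x3(n: int, channels: int):
    """Receptive field and weight count of n stacked 3x3 convs (stride 1, C -> C, no bias)."""
    receptive_field = 1 + 2 * n          # each 3x3 layer grows the field by 2 pixels
    weights = n * 3 * 3 * channels * channels
    return receptive_field, weights


def single_kxk(k: int, channels: int):
    """Receptive field and weight count of one k x k conv (C -> C, no bias)."""
    return k, k * k * channels * channels


C = 64
for n, k in [(2, 5), (3, 7)]:
    print(n, "stacked 3x3:", stacked_3x3(n, C), "| one", f"{k}x{k}:", single_kxk(k, C))
# 2 stacked 3x3: (5, 73728) | one 5x5: (5, 102400)
# 3 stacked 3x3: (7, 110592) | one 7x7: (7, 200704)
```

Two 3 x 3 layers need 18C² weights against 25C² for a 5 x 5, and three need 27C² against 49C² for a 7 x 7; stacking also leaves room for an activation between layers.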

Retina Head

The network outputs feature grids of different sizes, which are used to detect faces of different sizes. By default each grid cell has 2 prior boxes. These prior boxes are used to detect targets, and the final bounding boxes are obtained by adjusting them.
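The layout of the prior boxes can be sketched as below. This minimal version assumes a 640 x 640 input, feature-map strides of 8, 16 and 32, and two square prior boxes per grid cell with sizes 16/32, 64/128 and 256/512 pixels; these values are assumptions modeled on a common MobileNet configuration, not something this article specifies. `prior_boxes` is a hypothetical name, and boxes are stored as normalized (cx, cy, w, h).

```python
import math
from itertools import product


def prior_boxes(image_h, image_w,
                steps=(8, 16, 32),
                min_sizes=((16, 32), (64, 128), (256, 512))):
    """Return a list of (cx, cy, w, h) prior boxes, normalized to [0, 1]."""
    anchors = []
    for step, sizes in zip(steps, min_sizes):
        fh, fw = math.ceil(image_h / step), math.ceil(image_w / step)  # grid size
        for i, j in product(range(fh), range(fw)):
            cx = (j + 0.5) * step / image_w   # center of grid cell (i, j)
            cy = (i + 0.5) * step / image_h
            for s in sizes:                   # 2 square prior boxes per cell
                anchors.append((cx, cy, s / image_w, s / image_h))
    return anchors


priors = prior_boxes(640, 640)
print(len(priors))   # 16800 = (80*80 + 40*40 + 20*20) * 2
print(priors[0])     # (0.00625, 0.00625, 0.025, 0.025)
```

Small prior boxes sit on the large, high-resolution grid and large prior boxes on the small grid, which is how feature layers of different sizes end up detecting faces of different sizes.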

Face classification determines whether each prior box contains a face. A 1 x 1 convolution adjusts the number of channels of the SSH output to num_anchors x 2, giving two values per prior box. It is worth explaining the 2: usually a single probability would represent how likely the prior box is to contain a face, but here two values are used, one score for "face" and one for "no face" (background). After a softmax, whichever of the two is larger decides whether the prior box contains a face.
Face box regression adjusts the center, width, and height of each prior box, which takes four parameters. A 1 x 1 convolution adjusts the number of channels of the SSH output to num_anchors x 4, representing the adjustment parameters of each prior box.
Facial landmark regression adjusts the prior box to obtain the facial key points. Each key point needs two adjustment parameters and there are five key points in total, so a 1 x 1 convolution adjusts the number of channels of the SSH output to num_anchors x 10 (num_anchors x 5 x 2), representing the adjustment of every facial key point for each prior box. Here 5 is the number of facial key points, and 2 is the x and y offset of each key point relative to the center of the prior box.
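The three heads described above can be sketched as 1 x 1 convolutions followed by a reshape, assuming PyTorch, a 64-channel SSH output and 2 prior boxes per grid cell. `Head` is a hypothetical class name. The reshape puts the values of each prior box in one row, in grid-row, grid-column, prior-box order, so they line up with the prior boxes. Which of the two classification columns means "face" is fixed by the labels used in training, so treat that mapping as something to check against your own weights.

```python
import torch
import torch.nn as nn

NUM_ANCHORS = 2  # prior boxes per grid cell


class Head(nn.Module):
    """1x1 conv mapping SSH features to num_anchors * values_per_anchor channels."""

    def __init__(self, in_channels, values_per_anchor, num_anchors=NUM_ANCHORS):
        super().__init__()
        self.values = values_per_anchor
        self.conv = nn.Conv2d(in_channels, num_anchors * values_per_anchor, kernel_size=1)

    def forward(self, x):
        out = self.conv(x)                              # (N, A*V, H, W)
        out = out.permute(0, 2, 3, 1).contiguous()      # (N, H, W, A*V)
        return out.view(out.shape[0], -1, self.values)  # (N, H*W*A, V)


class_head = Head(64, 2)       # 2 scores per prior box (face / background)
bbox_head = Head(64, 4)        # 4 adjustments: center x, center y, width, height
landmark_head = Head(64, 10)   # 5 key points x (x, y) offsets

ssh_out = torch.randn(1, 64, 80, 80)   # one SSH output for a 640x640 image
conf = torch.softmax(class_head(ssh_out), dim=-1)
print(conf.shape, bbox_head(ssh_out).shape, landmark_head(ssh_out).shape)
# torch.Size([1, 12800, 2]) torch.Size([1, 12800, 4]) torch.Size([1, 12800, 10])
```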

FPN

import torch.nn as nn
import torch.nn.functional as F

# conv_bn (3x3 conv + BN + LeakyReLU) and conv_bn1X1 (1x1 conv + BN + LeakyReLU)
# are small helper functions defined elsewhere in the project.

class FPN(nn.Module):
    def __init__(self, in_channels_list, out_channels):
        super(FPN, self).__init__()
        leaky = 0
        if out_channels <= 64:
            leaky = 0.1

        # Use 1x1 convolutions to adjust the channel count of the 3 effective
        # feature layers to out_channels (64 in the MobileNet configuration)
        self.output1 = conv_bn1X1(in_channels_list[0], out_channels, stride=1, leaky=leaky)
        self.output2 = conv_bn1X1(in_channels_list[1], out_channels, stride=1, leaky=leaky)
        self.output3 = conv_bn1X1(in_channels_list[2], out_channels, stride=1, leaky=leaky)

        # 3x3 convolutions that integrate the features after each merge
        self.merge1 = conv_bn(out_channels, out_channels, leaky=leaky)
        self.merge2 = conv_bn(out_channels, out_channels, leaky=leaky)

    def forward(self, input):
        # input: dict of the 3 effective feature layers, from largest to smallest
        input = list(input.values())

        # Adjust the channel count of each effective feature layer
        output1 = self.output1(input[0])
        output2 = self.output2(input[1])
        output3 = self.output3(input[2])

        # Upsample the smallest feature layer to the size of the middle one to get up3
        up3 = F.interpolate(output3, size=[output2.size(2), output2.size(3)], mode="nearest")
        # Add it to the middle effective feature layer
        output2 = output2 + up3
        # 3x3 convolution for feature integration
        output2 = self.merge2(output2)

        # Same again: upsample the merged middle layer and add it to the largest layer
        up2 = F.interpolate(output2, size=[output1.size(2), output1.size(3)], mode="nearest")
        output1 = output1 + up2
        output1 = self.merge1(output1)

        out = [output1, output2, output3]
        return out

SSH

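import torch
import torch.nn as nn
import torch.nn.functional as F

# conv_bn (3x3 conv + BN + LeakyReLU) and conv_bn_no_relu (3x3 conv + BN)
# are small helper functions defined elsewhere in the project.

class SSH(nn.Module):
    def __init__(self, in_channel, out_channel):
        super(SSH, self).__init__()
        assert out_channel % 4 == 0
        leaky = 0
        if out_channel <= 64:
            leaky = 0.1
        # Branch 1: a single 3x3 convolution producing half of the output channels
        self.conv3X3 = conv_bn_no_relu(in_channel, out_channel // 2, stride=1)

        # Branch 2: two stacked 3x3 convolutions in place of a 5x5 convolution
        self.conv5X5_1 = conv_bn(in_channel, out_channel // 4, stride=1, leaky=leaky)
        self.conv5X5_2 = conv_bn_no_relu(out_channel // 4, out_channel // 4, stride=1)
        # Branch 3: three stacked 3x3 convolutions in place of a 7x7 convolution
        # (the first of the three, conv5X5_1, is shared with branch 2)
        self.conv7X7_2 = conv_bn(out_channel // 4, out_channel // 4, stride=1, leaky=leaky)
        self.conv7x7_3 = conv_bn_no_relu(out_channel // 4, out_channel // 4, stride=1)

    def forward(self, input):
        conv3X3 = self.conv3X3(input)

        conv5X5_1 = self.conv5X5_1(input)
        conv5X5 = self.conv5X5_2(conv5X5_1)

        conv7X7_2 = self.conv7X7_2(conv5X5_1)
        conv7X7 = self.conv7x7_3(conv7X7_2)

        # Stack the three branches along the channel dimension:
        # out_channel/2 + out_channel/4 + out_channel/4 = out_channel
        out = torch.cat([conv3X3, conv5X5, conv7X7], dim=1)
        out = F.relu(out)
        return out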

Prior box adjustment
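Adjusting the prior boxes with the head outputs (the decoding step) can be sketched as below, assuming PyTorch and a single image. This follows the SSD-style encoding that many RetinaFace implementations use, which is an assumption here: the center is shifted by an offset scaled by the prior box size, the width and height are scaled exponentially, and each key point is an offset from the prior box center. The `variances` of 0.1 and 0.2 are a common default rather than values given in this article, and `decode` is a hypothetical name.

```python
import torch


def decode(loc, landm, priors, variances=(0.1, 0.2)):
    """Turn head outputs into boxes and key points (all normalized to [0, 1]).

    loc:    (P, 4)  predicted adjustments (dx, dy, dw, dh) per prior box
    landm:  (P, 10) predicted (dx, dy) offsets for 5 key points
    priors: (P, 4)  prior boxes as (cx, cy, w, h)
    """
    centers, sizes = priors[:, :2], priors[:, 2:]
    # Shift the prior center by an offset scaled by the prior size
    cxcy = centers + loc[:, :2] * variances[0] * sizes
    # Scale the prior width/height exponentially
    wh = sizes * torch.exp(loc[:, 2:] * variances[1])
    boxes = torch.cat([cxcy - wh / 2, cxcy + wh / 2], dim=1)  # (x1, y1, x2, y2)

    # Each key point is an offset from the prior center, scaled by the prior size
    offsets = landm.reshape(-1, 5, 2)
    points = centers[:, None, :] + offsets * variances[0] * sizes[:, None, :]
    return boxes, points.reshape(-1, 10)


priors = torch.tensor([[0.5, 0.5, 0.2, 0.2]])
loc = torch.zeros(1, 4)      # zero adjustment -> box equals the prior box
landm = torch.zeros(1, 10)   # zero offsets -> all 5 points at the prior center
boxes, points = decode(loc, landm, priors)
print(boxes)   # tensor([[0.4000, 0.4000, 0.6000, 0.6000]])
```

With all adjustments set to zero the predicted box is just the prior box, which makes a handy sanity check. Multiplying the normalized coordinates by the image width and height gives pixel coordinates; in a full detector the boxes would then be filtered by the face score and by non-maximum suppression.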

Depthwise separable convolution

The advantage of depthwise separable convolution is that it reduces the number of parameters and therefore the computational cost. It often appears in lightweight network structures suited to mobile or embedded devices. A depthwise separable convolution consists of a DW (depthwise) convolution followed by a PW (pointwise) convolution.

Here we explain how depthwise separable convolution reduces parameters by comparing it with ordinary convolution. In an ordinary convolution:

The number of channels of each convolution kernel equals the number of input channels.
The number of output channels equals the number of convolution kernels.

DW (Depthwise Conv)

Let's first look at the DW part. Here each convolution kernel has only 1 channel, and each kernel is applied to exactly one input channel. As a result, the number of output channels equals both the number of convolution kernels and the number of input channels.

To sum up briefly, there are two points:

The number of convolution kernel channels is 1
The number of input channels is equal to the number of convolution kernels, which is equal to the number of output channels.
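In PyTorch a DW convolution is an ordinary `nn.Conv2d` with `groups` set to the number of input channels. The short sketch below (channel count and input size chosen arbitrarily) checks both points from the summary: each kernel has one channel, and the output has as many channels as the input.

```python
import torch
import torch.nn as nn

C = 32
# groups=C gives each input channel its own single-channel 3x3 kernel
dw = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C, bias=False)

print(tuple(dw.weight.shape))   # (32, 1, 3, 3): 32 kernels, 1 channel each
x = torch.randn(1, C, 56, 56)
print(tuple(dw(x).shape))       # (1, 32, 56, 56): channel count unchanged
```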

PW (Pointwise Conv)

The PW convolution is similar to a normal convolution, except that its kernel size is 1 x 1: the depth of each kernel equals the number of input channels, and the number of kernels equals the number of output channels.
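Putting DW and PW together, the sketch below compares the parameter counts of a standard 3 x 3 convolution and a depthwise separable one with the same input and output channels, assuming PyTorch, no bias terms, and arbitrary example sizes (32 in, 64 out). `count_params` is a hypothetical helper name.

```python
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


c_in, c_out, k = 32, 64, 3

standard = nn.Conv2d(c_in, c_out, k, padding=1, bias=False)
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, k, padding=1, groups=c_in, bias=False),  # DW: one k x k kernel per input channel
    nn.Conv2d(c_in, c_out, 1, bias=False),                           # PW: c_out kernels of size 1 x 1 x c_in
)

print(count_params(standard))   # 18432 = 3*3*32*64
print(count_params(separable))  # 2336  = 3*3*32 + 32*64
```

In general the separable version needs K x K x C_in + C_in x C_out weights instead of K x K x C_in x C_out, a ratio of 1/C_out + 1/K²; for 3 x 3 kernels that is roughly an 8 to 9 times reduction once C_out is large.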
