Keypoint-based detectors¶

CornerNet (ECCV 2018)¶

CornerNet: Detecting Objects as Paired Keypoints
Detects each object as a pair of keypoints: the top-left and bottom-right corners of its bounding box.
Uses an hourglass network as the backbone, followed by 2 prediction modules (one for top-left corners, one for bottom-right corners).
Each corner prediction module outputs a corner heatmap, embeddings (for matching the 2 corners of an object) and offsets (to recover the original resolution).

Contribution:

formulate the task of object detection as a task of detecting and grouping corners with embeddings
the corner pooling layers that help better localize the corners
significantly modify the hourglass architecture and add our novel variant of focal loss (Lin et al., 2017) to help better train the network

Corner Loss¶

Focal loss + a reduced penalty within a radius of each positive location

\[\begin{split}L_{det} = -\dfrac{1}{N} \sum_{c=1}^C\sum_{i=1}^H\sum_{j=1}^W \begin{cases} (1-p_{cij})^\alpha \log(p_{cij}), & \text{if } y_{cij}=1\\ (1-y_{cij})^\beta (p_{cij})^\alpha \log(1-p_{cij}), & \text{otherwise} \end{cases}\end{split}\]

where N is the number of objects in an image, \(p_{cij}\) is the predicted score at location (i, j) for class c, and \(y_{cij}\) is the ground-truth heatmap. α and β are hyper-parameters that control the contribution of each point (the paper sets α = 2 and β = 4 in all experiments); α plays the same role as \(\gamma\) in focal loss. With the Gaussian bumps encoded in \(y_{cij}\), the \((1-y_{cij})^\beta\) term reduces the penalty around the ground-truth locations.
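As a rough illustration of the loss above, here is a minimal pure-Python sketch. The function name `corner_focal_loss`, the nested-list layout (C x H x W) and the `eps` clamp are assumptions made for this example, and N is approximated by the number of locations where \(y_{cij}=1\); it is not the authors' implementation.

```python
import math


def corner_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss over C x H x W heatmaps given as nested lists.

    pred   -- predicted scores p_cij in (0, 1)
    target -- ground truth y_cij in [0, 1]: exactly 1.0 at corner locations,
              Gaussian-decayed values around them, 0.0 elsewhere
    """
    total = 0.0
    num_pos = 0
    for pred_c, target_c in zip(pred, target):
        for pred_row, target_row in zip(pred_c, target_c):
            for p, y in zip(pred_row, target_row):
                p = min(max(p, eps), 1.0 - eps)  # keep log() finite (assumption)
                if y == 1.0:
                    # positive location: standard focal term
                    total += (1.0 - p) ** alpha * math.log(p)
                    num_pos += 1
                else:
                    # negative location: (1 - y)^beta shrinks the penalty near a corner
                    total += (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    # Assumption: N (number of objects) is taken as the number of positive locations.
    return -total / max(num_pos, 1)
```

With this sketch, a confident false positive next to a ground-truth corner (where y is close to 1) contributes far less to the loss than the same false positive far away from any corner.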

Corner pooling Layer¶

CornerNet is a one-stage detector but runs at only ~4 FPS (even slower than two-stage detectors?)
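For reference, top-left corner pooling can be sketched as follows: for every location, take the maximum of one feature map over everything below it, the maximum of a second feature map over everything to its right, and add the two. The function name, the argument names `feat_v` and `feat_h`, and the nested-list layout are assumptions for this illustration; the actual layer operates on multi-channel tensors.

```python
def top_left_corner_pool(feat_v, feat_h):
    """Top-left corner pooling on two H x W maps given as nested lists.

    feat_v is max-pooled vertically (from each location down to the bottom edge),
    feat_h is max-pooled horizontally (from each location to the right edge).
    """
    h, w = len(feat_v), len(feat_v[0])
    down = [row[:] for row in feat_v]
    right = [row[:] for row in feat_h]
    # running maximum from the bottom row upward
    for i in range(h - 2, -1, -1):
        for j in range(w):
            down[i][j] = max(down[i][j], down[i + 1][j])
    # running maximum from the rightmost column leftward
    for i in range(h):
        for j in range(w - 2, -1, -1):
            right[i][j] = max(right[i][j], right[i][j + 1])
    # the two pooled maps are added element-wise
    return [[down[i][j] + right[i][j] for j in range(w)] for i in range(h)]
```

Bottom-right corner pooling would be the mirror image, scanning toward the top and left edges instead.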

Backbone: Hourglass¶

The choice of backbone matters for a keypoint estimation network: in the paper's ablation, using the hourglass backbone increases AP by 8.2.

CornerNet-Lite¶

Real-time frame rate with higher AP than YOLO

CenterNet: Keypoint Triplets for Object Detection (ICCV 2019)¶

CenterNet: Keypoint Triplets for Object Detection
CAS, Oxford and Huawei Noah's Ark Lab propose CenterNet: a one-stage detector reaching 47 AP, now open-sourced!
Triplets: top-left corner + bottom-right corner + center
Reduces incorrect bounding boxes by checking, for each candidate box, whether a predicted center keypoint of the same class falls within the box's central region
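The central-region check described above can be sketched as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, the central region is taken to be the middle 1/n of the box in each dimension, and `n` is left as a free parameter.

```python
def central_region(box, n=3):
    """Return the middle 1/n portion of a box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    cx1 = ((n + 1) * x1 + (n - 1) * x2) / (2 * n)
    cy1 = ((n + 1) * y1 + (n - 1) * y2) / (2 * n)
    cx2 = ((n - 1) * x1 + (n + 1) * x2) / (2 * n)
    cy2 = ((n - 1) * y1 + (n + 1) * y2) / (2 * n)
    return cx1, cy1, cx2, cy2


def filter_boxes(boxes, centers, n=3):
    """Keep boxes whose central region contains a center keypoint of the same class.

    boxes   -- list of (x1, y1, x2, y2, cls) built from corner pairs
    centers -- list of (x, y, cls) predicted center keypoints
    """
    kept = []
    for x1, y1, x2, y2, cls in boxes:
        cx1, cy1, cx2, cy2 = central_region((x1, y1, x2, y2), n)
        if any(c == cls and cx1 <= x <= cx2 and cy1 <= y <= cy2
               for x, y, c in centers):
            kept.append((x1, y1, x2, y2, cls))
    return kept
```

A larger `n` gives a smaller central region, which makes the check stricter.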

ExtremeNet (CVPR 2019)¶

Bottom-up Object Detection by Grouping Extreme and Center Points
based on CornerNet
Predicts 5 heatmaps (top, left, bottom, right, center) + 4 offset maps (top, left, bottom, right)
No embeddings; extreme points are grouped by brute-force center grouping
code: xingyizhou/ExtremeNet (PyTorch v0.4.1), developed on top of CornerNet and fine-tuned from a pre-trained CornerNet
Disadvantage: with single-scale testing, AP is lower than CornerNet's for larger objects, probably because the center response map is not accurate enough to perform well on large objects.
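Brute-force center grouping can be sketched as follows: enumerate every combination of top, left, bottom and right extreme points, compute the geometric center of the resulting box, and keep the box only if the center heatmap responds at that location. The function name, the `(x, y, score)` tuple layout, integer coordinates, the `threshold` default and the averaged score are assumptions for this illustration.

```python
from itertools import product


def center_grouping(tops, lefts, bottoms, rights, center_heat, threshold=0.1):
    """Group extreme points of one class into boxes.

    Each point is (x, y, score) with integer coordinates; center_heat[y][x] is
    the center heatmap for the same class. Image y grows downward.
    """
    boxes = []
    for t, l, b, r in product(tops, lefts, bottoms, rights):
        # geometric sanity check: top above bottom, left to the left of right
        if t[1] > b[1] or l[0] > r[0]:
            continue
        cx = (l[0] + r[0]) // 2
        cy = (t[1] + b[1]) // 2
        center_score = center_heat[cy][cx]
        if center_score >= threshold:
            # assumption: box score is the mean of the five keypoint scores
            score = (t[2] + l[2] + b[2] + r[2] + center_score) / 5
            boxes.append((l[0], t[1], r[0], b[1], score))
    return boxes
```

The enumeration is O(n^4) in the number of extreme points per class, which is why only a limited number of peaks per heatmap would be considered in practice.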

CenterNet: Objects as Points (2019)¶

Objects as Points, by the same author as ExtremeNet
It is NOT CenterNet: Keypoint Triplets for Object Detection
code: xingyizhou/CenterNet (PyTorch)
Output: heatmap of center points (one channel per class) + width and height at each pixel location (2 channels) + offset (2 channels)

From points to bounding boxes (Inference)¶

Get the network outputs: keypoint heatmap \(\hat{Y}\) (one channel per class), offset \(O\) (2 channels: x, y) and size \(S\) (2 channels: width, height)
Extract the peaks in the heatmap for each category independently:
1. detect all responses whose value is greater than or equal to that of their 8-connected neighbors
2. keep the top n peaks \(\hat{P}_c\)
For each keypoint in \(\hat{P}\), get its 2D location (i, j)
Get the corresponding \(O_{i,j}\) and \(S_{i,j}\)
Produce the bounding box centered at the offset-corrected location with the predicted size
(Optional) Post-process all boxes with NMS.
Inference speed with flip testing: 28 FPS with the DLA-34 backbone (39.2 AP) and 7.8 FPS with Hourglass-104 (42.2 AP); the 45.1 AP result uses multi-scale testing at 1.4 FPS.
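The decoding steps above can be sketched in plain Python as follows. This is a simplified illustration rather than the released code: the function name, the nested-list layout, the convention that i indexes rows (y) and j indexes columns (x), and the `min_score` filter are all assumptions. Boxes are returned in heatmap coordinates; multiplying by the output stride would map them back to the input image.

```python
def decode_centers(heatmap, offset, size, top_n=100, min_score=0.0):
    """Turn center-point outputs into boxes (x1, y1, x2, y2, score, cls).

    heatmap -- C x H x W scores, one channel per class
    offset  -- 2 x H x W sub-pixel offsets (x, y)
    size    -- 2 x H x W box sizes (width, height)
    """
    boxes = []
    for cls, hm in enumerate(heatmap):
        h, w = len(hm), len(hm[0])
        peaks = []
        for i in range(h):
            for j in range(w):
                v = hm[i][j]
                if v <= min_score:  # hypothetical filter, not part of the steps above
                    continue
                neighbours = [hm[a][b]
                              for a in range(max(i - 1, 0), min(i + 2, h))
                              for b in range(max(j - 1, 0), min(j + 2, w))
                              if (a, b) != (i, j)]
                # step 1: keep responses >= all 8-connected neighbours
                if all(v >= n for n in neighbours):
                    peaks.append((v, i, j))
        # step 2: keep the top n peaks of this class
        peaks.sort(reverse=True)
        for score, i, j in peaks[:top_n]:
            cx = j + offset[0][i][j]
            cy = i + offset[1][i][j]
            bw, bh = size[0][i][j], size[1][i][j]
            boxes.append((cx - bw / 2, cy - bh / 2,
                          cx + bw / 2, cy + bh / 2, score, cls))
    return boxes
```

Because the local-maximum test already suppresses neighbouring responses, this decoding does not rely on IoU-based NMS to remove duplicates.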

Other applications¶

3D detection, Human pose estimation

Backbone & Performance¶

Object Detection on COCO validation¶

Backbone	AP / FPS	Flip AP / FPS	Multi-scale AP / FPS
Hourglass-104	40.3 / 14	42.2 / 7.8	45.1 / 1.4
DLA-34	37.4 / 52	39.2 / 28	41.7 / 4
ResNet-101	34.6 / 45	36.2 / 25	39.3 / 4
ResNet-18	28.1 / 142	30.0 / 71	33.2 / 12

The Hourglass-104 backbone is initialized from the pre-trained ExtremeNet model.

Keypoint detection on COCO validation¶

Backbone	AP	FPS
Hourglass-104	64.0	6.6
DLA-34	58.9	23

Center point collision¶

CenterNet is unable to predict <0.1% of objects because their center points collide with those of other objects. This is lower than the fraction of objects lost by anchor-based detectors.
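A collision count of this kind could be estimated with a sketch like the one below, which maps each object's box center to an output cell and counts objects that share a cell. The function name, the default `stride=4`, and counting collisions only within the same class are assumptions for this illustration.

```python
from collections import Counter


def count_center_collisions(objects, stride=4):
    """Count objects that cannot be predicted because of center collisions.

    objects -- list of (x1, y1, x2, y2, cls) ground-truth boxes in input pixels
    stride  -- assumed downsampling factor between input image and heatmap
    """
    cells = Counter()
    for x1, y1, x2, y2, cls in objects:
        cx = int((x1 + x2) / 2 / stride)
        cy = int((y1 + y2) / 2 / stride)
        # assumption: only objects of the same class compete for a cell
        cells[(cls, cx, cy)] += 1
    # every cell can represent one object; the rest are lost
    return sum(n - 1 for n in cells.values())
```

Dividing the returned count by the total number of objects gives the fraction that cannot be predicted.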

Remark¶

According to issue 269 (Comparing with ExtremeNet and CornerNet), this paper was rejected because it is not better than ExtremeNet across the board. However, this model does not require grouping keypoints and is therefore faster.

TTFNet (AAAI 2020)¶

Training-Time-Friendly Network for Real-Time Object Detection
based on CenterNet: Objects as Points

Uses Gaussian kernels to encode training samples for center localization and size regression. Encoding more training samples plays a role similar to increasing the batch size, which allows a larger learning rate (Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour) and accelerates training. (It predicts \((w_l, h_t, w_r, h_b)\) instead of the size, since the training samples for size regression are not only the center points.)
Initiative sample weights for better information utilization
Result: reduced training time, while accuracy and inference time remain comparable to CenterNet
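To see why four distances are needed once samples other than the center are used, consider the sketch below: a location anywhere inside the object can describe the same box through \((w_l, h_t, w_r, h_b)\), whereas a (width, height) pair would only place the box correctly when the location is the exact center. The function name is hypothetical and any scaling applied to the raw predictions is ignored.

```python
def decode_ltrb(x, y, w_l, h_t, w_r, h_b):
    """Box (x1, y1, x2, y2) from a location and its distances to the four sides."""
    return (x - w_l, y - h_t, x + w_r, y + h_b)


# Two different sample locations inside the same object regress the same box.
box_a = decode_ltrb(10, 10, 4, 3, 6, 7)
box_b = decode_ltrb(12, 9, 6, 2, 4, 8)
```

Here both `box_a` and `box_b` describe the box (6, 7, 16, 17), even though neither sample location is its center at (11, 12).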