CNN Structure: Backbone (Feature Detection) Neck (Scaling) Head (Classification and Localization)

  • In MobileNetV2-YOLOv3, MobileNet is the backbone, some neck, and YOLOv3 is the head.
  • In YOLOv3, DarkNet is the backbone, FPN is the neck and YOLOv3 is the head.
  • In YOLOv4, CSPDarkNet53 (DarkNet53 + CSPNet strategy) is the backbone
  • Later YOLO examples: Depends, like, ResNet or VGG for backbone, PAN for Neck, YOLO for head.

Base Networks / Backbones are trained on datasets like ImageNet. Detection Networks:

  • One-Stage: SSD, RetinaNet, YOLO, etc.
  • Two-Stage: Faster-RNN, Masked R-CNN, etc.

Other Models:

  • RTMDet (Real Time Object Detector)
  • RT-DETR (Real Time Detection Transformer)
  • YOLO Distributions
    • Ultralytics YOLO (most popular, AGPLv3 / commercial dual-licensed)
    • YOLO-NAS (Dect-AI’s open code / non-commercial pre-trained weights build of YOLO)
    • YOLOX (Free, YOLOv3 with a DarkNet53 backbone)
    • DarkNet YOLO (original, still maintained)
    • MultimediaTechLab/YOLO (MIT licensed YOLOv7, YOLOv9 and YOLO-RD)
  • YOLO derivatives
    • WALDO 3.0