SSD: Single Shot MultiBox Detector

Why Is SSD Faster Than YOLO v1 and More Accurate than Faster R-CNN?

9 min read · Jul 31, 2022


In 2015, Wei Liu et al. published a paper on SSD (Single Shot MultiBox Detector). It was faster than YOLO v1 (the first version) and more accurate than Faster R-CNN. This article explains what made SSD so good.



The first author, Wei Liu, was a student at the University of North Carolina at Chapel Hill when the paper was published. Later, he joined Nuro and eventually became the head of machine learning research there. The second author, Dragomir Anguelov, had been a senior staff engineer at Google and was a senior director of perception at Zoox at the time of the paper. Later, he joined Waymo and became the head of research. Interestingly, the founders of Nuro were ex-Waymo engineers. Those were the days when self-driving companies were springing up everywhere.

It's not hard to imagine they were interested in building a fast and accurate perception system for autonomous vehicles. The paper points out that region-based detectors were too slow for real-time applications:

While accurate, these approaches have been too computationally intensive for embedded systems and, even with high-end hardware, too slow for real-time applications.

Source: paper

At that time, the most accurate detector was Faster R-CNN. YOLO v1 was faster but not as accurate. So, they wanted an object detection model that was both more accurate than Faster R-CNN and faster than YOLO v1.

SSD vs. YOLO v1 vs. Faster R-CNN

At the time, increasing speed usually came at the cost of detection accuracy. For example, YOLO v1 was faster than Faster R-CNN because it had no separate region proposal step and no subsequent feature resampling stage. So, a single-stage detector (aka single-shot detector) was the way to go for speed. However, YOLO v1 was not as accurate as Faster R-CNN. It seemed region-based detectors had the edge when accuracy mattered most.
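To make the structural difference concrete, here is a toy Python sketch of the two control flows. It is only an illustration under loud assumptions: every function name (`two_stage_detect`, `single_stage_detect`, and the `toy_*` stand-ins) is hypothetical, the "feature map" is a 4x4 list of numbers, and nothing here comes from the SSD, YOLO, or Faster R-CNN code. It shows how many times each step runs, not real speed or accuracy.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2), hypothetical grid coordinates
Detection = Tuple[Box, str]       # (box, class label)

def two_stage_detect(features, propose, resample, classify) -> List[Detection]:
    """Region-based flow: propose regions, then resample and classify each one."""
    detections = []
    for box in propose(features):            # stage 1: region proposals
        region = resample(features, box)     # per-region feature resampling
        detections.append((box, classify(region)))  # stage 2: per-region classifier
    return detections

def single_stage_detect(features, predict_dense) -> List[Detection]:
    """Single-shot flow: one dense pass yields boxes and classes together."""
    return predict_dense(features)

# Toy stand-ins (all hypothetical) that count how often each step runs.
calls = {"resample": 0, "classify": 0, "dense": 0}

def toy_propose(features) -> List[Box]:
    return [(0, 0, 2, 2), (1, 1, 3, 3), (2, 2, 4, 4)]

def toy_resample(features, box):
    calls["resample"] += 1
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in features[y1:y2]]

def toy_classify(region) -> str:
    calls["classify"] += 1
    return "object" if sum(map(sum, region)) > 0 else "background"

def toy_predict_dense(features) -> List[Detection]:
    calls["dense"] += 1
    return [((x, y, x + 1, y + 1), "object")
            for y, row in enumerate(features)
            for x, value in enumerate(row) if value > 0]

features = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]

two_stage = two_stage_detect(features, toy_propose, toy_resample, toy_classify)
one_stage = single_stage_detect(features, toy_predict_dense)
print(calls)
```

In this toy, the two-stage path runs the resample and classify steps once per proposal, while the single-stage path makes one dense call. Real detectors are far more involved, but the per-region loop is the part a single-shot design removes.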

Most of the authors (Christian Szegedy, Scott Reed, Dumitru Erhan, and Dragomir Anguelov) had previously worked on MultiBox (2014), a region-based (two-stage)…


