Faster R-CNN
Fast R-CNN was unquestionably faster than R-CNN. Detection that used to take 47 seconds per image went down to 0.22 seconds, which makes Fast R-CNN roughly 213 times faster than R-CNN. That figure, however, excludes the region proposal step, which took around 2 seconds per image.
It made sense to compare the latency without the region proposal step, since both models used the same Selective Search method, which took the same time in either pipeline. But the total latency was still more than 2 seconds per image, which is not fast enough for real-time object detection. That motivated Ross Girshick and his co-authors to develop Faster R-CNN, which overcame the latency issue.
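To make the arithmetic concrete, here is a small sketch that recomputes the speedup from the timings quoted above, first without and then with the region proposal step. The variable names are illustrative, and the proposal time is assumed to be exactly 2.0 seconds even though the text only says "around 2 seconds".

```python
# Per-image timings quoted in the text, in seconds.
RCNN_DETECT_S = 47.0       # R-CNN, excluding region proposals
FAST_RCNN_DETECT_S = 0.22  # Fast R-CNN, excluding region proposals
PROPOSAL_S = 2.0           # Selective Search; assumed exactly 2.0 for this sketch


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new latency is than the old one."""
    return old_seconds / new_seconds


# Speedup when the shared region proposal step is left out of both pipelines.
detect_only = speedup(RCNN_DETECT_S, FAST_RCNN_DETECT_S)

# Speedup when the same proposal time is added to both pipelines.
end_to_end = speedup(RCNN_DETECT_S + PROPOSAL_S, FAST_RCNN_DETECT_S + PROPOSAL_S)

# Fraction of Fast R-CNN's total latency spent on region proposals.
proposal_share = PROPOSAL_S / (FAST_RCNN_DETECT_S + PROPOSAL_S)

print(f"Detection-only speedup: {detect_only:.1f}x")
print(f"End-to-end speedup:     {end_to_end:.1f}x")
print(f"Proposal share of Fast R-CNN latency: {proposal_share:.0%}")
```

Under these assumptions, the end-to-end speedup drops to about 22 times, and region proposals account for about 90% of Fast R-CNN's total latency per image. In other words, once detection itself became fast, the proposal step was the bottleneck.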
Ross Girshick worked with three co-authors (Kaiming He, Shaoqing Ren, and Jian Sun), who were part of the Microsoft Research team that later developed ResNet, the winner of the ILSVRC 2015 image classification task. Together, they replaced the Selective Search method with a Region Proposal Network (RPN), which is the main topic of this article.
Let's review how R-CNN evolved into Fast R-CNN and Faster R-CNN to see how RPN fits in the Faster R-CNN pipeline.