R-CNN: Region-based Convolutional Neural Network

R-CNN = CNN Extracting Features + SVM Classifier

8 min read · Jun 11, 2022


R-CNN (Region-based Convolutional Neural Network) was a landmark model introduced in 2013. It successfully combined a CNN with classical computer vision techniques for object detection and clearly beat the previous state of the art: the paper reports a mean average precision (mAP) of 53.3% on PASCAL VOC 2012, a relative improvement of more than 30% over the previous best result. R-CNN is now an old model, but knowing where the field started is essential for understanding the later developments in object detection.

If you are new to object detection, please refer to this article that explains the difference between image classification and object detection. Otherwise, let’s get started with an overview of R-CNN.

Ross Girshick

Ross Girshick, the central figure behind R-CNN, was a postdoc at the University of California, Berkeley, at the time. He then worked at Microsoft for a while and is now at Facebook (Meta) AI Research (FAIR).

Note: Yann LeCun tweeted that FAIR now stands for Fundamental AI Research.

In addition to R-CNN, Ross Girshick researched and developed Fast R-CNN and Faster R-CNN. He was involved in developing YOLO (first version only) with Joseph Redmon, and he co-authored the Mask R-CNN paper with Kaiming He (famous for Kaiming Initialization, also called He Initialization). Kaiming He also took part in the Faster R-CNN project at Microsoft and later moved to FAIR.

Ross Girshick integrated convolutional neural networks for feature extraction with existing computer vision techniques in object detection and achieved significant performance improvements over previous models.

R-CNN at a Glance

R-CNN performs object detection in the following steps.

  1. Input Image
  2. Region Proposals
  3. CNN Feature Extraction
  4. SVM Classification
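
The four steps above can be read as a simple composition of stages. The following is a minimal sketch of that data flow, not the original implementation: the function `rcnn_detect`, its parameters, and the toy stand-in functions are all hypothetical names chosen for illustration. In the original R-CNN the three stages were selective search, a CNN, and one linear SVM per class; here they are passed in as plain callables so the structure is visible without any model code.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def rcnn_detect(
    image,
    propose_regions: Callable[[object], Sequence[Box]],
    extract_features: Callable[[object, Box], Sequence[float]],
    score_classes: Callable[[Sequence[float]], Dict[str, float]],
    score_threshold: float = 0.0,  # assumed cut-off, not taken from the paper
) -> Dict[str, List[Tuple[Box, float]]]:
    """Run steps 2-4 on one image and group scored boxes by class."""
    detections: Dict[str, List[Tuple[Box, float]]] = {}
    for box in propose_regions(image):                # step 2: region proposals
        features = extract_features(image, box)       # step 3: CNN features
        for label, score in score_classes(features).items():  # step 4: per-class scores
            if score > score_threshold:
                detections.setdefault(label, []).append((box, score))
    return detections


# Toy stand-ins so the sketch runs; none of them is a real model.
def toy_propose(img):
    return [(0, 0, 4, 4), (2, 2, 6, 6)]


def toy_extract(img, box):
    # "Features" here are just the box width and height.
    return [box[2] - box[0], box[3] - box[1]]


def toy_score(features):
    return {"cat": features[0] - 3.0, "dog": -1.0}


image = [[0] * 8 for _ in range(8)]  # step 1: input image (dummy 8x8 grid)
detections = rcnn_detect(image, toy_propose, toy_extract, toy_score)
print(detections)  # {'cat': [((0, 0, 4, 4), 1.0), ((2, 2, 6, 6), 1.0)]}
```

Grouping the results by class is a deliberate choice in this sketch, because the suppression of duplicate boxes is usually done for each class separately.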

Finally, a post-processing step applies NMS (Non-Maximum Suppression) to remove duplicate detections of the same object.
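
To make the NMS step concrete, here is a minimal sketch of greedy non-maximum suppression in plain Python. It assumes boxes are given as `(x1, y1, x2, y2)` tuples and that it is run on the boxes of a single class. The function names `iou` and `nms` and the threshold of 0.5 are illustrative choices, not values taken from the R-CNN paper.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The first two boxes overlap with an IoU of about 0.68, so the lower-scoring one is dropped, while the third box does not overlap either of them and is kept.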