[Week 2 / 백서경 / Paper Review] Faster R-CNN

2023 Summer Session/CV Team 2

bravesk 2023. 7. 20. 18:37

1. Introduction

[ Region proposal method ]

Fast R-CNN computes its region proposals with selective search, which runs on the CPU -> Faster R-CNN introduces the Region Proposal Network (RPN), moving the proposal step into a deep convolutional neural network

[ RPN ]

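1. Flexible handling of boxes with different aspect ratios and sizes

Extracting a feature map from each image of a multi-scale image pyramid -> running filters of multiple sizes on a single feature map -> anchor boxes (reference boxes of multiple scales and aspect ratios on a single feature map)

2. Connecting the RPN and Fast R-CNN

Region proposal and object detection share the same convolutional features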

3. Faster R-CNN

[ A unified network composed of two modules ]

It consists of a fully convolutional network that proposes regions and a detector that uses the proposed regions to detect objects

3.1. Region Proposal Networks

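1. A single convolutional feature map is computed for the whole image

2. A small network slides over this feature map (the output of the last shared convolutional layer of a backbone such as VGG), taking an n*n spatial window as input at each position

3. Each window is mapped to a lower-dimensional feature, which is fed into two sibling fully connected layers: a classification layer and a regression layer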

3.1.1. Anchors

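At each sliding position, k anchors are generated, all centered on the center of the sliding window (default k = 9: 3 scales x 3 aspect ratios)

- The regression layer outputs the coordinates for each of the k anchors, so output: 4*k coordinates

- The classification layer outputs, for each of the k anchor proposals, a score for object vs. not object, so output: 2*k scores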
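The anchor layout above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the scales (128, 256, 512) and ratios (1:2, 1:1, 2:1) follow the defaults reported in the paper, while the stride of 16, the half-cell center offset, the height/width ratio convention, and the function names are assumptions made for this sketch.

```python
import math

def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchor shapes as (width, height)."""
    shapes = []
    for scale in scales:
        area = scale * scale              # each scale fixes the box area
        for ratio in ratios:              # assumption: ratio = height / width
            w = math.sqrt(area / ratio)
            h = w * ratio
            shapes.append((w, h))
    return shapes

def anchors_for_feature_map(feat_h, feat_w, stride=16):
    """Center every anchor shape on every sliding-window position."""
    shapes = base_anchors()
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # assumption: the window center maps to the middle of the cell
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for w, h in shapes:
                anchors.append((cx, cy, w, h))
    return anchors

k = len(base_anchors())
anchors = anchors_for_feature_map(feat_h=3, feat_w=4)
reg_outputs_per_position = 4 * k   # 4 coordinates per anchor
cls_outputs_per_position = 2 * k   # object / not-object score per anchor
```

For a feature map of size H x W this gives H*W*k anchors in total, which is why the two output layers have 4*k and 2*k channels per position.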

3.1.2. Loss Function

[ binary class label ]

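Each anchor is assigned one of two labels, object or not object -> positive/negative

Criterion 1. positive: the anchor with the highest IoU with a ground-truth box

Criterion 2. positive: an anchor whose IoU with any ground-truth box is higher than 0.7

Criterion 3. negative: an anchor whose IoU is lower than 0.3 for all ground-truth boxes

Anchors that are neither positive nor negative do not contribute to training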
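The three criteria can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are given as (x1, y1, x2, y2) corners, -1 is used as an "ignored" label for anchors that match neither rule, and the helper names `iou` and `label_anchors` are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 = positive, 0 = negative, -1 = ignored."""
    ious = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = []
    for row in ious:
        best = max(row)
        if best > pos_thresh:        # criterion 2
            labels.append(1)
        elif best < neg_thresh:      # criterion 3
            labels.append(0)
        else:
            labels.append(-1)        # neither: not used in training
    # Criterion 1: the best-matching anchor of every ground-truth box is
    # positive, even when its IoU does not exceed pos_thresh.
    for j in range(len(gt_boxes)):
        column = [row[j] for row in ious]
        best = max(column)
        if best > 0:
            for i, value in enumerate(column):
                if value == best:
                    labels[i] = 1
    return labels

gts = [(0, 0, 10, 10), (100, 100, 120, 120)]
example_anchors = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30), (100, 100, 110, 120)]
labels = label_anchors(example_anchors, gts)
```

In this example the last anchor only reaches an IoU of 0.5 with the second ground-truth box, but it is still labeled positive by criterion 1, so that every ground-truth box gets at least one positive anchor.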

[ loss function ]

Source: https://bkshin.tistory.com/entry/%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0-Faster-R-CNN-%ED%86%BA%EC%95%84%EB%B3%B4%EA%B8%B0
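For reference, the multi-task loss of the RPN as it is given in the Faster R-CNN paper can be written out as follows; the symbols are the ones defined in the list right after it.

```latex
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
                    + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```

The factor p_i^* in the second term means that the regression loss is only active for positive anchors.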

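i: index of an anchor within the mini-batch

pi: predicted probability that anchor i is an object

pi*: ground-truth label (1 if the anchor is positive, 0 if it is negative)

ti: vector of the 4 coordinates of the predicted bounding box

ti*: vector of the 4 coordinates of the ground-truth box

Lcls: classification loss, the log loss over the two classes (object / not object)

Lreg: regression loss, the smooth L1 loss between the predicted bounding box coordinates ti and the ground-truth coordinates ti*

Ncls, Nreg: normalization terms

λ: balancing weight between the two terms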
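To make the roles of these terms concrete, here is a small pure-Python sketch of the loss. It is an illustration rather than a reference implementation: the function names are hypothetical, the smooth L1 form is the standard one from Fast R-CNN, and the default `lam=10.0` follows the value reported in the paper.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def log_loss(p, p_star):
    """Binary log loss for the predicted object probability p."""
    eps = 1e-12  # guard against log(0)
    if p_star == 1:
        return -math.log(max(p, eps))
    return -math.log(max(1.0 - p, eps))

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=10.0):
    """Classification term plus lam times the regression term."""
    cls_term = sum(log_loss(pi, ps) for pi, ps in zip(p, p_star)) / n_cls
    reg_term = sum(
        ps * sum(smooth_l1(a - b) for a, b in zip(ti, ts))  # ps = 0 drops negatives
        for ps, ti, ts in zip(p_star, t, t_star)
    ) / n_reg
    return cls_term + lam * reg_term

# One positive and one negative anchor; the negative anchor's box is ignored.
loss = rpn_loss(
    p=[0.5, 0.5],
    p_star=[1, 0],
    t=[(0.5, 0.0, 0.0, 0.0), (3.0, 3.0, 3.0, 3.0)],
    t_star=[(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)],
    n_cls=2,
    n_reg=1,
    lam=1.0,
)
```

Here the second anchor is negative, so its large coordinate error contributes nothing to the regression term.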

[ bounding box regression ]

x, y, w, h: predicted box

xa, ya, wa, ha: anchor box

x*, y*, w*, h* : ground-truth box
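With these three boxes, the parameterization used in the paper for the regression targets can be written as below, where x, y are the box center coordinates and w, h are its width and height.

```latex
\begin{aligned}
t_x &= (x - x_a) / w_a,   & t_y &= (y - y_a) / h_a,   & t_w &= \log(w / w_a),   & t_h &= \log(h / h_a) \\
t_x^* &= (x^* - x_a) / w_a, & t_y^* &= (y^* - y_a) / h_a, & t_w^* &= \log(w^* / w_a), & t_h^* &= \log(h^* / h_a)
\end{aligned}
```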

Training makes the offset of the predicted box from the anchor box approach the offset of the ground-truth box from the anchor box
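A minimal sketch of this idea, assuming boxes in (center x, center y, width, height) form and using the hypothetical helper names `encode` and `decode`: the ground-truth box is encoded as an offset relative to the anchor, and a predicted offset is decoded back into a box.

```python
import math

def encode(box, anchor):
    """Offset of `box` relative to `anchor`: (tx, ty, tw, th)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(t, anchor):
    """Inverse of encode: turn an offset back into a box."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))

anchor = (50.0, 50.0, 100.0, 100.0)
gt_box = (60.0, 45.0, 200.0, 50.0)

t_star = encode(gt_box, anchor)       # regression target for this anchor
recovered = decode(t_star, anchor)    # decoding the target gives back gt_box
```

The regression layer predicts t, and the loss compares it with t_star, so a perfect prediction decodes exactly to the ground-truth box.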

3.1.3. Training RPNs

The RPN can be trained end-to-end

From a single image, a mini-batch of anchors is sampled so that positive and negative anchors have a ratio of up to 1:1, and this batch is fed to the model
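The sampling step might look like the sketch below. The batch size of 256 follows the paper, which also pads with negatives when an image has too few positives; the label convention (1 = positive, 0 = negative, -1 = ignored) and the function name `sample_minibatch` are assumptions of this example.

```python
import random

def sample_minibatch(labels, batch_size=256, max_pos_fraction=0.5, seed=None):
    """Pick anchor indices with at most a 1:1 positive-to-negative ratio."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    n_pos = min(len(pos), int(batch_size * max_pos_fraction))
    # If there are too few positives, fill the rest of the batch with negatives.
    n_neg = min(len(neg), batch_size - n_pos)
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)

# 30 positives, 1000 negatives, 50 ignored anchors
example_labels = [1] * 30 + [0] * 1000 + [-1] * 50
pos_idx, neg_idx = sample_minibatch(example_labels, seed=0)

# Plenty of both kinds: the batch is split evenly
balanced_pos, balanced_neg = sample_minibatch([1] * 500 + [0] * 500, seed=0)
```

Sampling this way keeps the loss from being dominated by negative anchors, which are far more numerous in a typical image.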

3.2. Sharing Features for RPN and Fast R-CNN

[ Alternating training ]

The RPN and Fast R-CNN are trained alternately

5. Conclusion

[ Contributions of the paper ]

Proposes the RPN as the method for region proposal

Presents a unified system for region proposal and object detection

Higher region proposal quality leads to higher object detection accuracy