ํ‹ฐ์Šคํ† ๋ฆฌ ๋ทฐ

๋ฐ˜์‘ํ˜•

Intro

  • ๊ฐ์ฒด ํƒ์ง€ Object Detection์€ ์˜์ƒ ์†์˜ ์–ด๋– ํ•œ ๊ฐ์ฒด Label๊ฐ€ ์–ด๋””์— (x,y) ์–ด๋–ค ํฌ๊ธฐ๋กœ (w,h) ์กด์žฌํ•˜๋Š”์ง€๋ฅผ ์ฐพ๋Š” Task์ด๋‹ค.
  • Base๊ฐ€ ๋  ๊ธฐ๋ณธ์ ์ธ ๋ชจ๋ธ์— ๋Œ€ํ•ด ์ •๋ฆฌ ๋ฐ ์š”์•ฝ, ๋น„๊ต์ด๋‹ค.

R-CNN

  • R-CNN์€ region proposals์™€ CNN์ด ๊ฒฐํ•ฉ๋œ Regions with CNN์˜ ์•ฝ์ž
  • R-CNN์€ ์ด์ „๊นŒ์ง€ ์ตœ๊ณ ์˜ ์„ฑ๋Šฅ์„ ๋‚˜ํƒ€๋‚ธ ๊ธฐ๋ฒ•์˜ mAP๋ณด๋‹ค 30% ๋†’์€ 53.3%๋ฅผ ๋‹ฌ์„ฑ
  • 2012๋…„ image classification challenge์—์„œ AlexNet์ด ํฐ ์„ฑ๊ณต → object detection์—์„œ๋„ CNN์„ ํ™œ์šฉํ•œ ์—ฐ๊ตฌ๊ฐ€ ์ง„ํ–‰ → ๊ทธ ๊ฒฐ๊ณผ๋ฌผ์ด R-CNN(object detection ๋ถ„์•ผ์— ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด region proposals์™€ CNN์„ ๊ฒฐํ•ฉ)
  • ๋‘ ๊ฐ€์ง€์˜ ์ค‘์š”ํ•œ ์•„์ด๋””์–ด ๊ฒฐํ•ฉ

(1) region proposals๋กœ object ์œ„์น˜๋ฅผ ์•Œ์•„๋‚ด๊ณ , ์ด๋ฅผ CNN์— ์ž…๋ ฅํ•˜์—ฌ class๋ฅผ ๋ถ„๋ฅ˜

- ๋ฌผ์ฒด๊ฐ€ ์žˆ์„ ๋ฒ•ํ•œ ์˜์—ญ์„ ์ œ์•ˆํ•ด์ฃผ๋Š” ๋‹จ๊ณ„

  • selective search ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•ด์„œ ์ด๋ฏธ์ง€์—์„œ object์˜ ์œ„์น˜๋ฅผ ์ถ”์ถœํ•œ๋‹ค.
  • selective search๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ”„๋กœ์„ธ์Šค๋กœ ์ด๋ฃจ์–ด์ง.
      • ์ด๋ฏธ์ง€์˜ ์ดˆ๊ธฐ ์„ธ๊ทธ๋จผํŠธ๋ฅผ ์ •ํ•˜์—ฌ, ์ˆ˜๋งŽ์€ region ์˜์—ญ์„ ์ƒ์„ฑ
      • greedy ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ด์šฉํ•˜์—ฌ ๊ฐ region์„ ๊ธฐ์ค€์œผ๋กœ ์ฃผ๋ณ€์˜ ์œ ์‚ฌํ•œ ์˜์—ญ์„ ๊ฒฐํ•ฉ
      • ๊ฒฐํ•ฉ๋˜์–ด ์ปค์ง„ region์„ ์ตœ์ข… region proposal๋กœ ์ œ์•ˆ
  • ์ด๋ฏธ์ง€์— selective search๋ฅผ ์ ์šฉํ•˜๋ฉด 2000๊ฐœ์˜ region proposal์ด ์ƒ์„ฑ๋˜๋Š”๋ฐ, ์ด๋“ค์„ CNN์˜ ์ž…๋ ฅ ์‚ฌ์ด์ฆˆ(227x227)๋กœ warp(resize) ํ•˜์—ฌ CNN์— ์ž…๋ ฅํ•œ๋‹ค. ๋…ผ๋ฌธ์—์„œ๋Š” warp ๊ณผ์ •์—์„œ object ์ฃผ๋ณ€ 16 ํ”ฝ์…€๋„ ํฌํ•จํ•˜์—ฌ ์„ฑ๋Šฅ์„ ๋†’์˜€๋‹ค.

(2) Larger data set์œผ๋กœ ํ•™์Šต๋œ pre-trained CNN์„ fine-tunning

 

Method

1. ์ž…๋ ฅ ์ด๋ฏธ์ง€์— Selective Search ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜์—ฌ bounding box(region proposal) 2000๊ฐœ๋ฅผ ์ถ”์ถœํ•œ๋‹ค.
2. ์ถ”์ถœ๋œ bounding box๋ฅผ warp(resize)ํ•˜์—ฌ CNN์— ์ž…๋ ฅํ•œ๋‹ค.
3. fine tunning ๋˜์–ด ์žˆ๋Š” pre-trained CNN์„ ์‚ฌ์šฉํ•˜์—ฌ bounding box์˜ 4096์ฐจ์›์˜ ํŠน์ง• ๋ฒกํ„ฐ๋ฅผ ์ถ”์ถœํ•œ๋‹ค.
4. ์ถ”์ถœ๋œ ํŠน์ง• ๋ฒกํ„ฐ๋ฅผ SVM์„ ์ด์šฉํ•˜์—ฌ class๋ฅผ ๋ถ„๋ฅ˜ํ•œ๋‹ค.
5. bounding box regression์„ ์ ์šฉํ•˜์—ฌ bounding box์˜ ์œ„์น˜๋ฅผ ์กฐ์ •ํ•œ๋‹ค.

๋ฌธ์ œ์ 

  • R-CNN์€ ๋น„ํšจ์œจ์„ฑ์„ ์ง€๋‹ˆ๊ณ  ์žˆ๋‹ค.
  • ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€์— 2000๊ฐœ์˜ region์ด ์กด์žฌํ•  ๋•Œ, R-CNN์€ ๊ฐ๊ฐ์˜ region๋งˆ๋‹ค ์ด๋ฏธ์ง€๋ฅผ cropping ํ•œ ๋’ค CNN ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜์—ฌ 2000๋ฒˆ์˜ CNN ์—ฐ์‚ฐ์„ ์ง„ํ–‰ํ•˜๊ฒŒ ๋œ๋‹ค.
  • → ์—ฐ์‚ฐ๋Ÿ‰์ด ๋งŽ์•„์ง€๊ณ  detection ์†๋„๊ฐ€ ๋А๋ฆฌ๋‹ค๋Š” ๋‹จ์ 
  • ์ด ๋‹จ์ ์ด ๊ฐœ์„ ๋œ Fast R-CNN, Faster R-CNN์ด ๋“ฑ์žฅ

 

Fast R-CNN

  • R-CNN์˜ ๋‹จ์ ์ธ ๋งค์šฐ ๋А๋ฆฐ ์†๋„๋ฅผ ๊ฐœ์„ ํ•œ ๋…ผ๋ฌธ
  • ์ฐจ์ด์ ์€ CNN์„ ๋ณ‘๋ ฌ๋กœ ์ˆ˜ํ–‰ํ•˜๋А๋ƒ, ์ˆœ์ฐจ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•˜๋А๋ƒ ๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Œ.
  • Fast R-CNN์—์„œ๋Š” ํ•ด๋‹น ์˜์—ญ(์ดˆ๋ก์ƒ‰ ๋ฐ•์Šค)์˜ Feature๋ฅผ ์›๋ณธ ์ด๋ฏธ์ง€์—์„œ ์ถ”์ถœ(Crop)ํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ Feature Map๋‹จ์—์„œ ์ถ”์ถœ -> ์ž…๋ ฅ ์ด๋ฏธ์ง€๋งˆ๋‹ค ๋‹จ ํ•œ๋ฒˆ์˜ CNN๋งŒ์„ ์ˆ˜ํ–‰ํ•˜๋ฉด ๋จ.

RoI(Region of Interest) Pooling

: feature map์—์„œ region proposals์— ํ•ด๋‹นํ•˜๋Š” **๊ด€์‹ฌ ์˜์—ญ(Region of Interest)**์„ ์ง€์ •ํ•œ ํฌ๊ธฐ์˜ grid๋กœ ๋‚˜๋ˆˆ ํ›„ max pooling์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•

- ๊ฐ channel๋ณ„๋กœ ๋…๋ฆฝ์ ์œผ๋กœ ์ˆ˜ํ–‰
- ๊ณ ์ •๋œ ํฌ๊ธฐ์˜ feature map์„ ์ถœ๋ ฅํ•˜๋Š” ๊ฒƒ์ด ๊ฐ€๋Šฅ

1. ๋จผ์ € ์›๋ณธ ์ด๋ฏธ์ง€๋ฅผ CNN ๋ชจ๋ธ์— ํ†ต๊ณผ์‹œ์ผœ feature map์„ ์–ป์Œ
  800x800 ํฌ๊ธฐ์˜ ์ด๋ฏธ์ง€๋ฅผ VGG ๋ชจ๋ธ์— ์ž…๋ ฅํ•˜์—ฌ 8x8 ํฌ๊ธฐ์˜ feature map์„ ์–ป์Œ.
 ์ด ๋•Œ sub-sampling ratio = 1/100์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Œ (์—ฌ๊ธฐ์„œ ๋งํ•˜๋Š” subsampling์€ pooling์„ ๊ฑฐ์น˜๋Š” ๊ณผ์ •์„ ์˜๋ฏธ) 2. ๊ทธ๋ฆฌ๊ณ  ๋™์‹œ์— ์›๋ณธ ์ด๋ฏธ์ง€์— ๋Œ€ํ•˜์—ฌ Selective search ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜์—ฌ region proposals๋ฅผ ์–ป์Œ.
  ์›๋ณธ ์ด๋ฏธ์ง€์— Selective search ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜์—ฌ 500x700 ํฌ๊ธฐ์˜ region proposal์„ ์–ป์Œ
3. ์ด์ œ feature map์—์„œ ๊ฐ region proposals์— ํ•ด๋‹นํ•˜๋Š” ์˜์—ญ์„ ์ถ”์ถœ: ์ด ๊ณผ์ •์€ RoI Projection์„ ํ†ตํ•ด ๊ฐ€๋Šฅ.   
  Selective search๋ฅผ ํ†ตํ•ด ์–ป์€ region proposals๋Š” sub-sampling ๊ณผ์ •์„ ๊ฑฐ์น˜์ง€ ์•Š์€ ๋ฐ˜๋ฉด, ์›๋ณธ ์ด๋ฏธ์ง€์˜ feature map์€ sub-sampling ๊ณผ์ •์„ ์—ฌ๋Ÿฌ ๋ฒˆ ๊ฑฐ์ณ ํฌ๊ธฐ๊ฐ€ ์ž‘์•„์กŒ์Œ.
  ์ž‘์•„์ง„feature map์—์„œ region proposals์ด encode(ํ‘œํ˜„)ํ•˜๊ณ  ์žˆ๋Š” ๋ถ€๋ถ„์„ ์ฐพ๊ธฐ ์œ„ํ•ด ์ž‘์•„์ง„ feature map์— ๋งž๊ฒŒ region proposals๋ฅผ ํˆฌ์˜ํ•ด์ฃผ๋Š” ๊ณผ์ •์ด ํ•„์š”.
  ์ด๋Š” region proposal์˜ ํฌ๊ธฐ์™€ ์ค‘์‹ฌ ์ขŒํ‘œ๋ฅผ sub sampling ratio์— ๋งž๊ฒŒ ๋ณ€๊ฒฝ์‹œ์ผœ์คŒ์œผ๋กœ์จ ๊ฐ€๋Šฅ.
   -> Region proposal์˜ ์ค‘์‹ฌ์  ์ขŒํ‘œ, width, height์™€ sub-sampling ratio๋ฅผ ํ™œ์šฉํ•˜์—ฌ feature map์œผ๋กœ ํˆฌ์˜์‹œ์ผœ์ค๋‹ˆ๋‹ค.
   -> feature map์—์„œ region proposal์— ํ•ด๋‹นํ•˜๋Š” 5x7 ์˜์—ญ์„ ์ถ”์ถœ
4. ์ถ”์ถœํ•œ RoI feature map 5x7 ํฌ๊ธฐ์˜ ์˜์—ญ์„ ์ง€์ •ํ•œ sub-window 2x2 ํฌ๊ธฐ์— ๋งž๊ฒŒ grid๋ฅผ ๋‚˜๋ˆ ์คŒ
5. grid์˜ ๊ฐ ์…€์— ๋Œ€ํ•˜์—ฌ max pooling์„ ์ˆ˜ํ–‰ํ•˜์—ฌ 2x2 ํฌ๊ธฐ์˜ feature map์„ ์–ป์Œ

 

=> ํ•˜์ง€๋งŒ Fast R-CNN์—์„œ Region Proposal์„ CNN Network๊ฐ€ ์•„๋‹Œ Selective search ์™ธ๋ถ€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ ์ˆ˜ํ–‰ํ•˜์—ฌ ๋ณ‘๋ชฉํ˜„์ƒ ๋ฐœ์ƒ

 

Faster R-CNN

  • Fast R-CNN์€ ์˜์—ญ์„ ์ œ์•ˆํ•˜๊ธฐ์œ„ํ•ด Selective Search๋ผ๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•˜๋Š”๋ฐ, ์ด๋Š” GPU ๋‚ด์—์„œ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹Œ CPU์—์„œ ์ž‘๋™ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ณ‘๋ชฉ์ด ๋ฐœ์ƒํ•˜๊ฒŒ ๋จ.
  • ์˜์—ญ์„ ์ œ์•ˆ(Region Proposal)ํ•˜๋Š” ๊ฒƒ๋„ CNN ๋‚ด๋ถ€์—์„œ ์ˆ˜ํ–‰์„ ํ•˜์—ฌ(=GPU๋ฅผ ์ด์šฉ๊ฐ€๋Šฅ) ๋„คํŠธ์›Œํฌ๋ฅผ ๋น ๋ฅด๊ฒŒ ๋งŒ๋“ค์ž๋Š” ์•„์ด๋””์–ด์—์„œ ์ถœ๋ฐœ

Method

1. ์ž…๋ ฅ ์ด๋ฏธ์ง€๋ฅผ CNN์— ํ†ต๊ณผ์‹œ์ผœ Feature Map์„ ์–ป๋Š”๋‹ค.

2. ๊ทธ๋ฆฌ๊ณ  ํ•ด๋‹น Feature Map์„ Region Proposal Network์˜ ์ž…๋ ฅ์œผ๋กœ ์‚ฌ์šฉํ•˜์—ฌ ์ดˆ๋ก์ƒ‰ Box ์˜์—ญ์„ ๋ฝ‘์•„๋‚ด๊ฒŒ ๋œ๋‹ค.

3. ๊ทธ๋ ‡๊ฒŒ ๋ฝ‘์•„๋‚ธ ์˜์—ญ์„ ๊ธฐ์กด Feature Map์—์„œ ์ถ”์ถœํ•˜์—ฌ ํ•ด๋‹น ์˜์—ญ์„ RoI Pooling์„ ์ˆ˜ํ–‰ํ•œ๋‹ค.

4. FC Layer๋ฅผ ํ†ตํ•ด Classification๊ณผ Regression์„ ์ˆ˜ํ–‰ํ•˜๊ฒŒ ๋œ๋‹ค.

5. Fast R-CNN๊ณผ ๋‹ค๋ฅธ์ ์€ ์˜์—ญ์„ ์ œ์•ˆํ•  ๋•Œ, RPN์ด๋ผ๋Š” CNN ๊ธฐ๋ฐ˜์˜ ๋„คํŠธ์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ

=> GPU์—์„œ ๋™์ž‘ํ•  ์ˆ˜ ์žˆ์–ด ๋น ๋ฆ„ / ์ „์ฒด ๋„คํŠธ์›Œํฌ๋‚ด์— ํฌํ•จ๋˜์–ด End-to-End ๊ตฌ์กฐ

 

RPN : Region Proposal Network

  • faster r-cnn์˜ ํ•ต์‹ฌ์ด๋ผ๊ณ ๋„ ํ•  ์ˆ˜ ์žˆ์Œ.
  • RPN์˜ ์ž…๋ ฅ์€ input image๋กœ๋ถ€ํ„ฐ CNN์„ ํ†ต๊ณผํ•œ Feature Map์ด๋‹ค.
  • ์œ„ ๊ทธ๋ฆผ์ฒ˜๋Ÿผ 7x7 ํฌ๊ธฐ์˜ Feature Map์ด ์žˆ์„ ๋•Œ, 3x3ํฌ๊ธฐ์˜ ์ปค๋„ ํฌ๊ธฐ๋กœ sliding window๋ฐฉ์‹์œผ๋กœ ๋ชจ๋“  ๊ฒฉ์ž ์…€ ๋งˆ๋‹ค ์„œ๋กœ ๋‹ค๋ฅธ ํฌ๊ธฐ์˜ k๊ฐœ Anchor box๋ฅผ ์ •์˜ํ•ด์ค€๋‹ค.
  • Anchor box์™€ GT box์˜ ์ฐจ์ด๋ฅผ regression์œผ๋กœ ์˜ˆ์ธกํ•˜๋ฏ€๋กœ ๋‹ค์–‘ํ•œ predict box๊ฐ€ ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค.
  •  Encoder๋กœ๋ถ€ํ„ฐ (Channel, 7, 7)์˜ Feature Map์„ 3x3 Conv์— Padding์„ 1๋กœ ์ฃผ์–ด (256, 7, 7) ํฌ๊ธฐ๋กœ ๋งŒ๋“ ๋‹ค.
  • ๋ฌผ์ฒด๊ฐ€ ์กด์žฌํ•˜๋Š”์ง€ ํ•˜์ง€ ์•Š๋Š”์ง€๋ฅผ ์˜ˆ์ธกํ•˜๋Š” Classification์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด 1x1 Conv๋กœ 2(๋ฐฐ๊ฒฝ/์ „๊ฒฝ) * 9(anchors) = 18 ์ฑ„๋„๋กœ ๋งŒ๋“ค๊ณ , BBox์˜ ์ขŒํ‘œ๋ฅผ ์˜ˆ์ธกํ•˜๋Š” Regression์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด 1x1 Conv๋กœ 4(x,y,x,h) * 9(anchors) = 36์ฑ„๋„์„ ๋งŒ๋“ค๊ฒŒ ๋œ๋‹ค.
  • ์ด๋ฅผํ†ตํ•ด, ๊ฐ Grid(7x7) ๋ณ„๋กœ 9๊ฐœ Anchor์˜ Class์™€ ์ขŒํ‘œ๊ฐ’ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•˜๊ฒŒ ๋œ๋‹ค.

=> ๊ฒน์น˜๊ฑฐ๋‚˜ ์ž‘์€ ์‚ฌ๋ฌผ์— ๋Œ€ํ•œ ์ธ์‹๋ฅ  ๋†’์Œ / ๋‹จ์ ์€ ๋А๋ฆผ, ์• ์ดˆ์— ์‹ค์‹œ๊ฐ„ ํ…Œ์Šคํฌ๋ฅผ ์ƒ๊ฐํ•˜๊ณ  ๋งŒ๋“  ๋„คํŠธ์›Œํฌ๋Š” ์•„๋‹˜.

 

SSD

  • RCNN ๊ณ„์—ด์˜ 2-stage detector๋Š” region proposals์™€ ๊ฐ™์€ ๋‹ค์–‘ํ•œ view๋ฅผ ๋ชจ๋ธ์— ์ œ๊ณต
  • → ๋†’์€ ์ •ํ™•๋„ but, region proposals๋ฅผ ์ถ”์ถœํ•˜๊ณ  ์ด๋ฅผ ์ฒ˜๋ฆฌํ•˜๋Š” ๊ณผ์ •์—์„œ ๋งŽ์€ ์‹œ๊ฐ„ ์†Œ์š” → detection ์†๋„๊ฐ€ ๋А๋ฆฌ๋‹ค.
  • SSD๋Š” ๋‹ค์–‘ํ•œ view๋ฅผ ํ™œ์šฉํ•˜๋ฉด์„œ ํ†ตํ•ฉ๋œ network ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง„ 1-stage detector → ๋†’์€ ์ •ํ™•๋„์™€ ๋น ๋ฅธ ์†๋„
  • ํฌ๊ฒŒ 2๊ฐ€์ง€ ๊ตฌ์„ฑ: Multi Scale Feature Layer & Default (Anchor) Box

Multi Scale Feature Layer

  • image scale ์ˆ˜ํ–‰์‹œ window์‚ฌ์ด์ฆˆ๋Š” ๊ณ ์ •ํ•ด๋‘๊ณ  ์ˆ˜ํ–‰ -> multi object๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ detectํ•˜๊ธฐ ์–ด๋ ต๊ณ  detect์— ๋ฌธ์ œ๊ฐ€ ๋งŽ๊ฒŒ ๋จ.
  • ์›๋ณธ ์ด๋ฏธ์ง€๊ฐ€ ์•„๋‹ˆ๋ผ ๋‹ค๋ฅธ ํฌ๊ธฐ์˜ feature map์„ ์ด์šฉํ•œ๋‹ค๋ฉด, ์ •๋ณด๋ฅผ ์œ ์ง€ํ•  ์ˆ˜ ์žˆ์ง€ ์•Š์„๊นŒ? - ์•„์ด๋””์–ด!
  • ์˜ˆ๋ฅผ ๋“ค์–ด 32x32 ๊ฐ™์€ ํฐ feature map์€ 8x8์— ๋น„ํ•ด ๋น„๊ต์  ์ž‘์€ object๋“ค์„ ์ž˜ detectํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋œ๋‹ค.

Default Box

  • Region proposal๋กœ๋งŒ default box๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ๋ง๊ณ  ๊ทธ๋ƒฅ object detection์— ๋ฐ”๋กœ ํ™œ์šฉํ•˜์ž๋Š” ์•„์ด๋””์–ด

๊ตฌ์ฒด์ ์œผ๋กœ ์ €๋Ÿฐ ๊ณ ์–‘์ด ๊ฐ•์•„์ง€ ์‚ฌ์ง„์ด ์žˆ์„ ๋•Œ, 8x8์—์„œ๋Š” 2๋ฒˆ์งธ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์ด anchor box๋“ค์ด ๊ตฌ์ถ•๋˜๋Š”๋ฐ, ๊ณ ์–‘์ด ๋ถ€๋ถ„์—์„œ ํŒŒ๋ž—๊ฒŒ ํ‘œ์‹œ๋œ ๋ถ€๋ถ„์ด ๋ฐ”๋กœ GT์™€ ๋งค์นญ๋œ anchor box๋ฅผ ์˜๋ฏธํ•˜๊ฒŒ ๋œ๋‹ค. ๋งค์นญ ๊ธฐ์ค€์€ IOU 50% ์ด์ƒ์„ ๊ธฐ์ค€์œผ๋กœ ํŒ๋ณ„.
๋ฐ˜๋ฉด 8x8์—์„œ๋Š” ๊ฐ•์•„์ง€๊ฐ€ detect๋˜์ง€ ์•Š์ง€๋งŒ 4x4์—์„œ๋Š” detect๋˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.
์ด ๋งค์นญ ๋ฐ•์Šค ์ •๋ณด๋กœ classification์„ ์ˆ˜ํ–‰ํ•˜๋ฉฐ ์ด ๋งค์นญ ๋ฐ•์Šค๋Š” GT์™€ ๊ฐ€๊นŒ์›Œ์ง€๊ธฐ ์œ„ํ•ด ๊ณ„์† offset ๊ฐ’์„ ๊ณ„์‚ฐํ•˜๋ฉด์„œ Bbox regression์„ ์ˆ˜ํ–‰!

 

=> ๋‹จ์  Data augmentation ์˜์กด๋„๊ฐ€ ๋งค์šฐ ํผ.(์ž‘์€ object์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์œ„ํ•ด) : ํŠน์ • IOU ๊ธฐ์ค€์œผ๋กœ filterํ•ด์„œ samplingํ•ด์„œ ๋‹ค์‹œ ratio ๋งž์ถ”๋Š” ๋“ฑ ๋ณต์žกํ•œ ๋ฐฉ๋ฒ•์˜ augmentation์„ ์‚ฌ์šฉํ•จ.

 

YOLO : You Only Look Once

  • One stage detector์„ ์‹œ์ž‘ํ•จ.
  • ๊ธฐ๋ณธ์ ์œผ๋กœ settingํ•˜๋Š” ๊ฒƒ์€ cell ๋‹จ์œ„๋กœ ์ด๋ฏธ์ง€๋ฅผ ๋‚˜๋ˆ„๋Š” ๊ฒƒ
  • 7x7 grid๋กœ ๋‚˜๋ˆ„๊ฒŒ ๋˜๊ณ  ๊ฐ grid์˜ cell์ด ํ•˜๋‚˜์˜ obejct์— ๋Œ€ํ•œ detection์„ ์ˆ˜ํ–‰ํ•˜๊ฒŒ ๋จ.
  • ๊ตฌ์ฒด์ ์œผ๋กœ๋Š” ๊ฐ grid cell์ด 2๊ฐœ์˜ bounding box์˜ ํ›„๋ณด๋ฅผ ๋„์ถœํ•˜๊ฒŒ ๋˜๊ณ , ๊ทธ bbox๋“ค์„ ์‹ค์ œ ground truth์— ๊ทผ์‚ฌ์‹œํ‚ค๋ฉด์„œ ํ•™์Šต์„ ์ˆ˜ํ–‰.

Method

1. ์šฐ์„  input image๋ฅผ ์ž…๋ ฅํ•  ๋•Œ backbone ๋„คํŠธ์›Œํฌ๋กœ๋Š” VGG๊ฐ€ ์•„๋‹Œ Inception-v1 (googlenet) ๋„คํŠธ์›Œํฌ ๊ธฐ๋ฐ˜์œผ๋กœ ๋™์ž‘์„ ์ˆ˜ํ–‰

- Googlenet์˜ ๋Œ€ํ‘œ์ ์ธ ํŠน์ง•์œผ๋กœ๋Š” 1x1 convolution layer๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค๋Š” ๊ฒƒ์ด ํŠน์ง• -> feature map์˜ ์ˆ˜๋ฅผ ์ค„์—ฌ์ฃผ์–ด ์—ฐ์‚ฐ๋Ÿ‰์„ ์ค„์—ฌ์ฃผ๋Š” ํšจ๊ณผ

2. backbone์„ ํ†ต๊ณผํ•˜๊ณ  ๋‚˜๋ฉด, 2๊ฐœ์˜ dense layer๋ฅผ ๊ฑฐ์น˜๊ฒŒ ๋˜๋Š”๋ฐ, ์ด๋“ค์€ classification๊ณผ regression์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด ์ ์šฉ๋˜๋Š” layer

3. 2๊ฐœ์˜ dense layer๋ฅผ ๊ฑฐ์ณ ๊ตฌ์„ฑ๋œ "7x7x30" ์˜ feature map์„ ๊ตฌ์ถ•

4. ์ด๋“ค์˜ information์„ ํ™œ์šฉํ•ด์„œ Bbox regression๊ณผ object detection์˜ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•˜๊ณ  ์ด ๊ฒฐ๊ณผ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๋„คํŠธ์›Œํฌ๋ฅผ ์—…๋ฐ์ดํŠธ

 - 7x7x30์˜ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•œ๋‹ค๋Š” ๊ฒƒ : 7x7x30์˜ feature map์—์„œ cell ํ•˜๋‚˜๋ฅผ ๋ฝ‘๊ฒŒ ๋˜๋ฉด, ๊ทธ cell ํ•˜๋‚˜๋Š” 30์˜ depth / cell ๋‹น 2๊ฐœ์˜ Bbox๋ฅผ ๊ฐ€์ง€๊ฒŒ ๋˜๋Š”๋ฐ ๊ทธ Bbox์˜ ์ขŒํ‘œ ๊ฐ’๊ณผ confidence score๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Œ.

 

=> ๋น ๋ฅด์ง€๋งŒ  detection ์ •ํ™•๋„๊ฐ€ ๋†’์ง€๋Š” ์•Š์Œ/ ๋’ค๋กœ version ์—ฌ๋Ÿฌ๊ฐœ ๊ณ„์† ๋‚˜์˜ด.

 

 

๋ฐ˜์‘ํ˜•
๊ณต์ง€์‚ฌํ•ญ
์ตœ๊ทผ์— ์˜ฌ๋ผ์˜จ ๊ธ€
์ตœ๊ทผ์— ๋‹ฌ๋ฆฐ ๋Œ“๊ธ€
Total
Today
Yesterday
๋งํฌ
ยซ   2025/06   ยป
์ผ ์›” ํ™” ์ˆ˜ ๋ชฉ ๊ธˆ ํ† 
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30
๊ธ€ ๋ณด๊ด€ํ•จ
๋ฐ˜์‘ํ˜•