CVPR 2023.
Yu-Lun Liu, Chen Gao, Andreas Meuleman, Hung-Yu Tseng, Ayush Saraf, Changil Kim, Yung-Yu Chuang, Johannes Kopf, Jia-Bin Huang
Meta | National Taiwan University | KAIST | University of Maryland, College Park
5 Jan 2023

 

Abstract

Dynamic radiance field reconstruction methods aim to model the time-varying structure and appearance of a dynamic scene. Existing methods, however, assume that accurate camera poses can be reliably estimated by a Structure-from-Motion (SfM) algorithm. That assumption makes them unreliable, because SfM produces erroneous poses on challenging videos with highly dynamic objects, poorly textured surfaces, and rotating camera motion.

This robustness problem can be addressed by jointly estimating the static and dynamic radiance fields together with the camera parameters (poses and focal length).

The paper's results compare favorably with state-of-the-art dynamic view synthesis methods.

 

Related work

 

Dynamic view synthesis.

Many systems cannot handle scenes with complex geometry, and they require multi-view, time-synchronized video as input to provide interactive view manipulation.

More recent work extends NeRF to handle dynamic scenes.

The space-time synthesis results are impressive, but these methods depend on accurate camera poses as input. They therefore cannot be used on challenging scenes where COLMAP or other SfM systems fail.

The method proposed in this paper handles complex dynamic scenarios without known camera poses.

 

Visual odometry and camera pose estimation.

Visual odometry estimates 3D camera poses from a collection of images.

These techniques fall into two groups: direct methods that maximize photometric consistency,

and feature-based methods that rely on hand-crafted or learned features.

Estimating camera poses from casually captured video is difficult.

NeRF-based techniques have been proposed that jointly optimize the neural 3D representation and the camera poses, but they are limited to static sequences.

The method in this paper optimizes the camera poses and models the dynamic objects at the same time.

 

Method

Section 3.1 briefly introduces the background on neural radiance fields and their extensions to camera pose estimation and dynamic scene representation.
Section 3.2 gives an overview of the method.
Section 3.3 discusses the details of camera pose estimation through static radiance field reconstruction.
Section 3.4 describes how the dynamic scene is modeled.
Section 3.5 covers the implementation details.

 

3.1 Preliminaries

NeRF.

Neural radiance fields (NeRF) represent a static 3D scene with an implicit MLP parameterized by $Θ$,

which maps a 3D position $(x, y, z)$ and a viewing direction $(θ, ϕ)$ to the corresponding color $c$ and density $σ$.

 

The pixel color can be computed by applying volume rendering along the ray emitted from the camera origin.

 

- $δ(i)$ : distance between two consecutive sample points along the ray

- $N$ : number of samples along each ray

- $T(i)$ : accumulated transmittance
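
As a minimal sketch of how these symbols fit together, assume the standard NeRF quadrature $\hat{C}(r) = \sum_{i=1}^{N} T(i)\,(1 - \exp(-σ(i)δ(i)))\,c(i)$ with $T(i) = \exp(-\sum_{j<i} σ(j)δ(j))$. The function name `render_ray` and the toy values are illustrative only, not taken from the paper's code.

```python
import numpy as np

def render_ray(sigma, color, delta):
    """Composite N samples along one ray.

    sigma: (N,) densities, color: (N, 3) RGB, delta: (N,) sample spacings.
    """
    alpha = 1.0 - np.exp(-sigma * delta)           # opacity of each segment
    # T(i) = exp(-sum_{j<i} sigma(j) * delta(j)): light surviving up to sample i
    accum = np.concatenate([[0.0], np.cumsum(sigma * delta)[:-1]])
    T = np.exp(-accum)
    weights = T * alpha                            # contribution of each sample
    return weights @ color, weights

# Toy ray: empty space, then a nearly opaque red sample, then a green one behind it.
sigma = np.array([0.0, 50.0, 50.0])
color = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
delta = np.full(3, 1.0)
pixel, weights = render_ray(sigma, color, delta)
```

In this toy ray the red sample absorbs almost all of the light, so the green sample behind it contributes next to nothing to `pixel`.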

 

The radiance field can be optimized by minimizing the reconstruction error between the rendered color $\hat{C}$ and the ground-truth color $C$.

 

Explicit neural voxel radiance fields.

Although their rendering quality is excellent, NeRF-based methods model the scene with an implicit representation such as an MLP for the sake of storage efficiency.

The drawback of these methods is that training is very slow.

To overcome this, recent methods propose modeling the radiance field with explicit voxels.

They replace the mapping function with a voxel grid and directly optimize the features sampled from the voxels.

A shallow MLP is applied to handle view-dependent effects. Because the MLP does much less of the work, training time drops to a few hours.

This work also adopts an explicit representation.
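
To make "features sampled from the voxels" concrete, here is a minimal sketch of trilinear interpolation on a dense feature grid. The grid layout `(X, Y, Z, F)`, the function name `sample_voxel_features`, and the use of voxel-index coordinates are assumptions for illustration; actual implementations usually rely on a framework's built-in grid sampling.

```python
import numpy as np

def sample_voxel_features(grid, pts):
    """Trilinearly interpolate features from a dense grid.

    grid: (X, Y, Z, F) learnable features; pts: (P, 3) in voxel-index units.
    Each grid dimension must be at least 2.
    """
    dims = np.array(grid.shape[:3])
    pts = np.clip(pts, 0, dims - 1)
    lo = np.minimum(np.floor(pts).astype(int), dims - 2)   # lower corner index
    w = pts - lo                                           # fractional offsets in [0, 1]
    out = np.zeros((len(pts), grid.shape[3]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                corner = grid[lo[:, 0] + dx, lo[:, 1] + dy, lo[:, 2] + dz]
                weight = (np.where(dx, w[:, 0], 1 - w[:, 0])
                          * np.where(dy, w[:, 1], 1 - w[:, 1])
                          * np.where(dz, w[:, 2], 1 - w[:, 2]))
                out += weight[:, None] * corner
    return out

grid = np.zeros((3, 3, 3, 1))
grid[..., 0] = np.arange(3)[:, None, None]   # feature equals the x index
feats = sample_voxel_features(grid, np.array([[0.5, 1.0, 1.0], [2.0, 2.0, 2.0]]))
```

Because the interpolated value is a weighted sum of the eight surrounding grid entries, gradients of a rendering loss flow directly into those entries, which is what lets the features be optimized without a deep MLP.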

 

 

3.2 Method Overview

Given an input video sequence of N frames, the method jointly optimizes the camera poses, the focal length, and the static and dynamic radiance fields.

The static and dynamic parts are represented by explicit neural voxels $V^s$ and $V^d$.

 

Static radiance field

The static radiance field reconstructs the static scene and is responsible for estimating the camera poses and the focal length.

  1. It takes both the sampled coordinates $(x, y, z)$ and the viewing direction $d$ as input and predicts the density $σ^s$ and color $c^s$.
  2. The density of the static part is invariant to time and viewing direction, so the sum of the queried features is used as the density (instead of an MLP).
  3. The loss is computed only over static regions, and the resulting gradients are backpropagated not only to the static voxel field and the MLP but also to the camera parameters.

Dynamic radiance field

The dynamic radiance field models the scene dynamics of the video (typically caused by moving objects).

  1. It uses the sampled coordinates $(x, y, z)$ and the time $t$ to obtain the deformed coordinates $(x', y', z')$ in the canonical space.
  2. It queries features from the dynamic voxel field at the deformed coordinates and passes them, together with the time index, to a time-dependent shallow MLP to obtain the color $c^d$, density $σ^d$, and nonrigidity $m^d$ of the dynamic part.

Finally, after volume rendering, the RGB images $\hat{C}^{\{s,d\}}$ and depth maps $\hat{D}^{\{s,d\}}$ are obtained from the static and dynamic parts, along with a nonrigidity mask $\hat{M}^d$. The per-frame reconstruction loss is then computed. Only per-frame losses are included at this point.

 

Canonical space: the frame chosen as the representative pose (e.g., t = 0)

 

 

3.3 Camera Pose Estimation

Motion mask generation.

Excluding the dynamic regions of the video makes camera pose estimation easier.

Existing methods often use instance segmentation such as Mask R-CNN to mask out common moving objects.

However, content such as flowing water or swaying trees is hard to detect or segment.

Therefore, in addition to the Mask R-CNN masks, the fundamental matrix is estimated from the optical flow between consecutive frames.

The Sampson distance (the distance from each pixel to its estimated epipolar line) is then computed and thresholded to obtain a binary motion mask. Finally, the Mask R-CNN result and the thresholded epipolar distance are combined to obtain the final motion mask.
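
The masking step can be sketched as follows, assuming the fundamental matrix `F` has already been estimated from the flow. The threshold value, the function names, and the choice to combine the two masks with a logical OR are assumptions for illustration; the text above only says the two results are combined. The quantity computed is the usual squared first-order (Sampson) approximation of the epipolar error.

```python
import numpy as np

def sampson_distance(F, x1, x2):
    """First-order distance of correspondences to the epipolar geometry F.

    F: (3, 3) fundamental matrix; x1, x2: (P, 2) pixel coordinates in frames 1 and 2.
    """
    ones = np.ones((len(x1), 1))
    h1 = np.hstack([x1, ones])                 # homogeneous coordinates
    h2 = np.hstack([x2, ones])
    Fx1 = h1 @ F.T                             # epipolar lines in frame 2
    Ftx2 = h2 @ F                              # epipolar lines in frame 1
    num = np.sum(h2 * Fx1, axis=1) ** 2        # (x2^T F x1)^2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / np.maximum(den, 1e-12)

def motion_mask(F, x1, flow, instance_mask, thresh=1.0):
    """Union of the thresholded Sampson distance and a segmentation mask."""
    epipolar_mask = sampson_distance(F, x1, x1 + flow) > thresh
    return epipolar_mask | instance_mask

# Sideways-translating camera with identity intrinsics: static points keep their row.
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
x1 = np.array([[10.0, 20.0], [30.0, 40.0]])
flow = np.array([[5.0, 0.0], [5.0, 3.0]])      # second point also moves vertically
mask = motion_mask(F, x1, flow, instance_mask=np.array([False, False]))
```

In this toy setup only the second point violates the epipolar constraint, so it alone is flagged as moving. Pixels that a segmentation network missed (water, foliage) can be caught the same way as long as their flow is inconsistent with the camera motion.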

 

Coarse-to-fine static scene reconstruction.

First, the static radiance field is reconstructed together with the camera poses.

The 6D camera poses $[R|t]$ and the focal length $f$, which is shared by all input frames, are optimized jointly.

As in existing pose estimation methods, the static scene representation is optimized in a coarse-to-fine manner.

This strategy is essential for camera pose estimation because it makes the energy surface smoother.

As a result, the optimizer is less likely to get stuck in a sub-optimal solution (Fig. 4(a) vs. Fig. 4(d)).
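
One way to picture a coarse-to-fine schedule for a voxel representation is to start from a low-resolution grid and upsample it between optimization stages. The sketch below assumes exactly that; the starting resolution, the number of stages, the nearest-neighbour upsampling, and both function names are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def upsample_grid(grid, factor=2):
    """Nearest-neighbour upsampling of an (X, Y, Z, F) feature grid."""
    for axis in range(3):
        grid = np.repeat(grid, factor, axis=axis)
    return grid

def resolution_schedule(start=4, stages=3, factor=2):
    """Grid resolutions visited from coarse to fine."""
    return [start * factor**k for k in range(stages + 1)]

schedule = resolution_schedule()
grid = np.random.default_rng(0).normal(size=(schedule[0],) * 3 + (4,))
coarse = grid.copy()
for res in schedule[1:]:
    # ... optimize poses and voxel features at the current resolution here ...
    grid = upsample_grid(grid)
```

A coarse grid can only represent low-frequency structure, so early pose updates are driven by the overall layout of the scene rather than by fine texture, which is the intuition behind the smoother energy surface.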

 

Late viewing direction conditioning.

Because the primary supervision is the photometric consistency loss, the optimization can bypass the neural voxels and directly learn a mapping from the viewing direction to the output sample color.

The viewing direction is therefore fused only in the last layer of the color MLP. This design choice matters because the method reconstructs not only the scene geometry but also the camera poses.

Without late viewing direction conditioning, minimizing the photometric loss by optimizing the MLP can lead to erroneous camera pose and geometry estimates (Fig. 4(c)).
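
A minimal sketch of late conditioning, under the assumption of a tiny three-layer color MLP: the voxel features pass through the early layers alone, and the viewing direction is concatenated only before the output layer. Layer sizes, activations, and all names here are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_color_mlp(feat_dim=8, hidden=16, dir_dim=3):
    """Random weights for a tiny color MLP (sizes are illustrative)."""
    return {
        "W1": rng.normal(scale=0.1, size=(feat_dim, hidden)),
        "W2": rng.normal(scale=0.1, size=(hidden, hidden)),
        # The last layer is the only one that sees the viewing direction.
        "W_out": rng.normal(scale=0.1, size=(hidden + dir_dim, 3)),
    }

def color_mlp(params, voxel_feat, view_dir):
    h = np.maximum(voxel_feat @ params["W1"], 0.0)       # ReLU, features only
    h = np.maximum(h @ params["W2"], 0.0)                # ReLU, features only
    h = np.concatenate([h, view_dir], axis=-1)           # late fusion
    return 1.0 / (1.0 + np.exp(-(h @ params["W_out"])))  # sigmoid -> RGB in (0, 1)

params = init_color_mlp()
rgb = color_mlp(params, rng.normal(size=(5, 8)), rng.normal(size=(5, 3)))
```

With this layout the viewing direction can only modulate the color through a single linear layer, so most of the capacity has to go through the voxel features, and the network cannot explain the images from direction alone.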

 

Losses.

Training Loss

 

(a) A motion mask is used to exclude dynamic regions from the loss computation.

(b) A scene flow MLP models the 3D motion of the volume-rendered 3D points.

  • Reprojection loss : projects the volume-rendered 3D points onto neighboring frames and encourages them to agree with the precomputed flow
  • Disparity loss : enforces similar $z$ values for the volume-rendered 3D points of two corresponding points in neighboring frames
  • Monocular depth loss : computes a scale- and shift-invariant loss between the volume-rendered depth and the precomputed MiDaS depth
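
To illustrate what "scale- and shift-invariant" means for the monocular depth loss, the sketch below aligns the rendered depth to the reference with a least-squares scale and shift before measuring the error. This is one common variant, chosen here as an assumption; the exact normalization used in the paper is not given in this summary, and the function name is hypothetical.

```python
import numpy as np

def scale_shift_invariant_loss(pred, ref):
    """Mean absolute error after aligning pred to ref with a scale and shift.

    pred: rendered depth, ref: monocular depth prior; both flat arrays.
    """
    A = np.stack([pred, np.ones_like(pred)], axis=1)      # columns: depth, 1
    (scale, shift), *_ = np.linalg.lstsq(A, ref, rcond=None)
    return np.mean(np.abs(scale * pred + shift - ref))

ref = np.linspace(1.0, 5.0, 20)
aligned_loss = scale_shift_invariant_loss(0.5 * ref - 2.0, ref)   # same shape, different units
warped_loss = scale_shift_invariant_loss(ref ** 2, ref)           # genuinely different shape
```

A monocular depth network predicts depth only up to an unknown scale and offset, so a loss of this kind penalizes the rendered depth for having the wrong shape but not for being in different units.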

 

In the static regions, the photometric loss between the prediction $\hat{C}^s(r)$ and the captured image $C(r)$ is minimized.

  • M : motion mask
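
A minimal sketch of the masked photometric loss, assuming it takes the form $\lVert (\hat{C}^s(r) - C(r)) \cdot (1 - M(r)) \rVert_2^2$ with $M = 1$ on dynamic pixels. The function name and the toy pixels are illustrative.

```python
import numpy as np

def static_photometric_loss(pred_rgb, gt_rgb, motion_mask):
    """Squared color error over static pixels only.

    pred_rgb, gt_rgb: (P, 3); motion_mask: (P,) with 1 for dynamic pixels.
    """
    static = 1.0 - motion_mask.astype(float)
    residual = (pred_rgb - gt_rgb) * static[:, None]      # zero out dynamic pixels
    return np.sum(residual ** 2)

pred = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
gt = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
masked = static_photometric_loss(pred, gt, np.array([0, 1]))     # wrong pixel is dynamic
unmasked = static_photometric_loss(pred, gt, np.array([0, 0]))   # wrong pixel counts
```

Masking the residual rather than the prediction means dynamic pixels contribute zero gradient, so moving content cannot pull the camera parameters toward a wrong pose.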

To handle complex camera trajectories, additional auxiliary losses are introduced to regularize the training.

 

 

Conclusions

We present robust dynamic radiance fields for space-time synthesis of casually captured monocular videos, without requiring camera poses as input.
With the proposed model designs, we show that our approach can reconstruct accurate dynamic radiance fields from a wide range of challenging videos.
We validate the effectiveness of the proposed method through extensive quantitative and qualitative comparisons with the state of the art.

 
