CVPR 2024.
Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi
University of California | University of Texas at Austin | DEVCOM ARL
6 Dec 2023

 

Introduction

Feature 3DGS is the paper that proposes the first feature field distillation technique built on the 3D-GS framework (distillation here means transferring the features of a 2D model into the 3D representation).

The 3DGS framework does not natively support joint learning of semantic features on each Gaussian. (Semantic feature: a feature that distinguishes one object from another.)

 

This paper proposes learning a semantic feature for each 3D Gaussian in addition to its color information, and then enables segmentation by distilling a feature field from a 2D foundation model.

 

Method

High-dimensional Semantic Feature Rendering

The paper introduces a new pipeline for high-dimensional semantic feature rendering and feature field distillation, so that 3D Gaussians can explicitly represent both the radiance field and the feature field.

The method is general and can be made compatible with any 2D foundation model.

To handle different kinds of 2D foundation models, it must be able to render 2D feature maps of arbitrary size and arbitrary feature dimension.

 

1. To this end, following the 3DGS rendering pipeline, the Gaussians are initialized using Structure from Motion.

2. A semantic feature $f$ is added to the existing Gaussian attributes.

3. The value $F_s$ of each pixel of the feature map is computed by alpha-blending the features of the Gaussians that cover that pixel:

$$F_s = \sum_{i \in \mathcal{N}} f_i \, \alpha_i \, T_i, \qquad T_i = \prod_{j=1}^{i-1} (1 - \alpha_j)$$

Here $\mathcal{N}$ is the sorted set of Gaussians overlapping the given pixel, $\alpha_i$ is the opacity contribution of the $i$-th Gaussian, and $T_i$ is the transmittance.
The subscript $s$ in $F_s$ stands for "student", indicating that the rendered feature is supervised per pixel by the "teacher" feature $F_t$.
$F_t$ is the latent embedding obtained by encoding the ground truth image with the encoder of a 2D foundation model.
In essence, this can be viewed as distilling a large 2D teacher model into a small 3D student model, an explicit scene representation, through differentiable volume rendering.

 

4. In the rasterization stage, the RGB image and the feature map are not rasterized independently but optimized jointly. Both use the same tile-based rasterization procedure. With this approach, the feature map is rendered with the same fidelity as the RGB image, so per-pixel accuracy is preserved.
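The steps above can be sketched for a single pixel as follows. This is a minimal pure-Python illustration, not the paper's CUDA rasterizer: the function name `blend_pixel` and the toy inputs are hypothetical, and the per-Gaussian opacities are assumed to be already evaluated at the pixel and sorted front to back. The point it shows is that color and feature share the same blending weights.

```python
def blend_pixel(colors, features, alphas):
    """Front-to-back alpha blending of color and feature for one pixel.

    colors, features: per-Gaussian vectors, sorted near to far.
    alphas: per-Gaussian opacity contribution at this pixel, in [0, 1].
    """
    color = [0.0] * len(colors[0])
    feature = [0.0] * len(features[0])
    transmittance = 1.0  # nothing lies in front of the first Gaussian
    for c, f, a in zip(colors, features, alphas):
        w = a * transmittance  # shared weight: opacity times transmittance
        color = [acc + w * v for acc, v in zip(color, c)]
        feature = [acc + w * v for acc, v in zip(feature, f)]
        transmittance *= 1.0 - a  # light remaining after this Gaussian
    return color, feature


# Toy example: two Gaussians, RGB color and a 2-dimensional feature.
colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
alphas = [0.5, 0.5]
pixel_color, pixel_feature = blend_pixel(colors, features, alphas)
```

Because the weight `w` is computed once and applied to both vectors, a feature of any dimension can be rendered alongside the color without changing the blending logic.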

 

Optimization and Speed-up

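The loss function is a combination of the photometric loss and the feature loss:

$$\mathcal{L} = \mathcal{L}_{rgb} + \gamma \mathcal{L}_f, \qquad \mathcal{L}_{rgb} = (1 - \lambda) \mathcal{L}_1(I, \hat{I}) + \lambda \mathcal{L}_{D\text{-}SSIM}(I, \hat{I}), \qquad \mathcal{L}_f = \lVert F_t(I) - F_s(\hat{I}) \rVert_1$$

Here $F_t(I)$ is the feature map of the ground truth image $I$ obtained from the 2D foundation model, and $F_s(\hat{I})$ is the rendered feature map.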

To guarantee the same resolution $H \times W$ for the per-pixel L1 loss, bilinear interpolation is applied to resize $F_s(\hat{I})$ accordingly. In practice, $\gamma = 1.0$ and $\lambda = 0.2$ are used.
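As a rough sketch of how these weights enter the objective, the snippet below assumes the combined loss has the form L = L_rgb + γ·L_f with L_rgb = (1 − λ)·L1 + λ·D-SSIM. The helper names `l1` and `total_loss` are hypothetical, the mean reduction is an assumption, the D-SSIM term is passed in as a precomputed number, and the feature maps are assumed to be already resized to the same resolution.

```python
def l1(a, b):
    """Mean absolute difference between two equally sized flat lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def total_loss(img_gt, img_pred, feat_teacher, feat_student, d_ssim,
               gamma=1.0, lam=0.2):
    """Photometric loss plus gamma-weighted feature loss.

    d_ssim is assumed to be computed elsewhere on the same image pair.
    """
    loss_rgb = (1.0 - lam) * l1(img_gt, img_pred) + lam * d_ssim
    loss_feat = l1(feat_teacher, feat_student)
    return loss_rgb + gamma * loss_feat


# Toy values: two pixels and a 2-element feature map, flattened.
loss = total_loss(
    img_gt=[0.0, 1.0], img_pred=[0.5, 1.0],
    feat_teacher=[1.0, 0.0], feat_student=[0.5, 0.5],
    d_ssim=0.1,
)
```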

๋ Œ๋”๋ง๋œ feature map Fs(^I)∈R H×W×N๊ณผ teacher feature map Ft(I)∈R H×W×M ์‚ฌ์ด์˜ ์ฐจ์ด๋ฅผ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ์ด์ƒ์ ์œผ๋กœ๋Š” N=M ์œผ๋กœ ์ตœ์†Œํ™”ํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ 2D foundation model์˜ ๋†’์€ latent ์ฐจ์›์œผ๋กœ ์ธํ•ด (LSeg๋Š” M=512M=512, SAM์€ M=256M=256) ์‹ค์ œ๋กœ M์€ ๋งค์šฐ ํฐ ์ˆ˜์ด๋ฏ€๋กœ, ์ด๋Ÿฌํ•œ ๊ณ ์ฐจ์› feature map์„ ์ง์ ‘ ๋ Œ๋”๋งํ•˜๋Š” ๋ฐ ๋งŽ์€ ์‹œ๊ฐ„์ด ์†Œ์š”๋œ๋‹ค.

 

To address this, a speed-up module is introduced at the end of the rasterization process.

The module consists of a lightweight convolutional decoder that upsamples the feature channels with a kernel size of $1 \times 1$.

As a result, $f \in \mathbb{R}^N$ can be initialized with an arbitrary $N \ll M$, and this learnable decoder is used to match the feature channels.

This greatly speeds up optimization without degrading performance on downstream tasks.
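A 1×1 convolution is simply the same linear map applied independently at every pixel, which is why it can lift N rendered channels to M teacher channels cheaply. The sketch below illustrates this in pure Python; the function name `conv1x1`, the single-layer design, and the toy weights are assumptions for illustration, not the paper's decoder.

```python
def conv1x1(feature_map, weight, bias):
    """Apply a 1x1 convolution: one linear map shared by all pixels.

    feature_map: H x W x N nested lists (rendered low-dimensional features).
    weight: M x N matrix, bias: length-M vector (both would be learned).
    Returns an H x W x M nested list.
    """
    return [
        [
            [sum(w * v for w, v in zip(row_w, pixel)) + b
             for row_w, b in zip(weight, bias)]
            for pixel in row
        ]
        for row in feature_map
    ]


# Toy example: H=1, W=2, N=2 rendered channels lifted to M=3 channels.
rendered = [[[1.0, 2.0], [0.0, 1.0]]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.5]
decoded = conv1x1(rendered, weight, bias)
```

The rasterizer only has to blend N values per pixel, and the per-pixel matrix multiply restores the M channels needed for the feature loss.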

 

* Photometric loss
: uses the pixel value difference between the input image and the rendered image as the loss

* D-SSIM is a loss function based on SSIM
: computes the similarity of two images from their luminance (magnitude of pixel values), contrast (variation in pixel values), and structure (correlation based on the pixel value distribution)
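To make the three SSIM components concrete, here is a simplified sketch that computes SSIM over one global window instead of the sliding local windows used in practice. The names `ssim_global` and `d_ssim` are hypothetical, the constants follow the commonly used values 0.01 and 0.03, and D-SSIM is taken here as 1 − SSIM, which is an assumption (some definitions divide by 2).

```python
def ssim_global(x, y, data_range=1.0):
    """Simplified SSIM using a single global window over flat pixel lists."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) * (v - mu_x) for v in x) / n
    var_y = sum((v - mu_y) * (v - mu_y) for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    luminance = (2 * mu_x * mu_y + c1) / (mu_x * mu_x + mu_y * mu_y + c1)
    contrast_structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


def d_ssim(x, y):
    """Dissimilarity derived from SSIM (assumed form: 1 - SSIM)."""
    return 1.0 - ssim_global(x, y)


same = d_ssim([0.1, 0.5, 0.9], [0.1, 0.5, 0.9])
different = d_ssim([0.1, 0.5, 0.9], [0.9, 0.5, 0.1])
```

Identical images give a dissimilarity of zero, while the reversed image has the same mean and variance but negative covariance, so only the structure term penalizes it.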

 

Promptable Explicit Scene Representation

Specifically, the authors consider two foundation models: SAM (Segment Anything Model) and LSeg.

Segment Anything model

SAM can perform promptable or promptless zero-shot segmentation in 2D without any task-specific training.

 

LSeg

LSeg introduces a language-driven approach to zero-shot semantic segmentation.

LSeg uses an image feature encoder (with the DPT architecture) and the CLIP text encoder to extend text-image association to the 2D pixel level.

 

Using the distilled feature field, the paper extends all 2D functionality prompted by points, boxes, or text to the 3D domain.

The promptable explicit scene representation works as follows.

Among the $N$ sorted 3D Gaussians overlapping the target pixel, the activation score of a prompt $\tau$ on a Gaussian $x$ is computed from the cosine similarity between the query $q(\tau)$ in feature space and the semantic feature $f(x)$, followed by a softmax.

By filtering out Gaussians with low scores and updating the color $c(x)$ and opacity $\alpha(x)$, various operations become possible, such as object extraction, object removal, and appearance modification.
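The scoring and filtering described above can be sketched as below. This is only an illustration under stated assumptions: the softmax is taken over a set of candidate prompt queries, the 0.5 threshold is arbitrary, all vectors are assumed non-zero, and the function names and the dictionary layout of a Gaussian are hypothetical. The example implements object removal by setting the opacity of matching Gaussians to zero.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def activation_scores(feature, queries):
    """Softmax over the cosine similarities to each candidate query."""
    sims = [cosine(feature, q) for q in queries]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def remove_object(gaussians, queries, target, threshold=0.5):
    """Return a copy where Gaussians matching the target prompt are hidden."""
    edited = []
    for g in gaussians:
        score = activation_scores(g["feature"], queries)[target]
        alpha = 0.0 if score > threshold else g["alpha"]
        edited.append({**g, "alpha": alpha})
    return edited


queries = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical embeddings of two prompts
gaussians = [
    {"feature": [1.0, 0.0], "alpha": 0.8},
    {"feature": [0.0, 1.0], "alpha": 0.6},
]
edited = remove_object(gaussians, queries, target=0)
```

Object extraction would be the mirror image (zeroing the opacity of Gaussians below the threshold), and appearance edits would modify the color of the selected Gaussians instead of their opacity.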

 

Experiments

Novel view semantic segmentation

Rendering performance on the Replica dataset
Semantic segmentation performance on the Replica dataset

Replica dataset and LLFF dataset

Comparison of novel view semantic segmentation results

 

Segment Anything from Any View

A comparison between (a) the result of applying the SAM encoder-decoder module to the rendered novel view image and
(b) the SAM segmentation result obtained by directly decoding the rendered features

The following compares novel view segmentation results (SAM) with NeRF-DFF.

 

Language-guided Editing

A comparison of language-guided editing results with NeRF-DFF.

 

๋ฐ˜์‘ํ˜•
๊ณต์ง€์‚ฌํ•ญ
์ตœ๊ทผ์— ์˜ฌ๋ผ์˜จ ๊ธ€
์ตœ๊ทผ์— ๋‹ฌ๋ฆฐ ๋Œ“๊ธ€
Total
Today
Yesterday
๋งํฌ
ยซ   2025/04   ยป
์ผ ์›” ํ™” ์ˆ˜ ๋ชฉ ๊ธˆ ํ† 
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30
๊ธ€ ๋ณด๊ด€ํ•จ
๋ฐ˜์‘ํ˜•