Urban Radiance Field Representation with Deformable Neural Mesh Primitives

Abstract

Neural Radiance Fields (NeRFs) have achieved great success in the past few years. However, most current methods still require intensive resources because they rely on ray-marching-based rendering. To construct urban-level radiance fields efficiently, we design the Deformable Neural Mesh Primitive (DNMP) and propose to parameterize the entire scene with such primitives. The DNMP is a flexible and compact neural variant of the classic mesh representation, which combines the efficiency of rasterization-based rendering with the powerful neural representation capability needed for photo-realistic image synthesis. Specifically, a DNMP consists of a set of connected deformable mesh vertices with paired vertex features that parameterize the geometry and radiance information of a local area. To constrain the degrees of freedom for optimization and lower the storage budget, we enforce the shape of each primitive to be decoded from a relatively low-dimensional latent space. The rendering colors are decoded from the vertex features (interpolated with rasterization) by a view-dependent MLP. The DNMP provides a new paradigm for urban-level scene representation with appealing properties: (1) High-quality rendering. Our method achieves leading performance for novel view synthesis in urban scenarios. (2) Low computational costs. Our representation enables fast rendering (2.07 ms/1k pixels) and low peak memory usage (110 MB/1k pixels). We also present a lightweight version that runs 33x faster than vanilla NeRFs and is comparable to the highly optimized Instant-NGP (0.61 vs. 0.71 ms/1k pixels).
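
To make "vertex features (interpolated with rasterization)" concrete, the following is a minimal sketch of how a per-pixel feature could be obtained from the three vertex features of one mesh triangle using barycentric weights. It is an illustration under stated assumptions, not the authors' implementation: the function names `barycentric_weights` and `interpolate_vertex_features` are hypothetical, the triangle is assumed to be already projected to 2D screen space, and perspective-correct interpolation (which a real rasterizer would typically apply) is omitted.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def barycentric_weights(p: Point2D, a: Point2D, b: Point2D, c: Point2D) -> Tuple[float, float, float]:
    """Barycentric weights of 2D point p with respect to triangle (a, b, c).

    Assumption: the triangle is already projected to screen space.
    """
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    denom = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if denom == 0:
        raise ValueError("degenerate triangle")
    w_a = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / denom
    w_b = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / denom
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate_vertex_features(
    weights: Sequence[float], features: Sequence[Sequence[float]]
) -> List[float]:
    """Blend the per-vertex feature vectors with the given barycentric weights."""
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


# Hypothetical example: a unit right triangle with 2-D features on each vertex.
tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
vertex_features = ([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
w = barycentric_weights((0.25, 0.25), *tri)
pixel_feature = interpolate_vertex_features(w, vertex_features)
```

In the pipeline described in the abstract, a feature of this kind would then be passed, together with a view-direction embedding, to the view-dependent MLP that decodes color; that network is not sketched here.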

Method overview

The entire scene is voxelized based on the point-cloud reconstruction, and each voxel is assigned a DNMP that parameterizes the geometry and radiance of its local area. Through rasterization, we obtain the interpolated radiance features from the DNMPs intersected by each view ray. These interpolated features, along with the view-dependent embeddings, are then sent to an implicit function that predicts the radiance value and opacity of each intersection point. Finally, the rendering color of the view ray is obtained by blending the radiance values according to the opacities.
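
The first and last steps of this overview can be sketched in a few lines. The code below is a hedged illustration rather than the authors' code: `voxelize` and `composite_front_to_back` are hypothetical names, the voxel size is an arbitrary parameter, and the blending rule is assumed to be standard front-to-back alpha compositing over intersections sorted from near to far, since the text states only that radiance values are blended according to the opacities.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxelize(points: Sequence[Point3D], voxel_size: float) -> Dict[VoxelKey, List[Point3D]]:
    """Group reconstructed points by the voxel they fall into.

    Each occupied voxel is where one primitive would be placed
    (assumption: a uniform grid with a single, fixed voxel size).
    """
    voxels: Dict[VoxelKey, List[Point3D]] = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        voxels[key].append((x, y, z))
    return dict(voxels)


def composite_front_to_back(
    colors: Sequence[Sequence[float]], alphas: Sequence[float]
) -> Tuple[List[float], float]:
    """Blend per-intersection RGB values by opacity, nearest intersection first.

    Assumption: standard alpha compositing, where each sample contributes
    alpha_i * prod_{j < i} (1 - alpha_j). Returns the blended color and the
    transmittance remaining behind the last intersection.
    """
    transmittance = 1.0
    out = [0.0, 0.0, 0.0]
    for color, alpha in zip(colors, alphas):
        weight = transmittance * alpha
        out = [o + weight * c for o, c in zip(out, color)]
        transmittance *= 1.0 - alpha
    return out, transmittance


# Hypothetical example: three points, two occupied voxels.
occupied = voxelize([(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (1.5, 0.0, -0.2)], voxel_size=1.0)

# Hypothetical example: a ray hitting a red surface, then a green one.
rgb, remaining = composite_front_to_back([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0.5, 0.5])
```

Because each ray touches only the few primitives it intersects, the loop runs over a handful of samples per pixel rather than the dense samples used in ray marching, which is consistent with the efficiency argument made in the abstract.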

Video

Qualitative comparisons

KITTI-360 dataset

Waymo dataset

Demo videos

Scene geometry optimization

Texture editing

Object duplication

Object removal

Novel semantic synthesis

BibTeX

@article{lu2023dnmp,
  author    = {Lu, Fan and Xu, Yan and Chen, Guang and Li, Hongsheng and Lin, Kwan-Yee and Jiang, Changjun},
  title     = {Urban Radiance Field Representation with Deformable Neural Mesh Primitives},
  journal   = {ICCV},
  year      = {2023},
}