Urban Radiance Field Representation with Deformable Neural Mesh Primitives

ICCV 2023

1Tongji University, 2The Chinese University of Hong Kong, 3Shanghai AI Laboratory, 4CPII
*Equal contribution

Taking patchy point clouds as input, we first voxelize the points and then initialize a Deformable Neural Mesh Primitive (DNMP) for each voxel. During training, the shape of each DNMP is deformed to model the underlying 3D structure, while its radiance features are learned to encode the local radiance information for neural rendering. Based on this representation, we achieve efficient and photo-realistic rendering of urban scenes.
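The first step of the pipeline above, voxelizing the input point cloud so that each occupied voxel can host one DNMP, can be sketched as follows. This is an illustrative numpy sketch, not the paper's code; the voxel size and the downstream per-voxel primitive initialization are assumptions.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Assign each 3D point to a voxel and return the unique voxel centers.

    In the paper's pipeline, each occupied voxel would then be
    initialized with one DNMP whose shape is deformed during training.
    """
    # Integer voxel index for each point.
    idx = np.floor(points / voxel_size).astype(np.int64)
    # Keep one entry per occupied voxel.
    unique_idx = np.unique(idx, axis=0)
    # Center of each occupied voxel (where a DNMP would be placed).
    centers = (unique_idx + 0.5) * voxel_size
    return centers

# Toy point cloud: three points falling into two voxels of size 1.0.
pts = np.array([[0.1, 0.2, 0.3],
                [0.4, 0.1, 0.2],
                [2.1, 0.2, 0.1]])
centers = voxelize(pts, voxel_size=1.0)
print(centers)  # [[0.5 0.5 0.5]
                #  [2.5 0.5 0.5]]
```

Because the patchy point cloud only seeds the voxel grid, missing points merely leave some voxels empty; the deformation of each DNMP during training compensates for the incomplete geometry.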


Neural Radiance Fields (NeRFs) have achieved great success in the past few years. However, most current methods still require intensive resources due to ray-marching-based rendering. To construct urban-level radiance fields efficiently, we design the Deformable Neural Mesh Primitive (DNMP) and propose to parameterize the entire scene with such primitives. The DNMP is a flexible and compact neural variant of the classic mesh representation, which enjoys both the efficiency of rasterization-based rendering and the powerful neural representation capability for photo-realistic image synthesis. Specifically, a DNMP consists of a set of connected deformable mesh vertices with paired vertex features that parameterize the geometry and radiance information of a local area. To constrain the degrees of freedom for optimization and lower the storage budget, we enforce the shape of each primitive to be decoded from a relatively low-dimensional latent space. The rendering colors are decoded from the vertex features (interpolated via rasterization) by a view-dependent MLP. The DNMP provides a new paradigm for urban-level scene representation with appealing properties: (1) High-quality rendering. Our method achieves leading performance for novel view synthesis in urban scenarios. (2) Low computational costs. Our representation enables fast rendering (2.07 ms/1k pixels) and low peak memory usage (110 MB/1k pixels). We also present a lightweight version that runs 33x faster than vanilla NeRF and is comparable to the highly optimized Instant-NGP (0.61 vs. 0.71 ms/1k pixels).
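The key constraint described above, decoding each primitive's shape from a low-dimensional latent code, can be illustrated with a minimal sketch. The vertex count, latent dimension, and the linear decoder below are all hypothetical stand-ins for the learned shape decoder; they only show how a small latent code controls many vertex positions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a sphere-like primitive with V vertices whose
# deformation is controlled by a D-dimensional latent code (D << 3*V).
V, D = 42, 8

# A frozen random linear map stands in for the learned shape decoder.
W = rng.standard_normal((D, 3 * V)) * 0.01

def decode_shape(latent, center, scale):
    """Decode a DNMP's vertex positions from its low-dimensional latent.

    Constraining the shape to this latent space limits the degrees of
    freedom per primitive and keeps per-voxel storage small (D numbers
    instead of 3*V vertex coordinates).
    """
    offsets = (latent @ W).reshape(V, 3)  # per-vertex offsets
    return center + scale * offsets       # place the primitive in its voxel

z = rng.standard_normal(D)  # per-primitive latent, optimized during training
verts = decode_shape(z, center=np.zeros(3), scale=1.0)
print(verts.shape)  # (42, 3)
```

In the actual method the radiance side works analogously: each vertex also carries a feature vector, and a view-dependent MLP decodes the rasterization-interpolated features into color.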

Method overview


The entire scene is voxelized based on the point-cloud reconstruction, and each voxel is assigned a DNMP to parameterize the geometry and radiance of the local area. Through rasterization, we obtain the interpolated radiance features from the intersected DNMPs for each view ray. These interpolated features, along with view-dependent embeddings, are then fed to an implicit function that predicts the radiance value and opacity of each intersection point. Finally, the rendered color of the view ray is obtained by blending the radiance values according to the opacities.
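The final blending step can be written as standard front-to-back alpha compositing over the mesh intersections along a ray. A minimal sketch, assuming the MLP has already produced a radiance value and an opacity for each of the K intersection points:

```python
import numpy as np

def composite(radiance, opacity):
    """Blend per-intersection radiance values front to back.

    radiance: (K, 3) colors at the K DNMP intersections along a ray.
    opacity:  (K,)   opacities in [0, 1] predicted by the implicit function.
    The weight of point k is alpha_k * prod_{j<k} (1 - alpha_j).
    """
    # Transmittance reaching each intersection (1 for the first one).
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - opacity[:-1])))
    weights = opacity * trans
    return (weights[:, None] * radiance).sum(axis=0)

# Two intersections: a half-transparent red surface in front of an
# opaque blue one.
rad = np.array([[1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]])
alpha = np.array([0.5, 1.0])
color = composite(rad, alpha)
print(color)  # [0.5 0.  0.5]
```

Because the number of mesh intersections per ray is small compared to the dense samples of ray marching, this blending is cheap, which is where much of the method's speed advantage comes from.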


Qualitative comparisons

KITTI-360 dataset


Waymo dataset


Demo videos

Scene geometry optimization

Texture editing

Object duplication

Object removal

Novel semantic synthesis


@InProceedings{Lu_2023_ICCV,
  author    = {Lu, Fan and Xu, Yan and Chen, Guang and Li, Hongsheng and Lin, Kwan-Yee and Jiang, Changjun},
  title     = {Urban Radiance Field Representation with Deformable Neural Mesh Primitives},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2023},
}