About me

I am a Researcher at INSAIT, working with Prof. Luc Van Gool and Dr. Danda Paudel. I received my MSc from ETH Zürich, where I conducted 3D vision and graphics research at Disney Research | Studios Zürich (with Prof. Markus Gross) and at VLG (with Prof. Siyu Tang). I obtained my Bachelor’s degree from City University of Hong Kong.

My research lies at the intersection of vision-language modeling, spatial AI, and controllable visual representations. I aim to build models that jointly reason about language and 3D environments, enabling fine-grained, controllable generation and editing of both 2D and 3D scene representations.

Research Interests

  • Vision-Language Models & Multimodal Reasoning
  • Spatial AI and 3D Scene Understanding
  • Controllable 2D / 3D Generation & Editing
  • Neural Rendering and Inverse Rendering

Outside of research, I enjoy rendering, photography, video games, fingerstyle guitar, table tennis, skiing, and hiking.

News

  • 2026.01 🎉 My first-author paper EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark has been accepted to ICLR 2026!
  • 2025.09 🎉 Our paper StateSpaceDiffuser: Bringing Long Context to Diffusion World Models has been accepted to NeurIPS 2025!
  • 2025.04 I joined INSAIT as a Researcher, supervised by Prof. Luc Van Gool and Dr. Danda Paudel!
  • 2024.10 🎉 My first-author paper RISE-SDF: a Relightable Information-Shared Signed Distance Field for Glossy Object Inverse Rendering has been accepted to 3DV 2025!
  • 2023.10 🎉 My first-author paper CoARF: Controllable 3D Artistic Style Transfer for Radiance Fields has been accepted to 3DV 2024!

Selected Publications

EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark · ICLR 2026
The first comprehensive benchmark for egocentric vision understanding in low-light and nighttime conditions, comprising synthetic scenes (EgoNight-Synthetic), aligned day–night pairs (EgoNight-Sofia), and unaligned nighttime footage (EgoNight-Oxford).
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models · NeurIPS 2025
A diffusion world model that overcomes the memory bottleneck by integrating features from a state-space model representing the entire interaction history, enabling long-context world modeling.
RISE-SDF: a Relightable Information-Shared Signed Distance Field for Glossy Object Inverse Rendering · 3DV 2025
An end-to-end relightable neural inverse-rendering system that reconstructs high-quality geometry and material properties. Its core idea is a two-stage approach that better factorizes scene parameters, supporting high-quality relighting of glossy objects.
CoARF: Controllable 3D Artistic Style Transfer for Radiance Fields · 3DV 2024
A novel algorithm for controllable 3D scene stylization that enables style transfer for specified objects, compositional 3D style transfer, and semantic-aware style transfer via segmentation masks and label-dependent losses.
EgoSpot
A mixed-reality system on HoloLens 2 that lets users control the Boston Dynamics Spot robot through egocentric multimodal signals (gaze, gesture, and voice), making robot teleoperation more accessible and intuitive.

Selected Projects

Experience

Academic Services

  • Conference Reviewer · ICML 2026 (Gold Reviewer) · NeurIPS 2026 · CVPR 2026
  • Journal Reviewer · IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)