Publications

2026

Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models

Shengli Zhou, Minghang Zheng, Feng Zheng, Yang Liu✉
Accepted by CVPR 2026 Main Conference: Project Page / Paper (CVF) / Paper (arXiv) / Code / Synced (机器之心)
Presentation: CVPR Poster Page / Poster / Video / Slides
This paper proposes QuatRoPE, a linearly scalable 3D positional embedding method that computes pairwise spatial relations between objects via quaternion rotations in Transformer attention layers. It also proposes the Isolated Gated RoPE Extension (IGRE), which minimizes QuatRoPE's interference with the LLM's original language RoPE, and introduces the ASR benchmark for evaluating pure 3D spatial reasoning. Extensive experiments show that the proposed methods consistently improve the 3D spatial reasoning performance of LLMs on multiple 3D VL benchmarks and on the ASR benchmark, outperforming strong baselines.

CAPruner: Conceptual-Adjacent Scene Graph Pruner for Enhancing 3D Spatial Reasoning of Large Language Models

Shengli Zhou, Xiangchen Wang, Guanhua Chen✉, Feng Zheng✉
Accepted by ACL 2026 Main Conference: Project Page / Paper (ACL Anthology) / Paper (arXiv) / Code
Presentation: ACL Conference Page & Video / Poster / Slides
Existing scene graph pruning methods for 3D vision-language tasks often discard task-critical relations, which harms spatial reasoning. To address this issue, we propose CAPruner, which combines semantic relevance and spatial proximity to estimate the importance of each relation in a specific task context and is trained without expensive relation-level annotations. Experiments show that it preserves key spatial relations and significantly boosts LLM performance on 3D-VL tasks.

Learn 3D VQA Better with Active Selection and Reannotation

Shengli Zhou, Yang Liu, Feng Zheng✉
Accepted by ACM MM 2025: Paper (ACM Digital Library) / Paper (arXiv) / Code / WeChat Official Account
To mitigate the negative impact of unavoidable improper annotations in 3D Visual Question Answering, as well as the scarcity of annotations, we propose a multi-turn interactive active learning strategy that combines semantic variance-based data selection with interactive oracle reannotation, improving answer quality and reducing training costs.

HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision

Shengli Zhou, Jianuo Zhu, Qilin Huang, Fangjing Wang, Yanfu Zhang, Feng Zheng✉
Accepted by ICANN 2025: Paper (Springer) / Paper (arXiv) / Code
3D VQA models tend to rely on superficial shortcuts because of high model complexity and data scarcity. To address this, we propose a hierarchical concentration narrowing supervision paradigm that guides the model to perform spatial reasoning along a general pathway and suppresses superficial shortcuts.