Publications
2026
Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models
- Shengli Zhou, Minghang Zheng, Feng Zheng, Yang Liu✉
- Accepted by CVPR 2026: Project Page / Paper (arXiv) / Code (GitHub)
- This paper proposes QuatRoPE, a linearly scalable 3D positional embedding method that computes pairwise spatial relations between objects via quaternion rotations in Transformer attention layers, together with the Isolated Gated RoPE Extension (IGRE), which minimizes QuatRoPE's interference with the LLM's original language RoPE. It also introduces the ASR benchmark for evaluating pure 3D spatial reasoning. Extensive experiments show that the proposed methods consistently improve the 3D spatial reasoning performance of LLMs on multiple 3D VL benchmarks and on the ASR benchmark, outperforming strong baselines.
CAPruner: Conceptual-Adjacent Scene Graph Pruner for Enhancing 3D Spatial Reasoning of Large Language Models
- Shengli Zhou, Xiangchen Wang, Guanhua Chen, Feng Zheng✉
- Accepted by ACL 2026 (Main Conference): Code (GitHub)
- Existing scene graph pruning methods for 3D vision-language tasks often discard task-critical relations, harming spatial reasoning. To address this issue, we propose CAPruner, which combines semantic relevance and spatial proximity to estimate relation importance under a specific task context and is trained without expensive relation-level annotations. Experiments show that it preserves key spatial relations and significantly boosts LLM performance on 3D-VL tasks.
2025
Learn 3D VQA Better with Active Selection and Reannotation
- Shengli Zhou, Yang Liu, Feng Zheng✉
- Accepted by ACM MM 2025: Paper (ACM Digital Library) / Paper (arXiv) / Code (GitHub) / WeChat Official Account
- To address the scarcity of annotations in 3D Visual Question Answering and the negative impact of the improper annotations that inevitably occur, we propose a multi-turn interactive active learning strategy that combines semantic variance-based data selection with interactive oracle reannotation, enhancing answer quality and reducing training costs.
HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision
- Shengli Zhou, Jianuo Zhu, Qilin Huang, Fangjing Wang, Yanfu Zhang, Feng Zheng✉
- Accepted by ICANN 2025: Paper (Springer) / Paper (arXiv) / Code (GitHub)
- 3D VQA models tend to learn superficial shortcuts because of high model complexity and data scarcity. We therefore propose a hierarchical concentration narrowing supervision paradigm that guides the model to perform spatial reasoning along a general pathway and suppresses superficial shortcuts.