Publications
2026
Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models
- Shengli Zhou, Minghang Zheng, Feng Zheng✉, Yang Liu✉
- Accepted by CVPR 2026
- This paper proposes QuatRoPE, a linearly scalable 3D positional embedding method that computes pairwise spatial relations between objects via quaternion rotations in Transformer attention layers, and the Isolated Gated RoPE Extension (IGRE), which minimizes QuatRoPE’s interference with the LLM’s original language RoPE. It also introduces the ASR benchmark for evaluating pure 3D spatial reasoning. Experiments show that the proposed methods consistently improve the 3D spatial reasoning performance of LLMs on multiple 3D VL benchmarks and on the ASR benchmark, outperforming strong baselines.
2025
Learn 3D VQA Better with Active Selection and Reannotation
- Shengli Zhou, Yang Liu, Feng Zheng✉
- Accepted by ACM MM 2025 / Paper (ACM Digital Library) / Paper (arXiv) / Code (GitHub)
- To address the scarcity of annotations in 3D Visual Question Answering and the negative impact of inevitably improper annotations, we propose a multi-turn interactive active learning strategy that combines semantic variance-based data selection with interactive oracle reannotation, improving answer quality and reducing training costs.
HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision
- Shengli Zhou, Jianuo Zhu, Qilin Huang, Fangjing Wang, Yanfu Zhang, Feng Zheng✉
- Accepted by ICANN 2025 / Paper (Springer) / Paper (arXiv) / Code (GitHub)
- 3D VQA models tend to rely on superficial shortcuts because of high model complexity and data scarcity. We therefore propose a hierarchical concentration narrowing supervision paradigm that guides the model to perform spatial reasoning along a general pathway and suppresses superficial shortcuts.