ACoT-VLA | arxiv 2026.01.16 | Paper Reading
InternVLA-A1 | arxiv 2026.01.05 | Paper Reading
PointWorld | arxiv 2026.01.07 | Paper Reading

PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation

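This paper proposes a large pre-trained 3D world model that predicts full-scene 3D point flow from a static point cloud and an embodiment-agnostic description of the robot's actions. The authors also build a large-scale dataset for 3D dynamics modeling that covers single-arm, bimanual, whole-body, and mobile manipulation in both real and simulated environments. After pretraining on this diverse data, the model needs only a single RGB-D image captured in the wild, with no additional data or fine-tuning, to carry out a variety of manipulation behaviors on real hardware.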

Learning to Remember: Exploring Multimodal Memory Mechanisms in Long Video Understanding | Reading Group
MemoryVLA | ICLR 2026 | Paper Reading
$\pi_{0.5}$ | arxiv 2025.04.22 | Paper Reading

π0.5: a Vision-Language-Action Model with Open-World Generalization

This paper proposes a foundation VLA model, trained mainly on real-robot data, that is intended to let robots adapt to household settings.

InternVLA-M1 | arxiv 2025.10.15 | Paper Reading
SP-VLA | arxiv 2025.10.03 | Paper Reading

SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration

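This paper applies dynamic scheduling to an existing dual-system architecture, together with dynamic pruning, with the aim of reducing parameters while improving model accuracy.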

OpenVLA-OFT | RSS 2025 | Paper Reading
OpenVLA | CoRL 2024 | Paper Reading