37.2° Blog

HuggingFace Papers 2026-03-15

Created 2019-06-18 | AI

Data source: HuggingFace Papers Latest Papers

1. Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Humans perceive and understand real-world spaces through a stream of visual observations. Therefore, the ability to streamingly maintain and update spatial evidence from potentially unbounded video streams is essential for spatial intelligence. The core challenge is not simply longer context windows but how spatial information is selected, organized, and retained over time. I ...

HuggingFace Papers 2026-03-16

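Created 2019-06-18 | AI

Data source: HuggingFace Papers Latest Papers

1. Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

LLM Analysis

Q: What problem does this paper try to solve?

The paper addresses the core challenge of Streaming Visual-based Spatial Intelligence: how to let multimodal large language models (MLLMs) effectively process and understand 3D spatial information in long-horizon video streams. Specifically, it targets the following key problems:

1. Continuously maintaining and updating long-horizon spatial information. Existing MLLMs rely mainly on fixed parameters at inference time and struggle with the continuous, unbounded streams of visual observations found in real-world settings (for example, long video streams in robot navigation and autonomous driving). The paper argues that the core challenge is not simply extending the context window, but deciding how spatial evidence is selected, organized, and retained over time, so that the model can build and maintain an understanding of the 3D environment through continuous observation, as humans do.

2. Computational efficiency and memory bottlenecks. The quadratic-complexity dilemma: the attention mechanism of a standard Transformer has quadratic complexity, so directly extending the input sequence to process ...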
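The quadratic attention cost mentioned in this excerpt can be made concrete with a short count. The Python sketch below is illustrative only: it assumes a single full self-attention head with no masking or sparsity, and the function name and token counts are hypothetical, not taken from the paper.

```python
# Minimal sketch: count the pairwise query-key scores that one full
# self-attention head computes for a sequence of n tokens.
# Assumption: no masking, windowing, or sparsity; token counts are illustrative.


def attention_score_count(n_tokens: int) -> int:
    """Each of the n queries is scored against all n keys, giving n * n scores."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7,} tokens -> {attention_score_count(n):>15,} scores")
```

Going from 1,000 to 10,000 tokens multiplies the score count by 100, which is why simply widening the context window scales poorly for unbounded video streams.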

HuggingFace Papers 2026-03-17

Created 2019-06-18 | AI

Data source: HuggingFace Papers Latest Papers

1. LMEB: Long-horizon Memory Embedding Benchmark

Memory embeddings are crucial for memory-augmented systems, such as OpenClaw, but their evaluation is underexplored in current text embedding benchmarks, which narrowly focus on traditional passage retrieval and fail to assess models' ability to handle long-horizon memory retrieval tasks involving fragmented, context-dependent, and temporally distant information. To address this, we introduce the Long-horizon ...

HuggingFace Papers 2026-03-19

Created 2019-06-18 | AI

Data source: HuggingFace Papers Latest Papers

1. InCoder-32B: Code Foundation Model for Industrial Scenarios

Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code ...

HuggingFace Papers 2026-03-20

Created 2019-06-18 | AI

Data source: HuggingFace Papers Latest Papers

1. MetaClaw: Just Talk — An Agent That Meta-Learns and Evolves in the Wild

Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and the necessity of updating capabilities to match shifting task distributions. On platforms like OpenClaw, which handle diverse workloads across 20+ channels, existing ...

HuggingFace Papers 2026-03-22

Created 2019-06-18 | AI

Data source: HuggingFace Papers Latest Papers

1. Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

While Multimodal Large Language Models demonstrate impressive semantic capabilities, they often suffer from spatial blindness, struggling with fine-grained geometric reasoning and physical dynamics. Existing solutions typically rely on explicit 3D modalities or complex geometric scaffolding, which are limited by data scarcity and generalization challenges. In this work, we ...

HuggingFace Papers 2026-03-23

Created 2019-06-18 | AI

Data source: HuggingFace Papers Latest Papers

1. Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

While Multimodal Large Language Models demonstrate impressive semantic capabilities, they often suffer from spatial blindness, struggling with fine-grained geometric reasoning and physical dynamics. Existing solutions typically rely on explicit 3D modalities or complex geometric scaffolding, which are limited by data scarcity and generalization challenges. In this work, we ...

ArXiv Domain 2026-02-05

Created 2019-06-18 | AI

Data source: ArXiv Domain LLM Domain Papers

1. PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning

We develop a continual learning method for pretrained models that requires no access to old-task data, addressing a practical barrier in foundation model adaptation where pretraining distributions are often unavailable. Our key observation is that pretrained networks exhibit substantial geometric redundancy, and that this redundancy can be exploited in two compl ...

ArXiv Domain 2026-02-11

Created 2019-06-18 | AI

Data source: ArXiv Domain LLM Domain Papers

1. Robustness Is a Function, Not a Number: A Factorized Comprehensive Study of OOD Robustness in Vision-Based Driving

Out-of-distribution (OOD) robustness in autonomous driving is often reduced to a single number, hiding what breaks a policy. We decompose environments along five axes: scene (rural/urban), season, weather, time (day/night), and agent mix; and measure performance under controlled $k$-factor perturbations ($k \in \{0,1,2,3\}$). Using closed loop co ...
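The factorized setup in this excerpt (five environment axes, $k$-factor perturbations for $k$ from 0 to 3) implies a combinatorial number of axis combinations. The Python sketch below enumerates them under a loud assumption: that a $k$-factor perturbation changes exactly $k$ of the five axes away from a baseline setting, since the excerpt is truncated before it defines the term. The axis labels and function name are illustrative, not the authors' code.

```python
from itertools import combinations

# The five environment axes named in the excerpt (labels are illustrative).
AXES = ("scene", "season", "weather", "time", "agent_mix")


def k_factor_axis_sets(k: int) -> list[tuple[str, ...]]:
    """Return every set of k axes that could be perturbed together.

    Assumption: a k-factor perturbation changes exactly k axes away from a
    baseline and keeps the remaining axes at their nominal values.
    """
    return list(combinations(AXES, k))


if __name__ == "__main__":
    for k in range(4):  # k in {0, 1, 2, 3}
        print(k, len(k_factor_axis_sets(k)))
```

Under this assumption there are 1, 5, 10, and 10 axis sets for $k = 0, 1, 2, 3$, before counting the distinct values each perturbed axis can take.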

ArXiv Domain 2026-02-27

Created 2019-06-18 | AI

Data source: ArXiv Domain LLM Domain Papers

1. Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

The reliability of multilingual Large Language Model (LLM) evaluation is currently compromised by the inconsistent quality of translated benchmarks. Existing resources often suffer from semantic drift and context loss, which can lead to misleading performance metrics. In this work, we present a fully automated framework designed to address these challenges by e ...

ArXiv Domain 2026-03-03

Created 2019-06-18 | AI

Data source: ArXiv Domain LLM Domain Papers

1. DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science

The fast-growing demand for using Large Language Models (LLMs) to tackle complex multi-step data science tasks creates an emergent need for accurate benchmarking. There are two major gaps in existing benchmarks: (i) the lack of standardized, process-aware evaluation that captures instruction adherence and process fidelity, and (ii) the scarcity of accurately labeled training dat ...

ArXiv Domain 2026-03-08

Created 2019-06-18 | AI

Data source: ArXiv Domain LLM Domain Papers

1. RoboPocket: Improve Robot Policies Instantly with Your Phone

Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators blindly collect demonstrations without knowing the underlying policy's weaknesses, leading to inefficient coverage of critical state distributions. C ...

ArXiv Domain 2026-03-09

Created 2019-06-18 | AI

Data source: ArXiv Domain LLM Domain Papers

1. RoboPocket: Improve Robot Policies Instantly with Your Phone

Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators blindly collect demonstrations without knowing the underlying policy's weaknesses, leading to inefficient coverage of critical state distributions. C ...

ArXiv Domain 2026-03-13

Created 2019-06-18 | AI

Data source: ArXiv Domain LLM Domain Papers

1. COMIC: Agentic Sketch Comedy Generation

We propose a fully automated AI system that produces short comedic videos similar to sketch shows such as Saturday Night Live. Starting with character references, the system employs a population of agents loosely based on real production studio roles, structured to optimize the quality and diversity of ideas and outputs through iterative competition, evaluation, and improvement. A key contribution is the introduction ...

ArXiv Domain 2026-03-22

Created 2019-06-18 | AI

Data source: ArXiv Domain LLM Domain Papers

1. NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world setting ...