37.2° Blog

HuggingFace Papers 2026-02-18

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. Experiential Reinforcement Learning

Reinforcement learning has become the central approach for language models (LMs) to learn from environmental reward or feedback. In practice, the environmental feedback is usually sparse and delayed. Learning from such signals is challenging, as LMs must implicitly infer how observed failures should translate into behavioral changes for future iterations. We introduce Experiential Reinforcement Learning (ERL), a trainin ...

HuggingFace Papers 2026-02-19

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

Sparse Autoencoders (SAEs) have emerged as a promising tool for interpreting neural networks by decomposing their activations into sparse sets of human-interpretable features. Recent work has introduced multiple SAE variants and successfully scaled them to frontier models. Despite much excitement, a growing number of negative results in downstream tasks casts doubt on whether SAEs recov ...
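For readers new to SAEs, here is a minimal sketch of the commonly used ReLU sparse autoencoder. It is an assumption for illustration and is not any specific variant from the paper. An activation vector x is encoded into non-negative features f = ReLU(W_enc (x - b_dec) + b_enc) and decoded as x_hat = W_dec f + b_dec. Training minimizes reconstruction error plus an L1 penalty on f. The helper random_sae_weights is hypothetical. It only shows what an untrained, random-weight baseline could look like, and the paper's actual baselines may differ.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode x into sparse features f, then reconstruct x_hat from f.

    W_enc: n_features x d_model, W_dec: d_model x n_features.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    f = relu([a + b for a, b in zip(matvec(W_enc, centered), b_enc)])
    x_hat = [a + b for a, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat


def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 sparsity penalty on f."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(abs(v) for v in f)
    return mse + l1_coeff * l1


def random_sae_weights(d_model, n_features, seed=0):
    """Hypothetical random baseline: Gaussian weights, zero biases, no training."""
    rng = random.Random(seed)
    W_enc = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n_features)]
    W_dec = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(d_model)]
    return W_enc, [0.0] * n_features, W_dec, [0.0] * d_model


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 0.0]
    f, x_hat = sae_forward(x, *random_sae_weights(d_model=4, n_features=8))
    print("active features:", sum(v > 0 for v in f), "loss:", sae_loss(x, f, x_hat))
```

A sanity check in the paper's spirit would compare a trained SAE against such a random baseline on the same metrics. How the paper defines its baselines is not covered by this excerpt.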

HuggingFace Papers 2026-02-20

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. SLA2: Sparse-Linear Attention with Learnable Routing and QAT

Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generation. However, (i) SLA relies on a heuristic split that assigns computations to the sparse or linear branch based on attention-weight magnitude, which can be suboptimal. Additionally, (ii) after formally analyzing the attention error in SLA, we identif ...

HuggingFace Papers 2026-02-21

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning

Many training-free sparse attention methods are effective for accelerating diffusion models. Recently, several works suggest that making sparse attention trainable can further increase sparsity while preserving generation quality. We study three key questions: (1) when do the two common masking rules, i.e., Top-k and Top-p, fail, and how can we avoid t ...
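To make the two masking rules concrete, here is a minimal per-row sketch. It assumes the masks are applied to a single row of softmax attention weights, whereas the paper may work on blocks rather than individual keys. Top-k keeps a fixed number of the highest-weight keys. Top-p keeps the smallest set of highest-weight keys whose cumulative weight reaches p. The toy rows in the comments give some intuition. On a flat row, Top-k with a small k discards most of the attention mass. On a sharply peaked row, Top-p can shrink to a single key. This sketch illustrates only the two rules, not the paper's hybrid scheme or its failure analysis.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def _order_desc(weights):
    # Indices sorted by weight, largest first (stable for ties).
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)


def top_k_mask(weights, k):
    """Keep exactly the k largest weights."""
    keep = set(_order_desc(weights)[:k])
    return [i in keep for i in range(len(weights))]


def top_p_mask(weights, p):
    """Keep the smallest prefix of largest weights whose sum reaches p."""
    keep, cum = set(), 0.0
    for i in _order_desc(weights):
        keep.add(i)
        cum += weights[i]
        if cum >= p:
            break
    return [i in keep for i in range(len(weights))]


def retained_mass(weights, mask):
    """Fraction of attention probability that survives the mask."""
    return sum(w for w, m in zip(weights, mask) if m)


if __name__ == "__main__":
    flat = [0.25, 0.25, 0.25, 0.25]     # Top-k with k=1 keeps only 25% of the mass
    peaked = [0.97, 0.01, 0.01, 0.01]   # Top-p with p=0.9 keeps a single key
    print(retained_mass(flat, top_k_mask(flat, 1)), top_p_mask(peaked, 0.9))
    row = softmax([2.0, 1.9, 1.8, 1.7, 0.0, -1.0])
    print(retained_mass(row, top_k_mask(row, 2)), sum(top_p_mask(row, 0.9)))
```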

HuggingFace Papers 2026-02-24

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Training stability remains a central challenge in reinforcement learning (RL) for large language models (LLMs). Policy staleness, asynchronous training, and mismatches between training and inference engines all cause the behavior policy to diverge from the current policy, risking training collapse. Importance sampling provides a principled correction for this dis ...
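Here is a minimal sketch of the importance-sampling correction mentioned above. It assumes per-token log-probabilities are available for each sampled response, under both the current policy and the behavior policy that generated it. The sequence-level ratio is the product of the per-token ratios, and the code computes it in log space. Because the ratio multiplies across tokens, even a small per-token mismatch compounds over a long response, which is one reason naive correction can be unstable. The cap in truncated_weight is a generic stabilization trick swapped in for illustration. It is not VESPO's variational objective.

```python
import math


def token_ratios(logp_current, logp_behavior):
    """Per-token ratios pi_current / pi_behavior."""
    return [math.exp(a - b) for a, b in zip(logp_current, logp_behavior)]


def sequence_ratio(logp_current, logp_behavior):
    """Product of per-token ratios, summed in log space before exponentiating."""
    return math.exp(sum(a - b for a, b in zip(logp_current, logp_behavior)))


def truncated_weight(w, cap=2.0):
    """Generic truncated importance weight (illustrative, not VESPO)."""
    return min(w, cap)


def is_corrected_mean(rewards, weights):
    """Ordinary importance-sampling estimate of the expected reward under the current policy."""
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


if __name__ == "__main__":
    # A 100-token response where every token is only slightly more likely
    # under the current policy (log-ratio 0.05 per token).
    w = sequence_ratio([-0.95] * 100, [-1.0] * 100)
    print(w, truncated_weight(w))  # the raw weight is exp(5), about 148
```

The example shows the compounding effect. A per-token ratio of about 1.05 becomes a sequence weight of about 148 after 100 tokens, and that weight would dominate any batch estimate unless it is controlled.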

HuggingFace Papers 2026-02-25

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. A Very Big Video Reasoning Suite

Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence in spatiotemporally consistent visual environments that go beyond what text can naturally capture, enabling intuitive reasoning over spatiotemporal structure such as continuity, interaction, and causality. However, systematically studying video reasoning and its scal ...

HuggingFace Papers 2026-02-26

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. On Data Engineering for Scaling LLM Terminal Capabilities

Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contributions: (1) Terminal-Task-Gen, a lightweight synthetic task generation pipeline that supports seed- ...

HuggingFace Papers 2026-02-27

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation

Modeling long sequences of user behaviors has emerged as a critical frontier in generative recommendation. However, existing solutions face a dilemma: linear attention mechanisms achieve efficiency at the cost of retrieval precision due to limited state capacity, while softmax attention suffers from prohibitive computational overhead. To address this challen ...

HuggingFace Papers 2026-03-04

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

Image Chain-of-Thought (Image-CoT) is a test-time scaling paradigm that improves image generation by extending inference time. Most Image-CoT methods focus on text-to-image (T2I) generation. Unlike T2I generation, image editing is goal-directed: the solution space is constrained by the source image and instruction. This mismatch causes three challenges when applying Image-CoT to editing: in ...

HuggingFace Papers 2026-03-05

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. Utonia: Toward One Encoder for All Point Clouds

We dream of a future where point clouds from all domains can come together to shape a single model that benefits them all. Toward this goal, we present Utonia, a first step toward training a single self-supervised point transformer encoder across diverse domains, spanning remote sensing, outdoor LiDAR, indoor RGB-D sequences, object-centric CAD models, and point clouds lifted from RGB-only videos. Despite th ...

HuggingFace Papers 2026-03-10

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Vision Language Model (VLM) development has largely relied on scaling model size, which hinders deployment on compute-constrained mobile and edge devices such as smartphones and robots. In this work, we explore the performance limits of compact (e.g., 2B and 8B) VLMs. We challenge the prevailing practice that state-of-the-art VLMs must rely on vision encoders initialized via ...

HuggingFace Papers 2026-03-11

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. Lost in Stories: Consistency Bugs in Long Story Generation by LLMs

What happens when a storyteller forgets its own story? Large Language Models (LLMs) can now generate narratives spanning tens of thousands of words, but they often fail to maintain consistency throughout. When generating long-form narratives, these models can contradict their own established facts, character traits, and world rules. Existing story generation benchmarks focus mainly on plot ...

HuggingFace Papers 2026-03-12

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing

Leveraging the priors of 2D diffusion models for 3D editing has emerged as a promising paradigm. However, maintaining multi-view consistency in edited results remains challenging, and the extreme scarcity of 3D-consistent editing paired data renders supervised fine-tuning (SFT), the most effective training strategy for editing tasks, infeasible. In this paper, we observe tha ...

HuggingFace Papers 2026-03-13

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. OpenClaw-RL: Train Any Agent Simply by Talking

Every agent interaction generates a next-state signal, namely the user reply, tool output, terminal or GUI state change that follows each action, yet no existing agentic RL system recovers it as a live, online learning source. We present OpenClaw-RL, a framework built on a simple observation: next-state signals are universal, and policy can learn from all of them simultaneously. Personal conversations, termin ...

HuggingFace Papers 2026-03-21

Created 2019-06-18 | AI

Source: HuggingFace Papers (Latest Papers)

1. Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

While Multimodal Large Language Models demonstrate impressive semantic capabilities, they often suffer from spatial blindness, struggling with fine-grained geometric reasoning and physical dynamics. Existing solutions typically rely on explicit 3D modalities or complex geometric scaffolding, which are limited by data scarcity and generalization challenges. In this work, we ...