37.2° Blog

HuggingFace Papers 2026-05-28

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. ResearchMath-14K: Scaling Research-Level Mathematics via Agents. Abstract: The frontier of mathematics is defined by problems whose solutions are not yet known, yet it remains unclear whether language models can meaningfully engage with such problems without human intervention. A major obstacle is the lack of large-scale research-level math datasets. To this end, we introduce ResearchMath-14k, a set of 14,056 problems curated from academic sources via a ...

HuggingFace Papers 2026-05-30

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security. Abstract: Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities yet introduce broad new safety risk sources. Meanwhile, advanced frontier AI models drastically lower attack barriers, rendering current agent alignment frameworks inadequate for real-world deployment. To tackle these emerging threats, we propose a lightwe ...

HuggingFace Papers 2026-06-01

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis. Abstract: Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents’ ability to track evolving analytical context over long horizons untested. We introduce LongDS, a benchmark for long-horizon, multi-turn data analysis where agents must maintain, update, restore, and compose evolving analytical states. LongD ...

HuggingFace Papers 2026-06-02

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models. Abstract: Spatial intelligence requires visual representations that capture both semantic objects and geometric structure in the physical world. To support this, two major pre-training schemes are now widely used as foundation backbones: Vision-Language Models (VLMs), which use language supervision to align visual observatio ...

HuggingFace Papers 2026-06-04

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching. Abstract: Wide-baseline matching (WBM) requires integrating geometric understanding, viewpoint changes, fine-grained perception, and occlusion reasoning, making it a challenging testbed for spatial reasoning in multimodal large language models (MLLMs) deployed in physical environments. However, current MLLMs lack systematic evaluation and training frameworks for these capabilities. ...

HuggingFace Papers 2026-06-05

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. OPRD: On-Policy Representation Distillation. Abstract: On-policy distillation (OPD) supervises the student only in output space by matching next-token probabilities. This output-only paradigm has two limits: (1) sampling variance from Monte Carlo KL estimates over large vocabularies (e.g., Qwen’s ~150k tokens) persists throughout training, and (2) it treats the teacher as a black box, discarding all intermediate hidden states after the LM head. We propose O ...

HuggingFace Papers 2026-06-06

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution. Abstract: Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA, both of which are costly at repository scale and brittle to evolving codebases. We introduce Code2LoRA, a hypernetwork fram ...

HuggingFace Papers 2026-06-08

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. dots.tts Technical Report. Abstract: We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that models speech in a continuous latent space. Compared with existing continuous autoregressive models, our key innovations are threefold. First, we train an AudioVAE with multiple objectives to build a semantically structured and prediction-friendly continuous speech space. Second, we use full-history conditioning ...

HuggingFace Papers 2026-06-09

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings. Abstract: Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. However, they struggle to function as off-the-shelf embedding models, leading to suboptimal performance on massive text embedding benchmarks. In this paper, we identify a potential cause underlying this deficiency. Our motivation stems from an unexpected observation: text e ...

HuggingFace Papers 2026-06-10

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems. Abstract: Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form communication can rapidly inflate token usage, consume the shared context window, and ultimately affect both system ...

HuggingFace Papers 2026-06-11

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. ICA Lens: Interpreting Language Models Without Training Another Dictionary. Abstract: Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoders (SAEs) have become the standard tool for this purpose, but using them as the default first lens often requires training, storing, and evaluating large overcomplete dictionaries. This bottleneck limits rapid exploration and rai ...

HuggingFace Papers 2026-06-13

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments. Abstract: Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing environments and updated task conditions. To address this gap, we introdu ...

HuggingFace Papers 2026-06-15

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. When is Your LLM Steerable? Abstract: Activation steering offers a lightweight approach to control language models’ behavior at inference time, but whether it succeeds or fails heavily depends on the prompt, concept, model, and steering configuration. Finding the regime and boundaries of successful steering typically requires expensive grid searches and post-hoc evaluation of full autoregressive rollouts. In this work, we investigate whether steerability c ...

HuggingFace Papers 2026-06-19

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance. Abstract: While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we ...

HuggingFace Papers 2026-06-20

Created 2019-06-18 | AI