37.2° Blog

HuggingFace Papers 2026-01-18

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. STEP3-VL-10B Technical Report. We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language sy ...

HuggingFace Papers 2026-01-19

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. STEP3-VL-10B Technical Report. We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language sy ...

HuggingFace Papers 2026-01-20

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Your Group-Relative Advantage Is Biased. Reinforcement Learning from Verifier Rewards (RLVR) has emerged as a widely used approach for post-training large language models on reasoning tasks, with group-based methods such as GRPO and its variants gaining broad adoption. These methods rely on group-relative advantage estimation to avoid learned critics, yet its theoretical properties remain poorly understood. In this work, we uncover a fundamental issue of g ...

HuggingFace Papers 2026-01-21

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development. The evolution of Large Language Models (LLMs) into autonomous agents has expanded the scope of AI coding from localized code generation to complex, repository-level, and execution-driven problem solving. However, current benchmarks predominantly evaluate code logic in static contexts, neglecting the dynamic, full-process requirements of real-world engineering, particularly in backend ...

HuggingFace Papers 2026-01-22

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization. We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal “mother tongue” for physical ...

HuggingFace Papers 2026-01-23

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Agentic Reasoning for Large Language Models. Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in closed-world settings, they struggle in open-ended and dynamic environments. Agentic reasoning marks a paradigm shift by reframing LLMs as autonomous agents that plan, act, and learn through continual interaction. In this survey, ...

HuggingFace Papers 2026-01-24

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience. The development of native computer-use agents (CUA) represents a significant leap in multimodal AI. However, their potential is currently bottlenecked by the constraints of static data scaling. Existing paradigms relying primarily on passive imitation of static datasets struggle to capture the intricate causal dynamics inherent in long-horizon computer tasks. In this work ...

HuggingFace Papers 2026-01-25

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience. The development of native computer-use agents (CUA) represents a significant leap in multimodal AI. However, their potential is currently bottlenecked by the constraints of static data scaling. Existing paradigms relying primarily on passive imitation of static datasets struggle to capture the intricate causal dynamics inherent in long-horizon computer tasks. In this work ...

HuggingFace Papers 2026-01-26

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience. The development of native computer-use agents (CUA) represents a significant leap in multimodal AI. However, their potential is currently bottlenecked by the constraints of static data scaling. Existing paradigms relying primarily on passive imitation of static datasets struggle to capture the intricate causal dynamics inherent in long-horizon computer tasks. In this work ...

HuggingFace Papers 2026-01-27

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. LongCat-Flash-Thinking-2601 Technical Report. We introduce LongCat-Flash-Thinking-2601, a 560-billion-parameter open-source Mixture-of-Experts (MoE) reasoning model with superior agentic reasoning capability. LongCat-Flash-Thinking-2601 achieves state-of-the-art performance among open-source models on a wide range of agentic benchmarks, including agentic search, agentic tool use, and tool-integrated reasoning. Beyond benchmark performance, the model demons ...

HuggingFace Papers 2026-01-28

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs. Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them, which is essential for a wide range of data-centric applications. Driven by (i) rising demands for application-ready data (e.g., for analytics, visualization, decision-making), (ii) increasingly powerful LLM techniques, and (iii) the emergence of i ...

HuggingFace Papers 2026-01-29

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security. The rise of AI agents introduces complex safety and security challenges arising from autonomous tool use and environmental interactions. Current guardrail models lack agentic risk awareness and transparency in risk diagnosis. To introduce an agentic guardrail that covers complex and numerous risky behaviors, we first propose a unified three-dimensional taxonomy that orthogonally c ...

HuggingFace Papers 2026-01-30

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation. Reinforcement Learning with Verifiable Rewards (RLVR) offers a robust mechanism for enhancing mathematical reasoning in large models. However, we identify a systematic lack of emphasis on more challenging questions in existing methods from both algorithmic and data perspectives, despite their importance for refining underdeveloped capabiliti ...

HuggingFace Papers 2026-01-31

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives. Autonomous scientific discovery with large language model (LLM)-based agents has recently made substantial progress, demonstrating the ability to automate end-to-end research workflows. However, existing systems largely rely on runtime-centric execution paradigms, repeatedly reading, summarizing, and reasoning over large volumes of scientific literatur ...

HuggingFace Papers 2026-03-04

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. From Scale to Speed: Adaptive Test-Time Scaling for Image Editing. Image Chain-of-Thought (Image-CoT) is a test-time scaling paradigm that improves image generation by extending inference time. Most Image-CoT methods focus on text-to-image (T2I) generation. Unlike T2I generation, image editing is goal-directed: the solution space is constrained by the source image and instruction. This mismatch causes three challenges when applying Image-CoT to editing: in ...