37.2° Blog

HuggingFace Papers 2025-08-12

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models. We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance ...

HuggingFace Papers 2025-08-14

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent. Web agents such as Deep Research have demonstrated superhuman cognitive abilities, capable of solving highly challenging information-seeking problems. However, most research remains primarily text-centric, overlooking visual information in the real world. This makes multimodal Deep Research highly challenging, as such agents require much stronger reasoning abilities in perception, lo ...

HuggingFace Papers 2025-08-11

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for the Large Language Model (LLM), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generaliza ...

HuggingFace Papers 2025-08-16

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning. Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing research primarily focuses on dataset construction and method optimization, often overlooking two critical aspects: comprehensive knowledge-driven design and model-centric data space modeling. In this ...

HuggingFace Papers 2025-08-17

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning. Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing research primarily focuses on dataset construction and method optimization, often overlooking two critical aspects: comprehensive knowledge-driven design and model-centric data space modeling. In this ...

HuggingFace Papers 2025-08-19

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. SSRL: Self-Search Reinforcement Learning. We investigate the potential of large language models (LLMs) to serve as efficient simulators for agentic search tasks in reinforcement learning (RL), thereby reducing dependence on costly interactions with external search engines. To this end, we first quantify the intrinsic search capability of LLMs via structured prompting and repeated sampling, which we term Self-Search. Our results reveal that LLMs exhibit str ...

HuggingFace Papers 2025-08-20

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Ovis2.5 Technical Report. We present Ovis2.5, a successor to Ovis2 designed for native-resolution visual perception and strong multimodal reasoning. Ovis2.5 integrates a native-resolution vision transformer that processes images at their native, variable resolutions, avoiding the degradation from fixed-resolution tiling and preserving both fine detail and global layout, which is crucial for visually dense content like complex charts. To strengthen reasoning, we tr ...

HuggingFace Papers 2025-08-18

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning. Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing research primarily focuses on dataset construction and method optimization, often overlooking two critical aspects: comprehensive knowledge-driven design and model-centric data space modeling. In this ...

HuggingFace Papers 2025-08-22

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization. We present DuPO, a dual learning-based preference optimization framework that generates annotation-free feedback via a generalized duality. DuPO addresses two key limitations: Reinforcement Learning with Verifiable Rewards (RLVR)’s reliance on costly labels and applicability restricted to verifiable tasks, and traditional dual learning’s restriction to strictly dual task pairs ...

HuggingFace Papers 2025-08-24

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Intern-S1: A Scientific Multimodal Foundation Model. In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared to tho ...

HuggingFace Papers 2025-08-25

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Intern-S1: A Scientific Multimodal Foundation Model. In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared to tho ...

HuggingFace Papers 2025-08-26

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs. In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptati ...

HuggingFace Papers 2025-08-27

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency. We introduce InternVL 3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and onlin ...

HuggingFace Papers 2025-08-29

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Beyond Transcription: Mechanistic Interpretability in ASR. Interpretability methods have recently gained significant attention, particularly in the context of large language models, enabling insights into linguistic representations, error detection, and model behaviors such as hallucinations and repetitions. However, these techniques remain underexplored in automatic speech recognition (ASR), despite their potential to advance both the performance and inte ...

HuggingFace Papers 2025-08-28

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling. Recent advancements in aligning large language models via reinforcement learning have achieved remarkable gains in solving complex reasoning problems, but at the cost of expensive on-policy rollouts and limited exploration of diverse reasoning paths. In this work, we introduce TreePO, involving a self-guided rollout algorithm that views ...