HuggingFace Papers 2025-09-28
Data source: HuggingFace Papers
Latest Papers

1. VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Policy-based reinforcement learning currently plays an important role in improving LLMs on mathematical reasoning tasks. However, existing rollout-based reinforcement learning methods (GRPO, DAPO, GSPO, etc.) fail to explicitly consider LLMs’ learning ability for samples of different difficulty levels, which is contrary to the human cognitive process of mathematical reasoning ...
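The abstract is truncated here, but the title suggests using the variance of rollout rewards as a per-sample difficulty signal: prompts the model sometimes solves and sometimes fails are the most informative to train on. A minimal sketch of that idea, assuming binary correctness rewards per rollout (the function names and the top-k selection rule are illustrative, not necessarily the paper's exact curriculum):

```python
import statistics

def rollout_variance(rewards):
    """Population variance of rollout rewards (0/1 correctness) for one prompt."""
    return statistics.pvariance(rewards)

def select_curriculum(batch, k):
    """Pick the k prompts whose rollout-reward variance is highest.

    Variance 0 means the prompt is always solved (too easy) or never
    solved (too hard); high variance marks prompts at the edge of the
    model's current ability.
    """
    ranked = sorted(batch, key=lambda p: rollout_variance(batch[p]), reverse=True)
    return ranked[:k]

batch = {
    "easy":   [1, 1, 1, 1],  # always solved  -> variance 0
    "medium": [1, 0, 1, 0],  # mixed outcomes -> variance 0.25
    "hard":   [0, 0, 0, 0],  # never solved   -> variance 0
}
print(select_curriculum(batch, 1))  # -> ['medium']
```

Under this rule the "medium" prompt is selected, matching the intuition that curriculum learning should focus compute on partially-solvable samples.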
HuggingFace Papers 2025-09-30
Latest Papers

1. SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

In Diffusion Transformer (DiT) models, particularly for video generation, attention latency is a major bottleneck due to the long sequence length and the quadratic complexity. We find that attention weights can be separated into two parts: a small fraction of large weights with high rank and the remaining weights with very low rank. This naturally suggests applying spa ...
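The claimed decomposition, a few large weights kept exactly plus a low-rank remainder, can be illustrated numerically. A toy NumPy sketch, assuming plain softmax attention and using a truncated SVD of the residual as a stand-in for the cheap linear-attention branch (all sizes and the top-k/rank choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 16
Q, K, V = rng.normal(size=(3, n, d))

# Full softmax attention weights (the reference computation).
S = Q @ K.T / np.sqrt(d)
A = np.exp(S - S.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)

# Sparse part: keep only the k largest weights per query row.
k = 8
idx = np.argsort(A, axis=-1)[:, -k:]
mask = np.zeros_like(A, dtype=bool)
np.put_along_axis(mask, idx, True, axis=-1)
A_sparse = np.where(mask, A, 0.0)

# Remainder of small weights: approximate with a rank-r SVD,
# standing in for the linear-attention branch.
R = A - A_sparse
U, s, Vt = np.linalg.svd(R)
r = 4
R_lowrank = (U[:, :r] * s[:r]) @ Vt[:r]

out_full = A @ V
out_mix = (A_sparse + R_lowrank) @ V
rel_err = np.linalg.norm(out_mix - out_full) / np.linalg.norm(out_full)
print(f"relative output error of sparse + low-rank attention: {rel_err:.3f}")
```

Because the truncated SVD is the best rank-r approximation of the residual, adding it can only reduce the weight-matrix approximation error relative to keeping the sparse part alone; SLA's fine-tunable linear branch plays that role at linear cost instead of an explicit SVD.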
HuggingFace Papers 2025-10-01
(Top paper unchanged from 2025-09-30: "SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention".)
HuggingFace Papers 2025-10-02
Latest Papers

1. MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

MCP standardizes how LLMs interact with external systems, forming the foundation for general agents. However, existing MCP benchmarks remain narrow in scope: they focus on read-heavy tasks or tasks with limited interaction depth, and fail to capture the complexity and realism of real-world workflows. To address this gap, we propose MCPMark, a benchmark designed to evaluate MCP use ...
HuggingFace Papers 2025-10-03
Latest Papers

1. DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Although RLVR has become an essential component for developing advanced reasoning skills in LLMs, contemporary studies have documented training plateaus that emerge following thousands of optimization steps, demonstrating notable decreases in performance gains despite increased computational investment. This limitation stems from the sparse ex ...
HuggingFace Papers 2025-10-04
Latest Papers

1. LongCodeZip: Compress Long Context for Code Language Models

Code generation under long contexts is becoming increasingly critical as Large Language Models (LLMs) are required to reason over extensive information in the codebase. While recent advances enable code LLMs to process long inputs, high API costs and generation latency remain substantial bottlenecks. Existing context pruning techniques, such as LLMLingua, achieve promising results for general tex ...
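As a rough illustration of query-aware context pruning under a token budget (not LongCodeZip's actual algorithm; the token-overlap scoring and greedy budget rule below are simplistic stand-ins for the model-based relevance scoring such compressors use):

```python
import re

def tokens(text):
    """Lowercased word tokens of a code chunk or query."""
    return set(re.findall(r"\w+", text.lower()))

def score(chunk, query):
    """Crude relevance: fraction of query tokens appearing in the chunk."""
    q = tokens(query)
    return len(q & tokens(chunk)) / max(len(q), 1)

def compress_context(chunks, query, budget_words):
    """Greedily keep the highest-scoring chunks under a word budget,
    emitting survivors in their original source order."""
    ranked = sorted(range(len(chunks)),
                    key=lambda i: score(chunks[i], query), reverse=True)
    kept, used = [], 0
    for i in ranked:
        n = len(chunks[i].split())
        if used + n <= budget_words:
            kept.append(i)
            used += n
    return "\n".join(chunks[i] for i in sorted(kept))

chunks = [
    "def add(a, b): return a + b",
    "def parse_config(path): return open(path).read()",
    "def send_email(to, body): pass",
]
print(compress_context(chunks, "where is add defined", budget_words=7))
```

With a 7-word budget only the `add` function survives; the irrelevant chunks are pruned, which is the behavior a code-aware compressor should preserve while scoring far more intelligently than token overlap.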
HuggingFace Papers 2025-10-05
(Top paper unchanged from 2025-10-04: "LongCodeZip: Compress Long Context for Code Language Models".)

HuggingFace Papers 2025-10-06
(Top paper unchanged from 2025-10-04.)
HuggingFace Papers 2025-10-07
Latest Papers

1. Apriel-1.5-15b-Thinker

We present Apriel-1.5-15B-Thinker, a 15-billion parameter open-weights multimodal reasoning model that achieves frontier-level performance through training design rather than sheer scale. Starting from Pixtral-12B, we apply a progressive three-stage methodology: (1) depth upscaling to expand reasoning capacity without pretraining from scratch, (2) staged continual pre-training that first develops foundational text and vision underst ...
HuggingFace Papers 2025-10-08
Latest Papers

1. Paper2Video: Automatic Video Generation from Scientific Papers

Academic presentation videos have become an essential medium for research communication, yet producing them remains highly labor-intensive, often requiring hours of slide design, recording, and editing for a short 2-to-10-minute video. Unlike natural video, presentation video generation involves distinctive challenges: inputs from research papers, dense multi-modal information (text, figures, ...
HuggingFace Papers 2025-10-09
Latest Papers

1. Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Multi-LLM systems harness the complementary strengths of diverse Large Language Models, achieving performance and efficiency gains unattainable by a single model. In existing designs, LLMs communicate through text, forcing internal representations to be transformed into output token sequences. This process both loses rich semantic information and incurs token-by-token generation l ...
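A heavily simplified toy of the idea: projecting one model's cache into another's representation space instead of round-tripping through generated text. All shapes, the projection `W_proj`, and the `gate` below are hypothetical stand-ins (a trained system would learn both, and real KV caches are per-head key/value tensors); the paper's actual fusion mechanism may differ:

```python
import numpy as np

rng = np.random.default_rng(1)
seq, d_a, d_b = 10, 32, 48

# Toy stand-ins for one layer's cache of two different LLMs:
# one flat vector per cached token position.
kv_a = rng.normal(size=(seq, d_a))  # "sharer" model's cache
kv_b = rng.normal(size=(seq, d_b))  # "receiver" model's cache

# Hypothetical projection from A's cache space into B's
# (learned in a real system; random here just to show the wiring).
W_proj = rng.normal(size=(d_a, d_b)) / np.sqrt(d_a)

# Fuse: the receiver keeps its own cache and mixes in the projected
# semantics directly, with no token-by-token text generation step.
gate = 0.5  # hypothetical gating coefficient
kv_b_fused = (1.0 - gate) * kv_b + gate * (kv_a @ W_proj)
print(kv_b_fused.shape)  # -> (10, 48)
```

The point of the sketch is the data path: semantics move cache-to-cache through a projection, so the communication cost is one matrix multiply per layer rather than decoding and re-encoding a text message.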
HuggingFace Papers 2025-10-10
(Top paper unchanged from 2025-10-09: "Cache-to-Cache: Direct Semantic Communication Between Large Language Models".)
HuggingFace Papers 2025-10-11
Latest Papers

1. Agent Learning via Early Experience

A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current age ...
HuggingFace Papers 2025-10-12
(Top paper unchanged from 2025-10-11: "Agent Learning via Early Experience".)

HuggingFace Papers 2025-10-13
(Top paper unchanged from 2025-10-11.)