ArXiv Domain 2026-02-21
Data source: ArXiv Domain
LLM Domain Papers
1. Sink-Aware Pruning for Diffusion Language Models
Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics, largely inherited from autoregressive (AR) LLMs, typically preserve attention-sink tokens because AR sinks serve as stable global anchors. We show that this assumption does not hold for DLMs: the attention-sink position exhibits substantially higher variance over the ...
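The claim hinges on how stable the sink position is across denoising steps. Below is a minimal sketch of that measurement, assuming per-step attention maps are available and defining the sink as the key position receiving the most total attention mass; the interface and names are hypothetical, not the paper's code:

```python
import torch

def sink_position_per_step(attn_maps):
    """For each denoising step, locate the attention sink: the key
    position receiving the largest total attention mass, summed over
    heads and query positions.

    attn_maps: list of [num_heads, seq_len, seq_len] attention tensors,
               one per denoising step (hypothetical interface).
    """
    positions = []
    for attn in attn_maps:
        mass = attn.sum(dim=(0, 1))          # attention received per key position
        positions.append(mass.argmax().item())
    return positions

def sink_position_variance(attn_maps):
    """Variance of the sink position over denoising steps. Near-zero
    variance would match the stable AR picture; the abstract reports
    substantially higher variance for DLMs."""
    pos = torch.tensor(sink_position_per_step(attn_maps), dtype=torch.float)
    return pos.var().item()
```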
ArXiv Domain 2026-02-24
Data source: ArXiv Domain
LLM Domain Papers
1. VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning
Large Language Models (LLMs) have made significant progress in reasoning tasks across various domains such as mathematics and coding. However, their performance deteriorates on tasks requiring rich socio-cultural knowledge and diverse local contexts, particularly those involving Indian culture. Existing cultural benchmarks are (i) manually crafted, (ii) contain single-hop questions testing factu ...
ArXiv Domain 2026-02-25
Data source: ArXiv Domain
LLM Domain Papers
1. A Very Big Video Reasoning Suite
Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence in spatiotemporally consistent visual environments that go beyond what text can naturally capture, enabling intuitive reasoning over spatiotemporal structure such as continuity, interaction, and causality. However, systematically studying video reasoning and its scalin ...
ArXiv Domain 2026-02-26
Data source: ArXiv Domain
LLM Domain Papers
1. Language Models use Lookbacks to Track Beliefs
How do language models (LMs) represent characters’ beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilities of LMs. We analyze LMs’ ability to reason about characters’ beliefs using causal mediation and abstraction. We construct a dataset, CausalToM, consisting of simple stories where two characters independently chang ...
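The causal-mediation analysis the abstract mentions can be pictured as activation patching: cache a hidden state from a run on a counterfactual story, splice it into the run on the original story, and measure how the output shifts. A minimal sketch assuming a GPT-2-style module layout and token-aligned stories; the paper's exact interventions and model are not specified here:

```python
import torch

def patch_activation(model, base_ids, counter_ids, layer, position):
    """Activation-patching probe: cache one hidden state from the
    counterfactual run, swap it into the base run at the same layer
    and position, and compare next-token logits. Assumes a
    HuggingFace-style causal LM with a model.transformer.h block list
    (GPT-2 layout; adjust for other architectures). base_ids and
    counter_ids must be token-aligned."""
    cached = {}

    def save_hook(_, __, output):
        cached["h"] = output[0][:, position].detach()

    def swap_hook(_, __, output):
        output[0][:, position] = cached["h"]
        return output

    block = model.transformer.h[layer]
    handle = block.register_forward_hook(save_hook)
    with torch.no_grad():
        model(counter_ids)                       # cache counterfactual state
    handle.remove()

    handle = block.register_forward_hook(swap_hook)
    with torch.no_grad():
        patched = model(base_ids).logits[:, -1]  # patched run
    handle.remove()

    with torch.no_grad():
        base = model(base_ids).logits[:, -1]     # clean baseline
    return (patched - base).abs().max().item()   # effect size of the patch
```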
ArXiv Domain 2026-02-28
Data source: ArXiv Domain
LLM Domain Papers
1. Model Agreement via Anchoring
Numerous lines of work aim to control model disagreement: the extent to which two machine learning models disagree in their predictions. We adopt a simple and standard notion of model disagreement in real-valued prediction problems, namely the expected squared difference in predictions between two models trained on independent samples, without any coordination of the training processes. We would like to be able to drive ...
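The quantity being controlled is concrete enough to estimate directly: with f1 and f2 trained on independent samples, disagreement is E[(f1(x) - f2(x))^2]. A Monte Carlo estimator sketch, where `train_fn` and `sample_data` are hypothetical user-supplied callables, not anything from the paper:

```python
import numpy as np

def disagreement(train_fn, sample_data, xs, trials=20, seed=0):
    """Monte Carlo estimate of E[(f1(x) - f2(x))^2], where f1 and f2
    are trained on independent samples with no coordination.
    train_fn(data) -> predict_fn; sample_data(rng) -> training data;
    xs: evaluation inputs."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        f1 = train_fn(sample_data(rng))   # model on first independent sample
        f2 = train_fn(sample_data(rng))   # model on second independent sample
        total += np.mean((f1(xs) - f2(xs)) ** 2)
    return total / trials
```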
ArXiv Domain 2026-03-04
Data source: ArXiv Domain
LLM Domain Papers
1. Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training
Training on verifiable symbolic data is a promising way to expand the reasoning frontier of language models beyond what standard pre-training corpora provide. Yet existing procedural generators often rely on fixed puzzles or templates and do not deliver the distributional breadth needed at scale. We introduce Reasoning Core, a scalable suite that procedur ...
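To make "procedural generation of verifiable symbolic data" concrete, here is an illustrative generator that emits random arithmetic expressions paired with ground-truth answers; the construction is a stand-in, not one of Reasoning Core's actual task generators:

```python
import random
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def gen_expression(rng, depth=3):
    """Recursively generate a random arithmetic expression along with
    its exact answer, so every sample is verifiable by construction."""
    if depth == 0:
        n = rng.randint(1, 9)
        return str(n), n
    op = rng.choice(list(OPS))
    left_s, left_v = gen_expression(rng, depth - 1)
    right_s, right_v = gen_expression(rng, depth - 1)
    return f"({left_s} {op} {right_s})", OPS[op](left_v, right_v)

def make_dataset(n, seed=0):
    rng = random.Random(seed)
    return [{"question": q, "answer": a}
            for q, a in (gen_expression(rng) for _ in range(n))]
```

Because every example carries an exactly checkable answer, such data can be verified automatically, which is what makes it usable for both pre-training and reward-checked post-training.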
ArXiv Domain 2026-03-06
Data source: ArXiv Domain
LLM Domain Papers
1. A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development
WebGIS development requires rigor, yet agentic AI frequently fails due to five large language model (LLM) limitations: context constraints, cross-session forgetting, stochasticity, instruction failure, and adaptation rigidity. We propose a dual-helix governance framework reframing these challenges as structural governance problems that model capacity alone cannot resolve. We ...
ArXiv Domain 2026-03-07
Data source: ArXiv Domain
LLM Domain Papers
1. RoboPocket: Improve Robot Policies Instantly with Your Phone
Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators blindly collect demonstrations without knowing the underlying policy’s weaknesses, leading to inefficient coverage of critical state distributions. C ...
ArXiv Domain 2026-03-10
Data source: ArXiv Domain
LLM Domain Papers
1. BEVLM: Distilling Semantic Knowledge from LLMs into Bird’s-Eye View Representations
The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understanding abilities, which are essential for handling complex decision-making and long-tail scenarios. However, existing methods typically feed LLMs with tokens from multi-view and multi-frame images independently, leading to redu ...
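One common way to read "distilling semantic knowledge from LLMs into BEV representations" is a feature-alignment objective between the two spaces. The sketch below is an assumption-laden illustration; the pooling, projection, and cosine loss are choices made here, not BEVLM's stated design:

```python
import torch
import torch.nn.functional as F

def bev_distill_loss(bev_feats, llm_embeds, proj):
    """Illustrative feature-distillation loss: project pooled BEV
    features into the LLM embedding space and pull them toward the
    matching LLM semantic embeddings via cosine similarity.

    bev_feats:  [B, C, H, W] bird's-eye-view feature map
    llm_embeds: [B, D] target semantic embeddings from the LLM
    proj:       nn.Module mapping C -> D
    """
    pooled = bev_feats.mean(dim=(2, 3))          # [B, C] global average pool
    student = proj(pooled)                       # [B, D] projected BEV features
    return 1.0 - F.cosine_similarity(student, llm_embeds, dim=-1).mean()
```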
ArXiv Domain 2026-03-11
Data source: ArXiv Domain
LLM Domain Papers
1. Scale Space Diffusion
Diffusion models degrade images through noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a similar hierarchy via low-pass filtering. We formalize this connection and show that highly noisy diffusion states contain no more information than small, downsampled images, raising the question of why they must be processed at full resolution. To address this, we fuse scale spaces ...
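If highly noisy states carry no more information than small images, a natural consequence is a timestep-dependent working resolution: denoise coarsely when noise is high and upsample as the state cleans up. A sketch under that reading, with a linear resolution schedule as an assumption (the paper's fusion of scale spaces is likely more involved):

```python
import torch
import torch.nn.functional as F

def resolution_for_timestep(t, num_steps, full_res, min_res=32):
    """Map a diffusion timestep to a working resolution: the noisier
    the state, the smaller the image it is processed at. The linear
    schedule is illustrative, not the paper's exact rule."""
    frac = 1.0 - t / (num_steps - 1)   # 0 at maximum noise, 1 when clean
    res = int(min_res + frac * (full_res - min_res))
    return max(min_res, min(full_res, res))

def denoise_multiscale(x, denoiser, num_steps, full_res):
    """Run reverse diffusion with the state held at low resolution
    during noisy steps and progressively upsampled toward full_res."""
    for t in reversed(range(num_steps)):
        res = resolution_for_timestep(t, num_steps, full_res)
        if x.shape[-1] != res:
            x = F.interpolate(x, size=(res, res), mode="bilinear",
                              align_corners=False)
        x = denoiser(x, t)   # hypothetical denoiser call signature
    return x
```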
ArXiv Domain 2026-03-15
Data source: ArXiv Domain
LLM Domain Papers
1. The Latent Color Subspace: Emergent Order in High-Dimensional Chaos
Text-to-image generation models have advanced rapidly, yet achieving fine-grained control over generated images remains difficult, largely due to limited understanding of how semantic information is encoded. We develop an interpretation of the color representation in the Variational Autoencoder latent space of FLUX.1 [Dev], revealing a structure reflecting Hue, Saturation, and Lightness. We ...
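A hue structure in the latent space suggests a simple probe: regress per-image latent statistics onto hue, encoded on the circle so that 0.0 and 1.0 map to the same color. The probe below is illustrative and not the paper's analysis; `latents` and `hues` are assumed to be precomputed from encoded images:

```python
import numpy as np

def color_direction(latents, hues):
    """Estimate a 'hue plane' in a VAE latent space by least-squares
    regression of per-image latent means onto circular hue coordinates.

    latents: [N, C] per-image mean over spatial dims of the latent
    hues:    [N] hue values in [0, 1]
    Returns [C, 2]: two unit latent directions spanning the hue plane.
    """
    X = latents - latents.mean(axis=0)
    # Encode hue on the circle so 0.0 and 1.0 are the same color.
    y = np.stack([np.cos(2 * np.pi * hues), np.sin(2 * np.pi * hues)], axis=1)
    y = y - y.mean(axis=0)
    dirs, *_ = np.linalg.lstsq(X, y, rcond=None)   # solve X @ dirs ~= y
    return dirs / np.linalg.norm(dirs, axis=0, keepdims=True)
```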