37.2° Blog
HuggingFace Papers 2025-08-23
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Intern-S1: A Scientific Multimodal Foundation Model. In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared to tho ...
HuggingFace Papers 2025-08-24
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Intern-S1: A Scientific Multimodal Foundation Model. In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared to tho ...
HuggingFace Papers 2025-08-25
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Intern-S1: A Scientific Multimodal Foundation Model. In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared to tho ...
HuggingFace Papers 2025-08-26
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs. In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptati ...
HuggingFace Papers 2025-08-27
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency. We introduce InternVL 3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and onlin ...
HuggingFace Papers 2025-08-28
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling. Recent advancements in aligning large language models via reinforcement learning have achieved remarkable gains in solving complex reasoning problems, but at the cost of expensive on-policy rollouts and limited exploration of diverse reasoning paths. In this work, we introduce TreePO, involving a self-guided rollout algorithm that views ...
HuggingFace Papers 2025-08-29
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Beyond Transcription: Mechanistic Interpretability in ASR. Interpretability methods have recently gained significant attention, particularly in the context of large language models, enabling insights into linguistic representations, error detection, and model behaviors such as hallucinations and repetitions. However, these techniques remain underexplored in automatic speech recognition (ASR), despite their potential to advance both the performance and inte ...
HuggingFace Papers 2025-08-30
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning. Recent advancements highlight the importance of GRPO-based reinforcement learning methods and benchmarking in enhancing text-to-image (T2I) generation. However, current methods using pointwise reward models (RM) for scoring generated images are susceptible to reward hacking. We reveal that this happens when minimal score differences between images are amplifie ...
HuggingFace Papers 2025-08-31
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning. Recent advancements highlight the importance of GRPO-based reinforcement learning methods and benchmarking in enhancing text-to-image (T2I) generation. However, current methods using pointwise reward models (RM) for scoring generated images are susceptible to reward hacking. We reveal that this happens when minimal score differences between images are amplifie ...
HuggingFace Papers 2025-09-01
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning. Recent advancements highlight the importance of GRPO-based reinforcement learning methods and benchmarking in enhancing text-to-image (T2I) generation. However, current methods using pointwise reward models (RM) for scoring generated images are susceptible to reward hacking. We reveal that this happens when minimal score differences between images are amplifie ...
HuggingFace Papers 2025-09-02
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning. Multimodal Large Language Models (MLLMs) equipped with step-by-step thinking capabilities have demonstrated remarkable performance on complex reasoning problems. However, this thinking process is redundant for simple problems solvable without complex reasoning. To address this inefficiency, we propose R-4B, an auto-thinking MLLM, which can ad ...
HuggingFace Papers 2025-09-03
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning. Critic-free reinforcement learning methods, particularly group policies, have attracted considerable attention for their efficiency in complex tasks. However, these methods rely heavily on multiple sampling and comparisons within the policy to estimate advantage, which may cause the policy to fall into local optimum and increase computational cost. To address these issues, we propos ...
HuggingFace Papers 2025-09-04
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. The Landscape of Agentic Reinforcement Learning for LLMs: A Survey. The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decis ...
HuggingFace Papers 2025-09-05
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Robix: A Unified Model for Robot Interaction, Reasoning and Planning. We introduce Robix, a unified model that integrates robot reasoning, task planning, and natural language interaction within a single vision-language architecture. Acting as the high-level cognitive layer in a hierarchical robot system, Robix dynamically generates atomic commands for the low-level controller and verbal responses for human interaction, enabling robots to follow complex ins ...
HuggingFace Papers 2025-09-06
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth. We introduce Drivelology, a unique linguistic phenomenon characterised as “nonsense with depth”, utterances that are syntactically coherent yet pragmatically paradoxical, emotionally loaded, or rhetorically subversive. While such expressions may resemble surface-level nonsense, they encode implicit meaning requiring contextual inference, moral reasoning, or emotional interpretation. We f ...
Firefly
A firefly flying freely in the AI domain.
Articles
501
Tags
24
Categories
15
©2023 - 2025 By Firefly