avatar
Articles
950
Tags
25
Categories
16

Home
Content
  • Paper
  • LLMs
  • Jupyter
  • Algorithm
  • PLs
Daily
  • Github
  • HotNews
  • HF
  • Arxiv
Archives
Categories
About
37.2° Blog
Search
Home
Content
  • Paper
  • LLMs
  • Jupyter
  • Algorithm
  • PLs
Daily
  • Github
  • HotNews
  • HF
  • Arxiv
Archives
Categories
About
HuggingFace Papers 2025-08-09
Created2019-06-18|AI
数据来源:HuggingFace Papers Latest Papers1. On the Generalization of SFT: A Reinforcement Learning Perspective with Reward RectificationWe present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for the Large Language Model (LLM), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generaliza ...
HuggingFace Papers 2025-08-10
Created2019-06-18|AI
数据来源:HuggingFace Papers Latest Papers1. On the Generalization of SFT: A Reinforcement Learning Perspective with Reward RectificationWe present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for the Large Language Model (LLM), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generaliza ...
HuggingFace Papers 2025-08-21
Created2019-06-18|AI
数据来源:HuggingFace Papers Latest Papers1. Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RLRecent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built upon manual prompt/workflow engineering with sophisticated agent frameworks, making them computatio ...
HuggingFace Papers 2026-02-06
Created2019-06-18|AI
数据来源:HuggingFace Papers Latest Papers1. ERNIE 5.0 Technical ReportIn this report, we introduce ERNIE 5.0, a natively autoregressive foundation model desinged for unified multimodal understanding and generation across text, image, video, and audio. All modalities are trained from scratch under a unified next-group-of-tokens prediction objective, based on an ultra-sparse mixture-of-experts (MoE) architecture with modality-agnostic expert routing. To address practical challenges in large-scale de ...
ArXiv Domain 2026-02-05
Created2019-06-18|AI
数据来源:ArXiv Domain LLM Domain Papers1. PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual LearningWe develop a continual learning method for pretrained models that \emph{requires no access to old-task data}, addressing a practical barrier in foundation model adaptation where pretraining distributions are often unavailable. Our key observation is that pretrained networks exhibit substantial \emph{geometric redundancy}, and that this redundancy can be exploited in two compl ...
1…6364
avatar
Firefly
A firefly flying freely in the AI domain.
Articles
950
Tags
25
Categories
16
Follow Me
Announcement
Welcome to My Personal Blog!
If Not, Please Visit Gitee Mirror.
Recent Post
检索增强LLM2024-01-13
LLMs公开课 - 6.文本理解和生成大模型2024-01-10
LLMs公开课 - 5.高效训练&模型压缩2024-01-07
Categories
  • AI418
  • Cython1
  • DSA24
  • GitHub226
  • HotNews75
Tags
DSARLTransformerLLMsPaperReadingDeepLearningCVGPTPLdomaingithubhfhot_newsArXivDomainAIGitHubTrendingHuggingFacePapers微博热搜HotNewsleetcodealgo
Archives
  • January 20245
  • December 202314
  • November 202326
  • October 20231
  • September 20234
Info
Article :
950
Run time :
Total Count :
48094.1k
UV :
PV :
Last Push :
©2023 - 2026 By Firefly
Search
Loading the Database