Articles
879
Tags
25
Categories
16

Home
Content
  • Paper
  • LLMs
  • Jupyter
  • Algorithm
  • PLs
Daily
  • Github
  • HotNews
  • HF
  • Arxiv
Archives
Categories
About
37.2° Blog
ArXiv Domain 2026-01-18
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching. Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all steps within a trajectory. This coarse-grained credit assignment fails to ...
ArXiv Domain 2026-01-19
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching. Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all steps within a trajectory. This coarse-grained credit assignment fails to ...
ArXiv Domain 2026-01-20
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. How Long Is a Piece of String? A Brief Empirical Analysis of Tokenizers. Frontier LLMs are increasingly utilised across academia, society and industry. A commonly used unit for comparing models, their inputs and outputs, and estimating inference pricing is the token. In general, tokens are used as a stable currency, assumed to be broadly consistent across tokenizers and contexts, enabling direct comparisons. However, tokenization varies significantly across ...
ArXiv Domain 2026-01-21
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. How Long Is a Piece of String? A Brief Empirical Analysis of Tokenizers. Frontier LLMs are increasingly utilised across academia, society and industry. A commonly used unit for comparing models, their inputs and outputs, and estimating inference pricing is the token. In general, tokens are used as a stable currency, assumed to be broadly consistent across tokenizers and contexts, enabling direct comparisons. However, tokenization varies significantly across ...
ArXiv Domain 2026-01-22
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. VideoMaMa: Mask-Guided Video Matting via Generative Prior. Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. To address this, we present Video Mask-to-Matte Model (VideoMaMa) that converts coarse segmentation masks into pixel accurate alpha mattes, by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even though ...
ArXiv Domain 2026-01-23
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. Iterative Refinement Improves Compositional Image Generation. Text-to-image (T2I) models have achieved remarkable progress, yet they continue to struggle with complex prompts that require simultaneously handling multiple objects, relations, and attributes. Existing inference-time strategies, such as parallel sampling with verifiers or simply increasing denoising steps, can improve prompt alignment but remain inadequate for richly compositional settings where ...
HuggingFace Papers 2025-08-09
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers. 1. On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for the Large Language Model (LLM), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generaliza ...
HuggingFace Papers 2025-08-21
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers. 1. Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL. Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built upon manual prompt/workflow engineering with sophisticated agent frameworks, making them computatio ...
HuggingFace Papers 2025-08-10
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers. 1. On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for the Large Language Model (LLM), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generaliza ...
Firefly
A firefly flying freely in the AI domain.
Announcement
Welcome to my personal blog!
If this site does not load, please visit the Gitee mirror.
Recent Post
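Retrieval-Augmented LLMs, 2024-01-13
LLMs Open Course - 6. Large Models for Text Understanding and Generation, 2024-01-10
LLMs Open Course - 5. Efficient Training & Model Compression, 2024-01-07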
Categories
  • AI (383)
  • Cython (1)
  • DSA (24)
  • GitHub (208)
  • HotNews (57)
Tags
DSA, RL, Transformer, LLMs, PaperReading, DeepLearning, CV, GPT, PL, domain, github, hot_news, hf, ArXiv, Domain, AI, GitHub, Trending, 微博热搜 (Weibo Hot Search), HotNews, HuggingFace, Papers, leetcode, algo
Archives
  • January 2024 (5)
  • December 2023 (14)
  • November 2023 (26)
  • October 2023 (1)
  • September 2023 (4)
Info
Article: 879
Total Count: 43199.7k
©2023 - 2026 By Firefly