37.2° Blog
ArXiv Domain 2025-08-05
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models. Diffusion Large Language Models (DLLMs) are emerging as a powerful alternative to the dominant autoregressive large language models, offering efficient parallel generation and capable global context modeling. However, the practical application of DLLMs is hindered by a critical architectural constraint: the need for a statically predefined generation length. This static length alloc ...
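The static-length constraint is easiest to see in a toy denoising loop. Below is a minimal, self-contained sketch (the dummy vocabulary and names like `denoise_step` are hypothetical, and random choices stand in for a real diffusion LLM's predictions) showing how a fixed canvas forces the model to commit to a generation length before it knows where `<eos>` will land:

```python
import random

VOCAB = ["the", "cat", "sat", "<eos>"]
MASK = "<mask>"

def denoise_step(tokens):
    """Toy stand-in for one denoising step: commit a prediction at one
    masked position. (A real DLLM would score all masked positions with
    a transformer and unmask the most confident ones in parallel.)"""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    if not masked:
        return tokens, True
    i = random.choice(masked)         # pretend this is the highest-confidence slot
    tokens[i] = random.choice(VOCAB)  # pretend this is the model's prediction
    return tokens, False

def generate(length=8, max_steps=100):
    """Fixed-length parallel denoising: the canvas size is chosen up
    front, which is exactly the static-length constraint the paper targets."""
    tokens = [MASK] * length
    for _ in range(max_steps):
        tokens, done = denoise_step(tokens)
        if done:
            break
    # Everything denoised after the first <eos> is wasted computation.
    return tokens[: tokens.index("<eos>") + 1] if "<eos>" in tokens else tokens

print(generate())
```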
ArXiv Domain 2025-08-06
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. Test Set Quality in Multilingual LLM Evaluation. Several multilingual benchmark datasets have been developed semi-automatically in the recent past to measure progress and understand the state of the art in the multilingual capabilities of Large Language Models. However, little attention has been paid to the quality of these datasets themselves, despite previous work identifying errors in even fully human-annotated test sets. I ...
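As a concrete illustration of the kind of dataset errors at stake, here is a small sketch of mechanical quality checks on a multiple-choice test set. The checks (duplicates, empty fields, labels outside the answer space) are illustrative assumptions of mine, not the paper's methodology:

```python
from collections import Counter

def quality_report(examples):
    """Cheap sanity checks that often surface errors in
    semi-automatically built test sets."""
    issues = []
    texts = Counter(ex["question"] for ex in examples)
    for i, ex in enumerate(examples):
        if not ex["question"].strip():
            issues.append((i, "empty question"))
        if texts[ex["question"]] > 1:
            issues.append((i, "duplicate question"))
        if ex["answer"] not in ex["choices"]:
            issues.append((i, "answer not among choices"))
    return issues

test_set = [
    {"question": "2+2?", "choices": ["3", "4"], "answer": "4"},
    {"question": "2+2?", "choices": ["3", "4"], "answer": "5"},  # bad label
]
for idx, problem in quality_report(test_set):
    print(f"example {idx}: {problem}")
```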
ArXiv Domain 2025-08-07
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward. Answer verification is crucial not only for evaluating large language models (LLMs) by matching their unstructured outputs against standard answers; it also serves as the reward model that guides LLM optimization. Most evaluation frameworks rely on regularized matching or employ general LLMs for answer verification, which demands extensive, repetitive customization for regex ...
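The regex-matching baseline the abstract contrasts against looks roughly like the sketch below. The patterns are illustrative, not CompassVerifier itself; the second call shows the brittleness that motivates a learned verifier:

```python
import re

def extract_answer(output: str) -> str | None:
    """Rule-based answer extraction of the kind most evaluation
    frameworks use (illustrative patterns only)."""
    patterns = [
        r"answer is\s*:?\s*([A-D]|-?\d+(?:\.\d+)?)",
        r"\\boxed\{([^}]*)\}",
    ]
    for pat in patterns:
        m = re.search(pat, output, flags=re.IGNORECASE)
        if m:
            return m.group(1).strip()
    return None

def verify(output: str, gold: str) -> bool:
    pred = extract_answer(output)
    return pred is not None and pred.lower() == gold.lower()

print(verify("Reasoning... so the answer is: 42", "42"))  # True
print(verify("I think it's forty-two", "42"))             # False: regex misses it
```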
ArXiv Domain 2025-08-08
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay. The continual learning capability of large language models (LLMs) is crucial for advancing artificial general intelligence. However, continually fine-tuning LLMs across various domains often suffers from catastrophic forgetting, characterized by: 1) significant forgetting of their general capabilities, and 2) sharp performance declines on previously learned tasks. ...
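At the batching level, replay can be sketched as reserving a fraction of each fine-tuning batch for samples drawn from a fixed general pool. This is a generic replay recipe under the assumption that GeRe operates in this spirit; the paper's actual selection of general samples is not reproduced here:

```python
import random

def mixed_batches(domain_data, general_pool, batch_size=8, replay_ratio=0.25):
    """Replay-style batching: each fine-tuning batch reserves some slots
    for 'general' samples so the model keeps rehearsing general skills."""
    n_replay = max(1, int(batch_size * replay_ratio))
    n_new = batch_size - n_replay
    random.shuffle(domain_data)
    for i in range(0, len(domain_data), n_new):
        new = domain_data[i : i + n_new]
        if not new:
            break
        replay = random.sample(general_pool, k=min(n_replay, len(general_pool)))
        yield new + replay

domain = [f"domain-{i}" for i in range(20)]
general = [f"general-{i}" for i in range(100)]
for batch in mixed_batches(domain, general):
    print(batch)
```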
ArXiv Domain 2025-08-09
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations. Large Language Models (LLMs) have begun to demonstrate the ability to persuade humans, yet our understanding of how this dynamic unfolds is limited. Recent work has used linear probes, lightweight tools for analyzing model representations, to study various LLM skills such as the ability to model user sentiment and political perspective. Motivated by this, we ...
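A linear probe in this sense is just a linear classifier fitted on frozen hidden states. The sketch below uses synthetic vectors in place of real LLM activations and a made-up label ("persuasion succeeded") to show the generic recipe, not the paper's setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 64

# Stand-in for hidden states extracted at some layer; in practice these
# come from forward passes of a frozen LLM over real dialogue turns.
hidden_states = rng.normal(size=(500, d_model))
labels = (hidden_states[:, :4].sum(axis=1) > 0).astype(int)  # synthetic signal

# The probe itself: one linear classifier, cheap to train and inspect.
probe = LogisticRegression(max_iter=1000).fit(hidden_states[:400], labels[:400])
print("probe accuracy:", probe.score(hidden_states[400:], labels[400:]))
```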
ArXiv Domain 2025-08-10
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations. Large Language Models (LLMs) have begun to demonstrate the ability to persuade humans, yet our understanding of how this dynamic unfolds is limited. Recent work has used linear probes, lightweight tools for analyzing model representations, to study various LLM skills such as the ability to model user sentiment and political perspective. Motivated by this, we ...
ArXiv Domain 2025-08-11
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations. Large Language Models (LLMs) have begun to demonstrate the ability to persuade humans, yet our understanding of how this dynamic unfolds is limited. Recent work has used linear probes, lightweight tools for analyzing model representations, to study various LLM skills such as the ability to model user sentiment and political perspective. Motivated by this, we ...
ArXiv Domain 2025-08-12
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning. Long-context inference for Large Language Models (LLMs) is heavily limited by high computational demands. While several existing methods optimize attention computation, they still process the full set of hidden states at each layer, limiting overall efficiency. In this work, we propose SlimInfer, a framework that accelerates inference by directly pruning less cri ...
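Dynamic token pruning can be sketched as dropping low-importance positions between layers so that later layers see a shorter sequence. The importance score below is a random placeholder; SlimInfer's actual pruning criterion is not reproduced here:

```python
import numpy as np

def prune_hidden_states(hidden, importance, keep_ratio=0.5):
    """Generic dynamic-pruning sketch: keep only the top-scoring
    fraction of token positions, preserving their original order."""
    seq_len = hidden.shape[0]
    k = max(1, int(seq_len * keep_ratio))
    keep = np.sort(np.argsort(importance)[-k:])  # top-k indices, in order
    return hidden[keep], keep

rng = np.random.default_rng(0)
hidden = rng.normal(size=(16, 8))  # (tokens, d_model)
importance = rng.random(16)        # e.g. attention mass each token receives
pruned, kept_idx = prune_hidden_states(hidden, importance)
print(f"kept {pruned.shape[0]}/{hidden.shape[0]} tokens:", kept_idx)
```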
ArXiv Domain 2025-08-13
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. Jinx: Unlimited LLMs for Probing Alignment Failures. Unlimited, or so-called helpful-only, language models are trained without safety alignment constraints and never refuse user queries. They are widely used by leading AI companies as internal tools for red teaming and alignment evaluation. For example, if a safety-aligned model produces harmful outputs similar to an unlimited model's, this indicates alignment failures that require further attention. Despite th ...
ArXiv Domain 2025-08-14
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. OdysseyBench: Evaluating LLM Agents on Long-Horizon Complex Office Application Workflows. Autonomous agents powered by large language models (LLMs) are increasingly deployed in real-world applications requiring complex, long-horizon workflows. However, existing benchmarks predominantly focus on atomic tasks that are self-contained and independent, failing to capture the long-term contextual dependencies and multi-interaction coordination required in realisti ...
ArXiv Domain 2025-08-15
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression. Transformer-based Large Language Models rely critically on the KV cache to handle extended contexts efficiently during the decode phase. Yet the size of the KV cache grows proportionally with the input length, burdening both memory bandwidth and capacity as decoding progresses. To address this challenge, we present RocketKV, a training-free KV cache compression strategy co ...
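A two-stage scheme of this general shape might pair permanent eviction with per-step sparse selection, as in the sketch below. This is a generic illustration of KV-cache compression, not RocketKV's actual algorithm; the `scores` array is a stand-in for whatever importance signal the method uses:

```python
import numpy as np

def compress_kv(keys, values, scores, evict_ratio=0.5):
    """Stage 1 (coarse): permanently drop the least-attended entries."""
    k = int(len(scores) * (1 - evict_ratio))
    keep = np.sort(np.argsort(scores)[-k:])
    return keys[keep], values[keep]

def sparse_attend(query, keys, values, top_k=4):
    """Stage 2 (fine): each decode step attends only to the top-k
    remaining entries for the current query."""
    sims = keys @ query
    top = np.argsort(sims)[-top_k:]
    weights = np.exp(sims[top] - sims[top].max())
    weights /= weights.sum()
    return weights @ values[top]

rng = np.random.default_rng(0)
keys, values = rng.normal(size=(64, 8)), rng.normal(size=(64, 8))
scores = rng.random(64)  # e.g. historical attention mass per cached token
keys, values = compress_kv(keys, values, scores)
print(sparse_attend(rng.normal(size=8), keys, values).shape)  # (8,)
```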
ArXiv Domain 2025-08-16
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks. Large Language Models (LLMs) have significantly advanced the state of the art in various coding tasks. Beyond directly answering user queries, LLMs can also serve as judges, assessing and comparing the quality of responses generated by other models. Such an evaluation capability is crucial both for benchmarking different LLMs and for improving response quality through response ranking. However, de ...
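The LLM-as-a-judge setup being benchmarked amounts to prompting a model to pick the better of two candidate responses. Below is a minimal pairwise-judging sketch, with a stub in place of a real model call and a prompt template of my own invention, not CodeJudgeBench's protocol:

```python
JUDGE_TEMPLATE = """You are a code-review judge. Given a programming task and two
candidate solutions, answer with exactly "A" or "B" for the better one.

Task:
{task}

Solution A:
{a}

Solution B:
{b}

Better solution:"""

def pairwise_judge(task, sol_a, sol_b, llm):
    """Pairwise LLM-as-a-judge: `llm` is any callable prompt -> completion."""
    verdict = llm(JUDGE_TEMPLATE.format(task=task, a=sol_a, b=sol_b)).strip()
    return sol_a if verdict.upper().startswith("A") else sol_b

fake_llm = lambda prompt: "A"  # stub; replace with a real model call
best = pairwise_judge("reverse a string", "s[::-1]", "''.join(reversed(s))", fake_llm)
print(best)
```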
ArXiv Domain 2025-08-17
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks. Large Language Models (LLMs) have significantly advanced the state of the art in various coding tasks. Beyond directly answering user queries, LLMs can also serve as judges, assessing and comparing the quality of responses generated by other models. Such an evaluation capability is crucial both for benchmarking different LLMs and for improving response quality through response ranking. However, de ...
ArXiv Domain 2025-08-18
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks. Large Language Models (LLMs) have significantly advanced the state of the art in various coding tasks. Beyond directly answering user queries, LLMs can also serve as judges, assessing and comparing the quality of responses generated by other models. Such an evaluation capability is crucial both for benchmarking different LLMs and for improving response quality through response ranking. However, de ...
ArXiv Domain 2025-08-19
Created: 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers:
1. Controlling Multimodal LLMs via Reward-guided Decoding. As Multimodal Large Language Models (MLLMs) gain widespread applicability, it is becoming increasingly desirable to adapt them to diverse user needs. In this paper, we study the adaptation of MLLMs through controlled decoding. To achieve this, we introduce the first method for reward-guided decoding of MLLMs and demonstrate its application in improving their visual grounding. Our method involves buildi ...
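Reward-guided decoding, in its generic form, rescores candidate continuations by combining the language model's log-probability with an external reward. The sketch below is that generic form, not the paper's specific construction; the grounding reward is a hypothetical stand-in:

```python
import math

def reward_guided_step(candidates, lm_logprobs, reward_fn, alpha=1.0):
    """One step of reward-guided decoding: rescore each candidate by
    LM log-probability plus a weighted reward, then pick the argmax."""
    scored = [(lp + alpha * reward_fn(c), c) for c, lp in zip(candidates, lm_logprobs)]
    return max(scored)[1]

# Hypothetical grounding reward: prefer continuations mentioning the target object.
reward = lambda text: 1.0 if "cat" in text else 0.0
candidates = ["a cat on the mat", "a dog on the mat"]
lm_logprobs = [math.log(0.3), math.log(0.7)]  # the LM alone prefers "dog"
print(reward_guided_step(candidates, lm_logprobs, reward))  # reward flips it to "cat"
```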