avatar
Articles
301
Tags
24
Categories
15

Home
Content
  • Paper
  • LLMs
  • Jupyter
  • Note
  • Algorithm
  • PLs
Daily
  • Github
  • Weibo
  • HF
  • Arxiv
Archives
Categories
About
37.2° Blog
Search
ArXiv Domain 2025-08-02
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model. AI agents built on large language models (LLMs) hold enormous promise, but current practice focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also suffers from the fundamental limitations of autoregressive LLMs. On the other hand, humans are general agents who reason by mentally simulating the out ...
ArXiv Domain 2025-08-03
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model. AI agents built on large language models (LLMs) hold enormous promise, but current practice focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also suffers from the fundamental limitations of autoregressive LLMs. On the other hand, humans are general agents who reason by mentally simulating the out ...
ArXiv Domain 2025-08-04
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model. AI agents built on large language models (LLMs) hold enormous promise, but current practice focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also suffers from the fundamental limitations of autoregressive LLMs. On the other hand, humans are general agents who reason by mentally simulating the out ...
ArXiv Domain 2025-08-05
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models. Diffusion Large Language Models (DLLMs) are emerging as a powerful alternative to the dominant Autoregressive Large Language Models, offering efficient parallel generation and capable global context modeling. However, the practical application of DLLMs is hindered by a critical architectural constraint: the need for a statically predefined generation length. This static length alloc ...
ArXiv Domain 2025-08-06
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Test Set Quality in Multilingual LLM Evaluation. Several multilingual benchmark datasets have been developed in a semi-automatic manner in the recent past to measure progress and understand the state-of-the-art in the multilingual capabilities of Large Language Models. However, there is not a lot of attention paid to the quality of the datasets themselves, despite the existence of previous work in identifying errors in even fully human-annotated test sets. I ...
ArXiv Domain 2025-07-26
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs. Knowledge distillation can be a cost-effective technique to distill knowledge in Large Language Models, if the teacher output logits can be pre-computed and cached. However, successfully applying this to pre-training remains largely unexplored. In this work, we prove that naive approaches for sparse knowledge distillation such as caching Top-K probabilities, while intuitive, provide biased e ...
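The bias this excerpt alludes to is easy to see in a toy example: keeping only the Top-K teacher probabilities and renormalizing systematically over-weights the head of the distribution. A minimal illustration (toy numbers, not the paper's estimator):

```python
# Toy teacher distribution over a 6-token vocabulary (illustrative values).
teacher = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]

def topk_renormalized(p, k):
    """Naive sparse caching: keep the Top-K probabilities, renormalize."""
    idx = sorted(range(len(p)), key=lambda i: -p[i])[:k]
    z = sum(p[i] for i in idx)
    return {i: p[i] / z for i in idx}

cached = topk_renormalized(teacher, k=3)
# Token 0 is inflated from 0.40 to 0.40 / (0.40 + 0.25 + 0.15) = 0.50,
# so a student trained against the cached targets sees a biased teacher.
```

The point of the sketch is only that renormalized truncation is not an unbiased estimate of the full distribution; the paper's actual sampling scheme is different.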
ArXiv Domain 2025-08-07
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward. Answer verification is crucial not only for evaluating large language models (LLMs) by matching their unstructured outputs against standard answers, but also serves as the reward model to guide LLM optimization. Most evaluation frameworks rely on regularized matching or employ general LLMs for answer verification, which demands extensive, repetitive customization for regex ...
ArXiv Domain 2025-08-08
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay. The continual learning capability of large language models (LLMs) is crucial for advancing artificial general intelligence. However, continually fine-tuning LLMs across various domains often suffers from catastrophic forgetting, characterized by: 1) significant forgetting of their general capabilities, and 2) sharp performance declines in previously learned tasks. ...
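For context, the simplest form of the replay idea the title refers to is mixing a fixed buffer of general-domain samples into every fine-tuning batch so the model keeps seeing its old data distribution. A hedged sketch of that baseline only (not GeRe's actual algorithm; all names and the replay ratio are assumptions):

```python
import random

def mixed_batch(task_samples, general_buffer, batch_size=8, replay_ratio=0.25):
    """Build a fine-tuning batch that reserves a fraction for replayed
    general-domain samples (illustrative anti-forgetting baseline)."""
    n_replay = int(batch_size * replay_ratio)            # e.g. 2 of 8 samples
    batch = random.sample(task_samples, batch_size - n_replay)
    batch += random.sample(general_buffer, n_replay)     # replayed general data
    random.shuffle(batch)
    return batch
```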
ArXiv Domain 2025-08-09
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations. Large Language Models (LLMs) have started to demonstrate the ability to persuade humans, yet our understanding of how this dynamic transpires is limited. Recent work has used linear probes, lightweight tools for analyzing model representations, to study various LLM skills such as the ability to model user sentiment and political perspective. Motivated by this, we ...
ArXiv Domain 2025-08-10
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations. Large Language Models (LLMs) have started to demonstrate the ability to persuade humans, yet our understanding of how this dynamic transpires is limited. Recent work has used linear probes, lightweight tools for analyzing model representations, to study various LLM skills such as the ability to model user sentiment and political perspective. Motivated by this, we ...
ArXiv Domain 2025-08-11
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations. Large Language Models (LLMs) have started to demonstrate the ability to persuade humans, yet our understanding of how this dynamic transpires is limited. Recent work has used linear probes, lightweight tools for analyzing model representations, to study various LLM skills such as the ability to model user sentiment and political perspective. Motivated by this, we ...
ArXiv Domain 2025-08-12
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning. Long-context inference for Large Language Models (LLMs) is heavily limited by high computational demands. While several existing methods optimize attention computation, they still process the full set of hidden states at each layer, limiting overall efficiency. In this work, we propose SlimInfer, an innovative framework that aims to accelerate inference by directly pruning less cri ...
ArXiv Domain 2025-08-13
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Jinx: Unlimited LLMs for Probing Alignment Failures. Unlimited, or so-called helpful-only language models are trained without safety alignment constraints and never refuse user queries. They are widely used by leading AI companies as internal tools for red teaming and alignment evaluation. For example, if a safety-aligned model produces harmful outputs similar to an unlimited model, this indicates alignment failures that require further attention. Despite th ...
ArXiv Domain 2025-08-14
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. OdysseyBench: Evaluating LLM Agents on Long-Horizon Complex Office Application Workflows. Autonomous agents powered by large language models (LLMs) are increasingly deployed in real-world applications requiring complex, long-horizon workflows. However, existing benchmarks predominantly focus on atomic tasks that are self-contained and independent, failing to capture the long-term contextual dependencies and multi-interaction coordination required in realisti ...
ArXiv Domain 2025-08-15
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression. Transformer-based Large Language Models rely critically on the KV cache to efficiently handle extended contexts during the decode phase. Yet, the size of the KV cache grows proportionally with the input length, burdening both memory bandwidth and capacity as decoding progresses. To address this challenge, we present RocketKV, a training-free KV cache compression strategy co ...
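The linear growth this excerpt describes is easy to quantify with the standard KV-cache size formula. A back-of-the-envelope sketch (the model shape below is a generic 7B-style configuration chosen for illustration, not taken from the paper):

```python
def kv_cache_bytes(seq_len, layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Per-sequence KV-cache footprint: 2 tensors (K and V) per layer,
    each of shape [kv_heads, seq_len, head_dim], stored in fp16."""
    return 2 * layers * kv_heads * seq_len * head_dim * dtype_bytes

per_token = kv_cache_bytes(1)            # 131072 bytes, i.e. 128 KiB / token
at_128k = kv_cache_bytes(128_000)        # 16,777,216,000 bytes = 15.625 GiB
```

At these (assumed) dimensions a single 128k-token context already consumes about 15.6 GiB of cache, which is why compression schemes like the one above target the decode phase.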
©2023 - 2025 By Firefly