HuggingFace Papers 2025-08-21
Data source: HuggingFace Papers

Latest Papers

1. Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built upon manual prompt/workflow engineering with sophisticated agent frameworks, making them computatio ...
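A hedged sketch of the multi-agent distillation idea the title names: trajectories from a hand-built multi-agent pipeline are flattened into prompt/completion pairs so a single model can learn to emit the whole agent chain end to end. All names here (Step, trajectory_to_training_example) are illustrative, not the paper's API.

```python
# Illustrative sketch of multi-agent distillation: flatten one multi-agent
# trajectory into a single supervised training example. Hypothetical names.
from dataclasses import dataclass

@dataclass
class Step:
    role: str      # e.g. "planner", "searcher", "writer"
    content: str   # that agent's output at this step

def trajectory_to_training_example(task: str, steps: list[Step]) -> dict:
    """One prompt/completion pair whose target replays the full agent chain."""
    completion = "\n".join(f"<{s.role}>{s.content}</{s.role}>" for s in steps)
    return {"prompt": task, "completion": completion}

example = trajectory_to_training_example(
    "Survey recent work on agent frameworks.",
    [Step("planner", "Search first, then synthesize."),
     Step("searcher", "Found three relevant surveys ..."),
     Step("writer", "Recent surveys agree that ...")],
)
```

Per the title, the distilled pairs would feed supervised fine-tuning, with agentic RL applied on top of the distilled model.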
ArXiv Domain 2025-07-19
Data source: ArXiv Domain

LLM Domain Papers

1. Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes

Humour, as a complex language form, is derived from myriad aspects of life, whilst existing work on computational humour has focussed almost exclusively on short pun-based jokes. In this work, we investigate whether the ability of Large Language Models (LLMs) to explain humour depends on the particular humour form. We compare models on si ...
ArXiv Domain 2025-07-22
Data source: ArXiv Domain

LLM Domain Papers

1. Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning

Large language models (LLMs) excel across various tasks, but standard first-order (FO) fine-tuning demands considerable memory, significantly limiting real-world deployment. Recently, zeroth-order (ZO) optimization has stood out as a promising memory-efficient training paradigm, avoiding backward passes and relying solely on forward passes for gradient estimation, m ...
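The ZO paradigm described here can be made concrete with the classic two-point SPSA estimator: perturb the weights along a random direction, measure the loss difference with two forward passes, and step along that direction. A generic sketch of the paradigm, not this paper's specific method; the seed trick (regenerating the direction rather than storing it) is what keeps memory flat:

```python
# Minimal two-point zeroth-order (SPSA-style) update: two forward passes,
# no backward pass, no stored gradient or perturbation tensors.
import torch

def zo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6, seed=0):
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Regenerate the same direction z from `seed` instead of storing it.
        gen = torch.Generator(device=params[0].device).manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen, device=p.device)
            p.data.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1); loss_plus = loss_fn(model, batch)   # forward at w + eps*z
        perturb(-2); loss_minus = loss_fn(model, batch)  # forward at w - eps*z
        perturb(+1)                                      # restore w
        g = float(loss_plus - loss_minus) / (2 * eps)    # directional derivative
        gen = torch.Generator(device=params[0].device).manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen, device=p.device)
            p.data.add_(-lr * g * z)                     # SGD-style step along z
```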
ArXiv Domain 2025-07-23
Data source: ArXiv Domain

LLM Domain Papers

1. The Impact of Language Mixing on Bilingual LLM Reasoning

Proficient multilingual speakers often intentionally switch languages in the middle of a conversation. Similarly, recent reasoning-focused bilingual large language models (LLMs) with strong capabilities in both languages exhibit language mixing: alternating languages within their chain of thought. Discouraging this behavior in DeepSeek-R1 was found to degrade accuracy, suggesting that language mixing m ...
ArXiv Domain 2025-07-24
Data source: ArXiv Domain

LLM Domain Papers

1. LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs

We propose LingBench++, a linguistically-informed benchmark and reasoning framework designed to evaluate large language models (LLMs) on complex linguistic tasks inspired by the International Linguistics Olympiad (IOL). Unlike prior benchmarks that focus solely on final-answer accuracy, LingBench++ provides structured reasoning trac ...
ArXiv Domain 2025-07-25
Data source: ArXiv Domain

LLM Domain Papers

1. LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

Large Language Models (LLMs) have become indispensable in real-world applications. However, their widespread adoption raises significant safety concerns, particularly in responding to socially harmful questions. Despite substantial efforts to improve model safety through alignment, aligned models can still have their safety protections undermined by subsequent fine-tuning, even when the ...
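From the title and abstract alone, one plausible rendering of "low-rank extrapolation" is to push the aligned weights further along the top-k singular directions of the alignment update, so later fine-tuning is less likely to erase them. A hedged sketch under that reading; the rank k and strength alpha are illustrative hyperparameters, not the paper's settings:

```python
# Hedged sketch: extrapolate along the low-rank component of the alignment
# weight delta. `k` and `alpha` are illustrative, not the paper's values.
import torch

def low_rank_extrapolate(w_aligned: torch.Tensor, w_base: torch.Tensor,
                         k: int = 8, alpha: float = 1.0) -> torch.Tensor:
    delta = w_aligned - w_base                          # alignment update
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    top_k = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]    # top-k subspace of delta
    return w_aligned + alpha * top_k                    # extrapolate along it
```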
ArXiv Domain 2025-07-26
Data source: ArXiv Domain

LLM Domain Papers

1. Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs

Knowledge distillation can be a cost-effective technique for distilling knowledge in Large Language Models if the teacher's output logits can be pre-computed and cached. However, successfully applying this to pre-training remains largely unexplored. In this work, we prove that naive approaches for sparse knowledge distillation, such as caching Top-K probabilities, while intuitive, provide biased e ...
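The bias mentioned here has a standard unbiased alternative worth sketching: sample token indices from the teacher distribution p and average -log q over the samples, since E_{i~p}[-log q_i] = sum_i p_i * (-log q_i) = H(p, q), the full cross-entropy. A generic sketch of that estimator, not necessarily the paper's exact algorithm:

```python
# Unbiased sparse distillation loss: cache k sampled token indices per
# position instead of Top-K probabilities. Generic sketch; k is illustrative.
import torch

def sparse_distill_loss(teacher_logits, student_logits, k=16):
    p = torch.softmax(teacher_logits, dim=-1)          # teacher distribution
    idx = torch.multinomial(p, k, replacement=True)    # only k indices kept
    log_q = torch.log_softmax(student_logits, dim=-1)
    # Monte Carlo estimate of H(p, q); unbiased because idx ~ p.
    return -log_q.gather(-1, idx).mean()
```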
ArXiv Domain 2025-07-28
Data source: ArXiv Domain

LLM Domain Papers

1. Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts

Many recent papers have studied the development of superforecaster-level event-forecasting LLMs. While methodological problems with early studies cast doubt on the use of LLMs for event forecasting, recent studies with improved evaluation methods have shown that state-of-the-art LLMs are gradually reaching superforecaster-level performan ...
ArXiv Domain 2025-07-30
Data source: ArXiv Domain

LLM Domain Papers

1. Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation

Nearly all human work is collaborative; thus, the evaluation of real-world NLP applications often requires multiple dimensions that align with diverse human perspectives. As real human evaluator resources are often scarce and costly, the emerging "LLM-as-a-judge" paradigm points to a promising approach: leveraging LLM agents to believably simulat ...
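A minimal sketch of what multi-dimensional LLM-based judging can look like in practice: one judge prompt per human-relevant dimension, aggregated into an overall score. The dimensions, rubric wording, and call_llm helper are illustrative placeholders, not the paper's setup:

```python
# Minimal multi-dimensional LLM-as-judge sketch. `call_llm` is a placeholder
# for any chat-completion client; dimensions and rubric are illustrative.
DIMENSIONS = ["faithfulness", "helpfulness", "fluency"]

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in an LLM client here

def judge(response: str, reference: str) -> dict[str, float]:
    scores: dict[str, float] = {}
    for dim in DIMENSIONS:
        prompt = (f"Rate the {dim} of the response from 1 to 5.\n"
                  f"Reference: {reference}\nResponse: {response}\nScore:")
        scores[dim] = float(call_llm(prompt).strip())
    scores["overall"] = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    return scores
```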
ArXiv Domain 2025-07-31
Data source: ArXiv Domain

LLM Domain Papers

1. DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router

Large Language Models (LLMs) excel at many reasoning tasks but struggle with knowledge-intensive queries due to their inability to dynamically access up-to-date or domain-specific information. Retrieval-Augmented Generation (RAG) has emerged as a promising solution, enabling LLMs to ground their responses in external sources. However, existing RAG methods lack fine-grained control over both the qu ...
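The "LLM-as-a-knowledge-router" idea can be sketched as a routing step ahead of retrieval: the model first picks the knowledge source a query belongs to, and retrieval then runs only against that source. The source names and the call_llm/retrieve helpers below are hypothetical, not DeepSieve's API:

```python
# Hedged sketch of routing before retrieval in RAG. All helpers hypothetical.
SOURCES = {"wiki": "general encyclopedic facts",
           "sql": "structured internal records",
           "web": "up-to-date news"}

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # any chat-completion client

def retrieve(source: str, query: str) -> list[str]:
    raise NotImplementedError  # per-source retriever

def route_and_retrieve(query: str) -> list[str]:
    menu = "\n".join(f"- {name}: {desc}" for name, desc in SOURCES.items())
    choice = call_llm(f"Pick the single best source for this query.\n{menu}\n"
                      f"Query: {query}\nAnswer with one source name:").strip()
    source = choice if choice in SOURCES else "wiki"   # safe fallback
    return retrieve(source, query)
```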