ArXiv Domain 2026-01-26
Data source: ArXiv Domain
LLM Domain Papers
1. Why Can’t I Open My Drawer? Mitigating Object-Driven Shortcuts in Zero-Shot Compositional Action Recognition
We study Compositional Video Understanding (CVU), where models must recognize verbs and objects and compose them to generalize to unseen combinations. We find that existing Zero-Shot Compositional Action Recognition (ZS-CAR) models fail primarily due to an overlooked failure mode: object-driven verb shortcuts. Through systematic analysis, we show tha ...
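A naive ZS-CAR pipeline scores verb-object compositions by summing independent verb and object scores, which is exactly the setup in which object evidence can shortcut the verb decision. A minimal sketch of that compositional scoring (the label indices and pair list are hypothetical; the paper's mitigation method is truncated above):

```python
# Minimal sketch of naive compositional scoring in ZS-CAR: verb and object
# are classified separately and their log-probabilities are summed over the
# (possibly unseen) verb-object pairs. Indices and pairs are illustrative.
import torch

def compose_scores(verb_logits, obj_logits, pairs):
    v = verb_logits.log_softmax(-1)   # (num_verbs,)
    o = obj_logits.log_softmax(-1)    # (num_objects,)
    return torch.stack([v[vi] + o[oi] for vi, oi in pairs])

pairs = [(0, 1), (2, 1), (0, 3)]      # e.g. (open, drawer), (close, drawer), ...
scores = compose_scores(torch.randn(5), torch.randn(4), pairs)
print(pairs[scores.argmax()])         # strong object evidence can dominate the verb
```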
ArXiv Domain 2026-01-27
Data source: ArXiv Domain
LLM Domain Papers
1. A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs
Understanding the curvature evolution of the loss landscape is fundamental to analyzing the training dynamics of neural networks. The most commonly studied measure, Hessian sharpness ($\lambda_{\max}^H$), the largest eigenvalue of the loss Hessian, determines local training stability and interacts with the learning rate throughout training. Despite its significance in ana ...
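For reference, the standard way to estimate $\lambda_{\max}^H$ without materializing the Hessian is power iteration on Hessian-vector products; the sketch below illustrates this baseline measure, not the paper's proposed scalable alternative:

```python
# Power iteration on Hessian-vector products to estimate Hessian sharpness
# (lambda_max^H). Standard baseline, not the paper's method.
import torch

def hessian_sharpness(loss, params, iters=20):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat)
    v /= v.norm()
    lam = 0.0
    for _ in range(iters):
        # grad of (grad . v) w.r.t. params gives the Hessian-vector product Hv
        hv = torch.autograd.grad(flat @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        lam = torch.dot(v, hv).item()   # Rayleigh quotient v^T H v
        v = hv / (hv.norm() + 1e-12)
    return lam
```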
ArXiv Domain 2026-01-20
Data source: ArXiv Domain
LLM Domain Papers
1. How Long Is a Piece of String? A Brief Empirical Analysis of Tokenizers
Frontier LLMs are increasingly utilised across academia, society and industry. A commonly used unit for comparing models, their inputs and outputs, and estimating inference pricing is the token. In general, tokens are used as a stable currency, assumed to be broadly consistent across tokenizers and contexts, enabling direct comparisons. However, tokenization varies significantly across ...
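The instability is easy to reproduce: the same string can differ substantially in token count across tokenizers. A minimal check with Hugging Face tokenizers (the model names are just common examples):

```python
# Same text, different token counts: "tokens" are not a stable unit across
# models. Model names are illustrative; any Hub tokenizers work.
from transformers import AutoTokenizer

text = "How long is a piece of string?"
for name in ["gpt2", "bert-base-uncased", "t5-small"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(f"{name}: {len(tok.encode(text))} tokens")
```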
ArXiv Domain 2026-01-10
Data source: ArXiv Domain
LLM Domain Papers
1. Optimal Lower Bounds for Online Multicalibration
We prove tight lower bounds for online multicalibration, establishing an information-theoretic separation from marginal calibration. In the general setting where group functions can depend on both context and the learner’s predictions, we prove an $\Omega(T^{2/3})$ lower bound on expected multicalibration error using just three disjoint binary groups. This matches the upper bounds of Noarov et al. (2025) up to log ...
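For context, one common formulation of the quantity being bounded (the paper's exact definition may differ in bucketing and normalization details): over $T$ rounds with contexts $x_t$, predictions $p_t$, outcomes $y_t$, and group functions $g \in \mathcal{G}$,

```latex
\mathrm{MCalErr}_T \;=\; \max_{g \in \mathcal{G}} \; \max_{v} \;
\left| \sum_{t \le T \,:\, p_t = v} g(x_t, p_t)\,\bigl(y_t - p_t\bigr) \right|
```

Marginal calibration drops the maximum over $\mathcal{G}$; the information-theoretic separation in the abstract concerns that extra maximum over group functions.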
ArXiv Domain 2026-01-29
Data source: ArXiv Domain
LLM Domain Papers
1. Evaluation of Oncotimia: An LLM based system for supporting tumour boards
Multidisciplinary tumour boards (MDTBs) play a central role in oncology decision-making but rely on manual processes to structure large volumes of heterogeneous clinical information, resulting in a substantial documentation burden. In this work, we present ONCOTIMIA, a modular and secure clinical tool designed to integrate generative artificial intelligence (GenAI) into oncology wo ...
ArXiv Domain 2026-01-31
Data source: ArXiv Domain
LLM Domain Papers
1. RedSage: A Cybersecurity Generalist LLM
Cybersecurity operations demand assistant LLMs that support diverse workflows without exposing sensitive data. Existing solutions either rely on proprietary APIs with privacy risks or on open models lacking domain adaptation. To bridge this gap, we curate 11.8B tokens of cybersecurity-focused continual pretraining data via large-scale web filtering and manual collection of high-quality resources, spanning 28.6K docume ...
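As a toy illustration of the web-filtering step (the term list and hit threshold are hypothetical, and real pipelines typically combine such heuristics with model-based quality and domain classifiers):

```python
# Toy domain-keyword filter for curating cybersecurity pretraining data.
# SECURITY_TERMS and min_hits are hypothetical illustrations.
SECURITY_TERMS = ("cve", "exploit", "malware", "phishing", "mitre att&ck")

def keep_document(text: str, min_hits: int = 2) -> bool:
    lowered = text.lower()
    return sum(term in lowered for term in SECURITY_TERMS) >= min_hits

docs = [
    "CVE-2024-1234 is a remotely exploitable flaw; exploit code is public.",
    "A recipe for banana bread with extra chocolate chips.",
]
print([keep_document(d) for d in docs])  # [True, False]
```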
ArXiv Domain 2026-01-28
Data source: ArXiv Domain
LLM Domain Papers
1. ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models
Text embeddings have become an essential part of a variety of language applications. However, methods for interpreting, exploring and reversing embedding spaces are limited, reducing transparency and precluding potentially valuable generative use cases. In this work, we align Large Language Models to embeddings of clinical trials using the recently reported Embeddi ...
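A common way to align an LLM to a frozen embedding space, consistent with the embedding-language-model framing here, is a small adapter that maps each embedding into soft prefix tokens for the decoder. A sketch with illustrative dimensions (not the paper's exact architecture):

```python
# Adapter mapping a frozen trial embedding into k soft prefix tokens that
# condition a decoder LLM. Dimensions and shape are illustrative only.
import torch
import torch.nn as nn

class EmbeddingPrefixAdapter(nn.Module):
    def __init__(self, emb_dim=768, llm_dim=4096, k_tokens=8):
        super().__init__()
        self.k = k_tokens
        self.proj = nn.Linear(emb_dim, llm_dim * k_tokens)

    def forward(self, emb):                       # emb: (batch, emb_dim)
        out = self.proj(emb)                      # (batch, k * llm_dim)
        return out.view(emb.size(0), self.k, -1)  # (batch, k, llm_dim)

prefix = EmbeddingPrefixAdapter()(torch.randn(2, 768))
print(prefix.shape)  # [2, 8, 4096]; prepend to the LLM's inputs_embeds
```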
ArXiv Domain 2026-01-22
Data source: ArXiv Domain
LLM Domain Papers
1. VideoMaMa: Mask-Guided Video Matting via Generative Prior
Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. To address this, we present the Video Mask-to-Matte Model (VideoMaMa), which converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even though ...
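One simple way to inject a coarse mask into a diffusion backbone is channel-wise concatenation with the noisy latent; a sketch of that conditioning pattern (purely illustrative, VideoMaMa's actual architecture may differ):

```python
# Mask-conditioned denoiser sketch: the coarse segmentation mask is
# concatenated to the noisy latent as an extra input channel. Illustrative
# only; shapes and the backbone are placeholders, not VideoMaMa's design.
import torch
import torch.nn as nn

class MaskConditionedDenoiser(nn.Module):
    def __init__(self, latent_ch=4, hidden=64):
        super().__init__()
        # +1 input channel for the coarse segmentation mask
        self.net = nn.Sequential(
            nn.Conv3d(latent_ch + 1, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, latent_ch, 3, padding=1),
        )

    def forward(self, noisy_latent, mask):
        # noisy_latent: (B, C, T, H, W); mask: (B, 1, T, H, W) in [0, 1]
        return self.net(torch.cat([noisy_latent, mask], dim=1))

x = torch.randn(1, 4, 8, 32, 32)
m = torch.rand(1, 1, 8, 32, 32)
print(MaskConditionedDenoiser()(x, m).shape)  # [1, 4, 8, 32, 32]
```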
HuggingFace Papers 2025-08-09
Data source: HuggingFace Papers
Latest Papers
1. On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generaliza ...
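A way to make the implicit-reward claim concrete, consistent with the reward-rectification framing (the paper's exact derivation may differ): the SFT gradient on an expert response $y^{*}$ can be rewritten as an on-policy policy gradient,

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SFT}}
  = -\nabla_\theta \log \pi_\theta(y^{*} \mid x)
  = -\,\mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\!\left[
      \frac{\mathbf{1}[y = y^{*}]}{\pi_\theta(y \mid x)}
      \,\nabla_\theta \log \pi_\theta(y \mid x)\right]
```

so SFT behaves like policy gradient with implicit reward $\mathbf{1}[y = y^{*}]/\pi_\theta(y \mid x)$, which blows up on low-probability expert tokens; rectifying that reward (e.g. reweighting the loss by a detached $\pi_\theta(y^{*} \mid x)$) recovers the bounded reward $\mathbf{1}[y = y^{*}]$.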
HuggingFace Papers 2025-08-21
Data source: HuggingFace Papers
Latest Papers
1. Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built upon manual prompt/workflow engineering with sophisticated agent frameworks, making them computatio ...
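A toy illustration of the distillation idea: flatten a multi-agent run into a single sequence that one model can be trained to imitate end-to-end (the role tags and trajectory schema are hypothetical, not the paper's format):

```python
# Toy "multi-agent distillation": serialize planner/tool/executor turns
# from a multi-agent run into one training target for a single agent model.
def flatten_trajectory(turns):
    # turns: list of (role, content) pairs from a multi-agent run
    return "".join(f"<{role}>{content}</{role}>" for role, content in turns)

traj = [
    ("planner", "Search for the paper, then summarize it."),
    ("tool", "search('chain of agents distillation')"),
    ("observation", "Top hit: Chain-of-Agents ..."),
    ("executor", "The paper distills multi-agent runs into one model."),
]
print(flatten_trajectory(traj))  # one SFT target for a single agent model
```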
ArXiv Domain 2026-01-30
Data source: ArXiv Domain
LLM Domain Papers
1. Recursive Language Models
We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference paradigm that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs can successfully process inputs up to tw ...
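A minimal sketch of the recursive-call pattern, with a fixed halving strategy standing in for the model-driven examine/decompose step (`llm` is a stand-in for any prompt-to-text function; the real RLM lets the model itself choose how to split and when to recurse):

```python
# Recursive answering over a long prompt: split until each chunk fits,
# answer the sub-chunks, then ask the model to combine the sub-answers.
def rlm_answer(llm, prompt: str, query: str, max_chars: int = 8000) -> str:
    if len(prompt) <= max_chars:
        return llm(f"Context:\n{prompt}\n\nQuestion: {query}")
    mid = len(prompt) // 2
    left = rlm_answer(llm, prompt[:mid], query, max_chars)
    right = rlm_answer(llm, prompt[mid:], query, max_chars)
    return llm(f"Combine these partial answers to '{query}':\n"
               f"1) {left}\n2) {right}")
```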