ArXiv Domain 2025-11-27
Data source: ArXiv Domain
LLM Domain Papers
1. MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
Traditional object detection models in medical imaging operate within a closed-set paradigm, limiting their ability to detect objects of novel labels. Open-vocabulary object detection (OVOD) addresses this limitation but remains underexplored in medical imaging due to dataset scarcity and weak text-image alignment. To bridge this gap, we introduce MedROV, the first ...
ArXiv Domain 2025-11-28
Data source: ArXiv Domain
LLM Domain Papers
1. Revisiting Generalization Across Difficulty Levels: It’s Not So Easy
We investigate how well large language models (LLMs) generalize across different task difficulties, a key question for effective data curation and evaluation. Existing research is mixed regarding whether training on easier or harder data leads to better results, and whether those gains come on easier or harder test data. We address this question by conducting a systematic evaluation of LLM ...
ArXiv Domain 2025-11-29
Data source: ArXiv Domain
LLM Domain Papers
1. Revisiting Generalization Across Difficulty Levels: It’s Not So Easy
We investigate how well large language models (LLMs) generalize across different task difficulties, a key question for effective data curation and evaluation. Existing research is mixed regarding whether training on easier or harder data leads to better results, and whether those gains come on easier or harder test data. We address this question by conducting a systematic evaluation of LLM ...
ArXiv Domain 2025-11-30
Data source: ArXiv Domain
LLM Domain Papers
1. Revisiting Generalization Across Difficulty Levels: It’s Not So Easy
We investigate how well large language models (LLMs) generalize across different task difficulties, a key question for effective data curation and evaluation. Existing research is mixed regarding whether training on easier or harder data leads to better results, and whether those gains come on easier or harder test data. We address this question by conducting a systematic evaluation of LLM ...
ArXiv Domain 2025-12-01
Data source: ArXiv Domain
LLM Domain Papers
1. Revisiting Generalization Across Difficulty Levels: It’s Not So Easy
We investigate how well large language models (LLMs) generalize across different task difficulties, a key question for effective data curation and evaluation. Existing research is mixed regarding whether training on easier or harder data leads to better results, and whether those gains come on easier or harder test data. We address this question by conducting a systematic evaluation of LLM ...
ArXiv Domain 2025-12-02
Data source: ArXiv Domain
LLM Domain Papers
1. Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction
Developing robust world model reasoning is crucial for large language model (LLM) agents to plan and interact in complex environments. While multi-turn interaction offers a superior understanding of environmental dynamics via authentic feedback, current approaches often impose a rigid reasoning process, which constrains the model’s active learning, ultimately hind ...
ArXiv Domain 2025-12-03
Data source: ArXiv Domain
LLM Domain Papers
1. EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI
Generative modeling has recently shown remarkable promise for visuomotor policy learning, enabling flexible and expressive control across diverse embodied AI tasks. However, existing generative policies often struggle with data inefficiency, requiring large-scale demonstrations, and sampling inefficiency, incurring slow action generation during inference. We introduce EfficientFlow, a ...
ArXiv Domain 2025-12-04
Data source: ArXiv Domain
LLM Domain Papers
1. PPTArena: A Benchmark for Agentic PowerPoint Editing
We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast to image-PDF renderings or text-to-slide generation, PPTArena focuses on in-place editing across 100 decks, 2125 slides, and over 800 targeted edits covering text, charts, tables, animations, and master-level styles. Each case includes a ground-trut ...
ArXiv Domain 2025-12-05
Data source: ArXiv Domain
LLM Domain Papers
1. SkillFactory: Self-Distillation For Learning Cognitive Behaviors
Reasoning models leveraging long chains of thought employ various cognitive skills, such as verification of their answers, backtracking, retrying by an alternate method, and more. Previous work has shown that when a base language model exhibits these skills, training that model further with reinforcement learning (RL) can learn to leverage them. How can we get models to leverage skills that ar ...
ArXiv Domain 2025-12-06
Data source: ArXiv Domain
LLM Domain Papers
1. The Universal Weight Subspace Hypothesis
We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demonstrates that neural networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain. Through mode-wise spectral analysis of over 1100 models - including 500 Mistral-7B LoRAs, 500 Vision ...
ArXiv Domain 2025-12-07
Data source: ArXiv Domain
LLM Domain Papers
1. The Universal Weight Subspace Hypothesis
We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demonstrates that neural networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain. Through mode-wise spectral analysis of over 1100 models - including 500 Mistral-7B LoRAs, 500 Vision ...
ArXiv Domain 2025-12-08
Data source: ArXiv Domain
LLM Domain Papers
1. The Universal Weight Subspace Hypothesis
We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demonstrates that neural networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain. Through mode-wise spectral analysis of over 1100 models - including 500 Mistral-7B LoRAs, 500 Vision ...
ArXiv Domain 2025-12-09
Data source: ArXiv Domain
LLM Domain Papers
1. Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms
In the era of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) architectures are gaining significant attention for their ability to ground language generation in reliable knowledge sources. Despite their impressive effectiveness in many areas, RAG systems based solely on semantic similarity often fail to ensure factual accuracy in specialized domains, wh ...
ArXiv Domain 2025-12-10
Data source: ArXiv Domain
LLM Domain Papers
1. Relational Visual Similarity
Humans do not just see attribute similarity; we also see relational similarity. An apple is like a peach because both are reddish fruit, but the Earth is also like a peach: its crust, mantle, and core correspond to the peach’s skin, flesh, and pit. This ability to perceive and recognize relational similarity is argued by cognitive scientists to be what distinguishes humans from other species. Yet, all widely used visual simil ...
ArXiv Domain 2025-12-11
Data source: ArXiv Domain
LLM Domain Papers
1. Astra: General Interactive World Model with Autoregressive Denoising
Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world models with the ability to predict long-horizon futures from past observations and actions remain underexplored, especially for general-purpose scenarios and various forms of actions. To bridge this gap, we introduce Astra, an interactiv ...