37.2° Blog

ArXiv Domain 2025-12-02

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction

Developing robust world model reasoning is crucial for large language model (LLM) agents to plan and interact in complex environments. While multi-turn interaction offers a superior understanding of environmental dynamics via authentic feedback, current approaches often impose a rigid reasoning process, which constrains the model’s active learning, ultimately hind ...

ArXiv Domain 2025-12-03

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI

Generative modeling has recently shown remarkable promise for visuomotor policy learning, enabling flexible and expressive control across diverse embodied AI tasks. However, existing generative policies often struggle with data inefficiency, requiring large-scale demonstrations, and sampling inefficiency, incurring slow action generation during inference. We introduce EfficientFlow, a ...

ArXiv Domain 2025-12-04

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. PPTArena: A Benchmark for Agentic PowerPoint Editing

We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast to image-PDF renderings or text-to-slide generation, PPTArena focuses on in-place editing across 100 decks, 2125 slides, and over 800 targeted edits covering text, charts, tables, animations, and master-level styles. Each case includes a ground-trut ...

ArXiv Domain 2025-12-05

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. SkillFactory: Self-Distillation For Learning Cognitive Behaviors

Reasoning models leveraging long chains of thought employ various cognitive skills, such as verification of their answers, backtracking, retrying by an alternate method, and more. Previous work has shown that when a base language model exhibits these skills, training that model further with reinforcement learning (RL) can learn to leverage them. How can we get models to leverage skills that ar ...

ArXiv Domain 2025-12-06

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. The Universal Weight Subspace Hypothesis

We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demonstrates that neural networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain. Through mode-wise spectral analysis of over 1100 models, including 500 Mistral-7B LoRAs, 500 Vision ...

ArXiv Domain 2025-12-07

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. The Universal Weight Subspace Hypothesis

We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demonstrates that neural networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain. Through mode-wise spectral analysis of over 1100 models, including 500 Mistral-7B LoRAs, 500 Vision ...

ArXiv Domain 2025-12-08

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. The Universal Weight Subspace Hypothesis

We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demonstrates that neural networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain. Through mode-wise spectral analysis of over 1100 models, including 500 Mistral-7B LoRAs, 500 Vision ...

ArXiv Domain 2025-12-10

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. Relational Visual Similarity

Humans do not just see attribute similarity; we also see relational similarity. An apple is like a peach because both are reddish fruit, but the Earth is also like a peach: its crust, mantle, and core correspond to the peach’s skin, flesh, and pit. This ability to perceive and recognize relational similarity is argued by cognitive scientists to be what distinguishes humans from other species. Yet, all widely used visual simil ...

ArXiv Domain 2025-12-09

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms

In the era of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) architectures are gaining significant attention for their ability to ground language generation in reliable knowledge sources. Despite their impressive effectiveness in many areas, RAG systems based solely on semantic similarity often fail to ensure factual accuracy in specialized domains, wh ...

ArXiv Domain 2025-11-22

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. Dataset Distillation for Pre-Trained Self-Supervised Vision Models

The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods focus on synthesizing datasets that enable training randomly initialized models. In contrast, state-of-the-art vision approaches are increasingly building o ...

ArXiv Domain 2025-11-23

Created 2019-06-18 | AI

Source: ArXiv Domain

LLM Domain Papers

1. Dataset Distillation for Pre-Trained Self-Supervised Vision Models

The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods focus on synthesizing datasets that enable training randomly initialized models. In contrast, state-of-the-art vision approaches are increasingly building o ...

HuggingFace Papers 2025-08-09

Created 2019-06-18 | AI

Source: HuggingFace Papers

Latest Papers

1. On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generaliza ...

HuggingFace Papers 2025-08-10

Created 2019-06-18 | AI

Source: HuggingFace Papers

Latest Papers

1. On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generaliza ...

HuggingFace Papers 2025-08-21

Created 2019-06-18 | AI

Source: HuggingFace Papers

Latest Papers

1. Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built upon manual prompt/workflow engineering with sophisticated agent frameworks, making them computatio ...