HuggingFace Papers 2025-12-27
Data source: HuggingFace Papers
Latest Papers
1. Latent Implicit Visual Reasoning
While Large Multimodal Models (LMMs) have made significant progress, they remain largely text-centric, relying on language as their core reasoning modality. As a result, they are limited in their ability to handle reasoning tasks that are predominantly visual. Recent approaches have sought to address this by supervising intermediate visual steps with helper images, depth maps, or image crops. However, these strategies impo ...
HuggingFace Papers 2025-12-30
Data source: HuggingFace Papers
Latest Papers
1. InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
Recent advances in diffusion-based video generation have opened new possibilities for controllable video editing, yet realistic video object insertion (VOI) remains challenging due to limited 4D scene understanding and inadequate handling of occlusion and lighting effects. We present InsertAnywhere, a new VOI framework that achieves geometrically consisten ...
HuggingFace Papers 2025-12-31
Data source: HuggingFace Papers
Latest Papers
1. Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Mixture-of-Experts (MoE) models lack explicit constraints to ensure the router’s decisions align well with the experts’ capabilities, which ultimately limits model performance. To address this, we propose expert-router coupling (ERC) loss, a lightweight auxiliary loss that tightly couples the router’s decisions with expert capabilities. Our approach treats each expert’s router embedd ...
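The abstract cuts off before the ERC formulation, so the following is only a minimal sketch of the general idea it describes: a toy top-k MoE layer plus an auxiliary term that nudges the router's probability mass toward experts that handle their routed tokens well. The competence proxy (per-expert reconstruction error) and the KL form of the coupling term are assumptions, not the paper's loss.

```python
# Hedged sketch: one way to couple a router with expert "capability" via an
# auxiliary loss, in the spirit of the ERC idea above. The abstract is
# truncated, so the coupling target used here (per-expert token
# reconstruction error) is an assumption, not the paper's formulation.
import torch
import torch.nn.functional as F

def moe_forward_with_coupling_loss(x, router, experts, k=2):
    """x: [tokens, d]; router: Linear(d, n_experts); experts: list of modules."""
    logits = router(x)                        # [tokens, n_experts]
    probs = logits.softmax(dim=-1)
    topk_p, topk_idx = probs.topk(k, dim=-1)  # route each token to k experts

    out = torch.zeros_like(x)
    per_expert_err = torch.zeros(len(experts), device=x.device)
    for e, expert in enumerate(experts):
        mask = (topk_idx == e).any(dim=-1)
        if mask.any():
            y = expert(x[mask])
            gate = topk_p[mask][topk_idx[mask] == e].unsqueeze(-1)
            out[mask] += gate * y
            # Proxy for "capability": how well the expert preserves/refines its tokens.
            per_expert_err[e] = F.mse_loss(y, x[mask]).detach()

    # Coupling term (assumed form): push router probability mass toward experts
    # with lower error on the tokens they actually received.
    capability = (-per_expert_err).softmax(dim=-1)   # [n_experts]
    router_mass = probs.mean(dim=0)                  # [n_experts]
    erc_loss = F.kl_div(router_mass.log(), capability, reduction="batchmean")
    return out, erc_loss

# Toy usage:
d, n_experts = 16, 4
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_experts))
router = torch.nn.Linear(d, n_experts)
out, erc = moe_forward_with_coupling_loss(torch.randn(32, d), router, experts)
print(out.shape, erc.item())
```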
HuggingFace Papers 2026-01-01
Data source: HuggingFace Papers
Latest Papers
1. UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
In this report, we introduce UltraShape 1.0, a scalable 3D diffusion framework for high-fidelity 3D geometry generation. The proposed approach adopts a two-stage generation pipeline: a coarse global structure is first synthesized and then refined to produce detailed, high-quality geometry. To support reliable 3D generation, we develop a comprehensive data processing pipeli ...
HuggingFace Papers 2026-01-03
Data source: HuggingFace Papers
Latest Papers
1. Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks that demand global comprehension and intensive reasoning. Many RAG systems incorporate a working memory module to consolidate retrieved information. However, existing memory designs function primarily as passive storage tha ...
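Since the abstract is truncated before the memory design itself, here is only a generic sketch of what a hypergraph working memory for multi-step RAG can look like: each retrieved passage becomes a hyperedge over the entities it mentions, and multi-hop questions are served by intersecting entity neighbourhoods. The class name and operations are illustrative assumptions, not the paper's API.

```python
# Hedged sketch: a minimal hypergraph working memory for multi-step RAG.
# The abstract above is truncated, so how edges are added, merged, or queried
# in the paper is unknown; this only shows the generic data structure,
# where retrieved facts become hyperedges over entities.
from collections import defaultdict

class HypergraphMemory:
    def __init__(self):
        self.edges = []                      # each edge: {"text": str, "entities": set}
        self.entity_to_edges = defaultdict(set)

    def add_fact(self, text, entities):
        """Store a retrieved passage as a hyperedge linking all entities it mentions."""
        edge_id = len(self.edges)
        self.edges.append({"text": text, "entities": set(entities)})
        for ent in entities:
            self.entity_to_edges[ent].add(edge_id)

    def related_facts(self, entities):
        """Return stored passages that share at least one entity with the query set."""
        hit_ids = set()
        for ent in entities:
            hit_ids |= self.entity_to_edges.get(ent, set())
        return [self.edges[i]["text"] for i in sorted(hit_ids)]

    def bridge_entities(self, a, b):
        """Entities that co-occur with both a and b: candidate multi-hop bridges."""
        def neighbours(x):
            out = set()
            for i in self.entity_to_edges.get(x, set()):
                out |= self.edges[i]["entities"]
            return out - {x}
        return neighbours(a) & neighbours(b)

# Example: consolidate two retrieval steps, then look for a multi-hop bridge.
mem = HypergraphMemory()
mem.add_fact("Marie Curie worked at the University of Paris.", {"Marie Curie", "University of Paris"})
mem.add_fact("The University of Paris is located in France.", {"University of Paris", "France"})
print(mem.bridge_entities("Marie Curie", "France"))   # {'University of Paris'}
```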
HuggingFace Papers 2026-01-02
Data source: HuggingFace Papers
Latest Papers
1. mHC: Manifold-Constrained Hyper-Connections
Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training ...
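As context for the tension the abstract describes (a wider, more freely mixed residual stream versus the identity mapping property), below is a hedged sketch of a hyper-connection-style block whose stream-mixing matrix is parameterized as identity plus a bounded deviation. The actual manifold constraint used by mHC is not visible in the truncated text, so this parameterization is an assumption for illustration only.

```python
# Hedged sketch: n parallel residual streams with a learnable mixing matrix
# kept close to the identity. The mHC constraint itself is unknown from the
# truncated abstract; "identity + bounded tanh deviation" is an assumption.
import torch
import torch.nn as nn

class HyperConnectionBlock(nn.Module):
    def __init__(self, dim, n_streams=4, eps=0.1):
        super().__init__()
        self.f = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim),
                               nn.GELU(), nn.Linear(dim, dim))
        self.mix_delta = nn.Parameter(torch.zeros(n_streams, n_streams))
        self.read = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
        self.eps = eps

    def forward(self, streams):              # streams: [batch, n_streams, dim]
        # Mixing matrix = identity + bounded deviation, so at init (and while
        # the deviation stays small) each stream behaves like a plain residual path.
        n = streams.size(1)
        mix = torch.eye(n, device=streams.device) + self.eps * torch.tanh(self.mix_delta)
        mixed = torch.einsum("ij,bjd->bid", mix, streams)
        # Read a weighted combination of the streams, apply the block, and
        # broadcast the result back onto every stream.
        h = self.f(torch.einsum("j,bjd->bd", self.read.softmax(dim=0), streams))
        return mixed + h.unsqueeze(1)

x = torch.randn(2, 4, 64)
print(HyperConnectionBlock(64)(x).shape)      # torch.Size([2, 4, 64])
```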
HuggingFace Papers 2026-01-06
Data source: HuggingFace Papers
Latest Papers
1. NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
In this paper, we propose NeoVerse, a versatile 4D world model that is capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications. We first identify a common limitation of scalability in current 4D world modeling methods, caused either by expensive and specialized multi-view 4D data or by cumbersome training pre-processing. In contrast, our NeoVerse ...
HuggingFace Papers 2026-01-07
Data source: HuggingFace Papers
Latest Papers
1. Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, multi-sample consistency, or text-based self-critique, which incur additional compute or correlate weakly with true correctness. We ask: can LLMs predict their own failures by inspecting internal st ...
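The abstract stops before the method, so the snippet below only illustrates the generic "inspect internal states" recipe it points at: collect hidden activations, record whether the model's answers were correct, and train a small probe to predict correctness from the activations alone. The synthetic data, layer choice, and logistic-regression probe are assumptions for illustration, not the paper's circuit-level technique.

```python
# Hedged sketch: a linear probe on hidden states as a failure predictor.
# The arrays below are synthetic stand-ins for what one would collect from a
# real model: hidden[i] = an activation vector for example i, correct[i] = 1
# if the model answered example i correctly.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden = rng.normal(size=(512, 256))
correct = (hidden[:, :8].mean(axis=1) + 0.3 * rng.normal(size=512) > 0).astype(int)

# Train on part of the data, evaluate failure prediction on the held-out rest.
probe = LogisticRegression(max_iter=1000).fit(hidden[:400], correct[:400])
print("held-out failure-prediction accuracy:", probe.score(hidden[400:], correct[400:]))
```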
HuggingFace Papers 2026-01-08
Data source: HuggingFace Papers
Latest Papers
1. InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
Existing depth estimation methods are fundamentally limited to predicting depth on discrete image grids. Such representations restrict their scalability to arbitrary output resolutions and hinder the geometric detail recovery. This paper introduces InfiniDepth, which represents depth as neural implicit fields. Through a simple yet effective local implicit decod ...
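To make "depth as a neural implicit field with a local implicit decoder" concrete, here is a hedged sketch: a feature map is bilinearly sampled at continuous query coordinates, and a small MLP maps (local feature, coordinate) to a depth value, decoupling output resolution from the feature grid. InfiniDepth's actual decoder design is truncated above, so this architecture is an assumption.

```python
# Hedged sketch: a local implicit decoder for depth. Query points in [-1, 1]
# pull bilinearly interpolated features from a coarse grid, and an MLP maps
# (feature, coordinate) to depth, so any output resolution can be queried.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalImplicitDepthDecoder(nn.Module):
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats, coords):
        """feats: [B, C, H, W]; coords: [B, N, 2] in [-1, 1] -> depth [B, N]."""
        # Bilinearly sample the feature grid at continuous query locations.
        sampled = F.grid_sample(feats, coords.unsqueeze(1), align_corners=False)
        sampled = sampled.squeeze(2).transpose(1, 2)          # [B, N, C]
        return self.mlp(torch.cat([sampled, coords], dim=-1)).squeeze(-1)

decoder = LocalImplicitDepthDecoder()
feats = torch.randn(1, 64, 32, 32)            # low-resolution feature map
coords = torch.rand(1, 10_000, 2) * 2 - 1     # 10k arbitrary query points
print(decoder(feats, coords).shape)           # torch.Size([1, 10000])
```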
HuggingFace Papers 2026-01-09
Data source: HuggingFace Papers
Latest Papers
1. Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning (RL) effectively preserves general capabilities. We investigate this discrepancy and identify a fundamental distributional gap: while RL aligns with the model’s internal belief, SFT for ...
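The abstract breaks off mid-sentence, so the following is only a sketch of the idea the title suggests: identify "confident conflicts" (target tokens where the model is low-entropy yet disagrees with the SFT label) and down-weight them in the token-level cross-entropy. The exact gating rule below is an assumption, not the paper's loss.

```python
# Hedged sketch: entropy-gated SFT loss. Tokens where the model confidently
# disagrees with the label are scaled by its (normalized) entropy, so
# "confident conflicts" contribute less gradient; other tokens keep full weight.
import torch
import torch.nn.functional as F

def entropy_adaptive_sft_loss(logits, targets):
    """logits: [tokens, vocab]; targets: [tokens] -> scalar loss."""
    logp = F.log_softmax(logits, dim=-1)
    token_nll = F.nll_loss(logp, targets, reduction="none")       # per-token CE
    with torch.no_grad():
        probs = logp.exp()
        entropy = -(probs * logp).sum(-1)                          # model uncertainty
        entropy = entropy / torch.log(torch.tensor(float(logits.size(-1))))  # scale to [0, 1]
        conflict = logits.argmax(-1) != targets                    # model's belief disagrees with label
        weights = torch.where(conflict, entropy, torch.ones_like(entropy))
    return (weights * token_nll).mean()

loss = entropy_adaptive_sft_loss(torch.randn(8, 100), torch.randint(0, 100, (8,)))
print(loss)
```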
HuggingFace Papers 2026-01-10
Data source: HuggingFace Papers
Latest Papers
1. GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
As language models become increasingly capable, users expect them to provide not only accurate responses but also behaviors aligned with diverse human preferences across a variety of scenarios. To achieve this, reinforcement learning (RL) pipelines have begun incorporating multiple rewards, each capturing a distinct preference, to guide models toward these desi ...
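As a hedged reading of "group reward-decoupled normalization", the sketch below normalizes each reward channel separately across a prompt's group of rollouts before combining them into a single advantage, so no one reward's scale dominates. GDPO's actual advantage formula is truncated above; the equal-weight mean of per-reward z-scores used here is an assumption.

```python
# Hedged sketch: per-reward group normalization for multi-reward RL.
# Each reward is z-scored within the rollout group before averaging, instead
# of normalizing an already-summed scalar reward.
import numpy as np

def decoupled_group_advantages(rewards, eps=1e-6):
    """rewards: [n_rollouts, n_rewards] for one prompt -> advantages [n_rollouts]."""
    rewards = np.asarray(rewards, dtype=float)
    # Normalize each reward channel separately so every preference contributes
    # on a comparable scale and a high-variance reward cannot dominate.
    z = (rewards - rewards.mean(axis=0)) / (rewards.std(axis=0) + eps)
    return z.mean(axis=1)

# Example: 4 rollouts scored on (accuracy, helpfulness, brevity).
group = [[1.0, 0.2, 0.9],
         [0.0, 0.8, 0.1],
         [1.0, 0.9, 0.5],
         [0.0, 0.1, 0.7]]
print(decoupled_group_advantages(group))
```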