HuggingFace Papers 2025-11-26
Data source: HuggingFace Papers
Latest Papers
1. General Agentic Memory Via Deep Research
Memory is critical for AI agents, yet the widely adopted static memory, which aims to create readily available memory in advance, is inevitably subject to severe information loss. To address this limitation, we propose a novel framework called general agentic memory (GAM). GAM follows the principle of "just-in-time (JIT) compilation", where it focuses on creating optimized contexts for its client at ru ...
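The JIT-compilation principle the abstract describes can be sketched in miniature: keep the raw history lossless and compile a task-specific context only when a request arrives, instead of pre-compressing everything into a static summary. The `JITMemory` class, its keyword-overlap ranking, and the `budget` parameter below are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of the "just-in-time compilation" memory principle:
# store the raw log losslessly; build an optimized, query-specific
# context on demand rather than a fixed summary in advance.

class JITMemory:
    def __init__(self):
        self.log = []  # raw, lossless interaction history

    def record(self, entry: str) -> None:
        self.log.append(entry)

    def compile_context(self, query: str, budget: int = 3) -> list[str]:
        # Rank raw entries by naive keyword overlap with the query and
        # return only the top-`budget` entries as the client's context.
        words = set(query.lower().split())
        scored = sorted(
            self.log,
            key=lambda e: len(words & set(e.lower().split())),
            reverse=True,
        )
        return scored[:budget]

mem = JITMemory()
mem.record("user prefers dark mode in the editor")
mem.record("user asked about training a reward model")
mem.record("user reported a bug in the video decoder")
ctx = mem.compile_context("how do I fix the video decoder bug?", budget=1)
print(ctx[0])  # the decoder-bug entry ranks highest for this query
```

Because compilation happens at request time, no information is discarded up front; the cost is paid per query instead.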
HuggingFace Papers 2025-11-28
Latest Papers
1. Video Generation Models Are Good Latent Reward Models
Reward feedback learning (ReFL) has proven effective for aligning image generation with human preferences. However, its extension to video generation faces significant challenges. Existing video reward models rely on vision-language models designed for pixel-space inputs, confining ReFL optimization to near-complete denoising steps after computationally expensive VAE decoding. This pixel-space approach ...
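A minimal numpy sketch of the contrast the abstract draws: scoring a latent directly versus first decoding it to pixels through a VAE. The shapes, the mock decoder, and the cosine-similarity reward are all assumptions for illustration, not the paper's models.

```python
import numpy as np

# Toy contrast between pixel-space and latent-space reward paths.
rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 8, 8))          # compact latent
W = rng.normal(size=(64, 64, 3, 4 * 8 * 8))  # mock VAE decoder weights

def decode(z):
    # Expensive step: map the compact latent to full-resolution pixels.
    return W @ z.ravel()

def reward(x, ref):
    # Simple cosine-similarity "preference" score against a reference.
    return float(
        x.ravel() @ ref.ravel()
        / (np.linalg.norm(x) * np.linalg.norm(ref) + 1e-8)
    )

# Pixel-space ReFL: must run the decoder before scoring.
pixels = decode(latent)
r_pixel = reward(pixels, decode(latent))

# Latent reward model: score the latent directly, skipping the decoder.
r_latent = reward(latent, latent)
print(round(r_latent, 3))  # 1.0: identical latents score perfectly
```

The latent path touches 256 values; the pixel path first materializes a 64x64x3 frame per scoring step, which is the cost the abstract says confines pixel-space ReFL to near-complete denoising steps.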
HuggingFace Papers 2025-11-29
Latest Papers
1. Video Generation Models Are Good Latent Reward Models (abstract identical to the 2025-11-28 entry)
HuggingFace Papers 2025-11-30
Latest Papers
1. Video Generation Models Are Good Latent Reward Models (abstract identical to the 2025-11-28 entry)
HuggingFace Papers 2025-12-01
Latest Papers
1. Video Generation Models Are Good Latent Reward Models (abstract identical to the 2025-11-28 entry)
HuggingFace Papers 2025-12-02
Latest Papers
1. Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
The landscape of high-performance image generation models is currently dominated by proprietary systems, such as Nano Banana Pro and Seedream 4.0. Leading open-source alternatives, including Qwen-Image, Hunyuan-Image-3.0, and FLUX.2, are characterized by massive parameter counts (20B to 80B), making them impractical for inference and fine-tuning on consumer-gr ...
HuggingFace Papers 2025-12-03
Latest Papers
1. From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Large language models (LLMs) have fundamentally transformed automated software development by enabling direct translation of natural language descriptions into functional code, driving commercial adoption through tools like GitHub Copilot (Microsoft), Cursor (Anysphere), Trae (ByteDance), and Claude Code (Anthropic). While the field has evolved dramatically from ...
HuggingFace Papers 2025-12-04
Latest Papers
1. DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenario ...
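The kind of complexity reduction sparse attention offers can be shown with a generic top-k sketch: each query attends only to its k highest-scoring keys, so the softmax runs over k entries instead of all n. This is an illustrative stand-in, not DeepSeek's actual DSA mechanism, whose details are not given in this snippet.

```python
import numpy as np

def topk_sparse_attention(q, k, v, topk=4):
    """Generic top-k sparse attention: each query keeps only its `topk`
    highest-scoring keys, so the softmax and value mix are over topk
    entries rather than all n keys."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n_q, n_k)
    idx = np.argpartition(scores, -topk, axis=-1)[:, -topk:]
    out = np.zeros_like(q)
    for i in range(q.shape[0]):
        s = scores[i, idx[i]]
        w = np.exp(s - s.max())                        # stable softmax
        w /= w.sum()
        out[i] = w @ v[idx[i]]                         # mix selected values
    return out

rng = np.random.default_rng(1)
q = rng.normal(size=(6, 16))
k = rng.normal(size=(6, 16))
v = rng.normal(size=(6, 16))
sparse = topk_sparse_attention(q, k, v, topk=6)  # topk = n: reduces to dense
print(sparse.shape)  # (6, 16)
```

With `topk` equal to the sequence length the result matches dense attention exactly; shrinking `topk` is what trades attention coverage for compute in long-context settings.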
HuggingFace Papers 2025-12-05
Latest Papers
1. Qwen3-VL Technical Report
We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family includes both dense (2B/4B/8B/32B) and mixture-of-experts (30B-A3B/235B-A22B) variants to accommodate diverse latency-quality trade-offs ...
HuggingFace Papers 2025-12-07
Latest Papers
1. Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Existing diffusion-based video generation methods are fundamentally constrained by sequential computation and long-horizon inconsistency, limiting their practical adoption in real-time, streaming audio-driven avatar synthesis. We present Live Avatar, an algorithm-system co-designed framework that enables efficient, high-fidelity, and infinite-length avatar generation usin ...
HuggingFace Papers 2025-12-08
Latest Papers
1. Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length (abstract identical to the 2025-12-07 entry)
HuggingFace Papers 2025-12-09
Latest Papers
1. TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built upon multi-step frameworks like diffusion and flow matching, which inherently limits their inference efficiency (requiring 40-100 function evaluations (NFEs)). While vari ...
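The NFE cost the abstract refers to can be illustrated with a toy Euler sampler: a multi-step flow-matching sampler calls the velocity network once per step, while one-step generation calls it once. The velocity field and step counts below are illustrative assumptions, not TwinFlow's method.

```python
import numpy as np

# Count network function evaluations (NFEs) for a toy flow sampler.
calls = {"n": 0}

def velocity(x, t):
    # Stand-in for the learned velocity network; each call is one NFE.
    calls["n"] += 1
    target = np.ones_like(x)
    return (target - x) / max(1.0 - t, 1e-6)  # pull x toward the target

def euler_sample(x0, steps):
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x

x0 = np.zeros(3)
calls["n"] = 0
x_multi = euler_sample(x0, steps=50)
nfe_multi = calls["n"]              # 50 NFEs for a 50-step sampler

calls["n"] = 0
x_one = euler_sample(x0, steps=1)
nfe_one = calls["n"]                # 1 NFE for one-step generation
print(nfe_multi, nfe_one)  # 50 1
```

Both samplers reach the same target here because the toy field is exactly integrable; for real models, collapsing 40-100 NFEs to one while preserving quality is the hard part that one-step methods address.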
HuggingFace Papers 2025-12-10
Latest Papers
1. Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. NPR transforms the model from sequential emulation to native parallel cognition through three key innovations: 1) a self-distilled progressive training paradigm that transitions from "cold-start" ...
HuggingFace Papers 2025-12-11
Latest Papers
1. Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
We present Wan-Move, a simple and scalable framework that brings motion control to video generative models. Existing motion-controllable methods typically suffer from coarse control granularity and limited scalability, leaving their outputs insufficient for practical use. We narrow this gap by achieving precise and high-quality motion control. Our core idea is to directly make t ...
HuggingFace Papers 2025-12-12
Latest Papers
1. StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation
The growing adoption of XR devices has fueled strong demand for high-quality stereo video, yet its production remains costly and artifact-prone. To address this challenge, we present StereoWorld, an end-to-end framework that repurposes a pretrained video generator for high-fidelity monocular-to-stereo video generation. Our framework jointly conditions the model on the monocular video input w ...