37.2° Blog

HuggingFace Papers 2026-03-25

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models. Video-based world models have emerged along two dominant paradigms: video generation and 3D reconstruction. However, existing evaluation benchmarks either focus narrowly on visual fidelity and text-video alignment for generative models, or rely on static 3D reconstruction metrics that fundamentally neglect temporal dynamics. We argue that the future of world modeling ...

HuggingFace Papers 2026-03-28

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. PixelSmile: Toward Fine-Grained Facial Expression Editing. Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off between expression editing and identity preservation. We propose PixelSmile, a dif ...

HuggingFace Papers 2026-03-29

Created2019-06-18|AI

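Data source: HuggingFace Papers, Latest Papers. 1. PixelSmile: Toward Fine-Grained Facial Expression Editing. LLM Analysis. Q: What problem does this paper try to solve? The paper addresses core challenges in fine-grained facial expression editing, specifically the following key problems: 1. Structural confusion caused by semantic overlap. Facial expressions lie on a continuous semantic manifold and inherently overlap (for example, fear and surprise share "wide-open eyes, open mouth" features, while anger and disgust share "frowning, negative affect" features). Existing training methods based on discrete class labels (one-hot labels) force continuous expressions into rigid boundaries, with these consequences: generative models learn entangled representations in latent space; editing one emotion unintentionally triggers features of another (for example, surprise features leaking in when editing fear); and human annotators, classifiers, and generative models all show systematic cross-category confusion. 2. Lack of continuous, fine-grained expression control. Existing methods rely mainly on discrete labels or coarse reference signals and cannot capture the subtle structure of human emotion, with these consequences: expression intensity cannot be precisely controlled along a continuous range; smooth, linear transitions between semantically adjacent expressions are hard to achieve; and large-intensity edits are prone to identity drift (identi ...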

HuggingFace Papers 2026-03-30

Created2019-06-18|AI

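Data source: HuggingFace Papers, Latest Papers. 1. PixelSmile: Toward Fine-Grained Facial Expression Editing. LLM Analysis. Q: What problem does this paper try to solve? The paper addresses core challenges in fine-grained facial expression editing, specifically the following key problems: 1. Structural confusion caused by semantic overlap. Facial expressions lie on a continuous semantic manifold and inherently overlap (for example, fear and surprise share "wide-open eyes, open mouth" features, while anger and disgust share "frowning, negative affect" features). Existing training methods based on discrete class labels (one-hot labels) force continuous expressions into rigid boundaries, with these consequences: generative models learn entangled representations in latent space; editing one emotion unintentionally triggers features of another (for example, surprise features leaking in when editing fear); and human annotators, classifiers, and generative models all show systematic cross-category confusion. 2. Lack of continuous, fine-grained expression control. Existing methods rely mainly on discrete labels or coarse reference signals and cannot capture the subtle structure of human emotion, with these consequences: expression intensity cannot be precisely controlled along a continuous range; smooth, linear transitions between semantically adjacent expressions are hard to achieve; and large-intensity edits are prone to identity drift (identi ...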

HuggingFace Papers 2026-03-31

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models. Video world models have shown immense potential in simulating the physical world, yet existing memory mechanisms primarily treat environments as static canvases. When dynamic subjects hide out of sight and later re-emerge, current methods often struggle, leading to frozen, distorted, or vanishing subjects. To address this, we introduce Hybrid Memory, a novel paradigm requiring ...

HuggingFace Papers 2026-04-04

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models. Data-centric training has emerged as a promising direction for improving large language models (LLMs) by optimizing not only model parameters but also the selection, composition, and weighting of training data during optimization. However, existing approaches to data selection, data mixture optimization, and data reweighting are often developed in isolated codebases w ...

HuggingFace Papers 2026-04-06

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. A Simple Baseline for Streaming Video Understanding. Recent streaming video understanding methods increasingly rely on complex memory mechanisms to handle long video streams. We challenge this trend with a simple finding: a sliding-window baseline that feeds only the most recent N frames to an off-the-shelf VLM already matches or surpasses published streaming models. We formalize this baseline as SimpleStream and evaluate it against 13 major offline and on ...
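The sliding-window idea in the excerpt above (keep only the most recent N frames and hand them to a model) can be sketched roughly as follows. This is a minimal illustration, not the SimpleStream implementation: `sliding_windows`, `answer_stream`, and `fake_vlm` are hypothetical names, frames are represented as plain strings, and the model call is a stand-in.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List


def sliding_windows(frames: Iterable[str], n: int) -> Iterator[List[str]]:
    """Yield, after each new frame, the most recent n frames seen so far."""
    if n <= 0:
        raise ValueError("n must be positive")
    window: Deque[str] = deque(maxlen=n)  # older frames are dropped automatically
    for frame in frames:
        window.append(frame)
        yield list(window)


def answer_stream(
    frames: Iterable[str], n: int, query_vlm: Callable[[List[str]], str]
) -> List[str]:
    """Call a (hypothetical) VLM on each window; no other memory is kept."""
    return [query_vlm(window) for window in sliding_windows(frames, n)]


def fake_vlm(window: List[str]) -> str:
    # Stand-in for a real model call (assumption, for illustration only).
    return f"saw {len(window)} frames, latest={window[-1]}"


answers = answer_stream(["f0", "f1", "f2", "f3"], n=2, query_vlm=fake_vlm)
```

Using `deque(maxlen=n)` keeps the memory footprint bounded by N regardless of stream length, which is the property that makes such a baseline simple.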

HuggingFace Papers 2026-04-07

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. Self-Distilled RLVR. On-policy distillation (OPD) has become a popular training paradigm in the LLM community. This paradigm selects a larger model as the teacher to provide dense, fine-grained signals for each sampled trajectory, in contrast to reinforcement learning with verifiable rewards (RLVR), which only obtains sparse signals from verifiable outcomes in the environment. Recently, the community has explored on-policy self-distillation (OPSD), where t ...

HuggingFace Papers 2026-04-09

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding. With the rapid advancement of video understanding, existing benchmarks are becoming increasingly saturated, exposing a critical discrepancy between inflated leaderboard scores and real-world model capabilities. To address this widening gap, we introduce Video-MME-v2, a comprehensive benchmark designed to rigorously evaluate the robustness and faithfulness of video und ...

HuggingFace Papers 2026-04-14

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. WildDet3D: Scaling Promptable 3D Detection in the Wild. Understanding objects in 3D from a single image is a cornerstone of spatial intelligence. A key step toward this goal is monocular 3D object detection: recovering the extent, location, and orientation of objects from an input RGB image. To be practical in the open world, such a detector must generalize beyond closed-set categories, support diverse prompt modalities, and leverage geometric cues when ava ...

HuggingFace Papers 2026-04-15

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation. Large Language Models (LLMs) are increasingly used for code generation, yet quantum code generation is still evaluated mostly within single frameworks, making it difficult to separate quantum reasoning from framework familiarity. We introduce QuanBench+, a unified benchmark spanning Qiskit, PennyLane, and Cirq, with 42 aligned tasks covering quantum algorithms, gate deco ...

HuggingFace Papers 2026-04-16

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents. GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training su ...

HuggingFace Papers 2026-04-17

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. Seedance 2.0: Advancing Video Generation for World Complexity. Seedance 2.0 is a new native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecessors, Seedance 1.0 and 1.5 Pro, Seedance 2.0 adopts a unified, highly efficient, and large-scale architecture for multi-modal audio-video joint generation. This allows it to support four input modalities: text, image, audio, and video, by integrat ...

HuggingFace Papers 2026-04-21

Created2019-06-18|AI

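Data source: HuggingFace Papers, Latest Papers. 1. Elucidating the SNR-t Bias of Diffusion Probabilistic Models. LLM Analysis. Q: What problem does this paper try to solve? The paper addresses the **Signal-to-Noise Ratio-timestep bias (SNR-t bias) in Diffusion Probabilistic Models (DPMs)**. Specifically, the problem breaks down into the following aspects: 1. Core problem: definition of the SNR-t bias. Training phase: a sample's signal-to-noise ratio (SNR) is strictly coupled to its timestep t, that is, SNR(t) = α_t / (1 - α_t). The neural network ε_θ(·, t) learns to denoise under this deterministic correspondence. Inference phase: because the network's prediction errors and the numerical solver's discretization errors accumulate, the reverse denoising trajectory deviates from the ideal path, so the actual SNR of the predicted sample x_t no longer matches the preset timestep t. 2. Concrete manifestations of the bias. Inaccurate network predictions: when the SNR of the input sample and the current ...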
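To make the training-time coupling SNR(t) = α_t / (1 - α_t) from the excerpt above concrete, here is a small sketch that tabulates SNR over timesteps. It rests on assumptions not stated in the excerpt: α_t is taken to be the cumulative signal coefficient (the running product of 1 - β_s), and a linear β schedule with commonly used default endpoints is chosen purely for illustration. The function names are hypothetical.

```python
from typing import List


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear noise schedule (an assumption; the excerpt does not name a schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_t as the running product of (1 - beta_s) for s <= t (assumed meaning)."""
    alphas, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alphas.append(prod)
    return alphas


def snr(alpha_t: float) -> float:
    """SNR(t) = alpha_t / (1 - alpha_t), the formula quoted in the excerpt."""
    return alpha_t / (1.0 - alpha_t)


alphas = cumulative_alphas(linear_betas(1000))
snrs = [snr(a) for a in alphas]  # one SNR value per timestep
```

Under these assumptions each timestep maps to exactly one SNR value and the values decrease as t grows, which is the deterministic correspondence the network sees during training.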

HuggingFace Papers 2026-04-22

Created2019-06-18|AI

Data source: HuggingFace Papers, Latest Papers. 1. Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation. Few-step generation has been a long-standing goal, with recent one-step generation methods exemplified by MeanFlow achieving remarkable results. Existing research on MeanFlow primarily focuses on class-to-image generation. However, an intuitive yet unexplored direction is to extend the condition from fixed class labels to flexible text inputs, enabling ric ...