37.2° Blog

HuggingFace Papers 2026-04-23

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items. Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-world demands. We present Tstars-Tryon 1.0, a commercial-scale virtual try-on system that is robust, realistic, versatile, and highly efficient. First, our system maintains a high success rate across challenging cases l ...

HuggingFace Papers 2026-04-24

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model. We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. By discretizing continuous visual inputs via SigLIP-VQ, the model ena ...

HuggingFace Papers 2026-04-25

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics. Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoni ...

HuggingFace Papers 2026-04-26

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics. Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoni ...

HuggingFace Papers 2026-04-27

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics. Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoni ...

HuggingFace Papers 2026-04-28

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond. As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world model carries different meanings across research communities. We intro ...

HuggingFace Papers 2026-04-29

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. World-R1: Reinforcing 3D Constraints for Text-to-Video Generation. Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns video generation with 3D constraints through reinforcement learning. To fa ...

HuggingFace Papers 2026-04-30

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents. Abstract: We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, GUIs. GLM-5V-Turbo is built around th ...

HuggingFace Papers 2026-05-01

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents. Abstract: We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, GUIs. GLM-5V-Turbo is built around th ...

HuggingFace Papers 2026-05-05

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors. Abstract: Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We present UniVidX, a unified multimodal framework that leverage ...

HuggingFace Papers 2026-05-06

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. MolmoAct2: Action Reasoning Models for Real-world Deployment. Abstract: Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today’s systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency for their grounding, and fine-tuned success rates remain below the thre ...

HuggingFace Papers 2026-05-12

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs. Abstract: Following the recent achievement of gold-medal performance on the IMO by frontier LLMs, the community is searching for the next meaningful and challenging target for measuring LLM reasoning. Whereas olympiad-style problems measure step-by-step reasoning alone, research-level problems use such reasoning to advance the frontier of mathematical knowledg ...

HuggingFace Papers 2026-05-13

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. Qwen-Image-2.0 Technical Report. Abstract: We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution photorealism, robust instruction following, and efficient deployment, especially in text-rich and compositionally complex ...

HuggingFace Papers 2026-05-19

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence. Abstract: Multimodal Large Language Models (MLLMs) have significantly advanced document understanding, yet current Doc-VQA evaluations score only the final answer and leave the supporting evidence unchecked. This answer-only approach masks a critical failure mode: a model can land on the correct answer while grounding it in the wrong passage, a critical risk in high-stakes do ...

HuggingFace Papers 2026-05-27

Created 2019-06-18 | AI

Data source: HuggingFace Papers. Latest Papers: 1. DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning. Abstract: Reinforcement Learning has become a standard paradigm for aligning Large Language Models with human intent and task requirements. While Group Relative Policy Optimization offers an efficient, value-model-free alternative to Proximal Policy Optimization, adapting it to real-world multi-reward settings remains challenging. Standard scalarization practices ...