Articles
550
Tags
23
Categories
15

Home
Content
  • Paper
  • LLMs
  • Jupyter
  • Algorithm
  • PLs
Daily
  • Github
  • HotNews
  • HF
  • Arxiv
Archives
Categories
About
37.2° Blog
HuggingFace Papers 2026-04-07
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Self-Distilled RLVR. On-policy distillation (OPD) has become a popular training paradigm in the LLM community. This paradigm selects a larger model as the teacher to provide dense, fine-grained signals for each sampled trajectory, in contrast to reinforcement learning with verifiable rewards (RLVR), which only obtains sparse signals from verifiable outcomes in the environment. Recently, the community has explored on-policy self-distillation (OPSD), where t ...
HuggingFace Papers 2026-04-09
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding. With the rapid advancement of video understanding, existing benchmarks are becoming increasingly saturated, exposing a critical discrepancy between inflated leaderboard scores and real-world model capabilities. To address this widening gap, we introduce Video-MME-v2, a comprehensive benchmark designed to rigorously evaluate the robustness and faithfulness of video und ...
HuggingFace Papers 2026-04-14
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. WildDet3D: Scaling Promptable 3D Detection in the Wild. Understanding objects in 3D from a single image is a cornerstone of spatial intelligence. A key step toward this goal is monocular 3D object detection: recovering the extent, location, and orientation of objects from an input RGB image. To be practical in the open world, such a detector must generalize beyond closed-set categories, support diverse prompt modalities, and leverage geometric cues when ava ...
HuggingFace Papers 2026-04-15
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation. Large Language Models (LLMs) are increasingly used for code generation, yet quantum code generation is still evaluated mostly within single frameworks, making it difficult to separate quantum reasoning from framework familiarity. We introduce QuanBench+, a unified benchmark spanning Qiskit, PennyLane, and Cirq, with 42 aligned tasks covering quantum algorithms, gate deco ...
HuggingFace Papers 2026-04-16
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents. GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training su ...
HuggingFace Papers 2026-04-17
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Seedance 2.0: Advancing Video Generation for World Complexity. Seedance 2.0 is a new native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecessors, Seedance 1.0 and 1.5 Pro, Seedance 2.0 adopts a unified, highly efficient, and large-scale architecture for multi-modal audio-video joint generation. This allows it to support four input modalities: text, image, audio, and video, by integrat ...
HuggingFace Papers 2026-04-21
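Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Elucidating the SNR-t Bias of Diffusion Probabilistic Models. LLM Analysis. Q: What problem does this paper try to solve? This paper aims to address the **signal-to-noise ratio-timestep bias (SNR-t bias)** in Diffusion Probabilistic Models (DPMs). Specifically, the problem breaks down into the following aspects: 1. Core problem: definition of the SNR-t bias. Training stage: a sample's signal-to-noise ratio (SNR) is strictly coupled to its timestep t, i.e. SNR(t) = α_t / (1 - α_t). The neural network ε_θ(·, t) learns to denoise under this deterministic correspondence. Inference stage: because the network's prediction errors and the numerical solver's discretization errors accumulate, the reverse denoising trajectory drifts from the ideal path, so the actual SNR of the predicted sample x_t no longer matches the preset timestep t. 2. How the bias manifests. Inaccurate network predictions: when the input sample's SNR and the current ...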
HuggingFace Papers 2026-04-22
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation. Few-step generation has been a long-standing goal, with recent one-step generation methods exemplified by MeanFlow achieving remarkable results. Existing research on MeanFlow primarily focuses on class-to-image generation. However, an intuitive yet unexplored direction is to extend the condition from fixed class labels to flexible text inputs, enabling ric ...
HuggingFace Papers 2026-04-23
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items. Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-world demands. We present Tstars-Tryon 1.0, a commercial-scale virtual try-on system that is robust, realistic, versatile, and highly efficient. First, our system maintains a high success rate across challenging cases l ...
HuggingFace Papers 2026-04-24
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model. We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. By discretizing continuous visual inputs via SigLIP-VQ, the model ena ...
HuggingFace Papers 2026-04-25
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics. Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoni ...
HuggingFace Papers 2026-04-26
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics. Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoni ...
HuggingFace Papers 2026-04-27
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics. Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoni ...
HuggingFace Papers 2026-04-28
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond. As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world model carries different meanings across research communities. We intro ...
HuggingFace Papers 2026-04-30
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents. Abstract: We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, GUIs. GLM-5V-Turbo is built around th ...
Firefly
A firefly flying freely in the AI domain.
Announcement
Welcome to my personal blog!
If the site does not load, please visit the Gitee mirror.
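Recent Posts
Retrieval-Augmented LLMs (2024-01-13)
LLMs Open Course - 6. Large Models for Text Understanding and Generation (2024-01-10)
LLMs Open Course - 5. Efficient Training & Model Compression (2024-01-07)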
Categories
  • AI (209)
  • Cython (1)
  • DSA (24)
  • GitHub (125)
  • HotNews (125)
Tags
DSA, RL, Transformer, LLMs, PaperReading, DeepLearning, CV, GPT, PL, domain, github, hf, hot_news, ArXiv, Domain, AI, GitHub, Trending, HuggingFace, Papers, HotNews, leetcode, algo
Archives
  • January 2024 (5)
  • December 2023 (14)
  • November 2023 (26)
  • October 2023 (1)
  • September 2023 (4)
©2023 - 2026 By Firefly