Articles
405
Tags
23
Categories
15

Home
Content
  • Paper
  • LLMs
  • Jupyter
  • Algorithm
  • PLs
Daily
  • Github
  • HotNews
  • HF
  • Arxiv
Archives
Categories
About
37.2° Blog
ArXiv Domain 2026-03-20
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Unified Spatio-Temporal Token Scoring for Efficient Video VLMs. Token pruning is essential for enhancing the computational efficiency of vision-language models (VLMs), particularly for video-based tasks where temporal redundancy is prevalent. Prior approaches typically prune tokens either (1) within the vision transformer (ViT) exclusively for unimodal perception tasks such as action recognition and object segmentation, without adapting to downstream vision- ...
ArXiv Domain 2026-03-21
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. NavTrust: Benchmarking Trustworthiness for Embodied Navigation. There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world setting ...
HuggingFace Papers 2026-02-06
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. ERNIE 5.0 Technical Report. In this report, we introduce ERNIE 5.0, a natively autoregressive foundation model designed for unified multimodal understanding and generation across text, image, video, and audio. All modalities are trained from scratch under a unified next-group-of-tokens prediction objective, based on an ultra-sparse mixture-of-experts (MoE) architecture with modality-agnostic expert routing. To address practical challenges in large-scale de ...
HuggingFace Papers 2026-02-15
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies. The emergence of multi-agent systems built from large language models (LLMs) offers a promising paradigm for scalable collective intelligence and self-evolution. Ideally, such systems would achieve continuous self-improvement in a fully closed loop while maintaining robust safety alignment, a combination we term the self-evolution trilemma. However, we demonstrate ...
HuggingFace Papers 2026-02-22
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning. Many training-free sparse attention methods are effective for accelerating diffusion models. Recently, several works suggest that making sparse attention trainable can further increase sparsity while preserving generation quality. We study three key questions: (1) when do the two common masking rules, i.e., Top-k and Top-p, fail, and how can we avoid t ...
HuggingFace Papers 2026-02-23
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning. Many training-free sparse attention methods are effective for accelerating diffusion models. Recently, several works suggest that making sparse attention trainable can further increase sparsity while preserving generation quality. We study three key questions: (1) when do the two common masking rules, i.e., Top-k and Top-p, fail, and how can we avoid t ...
HuggingFace Papers 2026-02-28
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. The Trinity of Consistency as a Defining Principle for General World Models. The construction of World Models capable of learning, simulating, and reasoning about objective physical laws constitutes a foundational challenge in the pursuit of Artificial General Intelligence. Recent advancements represented by video generation models like Sora have demonstrated the potential of data-driven scaling laws to approximate physical dynamics, while the emerging Uni ...
HuggingFace Papers 2026-03-01
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. The Trinity of Consistency as a Defining Principle for General World Models. The construction of World Models capable of learning, simulating, and reasoning about objective physical laws constitutes a foundational challenge in the pursuit of Artificial General Intelligence. Recent advancements represented by video generation models like Sora have demonstrated the potential of data-driven scaling laws to approximate physical dynamics, while the emerging Uni ...
HuggingFace Papers 2026-03-02
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. The Trinity of Consistency as a Defining Principle for General World Models. The construction of World Models capable of learning, simulating, and reasoning about objective physical laws constitutes a foundational challenge in the pursuit of Artificial General Intelligence. Recent advancements represented by video generation models like Sora have demonstrated the potential of data-driven scaling laws to approximate physical dynamics, while the emerging Uni ...
HuggingFace Papers 2026-03-03
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. dLLM: Simple Diffusion Language Modeling. Although diffusion language models (DLMs) are evolving quickly, many recent models converge on a set of shared components. These components, however, are distributed across ad-hoc research codebases or lack transparent implementations, making them difficult to reproduce or extend. As the field accelerates, there is a clear need for a unified framework that standardizes these common components while remaining flexib ...
HuggingFace Papers 2026-03-06
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Helios: Real Real-Time Long Video Generation Model. We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline. We make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics such as self-forcing, error-banks, or keyframe sampling; (2) real-time generati ...
HuggingFace Papers 2026-03-07
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier. While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training $P(h|b)$ is mathematically intractable due to the ...
HuggingFace Papers 2026-03-08
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier. While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training $P(h|b)$ is mathematically intractable due to the ...
HuggingFace Papers 2026-03-09
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier. While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training $P(h|b)$ is mathematically intractable due to the ...
HuggingFace Papers 2026-03-14
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training. Humans perceive and understand real-world spaces through a stream of visual observations. Therefore, the ability to streamingly maintain and update spatial evidence from potentially unbounded video streams is essential for spatial intelligence. The core challenge is not simply longer context windows but how spatial information is selected, organized, and retained over time. I ...
Firefly
A firefly flying freely in the AI domain.
Announcement
Welcome to My Personal Blog!
If Not, Please Visit Gitee Mirror.
Recent Post
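Retrieval-Augmented LLM (2024-01-13)
LLMs Open Course - 6. Large Models for Text Understanding and Generation (2024-01-10)
LLMs Open Course - 5. Efficient Training & Model Compression (2024-01-07)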
Categories
  • AI (152)
  • Cython (1)
  • DSA (24)
  • GitHub (81)
  • HotNews (81)
Tags
DSA, RL, Transformer, LLMs, PaperReading, DeepLearning, CV, GPT, PL, domain, github, hf, hot_news, GitHubTrending, HuggingFacePapers, AIHotNews, leetcode, algo, ArXivDomain
Archives
  • January 2024 (5)
  • December 2023 (14)
  • November 2023 (26)
  • October 2023 (1)
  • September 2023 (4)
Info
Article: 405
Total Count: 22299.6k
©2023 - 2026 By Firefly