HuggingFace Papers 2026-02-11
Data source: HuggingFace Papers
Latest Papers
1. QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
Financial markets are noisy and non-stationary, making alpha mining highly sensitive to noise in backtesting results and sudden market regime shifts. While recent agentic frameworks improve alpha mining automation, they often lack controllable multi-round search and reliable reuse of validated experience. To address these challenges, we propose QuantaAlpha, an evolutionary alpha mining fra ...
HuggingFace Papers 2026-02-12
Data source: HuggingFace Papers
Latest Papers
1. OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
As high-quality public text approaches exhaustion, a phenomenon known as the Data Wall, pre-training is shifting from more tokens to better tokens. However, existing methods either rely on heuristic static filters that ignore training dynamics, or use dynamic yet optimizer-agnostic criteria based on raw gradients. We propose OPUS (Optimizer-induce ...
HuggingFace Papers 2026-02-13
Data source: HuggingFace Papers
Latest Papers
1. Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 ...
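The abstract's pairing of a 196B-parameter foundation with only 11B active parameters rests on top-k expert routing: a router scores all experts per token, but only the k highest-scoring experts actually run. A minimal sketch of that mechanism, with toy single-matrix "experts" and illustrative names (not the model's actual code):

```python
import numpy as np

def topk_moe_layer(x, gate_w, expert_ws, k=2):
    """Sparse MoE routing sketch: only the top-k experts run per token.

    x: (d,) token hidden state; gate_w: (n_experts, d) router weights;
    expert_ws: list of (d, d) toy expert weight matrices.
    """
    logits = gate_w @ x                        # router score per expert
    topk = np.argsort(logits)[-k:]             # indices of the k best experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                       # softmax over selected experts only
    # Weighted sum of the k active experts; the rest are skipped entirely,
    # so compute scales with k, not with n_experts.
    return sum(p * (expert_ws[i] @ x) for p, i in zip(probs, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_w = rng.standard_normal((n_experts, d))
expert_ws = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = topk_moe_layer(x, gate_w, expert_ws, k=2)  # 2 of 16 experts active
```

With k=2 of 16 experts, the layer touches roughly an eighth of its parameters per token, which is the same total-vs-active asymmetry the abstract describes at much larger scale.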
HuggingFace Papers 2026-02-14
Data source: HuggingFace Papers
Latest Papers
1. The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies
The emergence of multi-agent systems built from large language models (LLMs) offers a promising paradigm for scalable collective intelligence and self-evolution. Ideally, such systems would achieve continuous self-improvement in a fully closed loop while maintaining robust safety alignment—a combination we term the self-evolution trilemma. However, we demonstrate ...
HuggingFace Papers 2026-02-16
Data source: HuggingFace Papers
Latest Papers
1. The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies
The emergence of multi-agent systems built from large language models (LLMs) offers a promising paradigm for scalable collective intelligence and self-evolution. Ideally, such systems would achieve continuous self-improvement in a fully closed loop while maintaining robust safety alignment—a combination we term the self-evolution trilemma. However, we demonstrate ...
HuggingFace Papers 2026-02-17
Data source: HuggingFace Papers
Latest Papers
1. Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs
The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approaches to constructing post-training data quantify diversity using text-based metrics that capture linguistic variation, but such metrics provide only weak signals for the task-relevant features that determine downstream performance. In this work, we intro ...
HuggingFace Papers 2026-02-18
Data source: HuggingFace Papers
Latest Papers
1. Experiential Reinforcement Learning
Reinforcement learning has become the central approach for language models (LMs) to learn from environmental reward or feedback. In practice, the environmental feedback is usually sparse and delayed. Learning from such signals is challenging, as LMs must implicitly infer how observed failures should translate into behavioral changes for future iterations. We introduce Experiential Reinforcement Learning (ERL), a trainin ...
HuggingFace Papers 2026-02-19
Data source: HuggingFace Papers
Latest Papers
1. Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?
Sparse Autoencoders (SAEs) have emerged as a promising tool for interpreting neural networks by decomposing their activations into sparse sets of human-interpretable features. Recent work has introduced multiple SAE variants and successfully scaled them to frontier models. Despite much excitement, a growing number of negative results in downstream tasks casts doubt on whether SAEs recov ...
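The decomposition the abstract refers to can be sketched in a few lines: an SAE maps an activation vector to an overcomplete, ReLU-sparse feature code and linearly reconstructs the input from it, with an L1 term encouraging sparsity during training. A minimal illustrative forward pass (generic SAE form, not any specific paper's code):

```python
import numpy as np

def sae_forward(act, W_enc, b_enc, W_dec, b_dec):
    """Minimal sparse autoencoder forward pass.

    act: (d,) model activation; W_enc: (m, d); W_dec: (d, m) with m >> d,
    so the feature code is overcomplete relative to the activation.
    """
    f = np.maximum(0.0, W_enc @ act + b_enc)   # ReLU-sparse feature activations
    recon = W_dec @ f + b_dec                  # linear reconstruction of the input
    l1 = np.abs(f).sum()                       # sparsity penalty (used in training)
    return f, recon, l1

rng = np.random.default_rng(1)
d, m = 4, 12                                   # 3x overcomplete feature dictionary
f, recon, l1 = sae_forward(rng.standard_normal(d),
                           rng.standard_normal((m, d)), np.zeros(m),
                           rng.standard_normal((d, m)), np.zeros(d))
```

The interpretability claim is that individual columns of `W_dec` correspond to human-interpretable directions; the paper's sanity checks ask whether trained dictionaries of this form actually beat random baselines on that claim.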
HuggingFace Papers 2026-02-20
Data source: HuggingFace Papers
Latest Papers
1. SLA2: Sparse-Linear Attention with Learnable Routing and QAT
Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generation. However, (i) SLA relies on a heuristic split that assigns computations to the sparse or linear branch based on attention-weight magnitude, which can be suboptimal. Additionally, (ii) after formally analyzing the attention error in SLA, we identif ...
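The heuristic split the abstract criticizes can be made concrete: per query, the largest attention weights go to an exact sparse branch while the remaining low-magnitude mass is handled by a cheap linear-attention branch. A sketch of that magnitude-based routing (assumed semantics for illustration, not SLA's actual implementation):

```python
import numpy as np

def split_attention_weights(attn, keep_frac=0.25):
    """Magnitude-based sparse/linear split sketch.

    attn: (q, k) post-softmax attention weights. Returns two boolean masks
    partitioning the positions: sparse branch (largest weights per query row)
    and linear branch (everything else).
    """
    q, k = attn.shape
    n_keep = max(1, int(keep_frac * k))
    # Per query row, indices of the n_keep largest weights -> sparse branch.
    idx = np.argsort(attn, axis=1)[:, -n_keep:]
    sparse_mask = np.zeros_like(attn, dtype=bool)
    np.put_along_axis(sparse_mask, idx, True, axis=1)
    return sparse_mask, ~sparse_mask           # sparse branch, linear branch

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 8))
attn = np.exp(logits); attn /= attn.sum(axis=1, keepdims=True)
sparse_mask, linear_mask = split_attention_weights(attn, keep_frac=0.25)
```

Because the threshold is a fixed fraction per row regardless of how the attention mass is actually distributed, this split can be suboptimal — which is the motivation for SLA2's learnable routing.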
HuggingFace Papers 2026-02-21
Data source: HuggingFace Papers
Latest Papers
1. SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning
Many training-free sparse attention methods are effective for accelerating diffusion models. Recently, several works suggest that making sparse attention trainable can further increase sparsity while preserving generation quality. We study three key questions: (1) when do the two common masking rules, i.e., Top-k and Top-p, fail, and how can we avoid t ...
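The two masking rules the abstract contrasts are easy to state: Top-k keeps a fixed number of the largest attention weights per row, while Top-p keeps the smallest set whose cumulative probability reaches a threshold. A hybrid mask that unions the two (assumed combination semantics for illustration; the paper's exact rule may differ):

```python
import numpy as np

def hybrid_topk_topp_mask(weights, k, p):
    """Keep a position if it is in the per-row top-k OR inside the smallest
    prefix (by descending weight) whose cumulative mass reaches p.

    weights: (q, n) rows summing to 1 (post-softmax attention).
    """
    order = np.argsort(weights, axis=1)[:, ::-1]        # descending per row
    sorted_w = np.take_along_axis(weights, order, axis=1)
    # Top-p: keep entries until cumulative mass reaches p (inclusive of the
    # entry that crosses the threshold).
    csum = np.cumsum(sorted_w, axis=1)
    in_topp_sorted = csum - sorted_w < p
    topp_mask = np.zeros_like(weights, dtype=bool)
    np.put_along_axis(topp_mask, order, in_topp_sorted, axis=1)
    # Top-k: the k largest entries per row.
    topk_mask = np.zeros_like(weights, dtype=bool)
    np.put_along_axis(topk_mask, order[:, :k], True, axis=1)
    return topk_mask | topp_mask

w = np.array([[0.5, 0.3, 0.1, 0.1]])
mask = hybrid_topk_topp_mask(w, k=1, p=0.6)
```

The failure modes the paper studies fall out of each rule alone: Top-k over-keeps on peaked rows and under-keeps on flat ones, while Top-p does the reverse, so combining them hedges both cases.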
HuggingFace Papers 2026-02-24
Data source: HuggingFace Papers
Latest Papers
1. VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Training stability remains a central challenge in reinforcement learning (RL) for large language models (LLMs). Policy staleness, asynchronous training, and mismatches between training and inference engines all cause the behavior policy to diverge from the current policy, risking training collapse. Importance sampling provides a principled correction for this dis ...
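The importance-sampling correction the abstract mentions weights each sampled sequence by the ratio of its probability under the current policy to its probability under the behavior policy. At the sequence level that ratio is a product over tokens, so its log is a sum of per-token log-ratios, and it can explode or vanish on long sequences — the instability VESPO targets. A sketch with a crude clipping stabilizer (illustrative only; VESPO's variational objective is more involved):

```python
import numpy as np

def sequence_importance_ratio(logp_new, logp_old, clip=5.0):
    """Sequence-level importance ratio for one sampled sequence.

    logp_new, logp_old: per-token log-probs of the sequence under the
    current and behavior policies. The ratio is the product of per-token
    ratios, computed in log space and clipped for stability.
    """
    log_ratio = np.sum(logp_new) - np.sum(logp_old)  # log of the product of token ratios
    log_ratio = np.clip(log_ratio, -clip, clip)      # crude variance control
    return np.exp(log_ratio)

# When the policies agree exactly, the correction is a no-op (ratio = 1).
lp = np.log(np.full(5, 0.5))
r = sequence_importance_ratio(lp, lp)
```

Note that clipping biases the estimator; methods in this space trade that bias against the variance of the unclipped product, which grows with sequence length.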
HuggingFace Papers 2026-02-26
Data source: HuggingFace Papers
Latest Papers
1. On Data Engineering for Scaling LLM Terminal Capabilities
Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contributions: (1) Terminal-Task-Gen, a lightweight synthetic task generation pipeline that supports seed- ...
HuggingFace Papers 2026-02-27
Data source: HuggingFace Papers
Latest Papers
1. HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
Modeling long sequences of user behaviors has emerged as a critical frontier in generative recommendation. However, existing solutions face a dilemma: linear attention mechanisms achieve efficiency at the cost of retrieval precision due to limited state capacity, while softmax attention suffers from prohibitive computational overhead. To address this challen ...
HuggingFace Papers 2026-01-01
Data source: HuggingFace Papers
Latest Papers
1. UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
In this report, we introduce UltraShape 1.0, a scalable 3D diffusion framework for high-fidelity 3D geometry generation. The proposed approach adopts a two-stage generation pipeline: a coarse global structure is first synthesized and then refined to produce detailed, high-quality geometry. To support reliable 3D generation, we develop a comprehensive data processing pipeli ...
HuggingFace Papers 2026-01-02
Data source: HuggingFace Papers
Latest Papers
1. mHC: Manifold-Constrained Hyper-Connections
Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training ...
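The identity mapping property the abstract says Hyper-Connections compromise is the defining feature of a plain residual connection: output = x + f(x), so when the learned branch contributes nothing the layer passes its input through unchanged, which keeps gradients well-behaved in deep stacks. A minimal illustration (textbook residual form, not the paper's construction):

```python
import numpy as np

def residual_block(x, f):
    """Plain residual connection: output = x + f(x).

    With f(x) = 0 the block is exactly the identity map -- the property
    that widening and rewiring the residual stream can break, and that
    mHC constrains its connections to preserve.
    """
    return x + f(x)

x = np.ones(4)
# Zero branch: the block reduces to the identity.
assert np.allclose(residual_block(x, lambda v: np.zeros_like(v)), x)
```

Hyper-Connections replace the single stream `x` with several weighted streams; unless the mixing weights are constrained (mHC's "manifold constraint"), no setting of the learned branch recovers an exact identity map, which is the training-stability issue the abstract points to.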