Articles: 935
Tags: 25
Categories: 16

Home
Content
  • Paper
  • LLMs
  • Jupyter
  • Algorithm
  • PLs
Daily
  • Github
  • HotNews
  • HF
  • Arxiv
Archives
Categories
About
37.2° Blog
HuggingFace Papers 2026-01-11
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization. As language models become increasingly capable, users expect them to provide not only accurate responses but also behaviors aligned with diverse human preferences across a variety of scenarios. To achieve this, reinforcement learning (RL) pipelines have begun incorporating multiple rewards, each capturing a distinct preference, to guide models toward these desi ...
HuggingFace Papers 2026-01-13
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization. The image geolocalization task aims to predict the location where an image was taken anywhere on Earth using visual clues. Existing large vision-language model (LVLM) approaches leverage world knowledge, chain-of-thought reasoning, and agentic capabilities, but overlook a common strategy used by humans: using maps. In this work, we first equip the model Thinking with M ...
HuggingFace Papers 2026-01-12
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization. As language models become increasingly capable, users expect them to provide not only accurate responses but also behaviors aligned with diverse human preferences across a variety of scenarios. To achieve this, reinforcement learning (RL) pipelines have begun incorporating multiple rewards, each capturing a distinct preference, to guide models toward these desi ...
HuggingFace Papers 2026-01-14
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning. In real-world video question answering scenarios, videos often provide only localized visual cues, while verifiable answers are distributed across the open web; models therefore need to jointly perform cross-frame clue extraction, iterative retrieval, and multi-hop reasoning-based verification. To bridge this gap, we construct the first video deep r ...
HuggingFace Papers 2026-01-15
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences. While autonomous software engineering (SWE) agents are reshaping programming paradigms, they currently suffer from a “closed-world” limitation: they attempt to fix bugs from scratch or solely using local context, ignoring the immense historical human experience available on platforms like GitHub. Accessing this open-world experience is hindered by the unstructured and fragme ...
HuggingFace Papers 2026-01-16
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Controlled Self-Evolution for Algorithmic Code Optimization. Self-evolution methods enhance code generation through iterative “generate-verify-refine” cycles, yet existing approaches suffer from low exploration efficiency, failing to discover solutions with superior complexity within limited budgets. This inefficiency stems from initialization bias trapping evolution in poor solution regions, uncontrolled stochastic operations lacking feedback guidance, an ...
HuggingFace Papers 2026-01-17
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Urban Socio-Semantic Segmentation with Vision-Language Reasoning. As hubs of human activity, urban surfaces consist of a wealth of semantic entities. Segmenting these various entities from satellite imagery is crucial for a range of downstream applications. Current advanced segmentation models can reliably segment entities defined by physical attributes (e.g., buildings, water bodies) but still struggle with socially defined categories (e.g., schools, park ...
HuggingFace Papers 2026-01-18
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. STEP3-VL-10B Technical Report. We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language sy ...
HuggingFace Papers 2026-01-19
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. STEP3-VL-10B Technical Report. We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language sy ...
HuggingFace Papers 2026-01-21
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development. The evolution of Large Language Models (LLMs) into autonomous agents has expanded the scope of AI coding from localized code generation to complex, repository-level, and execution-driven problem solving. However, current benchmarks predominantly evaluate code logic in static contexts, neglecting the dynamic, full-process requirements of real-world engineering, particularly in backend ...
HuggingFace Papers 2026-01-20
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Your Group-Relative Advantage Is Biased. Reinforcement Learning from Verifier Rewards (RLVR) has emerged as a widely used approach for post-training large language models on reasoning tasks, with group-based methods such as GRPO and its variants gaining broad adoption. These methods rely on group-relative advantage estimation to avoid learned critics, yet its theoretical properties remain poorly understood. In this work, we uncover a fundamental issue of g ...
HuggingFace Papers 2026-01-23
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Agentic Reasoning for Large Language Models. Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in closed-world settings, they struggle in open-ended and dynamic environments. Agentic reasoning marks a paradigm shift by reframing LLMs as autonomous agents that plan, act, and learn through continual interaction. In this survey, ...
HuggingFace Papers 2026-01-22
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization. We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal “mother tongue” for physical ...
HuggingFace Papers 2026-01-24
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience. The development of native computer-use agents (CUA) represents a significant leap in multimodal AI. However, their potential is currently bottlenecked by the constraints of static data scaling. Existing paradigms relying primarily on passive imitation of static datasets struggle to capture the intricate causal dynamics inherent in long-horizon computer tasks. In this work ...
HuggingFace Papers 2026-01-25
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience. The development of native computer-use agents (CUA) represents a significant leap in multimodal AI. However, their potential is currently bottlenecked by the constraints of static data scaling. Existing paradigms relying primarily on passive imitation of static datasets struggle to capture the intricate causal dynamics inherent in long-horizon computer tasks. In this work ...
Firefly
A firefly flying freely in the AI domain.
Announcement
Welcome to My Personal Blog!
If this site does not load, please visit the Gitee mirror.
Recent Post
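Retrieval-Augmented LLM (2024-01-13)
LLMs Open Course - 6. Large Models for Text Understanding and Generation (2024-01-10)
LLMs Open Course - 5. Efficient Training & Model Compression (2024-01-07)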
Categories
  • AI (411)
  • Cython (1)
  • DSA (24)
  • GitHub (222)
  • HotNews (71)
Tags
DSA, RL, Transformer, LLMs, PaperReading, DeepLearning, CV, GPT, PL, domain, github, hf, hot_news, ArXivDomainAI, GitHubTrending, HuggingFacePapers, 微博热搜 (Weibo Hot Search), HotNews, leetcode, algo
Archives
  • January 2024 (5)
  • December 2023 (14)
  • November 2023 (26)
  • October 2023 (1)
  • September 2023 (4)
Info
Article: 935
Total Count: 46916.8k
©2023 - 2026 By Firefly