HuggingFace Papers 2025-09-14
Data source: HuggingFace Papers
Latest Papers
1. VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Vision-Language-Action (VLA) models typically bridge the gap between perceptual and action spaces by pre-training a large-scale Vision-Language Model (VLM) on robotic data. While this approach greatly enhances performance, it also incurs significant training costs. In this paper, we investigate how to effectively bridge vision-language (VL) representations to action (A). We int ...
HuggingFace Papers 2025-09-15
Data source: HuggingFace Papers
Latest Papers
1. VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Vision-Language-Action (VLA) models typically bridge the gap between perceptual and action spaces by pre-training a large-scale Vision-Language Model (VLM) on robotic data. While this approach greatly enhances performance, it also incurs significant training costs. In this paper, we investigate how to effectively bridge vision-language (VL) representations to action (A). We int ...
HuggingFace Papers 2025-09-16
Data source: HuggingFace Papers
Latest Papers
1. IntrEx: A Dataset for Modeling Engagement in Educational Conversations
Engagement and motivation are crucial for second-language acquisition, yet maintaining learner interest in educational conversations remains a challenge. While prior research has explored what makes educational texts interesting, still little is known about the linguistic features that drive engagement in conversations. To address this gap, we introduce IntrEx, the first large dataset ...
HuggingFace Papers 2025-09-17
Data source: HuggingFace Papers
Latest Papers
1. OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
The field of 4D world modeling - aiming to jointly capture spatial geometry and temporal dynamics - has witnessed remarkable progress in recent years, driven by advances in large-scale generative models and multimodal learning. However, the development of truly general 4D world models remains fundamentally constrained by the availability of high-quality data. Existing datasets and ben ...
HuggingFace Papers 2025-09-18
Data source: HuggingFace Papers
Latest Papers
1. WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
This paper tackles open-ended deep research (OEDR), a complex challenge where AI agents must synthesize vast web-scale information into insightful reports. Current approaches are plagued by dual-fold limitations: static research pipelines that decouple planning from evidence acquisition and one-shot generation paradigms that easily suffer from long-context failure ...
HuggingFace Papers 2025-09-19
Data source: HuggingFace Papers
Latest Papers
1. Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale
We present Hala, a family of Arabic-centric instruction and translation models built with our translate-and-tune pipeline. We first compress a strong AR↔EN teacher to FP8 (yielding ~2× higher throughput with no quality loss) and use it to create high-fidelity bilingual supervision. A lightweight language model LFM2-1.2B is then fine-tun ...
HuggingFace Papers 2025-09-20
Data source: HuggingFace Papers
Latest Papers
1. ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task dom ...
HuggingFace Papers 2025-09-21
Data source: HuggingFace Papers
Latest Papers
1. ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task dom ...
HuggingFace Papers 2025-09-23
Data source: HuggingFace Papers
Latest Papers
1. RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Large language models excel at function- and file-level code generation, yet generating complete repositories from scratch remains a fundamental challenge. This process demands coherent and reliable planning across proposal- and implementation-level stages, while natural language, due to its ambiguity and verbosity, is ill-suited for faithfully representing complex software stru ...
HuggingFace Papers 2025-09-22
Data source: HuggingFace Papers
Latest Papers
1. ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task dom ...
HuggingFace Papers 2025-09-24
Data source: HuggingFace Papers
Latest Papers
1. LIMI: Less is More for Agency
We define Agency as the emergent capacity of AI systems to function as autonomous agents actively discovering problems, formulating hypotheses, and executing solutions through self-directed engagement with environments and tools. This fundamental capability marks the dawn of the Age of AI Agency, driven by a critical industry shift: the urgent need for AI systems that don’t just think, but work. While current AI excels at rea ...
HuggingFace Papers 2025-09-25
Data source: HuggingFace Papers
Latest Papers
1. Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
Arabic document OCR remains a challenging task due to the language’s cursive script, diverse fonts, diacritics, and right-to-left orientation. While modern Multimodal Large Language Models (MLLMs) have advanced document understanding for high-resource languages, their performance on Arabic remains limited. In this work, we introduce Baseer, a vision-language model fine-tuned specifically ...
HuggingFace Papers 2025-08-24
Data source: HuggingFace Papers
Latest Papers
1. Intern-S1: A Scientific Multimodal Foundation Model
In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared to tho ...
HuggingFace Papers 2025-09-27
Data source: HuggingFace Papers
Latest Papers
1. VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
Policy-based reinforcement learning currently plays an important role in improving LLMs on mathematical reasoning tasks. However, existing rollout-based reinforcement learning methods (GRPO, DAPO, GSPO, etc.) fail to explicitly consider LLMs’ learning ability for samples of different difficulty levels, which is contrary to the human cognitive process of mathematical reasoning ...
HuggingFace Papers 2025-09-29
Data source: HuggingFace Papers
Latest Papers
1. VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
Policy-based reinforcement learning currently plays an important role in improving LLMs on mathematical reasoning tasks. However, existing rollout-based reinforcement learning methods (GRPO, DAPO, GSPO, etc.) fail to explicitly consider LLMs’ learning ability for samples of different difficulty levels, which is contrary to the human cognitive process of mathematical reasoning ...