Articles
472
Tags
23
Categories
15

Home
Content
  • Paper
  • LLMs
  • Jupyter
  • Algorithm
  • PLs
Daily
  • Github
  • HotNews
  • HF
  • Arxiv
Archives
Categories
About
37.2° Blog
ArXiv Domain 2026-02-11
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. Robustness Is a Function, Not a Number: A Factorized Comprehensive Study of OOD Robustness in Vision-Based Driving. Out-of-distribution (OOD) robustness in autonomous driving is often reduced to a single number, hiding what breaks a policy. We decompose environments along five axes: scene (rural/urban), season, weather, time (day/night), and agent mix, and measure performance under controlled $k$-factor perturbations ($k \in \{0,1,2,3\}$). Using closed loop co ...
ArXiv Domain 2026-02-27
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets. The reliability of multilingual Large Language Model (LLM) evaluation is currently compromised by the inconsistent quality of translated benchmarks. Existing resources often suffer from semantic drift and context loss, which can lead to misleading performance metrics. In this work, we present a fully automated framework designed to address these challenges by e ...
ArXiv Domain 2026-03-03
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science. The fast-growing demands in using Large Language Models (LLMs) to tackle complex multi-step data science tasks create an emergent need for accurate benchmarking. There are two major gaps in existing benchmarks: (i) the lack of standardized, process-aware evaluation that captures instruction adherence and process fidelity, and (ii) the scarcity of accurately labeled training dat ...
ArXiv Domain 2026-03-08
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. RoboPocket: Improve Robot Policies Instantly with Your Phone. Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators blindly collect demonstrations without knowing the underlying policy’s weaknesses, leading to inefficient coverage of critical state distributions. C ...
ArXiv Domain 2026-03-09
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. RoboPocket: Improve Robot Policies Instantly with Your Phone. Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators blindly collect demonstrations without knowing the underlying policy’s weaknesses, leading to inefficient coverage of critical state distributions. C ...
ArXiv Domain 2026-03-13
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. COMIC: Agentic Sketch Comedy Generation. We propose a fully automated AI system that produces short comedic videos similar to sketch shows such as Saturday Night Live. Starting with character references, the system employs a population of agents loosely based on real production studio roles, structured to optimize the quality and diversity of ideas and outputs through iterative competition, evaluation, and improvement. A key contribution is the introduction ...
ArXiv Domain 2026-03-22
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers. 1. NavTrust: Benchmarking Trustworthiness for Embodied Navigation. There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world setting ...
Firefly
A firefly flying freely in the AI domain.
Announcement
Welcome to My Personal Blog!
If Not, Please Visit Gitee Mirror.
Recent Post
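Retrieval-Augmented LLM 2024-01-13
LLMs Open Course - 6. Large Models for Text Understanding and Generation 2024-01-10
LLMs Open Course - 5. Efficient Training & Model Compression 2024-01-07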
Categories
  • AI (175)
  • Cython (1)
  • DSA (24)
  • GitHub (103)
  • HotNews (103)
Tags
DSA, RL, Transformer, LLMs, PL, PaperReading, DeepLearning, CV, GPT, domain, github, hf, hot_news, GitHubTrending, HuggingFacePapers, AIHotNews, leetcode, algo, ArXivDomain
Archives
  • January 2024 (5)
  • December 2023 (14)
  • November 2023 (26)
  • October 2023 (1)
  • September 2023 (4)
©2023 - 2026 By Firefly