ArXiv Domain 2026-05-03
Data source: ArXiv Domain
LLM Domain Papers
1. BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task
Abstract: We introduce a novel task of digital battery passport (DBP) conformance classification and introduce the first public benchmark for the task: BatteryPass-12K, created synthetically from real pilot samples. This is as the EU’s battery regulation on DBPs comes into effect soon and there exists no public dataset. We evaluated 22 language models (LMs) in zero-sho ...
ArXiv Domain 2026-05-04
Data source: ArXiv Domain
LLM Domain Papers
1. BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task
Abstract: We introduce a novel task of digital battery passport (DBP) conformance classification and introduce the first public benchmark for the task: BatteryPass-12K, created synthetically from real pilot samples. This is as the EU’s battery regulation on DBPs comes into effect soon and there exists no public dataset. We evaluated 22 language models (LMs) in zero-sho ...
ArXiv Domain 2026-05-13
Data source: ArXiv Domain
LLM Domain Papers
1. SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators
Abstract: We present SalesSim, a framework and testbed for evaluating the ability of Multimodal Large Language Models (MLLMs) to simulate realistic, persona-driven customer behavior in multi-turn, multi-modal, tool-augmented online retail conversations. Unlike prior work that treats user simulation as surface-level dialogue generation, SalesSim models retail interaction a ...
ArXiv Domain 2026-05-17
Data source: ArXiv Domain
LLM Domain Papers
1. Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey
Abstract: Multilingual knowledge editing (MKE) remains challenging because language-specific edits interfere with one another, even when locate-then-edit methods work well in monolingual settings. This paper focuses on three issues: the effectiveness of vector merging methods for MKE, the extent to which Task Singular Vectors for Merging (TSVM) can reduce multi ...