ArXiv Domain 2026-05-03
Data source: ArXiv Domain
LLM Domain Papers
1. BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task
Abstract: We introduce a novel task, digital battery passport (DBP) conformance classification, and the first public benchmark for it: BatteryPass-12K, created synthetically from real pilot samples. The EU’s battery regulation on DBPs comes into effect soon, and no public dataset exists. We evaluated 22 language models (LMs) in zero-sho ...
ArXiv Domain 2026-05-04
Data source: ArXiv Domain
LLM Domain Papers
1. BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task
Abstract: We introduce a novel task, digital battery passport (DBP) conformance classification, and the first public benchmark for it: BatteryPass-12K, created synthetically from real pilot samples. The EU’s battery regulation on DBPs comes into effect soon, and no public dataset exists. We evaluated 22 language models (LMs) in zero-sho ...
ArXiv Domain 2026-05-13
Data source: ArXiv Domain
LLM Domain Papers
1. SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators
Abstract: We present SalesSim, a framework and testbed for evaluating the ability of Multimodal Large Language Models (MLLMs) to simulate realistic, persona-driven customer behavior in multi-turn, multi-modal, tool-augmented online retail conversations. Unlike prior work that treats user simulation as surface-level dialogue generation, SalesSim models retail interaction a ...