ArXiv Domain 2026-05-24
Data source: ArXiv Domain
LLM Domain Papers
1. CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
Abstract: Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. While such approaches may reduce immediate policy violations, they can also create ...
ArXiv Domain 2026-05-26
Data source: ArXiv Domain
LLM Domain Papers
1. Evaluating Large Language Models in a Complex Hidden Role Game
Abstract: Quantifying the deceptive potential of Large Language Models (LLMs) is critical for AI safety, yet difficult to achieve in uncontrolled environments. This work investigates the reasoning, persuasion, and deceptive capabilities of LLMs within the social deduction game Secret Hitler. I introduce an open-source framework and novel metrics to measure performance: Role Identification Accurac ...
ArXiv Domain 2026-05-18
Data source: ArXiv Domain
LLM Domain Papers
1. Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey
Abstract: Multilingual knowledge editing (MKE) remains challenging because language-specific edits interfere with one another, even when locate-then-edit methods work well in monolingual settings. This paper focuses on three issues: the effectiveness of vector merging methods for MKE, the extent to which Task Singular Vectors for Merging (TSVM) can reduce multi ...