Articles: 493 | Tags: 24 | Categories: 15

Home
Content
  • Paper
  • LLMs
  • Jupyter
  • Algorithm
  • PLs
Daily
  • Github
  • Weibo
  • HF
  • Arxiv
Archives
Categories
About
37.2° Blog
ArXiv Domain 2025-08-20
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns. Detecting content generated by large language models (LLMs) is crucial for preventing misuse and building trustworthy AI systems. Although existing detection methods perform well, their robustness in out-of-distribution (OOD) scenarios is still lacking. In this paper, we hypothesize that, compared to features used by existing detection methods, the internal representations ...
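The excerpt above motivates using a model's internal representations as detection features. A minimal sketch of that general idea (not RepreGuard's actual method): pool per-token hidden states into a feature vector and train a linear probe to separate human-written from LLM-generated text. Synthetic features stand in for real activations here.

```python
# Sketch: linear probe over (synthetic) hidden-state features for
# LLM-generated-text detection. Illustrative only, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

# Assume each text is summarized by the mean of its hidden states (dim=16).
# Synthetic data: the two classes differ along one hidden direction.
dim = 16
direction = rng.normal(size=dim)
human = rng.normal(size=(200, dim))
llm = rng.normal(size=(200, dim)) + 0.8 * direction

X = np.vstack([human, llm])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Train a logistic-regression probe by plain gradient descent.
w = np.zeros(dim)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(llm)
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)

acc = np.mean(((X @ w + b) > 0) == y)
print(f"probe accuracy: {acc:.2f}")
```

With a clear class shift in representation space, even this linear probe separates the two sources well, which is the intuition the abstract appeals to.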
ArXiv Domain 2025-08-22
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs. Recent advances in diffusion large language models (dLLMs) have introduced a promising alternative to autoregressive (AR) LLMs for natural language generation tasks, leveraging full attention and denoising-based decoding strategies. However, the deployment of these models on edge devices remains challenging due to their massive parameter scale and high resource dem ...
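Post-training quantization, the topic of the excerpt above, can be illustrated with the simplest variant: symmetric per-output-channel int8 quantization of a weight matrix. This is a generic sketch, not the paper's scheme.

```python
# Sketch: symmetric per-channel int8 post-training weight quantization.
# Illustrative only; real dLLM quantization is considerably more involved.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 32))  # stand-in for one linear layer's weights

# One scale per output channel so each row uses the full int8 range.
scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)  # stored weights
W_hat = W_q.astype(np.float64) * scale                         # dequantized

err = np.abs(W - W_hat).max()
print(f"max abs reconstruction error: {err:.4f}")
```

The reconstruction error is bounded by half a quantization step per channel, which is why per-channel scales typically beat a single per-tensor scale.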
ArXiv Domain 2025-08-23
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces. Understanding the latent space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. However, it remains unclear to what extent LLMs internally organize representations related to semantic understanding. To explore this, we conduct a large-scale empirical study of hidden representations in 11 autoregressive models across 6 scientific ...
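A standard way to test the "low-dimensional linear subspace" claim in the excerpt above is PCA over hidden states: if a few principal components capture most of the variance, the representations are effectively low-dimensional. A sketch on synthetic activations (the paper's actual analysis uses real model activations):

```python
# Sketch: measure how much variance of (synthetic) hidden states lies in a
# low-dimensional linear subspace, via PCA computed with the SVD.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic activations: 1000 vectors in 64 dims, with most variance in a
# 4-dimensional subspace plus small isotropic noise.
n, dim, k = 1000, 64, 4
basis = np.linalg.qr(rng.normal(size=(dim, k)))[0]  # orthonormal k-dim basis
latent = rng.normal(size=(n, k)) * 5.0              # strong low-dim signal
states = latent @ basis.T + rng.normal(size=(n, dim)) * 0.5

# PCA: singular values of the centered data give per-component variance.
centered = states - states.mean(axis=0)
sing = np.linalg.svd(centered, compute_uv=False)
explained = sing**2 / np.sum(sing**2)
top4 = explained[:4].sum()
print(f"variance captured by top-4 PCs: {top4:.2f}")
```

When the top few components dominate like this, downstream semantic probes can often be restricted to that subspace with little loss.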
ArXiv Domain 2025-08-24
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces. Understanding the latent space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. However, it remains unclear to what extent LLMs internally organize representations related to semantic understanding. To explore this, we conduct a large-scale empirical study of hidden representations in 11 autoregressive models across 6 scientific ...
ArXiv Domain 2025-08-25
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces. Understanding the latent space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. However, it remains unclear to what extent LLMs internally organize representations related to semantic understanding. To explore this, we conduct a large-scale empirical study of hidden representations in 11 autoregressive models across 6 scientific ...
ArXiv Domain 2025-08-26
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. Can Large Language Models Simulate Human Responses? A Case Study of Stated Preference Experiments in the Context of Heating-related Choices. Stated preference (SP) surveys are a key method for researching how individuals make trade-offs in hypothetical, even futuristic, scenarios. In the energy context, this includes key decarbonisation enablement contexts, such as low-carbon technologies, distributed renewable energy generation, and demand-side response [1,2]. Howev ...
ArXiv Domain 2025-08-27
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. From BERT to LLMs: Comparing and Understanding Chinese Classifier Prediction in Language Models. Classifiers are an important and defining feature of the Chinese language, and their correct prediction is key to numerous educational applications. Yet, whether the most popular Large Language Models (LLMs) possess proper knowledge of Chinese classifiers is an issue that has largely remained unexplored in the Natural Language Processing (NLP) literature. To addre ...
ArXiv Domain 2025-08-28
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications. Large Language Models (LLMs) have significantly advanced natural language processing, demonstrating strong capabilities in tasks such as text generation, summarization, and reasoning. Recently, their potential for automating precise text editing tasks across specialized domains, such as programming code, LaTeX, and structured database languages, has gained attention. Howe ...
ArXiv Domain 2025-08-29
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. 11Plus-Bench: Demystifying Multimodal LLM Spatial Reasoning with Cognitive-Inspired Analysis. In human cognition, spatial reasoning and perception are closely entangled, yet the nature of this interplay remains underexplored in the evaluation of multimodal large language models (MLLMs). While recent MLLM advancements show impressive performance on reasoning, their capacity for human-like spatial cognition remains an open question. In this work, we i ...
ArXiv Domain 2025-08-30
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs. Decoder-only large language models typically rely solely on masked causal attention, which limits their expressiveness by restricting information flow to one direction. We propose Bitune, a method that enhances pretrained decoder-only LLMs by incorporating bidirectional attention into prompt processing. We evaluate Bitune in instruction-tuning and question-answering settings, showing si ...
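The masking idea in the Bitune excerpt above is easy to make concrete: under causal attention, token i may attend only to positions ≤ i; making the prompt segment bidirectional lets prompt tokens also attend to later prompt positions. A sketch of the two masks (not Bitune's actual implementation):

```python
# Sketch: causal vs. bidirectional-prompt attention masks.
# 1 = query (row) may attend to key (column), 0 = masked.
import numpy as np

def attention_mask(seq_len: int, prompt_len: int, bidirectional_prompt: bool) -> np.ndarray:
    mask = np.tril(np.ones((seq_len, seq_len), dtype=int))  # causal baseline
    if bidirectional_prompt:
        # Prompt tokens may attend to every prompt position, future ones included;
        # generated tokens beyond the prompt stay strictly causal.
        mask[:prompt_len, :prompt_len] = 1
    return mask

causal = attention_mask(5, 3, bidirectional_prompt=False)
bidir = attention_mask(5, 3, bidirectional_prompt=True)
print(bidir)
```

Only the prompt block changes; generation remains autoregressive, which is what keeps the decoder-only sampling loop intact.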
ArXiv Domain 2025-08-31
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs. Decoder-only large language models typically rely solely on masked causal attention, which limits their expressiveness by restricting information flow to one direction. We propose Bitune, a method that enhances pretrained decoder-only LLMs by incorporating bidirectional attention into prompt processing. We evaluate Bitune in instruction-tuning and question-answering settings, showing si ...
ArXiv Domain 2025-09-01
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs. Decoder-only large language models typically rely solely on masked causal attention, which limits their expressiveness by restricting information flow to one direction. We propose Bitune, a method that enhances pretrained decoder-only LLMs by incorporating bidirectional attention into prompt processing. We evaluate Bitune in instruction-tuning and question-answering settings, showing si ...
ArXiv Domain 2025-09-02
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning. Instruction tuning has underscored the significant potential of large language models (LLMs) in producing more human-controllable and effective outputs in various domains. In this work, we focus on the data selection problem for task-specific instruction tuning of LLMs. Prevailing methods primarily rely on crafted similarity metrics to select training data that ali ...
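Reward-oriented data selection, as described in the ROSE excerpt above, reduces at its core to scoring candidate training examples and keeping the best ones. A minimal sketch with a hypothetical reward function (response length stands in for a trained reward model; this is not ROSE's actual scoring):

```python
# Sketch: reward-oriented selection of instruction-tuning data.
# reward_fn is a stand-in; real systems would use a task-aligned reward model.
def select_top_k(examples, reward_fn, k):
    """Return the k examples with the highest reward scores."""
    return sorted(examples, key=reward_fn, reverse=True)[:k]

pool = ["ok", "a detailed step-by-step answer", "short", "a fairly thorough reply"]
chosen = select_top_k(pool, reward_fn=len, k=2)
print(chosen)
```

The interesting part of such frameworks is the reward function itself; the selection step stays this simple.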
ArXiv Domain 2025-09-03
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning. Instruction tuning has underscored the significant potential of large language models (LLMs) in producing more human-controllable and effective outputs in various domains. In this work, we focus on the data selection problem for task-specific instruction tuning of LLMs. Prevailing methods primarily rely on crafted similarity metrics to select training data that ali ...
ArXiv Domain 2025-09-04
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers — 1. MMReview: A Multidisciplinary and Multimodal Benchmark for LLM-Based Peer Review Automation. With the rapid growth of academic publications, peer review has become an essential yet time-consuming responsibility within the research community. Large Language Models (LLMs) have increasingly been adopted to assist in the generation of review comments; however, current LLM-based review tasks lack a unified evaluation benchmark to rigorously assess the models’ abi ...
Firefly
A firefly flying freely in the AI domain.
Announcement
Welcome to my personal blog! If it is not accessible, please visit the Gitee mirror.
Recent Posts
  • No title — 2025-10-15
  • Retrieval-Augmented LLM — 2024-01-13
  • LLMs Open Course - 6. Large Models for Text Understanding and Generation — 2024-01-10
Categories
  • AI (186)
  • Cython (1)
  • DSA (24)
  • GitHub (109)
  • LLMs (16)
Tags
DSA · RL · Transformer · LLMs · PaperReading · DeepLearning · CV · GPT · PL · domain · github · hf · weibo · ArXivDomain · AI · GitHubTrending · HuggingFacePapers · Weibo Hot Search · leetcode · algo
Archives
  • October 2025 (1)
  • January 2024 (5)
  • December 2023 (14)
  • November 2023 (26)
  • October 2023 (1)
Info
Articles: 493
Total Count: 20359.6k
©2023 - 2025 By Firefly