37.2° Blog
ArXiv Domain 2025-08-16
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks. Large Language Models (LLMs) have significantly advanced the state of the art in various coding tasks. Beyond directly answering user queries, LLMs can also serve as judges, assessing and comparing the quality of responses generated by other models. Such an evaluation capability is crucial both for benchmarking different LLMs and for improving response quality through response ranking. However, …
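The pairwise judging setup this abstract describes can be sketched as follows. The prompt template, the single-letter verdict format, and the `call_judge` stub are illustrative assumptions for the general LLM-as-a-judge pattern, not the benchmark's actual specification:

```python
# Sketch of a pairwise LLM-as-a-judge protocol. `call_judge` stands in for a
# real LLM call; prompt wording and verdict format are illustrative.

def build_pairwise_prompt(task: str, response_a: str, response_b: str) -> str:
    """Assemble a judge prompt asking for a comparative 'A' or 'B' verdict."""
    return (
        "You are a code-review judge. Given the task and two candidate "
        "responses, answer with exactly 'A' or 'B' for the better one.\n\n"
        f"Task:\n{task}\n\nResponse A:\n{response_a}"
        f"\n\nResponse B:\n{response_b}\n\nVerdict:"
    )

def parse_verdict(raw: str) -> str:
    """Extract the first 'A'/'B' token from the judge's reply."""
    for ch in raw.strip().upper():
        if ch in ("A", "B"):
            return ch
    raise ValueError(f"no verdict found in: {raw!r}")

def judge_pair(task, a, b, call_judge):
    """Query the judge in both orders to control for position bias."""
    v1 = parse_verdict(call_judge(build_pairwise_prompt(task, a, b)))
    v2 = parse_verdict(call_judge(build_pairwise_prompt(task, b, a)))
    # A consistent preference must survive swapping the presentation order;
    # otherwise record a tie (position bias dominated the judgment).
    if v1 == "A" and v2 == "B":
        return "A"
    if v1 == "B" and v2 == "A":
        return "B"
    return "tie"
```

Querying in both orders doubles the cost but filters out the position bias that pairwise judges are known to exhibit.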
ArXiv Domain 2025-08-17
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks. Large Language Models (LLMs) have significantly advanced the state of the art in various coding tasks. Beyond directly answering user queries, LLMs can also serve as judges, assessing and comparing the quality of responses generated by other models. Such an evaluation capability is crucial both for benchmarking different LLMs and for improving response quality through response ranking. However, …
ArXiv Domain 2025-08-18
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks. Large Language Models (LLMs) have significantly advanced the state of the art in various coding tasks. Beyond directly answering user queries, LLMs can also serve as judges, assessing and comparing the quality of responses generated by other models. Such an evaluation capability is crucial both for benchmarking different LLMs and for improving response quality through response ranking. However, …
ArXiv Domain 2025-08-19
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Controlling Multimodal LLMs via Reward-guided Decoding. As Multimodal Large Language Models (MLLMs) gain widespread applicability, it is becoming increasingly desirable to adapt them for diverse user needs. In this paper, we study the adaptation of MLLMs through controlled decoding. To achieve this, we introduce the first method for reward-guided decoding of MLLMs and demonstrate its application in improving their visual grounding. Our method involves …
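The core idea of reward-guided decoding can be illustrated in miniature: at each step, candidates are scored by the base model's log-probability plus a scaled reward signal. The tiny tabular "model", the reward values, and the scaling weight are toy stand-ins for an MLLM and a visual-grounding reward model:

```python
import math

# Toy sketch of one reward-guided decoding step: choose the next token by
# log p(token | prefix) + lam * r(token). The distributions below are
# illustrative stand-ins for a real model and reward model.

def reward_guided_step(logprobs: dict, reward: dict, lam: float) -> str:
    """Pick argmax over tokens of base log-prob plus scaled reward."""
    return max(logprobs, key=lambda t: logprobs[t] + lam * reward.get(t, 0.0))

logprobs = {"cat": math.log(0.6), "dog": math.log(0.4)}  # base model prefers "cat"
reward = {"cat": 0.0, "dog": 1.0}                        # reward model prefers "dog"

base_choice = reward_guided_step(logprobs, reward, lam=0.0)    # pure LM decoding
guided_choice = reward_guided_step(logprobs, reward, lam=1.0)  # reward-steered
```

With `lam=0` the step reduces to greedy decoding; increasing `lam` trades base-model likelihood for reward, which is what makes the adaptation controllable at inference time without retraining.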
ArXiv Domain 2025-08-20
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns. Detecting content generated by large language models (LLMs) is crucial for preventing misuse and building trustworthy AI systems. Although existing detection methods perform well, their robustness in out-of-distribution (OOD) scenarios is still lacking. In this paper, we hypothesize that, compared to features used by existing detection methods, the internal representations …
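The abstract's premise — that hidden representations of human-written and LLM-generated text differ in a consistent pattern — can be sketched with a simple mean-difference probe. Synthetic Gaussian features stand in for actual model activations; the dimensions, shift, and sample sizes are illustrative, not from the paper:

```python
import random

# Sketch of a mean-difference linear probe over "hidden representations".
# Synthetic features: LLM-text activations are shifted by a constant amount
# along every dimension, standing in for a consistent internal pattern.

random.seed(0)
d = 16

def sample(mean, n):
    return [[random.gauss(mean, 1.0) for _ in range(d)] for _ in range(n)]

human = sample(0.0, 200)
llm = sample(0.5, 200)  # the shifted mean is the detectable "pattern"

def mean_vec(xs):
    return [sum(col) / len(xs) for col in zip(*xs)]

mu_h, mu_l = mean_vec(human), mean_vec(llm)
w = [a - b for a, b in zip(mu_l, mu_h)]  # direction between class means
threshold = sum((a + b) / 2 * wi for a, b, wi in zip(mu_l, mu_h, w))

def is_llm_generated(x):
    """Classify by projecting onto the class-mean direction."""
    return sum(xi * wi for xi, wi in zip(x, w)) > threshold

acc = (sum(is_llm_generated(x) for x in llm) +
       sum(not is_llm_generated(x) for x in human)) / 400
```

A probe this simple works only because the synthetic classes are linearly separable on average; the paper's claim is that real activations expose a similarly exploitable structure, and that it transfers better out of distribution than surface features do.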
ArXiv Domain 2025-08-21
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. The Promise of Large Language Models in Digital Health: Evidence from Sentiment Analysis in Online Health Communities. Digital health analytics face critical challenges nowadays. The sophisticated analysis of patient-generated health content, which contains complex emotional and medical contexts, requires scarce domain expertise, while traditional ML approaches are constrained by data shortages and privacy limitations in healthcare settings. Online Health Communities …
ArXiv Domain 2025-08-22
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs. Recent advances in diffusion large language models (dLLMs) have introduced a promising alternative to autoregressive (AR) LLMs for natural language generation tasks, leveraging full attention and denoising-based decoding strategies. However, the deployment of these models on edge devices remains challenging due to their massive parameter scale and high resource demands …
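For context on what post-training quantization starts from, here is a minimal sketch of round-to-nearest symmetric weight quantization, the common baseline such studies build on. The bit width and toy weight list are illustrative:

```python
# Round-to-nearest symmetric quantization: map floats to signed ints with a
# single per-tensor scale, then reconstruct and measure the error.

def quantize_symmetric(weights, bits=8):
    """Quantize to signed integers in [-(2^(bits-1)-1), 2^(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8
    scale = max(abs(w) for w in weights) / qmax     # per-tensor scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.12, -0.5, 0.33, 1.0, -0.99]                  # toy weight tensor
q, s = quantize_symmetric(w, bits=8)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))  # bounded by scale / 2
```

The reconstruction error per weight is bounded by half the scale. The abstract's point is that diffusion LLMs stress this baseline differently than autoregressive models do, because full attention and iterative denoising change which tensors are sensitive to that error.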
ArXiv Domain 2025-08-23
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces. Understanding the latent-space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. However, it remains unclear to what extent LLMs internally organize representations related to semantic understanding. To explore this, we conduct a large-scale empirical study of hidden representations in 11 autoregressive models across 6 scientific …
ArXiv Domain 2025-08-24
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces. Understanding the latent-space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. However, it remains unclear to what extent LLMs internally organize representations related to semantic understanding. To explore this, we conduct a large-scale empirical study of hidden representations in 11 autoregressive models across 6 scientific …
ArXiv Domain 2025-08-25
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces. Understanding the latent-space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. However, it remains unclear to what extent LLMs internally organize representations related to semantic understanding. To explore this, we conduct a large-scale empirical study of hidden representations in 11 autoregressive models across 6 scientific …
ArXiv Domain 2025-08-26
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Can Large Language Models Simulate Human Responses? A Case Study of Stated Preference Experiments in the Context of Heating-related Choices. Stated preference (SP) surveys are a key method for researching how individuals make trade-offs in hypothetical, often futuristic, scenarios. In the energy context, this includes key decarbonisation-enabling settings such as low-carbon technologies, distributed renewable energy generation, and demand-side response [1,2]. …
ArXiv Domain 2025-08-27
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. From BERT to LLMs: Comparing and Understanding Chinese Classifier Prediction in Language Models. Classifiers are an important and defining feature of the Chinese language, and their correct prediction is key to numerous educational applications. Yet whether the most popular Large Language Models (LLMs) possess proper knowledge of Chinese classifiers is an issue that has largely remained unexplored in the Natural Language Processing (NLP) literature. To address …
ArXiv Domain 2025-08-28
Created 2019-06-18 | AI
Data source: ArXiv Domain. LLM Domain Papers: 1. Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications. Large Language Models (LLMs) have significantly advanced natural language processing, demonstrating strong capabilities in tasks such as text generation, summarization, and reasoning. Recently, their potential for automating precise text-editing tasks across specialized domains, such as programming code, LaTeX, and structured database languages, has gained attention. However, …
HuggingFace Papers 2025-08-09
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generalization …
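The "implicit reward structure" view the abstract takes can be checked numerically on a toy policy: the SFT gradient on a target equals a policy-gradient expectation whose implicit reward is an importance weight, 1[y = y*] / pi(y*), which blows up on low-probability targets. The 3-way softmax below is an illustrative toy, not the paper's setup:

```python
import math

# Check: SFT gradient of log pi(y*) equals the policy-gradient expectation
# E_{y~pi}[ r(y) * grad log pi(y) ] with implicit reward r(y) = 1[y=y*]/pi(y*).

logits = [0.2, -0.4, 1.1]
target = 0

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

p = softmax(logits)

# SFT gradient of log pi(target) w.r.t. logits: one_hot(target) - p.
sft_grad = [(1.0 if i == target else 0.0) - pi for i, pi in enumerate(p)]

# Policy-gradient expectation with the implicit reward above.
pg_grad = [0.0, 0.0, 0.0]
for y, py in enumerate(p):
    r = (1.0 / p[target]) if y == target else 0.0
    grad_log = [(1.0 if i == y else 0.0) - pi for i, pi in enumerate(p)]
    for i in range(3):
        pg_grad[i] += py * r * grad_log[i]

# The two gradients coincide; the 1/pi(y*) factor is what makes the implicit
# reward ill-behaved when the target is unlikely under the current policy.
```

The equivalence itself is standard; the paper's contribution, per the abstract, is rectifying that implicit reward rather than deriving it.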
HuggingFace Papers 2025-08-10
Created 2019-06-18 | AI
Data source: HuggingFace Papers. Latest Papers: 1. On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generalization …
©2023 - 2025 By Firefly