GitHub Trending 2026-01-30
Data source: github.com/trending
global Languages
moltbot/moltbot
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
⭐ Stars: 103250
🍴 Forks: 0
📝 Language: TypeScript
asgeirtj/system_prompts_leaks
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
⭐ Stars: 27589
🍴 Forks: 0
📝 Language: JavaScript
MoonshotAI/kimi-cli
Kimi Code CLI is your next CLI agent.
⭐ Stars: 4936
🍴 Forks: 0
📝 Language: Python
modelcontextprotocol/ext-apps ...
GitHub Trending 2026-01-31
Data source: github.com/trending
global Languages
openclaw/openclaw
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
⭐ Stars: 119095
🍴 Forks: 0
📝 Language: TypeScript
asgeirtj/system_prompts_leaks
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
⭐ Stars: 28541
🍴 Forks: 0
📝 Language: JavaScript
MoonshotAI/kimi-cli
Kimi Code CLI is your next CLI agent.
⭐ Stars: 5284
🍴 Forks: 0
📝 Language: Python
modelcontextprotocol/ext-ap ...
ArXiv Domain 2026-05-06
Data source: ArXiv Domain
LLM Domain Papers
1. H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models
Abstract: Representing and navigating hierarchy is a fundamental primitive of reasoning. Large language models have demonstrated proficiency in a wide variety of tasks requiring hierarchical reasoning, but there exists limited analysis on how the models geometrically represent the necessary latent constructions for such thinking. To this end, we develop \textit{H-prob ...
ArXiv Domain 2026-05-20
Data source: ArXiv Domain
LLM Domain Papers
1. The Scaling Laws of Skills in LLM Agent Systems
Abstract: As agent systems scale, skills accumulate into large reusable libraries, yet their scaling laws remain poorly understood. Across 15 frontier LLMs, 1,141 real-world skills, and over 3M routing or execution decisions, we identify two coupled laws. Routing law: single-step routing accuracy decays logarithmically with library size ($R^2{>}0.97$ for all models), with errors progressing from local skill ...
ArXiv Domain 2026-05-28
Data source: ArXiv Domain
LLM Domain Papers
1. ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
Abstract: Recent advances in multimodal large language models (MLLMs) and diffusion models (DMs) have opened new possibilities for AI-generated content. Yet, personalized cover image generation remains underexplored, despite its critical role in boosting user engagement on digital platforms. We propose ICG, a novel framework that integrates MLLM-based prompti ...
ArXiv Domain 2026-06-01
Data source: ArXiv Domain
LLM Domain Papers
1. Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment
Abstract: Defect grading of power transmission equipment (DGPTE) is crucial to the stability of electric energy transmission. Although existing machine learning methods exhibit strong capabilities in defect detection, they are plagued by difficulties in integrating expert experience and facing class imbalance in more refined defect grading field. To address this ...
ArXiv Domain 2026-06-03
Data source: ArXiv Domain
LLM Domain Papers
1. IdiomX: A Multilingual Benchmark for Idiom Understanding, Retrieval, and Interpretation
Abstract: Idiomatic expressions remain a persistent challenge for natural language processing because their meanings are often non-compositional, context-dependent, and difficult to align across languages. Existing idiom resources are often limited in scale, contextual diversity, or multilingual coverage, restricting their utility for modern language models. We introduce I ...
ArXiv Domain 2026-06-04
Data source: ArXiv Domain
LLM Domain Papers
1. POLARIS: Guiding Small Models to Write Long Stories
Abstract: Small open-weight models struggle at long-form creative writing: their generated stories either fall far short of the requested length, or their quality significantly degrades as length increases, especially when compared to frontier models. We present POLARIS (Policy Optimization with LLM-as-a-judge rewards and Anchored-Reference Injection for Storywriting), a lower-compute GRPO recipe with two k ...
ArXiv Domain 2026-06-05
Data source: ArXiv Domain
LLM Domain Papers
1. POLARIS: Guiding Small Models to Write Long Stories
Abstract: Small open-weight models struggle at long-form creative writing: their generated stories either fall far short of the requested length, or their quality significantly degrades as length increases, especially when compared to frontier models. We present POLARIS (Policy Optimization with LLM-as-a-judge rewards and Anchored-Reference Injection for Storywriting), a lower-compute GRPO recipe with two k ...
ArXiv Domain 2026-06-08
Data source: ArXiv Domain
LLM Domain Papers
1. Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning
Abstract: Large language models (LLMs) trained predominantly on English data encode substantial world knowledge, yet often fail to express it reliably in other languages, a phenomenon known as cross-lingual factual inconsistency. To study and address this, we introduce PolyFact, a large-scale parallel multilingual factual QA dataset containing 100K Wikidata-grounded facts ac ...
ArXiv Domain 2026-06-10
Data source: ArXiv Domain
LLM Domain Papers
1. Bidirectional Small-Granularity Search between Code and Text
Abstract: We introduce the novel task of bidirectional small-granularity search between code and text, where the queries are small snippets of text or code and the results are also small fragments of the opposite modality, i.e., code or text. This task establishes direct links between text in scientific publications and corresponding code segments, in support of better and faster understanding of s ...
ArXiv Domain 2026-06-11
Data source: ArXiv Domain
LLM Domain Papers
1. PoQ-Judge: A Multi-Architecture Evaluation Framework for Cost-Aware Proof-of-Quality in Decentralized LLM Inference
Abstract: Decentralized LLM inference networks need lightweight, reference-free quality evaluation for Proof of Quality (PoQ). We present PoQ-Judge, a framework that trains dedicated judge models to score query-output pairs without ground-truth references. We study three architectures across the quality-cost tradeoff: a TextCNN judge, a MiniLM ...
ArXiv Domain 2026-06-15
Data source: ArXiv Domain
LLM Domain Papers
1. The Coin Flip Judge? Reliability and Bias in LLM-as-a-Judge Evaluation
Abstract: LLM-as-a-Judge is now widely used to rank model outputs, train reward models, and populate public leaderboards, but its run-to-run reliability remains under-characterized. We study repeated identical evaluations on 29 tasks spanning 10 categories using two OpenAI judge models (GPT-4o-mini and GPT-4.1-mini), with 50 pairwise trials and 50 pointwise trials per question, supplement ...
ArXiv Domain 2026-06-19
Data source: ArXiv Domain
LLM Domain Papers
1. Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation
Abstract: Large Language Models (LLMs) exhibit representational and syntactic biases that are difficult to evaluate due to the stochastic nature of text generation. Standard auditing methods rely on a single output inspection or static automated metrics. These approaches obscure the underlying probability distributions and fail to capture biases hidden in lower-probability g ...
HuggingFace Papers 2026-01-01
Data source: HuggingFace Papers
Latest Papers
1. UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
In this report, we introduce UltraShape 1.0, a scalable 3D diffusion framework for high-fidelity 3D geometry generation. The proposed approach adopts a two-stage generation pipeline: a coarse global structure is first synthesized and then refined to produce detailed, high-quality geometry. To support reliable 3D generation, we develop a comprehensive data processing pipeli ...