37.2° Blog

ArXiv Domain 2025-12-20

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. EasyV2V: A High-quality Instruction-based Video Editing Framework. While image editing has advanced rapidly, video editing remains less explored, facing challenges in consistency, control, and generalization. We study the design space of data, architecture, and control, and introduce EasyV2V, a simple and effective framework for instruction-based video editing. On the data side, we compose existing experts with fast inverses to build diverse video pai ...

ArXiv Domain 2025-12-21

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. EasyV2V: A High-quality Instruction-based Video Editing Framework. While image editing has advanced rapidly, video editing remains less explored, facing challenges in consistency, control, and generalization. We study the design space of data, architecture, and control, and introduce EasyV2V, a simple and effective framework for instruction-based video editing. On the data side, we compose existing experts with fast inverses to build diverse video pai ...

ArXiv Domain 2025-12-22

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. EasyV2V: A High-quality Instruction-based Video Editing Framework. While image editing has advanced rapidly, video editing remains less explored, facing challenges in consistency, control, and generalization. We study the design space of data, architecture, and control, and introduce EasyV2V, a simple and effective framework for instruction-based video editing. On the data side, we compose existing experts with fast inverses to build diverse video pai ...

ArXiv Domain 2025-12-11

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. Astra: General Interactive World Model with Autoregressive Denoising. Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world models with the ability to predict long-horizon futures from past observations and actions remain underexplored, especially for general-purpose scenarios and various forms of actions. To bridge this gap, we introduce Astra, an interactiv ...

ArXiv Domain 2025-09-20

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. Charting trajectories of human thought using large language models. Language provides the most revealing window into the ways humans structure conceptual knowledge within cognitive maps. Harnessing this information has been difficult, given the challenge of reliably mapping words to mental concepts. Artificial intelligence large language models (LLMs) now offer unprecedented opportunities to revisit this challenge. LLMs represent words and phrases as high-di ...

ArXiv Domain 2025-12-12

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. LISN: Language-Instructed Social Navigation with VLM-based Controller Modulating. Towards human-robot coexistence, socially aware navigation is significant for mobile robots. Yet existing studies in this area focus mainly on path efficiency and pedestrian collision avoidance, which are essential but represent only a fraction of social navigation. Beyond these basics, robots must also comply with user instructions, aligning their actions to task goals and soc ...

ArXiv Domain 2025-12-23

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting. Monocular depth estimation remains challenging as recent foundation models, such as Depth Anything V2 (DA-V2), struggle with real-world images that are far from the training distribution. We introduce Re-Depth Anything, a test-time self-supervision framework that bridges this domain gap by fusing DA-V2 with the powerful priors of large-scale 2D diffusion models. Our method perform ...

ArXiv Domain 2025-12-24

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. Scalably Enhancing the Clinical Validity of a Task Benchmark with Physician Oversight. Automating the calculation of clinical risk scores offers a significant opportunity to reduce physician administrative burden and enhance patient care. The current standard for evaluating this capability is MedCalc-Bench, a large-scale dataset constructed using LLM-based feature extraction and rule-based aggregation. However, treating such model-generated benchmarks as sta ...

ArXiv Domain 2025-12-25

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. LongVideoAgent: Multi-Agent Reasoning with Long Videos. Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into lossy summaries or rely on limited toolsets, weakening temporal grounding and missing fine-grained cues. We propose a multi-agent framework in which a master LLM coordinates a grounding agent to localize question-rele ...

ArXiv Domain 2025-12-27

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty. Masked Diffusion Models (MDMs) offer flexible, non-autoregressive generation, but this freedom introduces a challenge: final output quality is highly sensitive to the decoding order. We are the first to formalize this issue, attributing the variability in output quality to the cumulative predictive uncertainty along a generative path. To quantify this uncertainty, we introduce D ...

ArXiv Domain 2025-12-26

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty. Masked Diffusion Models (MDMs) offer flexible, non-autoregressive generation, but this freedom introduces a challenge: final output quality is highly sensitive to the decoding order. We are the first to formalize this issue, attributing the variability in output quality to the cumulative predictive uncertainty along a generative path. To quantify this uncertainty, we introduce D ...

ArXiv Domain 2025-12-28

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty. Masked Diffusion Models (MDMs) offer flexible, non-autoregressive generation, but this freedom introduces a challenge: final output quality is highly sensitive to the decoding order. We are the first to formalize this issue, attributing the variability in output quality to the cumulative predictive uncertainty along a generative path. To quantify this uncertainty, we introduce D ...

ArXiv Domain 2025-12-30

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applications. Cloud incidents pose major operational challenges in production, with unresolved production cloud incidents costing on average over $2M per hour. Prior research identifies code- and configuration-related issues as the predominant category of root causes in cloud incidents. This paper introduces PRAXIS, an orchestrator that manages and deploys an agentic w ...

ArXiv Domain 2025-12-29

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty. Masked Diffusion Models (MDMs) offer flexible, non-autoregressive generation, but this freedom introduces a challenge: final output quality is highly sensitive to the decoding order. We are the first to formalize this issue, attributing the variability in output quality to the cumulative predictive uncertainty along a generative path. To quantify this uncertainty, we introduce D ...

ArXiv Domain 2025-11-02

Created 2019-06-18 | AI

Data source: ArXiv Domain. LLM Domain Papers. 1. Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer. Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present “Brain-IT”, a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effectiv ...