Weekly AI News 2026-W22

Period covered: 2026-05-25 to 2026-05-31 (2631 items)

[Charts: topic trends; item counts by topic]

This Week's Highlights (Top 10)

2026-05-29 21:00 JST | OpenAI | LLM/Generative AI

Boston Children’s uses AI to unlock new diagnoses

Boston Children’s Hospital uses OpenAI technology to improve patient care, reduce operational burden, and help diagnose more than 40 rare d…

2026-05-29 21:00 JST | OpenAI | LLM/Generative AI, Agents

How Braintrust turns customer requests into code with Codex

How Braintrust engineers use Codex with GPT-5.5 to run experiments and code faster.

2026-05-28 21:00 JST | OpenAI | Agents

How Endava builds an agentic organization with Codex

Learn how Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks…

2026-05-27 20:00 JST | OpenAI | LLM/Generative AI, Agents

Cisco and OpenAI redefine enterprise engineering with Codex

Cisco and OpenAI are redefining enterprise engineering with Codex, helping Cisco scale AI-native development, accelerate AI Defense work, a…

2026-05-27 16:00 JST | OpenAI | LLM/Generative AI, Agents

Building self-improving tax agents with Codex

See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating wor…

2026-05-31 08:00 JST | ITmedia AI+ | LLM/Generative AI

What is Hitachi aiming for by partnering with Anthropic? The vision behind rolling out Claude to 290,000 employees

Hitachi has signed a strategic partnership with Anthropic to apply AI in mission-critical domains. What does the company hope to achieve through this partnership?

2026-05-31 01:30 JST | TechCrunch AI | LLM/Generative AI

‘What a joke’: GitHub Copilot’s new token-based billing spurs consternation among devs

The golden age of Microsoft's GitHub Copilot appears to be at an end.

2026-05-31 00:30 JST | TechCrunch AI | LLM/Generative AI

I put Google’s 24/7 AI assistant Gemini Spark to work, and it’s actually pretty useful

Gemini Spark helps automate everyday tasks, from inbox summaries to local event planning, but it’s unclear why Google made it a separate pr…

2026-05-30 07:14 JST | TechCrunch AI | Research/Papers

Coders are refusing to work without AI — and that could come back to bite them

While AI is helping coders produce code faster, it may not be producing better code, researchers warn. And that could cause problems down t…

2026-05-30 06:48 JST | ITmedia AI+ | LLM/Generative AI, Regulation/Policy, Research/Papers

OpenAI opens its life-science reasoning AI "GPT-Rosalind" to biodefense, amid concerns over dual-use risk

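OpenAI has announced the "Rosalind Biodefense" program, which uses "GPT-Rosalind", a frontier reasoning model specialized for life-science research. Use is restricted to defensive purposes such as detecting biological threats, and API access is provided free of charge to vetted developers, U.S. government agencies, and partner organizations in allied countries.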

All Items (by Date)

2026-05-31 (5 items)

2026-05-31 08:00 JST | ITmedia AI+ | LLM/Generative AI

What is Hitachi aiming for by partnering with Anthropic? The vision behind rolling out Claude to 290,000 employees

Hitachi has signed a strategic partnership with Anthropic to apply AI in mission-critical domains. What does the company hope to achieve through this partnership?

2026-05-31 06:45 JST | TechCrunch AI | Other

SoftBank says it will invest up to €75 billion to build French data centers

The goal, the firm said, is to develop and operate up to 5 gigawatts of additional data center capacity.

2026-05-31 01:30 JST | TechCrunch AI | LLM/Generative AI

‘What a joke’: GitHub Copilot’s new token-based billing spurs consternation among devs

The golden age of Microsoft's GitHub Copilot appears to be at an end.

2026-05-31 00:59 JST | TechCrunch AI | Other

Meta is reportedly developing an AI pendant

Meta seems to be making big bets on AI-powered hardware.

2026-05-31 00:30 JST | TechCrunch AI | LLM/Generative AI

I put Google’s 24/7 AI assistant Gemini Spark to work, and it’s actually pretty useful

Gemini Spark helps automate everyday tasks, from inbox summaries to local event planning, but it’s unclear why Google made it a separate pr…

2026-05-30 (7 items)

2026-05-30 22:00 JST | TechCrunch AI | Other

As the browser wars heat up, here are the hottest alternatives to Chrome and Safari in 2026

We’ve compiled an overview of some of the top alternative browsers available today aiming to challenge Chrome and Safari.

2026-05-30 07:14 JST | TechCrunch AI | Research/Papers

Coders are refusing to work without AI — and that could come back to bite them

While AI is helping coders produce code faster, it may not be producing better code, researchers warn. And that could cause problems down t…

2026-05-30 06:48 JST | ITmedia AI+ | LLM/Generative AI, Regulation/Policy, Research/Papers

OpenAI opens its life-science reasoning AI "GPT-Rosalind" to biodefense, amid concerns over dual-use risk

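OpenAI has announced the "Rosalind Biodefense" program, which uses "GPT-Rosalind", a frontier reasoning model specialized for life-science research. Use is restricted to defensive purposes such as detecting biological threats, and API access is provided free of charge to vetted developers, U.S. government agencies, and partner organizations in allied countries.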

2026-05-30 03:49 JST | TechCrunch AI | Other

So you’ve heard these AI terms and nodded along; let’s fix that

The rise of AI has brought an avalanche of new terms and slang. Here is a glossary with definitions of some of the most important words and…

2026-05-30 02:57 JST | TechCrunch AI | Other

What happens when companies become too AI-pilled?

The people deciding that AI can replace your job are also the ones least likely to understand what your job truly involves, according to Bo…

2026-05-30 02:27 JST | TechCrunch AI | Hardware/Semiconductors, Business/Funding

After Nvidia’s $20B not-acqui-hire, AI chip startup Groq reportedly raising $650M

Chipmaker Groq is looking to raise $650 million in internal funding as it pivots from hardware to focus more on AI inference, the process o…

2026-05-30 01:13 JST | TechCrunch AI | Agents

Cognition’s Scott Wu says AI coding agents shouldn’t replace humans

Cognition makes Devin, the first and arguably most successful AI coding agent. But famed coder Wu says it isn't designed to supplant human…

2026-05-29 (540 items)

2026-05-29 23:15 JST | TechCrunch AI | Other

Today is the last day to apply to speak at TechCrunch Disrupt 2026

Submit your session topic before today ends for a chance to speak at TechCrunch Disrupt 2026. Apply now to share your insight and help shap…

2026-05-29 23:00 JST | TechCrunch AI | Other

Final 24 hours to save up to $410 on your TechCrunch Disrupt 2026 ticket

You now have until tonight at 11:59 p.m. PT to lock in Early Bird savings of up to $410 for TechCrunch Disrupt 2026 before prices increase.…

2026-05-29 23:00 JST | TechCrunch AI | Other

Does your CEO have AI psychosis? Aaron Levie thinks most of them do.

The people deciding that AI can replace your job are also the ones least likely to understand what your job truly involves, according to Bo…

2026-05-29 22:00 JST | TechCrunch AI | Other

Kiwibit’s AI-powered bird feeder is my new backyard buddy

If you're looking for a fun way to connect with nature while collecting bird species on an app like Pokémon, give this smart feeder a try.

2026-05-29 21:00 JST | TechCrunch AI | Hardware/Semiconductors, Business/Funding

This chip startup just raised $135M on a bet that AI’s biggest bottleneck isn’t compute — it’s memory

South Korean chip startup XCENA is betting that AI's real bottleneck is not compute, but memory.

2026-05-29 21:00 JST | OpenAI | LLM/Generative AI

Boston Children’s uses AI to unlock new diagnoses

Boston Children’s Hospital uses OpenAI technology to improve patient care, reduce operational burden, and help diagnose more than 40 rare d…

2026-05-29 21:00 JST | OpenAI | LLM/Generative AI, Agents

How Braintrust turns customer requests into code with Codex

How Braintrust engineers use Codex with GPT-5.5 to run experiments and code faster.

2026-05-29 20:30 JST | ITmedia AI+ | LLM/Generative AI, Image/Video Generation

"Nano Banana 2" and "Nano Banana Pro" reach general availability; "2" also supports image generation from video

On May 29, Google announced the general availability of "Nano Banana 2" (Gemini 3.1 Flash Image) and "Nano Banana Pro" (Gemini 3 Pro Image). In addition, for Nano Banana 2, a new feature that supports video input…

2026-05-29 19:14 JST | ITmedia AI+ | LLM/Generative AI, Regulation/Policy

Japanese government and major financial institutions obtain access to new OpenAI model to strengthen cyber defenses

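Satsuki Katayama, Minister of State for Financial Services, revealed that the government and major financial institutions have obtained access to a new AI model developed by OpenAI. With concern growing that high-performance AI could be misused for cyberattacks, defensive measures that make use of AI have become urgent. Katayama said, "From the standpoint of strengthening the cybersecurity of our country's financial institutions, we welcome…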

2026-05-29 17:00 JST | ITmedia AI+ | Other

How is JR West using AI to automate "rolling-stock work plans that only veterans could write by hand"?

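JR West is developing a system that uses AI to automatically generate on-site work plans for rolling-stock depots, which experienced staff previously drew up by hand. Why is it building the system, and what benefits does it expect?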

2026-05-29 13:12 JST | ITmedia AI+ | LLM/Generative AI, Regulation/Policy

OpenAI to cooperate with Japanese government on cybersecurity, providing its latest AI "GPT-5.5-Cyber" to financial institutions

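OpenAI has announced the "Japan Cyber Action Plan", an initiative to cooperate with the Japanese government on cybersecurity. As a first step, it will provide financial institutions with "GPT-5.5-Cyber", its latest AI model specialized for cybersecurity.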

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

Gradient temporal-difference methods provide stable off-policy prediction with linear function approximation, but their practical performan…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction

Temporal-difference learning with function approximation can be unstable under off-policy sampling. TDC stabilizes off-policy TD through an…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with cognitive…

2026-05-29 13:00 JST | arXiv cs.AI | Robotics

Ultra-Reduced-Impact-Encased-Logging (URIEL): propose a new method for selective sustainable logging and post-harvest silvicultural treatment in tropical forest using airborne robotics systems

Tropical forests worldwide are under intense deforestation pressure driven by economic and political interests, and scientific evidence sug…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. W…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Orthogonal Concept Erasure for Diffusion Models

Concept erasure has emerged as a promising approach to mitigate undesired or unsafe content in diffusion models, yet existing methods still…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes

Linking free-text phenotype descriptions to ontology terms, typically referred to as phenotype annotation, is essential for the cross-study…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

Finite Element Analysis (FEA) serves as the cornerstone of modern engineering design. However, its workflow is inherently complex and relie…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation

AI tools to support real world decision making must be able to build simulation models that inform their recommendations and render them in…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Adopt ≠ Adapt: Longitudinal Analyses of LLM Conversations in the Wild

Although a growing body of research has begun to describe user–LLM interactions, the picture it paints is largely static; little is known…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis

Federal agencies are deploying large language models (LLMs) to categorize public comment corpora, where the model's organization of the rec…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Mind Your Tone: Does Tone Alter LLM Performance?

The use of Large Language Models (LLMs) is proliferating, yet their performance is observed to vary based on prompting styles and tones. In…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Practitioner Beliefs and Behaviors in AI-Enhanced Education: DOT Framework Survey Evidence

This study reports findings from a cross-sectional survey (n = 72) of higher education practitioners examining beliefs, behaviors, and inst…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Differentiable Belief-based Opponent Shaping

Human coordination often relies on the ability to influence the beliefs of others through strategic action. In multi-agent reinforcement le…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

Hallucination remains a major reliability barrier for production LLM systems, particularly in multi-agent pipelines where unsupported claim…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Robust and Efficient Guardrails with Latent Reasoning

Maintaining the safety of large language models (LLMs) is crucial as they are increasingly deployed in real-world applications. Existing sa…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Bridging the Sim-to-Real Gap in Reinforcement Learning-Based Industrial Dispatching through Execution Semantics

Event-driven scheduling policies are increasingly deployed in industrial environments, where decisions are made under asynchronous and part…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane

AI agents are increasingly expected to operate as digital employees: accessing enterprise data, making decisions, and taking actions autono…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

Reasoning models are evaluated on single-turn benchmarks but deployed in multi-turn dialogue, where users push back on correct answers. Und…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration

This paper examines records retrieved from the ClinicalTrials.gov registry to characterize temporal trends in AI terminology and the geogra…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Beyond Consensus: Trace-Level Synthesis in Mixture of Agents

When multiple LLM agents solve the same problem, standard practice compresses each agent's reasoning into a majority vote or layered synthe…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

PRO-CUA: Process-Reward Optimization for Computer Use Agents

Computer use agents (CUAs) have shown strong potential for automating complex digital workflows, yet their training remains constrained by…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models

Masked diffusion language models (MDMs) uniquely support any-order generation, with confidence-based decoding currently serving as the de f…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Governing Technical Debt in Agentic AI Systems

Agentic AI systems are increasingly being explored as production infrastructure: they reason over multiple steps, call tools, act through w…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Better Later Than Sooner: Neuro-Symbolic Knowledge Graph Construction via Ontology-grounded Post-extraction Correction

Question answering (QA) is a core challenge in AI, particularly for complex queries requiring multi-hop reasoning across documents, or symb…

2026-05-29 13:00 JST | arXiv cs.AI | Agents, Business/Funding, Research/Papers

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

DeFi investment agents, systems that use AI for autonomous on-chain trading, have attained over USD 3 billion in combined token valuations…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

ReasonOps: Operator Segmentation for LLM Reasoning Traces

Chain-of-thought traces from large reasoning models can span tens of thousands of tokens, yet we lack a vocabulary for describing their int…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

Web agents, which couple language models with browsing and tool-use capabilities, show promise as open web assistants. Yet progress is incr…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding, Research/Papers

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

Self-evolving agents improve over time by reflecting on past failures, but existing evaluation is limited in two ways: it measures only tas…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility

Reasoning distillation transfers complex reasoning abilities from large language models (LLMs) to smaller ones, yet its success depends on…

2026-05-29 13:00 JST | arXiv cs.AI | Business/Funding, Research/Papers

Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

We study large-scale literature search from two complementary angles: improving the retrieval pipeline, and stress-testing the human refere…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Surfacing Isolated Learners with Outcome-Independent Mediation of Feedback between Teachers and Students Using AI

AI-augmented classrooms generate rich teacher and student feedback before graded outcomes become available, yet these signals can be diffic…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

DenseSteer: Steering Small Language Models towards Dense Math Reasoning

Large language models (LLMs) demonstrate strong chain-of-thought (CoT) reasoning abilities, while smaller models (<= 3B parameters) signifi…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Provably Secure Agent Guardrail

As large language models transition from bounded generative engines to agents with expansive execution privileges, AI going out of control…

2026-05-29 13:00 JST | arXiv cs.AI | Agents, Research/Papers

OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories

Task success can hide process anomalies in real-world agent executions. An agent may pass the final task oracle while still accumulating un…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

The Dynamic Flexible Job Shop Scheduling Problem (DFJSP) necessitates a trade-off between instant reaction to stochastic disturbances and g…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop

Foundation models are increasingly trained on synthetic data generated by prior model iterations rather than exclusively on real data. This…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies

The era of the Internet of Agents (IoA) is taking shape: LLM agents are expected to fulfill user goals by orchestrating fast-growing popula…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

Tool retrieval over large API catalogs is a core bottleneck for LLM agents: user queries arrive in colloquial, often underspecified languag…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Long chain-of-thought (CoT) traces are widely used as supervision for reasoning-oriented LLM SFT, yet answer-correct traces can still lead…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models

Supervised fine-tuning (SFT) followed by reinforcement learning (RL) has become a standard post-training paradigm for large language models…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Rubric-Guided Process Reward for Stepwise Model Routing

Stepwise model routing improves the efficiency of Large Reasoning Models (LRMs) by assigning each reasoning step to a suitable model. Recen…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression

Mixture-of-Experts (MoE) language models reduce per-token computation but still require storing and serving all experts, making deployment…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

PassNet: Scaling Large Language Models for Graph Compiler Pass Generation

Modern tensor compilers such as TorchInductor deliver substantial speedups on mainstream models, yet face a systematic performance ceiling…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

We demonstrate that sparse autoencoders can extract interpretable features from Claude 3 Sonnet, a production-scale language model, address…

2026-05-29 13:00 JST | arXiv cs.AI | Robotics, Business/Funding

MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models

Action-conditioned world models are increasingly used as scalable simulators for robot learning, yet current evaluations provide limited ev…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics

While large language models (LLMs) excel at static scientific reasoning, they struggle to model the temporal structure of dynamic physical…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization

Safety alignment for large language models (LLMs) aims to reduce harmful or unsafe behavior while preserving general utility. However, rece…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Architecture-Sensitive Supervised Fine-Tuning for Screen-Conditioned Action Prediction: A PiSAR Benchmark

We benchmark three supervised fine-tuned models against frontier zero-shot baselines on a 661-row held-out slice of PiSAR (Persona, intent,…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs

Persona prompting is widely used to steer large language models, yet its practical value remains unclear. Prior work often evaluates person…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

ReasonLight: A Multimodal Foundation Model-Enhanced Reinforcement Learning Framework for Zero-Shot Traffic Signal Control

Reinforcement learning (RL) has shown promise in traffic signal control (TSC). However, its reliance on predefined states limits responsive…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

Automatic speech recognition (ASR) is a core component of human–computer interaction and an increasingly important front-end for LLM-based…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

CrystalXRD-Bench: Benchmarking Vision-Language Models for XRD Peak Indexing Across Diverse Crystalline Materials

Miller-index identification from powder XRD patterns requires capabilities untested by existing multimodal benchmarks: the model must read…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

VitalAgent: A Tool-Augmented Agent for Reactive and Proactive Physiological Monitoring over Wearable Health Data

Wearable devices enable continuous monitoring of physiological signals such as ECG and PPG, but existing mHealth systems are largely limite…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF

Large Language Models (LLMs) are increasingly deployed in agentic and retrieval-augmented generation (RAG) systems, where they must execute…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Xetrieval: Mechanistically Explaining Dense Retrieval

Explaining why dense retrievers assign high relevance scores remains challenging because retrieval decisions are made through opaque high-d…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

Large language models (LLMs) are increasingly deployed as interactive agents, yet their capacity for social and strategic reasoning over ex…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation

As scientific literature grows rapidly, automated survey generation has become a key capability for AI scientists and human researchers. Ho…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

Recent advances in mobile GUI agents have shown strong potential for automating mobile tasks, but most effective systems still depend on la…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Opt-Verifier: Unleashing the Power of LLMs for Optimization Modeling via Dual-Side Verification

Building mathematical optimization models is critical in operations research (OR), while it requires substantial human expertise. Recent ad…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation

Parameterizing high-fidelity "digital twins" of batteries is a critical yet challenging inverse problem that hinders the pace of battery in…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

ParaTool: Shifting Tool Representations from Context to Parameters

Tool calling extends large language models (LLMs) by enabling grounded interaction with external executable interfaces, thereby supporting…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Planning with the Views via Scene Self-Exploration

Can VLMs predict how each camera move changes the view, and plan many such moves ahead? We call this capability view planning, requiring (1…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning

Tool-Integrated Reasoning (TIR) extends LLM capabilities by leveraging external environments. However, existing methods lack the deliberati…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

GPS-Enhanced Tourist Mobility Modeling with Seasonal Spatial Priors and LLM-Based Activity Chain Generation

Tourist mobility poses a distinct challenge for urban transportation planning. Unlike resident commuting, tourist travel is largely non-rou…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

FinVerBench: Benchmark Validity and Calibration in Large Language Model Financial Statement Verification

We introduce FinVerBench, a benchmark and validity study for financial statement verification: determining whether a set of corporate finan…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion

Modeling the interplay between external stimuli and internal neural representations is a pivotal research area for Brain-Computer Interface…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering

Retrieval-augmented generation (RAG) for document-based Open-domain Question Answering (ODQA) on large-scale industrial corpora faces two c…

2026-05-29 13:00 JST | arXiv cs.AI | Agents, Hardware/Semiconductors

Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models

The topic of Co-creation, i.e., AI agents interacting with humans to generate outputs (e.g., art), has gained significant attention recentl…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures

Attack Success Rate (ASR) evaluates each jailbreak with a single yes/no label at the end of generation, telling us whether a failure happen…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

VikingMem: A Memory Base Management System for Stateful LLM-based Applications

Large Language Models have revolutionized interactive applications; however, their finite context windows pose a critical data management c…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

Heuristic search is the dominant paradigm in symbolic AI planning, and the strongest heuristics are the result of decades of work by planni…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

Large language models (LLMs) are increasingly being used to generate health text from structured records such as wearable time series, biom…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

Given a strategically complex board game, human players can quickly learn to devise strategies after playing a few rounds. Autonomous agent…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors, Business/Funding

TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation

Evaluating open-ended outputs from large language models (LLMs) remains challenging due to the absence of ground truth. Existing metrics re…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents

LLM agents acting in structured environments fail in operational rather than conversational ways, and reliability depends on procedural kno…

2026-05-29 13:00 JST | arXiv cs.AI | Agents, Research/Papers

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

Large language models in Agentic AI systems consume tool schemas and execution results and emit tool invocations as structured data. The de…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

NICE: A Theory-Grounded Diagnostic Benchmark for Social Intelligence of LLMs

As large language models (LLMs) are increasingly applied in social contexts such as emotional companionship and customer service, measuring…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability

Large Language Models (LLMs) excel at understanding natural language but struggle with optimisation tasks involving multiple constraints an…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

FHRFormer: A Self-Supervised Masked Transformer Framework for Fetal Heart Rate Time-Series Inpainting and Forecasting

Approximately 10% of newborns require assistance to initiate breathing at birth, and around 5% need ventilation support. Fetal heart rate (…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

In Agentic Search, trajectory-level outcome rewards fail to quantify the behavioral contributions of individual steps, while existing step-…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices

Trajectory prediction is a fundamental task for autonomous systems, requiring complex reasoning about multi-agent interactions and intents.…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs

Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive generative paradigm. Given the prohibitive computati…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Uncertainty-Aware Transfer Learning for Cross-Building Energy Forecasting: Toward Robust and Scalable District-Level Energy Management

Scaling data-driven energy forecasting to district level requires models that can be re-used across buildings with minimal target-domain da…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Citation-Closure Retrieval and Per-Rule Attribution for Real-World Regulatory Compliance Question Answering

Deploying Large Language Models (LLMs) for regulatory compliance demands rigorous traceability via comprehensive citations across multi-tie…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding

Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence

The impressive performance of generalist large language models (LLMs) such as GPT and Claude in healthcare raises a critical question: will…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Benchmarking Positional Encoding Strategies for Transformer-Based EEG Foundation Models

Electroencephalography (EEG) is a widely used non-invasive technique for measuring brain activity in brain-computer interface (BCI) applica…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs

As large language models continue to scale, low-bit weight-only post-training quantization (PTQ) offers a practical solution to their memor…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

From XXLTraffic to EvoXXLTraffic: Scaling Traffic Forecasting to Sensor-Evolving Networks

Existing traffic forecasting benchmarks assume a fixed sensor set, but real road-sensor networks grow continuously as the road network chan…

2026-05-29 13:00 JST · arXiv cs.AI · Business/Funding

Croissant Tasks: A Metadata Format for Reproducible Machine Learning Evaluations

Reproducibility is fundamental to the scientific method, yet remains a critical challenge in machine learning. Contributing factors include…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk

Critical sequential decisions are rarely single-timescale: a strategic decision causally shapes the context in which every subsequent tacti…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

SkillsInjector: Dynamic Skill Context Construction for LLM Agents

LLM agents now draw on growing skill libraries to handle complex tasks. However, injecting more skills does not always improve task complet…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains

Real-world tasks often lack large labeled datasets, motivating extensive work on learning in low-data regimes. However, existing approaches…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite the effectiveness…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities yet introduce broad new safety risk sou…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Research/Papers

PRAIB: Peer Review AI Benchmark of Behaviour of LLM-Assisted Reviewing

The growing number of submitted papers has motivated the exploration of Large Language Models (LLMs) as a means to support and augment the…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Harnessing non-adversarial robustness in large language models

The work presents an approach for addressing the challenge of robustness in Large Language Models (LLMs) to alterations and potential error…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Quantifying and Optimizing Simplicity via Polynomial Representations

Deep networks often exhibit a preference for "simple" solutions, and such a simplicity bias is widely believed to play a key role in genera…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

OptSkills: Learning Generalizable Optimization Skills from Problem Archetypes via Cluster-Based Distillation

Leveraging Large Language Models (LLMs) to automatically formulate and solve optimization problems from natural language has emerged as an…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields

As multimodal language models play an increasingly important role in scientific research, materials science offers a critical testbed due t…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation

Key-Value (KV) cache remains a major bottleneck for deploying Large Language Models (LLMs) in long-generation tasks. Prior work often appli…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents · Research/Papers

Redundant or Necessary? A Benchmark for Detecting Redundant Steps in Agent Trajectories

LLM-based agents have demonstrated strong capabilities in solving complex tasks through multi-step reasoning and tool use. However, existin…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

On the Geometry of Games and their Solvers

A central challenge in game theory and learning systems such as GANs is understanding which algorithms can efficiently compute equilibria a…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Toward AI Systems That Understand Self and Others: A Multi-Phase Inference Framework for Human Cognitive Diversity and World-Model Alignment

Mutual misunderstanding in contemporary society does not arise merely because people hold different opinions or values. Even under the same…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

It's All About Speed: AI's Impact on Workflow in Music Production

In this paper, we present the results of an ethnographic study into the impact of AI and automated tools on music production workflow. Focu…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Make LLM Learn to Synthesize from Streaming Experiences through Feedback

Large language models (LLMs) have been widely adopted for synthetic data generation, significantly reducing annotation costs. However, most…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization

Understanding how harm emerges from interaction between otherwise benign image-text pairs requires intent-aware cross-modal reasoning beyon…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

Formalizing Mathematics at Scale

We present AutoformBot, a multi-agent system for building an Autoformalized Textbook Library At Scale (Atlas) in Lean 4. AutoformBot orches…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Meta-Programming for Linear-time Temporal Answer Set Programming

The development of temporal extensions of Answer Set Programming (ASP) has led to the emergence of non-monotonic linear-time (TEL), dynamic…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent

Marine lead (Pb) and its isotopes are critical tracers for ocean circulation and anthropogenic pollution, yet in-situ observations remain c…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Hardware/Semiconductors

Accelerating Constrained Decoding with Token Space Compression

To guarantee that an LLM's outputs conform to a specified structure, context-free grammar (CFG) decoding engines force the selection of nex…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Business/Funding

Cookie-Bench: Continuous On-screen Key Interaction Evaluation for Web Generation

Front-end web code has become a core product surface for every frontier LLM release, yet evaluating these interactive applications at devel…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning

Cross-domain multimodal time series forecasting is a challenging task, requiring models to integrate precise numerical comprehension, cross…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

From GPS Points to Travel Patterns: Flexible and Semantic Trajectory Generation with LLMs

Urban trajectories play a crucial role in modeling urban dynamics and supporting various smart city applications. However, privacy concerns…

2026-05-29 13:00 JST · arXiv cs.AI · Business/Funding

RAISE: RAG Design as an Architecture Search Problem

Retrieval-augmented generation (RAG) systems expose numerous design choices spanning query rewriting, chunking, retrieval depth, reranking,…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Teaching Values to Machines: Simulating Human-Like Behavior in LLMs

Large Language Models (LLMs) demonstrate a remarkable capacity to adopt different personas and roles; however, it remains unclear whether t…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Large Language Models have demonstrated remarkable progress in general-purpose capabilities and can achieve strong performance in specific…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

Learning to Choose: An Empowerment-Guided Multi-Agent System with semantic communication for Adaptive Method Selection

Automating scientific computing workflows requires more than generating executable code: autonomous systems must also select appropriate co…

2026-05-29 13:00 JST · arXiv cs.AI · Image/Video Generation

Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers

Diffusion Transformers have become a powerful backbone for text-to-image generation, but their layered and cross-modal generation process m…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Conformal Certification of Reasoning Trace Prefixes

Language model reasoning traces are rarely all-or-nothing; they frequently contain valid intermediate steps before a critical error occurs.…

2026-05-29 13:00 JST · arXiv cs.AI · Agents · Business/Funding

Selective QA over Conflicting Multi-Source Personal Memory: A Diagnostic Testbed and Method Comparison

Emerging personal AI agents are moving toward persistent, multi-source memory. This creates an evaluation problem: systems must decide how…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers

Poker is a landmark challenge for artificial intelligence. The dominant approach relies on equilibrium solvers built on counterfactual regr…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing

Understanding how Vision-Language-Action (VLA) models transform multimodal knowledge into embodied control remains an open challenge. We pr…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

Enhancing Multi-Agent Communication through Attention Steering with Context Relevance

LLM-based multi-agent systems have demonstrated remarkable performance on complex tasks through collaborative reasoning. However, these sys…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

AgentSchool: An LLM-Powered Multi-Agent Simulation for Education

Despite the rapid deployment of LLMs into classrooms, validating educational AI remains uniquely intractable: interventions act on developi…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Anchorless Diversification for Parallel LLM Ideation

LLMs are increasingly used to generate candidate-idea pools for creative tasks where broad exploration is valuable. Parallel inference can…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Temporal Stability and Few-Shot Prompting in Math Task Assessment

As AI tools become increasingly integrated into educational contexts, questions arise about both their stability over time and their respon…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajectories into compact memory. Howe…

2026-05-29 13:00 JST · arXiv cs.AI · Hardware/Semiconductors · Business/Funding · Research/Papers

BioRefusalAudit: Auditing Biosecurity Refusal Depth Using General and Domain-Fine-Tuned Sparse Autoencoders

Biosecurity evaluations of language models typically ask whether models produce hazardous output. This paper asks a complementary question:…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Modularizing Educational LLM-Agency for Fostering Responsible Learning Assistance

The widespread adoption of AI chatbots in education will drastically change learning, making responsible deployment a critical concern. Whi…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Double-Edged Sword or Sharp Tool? Designing and Evaluating Triadic LLM-Teacher Collaboration for K-12 Writing at Scale

The double-edged sword of integrating Large Language Models (LLMs) requires an effective triadic collaboration mechanism among LLMs, teache…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Persona Conditioning of Brand Recommendations in Retrieval-Augmented Commercial Chat: A Prominence-Stratified Cross-Provider Audit

The same prompt -- "best CRM software" -- reaches AI assistants from buyers in widely different contexts: a solo founder, an enterprise VP,…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

Long-horizon interactions require language models to manage accumulating information: when to update their state, when to preserve their st…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

mcp-proto-okn: Natural-language access to open scientific knowledge graphs through the Model Context Protocol

MCP Server Proto-OKN (mcp-proto-okn) is a Python-based Model Context Protocol server that enables AI assistants to discover, inspect, query…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

ProjectionBench: Evaluating Scientific Hypothesis Generation in LLMs Under Progressive Information Disclosure

Scientific discovery is an inherently creative and uncertain process, requiring reasoning beyond the recall of known knowledge. While many…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Mid-training has become an important stage in modern LLM development, using large-scale curated mixtures to strengthen capabilities before…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Demystifying Data Organization for Enhanced LLM Training

Large Language Models (LLMs) have revolutionized various fields, yet their training efficiency is heavily reliant on effective data curatio…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents

Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can vi…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfa…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations

Printed circuit board (PCB) schematic design defines nearly all electronic hardware, but it remains manual and expertise-intensive. While g…

2026-05-29 13:00 JST · arXiv cs.AI · Agents · Research/Papers

Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Cl…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models

Large Language Models (LLMs) achieve impressive performance across many tasks but remain prone to hallucination, especially in long-form ge…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptua…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Business/Funding · Research/Papers

Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation

As Large Language Models (LLMs) are increasingly deployed in safety-critical applications, robust content moderation becomes essential. We…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering

Long-horizon interactive agents often accumulate large trajectory histories yet still fail to answer questions about earlier events reliabl…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

A comparative study of transformer-based embeddings for topic coherence

Topic modeling is a branch of Natural Language Processing (NLP) that aims to organize large collections of texts into coherent groups accor…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

Automatic speech recognition (ASR) has the potential to substantially reduce manual annotation effort in child speech research by generatin…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning

Syllabification describes the task of dividing words into syllables. Due to many rules and exceptions, training an algorithm to perform syl…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling

Large Language Models (LLMs) extend their capabilities through function-calling (FC), which relies on training data with high quality, dive…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand

The Plain Writing Act in the United States requires government documents to be accessible in clear and simple language that the general pub…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

SERC: LDPC-Inspired Semantic Error Correction for Retrieval-Augmented Generation

While Large Language Models (LLMs) have demonstrated remarkable capabilities, their reliability is significantly compromised by hallucinati…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Specialty-Specific Medical Language Model for Immune-Mediated Diseases

Extracting detailed clinical information from free-text medical narratives remains a practical challenge for researchers and healthcare sys…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines

Large language model (LLM) agents with tool-calling capabilities are increasingly deployed in production systems, yet a fundamental reliabi…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning

The success of large language models (LLMs) across diverse NLP tasks has elevated the importance of reasoning chain optimization as a criti…

2026-05-29 13:00 JST · arXiv cs.AI · Business/Funding

GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models

Deployed language models are evaluated in a non-stationary environment: model versions, retrieval layers, safety systems, and real-world in…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinf…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

Self-Play Reinforcement Learning under Imperfect Information in Big 2

Imperfect-information multiplayer games test whether agents can act under hidden information, sparse rewards, and non-stationary opponents.…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Emergent Semantic Representations in World Models through Physical Interaction without Linguistic Supervision

What does a world model learn from physical exploration, without any linguistic supervision? We argue the answer is organized by a single p…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Continuity and Ordinality Matter: Constraining Time Series Tokens for Effective Time Series Analysis with Large Language Models

Token-based time series large language models (TS-LLMs) have emerged as a promising direction for time series analysis and reasoning. Howev…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

PrismFlow: Residual Dynamics for Flow Matching in Time-Series Generation

Generating high-quality time-series data is challenging because real-world signals often exhibit multimodal patterns and multiscale dynamic…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

TaxDistill: Improving Metagenomic Taxonomic Annotation via Distilled Genomic Foundation Models

Metagenomic taxonomic annotation aims to identify the microbial origins of DNA fragments in environmental samples. Traditional methods that…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Balancing Multimodal Learning through Label Space Reshaping

Multimodal learning often suffers from modality imbalance, where modalities that converge faster dominate optimization while others remain…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Representation Alignment Rests on Linear Structure

We investigate the Platonic Representation Hypothesis (PRH) through a tripartite statistical framework of representations: signal, bias, an…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents · Research/Papers

LogDx-CI: Benchmarking Log Reduction Tools for LLM Root-Cause Diagnosis

CI failure logs are large (median 5k lines, max 200k in this corpus) and noisy. Coding agents that try to debug them depend on an upstream…

2026-05-29 13:00 JST · arXiv cs.AI · Business/Funding

GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human

With the rapid advancement of large language models, evaluating human-likeness in open-ended conversation has become increasingly important…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Context Distillation as Latent Memory Management

Context distillation compresses contextual information into model parameters, yet existing methods often ignore how multiple distilled late…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Quantum-Enhanced Adversarial Robustness in Artificial Intelligence

Artificial Intelligence has achieved remarkable success across diverse application domains. However, its vulnerability to adversarial attac…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Hallucination Detection-Guided Preference Optimization for Clinical Summarization

Large language models (LLMs) have shown promise on summarization tasks, but they often produce hallucinations, which are unsupported or inc…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

AIRGuard: Guarding Agent Actions with Runtime Authority Control

Tool-using language agents turn model decisions into external side effects: they read files, run scripts, call APIs, send messages, and inv…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

First head-to-head comparison of agentic AI applied to the analysis of simulated data of the Einstein Telescope

We report a comparison of two state-of-the-art agentic AI systems, Claude Code (Anthropic) and Codex (OpenAI), tasked with autonomously exe…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models

Large language models have achieved strong reasoning capabilities, though often at the cost of massive parameter counts and expensive infer…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Conf-Gen: Conformal Uncertainty Quantification for Generative Models

Conformal prediction (CP) and its extension, conformal risk control (CRC), are established frameworks for quantifying uncertainty in superv…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization

If an AI agent makes decisions on a person's behalf, those decisions must align with its user. We introduce representational accuracy to me…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Comparing Post-Hoc Explainable AI Methods for Interpreting Black-Box EEG Models in Depression Detection

Recent advances in deep learning have enabled increasingly accurate electroencephalography (EEG)-based classification of Major Depressive D…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

The Hamilton-Jacobi Theory of Deep Learning

In this paper, training a neural network is identified, exactly, as a search through Hamilton-Jacobi initial-value problems: each gradient…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

LLMs are vulnerable to prompt injection attacks. However, this vulnerability has been primarily demonstrated conceptually in academic studi…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks

A paraphrase-quality audit of MathCheck (ICLR 2025) detected 4 semantically incorrect paraphrases in 129 groups (3.1%); removing them drops…

2026-05-29 13:00 JST · arXiv cs.AI · Business/Funding

LoRe: Adaptive Interaction-Evaluation Routing with Per-Step Interaction Budgets for Iterative Graph Solvers

Diffusion-based neural solvers for combinatorial optimization repeatedly re-evaluate dense edge/factor interactions, making inference expen…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Label-Free Reinforcement Learning via Cross-Model Entropy

Post-training large language models with reinforcement learning is bottlenecked by the reward signal. Existing approaches require either gr…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Return-to-Go Is More Than a Number: Q-Guided Alignment for Return-Conditioned Supervised Learning

Conditioned Sequence Models (CSMs) learn policies by treating return-to-go (RTG) as a control signal. However, existing CSMs often treat th…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Research/Papers

SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers

Smart contract decompilation aims to recover high-level source code from bytecode, but evaluating decompilers remains difficult because exi…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

LLMs have advanced text classification, yet existing paradigms face a trade-off: supervised (label only) fine-tuning is scalable but offers…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

A retrieval-augmented generation (RAG) system deployed over a multi-author institutional corpus can give a different answer to the same que…

2026-05-29 13:00 JST · arXiv cs.AI · Hardware/Semiconductors

OISD: On-Policy Internal Self-Distillation of Language Models

Recent reinforcement learning (RL) post-training approaches primarily optimize the final output policy using sparse outcome-level rewards,…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Research/Papers

GEO-Bench: Benchmarking Ranking Manipulation in Generative Engine Optimization

Large language models (LLMs) increasingly rank products, documents, and recommendations for user queries, which makes manipulating these ra…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

unix-ctf: Procedural Environments for Unix-Competence Reinforcement Learning

Unix competence is the ability to use shell and operating-system primitives as first-class tools, not merely to write programs through a te…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router

We propose a minimal dynamical model of adaptive softmax routing for a two-expert Mixture-of-Experts (MoE) layer. The model is obtained as…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

When and How Long? The Readout-Mediator Angle in Temporal Reasoning

A linear probe can decode a representation almost perfectly and yet be completely irrelevant to how the model uses it. On calendar-date dur…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

Multi-Resolution End-to-End Deep Neural Network for Optimizing Latency-Accuracy Tradeoff in Autonomous Driving

Latency-accuracy tradeoffs are fundamental in real-time applications of deep neural networks (DNNs) for cyber-physical systems. In autonomo…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback

Traditional recommender systems (RecSys) primarily infer user preferences from implicit signals (such as clicks, watches, and purchases), o…

2026-05-29 13:00 JST · arXiv cs.AI · Agents

SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation

Medication recommendation predicts medications for patient visits, but existing methods still face two key challenges. At the model level,…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Real-rootedness of the Poincaré polynomials of $\overline{\mathcal M}_{0,n}$: an AI-assisted proof

We prove real-rootedness for the Poincaré polynomial $P_n(t)=\sum_{i=0}^{n-3} \dim H^{2i}(\overline{\mathcal M}_{0,n};\mathbb{Q})\,t^i$…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Unveiling Multi-regime Patterns in SciML: Distinct Failure Modes and Regime-specific Optimization

Neural networks trained under different hyperparameter settings can fall into distinct training "regimes," with consistent behavior within…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

CA-AC-MPC: CUDA-Accelerated Actor-Critic Model Predictive Control

In the literature, actor-critic model predictive control (AC-MPC) integrates MPC with reinforcement learning to enable high-performance con…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Parallax: Parameterized Local Linear Attention for Language Modeling

Large Language Models (LLMs) have become the central paradigm in artificial intelligence, yet the core computational primitive of attention…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Evolutionary Refinement of Generative Graph Topologies: A Hybrid WGAN-GA Approach

Generating realistic graph-structured data is challenging due to discrete connectivity, varying graph sizes, and class-specific structural…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Domain-Informed Representation for Evolutionary Sieving in Integral and Module Lattices

Traditional cryptography, rooted in problems, e.g., integer factorisation or discrete log, is inevitably vulnerable to a fully operational…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning

Legal NLP benchmarks are overwhelmingly English-centric, leaving failure modes in morphologically rich, non-Latin-script languages undetect…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Sustainable Metal-Organic Framework Water Harvesters in the Artificial Intelligence Era

Metal-organic frameworks (MOFs) are excellent candidates for water harvesting due to their tunable pore environments, which can be precisel…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

TIMEGATE: Sustainable Time-Boxed Promotion Gates for Continual ML Adaptation Under Resource Constraints

As machine learning (ML) systems evolve to continual adaptation, each re-training cycle uses compute, annotation, and energy. We introduce T…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

Influence-Guided Symbolic Regression: Scientific Discovery via LLM-Driven Equation Search with Granular Feedback

Large Language Models (LLMs) offer a promising avenue for scientific discovery, yet their application to symbolic regression is often const…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Stochastic Lifting for Generating Trajectories of Stochastic Physical Systems

Many stochastic physical systems evolve smoothly over time in the sense that the distribution of states changes regularly across time steps…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI · Agents

Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents

AI agents augment large language models with external tools such as web retrieval, enabling grounded and up-to-date responses. However, inc…

2026-05-29 13:00 JST · arXiv cs.AI · Business/Funding · Research/Papers

Toward Ethical Facial Age Estimation: A Generalized Zero-Shot Benchmark Without Training on Children's Data

Age estimation from facial images typically relies on training data that includes images of minors, a practice that raises serious ethical,…

2026-05-29 13:00 JST · arXiv cs.AI · LLM/Generative AI

BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference

Diffusion language models (dLLMs) generate text by iteratively denoising multiple token positions in parallel, offering an attractive alter…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment

Forecasting conversational derailment is the task of predicting, as the conversation unfolds, whether it will eventually derail into person…

2026-05-29 13:00 JST · arXiv cs.AI · Research/Papers

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

Real-world information needs require access to structurally diverse knowledge sources, from unstructured text and relational tables to know…

2026-05-29 13:00 JST · arXiv cs.AI · Robotics

Extreme dynamic symmetry enables omnidirectional and multifunctional robots

Symmetry is a central organizing principle in natural systems, yet its use as a unifying design strategy in robotics has largely remained l…

2026-05-29 13:00 JST · arXiv cs.AI · Agents · Business/Funding

DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents

Role-playing with large language models is fundamentally a session-level task, requiring agents to sustain character identity and interacti…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

KLAS: Using Similarity to Stitch Neural Networks for Improved Accuracy-Efficiency Tradeoffs

Given the wide range of deployment targets, flexible model selection is essential for optimizing performance within a given compute budget.…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

LLM-guided evolutionary search (Evolve systems) has reached state-of-the-art results on mathematical and combinatorial tasks, yet most exis…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Causal Label Recovery in Payment Networks

Fraud detection models in payment networks train on chargeback labels that are systematically biased. Every label must survive three sequen…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA

We present Code-QA-Bench, a fully automated framework for synthesizing repository-level code understanding benchmarks that separates genuin…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation

Knowledge distillation (KD) transfers a single scalar prediction from a large foundation model (FM) to compact vertical models (VMs), suffe…

2026-05-29 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

Do Physics Foundation Models Learn Generalizable Physics? A Bias-Aware Benchmark Across Physical Regimes and Distribution Shifts

Recent physics foundation models claim general spatiotemporal forecasting ability, yet their evaluations often collapse performance into a…

2026-05-29 13:00 JSTarXiv cs.AIビジネス/資金調達

Pocket-Dentist: On-Device Dental Image Understanding via Efficient Multimodal Large Language Models

Evaluations of dental vision-language models remain fragmented across datasets, task definitions and metrics, and often ignore their comput…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs

Recent Large Audio-Language Models (LALMs) have demonstrated promising abilities in understanding musical content. However, whether their r…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント

GrepSeek: Training Search Agents for Direct Corpus Interaction

Large Language Model (LLM) search agents have shown strong promise for knowledge-intensive language tasks through multiple rounds of reason…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Rethinking FID Through the Geometry of the Reference Dataset

Fréchet Inception Distance (FID) is widely used to evaluate image generators, yet lower FID does not always correspond to better sample q…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Does Distributed Training Undermine Compute Governance?

Compute governance proposals often rely on the assumption that frontier AI training requires large, detectable computing clusters. However,…

2026-05-29 13:00 JSTarXiv cs.AIエージェント

SURGENT: A Surgical Multi-Agent Assistance System Across the Perioperative Workflow

The intricate nature of modern surgical care necessitates intelligent systems that can synthesize extensive patient records, support collab…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

TRACER: Persistent Regularization for Robust Multimodal Finetuning

Mainstream strategies for finetuning pretrained multimodal models often degrade out-of-distribution (OOD) robustness, a phenomenon known as…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

We propose Latent Terms, a method revealing that models trained for dense retrieval, whether single- or multi-vector, learn representations…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

On the Optimizer Dependence of Neural Scaling Laws

The scaling exponent $\alpha$ in neural scaling laws $L(N) \propto N^{-\alpha}$ is commonly treated as a fixed constant set by architecture…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

Reinforcement learning (RL) can be used to improve the policy (denoiser) of diffusion large language models (dLLMs), while being hindered b…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Semantic and Visual Evidence for Efficient Long-Video Reasoning: A Solution for the HD-EPIC VQA Challenge

Understanding long-form egocentric videos remains challenging for multimodal large language models (MLLMs) due to limited context length an…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction

Under standard graphical assumptions, the Markov boundary of a target variable is the smallest set of features that renders every other fea…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Beyond Bilingual Transfer: Multilingual Code-Switching in Instruction Tuning

Recent studies have shown that code-switching data (CSD), in which multiple languages are mixed within the same context, can improve cross-…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

DELOS: Detecting Shallow Transits in Kepler Photometry Using a Contrastive-Learning Framework

We present DEtection in phase-folded Light curves with cOntrastive Scoring (DELOS), a contrastive-learning-based framework designed to sear…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

AliMark: Enhancing Robustness of Sentence-Level Watermarking Against Text Paraphrasing

Existing sentence-level watermarking methods enhance robustness to paraphrasing by anchoring watermarks in sentence semantics. However, the…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント

SkillBrew: Multi-Objective Curation of Skill Banks for LLM Agents

Retrieval-augmented LLM agents increasingly rely on curated skill banks: collections of reusable textual principles that guide decision mak…

2026-05-29 13:00 JSTarXiv cs.AIエージェント研究/論文

How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions

AI coding agents increasingly act directly within software environments, yet existing analyses of their failures rely on benchmark trajecto…

2026-05-29 13:00 JSTarXiv cs.AIビジネス/資金調達

How Much Is a Dataset Worth? Scaling Laws, the Vendi Score, and Matrix Spectral Functions

Neural scaling laws appraise data through dataset size, while the Vendi Score uses quantum entropy to measure dataset value. We show both t…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Forget Less, Generalize More: Unifying Temporal and Structural Adaptation for Dynamic Graphs

Representation learning on dynamic graphs requires capturing complex dependencies that evolve across both time and structure. Existing appr…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Adaptive Interviewing for Persona Simulation in LLMs: Evidence-Grounded Reasoning Improves Decision Alignment

Accurately simulating the decisions of a specific individual remains challenging for large language models (LLMs), partly because persona i…

2026-05-29 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset

The emergence of Large Vision-Language Models (LVLMs) has substantially expanded model capabilities beyond text-only understanding, enablin…

2026-05-29 13:00 JSTarXiv cs.AIエージェント

Honest Lying: Understanding Memory Confabulation in Reflexive Agents

Reflexion-style agents rely on self-generated reflections as memory, implicitly assuming that agents can accurately diagnose their own fail…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Composing Non-Conjugate Factor Graphs with Closed-Form Variational Inference

Stacking probabilistic building blocks into deeper architectures typically breaks closed-form inference. We show that closed-form inference…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing

Large language models (LLMs) are increasingly used to support scientific work, but it is unclear whether they uphold responsible conduct of…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Inform, Coach, Relate, Listen: Auditing LLM Caregiving Support Roles

Language models are increasingly being deployed for conversational support in informal caregiving contexts, where interactions often extend…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

MOOSE-Copilot: A Web-Based Interactive Assistant for Unified Exploratory and Fine-Grained Scientific Hypothesis Discovery

Large language models (LLMs) show remarkable potential in scientific hypothesis discovery. However, existing approaches face two critical l…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Evolutionary Rule Extraction from Corporate Default Prediction Models

Small and medium-sized enterprises (SMEs) represent the majority of firms in most economies and often face financial constraints and higher…

2026-05-29 13:00 JSTarXiv cs.AIエージェント

PhoneWorld: Scaling Phone-Use Agent Environments

A central bottleneck for phone-use agents is that controllable, reproducible environments covering real mobile behavior are hard to build a…

2026-05-29 13:00 JSTarXiv cs.AIロボティクス

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

Conditional human motion generation remains a fundamental challenge in computer vision and robotics. Despite significant progress, current…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

The New Pro Se: Generative AI and the Surge in Federal Civil Self-Representation

Since public access to generative AI tools became widespread, federal civil litigation has seen a marked increase in pro se (self-represent…

2026-05-29 13:00 JSTarXiv cs.AIビジネス/資金調達

Quotient DAGs for Off-Policy Evaluation: Forward-Flow Importance Sampling and Exact Slate Propensities

Off-policy evaluation estimates how a target policy would perform using data collected by a different behavior policy, which is crucial whe…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation

Low-resource target-language generation is often limited by scarce parallel data, while high-resource source-language monolingual data is a…

2026-05-29 13:00 JSTarXiv cs.AIエージェント

Network Optimization Aspects of Autonomous Vehicles: Challenges and Future Directions

Global megatrends, such as urbanization, population growth, and emerging network solutions are accelerating the development of the Connecte…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

KBF: Knowledge Boundary as Fingerprint for Language Model and Black-Box API Auditing

Relay and reseller APIs increasingly intermediate access to large language models (LLMs), but users have no direct way to verify that a cla…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection

Ever-evolving transaction patterns have significantly hindered anomaly detection on emerging cryptocurrency blockchains due to the vast num…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェントビジネス/資金調達

GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing

Exploratory GUI testing is a particularly demanding setting for MLLM agents: without predefined test scripts, an agent must autonomously na…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

GiPL: Generative augmented iterative Pseudo-Labeling for Cross-Domain Few-Shot Object Detection

Vision-language foundation models have shown promising zero-shot generalization for Cross-Domain Few-Shot Object Detection (CD-FSOD). Howev…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

SCOPE: A Lightweight-training LLM Framework for Air Traffic Control Readback Monitoring

Pilot readback of Air Traffic Control (ATC) voice instructions is a primary safeguard against miscommunication in air transportation. Howev…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization

Deep learning optimization relies heavily on the assumption of smooth loss landscapes, a condition systematically violated by modern archit…

2026-05-29 13:00 JSTarXiv cs.AIロボティクス

VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models

Vision-Language-Action (VLA) models have shown strong potential for general-purpose robotic manipulation, yet they still struggle to genera…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Brain-IT-VQA: From Brain Signals to Answers

Decoding visual content from fMRI signals recorded while a person views images, and specifically answering questions about the seen images,…

2026-05-29 13:00 JSTarXiv cs.AIエージェント

Training Deliberative Monitors for Black-Box Scheming Detection

As autonomous agents become more capable of performing real-world tasks, distinguishing scheming behavior from benign task pursuit may beco…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Learning Context-Conditioned Predicate Semantics via Prototype Feedback

In scene graph generation, a central challenge is modeling polysemous predicates whose meanings shift across contexts. Prior approaches add…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

DLM-SWAI: Steering Diffusion Language Models Before They Unmask

Steering language model generation toward desired textual properties is essential for practical deployment, and inference-time methods are…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings

Contrastive Language-Audio Pretraining (CLAP) models are widely used for audio understanding and support modality-agnostic condition swappi…

2026-05-29 13:00 JSTarXiv cs.AIエージェント研究/論文

Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory

End-to-end agent-memory benchmarks report a single hit@k per retriever, confounding lexical leakage (uncontrolled query/gold/distractor ent…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Predicting Causal Effects from Natural Language Queries using Structured Representations

Randomized controlled trials are a cornerstone of medicine and the social sciences as they enable reliable estimates of causal effects. How…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

The Sample Complexity of Multiclass and Sparse Contextual Bandits

We study contextual bandits in the stochastic i.i.d. setting, where a learner observes contexts drawn from an unknown distribution, select…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning

Vision-language models (VLMs) rely on long visual token sequences for visual understanding, making the prefill stage expensive in both comp…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content

Real-time safety filtering for large language model (LLM) applications requires classifiers that can detect unsafe prompts, toxic language,…

2026-05-29 13:00 JSTarXiv cs.AIビジネス/資金調達

EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL

Schema linking is a difficult and important step in large-scale Text-to-SQL, where systems must identify a compact yet sufficient schema co…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIハードウェア/半導体

From Prompts to Context: An Ontology-Driven Framework for Human-Generative AI Collaboration

Collaborations with Generative AI often begin with a short prompt and end with an opaque output, leaving implicit who was involved, what ta…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Personalized Turn-Level User Conversation Satisfaction Benchmark

User satisfaction with AI assistants is highly personalized: the same response may satisfy one user but disappoint another depending on wha…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Teaching Language Models to Check Grounded Claim Factuality with Human Test-Taking Strategies

Grounded claim factuality checking is important for large language model (LLM) applications such as retrieval-augmented generation, as it h…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

The Little Book of Generative AI Foundations: An Intuitive Mathematical Primer

This book provides a compact, derivation-oriented introduction to the mathematical foundations of modern generative artificial intelligence…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

Multi-Legal-Bench: Evaluating LLMs on Legal Reasoning Across Jurisdictions, Languages, and Legal Traditions

Legal NLP benchmarks overwhelmingly evaluate a single language or aggregate tasks that differ fundamentally across jurisdictions, making cr…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

A unified deep learning framework for contrast-phase-specific virtual monochromatic imaging

Dual-energy CT (DECT) enables virtual monochromatic imaging (VMI) and improved contrast resolution, but its clinical adoption is limited by…

2026-05-29 13:00 JSTarXiv cs.AIロボティクス

Energy-Aware NECO for Single-Pass Pixel-wise Out-of-Distribution Detection in Semantic Segmentation

Reliable semantic segmentation for mobile robots requires both accurate dense prediction and robust uncertainty estimation under distributi…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning

Reinforcement learning (RL) refines large language models (LLMs) by directly optimizing model behavior through reward signals. While accura…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Evolve as a Team: Collaborative Self-Evolution for LLM-based Multi-Agent Systems

LLM-based multi-agent systems (MAS) have emerged as an effective paradigm for complex and long-horizon tasks. However, in real-world tasks,…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Data filtering methods for training language models

Data quality is a critical factor in the effectiveness of machine learning models. Label errors, present even in widely used benchmarks, in…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Inferring Code Correctness from Specification

Large language models (LLMs) have become integral to modern software development, enabling automated code generation at scale. However, val…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Towards Localized and Disentangled Knowledge Editing for Multimodal Large Language Models

Existing methods in Multimodal Knowledge Editing (MKE) have advanced the ability to correct outdated or inaccurate knowledge in Multimodal…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

CB-SLICE: Concept-Based Interpretable Error Slice Discovery

Despite strong average-case performance, deep learning models often exhibit systematic errors on specific population groups, known as error…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

HARP: Hadamard-Preconditioned Adaptive Rotation Processor for Extreme LLM Quantization

Post-training quantization (PTQ) is essential for deploying LLMs under memory and bandwidth constraints. However, extreme low-bit quantizat…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

ESPO: Early-Stopping Proximal Policy Optimization

When a large language model under reinforcement learning commits a wrong reasoning step early in a trajectory, standard algorithms force it…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

Large Language Models (LLMs) have advanced autonomous agents from deep search, which retrieves concise factual answers, to deep research, w…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Mitigating Stethoscope-Induced Shortcuts in Respiratory Sound Classification under Federated Domain Generalization with Causality-Inspired Interventions

AI-driven respiratory sound classification (RSC) is promising for automated pulmonary disease detection, yet multi-site deployment is hinde…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Evolutionary Dynamics of Cooperation in Next-Generation LLM Agent Systems: A Cross-Provider Empirical Extension

Do next-generation LLM agents inherit the cooperative biases documented in their predecessors, or does scale and provider diversity reshape…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering

Large vision-language models (LVLMs) often hallucinate objects that are not present in the input image, largely because visual grounding we…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

CRITIC-R1: Learning Structured Critics for Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) improves knowledge-intensive question answering by incorporating external evidence. However, existing…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

Reinforcement learning (RL) post-training has been shown to improve reasoning in large language models (LLMs). However, there has been little ex…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIハードウェア/半導体研究/論文

Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate

Patient-voiced clinical-triage benchmarks report high under-triage rates for consumer LLMs for constrained multiple-choice output, yet the…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents

Consensus protocols form the backbone of distributed systems and blockchains, where implementation bugs can cause data corruption and finan…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Selection Hyper-heuristics Can Automatically Adjust the Learning Period to Optimally Solve Pseudo-Boolean Problems

The Random Gradient hyper-heuristic was recently shown to be able to learn the optimal neighbourhood size when optimizing the LeadingOnes b…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

Despite recent advances, LLM-based web agents still struggle with limited exploration, omission of critical steps, and sensitivity to task…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Label Over Logic? How Source Cues Bias Human Fallacy Judgments More Than LLMs

As AI-generated and AI-assisted content floods online spaces, source labels attached to such content can distort human reasoning judgments,…

2026-05-29 13:00 JSTarXiv cs.AIエージェント

CityGen: Structure-Guided City-Style Synthesis for Cross-City Autonomous Driving

Autonomous driving systems are commonly trained and evaluated within limited geographic regions, which hinders their scalability when deplo…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

Unified speech foundation models require a holistic tokenization space that is both learnable by language models and decodable into high-qu…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

Large language model (LLM) agents increasingly leverage long-term memory to support persistent and autonomous task execution. However, this…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots

Honeypots are decoy systems mimicking real system components designed to defend against cyber attacks. Recently, LLMs increasingly serve as…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Evaluating Skill and Stability of ArchesWeather and ArchesWeatherGen under Multi-Decadal Climate Simulations

We evaluate the climate simulation capabilities of ArchesWeather and ArchesWeatherGen, two machine learning models originally trained for w…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Genetically Aligned Patient Representations Improve Hematological Diagnosis

Multimodal alignment of histopathology encoders with transcriptomic and genomic data has been shown to significantly improve performance in…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-syn…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies

Recent work has begun to equip vision-language-action (VLA) policies with explicit intermediate reasoning. In embodied control, however, te…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Test Time Training for Supervised Causal Learning

Supervised Causal Learning (SCL) has shown promise in causal discovery by framing it as a supervised learning problem. However, it suffers…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders

Positional encoding (PE) underpins how permutation-invariant Transformers represent sequence order, yet how positional information is proce…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

Large Audio Language Models (LALMs) expand jailbreak risks from token-level prompting to the full speech perception-to-reasoning pipeline,…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Alignment-Guided Score Matching for Text-to-Image Alignment in Diffusion Models

Diffusion models generate highly realistic images but often struggle with precise text-image alignment. While recent post-training methods…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage

Per-token billing is now the standard pricing model for commercial large language models (LLMs), so the honesty of reported token counts di…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Masked Diffusion Modeling for Anomaly Detection

Anomaly detection aims to identify samples that deviate from the nominal data distribution and is central to many safety-critical applicati…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

REPOT: Recoverable Program-of-Thought via Checkpoint Repair

One-shot Program-of-Thought (PoT) emits a Python program that prints a primitive-action plan; a single invalid action silently invalidates…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Projectional Decoding: Towards Semantic-Aware LLM Generation

Large language models (LLMs) are increasingly used to generate software artifacts across many software engineering (SE) tasks, yet ensuring…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

A Predictive Law for On-Policy Self-Distillation From World Feedback

Moving beyond simple scalar rewards toward richer world feedback is a natural path to more scalable RL post-training. On-policy self-distil…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Large language models (LLMs) can autonomously conduct multi-stage cyber attacks, but the consistency of their offensive behavior under repe…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offer…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

xModel-KD: Cross-modal Knowledge Distillation for 3D Scene Perception using LiDAR

Point cloud segmentation is a fundamental task in 3D scene understanding. Its progress is constrained by the high cost and time required fo…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Evolving Features vs Evolving Entire Trees with GP for Interpretable Survival Analysis

Survival analysis concerns the task of predicting the time until an event occurs. Often used in the medical field, survival analysis deals…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grain…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Beyond MSE: Improving Precipitation Nowcasting with Multi-Quantile Regression

Deep-learning precipitation nowcasting models are often optimized using pointwise losses such as mean squared error or mean absolute error,…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding

Large Vision-Language Models (LVLMs) map visual inputs into dense token sequences, imposing a quadratic computational bottleneck for infere…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

DAMEL: Dual-Axis Multi-Expert Learning for Class-Imbalanced Learning

Various algorithms have been proposed to address the challenges posed by class-imbalanced learning from real-world data with long-tailed di…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies

Evolution Strategies (ES) has recently emerged as a competitive alternative to reinforcement learning (RL) for large language model (LLM) f…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?

Proactive agents read user activity as text and call an LLM on every event to decide whether to act. But user activity is not natively text…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Neural Network Verification using Partial Multi-Neuron Relaxation

The increasing integration of deep neural networks in critical systems has spawned a theoretical and practical interest in formally guarant…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

On Distributional Reinforcement Learning in Chaotic Dynamical Systems

Chaotic dynamical systems pose a fundamental challenge for Reinforcement Learning (RL): exponential sensitivity to initial conditions induc…

2026-05-29 13:00 JSTarXiv cs.AIエージェント

Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

As autonomous language model agents proliferate, forming an emerging agentic web with real-world consequences, what credibility signals can…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

iLoRA: Bayesian Low-Rank Adaptation with Latent Interaction Graphs for Microbiome Diagnosis

Parameter-efficient adaptation has made LLMs practical for domain prediction, but standard LoRA still relies on a static low-rank update an…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

CalArena: A Large-Scale Post-Hoc Calibration Benchmark

Reliable probability estimates are critical in many machine learning applications, yet modern classifiers are often poorly calibrated. Post…

2026-05-29 13:00 JSTarXiv cs.AILLM/生成AI

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

We show that LoRA adapters, the dominant distribution format for fine-tuned LLMs, can be reliably backdoored through training data poisonin…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

What drives performance in molecular MPNNs? An operator-level factorial benchmark

Message-passing neural networks (MPNNs) are widely used for molecular property prediction, but their deployment as monolithic architectures…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

HPO: Hysteretic Policy Optimization for Stable and Efficient Training under Sparse-Reward Regime

We investigate a narrow but common failure mode of GRPO-style reinforcement learning in the context of sparse verifiable rewards: early upd…

2026-05-29 13:00 JSTarXiv cs.AI研究/論文

Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency

AI-assisted coding tools have altered software production. At Meta, significant lines of code per human-landed diff grew by 105.9% year ove…

2026-05-29 13:00 JST | arXiv cs.AI | Robotics

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models

Vision-Language-Action (VLA) models have emerged as a promising paradigm for grounding visual-language understanding into real-world roboti…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization

While Multi-Agent Systems (MAS) empower Large Language Models to tackle complex reasoning tasks through collaborative interaction, optimizi…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

Vision-Language Models (VLMs) often struggle with robust 3D spatial reasoning. Prevailing methods that rely on fine-tuning with 3D visual q…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Do Language Models Track Entities Across State Changes?

Entity tracking (ET), the ability to keep track of states, is a fundamental skill that underlies complex reasoning. An increasing amount of…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Reinforcement Learning with Robust Rubric Rewards

While Reinforcement Learning with Verifiable Rewards (RLVR) is effective for deterministically checkable tasks, many vision-language tasks…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models

Large language models (LLMs) often solve a task when all instructions are given in a single prompt, but fail when the same information is r…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions

We address the task of generating physically accurate and visually faithful 4D Human-Object Interaction (HOI). Given a static 3D human and…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

LLUMI: Improving LLM Writing Assistance for Mental Health Support with Online Community Feedback

Large language models (LLMs) show promise in generating supportive responses for mental health queries, but improving their usefulness, emp…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection

Document-level translation remains one of the most challenging tasks for large language models, which are constrained by limited context wi…

2026-05-29 13:00 JST | arXiv cs.AI | Robotics

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fra…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Self-Trained Verification for Training- and Test-Time Self-Improvement

Self-improvement at scale has been a longstanding goal for reasoning models, and there are two natural places to do it: at test time, throu…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding, Research/Papers

MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings

Large language models (LLMs) show promise for clinical reasoning and decision support, but evaluation in realistic, electronic health recor…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images

City-scale 3D surface reconstruction from multiview images for downstream 3D simulation poses highly challenging problems due to the scale…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Archon: A Unified Multimodal Model for Holistic Digital Human Generation

Digital humans are fundamental to immersive interaction, yet creating a unified model for holistic modalities, including text, audio, motio…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Before the Shutter: Aesthetic and Actionable Portrait Photography Planning in 3D Scenes

Portrait photography is largely decided before the shutter opens: the subject's pose, the camera configuration, and the lighting devices mu…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion

A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Gram: Assessing sabotage propensities via automated alignment auditing

We introduce Gram, an automated alignment auditing framework to assess the propensity of AI agents to engage in sabotage. We evaluate Gemin…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

In-Context Reward Adaptation for Robust Preference Modeling

Reinforcement Learning from Human Feedback (RLHF) typically relies on static reward models to align Large Language Models with human prefer…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

On Language Generation in the Limit with Bounded Memory

We study language generation in the limit under bounded memory. In this task, a learner observes examples from an unknown target language o…

2026-05-29 13:00 JST | arXiv cs.AI | Robotics

RoboWits: Unexpected Challenges for Robotic Creative Problem Solving

The ability to reason, adapt, and creatively solve problems under unexpected challenges is essential for robots operating in real-world env…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Reasoning with Sampling: Cutting at Decision Points

Frontier reasoning models are produced by posttraining base language models with reinforcement learning. Recent work has challenged this by…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

GPIC: A Giant Permissive Image Corpus for Visual Generation

Studying scalable methods for visual generative modeling requires large, accessible, and stable datasets. We introduce GPIC, a Giant Permis…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Unlocking the Working Memory of Large Language Models for Latent Reasoning

To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens bef…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

LLMSurgeon: Diagnosing Data Mixture of Large Language Models

The pretraining data mixture of Large Language Models (LLMs) constitutes their "digital DNA", shaping model behaviors, capabilities, and fa…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

Long-rollout causal video diffusion has converged on a fixed-size sliding-window KV cache, with recent progress innovating within this layo…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

PersonaAgent: Bridging Memory and Action for Personalized LLM Agents

Large Language Model (LLM) empowered agents have recently emerged as advanced paradigms that exhibit impressive capabilities in a wide rang…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

PuzzleClone: A DSL-Powered Framework for Synthesizing Verifiable Data

High-quality mathematical and logical datasets with verifiable answers are essential for strengthening the reasoning capabilities of large…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Estimating the Empowerment of Language Model Agents

As language model (LM) agents become increasingly capable and adopted in real-world applications, there is a growing need for scalable eval…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents

Search agents connect LLMs to the Internet, enabling them to access broader and more up-to-date information. However, this also introduces…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance

Large language models (LLMs) have recently advanced in reasoning when optimized with reinforcement learning (RL) under verifiable rewards.…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Controlling the Risk of Corrupted Contexts for Language Models via Early-Exiting

Large language models (LLMs) can be influenced by harmful or irrelevant context, which can significantly harm model performance on downstre…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis

Modern enterprises generate vast streams of time series metrics when monitoring complex systems, known as observability data. Unlike conven…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

CodeEvolve: an open source evolutionary coding agent for algorithmic discovery and optimization

We introduce CodeEvolve, an open-source framework that couples large language models with island-based evolutionary search for end-to-end a…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Large-Scale AI and Foundation Models for Neuroscience: A Comprehensive Review

The development of large-scale artificial intelligence (AI) models is influencing neuroscience research by enabling end-to-end learning fro…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Modeling Hierarchical Thinking in Large Reasoning Models

Large Reasoning Models (LRMs) solve complex tasks by generating long Chain-of-Thought (CoT) sequences; however, the emergent dynamics gover…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Graph-Enhanced Policy Optimization in LLM Agent Training

Multi-step LLM agents in interactive environments represent a crucial step toward long-horizon decision-making. To train such agents, group…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models

The evolution of mathematics is shaped importantly by interestingness: researchers choose which problems to pursue, and students choose whi…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents

Data analysis has become an indispensable part of scientific research. To discover the latent knowledge and insights hidden within massive…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Large Language Model (LLM) agents are increasingly deployed in environments that generate massive, dynamic contexts. However, a critical bo…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

LsrIF: Enhancing Logic-Structured Instruction Following of Large Language Models

Instruction following is critical for large language models, yet real-world instructions often involve multiple constraints with logical st…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech

Social media platforms are increasingly dominated by long-form multimodal content, where harmful narratives are constructed through a compl…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning

Current LLM post-training methods optimize complete reasoning trajectories through Supervised Fine-Tuning (SFT) followed by outcome-based R…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Reasoning and Tool-use Compete in Agentic RL: From Quantifying Interference to Disentangled Tuning

Agentic Reinforcement Learning (ARL) trains large language models to interleave reasoning with external tool execution to solve complex tas…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents

LLM-driven agents excel at sequential decision-making but often rely on on-the-fly reasoning, re-deriving solutions even in recurring scena…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

AutoSizer: Automatic Sizing of Analog and Mixed-Signal Circuits via Large Language Model (LLM) Agents

The design of Analog and Mixed-Signal (AMS) integrated circuits remains heavily reliant on expert knowledge, with transistor sizing a major…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Reasoning about Reasoning: BAPO Bounds on Chain-of-Thought Token Complexity in LLMs

Inference-time scaling via chain-of-thought (CoT) reasoning is a major driver of state-of-the-art LLM performance, but it comes with substa…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Small Agent Group is the Future of Digital Health

The rapid adoption of large language models (LLMs) in digital health has been driven by a "scaling-first" philosophy, i.e., the assumption…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure

Latent or continuous chain-of-thought methods replace explicit textual rationales with a number of internal latent steps, but these interme…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Causal-JEPA: Learning World Models through Object-Level Latent Masking

World models require robust relational understanding to support prediction, reasoning, and control. While object-centric representations pr…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Benchmarking at the Edge of Comprehension

As frontier Large Language Models (LLMs) increasingly saturate new benchmarks shortly after they are published, benchmarking itself is at a…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Recurrent Structural Policy Gradient for Partially Observable Mean Field Games

Mean Field Games (MFGs) provide a principled framework for modelling interactions in large population systems. However, algorithmic progres…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning

While Multi-Agent Systems (MAS) excel in complex reasoning, they suffer from the cascading impact of erroneous information from individual…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational Databases

In recent advances, to enable a fully data-driven learning paradigm on relational databases (RDB), relational deep learning (RDL) is propos…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models

Reinforcement learning (RL) shows promise for enhancing LLM agentic reasoning, yet sparse terminal rewards hinder fine-grained optimization…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

FormalEvolve: Neuro-Symbolic Evolutionary Search for Diverse Autoformalization

Autoformalization aims to produce formal statements that compile and faithfully preserve the intended meaning of informal mathematics. Yet…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

When Models Learn to Ask Why: Adaptive Causal Reasoning for Trustworthy Medical Vision-Language Models

Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet,…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MemCollab: Cross-Model Memory Collaboration via Contrastive Trajectory Distillation

LLM agents increasingly rely on memory mechanisms to reuse knowledge from past problem-solving experiences. However, existing methods typic…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems

Combining multiple Vision-Language Models (VLMs) can enhance multimodal reasoning and robustness, but aggregating heterogeneous models' out…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MediHive: A Decentralized Agent Collective for Medical Reasoning

Large language models (LLMs) have revolutionized medical reasoning tasks, yet single-agent systems often falter on complex, interdisciplina…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models

Multimodal Large Reasoning Models (MLRMs) have achieved remarkable strides in visual reasoning through test time compute scaling, yet long…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

SVSR: A Self-Verification and Self-Rectification Paradigm for Multimodal Reasoning

Current multimodal models often suffer from shallow reasoning, leading to errors caused by incomplete or inconsistent thought processes. To…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation

Large language models (LLMs) are increasingly used for causal and counterfactual reasoning, yet their reliability in real-world policy eval…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Guardrails Beat Guidance: A Large-Scale Study of Rules, Skills, and Persistent Configuration for Coding Agents

Random rules improve a coding agent's task performance as much as expert-curated ones (both +13.8 pp on a discriminative subset of SWE-ben…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

MemoSight: Unifying Context Compression and Multi Token Prediction for Reasoning Acceleration

While chain-of-thought (CoT) reasoning enables LLMs to solve challenging reasoning tasks, the linear growth of the KV cache leads to substa…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Human-Guided Harm Recovery for Computer Use Agents

As LM agents gain the ability to execute actions on real computer systems, we need ways to not only prevent harmful actions at scale but al…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling

Large Language Models (LLMs) now exhibit remarkable reasoning capabilities through test-time compute scaling (TTS), with impressive perform…

2026-05-29 13:00 JST | arXiv cs.AI | Agents, Business/Funding

SciHorizon-DataEVA: An Agentic System for AI-Readiness Evaluation of Heterogeneous Scientific Data

AI-for-Science (AI4Science) is increasingly transforming scientific discovery by embedding machine learning models into prediction, simulat…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

A Foundation Model for Zero-Shot Logical Rule Induction

Inductive Logic Programming (ILP) learns interpretable logical rules from data. Existing methods are transductive: their learned parameters…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Hierarchical Task Network Planning with LLM-Generated Heuristics

HTN planning is a variation of classical planning where, instead of searching for a linear sequence of actions, an algorithm decomposes hig…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

NOVA: Fundamental Limits of Knowledge Discovery Through AI

Can AI systems discover genuinely new knowledge through iterative self improvement, and if so, at what cost? We introduce the NOVA framewor…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

AttuneBench: A Conversation-Based Benchmark for LLM Emotional Intelligence

Emotional intelligence (EI), the ability to perceive, understand, and respond appropriately to others' emotional states, is central to huma…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing

Document parsing converts visually rich documents into machine-readable structured representations, forming a crucial foundation for inform…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Learning A Simulation-based Visual Policy for Real-world Peg In Unseen Holes

This paper proposes a learning-based visual peg-in-hole that enables training with several shapes in simulation, and adapting to arbitrary…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

MATNet: Multi-Level Fusion Transformer-Based Model for Day-Ahead PV Generation Forecasting

Accurate forecasting of renewable generation is crucial to facilitate the integration of Renewable Energy Sources into the power system. Fo…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

A Survey on Recent Advances in Conversational Data Generation

Recent advancements in conversational systems have significantly enhanced human-machine interactions across various domains. However, train…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Crafting Desirable Climate Trajectories with RL Explored Socio-Environmental Simulations

Climate change poses an existential threat, necessitating effective climate policies to enact impactful change. Decisions in this domain ar…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Are LLMs Socially Adaptive? Contrasting Belief Evolution in Large Language Models and Humans

As large language models (LLMs) increasingly engage in complex social interactions, ensuring that their behaviors align with human ethical…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models

Large Language Models (LLMs) have transformed artificial intelligence by advancing natural language understanding and generation, enabling…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Dataset-Driven Channel Masks in Transformers for Multivariate Time Series

Recent advancements in foundation models have been successfully extended to the time series (TS) domain, facilitated by the emergence of la…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems

Personalized learning represents a promising educational strategy within intelligent educational systems, aiming to enhance learners' pract…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

A Composable Multimodal Framework for cine CMR-Text-Driven Prediction of Heart Failure Outcomes

Objective. Heart failure is one of the leading causes of death worldwide, with millions of deaths each year, according to data from the Wor…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data

Passive acoustic monitoring (PAM) systems generate continuous recordings spanning months, yet automated bioacoustic analysis of whale calls…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio

Monaural multi-speaker automatic speech recognition (ASR) remains challenging due to data scarcity and the intrinsic difficulty of recogniz…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives

State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based servic…

2026-05-29 13:00 JST | arXiv cs.AI | Image/Video Generation

EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance

Recent approaches for video generation with camera control often create anchor videos (i.e., rendered videos that approximate desired camer…

2026-05-29 13:00 JST | arXiv cs.AI | Image/Video Generation

VRAG: Learning World Models for Interactive Video Generation

Foundational world models must be both interactive and preserve spatiotemporal coherence for effective future planning with action choices.…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Online Fair Division with Additional Information

We study the problem of fairly allocating indivisible goods to agents in an online setting, where goods arrive sequentially and must be all…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Position: Text Embeddings Should Capture Implicit Semantics, Not Just Surface Meaning

This position paper argues that text embedding research should move beyond surface meaning and embrace implicit semantics as a central mode…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Model Fusion via Retrofitting

Model fusion seeks to combine independently trained neural networks into a single model without retraining, but is complicated by represent…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Taming Data Challenges in ML-based Security Tasks Using Generative AI

Machine learning-based supervised classifiers are widely used for security tasks, and their improvement has been largely focused on algorit…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models

Recent text-to-image models produce high-quality results but still struggle with precise visual control, balancing multimodal inputs, and r…

2026-05-29 13:00 JST | arXiv cs.AI | Image/Video Generation

Finding DoRI: Discovery of Retained Images in Diffusion Models

Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intelle…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Approximate Proportionality in Online Fair Division

We study the online fair division problem, where indivisible goods arrive sequentially and must be allocated immediately and irrevocably. P…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

GroundAct: Can LLM Agents Ground Actions in Environmental States?

LLM agents achieve 85-96% success on tasks where instructions fully specify the action, but drop to 29-53% when action feasibility depends…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Scalable RF Simulation in Generative 4D Worlds

Radio Frequency (RF) sensing has emerged as a powerful, privacy-preserving alternative to vision-based methods for various perception tasks…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Less Is More: Elevating RAG via Performance-Driven Context Compression

Retrieval-Augmented Generation (RAG) has emerged as a promising paradigm for improving the timeliness of knowledge updates and the factual…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

GRPO is Secretly a Process Reward Model

Process reward models (PRMs) allow for fine-grained credit assignment in reinforcement learning (RL), and seemingly contrast with outcome r…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Towards Foundation Models for Zero-Shot Time Series Anomaly Detection: Leveraging Synthetic Data and Relative Context Discrepancy

Time series anomaly detection (TSAD) is a critical task, but developing models that generalize to unseen data in a zero-shot manner remains…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Benchmarking LLM-Assisted Blue Teaming via Standardized Threat Hunting

As cyber threats continue to grow in scale and sophistication, blue team defenders increasingly require advanced tools to proactively detec…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

AtomWorld: A Benchmark for Evaluating Spatial Reasoning in Large Language Models on Crystalline Materials

Large language models (LLMs) have shown promising potential in scientific research, enabling tasks ranging from knowledge retrieval to prop…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

The Impact of Semantic Pairs on Self-Supervised Representation Learning

Instance discrimination learns visual representations by treating different augmented views of the same image as positive pairs. While this…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Obfuscation Rules for Detecting and Detoxifying Korean Toxicity

As language models become increasingly deployed in online environments, toxicity detection and detoxification have received growing attenti…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Offline Reinforcement Learning with Generative Trajectory Policies

Generative models have emerged as a powerful class of policies for offline reinforcement learning (RL) due to their ability to capture comp…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?

The recent development of foundation models for time series data has generated considerable interest in using such models across a variety…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Empathic Prompting: Non-Verbal Context Integration for Multimodal LLM Conversations

We present Empathic Prompting, a novel framework for multimodal human-AI interaction that enriches Large Language Model (LLM) conversations…

2026-05-29 13:00 JST | arXiv cs.AI | Image/Video Generation, Research/Papers

LoCoT2V-Bench: Benchmarking Long-Form and Complex Text-to-Video Generation

Recent advances in text-to-video generation have achieved impressive performance on short clips, yet evaluating long-form generation under…

2026-05-29 13:00 JST | arXiv cs.AI | Robotics, Hardware/Semiconductors

ScheduleStream: Temporal Planning with Samplers for GPU-Accelerated Multi-Arm Task and Motion Planning & Scheduling

Bimanual and humanoid robots are appealing because of their human-like ability to leverage multiple arms to efficiently complete tasks. How…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

An accuracy-aware extension to LRP-based pruning for CNNs to prevent cascading accuracy degradation in data-scarce transfer learning

Convolutional Neural Networks (CNNs) pre-trained on large-scale datasets such as ImageNet are widely used as feature extractors to construc…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Enhancing Reinforcement Learning in 3D Environments through Semantic Segmentation: A Case Study in ViZDoom

Reinforcement learning (RL) in 3D environments with high-dimensional sensory input poses two major challenges: (1) the high memory consumpt…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

MiAD: Mirage Atom Diffusion for De Novo Crystal Generation

In recent years, diffusion-based models have demonstrated exceptional performance in searching for simultaneously stable, unique, and novel…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach

Recent fine-tuning techniques for diffusion models enable them to reproduce specific image sets, such as particular faces or artistic style…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

BioArc: Discovering Optimal Neural Architectures for Biological Foundation Models

Foundation models have revolutionized various fields such as natural language processing (NLP) and computer vision (CV). While efforts have…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Topological Order in Neural Wavefunctions

Topologically ordered states are among the most interesting quantum phases of matter that host emergent quasi-particles having fractional c…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing

Agentic AI systems execute a sequence of actions, such as reasoning steps or tool calls, in response to a user prompt. To evaluate the succ…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation

Conventional Sequential Recommender Systems (SRS) typically assign unique hash IDs (HID) to construct item embeddings, which mainly capture…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

A Review of Learning-Based Motion Planning: Toward a Data-Driven Optimal Control Approach

Motion planning for autonomous driving (AD) faces a critical trade-off. While traditional rule-based pipelines offer verifiable safety and…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Revisiting the Reliability of Language Models in Instruction-Following

Advanced LLMs have achieved near-ceiling instruction-following accuracy on benchmarks such as IFEval. However, these impressive scores do n…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

HD-Prot: A Protein Language Model for Joint Sequence-Structure Modeling with Continuous Structure Tokens

Proteins inherently possess a consistent sequence-structure duality. The abundance of protein sequence data, which can be readily represent…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models

Qualitative data are widespread in domains such as healthcare, marketing, and bioinformatics, where clustering offers a fundamental tool fo…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

From Rubrics to Reliable Scores: Evidence-Grounded Text Evaluation with LLM Judges

Rubric-based text evaluation increasingly uses large language models (LLMs) as scalable judges, but aligning frozen black-box models with h…

2026-05-29 13:00 JST | arXiv cs.AI | Hardware/Semiconductors

Steering Language Models Before They Speak: Logit-Level Interventions

Controllable generation requires language models to realize output characteristics such as reading level, politeness, and toxicity. Existin…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

CORE-T: COherent REtrieval of Tables for Text-to-SQL

Realistic text-to-SQL workflows often require joining multiple tables. As a result, accurately retrieving the relevant set of tables become…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models

Post-training pretrained autoregressive models (ARMs) into masked diffusion models (MDMs) has emerged as a cost-effective way to overcome t…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Grammar-Aware Literate Generative Mathematical Programming with Compiler-in-the-Loop

Mathematical programming is widely employed across various sectors - such as logistics, energy, and workforce planning - to model and solve…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers

Reasoning-oriented Large Language Models (LLMs) have achieved remarkable progress with Chain-of-Thought (CoT) prompting, yet they remain fu…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Pushing the Limits of Block Rotations in Post-Training Quantization

Recent post-training quantization (PTQ) methods have adopted block rotations to diffuse outliers prior to rounding. While this reduces the…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Learn from A Rationalist: Distilling Intermediate Interpretable Rationales

Because of the pervasive use of deep neural networks (DNNs), especially in high-stakes domains, the interpretability of DNNs has received i…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Post-training of reasoning LLMs is a holistic process that typically consists of an offline SFT stage followed by an online reinforcement l…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Scaling Small Agents Through Strategy Auctions

Small language models are increasingly viewed as a promising, cost-effective approach to agentic AI, with proponents claiming they are suff…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Less is Enough: Synthesizing Diverse Data in LLM Feature Space with Sparse Autoencoders

The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approac…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

S-MARC: Causal Streaming Reasoning for Full-Duplex Conversational Behavior Modeling

Human conversation is organized by an implicit chain of thought and manifests as temporally structured conversational behaviors. Capturing…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

A Language-Guided Bayesian Optimization for Efficient LoRA Hyperparameter Search

Fine-tuning Large Language Models (LLMs) with Low-Rank Adaptation (LoRA) offers a resource-efficient way to personalize or specialize. Howe…

2026-05-29 13:00 JST | arXiv cs.AI | Image/Video Generation

OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model

Existing mainstream video customization methods focus on generating identity-consistent videos based on given reference images and textual…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR

Reward-maximizing RL methods have been shown to be capable of enhancing the reasoning performance of LLMs, but often lead to reduced generation…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Who can we trust? LLM-as-a-jury for Comparative Assessment

Large language models (LLMs) are increasingly applied as automatic evaluators for natural language generation assessment often using pairwi…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Rooted Absorbed Prefix Trajectory Balance with Submodular Replay for GFlowNet Training

Generative Flow Networks (GFlowNets) enable fine-tuning large language models to approximate reward-proportional posteriors, but they remai…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

AG-REPA: Causal Layer Selection for Representation Alignment in Audio Flow Matching

REPresentation Alignment (REPA) improves the training of generative flow models by aligning intermediate hidden states with pretrained teac…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Relational In-Context Learning via Synthetic Pre-training with Structural Prior

Relational Databases (RDBs) are the backbone of modern business, yet they lack foundation models comparable to those in text or vision. A k…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

MOO: A Multi-view Oriented Observations Dataset for Viewpoint Analysis in Cattle Re-Identification

Animal re-identification (ReID) faces critical challenges due to viewpoint variations, particularly in Aerial-Ground (AG-ReID) settings whe…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Post-Training Language Models for Crosslingual Consistency

Language models often respond inconsistently to translation-equivalent prompts across languages, undermining the reliability of multilingua…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought

We provide evidence of performative chain-of-thought (CoT) in reasoning models, where a model becomes strongly confident in its final answe…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover

Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior. Empirically, we find that adversarial p…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Steering at the Source: Style Modulation Heads for Robust Persona Control

Activation steering offers a computationally efficient mechanism for controlling Large Language Models (LLMs) without fine-tuning. While ef…

2026-05-29 13:00 JST | arXiv cs.AI | Business/Funding

P$^2$RAG: Efficient Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval

Retrieval-Augmented Generation (RAG) enables large language models to use external knowledge, but outsourcing the RAG service raises privac…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Robotics

When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making

Embodied robotic systems increasingly rely on large language model (LLM)-based agents to support high-level reasoning, planning, and decisi…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Maximizing Mutual Information Between Prompt and Response Improves LLM Performance With No Additional Data

While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains heavily rely on human-l…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

AuthorMix: Modular Authorship Style Transfer via Layer-wise Adapter Mixing

The task of authorship style transfer involves rewriting text in the style of a target author while preserving the meaning of the original…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

The Price Reversal Phenomenon: When Cheaper Reasoning Models Cost More

Developers and consumers increasingly choose reasoning models (RMs) based on their listed API prices. However, how accurately do these pric…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Bridge-RAG: An Abstract Bridge Tree Based Retrieval Augmented Generation Algorithm

As an important paradigm for enhancing the generation quality of Large Language Models (LLMs), retrieval-augmented generation (RAG) faces t…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Multi-Level Barriers to Generative AI Adoption Across Disciplines and Professional Roles in Higher Education

Generative Artificial Intelligence (GenAI) is rapidly reshaping higher education, yet barriers to its adoption across different disciplines…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

EvA: An Evidence-First Audio Understanding Paradigm for LALMs

Large Audio Language Models (LALMs) still struggle in complex acoustic scenes because they often fail to preserve task-relevant acoustic ev…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

SelfGrader: LLM Jailbreak Detection via Anchored Token-Level Logits

Large Language Models (LLMs) are powerful tools for answering user queries, yet they remain highly vulnerable to jailbreak attacks. Existin…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Combating Data Laundering in LLM Training

Data rights owners can detect unauthorized data use in large language model (LLM) training by querying with proprietary samples. Often, sup…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

The Planetary Cost of AI Acceleration, Part II: The 10th Planetary Boundary and the 6.5-Year Countdown

The recent, super-exponential scaling of autonomous Large Language Model (LLM) agents signals a broader, fundamental paradigm shift from ma…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

Skill-based agent systems tackle complex tasks by composing reusable skills, improving modularity and scalability while introducing a large…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic s…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation

Rotation-based Post-Training Quantization (PTQ) has emerged as a promising solution for mitigating activation outliers in the quantization…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

Intent-aligned Autonomous Spacecraft Guidance via Reasoning Models

Future spacecraft operations require autonomy that can interpret high-level mission intent while preserving safety. However, existing traje…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps

Tokenizing music to fit the general framework of language models is a compelling challenge, especially considering the diverse symbolic str…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

We introduce DialToM, an annotated Theory of Mind (ToM) benchmark built from naturalistic human-human dialogues using a multiple-choice eva…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Causal Disentanglement-Inspired Degradation Representation Learning for Full-Reference Image Quality Assessment

Existing deep network-based full-reference image quality assessment (FR-IQA) models typically work by performing pairwise comparisons of de…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Architecture-Induced Recoverability Bias in Differentiable Symbolic Regression

Symbolic regression aims to recover closed-form expressions from numerical data, but in differentiable symbolic regression the recovered ex…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Explainable AI in Speaker Recognition -- Making Latent Representations Understandable

Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls with…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks

In the LLM era, many symbolic and structured problems are presented to models through 1D text serialization. Yet some such problems are nat…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio

Medical audio data is difficult to collect due to privacy regulations and high annotation costs arising from domain expertise. Thus, existi…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning

On-policy distillation (OPD) leverages dense teacher rewards to enhance reasoning models. However, scaling OPD to long-horizon tasks expose…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs

Personal AI assistants are beginning to act as delegates with access to calendars, inboxes, and user preferences. Calendar scheduling makes…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating

In this paper, we propose Concentrate and Concentrate (CaC), a coarse-to-fine anomaly reward model based on Vision-Language Models. During…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

Approximate Bayesian inference typically revolves around computing the posterior parameter distribution. In practice, however, the main obj…

2026-05-29 13:00 JST | arXiv cs.AI | Agents, Business/Funding

AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

Evaluation of software engineering (SWE) agents is dominated by a binary signal: whether the final patch pass…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Teacher-Guided Policy Optimization for On-Policy Reasoning Distillation under Large Policy Divergence

On-policy distillation (OPD) has become a promising paradigm for reasoning-oriented post-training of large language models (LLMs), especial…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

While many-shot ICL achieves remarkable performance, prior studies of its scaling behavior have mainly focused on non-reasoning tasks. In t…

2026-05-29 13:00 JST | arXiv cs.AI | Robotics

AttenA+: Rectifying Action Inequality in Robotic Foundation Models

Existing robotic foundation models, while powerful, are predicated on an implicit assumption of temporal homogeneity: treating all actions…

2026-05-29 13:00 JST | arXiv cs.AI | Agents

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

Voice agents, artificial intelligence systems that conduct spoken conversations to complete tasks, are increasingly deployed across enterpr…

2026-05-29 13:00 JST | arXiv cs.AI | Agents, Hardware/Semiconductors

ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows

While interpretable prototype networks offer compelling case-based reasoning for clinical diagnostics, their raw continuous outputs lack th…

2026-05-29 13:00 JST | arXiv cs.AI | Research/Papers

Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning

Geometric problem solving, as a typical multimodal reasoning problem, has attracted much attention and made great progress recently, howeve…

2026-05-29 13:00 JST | arXiv cs.AI | Business/Funding, Research/Papers

JMed48k: A Multi-Profession Japanese Medical Licensing Benchmark for Vision-Language Model Evaluation

We introduce JMed48k, a multi-profession Japanese healthcare licensing benchmark for evaluating vision-language models. Built from official…

2026-05-29 13:00 JST | arXiv cs.AI | Hardware/Semiconductors

The Distillation Game: Adaptive Attacks & Efficient Defenses

Distillation attacks create a deployment trade-off for model providers: the same outputs that make a model more useful can also make it eas…

2026-05-29 13:00 JST | arXiv cs.AI | LLM/Generative AI

Reducing Political Manipulation with Consistency Training

Large language models (LLMs) exhibit systematic political bias across a variety of sensitive contexts. We find that LLMs handle counterpart…

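2026-05-29 11:30 JST | ITmedia AI+ | Other

An end to cloud dependence and cost concerns? Microsoft's local AI platform "Foundry Local"

Microsoft has made "Foundry Local" generally available, a local AI runtime platform that lets developers build AI features into their applications. Because AI processing completes on the user's device, the company says it enables AI implementations free of cloud dependence, network latency, and token-based charges.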

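2026-05-29 10:00 JST | ITmedia AI+ | Other

So how long will the memory shortage last? Behind the frenzy that refuses to end

The memory shortage drags on. When will it be possible to buy memory cheaply again...?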

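2026-05-29 09:55 JST | ITmedia AI+ | LLM/Generative AI

"Mythos-class model" to reach general availability within weeks as US-based Anthropic releases "Opus 4.8"

The company reportedly expects to offer it to all customers within a few weeks, after putting stronger safeguards in place.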

2026-05-29 09:44 JST | TechCrunch AI | Other

Glean’s top line crosses $300M as AI budget-cutting becomes its major selling point

The enterprise AI search startup tripled its annual revenue even as tech giants entered the category.

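2026-05-29 09:00 JST | ITmedia AI+ | Other

Streamlining wind load evaluation with AI-CAE: Obayashi validates a RICOS solution

RICOS announced that Obayashi has begun validating an AI-CAE solution for predicting wind loads on buildings. The AI makes predictions based on conditions such as wind direction and building shape, with the aim of making the design process more efficient.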

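2026-05-29 08:00 JST | ITmedia AI+ | LLM/Generative AI

Fujitsu partners with OpenAI and Anthropic in quick succession: what is the aim of teaming up with AI vendors?

Fujitsu announced partnerships with OpenAI and with Anthropic on the same day. What is the company, which has its own AI technology, aiming for by partnering with AI vendors?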

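2026-05-29 08:00 JST | ITmedia AI+ | Other

Resolving the "I don't get it" and "preparation is a hassle" of data analysis: a reporter tries Sony's tool built for beginners [Report]

Many business professionals want to analyze data to improve the results of their own work but do not know where to start. Sony Network Communications has focused on this beginner-level need. When the author, a newcomer to data analysis work, tried it out...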

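2026-05-29 07:30 JST | ITmedia AI+ | Business/Funding

"Japan is a manufacturing powerhouse": why IFS is rapidly expanding its industrial AI investment

IFS Japan held a press conference and explained its policy of continuing to invest in the Japanese market and strengthening partnerships. Through strategic collaboration with IBM Japan and others, it will support AI implementation and digital transformation (DX) in asset-intensive industries such as manufacturing.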

2026-05-29 06:24 JST | TechCrunch AI | Agents

The internet is being rebuilt for machines

As AI agents move from experiments to production, AWS, Cloudflare, and others are redesigning cloud infrastructure for a future dominated b…

2026-05-29 05:06 JST | TechCrunch AI | Agents

Asana acquires no-code agent-builder StackAI

Asana will incorporate StackAI into its growing suite of AI workflow tools.

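2026-05-29 05:00 JST | ITmedia AI+ | LLM/Generative AI

"What I felt while playing with Google Antigravity 2.0" and "Building 'ロケスマペディア' by putting LLM Wiki into practice"

Kawasaki, under the title "What I felt while playing with Google Antigravity 2.0", discusses the significance of textbook-style content in the generative AI era and how to learn in an age when AI writes the code, and Isshiki, under the title "Building 'ロケスマペディア' by putting LLM Wiki into practice"…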

2026-05-29 03:52 JST | TechCrunch AI | LLM/Generative AI, Business/Funding

Anthropic raises $65 billion, nears $1T valuation ahead of IPO

Anthropic has closed a $65 billion Series H round at a $965 billion post-money valuation, marking what could be the AI startup's final priv…

2026-05-29 03:32 JST | TechCrunch AI | Hardware/Semiconductors

Just like gold and oil, we’ll soon be able to trade AI token futures

Large exchanges are designing derivative products around AI tokens, which are increasingly being considered less a computational output and…

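2026-05-29 03:22 JST | ITmedia AI+ | Robotics

Toward mass production of a "domestically made humanoid robot": a University of Tokyo spinout startup, with Mitsubishi Motors also investing

Highlanders, a robot development startup spun out of the University of Tokyo, announced that it will begin an initiative aimed at mass-producing domestically made humanoid robots.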

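2026-05-29 02:54 JST | ITmedia AI+ | LLM/Generative AI, Regulation/Policy

Digital Agency reopens its call for domestic LLMs for the AI "Gennai" (源内), moving to paid government procurement; evaluation test grows from 50 to 300 questions

The first call solicited entries on the premise of free trial use, but the call for fiscal 2027 will shift to paid government procurement.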

2026-05-29 02:30 JST | TechCrunch AI | Other

In just 3 weeks, StrictlyVC is coming to Los Angeles

StrictlyVC Los Angeles is on June 18. Join for meaningful networking and fireside chats with leaders from Mach Industries, Shinkei Systems,…

2026-05-29 02:00 JST | TechCrunch AI | LLM/Generative AI, Agents

Anthropic releases Opus 4.8 with new ‘dynamic workflow’ tool

The new Opus model comes with a tool called Dynamic Workflows, for coordinating swarms of subagents.

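2026-05-29 01:16 JST | ITmedia AI+ | LLM/Generative AI, Agents

Anthropic makes Claude Opus 4.8 generally available: honesty improves dramatically, with alignment performance on par with Mythos

Anthropic has made "Claude Opus 4.8", the latest version of its AI model, generally available. It improves on the previous generation's reasoning and coding abilities, and its "honesty" about uncertainty in its own work has improved dramatically. In addition, a new feature that runs hundreds of subagents in parallel, "dynamic workfl…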

2026-05-29 00:36 JST | TechCrunch AI | LLM/Generative AI

How long is Anthropic’s lease with SpaceX? Opinions vary

Elon Musk is publicly reframing xAI’s massive Anthropic compute deal as short-term and cancellable, despite SpaceX’s own S-1 filing describ…

2026-05-29 00:35 JST | TechCrunch AI | Agents

Sesame, the conversational AI startup from Oculus founders, launches its iOS app

Sesame’s new iOS app brings its conversational AI agents to the public, offering more natural back-and-forth interactions designed to feel…

2026-05-28 (552 items)

2026-05-28 23:45 JST | TechCrunch AI | LLM/Generative AI

Sneak peek at new Siri app reveals Apple’s plans to take on ChatGPT and more

New renders offer a closer look at Apple’s planned AI overhaul for iOS 27, including a redesigned Siri experience and standalone Siri app.

2026-05-28 23:30 JST | TechCrunch AI | Other

RSI is the new AGI — and it’s just as hard to pin down

A new crop of AI labs are focused on recursive self-improvement — but the goal is proving elusive.

2026-05-28 23:30 JST | TechCrunch AI | Other

At TechCrunch Disrupt 2026: Databricks’ co-founder on what kills enterprise AI deals

Enterprise AI is entering a different phase now, one where enterprises are no longer evaluating whether AI is exciting. They are evaluating…

2026-05-28 23:28 JST | TechCrunch AI | Other

YouTube adds new podcast features, including an AI recommendation tool and ‘Auto speed’

The update signals YouTube's ongoing efforts to compete with other platforms for podcast audiences.

2026-05-28 23:00 JST | TechCrunch AI | Other

2 days left: Lock in ticket savings of up to $410 to TechCrunch Disrupt 2026

Savings of up to $410 on TechCrunch Disrupt 2026 tickets end tomorrow, May 29, 11:59 p.m. PT. Register now to save and join 10,000+ tech le…

2026-05-28 23:00 JST | TechCrunch AI | Agents

Visa invests in Replit to power agentic payments for developers

Visa said that over 1,000 employees have been using Replit for prototyping and development.

2026-05-28 22:00 JST | TechCrunch AI | Hardware/Semiconductors

Has the hunt for AI compute uncovered the next Cerebras?

General Compute is betting SambaNova will be the next breakout chipmaker.

2026-05-28 21:00 JST | OpenAI | Agents

How Endava builds an agentic organization with Codex

Learn how Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks…

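2026-05-28 19:25 JST | ITmedia AI+ | Hardware/Semiconductors

Lenovo opens a validation facility for "liquid-cooled AI infrastructure" in Japan, promoting liquid cooling as GPU server demand grows

Lenovo Japan has newly established the "Neptune Lab", a validation facility for AI infrastructure that uses liquid-cooling technology. It will be offered to customers and partner companies that use Lenovo's cooling technology as a near-production validation and PoC environment. Through joint validation with cloud vendors and system integrators, it will also help formulate recommended equipment configurations and the like. Lenovo…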

2026-05-28 16:00 JST | TechCrunch AI | Agents

Vertu wants CEOs to run companies from an AI foldable starting at $6,880

Built on top of the open source Hermes project, Vertu's new foldable combines AI-agent workflows, enterprise integrations, and ultra-premiu…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture

As intelligent systems become more autonomous, the scientific community focuses on creating decision-making mechanisms that include ethical…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Soro: A Lightweight Foundation Model and Chatbot for Tajik

We present Soro, a family of Tajik-specialized conversational large language models (LLMs) designed for real-world deployment under tight c…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

On the Origin of Synthetic Information by Means of Steganographic Inheritance

The origin of species has been the mystery of mysteries in natural science. By analogy, the origin of synthetic information, we suggest, is…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents

Progress in neural combinatorial optimization for Dynamic Flexible Job Shop Scheduling Problem (DFJSP) is currently hindered by a methodolo…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

Causal discovery is a cornerstone of scientific reasoning, yet whether large language models can perform it reliably remains an open questi…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

RULER: Representation-Level Verification of Machine Unlearning

Machine unlearning aims to remove the influence of specific training records from a deployed model without retraining from scratch. Current…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

Parallel LLM test-time scaling techniques (e.g., best-of-$N$) require drawing $N>1$ sequences conditioned on the same input prompt. These m…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Discovery Agents for Real-Time Analytics: Toward Proactive Insight Systems

Modern analytics systems are fundamentally reactive, requiring users to define queries over increasingly complex and continuously evolving…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Agyn: An Open-Source Platform for AI Agents with Scalable On-Demand Execution, Agent Definition as a Code, and Zero-Trust Access

As organizations move toward production deployments of AI agents, which execute non-deterministic workflows, maintain stateful sessions, an…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

You Are in Control of Your State: Why Human Outcomes Are Controllable Through Causal State Intervention

A central puzzle for the behavioural sciences and for human-facing artificial intelligence is the persistence of within-person variability.…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Cyberbullying Governance on Social Media: A Unified Framework from Content Identification to Intervention

The proliferation of social media platforms and online communities has inadvertently catalyzed the spread of cyberbullying, hate speech, an…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Voluntary Collusion with Secret Tools in Competing LLM Agents

Even when a tool is explicitly described as unfair and harmful to others, ostensibly safety-aligned LLM agents still voluntarily engage in…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Laguna M.1/XS.2 Technical Report

We present Laguna M.1 and Laguna XS.2, two Mixture-of-Experts foundation models built for long-horizon, agentic coding: M.1 has $225.8$B to…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Reasoning and Planning with Dynamically Changing Norms

To safely interact with humans, AI agents must both know our norms and consider them during planning. However, such norm-guided planning ha…

2026-05-28 13:00 JST | arXiv cs.AI | Agents, Robotics

Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems

As autonomous and agentic AI systems scale in robotic and human-machine environments, managing hallucination and persistent but unjustified…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Behavioural Analysis of Alignment Faking

Alignment faking (AF) refers to a model strategically complying with a training objective to avoid behavioural modification while preservin…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Cross-Entropy Games and Frost Training

We present Frost Training, a method for improving Monte Carlo-based policy optimization for a large family of LLM-as-a-judge tasks called C…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Hierarchical Prompt-Domain Control and Learning for Resource-Constrained Agentic Language Models

Large Language Models are increasingly deployed inside agentic systems, where they must follow structured protocols, adapt to evolving stat…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

DeepSciVerify: Verifying Scientific Claim--Citation Alignment via LLM-Driven Evidence Escalation

Misalignment between claims and their cited evidence is a common failure mode in reports generated by large language models, limiting their…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Prefix-Safe Bayesian Belief Tracking for LLM Reasoning Reliability: Separating Calibration from Ranking

Long reasoning traces need reliability estimates before final answers are known. We study prefix-conditioned eventual-success estimation, $…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

A Policy-Driven Runtime Layer for Agentic LLM Serving

Multi-agent LLM systems have become the dominant production workload, but the serving stack was not built for them. The agent framework abo…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration

LLM confidence calibration is often evaluated by comparing two signals: token-probability scores and verbalized confidence. These signals a…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

SkillGrad: Optimizing Agent Skills Like Gradient Descent

Agent skills provide a lightweight way to adapt LLM agents to specialized domains by storing reusable procedural knowledge in structured fi…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

We present PEAM, a Parametric Embodied Agent Memory framework in Minecraft that transforms agent memory from inference-time retrieval into…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

LLM safety evaluations predominantly test models in isolation, yet deployed AI agents increasingly operate within persistent social environ…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Auditable Decision Models with Learned Abstention and Real-Time Steering

Production AI systems often operate with incomplete, conflicting, or insufficient evidence. Forced classifiers collapse such cases into act…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles

LLM agents are governed by long-lived natural-language prompt policies, but individually reasonable standing rules can interact in uninspec…

2026-05-28 13:00 JST | arXiv cs.AI | Agents, Hardware/Semiconductors

A Query Engine for the Agents

The fastest-growing data in production today is unstructured text: agent traces, chat logs, reasoning chains, model outputs. People want to…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

A Fixed-Budget, Cluster-Aware Standard for LLM-as-a-Judge Evaluation: A Multi-Hop RAG Stress Test

Retrieval-augmented generation (RAG) systems are often compared by asking a large language model (LLM) judge which answer is better. For mu…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

GraD-IBD: Graph Representation Learning from Diagnosis Trajectories for Early Detection of Inflammatory Bowel Disease

International Classification of Diseases (ICD) is a globally recognized coding system that records diagnostic events during each patient en…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Constrained Auto-Bidding via Generative Response Modeling

Auto-bidding systems aim to maximize advertiser value over long horizons under budget constraints and ratio targets such as cost-per-acquis…

2026-05-28 13:00 JST | arXiv cs.AI | Agents, Research/Papers

EgoBench: An Interactive Egocentric Multimodal Benchmark for Tool-Using Agents

As AI agents increasingly operate in open, real-world environments, they require a deep synergy of multimodal perception, tool invocation w…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Revealing Algorithmic Deductive Circuits for Logical Reasoning

Recent studies have shown that Large Language Models (LLMs) can achieve strong reasoning performance by incorporating functional symbolic r…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Operational AI Deployment Assurance: Governance-State Orchestration Under Threshold-Sensitive Deployment Conditions -- A Governance Framework for High-Stakes AI Systems

AI governance frameworks increasingly emphasize fairness, transparency, accountability, and lifecycle risk management in high-stakes domain…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

EAPO: Entropy-Driven Adaptive Positive-Negative Sample Weighting for Policy Optimization in Open-Ended QA

Large Reasoning Models are typically trained via reinforcement learning from verifiable rewards (RLVR). However, existing approaches adopt…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

TCP-MCP: Landscape-Guided Co-Evolution of Prompts and Communication Topologies for Multi-Agent Systems

Effective multi-agent systems cannot be designed by selecting prompts or communication graphs in isolation. Agent behavior depends on the i…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

When Context Flips, Safety Breaks: Diagnosing Brittle Safety in Aligned Language Models

Safety benchmark scores provide incomplete evidence of deployment readiness: aligned language models often adhere to rigid rules even when…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MolLingo: Molecule-Native Representations for LLM-Powered Scientific Agents

We present MolLingo, a multi-agent system that emulates the reasoning process of a chemist to automate molecular design. Existing LLM-based…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

C-MIG: Multi-view Information Gain-based Retrieval-Augmented Generation for Clinical Diagnosis Reasoning

Retrieval-augmented generation combined with reinforcement learning has shown promise for grounding large language models in trustworthy me…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェントビジネス/資金調達研究/論文

FundaPod: A Multi-Persona Agent Pod Platform with Knowledge Graph Memory for AI-Assisted Fundamental Investment Research

Large language models (LLMs) are increasingly applied in finance, yet most existing work emphasizes trading signals or financial NLP tasks…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

AIBuildAI-2: A Knowledge-Enhanced Agent for Automatically Building AI Models

AI models underpin data-centric applications from image and text processing to scientific discovery in biology, physics, and chemistry. Yet…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

Towards Faithful Agentic XAI: A Verification Method and an Open-World Benchmark for Better Model Faithfulness

Explainable AI (XAI) helps users interpret model behavior and identify potential faults. Agentic XAI systems use Large Language Models (LLM…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management

LLMs have shown strong performance across diverse financial tasks, yet portfolio management (PM), a critical financial decision-making task…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェントビジネス/資金調達

A Unified Framework for the Evaluation of LLM Agentic Capabilities

As LLMs are increasingly deployed as agents, reliable assessment of their agentic capabilities has become essential. However, reported benc…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

SKILLC: Learning Autonomous Skill Internalization in LLM Agents via Contrastive Credit Assignment

Structured skill prompts improve exploration in long-horizon agentic reinforcement learning (RL). Skill-augmented RL methods retain externa…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Dr-CiK: A Testbed for Foresight-Driven Agents

Time series forecasting in real-world settings often depends not only on historical observations, but also on external context that must be…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Reasoning Matters: Mitigate Hallucination in Multimodal Large Reasoning Models via Reasoning-Conditioned Preference Optimization

Multimodal Large Reasoning Models introduce the reasoning paradigm, demonstrating strong capabilities on complex vision-language tasks. How…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

SuiChat-CN: Benchmarking Contextual Suicide Risk Assessment in Chinese Group Chats

Suicide is a critical global public health challenge, causing approximately 720,000 deaths each year and calling for timely, effective prev…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Show, Don't TELL: Explainable AI-Generated Text Detection

Research on AI-generated text detection has presented a number of approaches to discern human from AI prose, some of which achieve high i…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows

LLM agents are increasingly deployed as executable systems that use tools, modify workspaces, and produce concrete artifacts. In such workf…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

DiagramRAG: A Lightweight Framework to Retrieve Scientific Diagram for Figure Generation

Scientific diagrams are essential for communicating complex methodologies in academic papers. A natural way for researchers to specify such…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Do Agents Think Deeper? A Mechanistic Investigation of Layer-Wise Dynamics in Sequential Planning

Recent mechanistic studies suggest that large language models (LLMs) may utilize their depth inefficiently in standard single-turn tasks. W…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection

With rapid advances in audio-visual generative models, reliable forgery detection becomes increasingly critical. Existing methods for audio…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

The Shape of Overthinking: Backtracking Bursts in Long Reasoning Traces

Reasoning models often generate long traces in which useful self-correction and unproductive revision are hard to distinguish. We study thi…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Geometry of Human Perceptual Domains Emerges Transiently in LLM Representations

While large language models (LLMs) are trained purely on textual data, prior work has shown that their internal representations can exhibit…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

STAB: Specification-driven Testing for Algorithmic Bottlenecks

Evaluating the efficiency of algorithmic code requires test cases that expose runtime bottlenecks. Previous methods generate efficiency tes…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Large language model (LLM)-based agents have shown strong capabilities in using external tools to solve complex tasks. However, existing ev…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Reward Bias Substitution: Single-Axis Bias Mitigations Redirect Optimization Pressure

Single-axis mitigations of reward-model biases (e.g., reducing proxy reliance on length, sycophancy, or style) can rotate optimization pres…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

An Empirical Audit of k-NAF Budget Accounting for Anchored Decoding

We empirically audit the k-NAF budget-accounting mechanism in Anchored Decoding using (i) a fixed, class-stratified workload (approximately…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Large language models (LLMs) can now solve complex problems through long chain-of-thought (CoT) reasoning, but the trade-off between perfor…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback

Self-evolving large language models (LLMs) learn by generating their own training tasks and solutions, reducing reliance on human-curated s…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達研究/論文

MIRA: A Bilingual Benchmark for Medical Information Response Audit

Large language models (LLMs) are increasingly used to provide public-facing health information, yet existing safety evaluations overlook wh…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

PetroBench: A Benchmark for Large Language Models in Petroleum Engineering

Large Language Models are increasingly applied in the petroleum industry, highlighting the need for a domain-specific evaluation framework.…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings

Clark Hash is a small method for storing neural embeddings in less space. It normalizes each database vector, applies a deterministic spars…

2026-05-28 13:00 JSTarXiv cs.AI画像/動画生成

MTAVG-Bench 2.0: Diagnosing Failure Modes of Cinematic Expressiveness in Multi-Talker Audio-Video Generation

In recent years, Multi-Talker Audio-Video Generation (MTAVG) models have shown promising performance on fundamental metrics such as lip-syn…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達

Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG

Cited RAG evaluation often treats visible sources as a grounding signal, but a real, topically relevant citation can still under-warrant th…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

MemCog: From Memory-as-Tool to Memory-as-Cognition in Conversational Agents

Existing agent memory systems universally follow what we term a Memory-as-Tool paradigm where a single query triggers one-shot retrieval of…

2026-05-28 13:00 JSTarXiv cs.AIエージェント研究/論文

Verifiable Benchmarking of Long-Horizon Spatial Biology

AI agents are increasingly useful for biological data analysis, but existing benchmarks mostly test broad biological knowledge, executable…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models

The remarkable generation quality of modern diffusion models often comes at the cost of massive parameter counts, which necessitate server-…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay

Adaptive context compression is vital for scaling Large Language Models (LLMs) to complex, multi-turn agent tasks. However, rule-based comp…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Bridging the Detection-to-Abstention Gap in Reasoning Models under Insufficient Information

We highlight a failure mode of large reasoning models on questions with insufficient information: models may recognize that a problem is un…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

MACReD: A Multi-Agent Collaborative Reasoning Framework for Reaction Diagram Parsing

Parsing chemical reaction diagrams from scientific literature is challenging due to heterogeneous layouts, intertwined visual elements, and…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

BuddyBench: A Privacy-Constrained Multi-Task Benchmark for Pediatric Social-Communication Personalization

BuddyBench introduces a privacy-constrained multi-task benchmark for pediatric social-communication personalization. Unlike existing neurod…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Examining Agents' Bias Amplification versus Suppression in Multi-Agent Systems

Multi-agent systems are increasingly deployed to support various tasks where agents interact to achieve individual and collective objective…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Training Stratigraphy: Persistent Behavioral Artifacts in Large Language Models Observed Through Longitudinal AI-Human Interaction

Large language models trained with Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI exhibit persistent behavioral pa…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification

Recent years have witnessed the rapid development of Large Language Model-based Multi-Agent Systems (MAS), which excel at collaborative dec…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Human-like in-group bias in instruction-tuned language model agents

As autonomous AI agents are deployed in persistent, interacting networks -- coordinating tasks, routing resources, and accumulating reputat…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

CIVIC: End-to-End Sequence Compactness for Efficient Vision-Language Models

Vision-Language Models (VLMs) face severe memory and latency bottlenecks due to high-resolution visual tokens. While current token reductio…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Gradient Step Plug-and-Play Model for Dental Cone-Beam CT Reconstruction

The goal of this work is to reduce the effect of photon noise in dental cone-beam CT reconstruction. We consider an inverse problem formula…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Do Clinical Models Change Treatment Decisions?

Clinical foundation models are evaluated with factual or exam-style medical QA, but treatment decisions must change when patient context ch…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Data-Efficient On-Policy Distillation for Automatic Speech Recognition

Building competitive automatic speech recognition (ASR) models usually requires large-scale audio supervision, which makes reproduction a…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning

LLMs have shown remarkable proficiency in general language understanding and reasoning. However, they consistently underperform in spatial…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Adaptive Reservoir Computing for Multi-Scenario Chaotic System Forecasting

We present an adaptive reservoir computing framework for the CTF-4-Science Lorenz benchmark, which evaluates machine learning models across…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents

Large language model (LLM) agents are increasingly used to assist with operations research (OR) modeling, yet existing OR-oriented benchmar…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達

Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning

Existing multimodal reasoning approaches predominantly follow two paradigms: converting visual inputs into text prior to reasoning, or perf…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

OccuReward: LLM-Guided Occupant-Centric Reward Shaping for Demographic Equity in Grid-Interactive Buildings

Large language models (LLMs) have demonstrated promising capability in generating reward functions for deep reinforcement learning (DRL)-ba…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Localizing Input Uncertainty Quantification for Large Language Models via Shapley Values

As large language models (LLMs) are increasingly integrated into high-stakes decision-making, the ability to reliably quantify uncertainty…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Agentic Active Omni-Modal Perception for Multi-Hop Audio-Visual Reasoning

Multi-hop audio-visual reasoning remains challenging for Omni-LLMs, as relevant evidence is often sparse, temporally dispersed, and distrib…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Plant, Persist, Trigger: Sleeper Attack on Large Language Model Agents

Large Language Model (LLM) agents remain vulnerable to safety threats from the external environment, where attackers inject adversarial con…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

The Illusion of Opting in AI-Mediated Consequential Decisions

Drawing on Ullmann-Margalit's concept of opting (transformative, irrevocable, and shadowed by foreclosed alternatives), we show that curren…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェントハードウェア/半導体

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

LLM-based agents are increasingly used to generate GPU kernels, but they often know what optimizations to try without knowing when those op…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Explaining is Harder Than Predicting Alone: Evaluating Concept-based Explanations of MLLMs as ICL Visual Classifiers

In-context learning (ICL) enables multimodal large language models (MLLMs) to classify images from a few labelled examples. Yet, how these…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

When Does Memory Help Multi-Trajectory Inference for Tool-Use LLM Agents?

Multi-trajectory inference for tool-use LLM agents - generating multiple reasoning attempts and selecting among them - benefits from transf…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

PIRS: Physics-Informed Reward Shaping for SAC-Based Building Energy Management

Occupant comfort and grid-aware energy efficiency are competing objectives whose joint optimization depends critically on how reward functi…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

AI systems are fallible, and humans can make mistakes in deciding whether to trust AI over their own judgment. Thus, improving human-AI col…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIハードウェア/半導体

Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

Large Language Models (LLMs) often generate factually incorrect outputs, commonly termed hallucinations, that undermine trust and limit dep…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Global Policy-Space Response Oracles for Two-Player Zero-Sum Games

The Policy-Space Response Oracles (PSRO) framework scales equilibrium computation to large zero-sum games by iteratively expanding a restri…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

Whether large language models (LLMs) construct internal spatial world models from pure-text descriptions remains contested, and whether suc…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

ResearchLoop: An Evidence-Gated Control Plane for AI-Assisted Research

AI-assisted research compresses ideation, implementation, evaluation, and manuscript writing into a single interactive loop. This compressi…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR

Reinforcement Learning with Verifiable Rewards (RLVR) trains reasoning models without labeled trajectories, relying on grouped rollouts to…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

REED: Post-Training Representation Editing for Cross-Domain Linguistic Steganalysis

In real-world scenarios of linguistic steganalysis, tested texts usually come from unseen domains with different vocabularies, topics, writ…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Better Accuracies, Worse Reasoning: A Step-Level Audit of Medical Chain-of-Thought Distillation

Chain-of-thought (CoT) distillation trains a smaller model to imitate a teacher's reasoning trace, but it is typically evaluated by final-a…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

From Fact Overwriting to Knowledge Evolution: Causal Editing via On-Policy Self-Distillation

While Knowledge Editing (KE) enables efficient updates, its dominant Static Fact Overwriting paradigm treats LLMs as discrete databases, fo…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

An Enhanced Large Neighborhood Search Approach for the Capacitated Facility Location Problem with Incompatible Customers

A new variant of the classic capacitated facility location problem, which considers incompatibilities between customers, has recently been…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models

Large language models (LLMs) increasingly match expert performance on licensing examinations, yet routine clinical use remains limited becau…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達

Picid: A Modular Evaluation Infrastructure for Reproducible PHM Across Tasks and Domains

Progress in Prognostics and Health Management (PHM) is hindered by the lack of standardized and reusable evaluation practices across tasks,…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

FedMPT: Federated Multi-label Prompt Tuning of Vision-Language Models

Multi-Label Recognition (MLR) based on Vision-Language Models (VLMs) aims to leverage their pre-trained knowledge to better adapt complex r…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Plan Before Search: Search Agents Need Plan

Training large language models as retrieval-augmented reasoning agents typically combines reinforcement learning with an SFT cold start dis…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

Evaluating whether large language model (LLM) agents can profit in capital markets is increasingly framed as end-to-end trading: place an a…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement

Automatic prompt optimization (APO) has driven significant gains in LLM-based agentic workflows. However, existing methods treat each task'…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Risk-Controlled Lean-as-Judge for Natural-Language Mathematical Reasoning

Lean is increasingly used to judge natural-language mathematical answers, but its signal is partial: many answers never formalize, and a fa…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

CyberJurors: A Multi-Agent Simulation Task for E-Commerce Disputes Verdict

E-commerce platforms have begun recruiting crowdsourced jurors to adjudicate massive volumes of transaction disputes. Unlike formal legal j…

2026-05-28 13:00 JSTarXiv cs.AIエージェント研究/論文

From paper to benchmark: agentic, framework-based reproduction of under-specified methods in machine health intelligence

Industrial Prognostics and Health Management (PHM) provides a representative case study for a broader challenge in applied machine learning…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs

Reinforcement Learning with Verifiable Reward (RLVR) is empirically shown to notably enhance the reasoning performance of large language mo…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

You Live More Than Once: Towards Hierarchical Skill Meta-Evolving

Test-time skill evolving is regarded as a new paradigm for enhancing deployed agentic systems. Existing works mainly focus on hard-coded sk…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

Hybrid-reasoning large language models (LLMs) expose explicit controls over reasoning effort, allowing users or systems to trade off answer…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Measuring Progress Toward AGI: A Cognitive Framework

Despite widespread discussion of AGI, there is no clear framework for measuring progress toward it. This ambiguity fuels subjective claims,…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Efficient Post-training of LLMs for Code Generation With Offline Reinforcement Learning

Post-training using online reinforcement learning (RL) is an important training step for LLMs, including code-generating models. However, o…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

Reinforcement learning has become a central paradigm for advancing reasoning in large language models, yet most existing methods still depe…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

GONDOR to the Rescue: Satisficing Planning with Low Memory

Greedy Best-First Search (GBFS) is the dominant approach for solving search problems where the goal can be estimated with a heuristic, such…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Diffusion Large Language Models for Visual Speech Recognition

Existing Visual Speech Recognition (VSR) systems commonly rely on left-to-right autoregressive decoding, which can force premature decision…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

From Learning Resources to Competencies: LLM-Based Tagging with Evidence and Graph Constraints

Linking learning resources to a structured competency framework is key to enabling competency-based search and curriculum analytics in Lear…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

ProvMind: Provenance-grounded reasoning for materials synthesis

Materials process optimization requires reasoning over routes, conditions, tools and causal dependencies, yet most computational formulatio…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

Benchmarking AI for low-resource contexts: Thinking beyond leaderboards

Existing AI evaluation practices often fail to capture how systems actually perform in low-resource environments, where operational constra…

2026-05-28 13:00 JSTarXiv cs.AI規制/政策

GS-FUSE: Granger-Supervised Gated Fusion and Multi-Granularity Alignment for Event-Driven Financial Forecasting

Accurately forecasting the impact of salient financial events on markets is critical for investors and policymakers. However, existing mult…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Let Relations Speak: An End-to-End LLM-GNN Soft Prompt Framework for Fraud Detection

In recent years, Large Language Models (LLMs) have shown great capability in processing graph tasks such as fraud detection. However, most…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Entropy-aware Masking for Masked Language Modeling

Masked language modeling has become a standard pretraining objective for training encoder-based language models. In this approach, certain…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Do Agents Know What They Can't Do? Evaluating Feasibility Awareness in Tool-Using Agents

Tool-using agents often incur substantial computational cost due to long reasoning chains and iterative tool usage. In practical scenarios,…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Cultural Binding Heads in Language Models

LLMs often default to equal treatment across cultural groups, even though context warrants differentiation: this is a lack of difference aw…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Modeling Vehicle-Type-Specific Pedestrian Crash Avoidance Behavior in Safety-Critical Interactions Using Smooth-Mamba Deep Reinforcement Learning

As automated vehicles (AVs) increasingly share roadways with human-driven vehicles (HDVs), understanding how pedestrians respond to differe…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear prob…

2026-05-28 13:00 JSTarXiv cs.AIエージェント研究/論文

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

As agent capabilities advance, existing benchmarks, such as $\tau^2$-Bench, are becoming increasingly saturated. Yet constructing new bench…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns

Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, yet their standard generation process -- auto-regressive…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

A Conflict-Aware Penalty and Statistical Loss Framework for Balancing Modalities and Enhancing Stability in Multimodal Sentiment Analysis

Multimodal Sentiment Analysis (MSA) fuses text, acoustic, and visual streams to infer sentiment. Because pre-trained text encoders are far…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Continual Model Routing in Evolving Model Hubs

AI model hubs provide access to a rapidly growing collection of powerful pre-trained models, enabling off-the-shelf mixture-of-experts syst…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation

Large language models (LLMs) have recently advanced text-driven 3D generation, yet Text-to-CAD remains far from supporting industrial produ…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability

Large language models (LLMs) are increasingly used for tasks that implicitly reduce to Boolean satisfiability (SAT), yet their reasoning ab…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Adaptive Multimodal Agents-Based Framework for Automatic Workflow Execution

Modern information systems require autonomous agents capable of navigating complex workflows, yet current methodologies often struggle with…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

LACUNA: Safe Agents as Recursive Program Holes

LLM agents increasingly act by writing code, yet a split persists between the runtime that drives the agent and the code the model writes.…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Bandwidth-Efficient and Privacy-Preserving Edge-Cloud Many-to-Many Speech Translation

Multimodal large language models (MLLMs) have demonstrated significant potential for speech-to-text translation (S2TT). However, existing d…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

The Ethics of LLM Sandbox and Persona Dynamics

It is well known that LLM guardrails and trained persona dynamics can produce a reality gap: the distance between the world an LLM is permit…

2026-05-28 13:00 JSTarXiv cs.AIエージェント研究/論文

AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation

Scientific research proceeds through iterative cycles of hypothesis generation, experiment design, execution, and revision. AI agents can a…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

An LLM-Based Assistance System for Intuitive and Flexible Capability-Based Planning

In modern industry, dynamic environments and the complexity of modular and reconfigurable resources require automated planning of process s…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

DREAM-R: Multimodal Speculative Reasoning with RL-Based Refined Drafting, Precise Verification, and Fully Parallel Execution

Speculative reasoning has recently been proposed as a means to accelerate reasoning-intensive generation in large multimodal models, but it…

2026-05-28 13:00 JSTarXiv cs.AIエージェント研究/論文

VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora

Existing benchmarks have laid the foundation for travel planning agents by establishing API-centric paradigms. However, as the capabilities…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

TRACER: Turn-level Regret Matching with Inner Reinforcement Credit for Cooperative Multi-LLM Reasoning

Large language models increasingly rely on either reinforcement learning or multi-agent prompting to improve reasoning, yet these two parad…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達研究/論文

The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic

The GSM-Symbolic benchmark (Mirzadeh et al., 2025) reported consistent performance drops across 25 Large Language Models (LLMs) when tested…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Beyond Binary Moral Judgment: Modeling Ethical Pluralism in AI

Critical decision-making in socially consequential spaces is increasingly involving AI systems at varying capacities. Yet, despite the ubiq…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor

Context compression aims to shorten long context inputs with minimal information loss for LLM inference acceleration. While existing method…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

OpenURMA: A Clean-Room Open Implementation of the Unified Bus Protocol

Modern datacenter RDMA is bottlenecked at the network interface, not the wire. A NIC running RoCE or InfiniBand holds per-connection state…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

Are LLM-based search agents genuinely searching, or using the web to verify what they already know? We study this question on BrowseComp wi…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Multi-Adapter Representation Interventions via Energy Calibration

Representation intervention has emerged as a promising paradigm for aligning large language models toward desired behaviors without modifyi…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

AlphaTransit: Learning to Design City-scale Transit Routes

Designing a transit network requires many sequential route extension decisions, but their quality is often visible only after the full netw…

2026-05-28 13:00 JSTarXiv cs.AI画像/動画生成

Utility-Aware Multimodal Contrastive Learning for Product Image Generation

Product images strongly influence consumer decision-making in online marketplaces. Empowered by multimodal contrastive learning, generative…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both parametric (e.g. RLVR) and non-pa…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

CubePart: An Open-Vocabulary Part-Controllable 3D Generator

Interactive 3D assets used in games and simulation are typically decomposed into specific semantic parts to support animation, physics, and…

2026-05-28 13:00 JSTarXiv cs.AIエージェントハードウェア/半導体

SwarmHarness: Skill-Based Task Routing via Decentralized Incentive-Aligned AI Agent Networks

Vast quantities of compute (GPU cycles on personal workstations, idle inference servers, and edge devices between jobs) go unused because n…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

CaMBRAIN: Real-time, Continuous EEG Inference with Causal State Space Models

Electroencephalography (EEG) is a critical, non-invasive method to monitor electrical brain activity. EEGs can span anywhere from a couple…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Calibrating Conservatism for Scalable Oversight

Agentic AI systems capable of autonomous planning and extended environmental interaction pose a fundamental control problem: how can humans…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

LNN-PINN: A Unified Physics-Only Training Framework with Liquid Residual Blocks

Physics-informed neural networks (PINNs) have attracted considerable attention for their ability to integrate partial differential equation…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Unlocking Fine-Grained and Within-Utterance Speaking Style Control in Prompt-Based Text-to-Speech Models

While prompt-based text-to-speech (TTS) models enable natural language-driven speaking style control, they often provide limited fine-grain…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

RAG-Coding: Enhancing LLM Medical Coding with Structured External Knowledge

We present RAG-Coding, an agentic method for automated ICD-10-CM coding. RAG-Coding orchestrates four large language model (LLM) agents and…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking

Cross-lingual biomedical entity linking (BEL) maps mentions in any language to unique identifiers in a biomedical knowledge base (KB), supp…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

The Computational Boundary of Inference: Capability Internalization, Training, and the Turing Jump

Claims about recursive self-improvement in AI often slide from repeated internal revision to the possibility of qualitatively stronger capa…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

The Alignment Floor: When Persona Customization Is Safe

A key promise of pluralistic AI is behavioral adaptation: persona prompts like "be creative" or "be thorough" let systems respect diverse u…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

Spoken Language Models (SLMs) have emerged as a promising paradigm for speech synthesis by bypassing explicit grapheme-to-phoneme pipelines…

2026-05-28 13:00 JST | arXiv cs.AI | Agents, Research/Papers

From Instructor to Collaborator: What a 90-Participant Study Reveals about Human-Agent Collaboration in a Mobile Serious Game

This position paper reflects empirical data collected during my PhD from a large-scale within-subjects study (N = 90). The study compared a…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity

Federated reinforcement learning (FedRL) enables multiple agents to collaboratively train a global policy without sharing raw data, making…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

Diffusion models promise efficient parallel text generation but rely on bidirectional attention, creating a structural mismatch with pre-tr…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Modeling Community Attitude through Reaction Tone: A Human-AI Collaborative Framework for Evaluating LLM Alignment with Linguistic Behaviors in Online Communities

Large language models (LLMs) are increasingly utilized as proxies for computational social analysis; yet, their ability to faithfully repre…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Memory-Based vs. Context-Only Conditioning Produces Distinct Behavioral Patterns in Stateful Personalization

We study how conditioning context shapes personalization behavior in a teacher-facing educational recommender system. We compare contextual…

2026-05-28 13:00 JST | arXiv cs.AI | Hardware/Semiconductors

EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation

Speculative decoding accelerates Large Language Model inference via a draft-then-verify paradigm, yet the output projection layer becomes a…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Learning after COVID-19 and the ICT career aspirations: Are students entering the AI era with weaker skills?

This paper examines whether students are entering the generative AI era with sufficiently strong educational foundations, focusing on the r…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

StoryMI: Steerable Multi-Agent Therapeutic Dialogue Generation

Large language models (LLMs) can generate fluent dialogue, but prior works lack situational grounding, dynamic strategy control, and evalua…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Human-AI Collaboration for Estimating Scientific Replicability

Determining whether published scientific findings can successfully be replicated is a long-standing challenge in the empirical sciences. Ex…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Informing AI Policy Assessment using Large-Scale Simulation of Interventions

As the rapid proliferation of AI systems and harms spurs efforts in AI governance around the world, prioritizing among competing policy opt…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Agentic Literacy Debt: A Structural Problem the AI Literacy Field Has Not Yet Named

Autonomous AI agents now plan, decide, and act on behalf of users across healthcare, financial services, and workplace contexts, often with…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Short-Term Gain, Long-Term Fragility: AI Labor Substitution and the Erosion of Sustainable Capability

What looks like acceleration can be a quiet transfer of burden from the present to the future. Attempts to replace human labor with AI syst…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Mathematical Modelling of Ethical AI Use in Higher Education: A Coordination Game Framework for Future-Facing Learning

The rapid uptake of generative artificial intelligence (AI) in higher education is reshaping assessment practices and intensifying concerns…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Using Zero-Shot LLM-Generated Survey Data for Geographically Explicit Population Synthesis

There is a growing interest in utilizing synthetic populations for a diverse range of applications. At the same time, we are witnessing a t…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

REC-CBM: Rubric-Aware Error-Correction Concept Bottleneck Models for Trustworthy Open-Ended Grading

Open-ended grading is central to equitable and personalized education, yet manual grading remains time-consuming and costly, underscoring t…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

LLM-assisted sentiment analysis for integrated computational and qualitative mixed methods education research: A case study of students' written reflection assignments

Written reflection assignments give students valuable opportunities for critical self-assessment, meaning making, and learning processing.…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Smaller, Younger, and More Impactful: How AI-Assisted Writing Transforms Research Teams

The era of Big Science has long been defined by increasingly large and specialized research teams pushing the frontiers of knowledge. Howev…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Benchmarking Fairness in Spiking Neural Networks: Data Bias, Spurious Features, and Hardware Effects

Evaluating fairness in Spiking Neural Networks (SNNs) demands rigorous benchmarks that reflect real-world complexities, yet existing assess…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

STARS: Spike Tail-Aware Relational Synthesis for ANN-to-SNN Data-Free Knowledge Distillation

SNNs promise energy-efficient and low-latency inference, but their performance still trails that of ANNs. ANN-to-SNN knowledge distillation…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Advancing Direct Training for Spiking Neural Networks with Circulate-Firing Neurons and Learnable Gradients

Spiking Neural Networks (SNNs) have emerged with promising energy efficiency, yet a substantial performance gap persists compared t…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Ligand-Conditioned Discrete Diffusion for Protein Sequence-Structure Co-Design

Proteins perform their biological functions through three-dimensional structures encoded by amino acid sequences, and ligand-binding protei…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Can Quantum Federated Learning Withstand Circuit-Level Backdoors?

Quantum Federated Learning (QFL) inherits the core vulnerability of federated optimization to malicious clients, while also introducing an…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Quantum Machine Learning-based 6G edge Network: Enabling Adaptive Communication and Model Aggregation

With the advent of sixth-generation (6G) mobile communication technology, vehicle-to-everything (V2X) communication faces unprecedented cha…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Ocean4Rec: Offline LLM-Derived OCEAN Profiles for Request-Time VOD Reranking

Industrial video-on-demand (VOD) recommenders need richer content understanding, but LLM-as-reranker designs repeat prompt construction, to…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

Mixture-of-Experts (MoE) presents a naturally compatible and scalable framework for multimodal learning, demonstrating strong adaptability…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

FD-RAG: Federated Dual-System Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) has emerged as a paradigm for grounding large language models in external knowledge, yet most existing…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Heterogeneous Multi-Agent Modeling for Measurement and Network Analysis of the Data Service Market

With the increasing complexity of collaboration among various social entities and user demands, the factors affecting the stable developmen…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

When NPUs Are Not Always Faster: A Stage-Level Analysis of Mobile LLM Inference

Deploying large language models (LLMs) on mobile devices increasingly relies on heterogeneous execution, yet no prior study has systematica…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

RE-TRIANGLE: Does TRIANGLE Enable Multimodal Alignment Beyond Cosine Similarity in Retrieval?

Multimodal alignment is critical for bridging the semantic gap in information retrieval. However, traditional pairwise strategies introduce…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MGRetrieval: Memory-Guided Reflective Retrieval for Long-Term Dialogue Agents

Large Language Models (LLMs) have made significant progress in dialogue, yet redundant memory contexts severely limit their effectiveness i…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Prominence-Stratified Failure Modes in Retrieval-Augmented Commercial Recommendation: A 37,000-Run Audit

AI assistants like ChatGPT and Claude are recommendation engines, not search engines: they answer commercial queries by directly nominating…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Paraphrase Brittleness in Production Retrieval-Augmented Commercial Recommendation: Reproducibility Below the Rerun-Stability Baseline

Small changes to how a buyer phrases a question -- "best CRM" vs "top CRM" vs "best CRM for a SaaS startup" -- produce substantially differ…

2026-05-28 13:00 JST | arXiv cs.AI | Business/Funding

A Systematic Evaluation of Retrieval-Augmented Generation and Language Models for Space Operations

The rapid expansion of space activities has led to an unprecedented accumulation of technical documentation, operational guidelines, and sc…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

RAGe: A Retrieval-Augmented Generation Evaluation Framework

Deploying Large Language Model (LLM) applications, particularly those relying on Retrieval-Augmented Generation (RAG), remains challenging…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Checking Fact with Better Retrieval: Dynamic Contrastive Learning for Evidence Retrieval

In the field of multimodal fact checking, the accuracy of retrieving evidence from different modalities has a significant impact on the dow…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Generic Interpretation Approach for Transformer Models Incorporating Heterogenous Attention Structures

The Transformer has significantly propelled the development of artificial intelligence, and certainly the development of agents as well. We cat…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

When prompt perturbations break your A/B test: A valid statistical test for generative surveying

Generative surveying -- where collections of LLM-based personas provide feedback on messages -- has emerged as a cheap and scalable alterna…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Beyond Motion Primitives: Behavioral Activity Recognition from Head-Mounted IMU

AR smart glasses need continuous behavioral context to offer proactive assistance, yet their most practical always-on sensor, the head-moun…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

AdaMerge: Salience-Aware Adaptive Token Merging for Training-Free Acceleration of Vision Transformers

The quadratic cost of self-attention in Vision Transformers (ViTs) constitutes a fundamental bottleneck for practical deployment, motivatin…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

Multi-agent systems built on large language models (LLMs) require many coordination choices that are difficult to fix a priori: which skill…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Comparative Analysis of Liquid Neural Networks and LSTM for Sequential Pattern Recognition: Robustness, Efficiency, and Clinical Utility

Traditional Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) units operate on discrete time steps, often failing to captu…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift

Continual Learning (CL) is a practical paradigm to utilize the power of deep pre-trained neural networks, but which pre-trained model has a bet…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Detect by Yourself: Self-Designing Agentic Workflows for Few-Shot Graph Anomaly Detection

Graph anomaly detection aims to identify anomaly nodes in attributed graphs and plays an important role in real-world applications. However…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

AssertLLM2: A Comprehensive LLM Benchmark for Assertion Generation from Design Specifications

Assertion-based verification (ABV) is a cornerstone of modern hardware design, yet manually translating design intent into formal SystemVer…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

HEAL: Resilient and Self-* Hub-based Learning

Decentralized learning enhances privacy, scalability, and fault tolerance by distributing data and computation across nodes. A popular appr…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Balancing Fidelity and Diversity in Diffusion Models via Symmetric Attention Decomposition: Hopfield Perspective

We characterize the pre-softmax attention matrix $\mathbf{QK^\top}$ in transformers as an associative memory matrix encoding pairwise assoc…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Resource-Constrained Affect Modelling via Variance Regularisation Pruning

Affective computing systems are increasingly embedded in pervasive and interactive environments, such as adaptive games, assistive technolo…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

BIRDS: Characterizing and Understanding Biodiversity Impact of Large Language Model Serving

Large language model (LLM) serving creates environmental impacts beyond carbon and water, including ecosystem damage through biodiversity-r…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Energy-Structured Low-Rank Adaptation for Continual Learning

While orthogonal subspace methods try to mitigate task interference in Continual Learning (CL), they often suffer from energy diffusion acr…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Debate Helps Weak Judges Reward Stronger Models

Despite theoretical promise, debate as a scalable oversight protocol has produced mixed empirical results: gains in some settings, and null…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Diffusion-Based Ukrainian Handwritten Text Generation with Cross-Domain Style Transfer

Handwritten text generation (HTG) conditioned on writer style has been widely studied for Latin scripts, but remains underexplored for low-…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Grimlock: Guarding High-Agency Systems with eBPF and Attested Channels

Agentic systems increasingly run user-authored orchestration code that invokes tools, spawns subtasks, and delegates work across machines a…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

HARP: Measuring Harm Amplification in Multi-Agent LLM Systems

Multi-agent LLM systems decompose workflows across agents, tools, shared context, memory, and decision gates. This modularity improves inte…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding, Research/Papers

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodolo…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?

Modern retrieval-augmented generation (RAG) deployments increasingly rely on caching to reduce token cost and time-to-first-token (TTFT). Pre…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Detection Without Correction: A Two-Parameter Decomposition of Multi-Stage LLM Pipelines

Multi-stage LLM pipelines that perform multi-agent debate, intrinsic self-correction, or retrieval-augmented verification exhibit puzzling…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Clinical Validation of the Melanoscope AI Mobile Dermoscopy Clinical Decision Support System

Introduction. Early detection of malignant skin lesions is critical for prognosis, yet dermatologist shortages in Russian regions limit scr…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

On the Subgaussianity of Quantized Linear Maps: An AI-Assisted Note

This short note presents a dimension-independent subgaussian concentration bound for Gaussian vectors under coordinate-wise nonlinear mappi…

2026-05-28 13:00 JST | arXiv cs.AI | Hardware/Semiconductors

The Future of Facts: Tracing the Factual Generation-Verification Gap

Language models are becoming the default interface to factual knowledge, yet they often verify outputs more reliably than they generate the…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Hallucination Behavior in Multimodal LLMs Across Agricultural Image Interpretation and Generation Tasks

Large Language Models (LLMs) are being rapidly adopted in agricultural imaging applications, ranging from crop interpretation to synthetic…

2026-05-28 13:00 JST | arXiv cs.AI | Agents, Hardware/Semiconductors

The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution

Agentic AI workloads - where a single user goal triggers multi-step orchestration, tool calls, retries, and failure recovery - are being ta…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Eliot: Interactively Exploring Fast-Changing Scientific Literature Trends with Online Data and Learning

The rapid growth of scientific publishing has made it increasingly difficult to track how fast-moving areas evolve. Search engines and LLM-…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Not All NVFP4 QAT Recipes Are Equal: How Architecture and Scale Shape Model Quality for Anomaly Segmentation

Real-time anomaly segmentation demands both high recall and efficient low-precision inference. We study the three-way interaction of model…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Supervised Distributional Reduction via Optimal Transport and Dependence Maximization

Learning representations that capture both intrinsic data geometry and target-relevant structure remains a fundamental challenge, particula…

2026-05-28 13:00 JST | arXiv cs.AI | Robotics

Trinity: Unifying Class-Agnostic Terrain and Semantic Segmentation for Unstructured Outdoor Environments by Leveraging Synthetic Data

Terrain understanding is fundamental for mobile robots operating in unstructured outdoor environments. Existing vision-based traversability…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression

We propose Hurwitz Quaternion Multiplicative Quantization (HQMQ), a calibration-free method for KV cache compression of l…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Cultural Fidelity in English-to-Hindi Translation: A Preservation-Fluency Frontier for Gender Recoverability

Generative translation systems are cultural technologies because they decide how socially meaningful cues are rendered within culturally sp…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Developing an Intelligent Job Recommendation System Using Semantic Retrieval and Explainable AI Techniques

Online recruitment platforms require recommendation methods capable of retrieving relevant job opportunities from large and heterogeneous c…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Transferable Reinforcement Learning via Probabilistic Latent Embeddings and Dynamic Policy Adaptation for Sim-to-Real Deployment

Due to limited resources and public safety concerns, deep reinforcement learning (RL) agents for many cyber-physical systems (e.g., autonom…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

How the Optimizer Shapes Learned Solutions in Equivariant Neural Networks

Equivariant neural networks encode geometric symmetries by construction, yet they are often difficult to optimize and can underperform less…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Aligning LLMs with Human Uncertainty: A Beta-Bernoulli Calibrator for LLM Forecasting

Probabilistic forecasting estimates the likelihood of uncertain future events. To improve LLM forecasting, existing methods typically learn…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Backdoor Attacks on Fault Detection and Localization in Cyber-Physical Systems

Cyber-Physical Systems (CPS) integrate sensing, communication, computation, and control to support critical infrastructure, including smart…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers

Transformers process images and videos by flattening space and time into long token sequences. While attention and KV caching preserve past…

2026-05-28 13:00 JST | arXiv cs.AI | Robotics

Simulation-Informed Diffusion for Decentralized Multi-robot Motion Planning

Decentralized multi-robot motion planning requires each robot to generate collision-free trajectories from local observations, without glob…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

CiteCheck: Retrieval-Grounded Detection of LLM Citation Hallucinations in Scientific Text

Large language models (LLMs) are increasingly used to generate scientific reports, but they can produce references that appear plausible wh…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

UserHarness: Harnessing User Minds for Stronger Agent Theory-of-Mind

Understanding what a user believes and intends is central to building effective agent assistants. This ability is often evaluated through T…

2026-05-28 13:00 JST | arXiv cs.AI | Robotics

HumanoidMimicGen: Data Generation for Loco-Manipulation via Whole-Body Planning

Imitation learning is a promising approach for training humanoid robots to both walk and manipulate, but it requires a large number of demo…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Worker Disagreement Reveals Sharp Directions in Local SGD

Deep neural network training often exhibits highly anisotropic loss geometry, where a few sharp dominant Hessian directions coexist with a…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Mahalanobis PatchCore: Covariance-Aware and Streaming-Compatible Industrial Anomaly Detection

Industrial visual anomaly detection is usually one-class: normal images are abundant, while defects are rare, heterogeneous, and often unav…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions

Recent work has shown that Vision-Language Models (VLMs) used for optical character recognition (OCR) can generate plausible but visually u…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

High-Fidelity Industrial Crash Dynamics Prediction via Geometry-Aware Operator Learning with Memory-Efficient Low-Rank Attention

Automotive crashworthiness optimization remains a safety-critical challenge, requiring the management of large-scale nonlinear structural d…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Can Segmentation Models Understand the World? Towards Proactive Affordance Reasoning via Visual Chain-of-Thought

Recent segmentation models couple large language models (LLMs) with mask decoders to ground complex language expressions into masks, yet th…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Restoring the Sweet Spot: Pass-Rate Weighted Self-Distillation for LLM Reasoning

Self-Distillation Policy Optimization (SDPO) provides dense token-level credit assignment for reinforcement learning with large language mo…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

UniMaia: Steering Chess Policies with Language for Human-like Play

Recent advances in large language models have enabled natural language to serve as a flexible interface for controlling complex systems, bu…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Do Models Know Why They Changed Their Mind? Interpretability and Faithfulness of Chain-of-Thought Under Knowledge Conflict

When a language model sees a document contradicting its training knowledge, it must choose: follow the document or trust itself. Prior work…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Locality-Aware Redundancy Pruning for LLM Depth Compression

Large language models are known to contain representational redundancy across network depth, making depth pruning an effective approach for…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

ChildEval: When large language models meet children's personalities

While LLMs enable personalized chatbots, their effectiveness in child-centered personalization remains unclear, as systematic evaluation of…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Residualized Temporal Sparse Autoencoders for Interpreting Diffusion Models

Text-to-image diffusion models generate images through an iterative denoising process, so internal neural layers produce trajectories of ac…

2026-05-28 13:00 JST | arXiv cs.AI | Robotics

Turning Video Models into Generalist Robot Policies

Video generative models have emerged as a promising robotics backbone, capable of generating videos that depict the completion of complex t…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions

Sparse autoencoders are usually trained one layer at a time, even though transformer residual stream activations are strongly coupled acros…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Large Language Models (LLMs) are increasingly vulnerable to adversarial prompts that exploit semantic ambiguities to bypass safety mechanis…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Symmetry Defeats Auditing

We demonstrate an attack on Introspection Adapters (Shenoy et al., 2026).

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation

Audio tokenizers are fundamental to unifying audio understanding and generation. Understanding requires high-level semantics, while generat…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Snippet-Driven Supply Chain Discovery with LLMs: Scaling Visibility in China

Financial and economic research often relies on structured supply-chain disclosures and commercial databases. In China, supplier-customer…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

FPMoE: A Sparse Mixture-of-Experts Approach to Functional Code Generation

Despite rapid progress in LLM-based code generation, existing models are predominantly trained on imperative languages, leaving functional…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Fine-Tuned LLM as a Complementary Predictor Improving Ads System

Recommendation systems power engagement and monetization across feeds, ads, and short-video platforms, but translating the latest advances…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification

Claim verification splits between end-to-end classifiers that are accurate but yield no inspectable traces, and decomposition-based method…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

From Detection to Mechanism: Cross-Attention Graph Neural Networks Enable Drug-Drug Interaction Type Prediction: An Ablation Study with Acetylsalicylic Acid Validation

Predicting whether two drugs interact (binary detection) is a substantially different task from predicting the mechanism type of that int…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

SPAR: Support-Preserving Action Rectification

Offline policy improvement faces an inherent conflict between maximizing value and fitting the data distribution. While in-sample weighted…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding, Research/Papers

VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild

LLM-based agents score well on search benchmarks, yet real users consistently find results unsatisfying, revealing a persistent evaluation-…

2026-05-28 13:00 JST | arXiv cs.AI | Image/Video Generation

SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control

The narrative quality of a video fundamentally determines its perceptual value. Although existing video generation methods can produce visu…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

Chain-of-thought (CoT) monitoring has been proposed as a promising safety mechanism for detecting misaligned behavior in large language mod…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations

Existing emotional support conversation (ESC) systems mainly rely on end-to-end response generation or coarse strategy supervision, offerin…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding, Research/Papers

Let the Results Speak: A Replication-First Paradigm for LLM Behavioral Benchmarking

Subjective evaluation of LLM behavior -- empathy, restraint, calibrated emotional tone -- is hard. Human inter-rater agreement on such qual…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Do We Really Need Quantum Machine Learning?: A Multidimensional Empirical Study

The rapid growth of computer vision and increasingly complex image recognition tasks has exposed fundamental computational limitations of c…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

When Think-with-Image Meets Safety: What Determines Multimodal Jailbreak Robustness?

Think-with-image reasoning is emerging as a new inference paradigm for large vision-language models, but its safety implications remain poo…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

Linear probes trained on LLM activations are increasingly proposed as deception-detection metrics, yet report AUROC exceeding 0.96 on clean…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

ROVER: Routing Object-Centric Visual Evidence for Grounded Multi-Image Reasoning

Multimodal Large Language Models (MLLMs) have increasingly localized and interleaved visual evidence for deliberative reasoning. Grounding-…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Multi-Teacher Knowledge Distillation via Teacher-Informed Mixture Priors

Knowledge distillation is a powerful method for model compression, enabling the efficient deployment of complex deep learning models (teach…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors

Semantic Flow Regularization: Teaching LLMs to Generate Diverse Yet Coherent Responses

When large language models are fine-tuned to generate persona- or tone-conditioned responses, their output diversity is severely limited--a…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Periodic RoPE for Infinite Context LLMs

The ability to process ultra-long contexts is crucial for large language models (LLMs) to perform long-horizon tasks. While recent efforts…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

KVoiceBench, KOpenAudioBench, and KMMAU: Agent-Driven Korean Speech Benchmarks for Evaluating SpeechLMs

Speech language models (SpeechLMs) have achieved substantial progress by extending large language models (LLMs) to the speech modality. How…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Geometry-Correct Diffusion Posterior Sampling with Denoiser-Pullback Curvature Guidance and Manifold-Aligned Damping

Diffusion posterior sampling conditions diffusion priors on measurements, but data-consistency updates are typically scaled by hand-tuned g…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models

Large language models frequently generate toxic, hateful, or harmful content, yet existing mitigation methods rely on costly retraining or…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Learning to Assign Prediction Tasks to Agents with Capacity Constraints

We address the problem of learning to assign prediction tasks to one agent from a set of available human or AI agents. In particular, we fo…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Tool Forge: A Validation-Carrying Toolchain for Governed Agentic Execution

Large language model agents are increasingly expected to perform operational work: calling APIs, manipulating files, assembling workflows,…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIハードウェア/半導体

Integrated and Cross-Architecture Interpretation of LLM Reasoning

Understanding how LLMs reason is hindered by a practical asymmetry: while their generated outputs are observable, the underlying reasoning…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Learning Compositional Latent Structure with Vector Networks

Deep networks are powerful function approximators, but they typically store many different computations in shared weight matrices, making i…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models

Memory-augmented large language models extend reasoning beyond a fixed context window by maintaining long-term memory across interactions.…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning

Visual captioning requires models to capture visual content faithfully while minimizing both omission and hallucination. As the dominant pa…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

SPARD: Defending Harmful Fine-Tuning Attack via Safety Projection with Relevance-Diversity Data Selection

Fine-tuning large language models often undermines their safety alignment, a problem further amplified by harmful fine-tuning attacks in wh…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Extracting Small Translation Specialists from LLMs by Aggressively Pruning Experts

Modern large language models (LLMs) achieve state-of-the-art machine translation performance, but they do so as broad generalists largely t…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

On the Learnability of Test-Time Adaptation: A Recovery Complexity Perspective

Test-time adaptation (TTA) aims to adapt models to maintain reliable performance on non-stationary test streams without requiring labeled d…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Unified Synthesis of Compositional Speech and Sound from Free-Form Text Prompts

Audio generation has made significant progress, yet synthesizing unified audio where speech and sounds are naturally composited remains a c…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors

Automatic deepfake detection has received considerable research attention, yet the socio-technical environment in which humans actually enc…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

PromptEmbedder: Efficient and Transferable Text Embedding via Dual-LLM Soft Prompting

Large Language Models (LLMs) have demonstrated remarkable efficacy in text embedding, yet current adaptation methods like LoRA face signifi…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

StoryLens: Preference-Aligned Story Rewriting via Context-Aware Narrative Enrichment

Story rewriting aims to adapt existing narratives to diverse reader preferences while preserving plot consistency and narrative coherence.…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Mind the Gap: Mixtures of Gaussians in Approximate Differential Privacy

We design a class of additive noise mechanisms that satisfy \((\varepsilon, \delta)\)-differential privacy (DP) for scalar, real-valued que…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

SMILE-Next: Teaching Large Language Models to Detect, Classify, and Reason about Laughter

Laughter is a complex social signal that conveys communicative intent beyond amusement. While prior work has focused on isolated laughter a…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Revisiting Change Detection Methods for their Application to Serac Fall Time-Lapse Monitoring

In an era where climate change aggravates environmental uncertainties, the identification and detection of event precursors are becoming cr…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction

Predicting spatially varying Room Impulse Response (RIR) from sparse observations is a critical but highly challenging inverse problem for…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Mobile graphical user interface (GUI) agents driven by vision-language models (VLMs) perceive the screen as rendered pixels and choose acti…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

LegalGraphRAG: Multi-Agent Graph Retrieval-Augmented Generation for Reliable Legal Reasoning

Graph-based Retrieval-Augmented Generation (GraphRAG) advances flat document retrieval by structuring knowledge as relational graphs, enabl…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

SNARE: Adaptive Scenario Synthesis for Eliciting Overeager Behavior in Coding Agents

A coding agent executes a benign task as a sequence of shell, file, and network actions, any of which can quietly exceed the authorized sco…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

DeltaMCP: Incremental Regeneration via Spec-Aware Transformation for MCP servers

The rapid development of LLMs coupled with the introduction of Model Context Protocol (MCP) has revolutionized how intelligent agents inter…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

DEPART: DEcomposing PARiTy across Multilingual LLMs

Multilingual Large Language Models (mLLMs) leaderboards report per-language accuracy but rarely explain why disparities emerge, leaving sys…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Performance and Explainability Requirements of Evolutionary Algorithms in Real-World Physics-Informed Optimization

Evolutionary computation offers a variety of tools to solve complex real-world optimization problems. However, research often focuses on sm…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

QuITE: Query-Based Irregular Time Series Embedding

Irregular Multivariate Time Series (IMTS) are common in practice, yet their irregular sampling complicates effective modeling. Existing app…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

FLORO: A Multimodal Geospatial Foundation Model for Ecological Remote Sensing Across Sensors and Scales

Foundation models offer a promising route to transferable remote sensing representations, but many current approaches depend on very large…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

BenGER: Benchmarking LLM Systems on Subsumption-Based Legal Reasoning in German Law

We introduce the BenGER (Benchmark for German Law) dataset for evaluating LLM systems on subsumption-based legal reasoning in German law. T…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Visualizing Latent Phase Structures in Locomotion Policies: A Multi-Environment Study with Temporal Feature Extension

Deep reinforcement learning (DRL) has been shown to achieve high performance on locomotion control tasks in MuJoCo benchmarks such as HalfC…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation

Large language models (LLMs) are increasingly used as scholar recommenders, shaping who is seen as an expert in academia. Existing audits r…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Pruning and Distilling Mixture-of-Experts into Dense Language Models

Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet it requires all expert parameters to be loaded…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping

Unsupervised learning methods -- topic modeling, partition-based and density-based clustering -- produce data groupings without human guida…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

VidPrism: Heterogeneous Mixture of Experts for Image-to-Video Transfer

With the rapid development of pre-training technologies, adapting large-scale Vision-Language Models (VLMs) for video understanding…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage

Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing LLM reasoning, yet its data inefficiency…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

GUI Agents for Continual Game Generation

Generating a game is not the same as making one that can be played. Despite advances in code generation, existing approaches treat game gen…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

PrunePath: Towards Highly Structured Sparse Language Models

Feed-forward networks (FFNs) dominate the parameter count and computation of modern language models, yet existing pruning methods often str…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation

Proactive Recommender Systems (PRSs) aim to guide user preference shift toward target items by generating paths of intermediate recommendat…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

Modern large language model (LLM) inference has progressively disaggregated to keep pace with growing model sizes and tight TTFT and TPOT s…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Revisiting Anthropomorphic Reflection Markers in Large Language Model Reasoning

Large Language Models (LLMs) often produce explicit reflective traces during complex reasoning, accompanied by anthropomorphic markers such…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models

Mixture-of-Experts (MoE) models have emerged as a dominant paradigm for efficient LLM scaling, yet adapting them to non-English downstream…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Hybrid Neural World Models

Neural surrogates promise large speedups over classical solvers for physical dynamics but fail silently at sharp dynamical events such as s…

2026-05-28 13:00 JSTarXiv cs.AIロボティクス研究/論文

Identifying Explicit Parsimonious Piece-wise Polynomial Relationships in Industrial time-series: Application to manipulator robots

This paper addresses the problem of identifying parsimonious explicit piece-wise polynomial relationships that might involve a relatively l…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Multi-Agent LLM-based Metamorphic Testing for REST APIs

As REST APIs become an increasingly significant part of software systems, their validation is becoming more critical. Hence, testing and un…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIハードウェア/半導体

Learning the Error Patterns of Language Models

When generating outputs for domains with specific validity constraints (e.g., a program should compile), LLMs often fail in a small number…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達

Improving Evaluation of Recombination-based Cartesian Genetic Programming

Cartesian Genetic Programming has traditionally been using mutation as its main and often sole genetic operator to drive evolutionary searc…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Score Based Error Correcting Code Decoder

Error-correcting codes enable reliable communication, yet practical soft decoding remains challenging across code families and block length…

2026-05-28 13:00 JSTarXiv cs.AIロボティクス

CLANE: Continual Learning of Actions on Neuromorphic Hardware from Event Cameras

Recognizing and continuously learning novel human actions without forgetting prior classes is a requirement for emerging AR/VR and robotics…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation

On-policy distillation (OPD) transfers reasoning behavior by training a student on teacher feedback along student-generated trajectories, b…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs

Latent reasoning enables reasoning over continuous hidden states rather than explicit tokens, avoiding the language bottleneck and inferenc…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Anomaly as Non-Conformity via Training-Free Graph Laplacian Energy Minimization

Detecting subtle visual anomalies in images remains challenging, particularly when only normal samples are available a priori. Such unsuper…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Bayesian Gated Non-Negative Contrastive Learning

While Contrastive Learning (CL) has revolutionized self-supervised representation learning, its latent representations remain highly entang…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

BiasEdit: A Training-Free Bias-Detect-and-Edit Framework for Learning Fair Visual Classifiers

Visual data from the Web power image classifiers, which often underpin many web services, such as recommendation and content moderation. Ho…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

The Cases LJP Never Sees: Prosecution Decision Prediction for More Complete Criminal Liability Assessment

Legal Judgment Prediction (LJP) has become a core benchmark for evaluating AI in the criminal legal domain, but it only sees criminal cases…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

SSR3D-LLM: Structured Spatial Reasoning via Latent Steps for Fine-Grained Grounding in Unified 3D-LLMs

3D object grounding localizes referred objects in a 3D scene from natural language. Unified instance-centric 3D-LLMs aim to solve grounding…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

The Decision to Verify: How Warmth and User Characteristics Shape Reliance on Conversational Agents for Information Search

Conversational artificial intelligence (AI) provides an efficient and convenient gateway to information access. However, it can cause overr…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification

Large language models have shown impressive capabilities in code generation, yet they often produce functionally incorrect code. Uncertaint…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets

Large language models (LLMs) for code completion and generation are increasingly used in software development, yet they may reproduce train…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Learning Theory of the SVRG: Generalization and Convergence Analysis

Variance reduction (VR) methods employ stochastic gradients with decreasing variance, and they have been widely applied to solve large-scal…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Do LLMs Favor Their Providers? Measuring Vertical Integration Bias in Code Generation

Large Language Models (LLMs) have become an integral part of software development, especially with the advent of agentic capabilities. Yet,…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Stochastic Gradient Descent with Momentum is Algorithmically Stable

Stochastic gradient descent with momentum (SGDM) is one of the most widely used optimization algorithms in machine learning. While optimiza…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Token Optimization Strategies for LLM-Based Oracle-to-PostgreSQL Migration

LLMs are increasingly used for software modernization, code translation, and database migration. However, LLM-based Oracle2PostgreSQL migra…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

A Multi-dimensional Framework for Evaluating Generalization in EEG Foundation Models

Evaluating foundation models under appropriate adaptation settings is essential for understanding the quality and transferability of the le…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Verified Misguidance: Measuring Structural Citation Failures in Search-Augmented LLMs

Users of search-augmented LLMs rely on citations as evidence that responses are grounded in real sources, and rarely verify the cited pages…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Semantic Optimal Transport for Sparse Autoencoder Feature Matching and Circuit Compression

Sparse autoencoders (SAEs) have become a central tool for interpreting language models. However, two key SAE analyses that remain difficult…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Efficient Pre-Training of LLMs through Truncated SVD Layers

The massive scaling of Large Language Models (LLMs) has made pretraining increasingly cost-prohibitive. While low-rank representation and o…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving

Ensuring both safety and efficiency in decision-making for autonomous driving systems remains a fundamental challenge. Traditional Deep Rei…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Technical Report: Exploring the Emerging Threats of the Agent Skill Ecosystem

We analyzed 3,984 AI agent skills from major marketplaces and found 76 confirmed malicious payloads, including credential theft, backdoor i…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達

Models That Know How Evaluations Are Designed Score Safer

The validity of AI safety evaluations depends on models behaving consistently across controlled and deployment settings. Prior work has ide…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達

Thermodynamic properties of chemically disordered compounds via AI-driven estimation of partition function with the PULSE method

In this article, we present an improved version of the PULSE method (Partition function Unsupervised Learning Sampling and Evaluation) for…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation

This position paper argues that the AI/ML community should stop overclaiming and retire the label "positive backdoor," and instead treat tr…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Evaluating the Realism of LLM-powered Social Agents: A Case Study of Reactions to Spanish Online News

LLM-powered social agents are increasingly used to simulate online social behavior, yet their realism remains difficult to validate. Existi…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Online Irregular Multivariate Time Series Forecasting via Uncertainty-Driven Dual-Expert Calibration

Irregular multivariate time series forecasting is critical in many real-world applications, where time series are irregularly sampled and e…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Mining Multi-Modality Spatio-Temporal Cues for Video Important Person Identification

Identifying key individuals in video scenes is essential for applications such as automated video editing and intelligent surveillance. Cur…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達

Measuring Form and Function in Language Models

We introduce quantitative metrics for child language acquisition to evaluate language models. Our focus is on the formal syntactic and func…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking

Cryptographic watermarking is a leading defense for attributing text generated by large language models (LLMs). Existing schemes, including…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

The Attentional White Bear Effect in Transformer Language Models

Instruction-based suppression is widely used to prevent language models from generating prohibited content, yet it remains unclear whether…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Sense Representations Are Inducible Interfaces

Sense representations (explicit, per-token meaning decompositions) are useful for disambiguation, steering, and cross-lingual alignment, bu…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

AI in the Workplace: The Impact of AI on Perceived Job Decency and Meaningfulness

The proliferation of Artificial Intelligence (AI) in workplaces is transforming how we work. While existing research on human-AI collaborat…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images

Backpropagation is the core learning mechanism underlying deep learning. However, whether and how this algorithm is implemented in the brai…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Deep Learning Strain Estimation: Is Physics-Based Simulation the Solution?

Speckle tracking echocardiography (STE) is the clinical standard for myocardial strain estimation. Despite good performance on global strai…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

A Fresh Look at Lamarckian Evolution and the Baldwin Effect

Baldwinian and Lamarckian evolution have existed for a long time in evolutionary algorithms (EAs) without ever dominating the academic lite…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study

Large language models (LLMs) are increasingly used for the automatic evaluation of generated text, yet most prior work focuses on English.…

2026-05-28 13:00 JSTarXiv cs.AIビジネス/資金調達

IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents

An Initial Public Offering (IPO) filing is a document released when a private firm goes public, allowing individual (retail) investors to p…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems

Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

BIRDNet: Mining and Encoding Boolean Implication Knowledge Graphs as Interpretable Deep Neural Networks

Tabular data in knowledge-rich domains often carries a latent prior in the form of Boolean implication relationships (BIRs) between pairs o…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Reverse Probing: Supervised Token-level Uncertainty Quantification for Large Language Models in Clinical Text

As large language models are increasingly deployed for clinical text, ensuring they can reliably signal their own uncertainty becomes criti…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Preference-Shaped Expected Hypervolume and R2 Improvement: Exact Computation and Monotonicity

This paper studies preference-shaped expected improvement criteria for Bayesian multiobjective optimization. We consider two indicator fami…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

Linear interpolation between fine-tuned checkpoints has been shown to trace the Pareto front between competing objectives, but whether extr…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Rethinking Memory as Continuously Evolving Connectivity

Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelin…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

Do Agents Need Semantic Metadata? A Comparative Study in Agentic Data Retrieval

In the era of autonomous agents, machine-actionable data is critical for data-driven workflows. For more than a decade, semantic metadata l…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

Skill-Conditioned Gated Self-Distillation for LLM Reasoning

On-policy self-distillation (SD) improves LLM reasoning by using teacher-side privileged information (PI) to turn sparse verifier outcomes…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration

Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for s…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Beyond Binary: Sim-to-Real Dexterous Manipulation with Physics-Grounded Contact Representation

A primary bottleneck in contact-rich manipulation is the difficulty of collecting real-world data. Sim-to-real reinforcement learning offer…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Planning a Community Approach to Diabetes Care in Low- and Middle-Income Countries Using Optimization

Diabetes is a global health priority, especially in low- and middle-income countries, where over 50% of premature deaths are attributed to…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Tell Me a Story! Narrative-Driven XAI with Large Language Models

In many AI applications today, the predominance of black-box machine learning models, due to their typically higher accuracy, amplifies the…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Apple Intelligence Foundation Language Models

We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to ru…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Heterogeneous Causal Discovery of Repeated Undesirable Health Outcomes

Understanding the factors that trigger or prevent undesirable health outcomes across patient subpopulations is essential for designing targ…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Text-Only Data Synthesis for Vision Language Model Training

Training vision-language models (VLMs) typically requires large-scale, high-quality image-text pairs, but collecting or synthesizing such d…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Domain size asymptotics for Markov logic networks

A Markov logic network (MLN) $\mathbb{M}$ determines a probability distribution $\mathbb{P}_n^\mathbb{M}$ on the set $\mathbf{W}_n$ of stru…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

A Comparative Study of Rule-Based and Data-Driven Approaches in Industrial Monitoring

Industrial monitoring systems, especially when deployed in Industry 4.0 environments, are experiencing a shift in paradigm from traditional…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs

Large language models (LLMs) are typically trained by reinforcement learning (RL) with verifiable rewards (RLVR) and supervised fine-tuning…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

MetaboT: An LLM-based Multi-Agent Framework for Interactive Analysis of Mass Spectrometry Metabolomics Knowledge Graphs

Mass spectrometry-based metabolomics generates complex, high-dimensional data that holds vast potential for biological discovery but remain…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

The Shape of Reasoning: Topological Analysis of Reasoning Traces in Large Language Models

Evaluating the quality of reasoning traces from large language models remains understudied, labor-intensive, and unreliable: current practi…

2026-05-28 13:00 JSTarXiv cs.AIエージェント

SynthTools: A Framework for Scaling Synthetic Tools for Agent Development

For agentic systems to use external tools to solve complex, long-horizon tasks, we need a large set of diverse and controllable tool-use en…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Guaranteed Optimal Compositional Explanations for Neurons

Compositional explanations are a family of methods that aim to describe the spatial alignment between neurons' receptive field activations…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models

Are frontier AI systems becoming more capable? Certainly. Yet such progress is not an unalloyed blessing but rather a Trojan horse: behind…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Atomic Skills are the Prerequisite: When Reinforcement Learning Synthesizes Compositional Reasoning, and When It Only Amplifies

Does Reinforcement Learning (RL) merely amplify existing skills, or synthesize novel skills? We investigate this question through the lens…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

CircuitLM: A Multi-Agent LLM-Aided Design Framework for Generating Circuit Schematics from Natural Language Prompts

Generating accurate circuit schematics from high-level natural language descriptions remains a persistent challenge in electronic design au…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

How Much Can a Few Engine Moves Help? Quantifying Limited Cheating in Chess

Cheating in chess, by using advice from powerful software, has become a major problem, reaching the highest levels. As opposed to the large…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding

Multimodal Large Language Models (MLLMs) are a major focus of recent AI research. However, most prior work focuses on static image understa…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Aligning Language Model Benchmarks with Pairwise Preferences

Language model benchmarks are pervasive and computationally-efficient proxies for real-world performance. However, many recent works find t…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models

While plan-and-infill decoding in Masked Diffusion Models (MDMs) shows promise for mathematical and code reasoning, performance remains hig…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

DIG to Heal: Scaling General-purpose Agent Collaboration via Explainable Dynamic Decision Paths

The increasingly popular agentic AI paradigm promises to harness the power of multiple, general-purpose large language model (LLM) agents t…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIエージェント

COOP$^2$: Defining, Observing, and Repairing Cooperation in LLM Multi-Agent Systems

Many complex tasks require extended effort, diverse capabilities, or coordinated actions beyond what a single agent can provide. However, s…

2026-05-28 13:00 JSTarXiv cs.AI研究/論文

FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

The financial domain involves a variety of important time-series problems. Recently, time-series analysis methods that jointly leverage tex…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

Towards automated data analysis: A guided framework for LLM-based risk estimation

Large Language Models (LLMs) are increasingly integrated into critical decision-making pipelines, a trend that raises the demand for robust…

2026-05-28 13:00 JSTarXiv cs.AILLM/生成AI

CRaFT: Circuit-Guided Refusal Feature Selection via Cross-Layer Transcoders

While modern LLMs are aligned to refuse harmful requests, it is essential to understand the underlying mechanistic basis of this refusal be…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

FactReview: Evidence-Grounded Peer Review with Execution-Based Claim Verification

LLM-based reviewing systems typically take only the manuscript as input, leaving literature and code-based claims hard to verify. We presen…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Graph-of-Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills

Modern LLM agents increasingly rely on reusable skills, and as they interact with personal applications, web browsers, and other interfaces…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Text2Model: Modeling Copilots for Text-to-Model Translation

There is growing interest in leveraging large language models (LLMs) for text-to-model translation and optimization tasks. This paper aims…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems

Prompt optimization in compound AI systems is statistically indistinguishable from a coin flip: across 72 optimization runs on Claude Haiku…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Towards Rigorous Explainability by Feature Attribution

For around a decade, non-symbolic methods have been the option of choice when explaining complex machine learning (ML) models. Unfortunatel…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoni…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Escher-Loop: Mutual Evolution by Closed-Loop Self-Referential Optimization

While recent autonomous agents demonstrate impressive capabilities, they predominantly rely on manually scripted workflows and handcrafted…

2026-05-28 13:00 JST | arXiv cs.AI | Agents, Research/Papers

DataClawBench: An Agent Benchmark for Exploratory Real-World Financial Data Analysis

Autonomous data analysis agents are increasingly expected to conduct exploratory analysis with limited human guidance about data. However,…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Can We Formally Verify Neural PDE Surrogates? SMT Compilation of Small Fourier Neural Operators

Fourier Neural Operators (FNOs) can greatly accelerate PDE simulation, but they are often used without formal guarantees that they preserve…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Verifiable Process Rewards for Agentic Reasoning

Reinforcement learning from verifiable rewards (RLVR) has improved the reasoning abilities of large language models (LLMs), but most existi…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

On-policy self-distillation has become a strong recipe for LLM reasoning, where a privileged teacher supervises the student's own rollouts…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Optimal LTLf Synthesis

Strategy synthesis typically follows an all-or-nothing paradigm, returning unrealisable whenever a specification cannot be guaranteed in an…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

Optimization models developed by operations research (OR) experts are often deployed as decision-support systems in industrial settings. Ho…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management

Many works make the eye-catching claim that Transformers are Turing-complete. However, the literature often conflates two distinct settings…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding, Research/Papers

EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design

Large Language Model (LLM) agents are increasingly applied to engineering design tasks, yet existing evaluation frameworks do not adequatel…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Who Uses AI? Platform Selection and the Measurement of Occupational AI Exposure

Conversation logs from AI platforms are increasingly used to measure occupational exposure to artificial intelligence, but the users observ…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

FLUID: From Ephemeral IDs to Multimodal Semantic Codes for Industrial-Scale Livestreaming Recommendation

Modern recommender systems rely heavily on ID-based collaborative filtering: each item is represented by a unique ID embedding that accumul…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents

LLM agents are shaped not only by their language models, but also by the runtime harness that mediates observation, tool use, action execut…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Measuring Massive Multitask Chinese Understanding

The development of large-scale Chinese language models is flourishing, yet there is a lack of corresponding capability assessments. Therefo…

2026-05-28 13:00 JST | arXiv cs.AI | Agents, Robotics

DSSE: a drone swarm search environment

The Drone Swarm Search project is an environment, based on PettingZoo, that is to be used in conjunction with multi-agent (or sing…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Delay-Aware Reinforcement Learning for Highway On-Ramp Merging under Stochastic Communication Latency

Delayed and partially observable state information poses significant challenges for reinforcement learning (RL)-based control in real-world…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Generalized Holographic Reduced Representations

Hyperdimensional Computing (HDC) is a computationally and data-efficient paradigm that acts as a bridge between connectionist and symbolic…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Sinc Kolmogorov-Arnold network and its application for solving PDEs with singularities

In this paper, we propose to use Sinc interpolation in the context of Kolmogorov-Arnold Networks, neural networks with learnable activation…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Revisiting Graph Autoencoders as Implicit Contrastive Learners

Graph autoencoders (GAEs) and graph contrastive learning (GCL) are two major paradigms for self-supervised representation learning on graph…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Isometry pursuit

Isometry pursuit is a convex algorithm for identifying orthonormal column-submatrices of wide matrices. It consists of a novel normalizatio…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Improving Requirements Classification with SMOTE-Tomek Preprocessing

This study emphasizes the domain of requirements engineering by applying the SMOTE-Tomek preprocessing technique, combined with stratified…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

HEART: Achieving Timely Multi-Model Training for Vehicle-Edge-Cloud-Integrated Hierarchical Federated Learning

The rapid growth of AI-enabled Internet of Vehicles (IoV) calls for efficient Machine Learning (ML) solutions that can handle high vehicula…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring

Large language models (LLMs) are becoming increasingly capable, but the mechanisms of their thinking and decision-making processes remain u…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation

The LLM-as-a-Judge paradigm shows promise for evaluating generative content but lacks reliability in reasoning-intensive scenarios, such as…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

GradientStabilizer: Fix the Norm, Not the Gradient

Training instability in modern deep learning systems is frequently triggered by rare but extreme gradient-norm spikes, which can induce ove…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks

Retrieval-augmented generation (RAG) has become a common practice in multimodal large language models (MLLM) to enhance factual grounding a…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models

Large Language Models (LLMs) demonstrate persuasive capabilities that rival human-level persuasion. While these capabilities can be used fo…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024

In the age of increasingly realistic generative AI, robust deepfake detection is essential for mitigating fraud and disinformation. While m…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Manboformer: Learning Gaussian Representations via Spatial-temporal Attention Mechanism

Compared with voxel-based grid prediction, in the field of 3D semantic occupancy prediction for autonomous driving, GaussianFormer propose…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models? A Bias-Controlled Study

3D Large Language Models (LLMs) leveraging spatial information in point clouds for 3D spatial reasoning attract great attention. Despite so…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

LiDDA: Data Driven Attribution at LinkedIn

Data Driven Attribution, which assigns conversion credits to marketing interactions based on causal patterns learned from data, is the foun…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Structured Agent Distillation for Large Language Model

Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

EVADE-Bench: Multimodal Benchmark for Evaluating and Enhancing Evasive Content Detection

E-commerce platforms increasingly rely on Large Language Models (LLMs) and Vision Language Models (VLMs) to detect illicit or misleading pr…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

STFlow: Data-Coupled Flow Matching for Geometric Trajectory Simulation

Simulating trajectories of dynamical systems is a fundamental problem in a wide range of fields such as molecular dynamics, biochemistry, a…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

ASTRA: Communication-Efficient Acceleration for Multi-Device Transformer Inference

Multi-device inference can reduce Transformer latency by parallelizing computation. However, existing methods require high inter-device ban…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

MMTABREAL: Real-World Benchmark for Multimodal Table Understanding

Multimodal tables, i.e., tabular layouts interleaved with charts, maps, icons, and color encodings, are ubiquitous in real applications yet re…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Beyond Interpretability: When, Why, and How Sparse Autoencoders Enable Label-Free Visual Steering

Sparse Autoencoders (SAEs) are increasingly used to interpret foundation models, but their role as an actionable intervention space remains…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Understanding Automated Program Repair Agents Through the Lens of Traceability: An Empirical Study

Automated Program Repair (APR) agents leverage Large Language Models (LLMs) to autonomously diagnose and fix software bugs through reasonin…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN

We partially reverse-engineer a convolutional recurrent neural network (RNN) trained with model-free reinforcement learning to play the box…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

LLM Watermark Evasion via Bias Inversion

Watermarking offers a promising solution for detecting LLM-generated content, yet its robustness under realistic query-free (black-box) eva…

2026-05-28 13:00 JST | arXiv cs.AI | Business/Funding, Research/Papers

Beyond Model Ranking: Predictability-Aligned Evaluation for Time Series Forecasting

In the era of increasingly complex AI models for time series forecasting, progress is often measured by marginal improvements on benchmark…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Regression Language Models for Code

We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of prog…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification

Speculative decoding accelerates LLM inference by verifying candidate tokens from a draft model against a larger target model. Recent judge…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

EAGer: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling

With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computa…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training

Reinforcement learning (RL) has driven recent breakthroughs in large language models (LLMs), especially for tasks where rewards can be comp…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

The Principles of Diffusion Models

This book presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization

Large language models (LLMs) have recently shown strong potential in vulnerability detection (VD). However, accurately detecting vulnerabil…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

ReflexGrad: Within-Episode Failure Recovery in LLM Agents via Progress-Gated Dual-Process Routing

We present ReflexGrad, a dual-process architecture for within-episode failure recovery in LLM agents without demonstrations. When agents co…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Not All Pixels Are Equal: Pixel-wise Meta-Learning for Medical Segmentation with Noisy Labels

Medical image segmentation is crucial for clinical applications, but it is frequently disrupted by noisy annotations and ambiguous anatomic…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Object-Centric Vision Token Pruning for Vision Language Models

In Vision Language Models (VLMs), vision tokens are quantity-heavy yet information-dispersed compared with language tokens, thus consume to…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Diffusion-Augmented Markov Decision Processes for Maximum Entropy Reinforcement Learning

Diffusion models excel at sampling from complex, unnormalized distributions. In this work, we extend Maximum Entropy Reinforcement Learning…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Optimal and Diffusion Transports in Machine Learning

Several problems in machine learning are naturally expressed as the design and analysis of time-evolving probability distributions. This in…

2026-05-28 13:00 JST | arXiv cs.AI | Business/Funding

Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Study

In Artificial Intelligence (AI), language models have gained significant importance due to the widespread adoption of systems capable of si…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Snowveil: A Framework for Decentralised Preference Discovery

Aggregating subjective preferences in social choice traditionally assumes a trusted central authority. In contrast, this paper formalises D…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Feature Learning Dynamics in Infinite-Depth Neural Networks

Deep neural networks have achieved remarkable success in practice, yet a mechanistic understanding of how features evolve during training r…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Adapting, Fast and Slow: On Few-Shot Transportability of Compositions

Generalization across domains requires stable structure that links the source and target distributions. Building on causal transportability…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

HGMEM: Hypergraph-based Working Memory to Improve Multi-step RAG for Long-Context Complex Relational Modeling

Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks th…

2026-05-28 13:00 JST | arXiv cs.AI | Agents, Research/Papers

The Optimal Sample Complexity of Linear Contracts

In this paper, we settle the problem of learning optimal linear contracts from data in the offline setting, where agent types are drawn fro…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese Large Language Models

As Large Language Models (LLMs) are increasingly deployed in the healthcare field, it becomes essential to carefully evaluate their medical saf…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

On the Intrinsic Limits of Transformer Image Embeddings in Non-Solvable Spatial Reasoning

Vision Transformers (ViTs) excel in semantic recognition but exhibit systematic failures in spatial reasoning tasks such as mental rotation…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Differential syntactic and semantic encoding in LLMs

We study how syntactic and semantic information is encoded in inner layer representations of Large Language Models (LLMs), focusing on the…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation

Generative spoken language models pretrained on large-scale raw audio can continue a speech prompt with appropriate content while preservin…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Teaching and Evaluating LLMs to Reason About Polymer Design Related Tasks

Research in AI4Science has shown promise in many science applications, including polymer design. However, current LLMs are ineffective in t…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Do readers prefer AI-generated Italian short stories?

This study investigates whether readers prefer AI-generated short stories in Italian over one written by a renowned Italian author. In a bl…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Image/Video Generation, Agents

The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation

Recent advances in video generation have produced models capable of synthesizing stunning visual content from simple text prompts. However,…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models

We present a systematic review of 337 articles evaluating the syntactic abilities of Transformer-based language models (TLMs), reporting on…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

NCSAM: Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning

Learning from Noisy Labels (LNL) remains a fundamental challenge in deep learning because real-world datasets often contain corrupted annot…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

A Sheaf-Theoretic and Topological Perspective on Complex Network Modeling and Attention Mechanisms in Graph Neural Models

Combinatorial and topological structures, such as graphs, simplicial complexes, and cell complexes, form the foundation of geometric and to…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning

Token-level reweighting is a simple yet effective mechanism for controlling supervised fine-tuning, but common indicators are largely one-d…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning

Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

ECHO: Entropy-Confidence Hybrid Optimization for Test-Time Reinforcement Learning

Test-time reinforcement learning generates multiple candidate answers via repeated rollouts and performs online updates using pseudo-labels…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics

While the ecosystem of Lean and Mathlib has enjoyed celebrated success in formal mathematical reasoning with the help of large language mod…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Mitigating Staleness in Asynchronous Pipeline Parallelism via Basis Rotation

Asynchronous pipeline parallelism maximizes hardware utilization by eliminating the pipeline bubbles inherent in synchronous execution, off…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Semantic-level Backdoor Attack against Text-to-Image Diffusion Models

Text-to-image (T2I) diffusion models are widely adopted for their strong generative capabilities, yet remain vulnerable to backdoor attacks…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rel…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Capture Timing-Attention of Events in Clinical Time Series

Automatically discovering personalized trajectories (i.e., sequential event patterns) from longitudinal EHR data is crucial for enabling pr…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Singular Vectors of Attention Heads Align with Features

Identifying feature representations in language models is a central task in mechanistic interpretability. Several recent studies have made…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling

Temperature scaling is a simple method that makes it possible to control the uncertainty of probabilistic models. It is mostly used in two contexts: i…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems

Multi-agent systems, where LLM agents communicate through free-form language, enable sophisticated coordination for solving complex coopera…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes

Training against white-box deception detectors has been proposed as a way to make AI systems honest. However, such training risks models le…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding, Research/Papers

AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models

The rapid advancement of Large Language Models (LLMs) has led to a surge of financial benchmarks, evolving from static knowledge evaluation…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Noise Scheduling as Information-Guided Allocation in Diffusion Training

We introduce InfoNoise, an online adaptive noise schedule for diffusion training that reallocates optimization effort toward noise levels w…

2026-05-28 13:00 JST | arXiv cs.AI | Image/Video Generation

LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration

Diffusion models have achieved remarkable success in image and video generation tasks. However, the high computational demands of Diffusion…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Probing for Knowledge Attribution in Large Language Models

Large language model (LLM) hallucinations, meaning fluent but factually incorrect generations, fall into two types: faithfulness violations…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Learning Tangent Bundles and Characteristic Classes with Autoencoder Atlases

We introduce a theoretical framework that connects multi-chart autoencoders in manifold learning with the classical theory of vector bundle…

2026-05-28 13:00 JST | arXiv cs.AI | Agents, Robotics

SPARC: Spatial-Aware Path Planning via Attentive Agent Communication

Efficient communication is critical for decentralized Multi-Robot Path Planning (MRPP), yet existing learned communication methods treat al…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

RelaxFlow: Text-Driven Amodal 3D Generation

Image-to-3D generation faces inherent semantic ambiguity under occlusion, where partial observation alone is often insufficient to determin…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Relational Semantic Reasoning on 3D Scene Graphs for Open World Interactive Object Search

Open-world interactive object search in household environments requires understanding semantic relationships between objects and their surr…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

Reinforcement Learning from Verifiable Rewards (RLVR) significantly enhances large language models (LLMs) reasoning but severely suffers fr…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning

Extrinsic dexterity leverages environmental contact to overcome the limitations of prehensile manipulation. However, achieving such dexteri…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

HO-SFL: Hybrid-Order Split Federated Learning with Backprop-Free Clients and Dimension-Free Aggregation

Fine-tuning large models on edge devices is severely hindered by the memory-intensive backpropagation (BP) in standard frameworks like fede…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

From Causal Discovery to Dynamic Causal Inference in Neural Time Series

Time-varying causal models provide a powerful framework for studying dynamic scientific systems, yet most existing approaches assume that t…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Causal Direct Preference Optimization for Distributionally Robust Generative Recommendation

Direct Preference Optimization (DPO) guides large language models (LLMs) to generate recommendations aligned with user historical behavior…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Coherence Collapse: Diagnosing Why Code Agents Fail After Reaching the Right Code

Code agents resolve 65-70% of SWE-bench Verified issues, but Pass@1 cannot tell us why the rest fail, and, as we show, capable-model failur…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

COTTA: Context-Aware Transfer Adaptation for Trajectory Prediction in Autonomous Driving

Developing robust models to accurately predict the trajectories of surrounding agents is fundamental to autonomous driving safety. However,…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Speaking of Language: Reflections on Metalanguage Research in NLP

This work aims to shine a spotlight on the topic of metalanguage. We first define metalanguage, link it to NLP and LLMs, and then discuss o…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation

Visual navigation is a core challenge in Embodied AI, requiring autonomous agents to translate high-dimensional sensory observations into c…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Compositional Consistency-Guided Decoding for Three-Way Logical Question Answering

Three-way logical question answering (QA) assigns one of True, False, or Unknown to a hypothesis H given a pre…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions

Retrieval-augmented generation (RAG) extends large language models (LLMs) with external knowledge, but this access path also introduces sec…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models

Diffusion-based language models (dLLMs) have emerged as a promising alternative to autoregressive language models, offering the potential f…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

BenGER Platform: A Collaborative Web Platform for End-to-End Benchmarking of German Legal Tasks

Evaluating large language models (LLMs) for legal reasoning requires workflows that span task design, expert annotation, model execution, a…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

When PCOS Meets Eating Disorders: An Explainable AI Approach to Detecting the Hidden Triple Burden

Women with polycystic ovary syndrome (PCOS) face substantially elevated risks of body image distress, disordered eating, and metabolic chal…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models

While Diffusion Large Language Models (dLLMs) offer structural advantages for global planning, efficiently verifying that they arrive at co…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Retention Consequence in Lifecycle Memory Control

Persistent memory can fail after successful admission: a premise is written, then becomes a silent assumption, and later maintenance treats…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

Negative Advantages Is a Double-Edged Sword: Calibrating advantages in GRPO for Search Agents

Search agents achieve strong question-answering performance through multi-turn interactions with search engines, with Group Relative Policy…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection

Semi-supervised learning with manifold regularization is a classical framework for jointly learning from both labeled and unlabeled data, w…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

LASER: Learning Active Sensing for Continuum Field Reconstruction

High-fidelity measurements of continuum physical fields are essential for scientific discovery and engineering design but remain challengin…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

DiagramBank: A Quality-Audited Dataset of Scientific Schematic Diagrams with Multi-Level Document Context

Scientific papers use schematic diagrams to communicate methods, workflows, and system structure, yet existing scientific-figure corpora of…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

C-MORAL: Controllable Multi-Objective Molecular Optimization with Reinforcement Alignment for LLMs

Large language models (LLMs) show promise for molecular optimization, but aligning them with selective and competing drug-design constraint…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Graph Memory Transformer (GMT)

We investigate whether the Feed-Forward Network (FFN) sublayer in a decoder-only transformer can be replaced by an explicit learned memory…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Rethinking Layer Redundancy: Calibration Matters More Than Search in LLM Depth Pruning

Depth pruning improves the inference efficiency of large language models by removing Transformer blocks. Prior work typically treats layer…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

The Forensic Cost of Watermark Removal: From Dedicated Attacks to Image Editing

Current watermark removal methods are evaluated on two axes: attack success rate and perceptual quality. We show this is insufficient. Whil…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Escaping Mode Collapse in LLM Generation via Geometric Regulation

Mode collapse is a persistent challenge in generative modeling and appears in autoregressive text generation as behaviors ranging from expl…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Quantifying the Reconstructability of Astrophysical Methods with Large Language Models and Information Theory: A Case Study in Spectral Reconstruction

Modern astrophysical studies rely heavily on complex data analysis pipelines; however, published descriptions often lack the detail require…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Structured Belief State and the First Precision-Aware Benchmark for LLM Memory Retrieval

Every major benchmark for LLM memory systems, LoCoMo foremost, measures whether a model answered correctly, not whether the memory system r…

2026-05-28 13:00 JST | arXiv cs.AI | Agents

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

Reusable skills are becoming a common interface for extending large language model agents, packaging procedural guidance with access to fil…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Anatomy-Slot: Unsupervised Anatomical Factorization for Homologous Bilateral Reasoning in Retinal Diagnosis

Retinal diagnosis is inherently bilateral: clinicians compare homologous structures across eyes (e.g., optic disc asymmetry), yet most deep…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

ArcVQ-VAE: A Spherical Vector Quantization Framework with ArcCosine Additive Margin

Vector Quantized Variational Autoencoder (VQ-VAE) has become a fundamental framework for learning discrete representations in image modelin…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

Multi-head Latent Attention (MLA), the attention used in DeepSeek-V2/V3, jointly compresses keys and values into a low-rank latent and matc…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

From Prediction to Intervention: The Evolution of AI in Biomedicine

Artificial intelligence has advanced rapidly in biomedicine through large-scale multimodal data integration, enabling increasingly accurate…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Voice "Cloning" is Style Transfer

Artificially generated speech is increasingly embedded in everyday life. Voice cloning in particular enables applications where identity pr…

2026-05-28 13:00 JST | arXiv cs.AI | Image/Video Generation, Agents

MAVEN: A Multi-Agent Framework for Multicultural Text-to-Video Generation

Text-to-video (T2V) generation has rapidly progressed in visual fidelity, yet its ability to faithfully represent multiple cultures within…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, where answers often depend on small but dec…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Detecting and Mitigating the Correct-Answer Extinction Window in Test-Time Reinforcement Learning with Majority Voting

Test-time reinforcement learning (TTRL) reports substantial accuracy gains on mathematical reasoning benchmarks using majority vote as a ps…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

We demonstrate that in knowledge distillation for diffusion models, the teacher network's highly complex denoising process - stemming from…

2026-05-28 13:00 JST | arXiv cs.AI | Research/Papers

Case-Aware Medical Image Classification with Multimodal Knowledge Graphs and Reliability-Guided Refinement

Deep learning has brought significant progress to medical image classification, yet most existing methods still rely on isolated visual evi…

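2026-05-28 12:00 JST | ITmedia AI+ | Agents

Docker's dedicated AI agent "Gordon" reaches general availability, usable even with a free account

Docker, Inc. has announced the general availability of "Gordon", an AI agent offered as a new feature of Docker Desktop and the Docker CLI that answers questions about Docker, suggests best practices, and helps fix errors.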

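2026-05-28 10:55 JST | ITmedia AI+ | LLM/Generative AI

"Diet Members Map" draws attention: built single-handedly by a construction tradesperson, it uses generative AI to make lawmakers' statements and political developments easy to follow

The site was launched by "中島", who runs a small construction company and also works on construction sites as a tradesperson, during a period away from the job because of an injury. It is developed and operated individually in spare time, and earning revenue is not the goal.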

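2026-05-28 10:50 JST | ITmedia AI+ | LLM/Generative AI

OpenAI and Anthropic set up new companies: can Japan's SIers withstand the arrival of the "black ships"?

Anthropic and OpenAI have announced, one after the other, the creation of new companies to deliver AI services. Will the arrival of these "black ships", staffed with FDEs and Applied AI Engineers, threaten SIers? And how should Japan's SIers prepare?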

2026-05-28 09:17 JST | TechCrunch AI | Other

Why Google’s AI can’t spell Google (or anything else)

Google is embarrassing itself, again.

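2026-05-28 08:00 JST | ITmedia AI+ | Agents

AI coding agent market enters a "new phase": three reasons IDEs are no longer indispensable

According to Gartner, advances in AI have pushed the market for AI coding agents into a "new phase". What are the three reasons integrated development environments (IDEs) will cease to be indispensable?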

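2026-05-28 07:00 JST | ITmedia AI+ | Other

"Learn by watching my back" has reached its limit: how to build the "work manuals shared with AI" that local government DX needs ahead of a sharp drop in staff

While calls to advance local government DX grow louder, problems such as work that depends on particular individuals and stalled knowledge transfer persist on the ground. Behind them may lie a reliance on tacit knowledge, epitomized by "learn by watching my back", and a lack of work design premised on reproducibility. The author, who supports local governments across Japan as an assistant to the CIO, on humans and AI…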

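2026-05-28 06:00 JST | ITmedia AI+ | LLM/Generative AI

Failure data is an asset: using AI to unify 3D models and analysis results into knowledge that can be found at a glance

"There is tacit knowledge on the manufacturing floor that RAG cannot reach." Ghelia (ギリア) has launched a new platform that uses a multimodal LLM to integrate 3D models and analysis results and turn tacit design knowledge into explicit knowledge. Even the reasons a design was rejected and the context of failures become organizational assets.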

2026-05-28 05:10 JST | TechCrunch AI | Hardware/Semiconductors

In more good news for Amazon, Snowflake signs $6B deal with AWS for AI CPU chips

Snowflake has signed a new, enormous five-year deal with Amazon to secure chips for AI usage. Nvidia is once again being put on notice.

2026-05-28 04:39 JST | TechCrunch AI | Other

Payroll startup Remote says it grew revenue 50% per employee without adding headcount

Payroll service provider Remote recently surpassed $300 million in annual recurring revenue (ARR) and became cash-flow positive, thanks to…

2026-05-28 03:39 JST | TechCrunch AI | Other

Your SEO strategy is optimized for a search engine that no longer exists.

Google I/O made it official: AI-generated answers are now front and center in search, and most brands have almost no visibility into how AI…

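2026-05-28 03:00 JST | ITmedia AI+ | LLM/Generative AI

A "bugmageddon" even with access to the latest AI, "Mythos"? Security lessons from the maker of Firefox

If you can use a frontier model such as Anthropic's "Claude Mythos Preview", is your cybersecurity fully covered? The article examines the case of Mozilla, the US developer of the Firefox web browser.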

2026-05-28 03:00 JST | TechCrunch AI | Other

Meta launches Instagram, Facebook, and WhatsApp subscriptions, with more to come, including AI plans

Meta is rolling out paid subscription plans for Instagram, Facebook, and WhatsApp worldwide, while also testing new AI, creator, and busine…

2026-05-28 01:00 JST | TechCrunch AI | Business/Funding

AI coding startup Cognition raises $1B at $25B pre-money valuation

As Cognition reaches $492 million in annualized revenue run rate, it more than doubled its valuation in eight months, it says.

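2026-05-28 00:08 JST | ITmedia AI+ | LLM/Generative AI

OpenAI Foundation to commit $250 million to protect workers from AI-driven economic upheaval

The OpenAI Foundation has announced that it will commit an initial $250 million to respond to the rapid changes in the labor market and the economy that accompany the spread of AI. The funds will go to three areas: measuring economic impact, supporting workers in transition, and building new models of economic security. The foundation, having gone through an organizational restructuring, ガバ…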

2026-05-27 (459 items)

2026-05-27 23:15 JST | TechCrunch AI | Other

Startup Battlefield 200 applications close today: Nominate a founder or submit your startup

Today is the final day to apply or nominate a startup for Startup Battlefield 200. Once the clock strikes 11:59 p.m. PT, the window closes…

2026-05-27 23:14 JST | TechCrunch AI | Other

ElevenLabs’ new music-generation model can switch genres mid-track

ElevenLabs' new model will let users regenerate a section of a song without affecting the rest of the track.

2026-05-27 23:00 JST | TechCrunch AI | Other

TechCrunch Disrupt 2026 Early Bird ticket savings end in 3 days

There are only 3 days left to save up to $410 on your ticket to TechCrunch Disrupt 2026. Early Bird pricing ends May 29 at 11:59 p.m. PT, a…

2026-05-27 23:00 JST | TechCrunch AI | Other

SOND, a sleep tech startup from Bose’s former head of sleep, exits stealth with $7M

SOND introduced its debut product: Dreambuds, a closed-loop, in-ear system that captures 12 physiological signals from the wearer, then act…

2026-05-27 22:48 JST | TechCrunch AI | Other

China is increasingly keeping its best AI talent to itself

China's AI boom is producing world-class talent, and Beijing is increasingly reluctant to let them go elsewhere.

2026-05-27 22:04 JST | TechCrunch AI | Business/Funding

ClickHouse triples annualized revenue to $250M, charting a path toward an IPO

The database provider is eyeing a public debut within the next few years.

2026-05-27 22:00 JST | TechCrunch AI | Other

YouTube will now automatically label AI videos

YouTube will now automatically label videos that use significant photorealistic AI, instead of relying solely on creators to disclose AI-ge…

2026-05-27 21:30 JST | TechCrunch AI | Other

Tech CEOs are apparently suffering from AI psychosis

"CEOs are uniquely prone to AI psychosis," Box CEO Aaron Levie opines. Maybe that explains the almost religious belief in AI productivity g…

2026-05-27 21:30 JST | TechCrunch AI | Agents, Business/Funding

Robinhood now lets your AI agents trade stocks

While these agents would be able to read and analyze users' portfolios to come up with trading strategies and suggest investments, they'll…

2026-05-27 20:00 JST | OpenAI | LLM/Generative AI, Agents

Cisco and OpenAI redefine enterprise engineering with Codex

Cisco and OpenAI are redefining enterprise engineering with Codex, helping Cisco scale AI-native development, accelerate AI Defense work, a…

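2026-05-27 18:25 JST | ITmedia AI+ | LLM/Generative AI

NEC, Hitachi, and Fujitsu all line up with "Anthropic partnerships": what are they after? [Roundup of comments from executives at the three companies]

In the space of just one month, NEC, Hitachi, and Fujitsu each announced a partnership with Anthropic. What is each company aiming for?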

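2026-05-27 16:48 JST | ITmedia AI+ | LLM/Generative AI

"AI drafted this answer": Digital Minister Matsumoto tells the House of Councillors plenary session that "staff check the facts and I give final approval"

At the House of Councillors plenary session on May 27, Mizuho Umemura of the Sanseito party asked about the use of "Gennai" (源内), the generative AI platform for government administration that has been in a pilot project across all ministries and agencies since May. Digital Minister Hisashi Matsumoto replied, "This answer, too, was drafted by Gennai."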

2026-05-27 16:00 JST | OpenAI | LLM/Generative AI, Agents

Building self-improving tax agents with Codex

See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating wor…

2026-05-27 13:00 JST | arXiv cs.AI | Hardware/Semiconductors

BrickAnything: Geometry-Conditioned Buildable Brick Generation with Structure-Aware Tokenization

Generating physically buildable brick structures from 3D shapes requires more than geometric reconstruction: the output must also satisfy d…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Can LLMs Introspect? A Reality Check

Can large language models detect and report their own internal states? A number of studies have argued that the answer to this question is…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory

Long-running AI agents need persistent memory. Memory supports learning across sessions, reduces repeated context injection, and enables au…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions

Multimodal large language model (MLLM)-based embodied agents have shown strong potential for solving complex tasks in physical environments…

2026-05-27 13:00 JST | arXiv cs.AI | Business/Funding, Research/Papers

Constraint acquisition needs better benchmarks

Constraint Acquisition (CA) and related research on the validation and enhancement of Mathematical Programming (MP) models from domain know…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems

Long-lived AI agents are increasingly deployed as persistent operational systems, yet they are still evaluated like freshly initialized mod…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Research/Papers

Experiments in Agentic AI for Science

This paper details two novel frameworks for developing autonomous, agentic AI in scientific workflows. Both systems leverage a hybrid Local…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Business/Funding, Research/Papers

Anchor: Mitigating Artifact Drift in Agent Benchmark Generation

AI agents are beginning to complete valuable, long-horizon business operations tasks, but training and evaluation environments for enterpri…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

Theory of Mind (ToM), the ability to infer others' knowledge, intentions, and emotions, is commonly evaluated in large language models (LLM…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Research/Papers

JobBench: Aligning Agent Work With Human Will

Current benchmarks for occupational AI agents are scoped primarily by economic values, telling a replacement story. We introduce JobBench,…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Managing Uncertainty in LLM-Generated Procedural Knowledge for Virtual Laboratory Planning

Educational virtual laboratories can make experimental training more scalable, adaptive, and accessible, especially when students have lim…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Hardware/Semiconductors, Research/Papers

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

Autonomous research agents produce competitive solutions and professional-looking manuscripts, yet their outputs contain verifiability fail…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Automatic Layer Selection for Hallucination Detection

Recent studies on hallucination detection have shown that hallucination-related signals are more strongly encoded in intermediate layers th…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Exploiting Local Dynamics Regularity for Reusable Skills in Offline Hierarchical RL

Hierarchical Reinforcement Learning (HRL) promises to solve long-horizon Reinforcement Learning (RL) tasks more efficiently than non-hierar…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Advancing Creative Physical Intelligence in Large Multimodal Models

Large multimodal models (LMMs) have rapidly advanced in perception and reasoning; however, it remains unclear whether these capabilities ge…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator

A long-standing goal of the research community is to develop highly interactive LLM-based dialogue agents. Recent research focuses on optim…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions

Large Language Models (LLMs) achieve impressive accuracy on mathematical reasoning benchmarks, yet their performance drops when problems ar…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unl…

2026-05-27 13:00 JST | arXiv cs.AI | Business/Funding

Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning

Legal reasoning requires distinguishing changes that matter from those that do not. Legal AI should remain stable under legally irrelevant…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design

Polymer discovery is central to fields ranging from energy storage to biomedicine, but it is hindered by an astronomically large chemical d…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration

Mobile graphical user interface (GUI) agents enable AI models to autonomously operate smartphones on behalf of users. However, most existin…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning

Clinical practice guidelines (CPGs) encode evidence-based decision logic that clinicians apply by evaluating patient variables, conditional…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

The token-level extractive compressors widely used for general LM context are structurally inappropriate for LLM agents: across 17 (env, ba…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

FAST-GOAL: Fast and Efficient Global-local Object Alignment Learning

Vision-language models such as CLIP have shown impressive capabilities in aligning images and text, but they often struggle with lengthy an…

2026-05-27 13:00 JST | arXiv cs.AI | Image/Video Generation

Tail-Aware HiFloat4: W4A4 Post-Training Quantization for Wan2.2

This report describes Tail-Aware HiFloat4, our submission to the low-bit text-to-video generation quantization challenge. Our method adapts…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems

LLM-based multi-agent systems decompose complex tasks into interacting roles, but most remain manually orchestrated by prompts, tools, and…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Completion vs Optimality: Policy Gradient in Long-Horizon Cumulative-Damage Problems

Long-horizon decision problems with cumulative damage couple locally attractive actions to globally adverse outcomes. We identify two ortho…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MemFail: Stress-Testing Failure Modes of LLM Memory Systems

Large language model (LLM) agents increasingly rely on external memory systems to remain consistent across long-horizon interactions, but l…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Mind the Tool Failures: Achieving Synergistic Tool Gains for Medical Agents

Medical AI agents increasingly use external tools for diagnosis, treatment recommendation, and evidence retrieval, yet most existing approa…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

Large language models (LLMs) have shown strong empirical gains as self-evolving agents for CUDA kernel generation, driven by feedback-condi…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

It's Not the Capability: Harness Sensitivity Is Non-Monotone Across LLM Agent Tiers

A prevalent assumption in LLM agent deployment holds that more structured harnesses universally improve reliability, and that higher-capabi…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Robotics

A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks

Large Language Models (LLMs) have brought huge improvements to Artificial Intelligence (AI), which can be applied to general-purpose tasks.…

2026-05-27 13:00 JST | arXiv cs.AI | Hardware/Semiconductors

Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal

Large reasoning models (LRMs) generate chain-of-thought (CoT) traces before producing final outputs, introducing a dynamic internal state t…

2026-05-27 13:00 JST | arXiv cs.AI | Hardware/Semiconductors

The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context

Retrieval-augmented generation promises to ground language model outputs in external evidence, yet the field has no reliable way to verify…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?

Advanced Large Multimodal Models (LMMs) have demonstrated impressive performance in K-12 reasoning tasks, exhibiting great promise as intel…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Composition Collapse: Stable Factual Knowledge Does Not Imply Compositional Reasoning

Post-training is routinely evaluated through aggregate benchmark scores that treat multi-hop reasoning as a single capability -- as if a mo…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation

Chain-of-thought (CoT) prompting reliably improves language-model accuracy, but which properties of a rationale text drive the improvement…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Helicase: Uncertainty-Guided Supply Chain Knowledge Graph Construction with Autonomous Multi-Agent LLMs

LLM-based multi-agent systems have been widely adopted for knowledge retrieval and report generation, synthesizing known information throug…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors

Multi-Stakeholder LLM Alignment: Decomposing Estimation from Aggregation

Multi-stakeholder tasks require one output to satisfy users with conflicting preferences. Holistic LLM judges conflate utility estimation a…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

On the Detection of Commutative Factors in Factor Graphs: Necessary and Sufficient Conditions

Exploiting the indistinguishability of objects in a probabilistic graphical model such as a factor graph is key to lifted probabilistic inf…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

TADDLE: A Tool-Augmented Agent for Detecting Deficient LLM-Generated Peer Reviews

LLM-generated peer reviews are increasingly common at major venues, yet their deficiencies are hard to detect because they are uniformly fl…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Business/Funding

From Norms to Indicators (N2I-RAG): An Agentic Retrieval-Augmented Generation Framework for Legal Indicator Computation

Computing legal indicators from normative texts is a key task in legal monitoring and policy evaluation, but presents significant challenge…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Developing a Totally Unimodular Linear Program for Optimal Conformance Checking: When and Why It Complements A*

Alignment-based conformance checking is the state-of-the-art approach for comparing observed process executions with normative process mode…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors

Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint)

LLMs deployed in high-stakes domains face fundamental reliability challenges: hallucinations, inconsistencies, and privacy vulnerabilities…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

LELA: An End-to-end LLM-based Entity Linking Framework with Zero-shot Domain Adaptation

Entity linking is a key component of many downstream NLP systems, yet existing approaches are often tied to the specific target knowledge b…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Generating Robust Portfolios of Optimization Models using Large Language Models

Mathematical optimization is a powerful tool for structured decision-making across domains such as resource allocation and planning. Formul…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

ORCA: An End-to-End Interactive Copilot for Optimized Root Cause Analysis

Causal analysis is a crucial task in many domains, including manufacturing, social science, and medicine. However, despite recent progress,…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Boosting Knowledge Graph Foundation Models via Enhanced Negative Sampling

Knowledge graphs (KGs) have become the core backbone of numerous downstream tasks such as question answering and recommender systems. Howev…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

BatteryMFormer: Multi-level Learning for Battery Degradation Trajectory Forecasting

Early battery degradation trajectory forecasting (BDTF), which predicts the full-life state-of-health trajectory from early operational dat…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Traceable Knowledge Graph Reasoning Enables LLM-Assisted Decision Support for Industrial VOCs in the Steel Industry

Key knowledge for steel-industry volatile organic compounds (VOCs) governance is scattered across unstructured scientific literature, makin…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Can Broad Biomedical Knowledge be Contextualized into Scenario-Grounded Propositions?

Biomedical discovery often requires connecting broad biomedical knowledge with specific experimental or clinical data. Background knowledge…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation

Domain specialization can improve LLM behavior in vertical domains, but often weakens the general capabilities inherited from the original…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Position: AI Safety Requires Effective Controllability

AI safety is still largely framed as alignment: training models to follow human preferences, safety policies, and normative constraints. Th…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Research/Papers

Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation

Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation. This paper presents a systematic study of data scaling, b…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

ICCU: In-Context Continual Unlearning via Pattern-Induced Refusal Rules

Machine unlearning aims to remove the influence of specific data from trained language models. In real-world deployments, unlearning reques…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning

Reinforcement learning for multi-turn agents suffers from a credit-assignment mismatch: rewards are sparse and trajectory-level, while succ…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs

Retrieval-augmented LLMs are deployed for tasks where evidence quality determines action safety, yet evaluation protocols assume that singl…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Query Symbolically or Retrieve Semantically? A Dataset and Method for Semi-Structured Question Answering

Retrieval-Augmented Generation (RAG) systems for question answering typically retrieve evidence by semantic similarity between the query an…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

The Compressive Knowledge Graph Hypothesis: Which Graph Facts Matter for Scientific Hypothesis Generation?

Knowledge graphs (KGs) can provide structured scientific context to language models, but it remains unclear which graph facts actually shap…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments

Recent advances in large language models (LLMs) have facilitated the widespread deployment of LLMs as interactive agents capable of reasoni…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Gumbel Machine: Counterfactual Student Writing Generation via Gumbel Noise Steering

An effective method of teaching across disciplines is to provide examples of high-quality work. However, an example may be significantly di…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

SIA: Self Improving AI with Harness & Weight Updates

Humans are the bottleneck in building and improving AI. Both the models and the agents that wrap them are written, tuned, and corrected by…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Modeling Agentic Technical Debt and Stochastic Tax: A Standalone Framework for Measurement, Simulation, and Dashboarding

Agentic AI systems combine probabilistic reasoning with delegated action through tools, context, memory, orchestration, and external workfl…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Research/Papers

Maat: The Agentic Legal Research Assistant for Competition Protection

Competition law experts conducting legal research must review extensive volumes of cases, decisions, and judicial reports to identify prece…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

2-ASP(Q) programs with weak constraints: Complexity and efficient implementation

ASP(Q) extends Answer Set Programming (ASP) with Quantifiers over answer sets. In this paper we focus on the class of ASP(Q) programs with…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In t…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Natural Language Query to Configuration for Retrieval Agents

Modern retrieval agents expose many configuration choices -- LLM, retriever, number of documents, number of hops, and synthesis strategy --…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors

Xe-Forge: Multi-Stage LLM-Powered Kernel Optimization for Intel GPU

Porting deep learning algorithms to new hardware accelerators requires developers to repeatedly apply the same low-level optimizations -- q…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Edge AI Deployment Beyond Models: A BSP-Aware Systems Framework for Industrial Embedded Platforms

Industrial Edge AI programs often begin with the model and only later confront the platform. That sequencing is attractive because it allow…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

LLM pre-training efficacy increasingly depends on data composition rather than sheer volume. Yet, optimal mixing is hindered by categorizat…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

Large Language Models (LLMs) have become the predominant paradigm in NLP, advancing both research and industry. As model sizes and pretrain…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Eroding Trust in Real Speech: A Large-Scale Study of Human Audio Deepfake Perception

Audio deepfakes have improved rapidly recently, yet their effect on human trust in real speech remains unstudied. We present the largest li…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

AssetGen: Deployable 3D Asset Generation at Interactive Speed

While 3D generation is progressing rapidly, recent work has often focused on obtaining high-resolution assets, leaving user experience and…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents

We present VISTA (VIsual Spec-To-App Benchmark), a benchmark for evaluating the end-to-end web-app generation capabilities of LLM-based age…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Augment Engineering: A Methodology for Multi-Tool AI Orchestration Across Professional Domains

Organizations increasingly deploy separate purpose-built AI tools across professional domains, often hiring domain specialists for each, re…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning

LLM-driven agents are capable of selecting external tools to complete users' tasks. However, attackers could compromise such process, steer…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

When Does Adaptive Guidance Help? Belief-Aware Privileged Distillation for Autonomous Driving Under Partial Observability

Guided Soft Actor-Critic (GSAC) distills knowledge from a privileged full-state teacher to a partial-observation student for autonomous dri…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges

The known stylistic biases in LLM judges, such as a preference for verbosity or specific sentence structures, present an underexplored secu…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Furina: Fragmented Uncertainty-Driven Refusal Instability Attack

Safety alignment in large language models (LLMs) and multimodal large language models (MLLMs) is commonly assumed to operate as a near-bina…

2026-05-27 13:00 JST | arXiv cs.AI | Business/Funding

TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models

Time series foundation models (TSFMs) are increasingly pretrained on large corpora, raising concerns that evaluation datasets may have been…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

On the Push-Based Asynchronous Federated Learning: A Bias-Correction Aggregation Approach

Asynchronous decentralized federated learning (ADFL) eliminates central coordination and global synchronization, making it attractive for l…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Tool-Schema Compression Enables Agentic RAG Under Constrained Context Budgets

Agentic RAG systems that equip language models with dozens to hundreds of tool definitions face a critical resource conflict: tool schemas…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Enhancing Autonomous Online Intrusion Detection for IoT with Balanced Learning, Reliable Pseudo-Labels, and Lightweight Architectures

The rapid proliferation of Internet of Things (IoT) devices has created an urgent demand for adaptive, resource-efficient Intrusion Detecti…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Planning Neural Dynamics with Lie Group Embedding through Supervised Projective Manifold Learning

We propose Lie group embedded dynamical neural networks (LieEDNN) and the corresponding learning algorithms based on gradient descent and m…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

A Universal Cliff and a Design Fingerprint: Cross-Section Defect Detection Under LLM Orchestration

Production language-model systems answer a request by partitioning it across an invisible orchestration of worker agents that recompose one…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization

Low-bit activation quantization remains a major bottleneck in efficient large language model (LLM) deployment. The difficulty is not only t…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

PitchBench: Measuring Pitch Hearing in Audio-Language Models

Audio-language models (ALMs) are increasingly used in real-world applications that require understanding music, from music tutoring and tra…

2026-05-27 13:00 JST · arXiv cs.AI · Agents, Research/Papers

RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations

Code agents currently perform skillfully on repository-level software engineering benchmarks, but it remains unclear whether s…

2026-05-27 13:00 JST · arXiv cs.AI · Agents

AutoDFT: A Closed-Loop Multi-Agent Framework for Autonomous DFT Calculations

Density functional theory (DFT) serves as the basis for computational discovery in materials science and chemistry, yet each calculation de…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training

Hybrid post-training usually combines supervised fine-tuning and reinforcement learning, but fixed mixing schedules cannot adapt when the r…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

SetupX: Can LLM Agents Learn from Past Failures in Functionality-Correct Code Repository Setup?

Functionality-correct repository setup aims to configure execution environments (e.g., dependencies, build scripts) to successfully execute…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training

Quantization-aware training (QAT) with low-bit floating-point formats enables efficient LLM deployment, yet introduces subtle failure modes…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

HRVConformer: Neonatal Hypoxic-Ischemic Encephalopathy Classification from the Heart Rate signals

This paper presents the HRVConformer, a novel deep learning architecture for the classification of hypoxic-ischemic encephalopathy (HIE) us…

2026-05-27 13:00 JST · arXiv cs.AI · Hardware/Semiconductors, Research/Papers

Modeling Dynamic Mixtures of Time-Delay Systems from Streaming Time Series

This research addresses the problem of adaptive modeling in time-series data streams with clear input-output relationships. This problem is…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Co-folding model guided by structural proteomics

Protein structure generative models excel at predicting single protein static structures from sequence, but routinely fail to capture the c…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Bridging Classification and Reconstruction: Cooperative Time Series Anomaly Detection

Time series anomaly detection (TSAD) has long been a hot research topic in data mining due to its various applications. Recent studies chal…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

LLM-based agents are increasingly used for cybersecurity tasks, but most existing systems rely on fixed, human-designed scaffolds that stru…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Workflow Closure Is Not Scientific Closure in Auto-Research Systems

This paper argues that workflow closure is not scientific closure in auto-research systems. Current systems can increasingly complete resea…

2026-05-27 13:00 JST · arXiv cs.AI · Agents

AgentSociety: Incentivizing Agentic Social Intelligence

The success of deployed agents relies on their ability to handle open-ended user requests using their inherent capabilities, not only in so…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Unified Neural Scaling Laws

We present a functional form (that we refer to as a Unified Neural Scaling Law (UNSL)) that accurately models and extrapolates the scaling…

2026-05-27 13:00 JST · arXiv cs.AI · Business/Funding

Prospective evaluation of multimodal respiratory failure prediction: Do chest X-rays improve performance beyond EHR signals?

Early prediction of respiratory failure is critical for timely clinical intervention in intensive care units. Existing electronic health re…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Quantized Keys Steal Attention: Bias Correction for KV-Cache Compression in Video Diffusion

Chunk-wise autoregressive video diffusion models rely on a KV cache of previously generated chunks to avoid redundant computation, but this…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

VesselSim: learning 3D blood vessel segmentation without expert annotations

Blood vessel segmentation is a core task in medical image analysis for the care of vascular diseases and surgical planning, yet the challen…

2026-05-27 13:00 JST · arXiv cs.AI · Agents

Decoupled Delay Compensation: Enhancing Pre-trained MARL Policies via Learned Dynamics Filtering

Real-world multi-agent reinforcement learning (MARL) systems must often operate under stale observations, stochastic communication delays,…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

Prior work establishes that controlled contrastiveness between self-generated responses from large language models, set via reward scores,…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Intelligent Detection and Mitigation of Carpet-Bombing DDoS Attacks in SDN Using Retrieval-Augmented Generation and Large Language Models

Software-Defined Networking (SDN) provides flexible and programmable network management; however, its centralized control architecture rema…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Curriculum Learning for Safety Alignment

Direct Preference Optimisation (DPO) is widely used for safety alignment in large language models. However, prior work shows it is brittle…

2026-05-27 13:00 JST · arXiv cs.AI · Image/Video Generation, Agents

E$^3$C: Video Generation with 3D Environmental Memory and Ego-Exo Human Pose Control

Controllable and physically grounded egocentric video generation is essential for embodied agents to reason about how their own and others'…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Semigroup Consistency as a Diagnostic for Learned Physics Simulators

Learned physics simulators are often evaluated by one-step or short-horizon prediction error, but these metrics can miss failures in tempor…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

Erased but Exploitable: Black-box Embedding-Aware Prompting Against Unlearned Text-to-Image Diffusion Models

Machine unlearning aims to remove specific concepts from pretrained text-to-image diffusion models, yet several white- and black-box attack…

2026-05-27 13:00 JST · arXiv cs.AI · Hardware/Semiconductors

When Correct Demonstrations Hurt: Rethinking the Role of Exemplars in In-Context Learning

In-context learning (ICL) is often motivated by the intuition that demonstrations help because they provide correct input-output examples.…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Personalized Generative Models for Contextual Debiasing

Different visual patterns appear with different frequencies in the world: e.g., beach balls appear on sand more often than they do on a roa…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations

In many reasoning tasks, large language models (LLMs) rely on structured external knowledge, such as graphs and tables, which is typically…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Unified Panoramic Geometry Estimation via Multi-View Foundation Models

Geometry estimation from perspective images has greatly advanced, maturing to the point where off-the-shelf foundation models are able to r…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

BioFact-MoE: Biologically Factorized Mixture of Experts for Vision-Language Prognostic Modeling in Hepatocellular Carcinoma

Hepatocellular carcinoma (HCC) is biologically heterogeneous, shaped by the interplay between hepatic functional reserve and tumor-related…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI, Research/Papers

VisualNeedle: Benchmarking Active Visual Search in Information-Dense Scenes

Frontier multimodal large language models (MLLMs) have been reported to achieve over 90% accuracy on fine-grained perception benchmarks. Ho…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Credit-assigned Policy Gradient for Early Stage Retrieval in Two-stage Ranking

Large-scale search, recommendation, and retrieval-augmented generation (RAG) systems typically employ a two-stage architecture: an early-st…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

Annotator Positionality as Signal: Psychometric Weighting for Anti-Autistic Ableism Detection

Large language models (LLMs) are increasingly used in decision-making tasks where they can amplify or suppress perspectives, raising concer…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Plans for Evaluating Structured Generative Search Summaries

We propose a framework for evaluating structured generative search summaries that are placed atop organic web search results. A structured…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models

Evaluating and mitigating a generative system's susceptibility to jailbreak attacks is critical to its safe deployment. Given the number of…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Confounder Detection via Treatment Intent: A New Observational Study Design

Understanding the effects of interventions is central to scientific progress, with randomized controlled trials (RCTs) regarded as the gold…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

The Rescue Effect: Spatio-Semantic Early Exit Bypasses Quantization Collapse in CLIP

Deploying Vision-Language Models on resource-constrained hardware typically requires INT8 quantization, but in joint-embedding architecture…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control

A properly calibrated rule-based autoscaler can beat every one of six mainstream deep reinforcement learning (DRL) algorithms on cost acros…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Uniboost: Global Coordination with Value Alignment for Fair and Efficient Traffic Allocation

With the rapid evolution of internet services, recommendation systems have become indispensable. In particular, the blending (re-ranking) s…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing

This paper addresses structured out-of-distribution (OOD) testing in high-stakes machine learning applications. Traditional conformal metho…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Aperiodic and Low-Frequency Spectral Bias in Reconstruction based EEG Foundation Models

EEG foundation models, pre-trained on large-scale unlabelled EEG data, have emerged as a promising direction towards learning generalizable…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Targeted Remasking: Replacing Token Editing with Token-to-Mask Refinement in Discrete Diffusion Language Models

Discrete masked diffusion language models such as LLaDA generate text through iterative denoising, where mask tokens are progressively repl…

2026-05-27 13:00 JST · arXiv cs.AI · Business/Funding

LURE: Live-Usage Replay Evaluations for Reducing Evaluation Awareness

Large language models can recognize when they are being evaluated (evaluation awareness) and behave differently because of that, which unde…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective

This paper addresses the challenging task of weakly-supervised video temporal grounding. Existing approaches are generally based on the mom…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Alignment Tuning for Large Language Models: A Data-Centric Lens on Alignment Data Pipelines

Much of the alignment tuning literature is organized around optimization objectives, while the construction of alignment data is often trea…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

DDGAD: Trajectory Dynamics for Diffusion-Based Graph Anomaly Detection

Graph anomaly detection (GAD) aims to identify nodes or substructures whose behavior or attributes deviate significantly from the overall p…

2026-05-27 13:00 JST · arXiv cs.AI · Hardware/Semiconductors

Cross-scale Aligned Supervision for Training GANs

Modern GANs often introduce adversarial supervision on intermediate generator outputs and interpret the resulting multi-stage synthesis as…

2026-05-27 13:00 JST · arXiv cs.AI · Agents, Hardware/Semiconductors

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

AI coding agents are increasingly used to write real-world software, but ensuring that their outputs are correct remains a fundamental chal…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

AnchorDiff: Training-Free Concept Grounding for MM-DiTs via Anchor-Based Graph Propagation

Multi-Modal Diffusion Transformers (MM-DiTs) encode rich representations for training-free concept grounding, but existing attention-based…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Towards Error-Free EHRs: Reasoning-Intensive Consistency Verification Between Clinical Notes and Structured Tables in Electronic Health Records

Data consistency between unstructured clinical notes and structured tables in Electronic Health Records (EHRs) is essential for patient saf…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Diffuse to Detect: Generative Diffusion Models for Unsupervised IC Anomaly Detection

Latent defect screening is challenged by extremely low failure rates, high-dimensional test data, and absence of labeled anomalies. We prop…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Comparative Study of Vision-Based Metric Measurement for Large-Scale Planar Scenes

Vision-based metric distance and area measurement remains challenging in large-scale outdoor environments due to long-range sensing, camera…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient

We present the stochastic decoupled policy gradient (SDPG), a lightweight visual reinforcement learning (RL) method that trains diverse vis…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

Elias in the Lighthouse, Again? Diagnosing Low Diversity in LLM Stories

LLM-generated stories are a popular use case, but they show very low variability. We sample 20,000 total stories from four current models u…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

Dense2MoE: Pushing the Pareto Frontier of On-Device LLMs via Unified Pruning and Upcycling

The Mixture of Experts (MoE) architecture is highly promising for resource-constrained on-device deployments, yet training these models from s…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Unveiling the Fragility of Vision-Language Models: Multi-Modal Adversarial Synergy via Texture-Constrained Perturbations and Cross-Modal Optimization

Large Vision-Language Models (LVLMs) have transformed multi-modal understanding, excelling in tasks like image captioning and visual questi…

2026-05-27 13:00 JST · arXiv cs.AI · Agents

Foundations of a Time-Consistent Counterfactual Actuarial Runtime for Autonomous AI Agents

We propose a foundational runtime actuarial layer for autonomous AI agents in which every side-effect-bearing action carries a time-consist…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

CSV-ViT: A Vision Transformer with the Variable-sized Cortical Supervertices for Detection of Alzheimer's Disease Pathologies

Confirming Alzheimer's disease (AD) typically relies on positron emission tomography (PET), which remains costly and invasive, motivating t…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

InterSketch: An Interleaved Reasoning Model with Self-correcting Visual Sketch and Stepwise Reward

While vision-language models (VLMs) have exhibited multi-turn visual reasoning capabilities, their reasoning trajectories remain relatively…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

StreamSplit: Continuous Audio Representation Learning via Uncertainty-Guided Adaptive Splitting

Large-batch Contrastive Learning (CL), the foundation of modern representation learning, is fundamentally incompatible with the volatile re…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

CmIVTP: Cross-modal Interaction-based Vessel Trajectory Prediction for Maritime Intelligence

Maritime intelligent transportation systems (MITS) are essential for ensuring navigation safety and efficiency in busy waterways. However,…

2026-05-27 13:00 JST · arXiv cs.AI · Image/Video Generation

ReCA: Multi-Shot Long Video Extrapolation via Recursive Context Allocation

Minute-scale cinematic video generation is a central challenge for generative video models. Existing paradigms address only fragments of th…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

A Hybrid Vision-Language Architecture for Automated Defect Reasoning and Report Generation in Industrial Inspection

Automated industrial inspection requires both precise defect localization and structured maintenance report generation; in current practice…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Recursive Flow Matching

Generative models have emerged as a powerful paradigm for solving physics systems and modeling complex spatiotemporal dynamics. However, ac…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

DGLD: Domain-Gated Latent Diffusion for the Discovery of Novel Energetic Materials

Energetic-materials performance gains translate directly into reduced propellant mass, smaller warheads, and more efficient civilian gas-ge…

2026-05-27 13:00 JST · arXiv cs.AI · Agents

ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation

Tool-using agents increasingly operate in open-ended deployment environments, where they compose file systems, web APIs, code interpreters,…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Aligning Few-Step Generative Models by Amortizing Sample-based Variational Inference

Aligning a few-step generative model is challenging, since existing alignment frameworks typically rely on restrictive assumptions: a tract…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Linear and Neural Dueling Bandits with Delayed Feedback

Contextual dueling bandits form a cornerstone of preference-based decision-making, with critical applications in recommender systems and la…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Auditing and Fixing Economic Validity in Tabular Foundation Models for Discrete Choice

Tabular foundation models achieve strong accuracy on choice prediction tasks, but their predictions often violate the economic logic those…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Reliable Extraction of Clinical Follow-Up Instructions: A Hybrid Neural-Symbolic Pipeline

Objective. Outpatient notes carry follow-up instructions pairing actions with future times ("MRI brain in two weeks"). Extracting (action,…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Bridging Control with Neural Network Verifier alpha-beta-CROWN: A Tutorial

Learning-based methods for synthesizing controllers have gained popularity due to their high expressiveness and strong empirical performanc…

2026-05-27 13:00 JST · arXiv cs.AI · Image/Video Generation

On the Error-Correcting Effects of Stochasticity in Discrete Diffusion

Discrete diffusion models achieve strong performance in text and image generation, but their inference remains slow and must inherently bal…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Few-shot Cross-country Generalization of Tabular Machine Learning and Foundation Models for Childhood Anemia Prediction under Distribution Shift

Childhood anemia affects around 40% of children aged 6-59 months globally and arises from heterogeneous factors, limiting model generalizab…

2026-05-27 13:00 JST · arXiv cs.AI · Agents

Examining the Challenges of Intellectual Property in AI-Generated Productions

With the advancement of artificial intelligence systems capable of autonomously generating artistic, literary, musical works, and even inve…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Large language models (LLMs) are often fine-tuned on uncurated text datasets that adversaries can poison. Existing poisoning attacks primar…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Geometry-Aware Contrastive Learning for Few-Shot Automatic Modulation Recognition

Standard Self-Supervised Learning (SSL) for Automatic Modulation Recognition (AMR) struggles with ineffective isotropic augmentations, spec…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Spend Your Rollouts Where It Counts: Rollout Allocation for Group-Based RL Post-Training

Reinforcement learning (RL) is the dominant paradigm for post-training large language models. However, in the online, on-policy setting, ro…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

MedVol-R1: Reward-Driven Evidence Grounding for Volumetric Reasoning Segmentation

Volumetric Reasoning Segmentation (VRS) aims to segment a target region in a 3D medical scan from a free-form clinical query, where the ref…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search

We introduce JetViT, a novel family of hybrid-architecture Vision Transformer (ViT) models that match the accuracy of state-of-the-art full…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

Feedforward network (FFN) layers account for a large fraction of parameters and nonlinear expressivity in Transformer-based large language…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Bilevel Optimization over Saddle Points of Zero-Sum Markov Games

Reinforcement learning (RL) often has a hierarchical structure, where an upper-level (UL) learner selects model parameters and a lower-leve…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Respecting Modality Gap in Post-hoc Out-of-distribution Detection with Pre-trained Vision-Language Models

Out-of-distribution (OOD) detection has emerged as a popular technique to enhance the reliability of machine learning models by identifying…

2026-05-27 13:00 JST · arXiv cs.AI · Business/Funding, Research/Papers

AI evaluation may bias perceptions: The importance of context in interpreting academic writing

This paper examines how estimates of AI use in scientific writing can be biased when evaluation methods ignore contextual differences acros…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

The Labyrinth and the Thread: Rethinking Regularizations in Sequential Knowledge Editing for Large Language Models

Sequential editing of structured knowledge in large language models allows targeted factual updates without retraining, yet existing method…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Certified Causal Attribution for Real-Time Attack Forensics in 6G Network Slicing

Cross-slice attack attribution in 6G networks requires identifying causal propagation chains through shared infrastructure in under 100 ms.…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

DynFrame: Adaptive Reasoning-Driven Multimodal Framework with Dynamic Frame Augmentation for Complex Video Understanding

Recent video multimodal large language models (MLLMs) increasingly couple step-by-step reasoning with on-demand visual evidence retrieval,…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

An In-Vitro Study on Cross-Lingual Generalization in Language Models

Cross-lingual transfer in language models is difficult to study in natural corpora because lexical overlap, morphology, data imbalance, and…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning

Group-based reinforcement learning (RL) methods have achieved remarkable success in improving the performance of large language models (LLM…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Self-Improvement Imitation with Biologically Guided Search for Protein Design Under Oracle Budgets

Protein sequence optimization under tight oracle budgets requires methods that explore vast combinatorial spaces while making each evaluati…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Model Merging on Loss Landscape: A Geometry Perspective

Model merging offers a promising avenue for knowledge integration and parallel development without retraining. Yet, existing methods either…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Rotation-Invariant Spherical Watermarking via Third-Order SO(3) Representation Coupling

Reliable watermarking of panoramic imagery is fundamentally challenged by arbitrary 3D rotations. As panoramas are defined on the sphere, t…

2026-05-27 13:00 JST · arXiv cs.AI · Business/Funding

SL-BiLEM: Structured Learnable Behavior-in-the-Loop Epidemic Modeling for Forecasting and Policy Evaluation

Epidemic forecasting faces a fundamental challenge: human behavior dynamically responds to disease spread, creating feedback loops that ind…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

L2Rec: Towards Dual-View Understanding of LLMs for Personalized Recommendation

Adapting large language models (LLMs) for personalized recommendation requires aligning their general-purpose capabilities with user-specif…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Measuring Prediction Uncertainty in Neural Cellular Automata

Neural cellular automata (NCA) provide a lightweight alternative to encoder-decoder segmentation networks. However, it can be difficult to…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models

Looped Language Models (LoopLMs) enable efficient latent reasoning through depth recurrence, yet exhibit unreliable test-time scaling behav…

2026-05-27 13:00 JST · arXiv cs.AI · Business/Funding, Research/Papers

MatFormBench: A Benchmarking Evaluation Framework for Target-Driven Materials Formulation

Inverse design of materials has significantly advanced target-driven formulation optimization, yet existing materials machine learning benc…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control

Retrieval-augmented generation (RAG) increasingly underpins high-stakes applications, yet remains vulnerable to Confundo-style poisoning wh…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Adversarial Training for Robust Coverage Network under Worst-case Facility Losses

The Maximal Covering Location-Interdiction Problem (MCLIP) is a classic bi-level optimization problem, which is fundamental to resilient in…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Generative artificial intelligence and the marginalization of minoritized knowledges in higher education: the case of disability

Generative artificial intelligence redefines higher education by restructuring the processes through which scientific knowledge is produced…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Towards Generalization-Oriented Models for Vehicle Routing Problems with Mixture-of-Experts

In recent years, Deep Reinforcement Learning (DRL) has achieved substantial progress on Vehicle Routing Problems (VRPs). However, existing…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Ratio-Variance Regularized Policy Optimization

Standard on-policy reinforcement learning relies on heuristic clipping to enforce trust regions, but this mechanism imposes a severe cost b…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation

Post-trained LLMs are often optimized to align responses with human preferences, making them safe, polite, and conversationally appropriate…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Implementation of Big Data Analytics for Diabetes Management: Needs Assessment in the Rwanda Healthcare System

Diabetes is a chronic metabolic disease that can lead to serious health problems if not diagnosed and managed early. Big Data Analytics (BD…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

SeDT: Sentence-Transformer Decision-Transformer Conditioning for Multi-Turn Conversation Reliability

Large language models (LLMs) achieve impressive performance when a task is fully specified in a single turn, yet the same models lose up to…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

HTMLCure: Turning Browser Experience into State Guided Repair for Interactive HTML

LLMs can now produce full HTML pages, but many of those pages are only superficially correct: they render once, then fail under scroll, hov…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

Innovation: An Almost Characterization of Hallucination

Hallucination is a central limitation of large language models (LLMs), and substantial effort has been devoted to understanding and mitigat…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

RAGEAR: Retrieval-Augmented Graph-Enhanced Academic Recommender

We present RAGEAR (Retrieval-Augmented Graph-Enhanced Academic Recommender), a neurosymbolic recommender system for academic course recomme…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI, Research/Papers

ContextGuard: Structured Self-Auditing for Context Learning in Language Models

Recent benchmarks reveal that despite strong reasoning capabilities, large language models (LLMs) still struggle to faithfully apply comple…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

The Kalman Evolve: Closing the Gap in Kalman Filtering via Interpretable Algorithm Discovery

State estimation is a fundamental problem in control and signal processing, for which the Kalman Filter provides an optimal solution under…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Periodic Topological Deep Learning for Polymer Design and Discovery

Polymers underpin applications across energy, healthcare, and materials science, yet their vast chemical space makes systematic discovery c…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

The Sensation Modulating Network: Haltability as the architectural ground for object-directed phenomenology

Cognitive science remains split between cognitivism - which accounts for recursion and language but cannot ground formal symbols in meaning…

2026-05-27 13:00 JST · arXiv cs.AI · Agents, Research/Papers

Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study

Background: Large language models are typically evaluated as models, benchmarks, or short conversational episodes. Less is known about what…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

The Strongest Teacher Is Not Always the Best Teacher: Student-Centric Answer Selection

LLM training increasingly relies on teacher-generated supervision, from synthetic responses to reasoning traces and tool-use demonstrations…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

Knowledge Graphs as the Missing Data Layer for LLM-Based Industrial Asset Operations

LLM-based agents for industrial asset operations show limited accuracy when reasoning over flat document stores. AssetOpsBench (KDD 2026) e…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

GeoFaith: A Spatio-Temporal Dual View of Faithful Chain-of-Thought

Chain-of-Thought (CoT) reasoning has advanced large language models (LLMs), but outcome-based supervision leads to pervasive post-hoc ratio…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

Normalization layers in modern large language models (LLMs) consist of a deterministic normalization operation and a learnable scale vector…

2026-05-27 13:00 JST · arXiv cs.AI · LLM/Generative AI

Strategies for Guiding LLMs to Use Software Design Patterns: A Case of Singleton

Large Language Models (LLMs) can generate functional source code from natural-language prompts, but often fail to consistently follow highe…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

ICICLE: Expanding Retrieval with In-Context Documents

Generative retrieval (GR) maps queries directly to document identifiers (docids) using parametric knowledge. However, this design makes cor…

2026-05-27 13:00 JST · arXiv cs.AI · Research/Papers

Practical Anonymous Two-Party Gradient Boosting Decision Tree

Structured data is well handled by gradient-boosted decision trees (GBDT), which are usually trained on vertically partitioned features acr…

2026-05-27 13:00 JST | arXiv cs.AI | Business/Funding

EEG-FM-Audit: A Systematic Evaluation and Analysis Pipeline for EEG Foundation Models

Large EEG Foundation Models (FMs) have shown great potential for decoding EEG signals across diverse cognitive tasks. However, existing EEG…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Reasoning Depth and Environment Complexity: A Controlled Study of RLVR Data Allocation across Logical Reasoning Tasks

Reinforcement learning with verifiable rewards (RLVR) has become central to post-training reasoning models, yet a key limitation of existin…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Beyond Questions: Evaluating What Large Language Models (Actually) Know

Parametric knowledge in large language models (LLMs) is a cornerstone of their success, yet remains poorly understood. Existing knowledge b…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

JuICE: A Benchmark for Evaluating LLM-Judge in Identifying Cultural Errors

As large language models (LLMs) are increasingly deployed to users around the world, they are integrated into everyday tasks across diverse…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Tournament-GRPO: Group-Wise Tournament Rewards for Reinforcement Learning in Open-Ended Long-Form Generation

Reinforcement learning in open-ended long-form generation is challenging because reliable reference answers and automatic metrics are often…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Recon: Reconstruction-Guided Reasoning Synthesis for User Modeling

User modeling aims to use language models (LMs) to mimic an individual's behavior from a corpus of past context-action pairs (e.g., convers…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Cast a Wider Net: Coordinated Pass@K Policy Optimization for Code Reasoning

Repeated sampling with a verifier is the standard way to allocate test-time compute for code generation, with pass@K as the canonical met…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Timestep-Aware SVDQuant-GPTQ for W4A4 Quantization of Wan2.2-I2V

W4A4 quantization of large video diffusion Transformers offers substantial memory savings but is hindered by two main challenges: sparse la…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

ReasonOps: A Unified Operational Paradigm for Trustworthy Verified LLM Reasoning

Large Language Models (LLMs) have transformed artificial intelligence from primarily generative systems into increasingly capable reasoning…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination

Large language models (LLMs) are prone to hallucinations, i.e., statements unsupported by the input or training data, hindering reliable de…

2026-05-27 13:00 JST | arXiv cs.AI | Image/Video Generation, Business/Funding, Regulation/Policy

Black-box Membership Inference Attacks on the Pre-training Data of Image-generation Models

The rapid advancement of diffusion-based image generation models has raised serious concerns regarding potential copyright and privacy infr…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Less is More: Early Stopping Rollout for On-Policy Distillation

On-policy distillation has recently emerged as a promising alternative to standard sequence-level imitation, training a student by scoring…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Tracing Computation Density in LLMs

Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs, but…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Lessons from Penetration Tests on Large-Scale Agent Systems

As AI systems gain increasing autonomy and execution capability, the number of discovered security vulnerabilities continues to rise. Howev…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

ConVer: Using Contracts and Loop Invariant Synthesis for Scalable Formal Software Verification

Formal verification of large C programs is impeded by state-space explosion: Bounded Model Checking (BMC) tools must encode the entire stat…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents

Social deduction games have become a popular testbed for probing reasoning, deception, coordination, and belief modeling in Large Language…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

E3: Issue-Level Backtesting for Automated Research Critique

We present E3, an automated review assistant that augments reviewers and engineering teams by identifying decision-relevant technical conce…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Two Speeds of Learning: A Representation-Readout Decomposition of Grokking and Double Descent

Training loss and accuracy are the standard signals used to monitor generalization during deep neural network training. Two well-documented…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Trust Region Q Adjoint Matching

Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of optimization arising from the m…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference

Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while mai…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

MiRD: Reliable Set-Valued Prediction for Open-Ended Question Answering via Miscoverage Risk Decomposition

Reliable set-valued prediction provides a principled way to mitigate hallucinations in open-ended question answering (QA), yet existing con…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

High-Quality Synthetic Financial Time-Series using a GAN-Diffusion Framework

In recent years, financial institutions and firms have increasingly adopted synthetic data to address data scarcity and to generate counter…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Beyond the Data Mesh Illusion: Designing Modern AI-augmented Lakehouses to Bridge the Gap Between Theory and Practice

Enterprise data platforms face an enduring tension between domain self-service and holistic governance. The data mesh paradigm proposed dec…

2026-05-27 13:00 JST | arXiv cs.AI | Hardware/Semiconductors

Deep-layer limit and stability analysis of the basic forward-backward-splitting induced network (II): learning problems

Deep unfolding neural networks derived from iterative optimization schemes and numerical ordinary/partial differential equations (ODEs/PDEs…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Semantic Robustness Probing via Inpainting: An Interactive Tool for Safety-Critical Object Detection

Testing object detectors in safety-critical domains requires semantically meaningful probes beyond pixel-level corruptions. We present SemP…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

LitSeg: Narrative-Aware Document Segmentation for Literary RAG

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by incorporating external knowledge, particularly for long-tail…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Grounding Text Embeddings in Stakeholder Associations

Text embeddings are widely used to analyse large corpora of complex texts. However, it is unclear whether the embeddings capture the same s…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

An investigation of AI integration in sound designer workflows and experiences

Artificial intelligence is increasingly being integrated into professional audio production workflows, yet a gap persists between the tools…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation

We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotation…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Learning When to Think While Listening in Large Audio-Language Models

Recent advances in Large Audio-Language Models (LALMs) have made real-time, streaming spoken interaction increasingly practical. In this se…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Generative Animations: A Multi-Model Pipeline for Prompt-Driven Motion Synthesis

Animation elevates digital documents into immersive experiences, yet creating custom motion paths remains cumbersome, requiring designers t…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

TWIST: Closed-Loop token Synchronization for Application-Aware Wireless Digital Twins

Wireless digital twins require repeated synchronization between a time-evolving physical scene and its digital counterpart under limited an…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

Qiskit QuantumKatas: Adapting Microsoft's Quantum Computing exercises for LLM evaluation

We adapt Microsoft's QuantumKatas -- a well-established quantum computing curriculum -- from Q# to Qiskit, the most widely-adopted quantum…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Many Logics, One Methodology: A Plea for Logical Pluralism in Formalised Reasoning (preprint)

This position statement looks back on two decades of work on shallow embeddings of non-classical logics in classical higher-order logic (HO…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models

Selecting which instances to label is a key challenge in low-label tabular learning. For recent Tabular Foundation Models such as TabPFN, c…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs

Long chain-of-thought reasoning has made autoregressive decoding the dominant inference cost of modern large language models. Existing meth…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

PilotTTS: A Disciplined Modular Recipe for Competitive Speech Synthesis

Building state-of-the-art text-to-speech (TTS) systems typically demands millions of hours of proprietary data and complex multi-stage arch…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Lost in Sampling: Assessing Lexical Reachability in LLMs via the Word Coverage Score (WCS)

Modern Large Language Models (LLMs) are often criticized for producing repetitive and homogeneous text, despite possessing vast latent voca…

2026-05-27 13:00 JST | arXiv cs.AI | Robotics

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies

Vision-Language-Action (VLA) models are increasingly expected to not only complete robot tasks, but also follow human instructions about ho…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Falcon-X: A Time Series Foundation Model for Heterogeneous Multivariate Modeling

Time series foundation models (TSFMs) are transforming the forecasting paradigm through large-scale cross-domain pretraining. However, most…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

It's Not Always Sycophancy: Measuring LLM Conformity as a Function of Epistemic Uncertainty

Large language models (LLMs) are known to abandon their initial stance to conform to user pushback. While prior research largely attributes…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Risk Averse Alert Prioritization for IDS Using Subnormal Gaussian Fuzzy Models

Modern intrusion detection systems generate thousands of alerts daily, but alert fatigue severely limits security operations effectiveness…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Hardware/Semiconductors

Governed Evolution of Agent Runtimes through Executable Operational Cognition

Recent advances in agentic systems increasingly treat code as an executable operational substrate rather than as a disposable output artifa…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

EdgeFlow: Edge-Map Augmented VLM-Based Flowchart Processing for Industrial Requirements Engineering

Flowcharts are widely used in industrial requirements, but usually remain embedded as static images. Vision Language Models (VLMs) show pro…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

When Eyes Betray AI: Social Gaze Consistency as a Semantic Cue for AI-Generated Image Detection

Recent generative models have largely closed the gap on low-level artifacts - pixel fingerprints, frequency anomalies, upsampling traces -…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data eng…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

MobileMoE: Scaling On-Device Mixture of Experts

Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-bill…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Research/Papers

GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesis, Research, and Testing

Cellular research and development (R&D) is throttled by six structural processes that each consume months of manual engineering work per it…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Algorithmic Monocultures in Hiring

Many employers screen job applicants with algorithms built by the same few algorithm vendors. We hypothesize that algorithmic monoculture l…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Multi-Agent Causal Discovery Using Large Language Models

Causal discovery aims to identify causal relationships between variables and is a fundamental problem across the sciences. Traditional stat…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

LiPUP-MA: A Residential Experience-centric Multi-Agent Framework for Living-in-the-loop Participatory Urban Planning

Participatory Urban Planning (PUP) is increasingly supported by LLM-based agents, yet existing methods largely rely on static preference el…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Think Twice Before You Act: Enhancing Agent Behavioral Safety with Thought Correction

LLM-based agents solve complex tasks through iterative reasoning, tool use, and environment interaction, where each intermediate thought di…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Doc-CoB: Enhancing Document Understanding with Visual Chain-of-Boxes Reasoning

Document understanding aims to perform question answering and information extraction over document images, where the visual content is high…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation

Chain-of-Thought (CoT) prompting significantly enhances model reasoning, yet its internal mechanisms remain poorly understood. We analyze C…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

EvoEmo: Towards Evolved Emotional Policies for Adversarial LLM Agents in Multi-Turn Price Negotiation

Recent research on Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs) has demonstrated that agents can engage in comp…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning

Large language models (LLMs) demonstrate strong reasoning abilities via Chain-of-Thought (CoT), but their token-level generation encourages…

2026-05-27 13:00 JST | arXiv cs.AI | Business/Funding

AI-Driven Contribution Evaluation and Conflict Resolution: A Framework & Design for Group Workload Investigation

The equitable assessment of individual contribution in teams remains a persistent challenge, where conflict and disparity in workload can r…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

PaTAS: A Framework for Trust Propagation in Neural Networks Using Subjective Logic

Trustworthiness has become a key requirement for the deployment of artificial intelligence systems in safety-critical applications. Convent…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

LEC: Linear Expectation Constraints for Selection-Conditioned Risk Control in Selective Prediction and Routing Systems

Foundation models often generate unreliable answers, while heuristic uncertainty estimators fail to fully distinguish correct from incorrec…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

XGrammar-2: Efficient Dynamic Structured Generation Engine for Agentic LLMs

Modern LLM agents increasingly rely on dynamic structured generation, such as tool calling and response protocols. Unlike traditional struc…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents

Recent breakthroughs in Large Language Models (LLMs) have positioned them as a promising paradigm for agents, with long-term planning and d…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Research/Papers

Drive-P2D: A Progressive Perception-to-Decision Benchmark for VLMs in Autonomous Driving

Autonomous driving requires reliable perception and safe decision-making in complex scenarios. Recent vision-language models (VLMs) demonst…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

AI Agent for Reverse-Engineering Legacy Finite-Difference Code and Translating to Devito

To facilitate the transformation of legacy finite difference implementations into the Devito environment, this study develops an integrated…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Chain Of Thought Compression: A Theoretical Analysis

Chain-of-Thought (CoT) has unlocked advanced reasoning abilities of Large Language Models (LLMs) with intermediate steps, yet incurs prohib…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic

Recent work has explored optimizing LLM collaboration through Multi-Agent Reinforcement Learning (MARL). However, most MARL fine-tuning app…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

UCPO: Uncertainty-Aware Policy Optimization

The key to building trustworthy large language models (LLMs) lies in endowing them with inherent uncertainty expression capabilities, there…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Qrita: High-performance Top-k and Top-p using Pivot-based Truncation and Selection

Despite their importance in model sampling, efficient implementation of Top-k and Top-p algorithms for large vocabularies remains a signifi…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding

The Necessity of a Unified Framework for LLM-Based Agent Evaluation

With the advent of Large Language Models (LLMs), general-purpose agents have seen fundamental advancements. However, evaluating these agent…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Persona Generators: Generating Diverse Synthetic Personas for Arbitrary Contexts

Evaluating AI systems that interact with humans requires understanding their behavior across diverse user populations, but collecting repre…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

DIANOIA: Diagnostic Decomposition and Joint Optimization for Multi-Agent Reasoning

Multi-agent LLM systems consistently outperform single-agent baselines, yet practitioners still cannot predict which design works for a new…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Hi-SAM: A Hierarchical Structure-Aware Multi-modal Framework for Large-Scale Recommendation

Multi-modal recommendation has gained traction as items possess rich attributes like text and images. Semantic ID-based approaches effectiv…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

LLMs often exhibit Aha moments such as self-correction after tokens like "Wait," yet the underlying mechanism remains unclear. Standard LLM…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Counterfactual Credit Policy Optimization for Multi-Agent Collaboration

Collaborative multi-agent large language models (LLMs) can solve complex reasoning tasks by decomposing roles, but reinforcement learning f…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

OMD-GraphRAG: Enhancing GraphRAG with Ontology-Guided Extraction, Multi-Dimensional Clustering and Dual-Channel Fusion

Retrieval-Augmented Generation (RAG) systems face significant challenges in complex reasoning, multi-hop queries, and domain-specific QA. W…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Modernising Reinforcement Learning-Based Navigation for Embodied Semantic Scene Graph Generation

Semantic world models enable embodied agents to reason about objects, relations, and spatial context beyond purely geometric representation…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Communication Gain and Delay Cost Under Cross-Timestep Delays in Cooperative Multi-Agent Reinforcement Learning

Communication is essential for coordination in cooperative multi-agent reinforcement learning under partial observability, yet…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

ReVEL: Multi-Turn Reflective LLM-Guided Heuristic Evolution via Structured Performance Feedback

Designing effective heuristics for NP-hard combinatorial optimization problems remains challenging and often requires substantial domain ex…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

From Attribution to Action: A Human-Centered Application of Activation Steering

Explainable AI (XAI) methods reveal which features influence model predictions, yet provide limited means for practitioners to act on these…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling

Prefilling computational costs pose a significant bottleneck for Large Language Models (LLMs) and Large Multimodal Models (LMMs) in long-co…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Mechanized Foundations of Structural Governance: Machine-Checked Proofs for Governed Intelligence

We present five results in the theory of structural governance for cognitive workflow systems. Three are mechanized in Coq 8.19 using the I…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

The Two Boundaries: Why Behavioral AI Governance Fails Structurally

Every system that performs effects has two boundaries: what it can do (expressiveness) and what governance covers (governance). In nearly a…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Robotics

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinfo…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Effect-Transparent Governance for AI Workflow Architectures: Semantic Preservation, Expressive Minimality, and Decidability Boundaries

We present a machine-checked formalization of structurally governed AI workflow architectures and prove that effect-level governance can be…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Algebraic Semantics of Governed Execution: Monoidal Categories, Effect Algebras, and Coterminous Boundaries

We present an algebraic semantics for governed execution in which governance is axiomatized, compositional, and coterminous with expressibi…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Research/Papers

SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning

Frontier scientific reasoning is rapidly emerging as a key foundation for advancing AI agents in automated scientific discovery. Deep resea…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding, Research/Papers

Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models

Evaluating large language models (LLMs) today rests on fixed benchmarks that apply the same set of items to any model, producing ceiling an…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

From Feasible to Practical: Pareto-Optimal Synthesis Planning

Current computer-aided synthesis planning (CASP) methods often treat retrosynthesis as solved once a single feasible route is identified, f…

2026-05-27 13:00 JST | arXiv cs.AI | Agents, Research/Papers

Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management

This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

GraphMind: From Operational Traces to Self-Evolving Workflow Automation

Complex operational workflows coordinating personnel, tools, and information are central to system operations, yet end-to-end automation re…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Discoverable Agent Knowledge -- A Formal Framework for Agentic KG Affordances (Extended Version)

Two decades ago, the Semantic Web Services community was asked how agents with different ontological commitments could discover, compose, a…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding

AgentAtlas: Beyond Outcome Leaderboards for LLM Agents

Large language model agents now act on codebases, browsers, operating systems, calendars, files, and tool ecosystems, but their evaluations…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

Declarative Data Services: Structured Agentic Discovery for Composing Data Systems

Agentic discovery has shown that LLM-driven search can find novel algorithms, designs, and code under benchmark conditions. Translating the…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving

Safety-critical scenarios are central to evaluating autonomous driving systems, yet their rarity in naturalistic logs makes simulation-base…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Post-training has become the dominant recipe for turning a language model into a competent search-augmented reasoning agent. A line of rece…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Querying and Repairing Inconsistent Prioritized Knowledge Bases: Complexity Analysis and Links with Abstract Argumentation

In this paper, we explore the issue of inconsistency handling over prioritized knowledge bases (KBs), which consist of an ontology, a set o…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Continual Model-Based Reinforcement Learning with Hypernetworks

Effective planning in model-based reinforcement learning (MBRL) and model-predictive control (MPC) relies on the accuracy of the learned dy…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Quantization is a powerful tool for accelerating large language model (LLM) inference, but the accuracy-performance trade-offs across diffe…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Evaluating Sample Utility for Efficient Data Selection by Mimicking Model Weights

Large-scale web-crawled datasets contain noise, bias, and irrelevant information, necessitating data selection techniques. Existing methods…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures

The increasing complexity of Intelligent Transportation Systems (ITS) has led to significant interest in computational offloading to extern…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Yes, Q-learning Helps Offline In-Context RL

Existing offline in-context reinforcement learning (ICRL) methods have predominantly relied on supervised training objectives, which are kn…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Hands-On: Segmenting Individual Signs from Continuous Sequences

This work tackles the challenge of continuous sign language segmentation, a key task with huge implications for sign language translation a…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens

Recent impressive results from large reasoning models have been interpreted as a triumph of Chain of Thought (CoT), and especially of the p…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning

Recent advancements in multimodal slow-thinking systems have demonstrated remarkable performance across various visual reasoning tasks. How…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling

The recent success of State-Space Models (SSMs) in sequence modeling has motivated their adaptation to graph learning, giving rise to Graph…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks

Large Language Models (LLMs) have demonstrated remarkable performance across various tasks by effectively utilizing a prompting strategy. H…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Self-Cascaded Diffusion Models for Arbitrary-Scale Image Super-Resolution

Arbitrary-scale image super-resolution aims to upsample images to any desired resolution, offering greater flexibility than traditional fix…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models

We present Athena-PRM, a multimodal process reward model (PRM) designed to evaluate the reward score for each step in solving complex reaso…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Genre Controlled Music Generation via Activation Steering

Computational Music Generation is evolving towards non-conventional styles, demanding methods that enable precise and controllable blending…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

Adaptive Multi-prompt Contrastive Network for Few-shot Out-of-distribution Detection

Out-of-distribution (OOD) detection attempts to distinguish outlier samples to prevent models trained on the in-distribution (ID) dataset f…

2026-05-27 13:00 JST | arXiv cs.AI | Agents

Real-Time Progress Prediction in Reasoning Language Models

Recent reasoning language models, particularly those that employ long latent chains of thought, achieve strong performance on complex agent…

2026-05-27 13:00 JST | arXiv cs.AI | Image/Video Generation, Business/Funding

"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models

Video generation models have achieved remarkable progress in creating high-quality, photorealistic content. However, their ability to accur…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors

PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization

In-Context Learning has shown great potential for aligning Large Language Models (LLMs) with human values, helping reduce harmful outputs a…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

A Physics-Informed Hierarchical Neural Network for Microwave Scattering Analysis of 3D PEC Targets

Accurate modeling of scattering from three-dimensional (3D) perfectly electrically conducting (PEC) targets at microwave frequencies consti…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI

How Reliable are LLMs for Reasoning on the Re-ranking task?

With the improving semantic understanding capability of Large Language Models (LLMs), they exhibit a greater awareness and alignment with h…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Conceptual Schema Inference for Tabular Datasets using Large Language Models

Large collections of tabular data from data lakes, web tables and open data portals often originate from heterogeneous sources, leading to…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards

Reinforcement learning with verifiable rewards (RLVR) is a practical, scalable way to improve large language models on math, code, and othe…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Scalable GANs with Transformers

Scalability has driven recent advances in generative modeling, yet its principles remain underexplored for adversarial learning. We investi…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding, Research/Papers

When LLMs Benchmark Themselves: Deconstructing Self-Bias in Automated Evaluation

As LLMs rapidly saturate existing benchmarks, automated benchmark creation using LLMs (LLM-as-a-benchmark) -- where a model generates test…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Searching the Internet for Challenging Benchmarks at Scale

Many static benchmarks are beginning to saturate: as models rapidly improve, they achieve near-perfect scores on fixed test sets, leaving l…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

HiSpec: Hierarchical Speculative Decoding for LLMs

Speculative decoding accelerates LLM inference by using a smaller draft model to speculate tokens that a larger target model verifies. Veri…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Inference-Time Search Using Side Information for Diffusion-Based Image Reconstruction

Diffusion models have been used as priors for solving inverse problems. However, existing approaches typically overlook side information th…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credit

Diffusion large language models (dLLMs) generate text through iterative denoising. In commonly adopted parallel decoding schemes, each step…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Monte Carlo Permutation Search

We propose Monte Carlo Permutation Search (MCPS), a general-purpose Monte Carlo Tree Search (MCTS) algorithm that improves upon the GRAVE a…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Self-signals Driven Multi-LLM Debate for Efficient and Accurate Reasoning

Large Language Models (LLMs) have exhibited impressive capabilities across diverse application domains. Recent work has explored Multi-LLM…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

EconCausal: A Context-Aware Economic Reasoning Benchmark for Large Language Models

Socio-economic causal effects depend heavily on their institutional and environmental contexts. The same intervention can produce different…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis

Persian remains substantially underrepresented in open speech-text resources, limiting progress in multi-speaker text-to-speech (TTS), spee…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Securing Multi-Agent Systems Against Corruptions via Node Contribution Backpropagation

Multi-Agent Systems (MAS) have become a prevalent paradigm for Large Language Model (LLM) applications. However, the complex multi-agent de…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems

The capacitated location-routing problems (CLRPs) are classical problems in combinatorial optimization, which require simultaneously making…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI規制/政策

SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking

Large-scale vision-language models, especially CLIP, have demonstrated remarkable performance across diverse downstream tasks. Soft prompts…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Explainable Cross-Disease Reasoning for Cardiovascular Risk Assessment from Low-Dose Computed Tomography

Low-dose chest computed tomography (LDCT) captures pulmonary and cardiac structures in a single scan, enabling joint assessment of lung and…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

CFG-OEC: Classifier Free Guidance with Orthogonal Error Correction

Classifier free guidance is a standard method for conditional sampling in diffusion models, but its sampling rule is not aligned with the o…

2026-05-27 13:00 JSTarXiv cs.AI画像/動画生成

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

This report introduces Kandinsky 5.0, a family of state-of-the-art foundation models for high-resolution image and 10-second video synthesi…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Reconstructing Multi-Scale Physical Fields from Extremely Sparse Measurements with an Autoencoder-Diffusion Cascade

Extreme sensor sparsity makes full-field reconstruction a fundamentally ill-posed problem in scientific sensing, where the goal is to infer…

2026-05-27 13:00 JSTarXiv cs.AIエージェント

SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs

Knowledge-based conversational question answering (KBCQA) confronts persistent challenges in resolving coreference, modeling contextual dep…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Mechanistic Interpretability of Antibody Language Models Using SAEs

Sparse autoencoders (SAEs) are a mechanistic interpretability technique that has been used to provide insight into learned concepts within…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

How to Square Tensor Networks and Circuits Without Squaring Them

Squared tensor networks (TNs) and their extension as computational graphs--squared circuits--have been used as expressive distribution esti…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

EHRSummarizer: A Privacy-Aware, FHIR-Native Reference Architecture for Source-Grounded EHR Summarization

Clinicians routinely navigate fragmented electronic health record (EHR) interfaces to assemble a coherent picture of a patient's problems,…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

Faithfulness Evaluation for Decoder-only LLM Attributions with Controlled Retained Information

Large Language Models (LLMs) are increasingly evaluated with input attribution methods, yet comparing such explanations remains challenging…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering and Reasoning

Reliable epidemiological reasoning requires synthesizing study evidence to infer disease burden, transmission dynamics, and intervention ef…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Beyond Binary: Turning Partial Success into Dense Verifiable Rewards for Reinforcement Learning in Code Generation

Effective reward design is a central challenge in Reinforcement Learning (RL) for code generation. Mainstream test-suite-level outcome rewa…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI規制/政策

Shadow Unlearning: A Neuro-Semantic Approach to Fidelity-Preserving Faceless Forgetting in LLMs

Machine unlearning aims to selectively remove the influence of specific training samples to satisfy privacy regulations such as the GDPR's…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

The AI Cognitive Trojan Horse: How Large Language Models May Bypass Human Epistemic Vigilance

Large language model (LLM)-based conversational AI systems present a challenge to human cognition that current frameworks for understanding…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Seeing vs. Believing: Evaluating the Language Bias of Open-Source MLLMs in Counter-Intuitive Scenes

Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in mainstream visual understanding tasks, but their abili…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Beyond Transfer Accuracy: Faithful Circuits for Controlled Low-Resource Adaptation

Existing circuit discovery methods rely on templated tasks with clean counterfactuals, limiting their use on diverse natural text. We adapt…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion

Speech tokenizers are a key building block of fully discrete Speech LLMs. Existing tokenizers either prioritize semantic encoding, fuse sem…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Left-Right Symmetry Breaking in CLIP-style Vision-Language Models Trained on Synthetic Spatial-Relation Data

Spatial understanding remains a key challenge in vision-language models. Yet it is still unclear whether such understanding is truly acquir…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AIハードウェア/半導体

MetaSICL: Adapting Auditory LLM via Meta Speech In-Context Learning

Auditory Large Language Models (LLMs) have demonstrated strong performance across a wide range of speech and audio understanding tasks. Nev…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

LLMs versus the Halting Problem: Characterizing Program Termination Reasoning

Determining whether a program terminates is a central problem in computer science. Turing's Halting Problem established termination as unde…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research

Operations Research practitioners debug infeasible models through an iterative process: inspecting Irreducible Infeasible Subsystems (IIS)…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

ECSEL: Explainable Classification via Signomial Equation Learning

We introduce ECSEL, an explainable classification method that learns formal expressions in the form of signomial equations, motivated by th…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Graph is a Substrate Across Data Modalities

Graphs provide a natural representation of relational structure that arises across diverse domains. Despite this ubiquity, graph structure…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

RulePlanner: All-in-One Reinforcement Learner for Unifying Design Rules in 3D Floorplanning

Floorplanning determines the coordinate and shape of each module in Integrated Circuits. With the scaling of technology nodes, in floorplan…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

GraphDancer: Training LLMs to Explore and Reason over Graphs via Two-Stage Curriculum Post-Training

Large language models (LLMs) increasingly rely on external knowledge to improve factuality, yet many real-world knowledge sources are organ…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Rethinking the Trust Region in LLM Reinforcement Learning

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO)…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Emergent Causal-Geometric Dynamics Across Depth in Large Language Models

Geometric analyses of large language model (LLM) representations reveal structured variation across depth but remain fundamentally correlat…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Scaling GraphLLM with Bilevel-Optimized Sparse Querying

LLMs have recently shown strong potential in enhancing node-level tasks on text-attributed graphs (TAGs) by providing explanation features.…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Olaf-World: Orienting Latent Actions for Video World Modeling

Scaling action-controllable world models is limited by the scarcity of action labels. While latent action learning promises to extract cont…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Constructing Industrial-Scale Optimization Modeling Benchmark

Optimization modeling underpins decision-making in logistics, manufacturing, energy, and finance, yet translating natural-language requirem…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Vital Trace: Protocol-Constrained Patient-State Reasoning for Longitudinal Clinical Trajectories

Longitudinal clinical reasoning over electronic health records requires tracking evolving physiological measurements, laboratory results, a…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Assessing Per-Sample Membership Inference Vulnerability without Retraining

Recent work in the privacy literature shows that sample-targeted membership inference attacks (MIAs) significantly outperform untargeted ap…

2026-05-27 13:00 JSTarXiv cs.AIビジネス/資金調達

GICDM: Mitigating Hubness for Reliable Distance-Based Generative Model Evaluation

Generative model evaluation commonly relies on high-dimensional embedding spaces to compute distances between samples. We show that dataset…

2026-05-27 13:00 JSTarXiv cs.AIエージェント研究/論文

Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every deta…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Adapting Actively on the Fly: Relevance-Guided Online Meta-Learning with Latent Concepts for Geospatial Discovery

In environmental monitoring, data collection is often costly, sparse, and shaped by urgent public-health needs. This is particularly true f…

2026-05-27 13:00 JSTarXiv cs.AIエージェント

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks. This gap stems from two limitations:…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AIエージェント

MedCollab: IBIS-Guided Multi-Agent Collaboration with Hierarchical Disease Relation Chains for Clinical Diagnosis

Large language models (LLMs) have shown promise in clinical diagnosis but remain limited by unreliable report generation, weak evidence gro…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Phase-Type Variational Autoencoders for Heavy-Tailed Data

Heavy-tailed distributions are ubiquitous in real-world data, where rare but extreme events dominate risk and variability. However, standar…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Belief-Sim: Towards Belief-Driven Simulation of Demographic Misinformation Susceptibility

Misinformation is a growing societal threat, and susceptibility to misinformative claims varies across demographic groups due to difference…

2026-05-27 13:00 JSTarXiv cs.AIロボティクス研究/論文

RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manip…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Geometrically Constrained Outlier Synthesis

Deep neural networks for image classification often exhibit overconfidence on out-of-distribution (OOD) samples. To address this, we introd…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

Stop Listening to Me! How Multi-turn Conversations Can Degrade LLM Reliability

Large language models (LLMs) excel on static benchmarks, but their performance across multi-turn conversations, which better reflect real-w…

2026-05-27 13:00 JSTarXiv cs.AIエージェント

Grammar of the Wave: Towards Explainable Multivariate Time Series Event Detection via Neuro-Symbolic VLM Agents

Time Series Event Detection (TSED) aims to localize semantically meaningful events in time series data, with critical applications in high-…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Early Pruning for Public Transport Routing

Routing algorithms for public transport, particularly the widely used RAPTOR and its variants, often face performance bottlenecks during th…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

LR-SGS: Robust LiDAR-Reflectance-Guided Salient Gaussian Splatting for Self-Driving Scene Reconstruction

Recent 3D Gaussian Splatting (3DGS) methods have demonstrated the feasibility of self-driving scene reconstruction and novel view synthesis…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

FedTreeLoRA: Reconciling Statistical and Functional Heterogeneity in Federated LoRA Fine-Tuning

Federated Learning (FL) with Low-Rank Adaptation (LoRA) has become a standard for privacy-preserving LLM fine-tuning. However, existing per…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Ethical Fairness without Demographics in Human-Centered AI

In ubiquitous and mobile health systems, computational models infer human states from wearable, behavioral, and physiological sensing data.…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Beyond Linearity in Attention Projections: The Case for Nonlinear Queries

Recent algebraic analysis shows that in decoder-only and encoder-only transformers, the Query projection $W_Q$ may be set to identity witho…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AIエージェント

APEX-Searcher: Refining Credit Assignment with Subgoaling for Agentic Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) connects large language models (LLMs) to external knowledge, but single-round retrieval is often insuf…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

Omanic: Towards Step-wise Evaluation of Multi-hop Reasoning in Large Language Models

Evaluating the reasoning abilities of large language models (LLMs) solely from final answers can obscure failures in intermediate steps, es…

2026-05-27 13:00 JSTarXiv cs.AI画像/動画生成

Demystifying Video Reasoning

Recent advances in video generation have revealed an unexpected phenomenon: diffusion-based video models exhibit non-trivial reasoning capa…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Alignment Makes Language Models Normative, Not Descriptive

Post-training alignment optimizes language models to match human preference signals, but this objective is not equivalent to modeling obser…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs

When VLMs answer correctly, do they genuinely rely on visual information? We introduce a Tri-Layer Diagnostic Framework with three per-samp…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Detached Skip-Links and $R$-Probe: Decoupling Feature Aggregation from Gradient Propagation for MLLM OCR

Multimodal large language models (MLLMs) excel at high-level reasoning yet fail on OCR tasks where fine-grained visual details are compromi…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Understanding the Challenges in Iterative Generative Optimization with LLMs

Generative optimization uses large language models (LLMs) to iteratively improve artifacts (such as code, workflows or prompts) using execu…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Where Code Meets Natural Language: Taxonomy-Driven Information Flow Analysis for LLM-Integrated Applications

LLM API calls are becoming a ubiquitous program construct, yet they create a boundary that no existing program analysis can cross: runtime…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering

Retrieval-Augmented Generation (RAG) systems depend critically on the quality of document preprocessing, yet no prior study has evaluated P…

2026-05-27 13:00 JSTarXiv cs.AIエージェント

SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

OpenClaw's ClawHub marketplace hosts tens of thousands of community-contributed agent skills (49,592 in our 2026-04-04 snapshot), and recen…

2026-05-27 13:00 JSTarXiv cs.AIエージェント

Strategic Persuasion with Trait-Conditioned Multi-Agent Systems for Iterative Legal Argumentation

Strategic interaction in adversarial domains such as law, diplomacy, and negotiation is mediated by language, yet most game-theoretic model…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

The ATOM Report: Measuring the Open Language Model Ecosystem

We present a comprehensive adoption snapshot of the leading open language models and who is building them, focusing on the ~1.5K mainline o…

2026-05-27 13:00 JSTarXiv cs.AIエージェント

Governed Capability Evolution: Lifecycle-Time Compatibility Checking and Rollback for AI-Component-Based Systems, with Embodied Agents as Case Study

Software systems built from versioned AI components increasingly need lifecycle-time governance: when a capability module evolves into a ne…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

SenBen: Sensitive Scene Graphs for Explainable Content Moderation

Content moderation systems classify images as safe or unsafe but lack spatial grounding and interpretability: they cannot explain what sens…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering

Table serialization remains a critical bottleneck for Large Language Models (LLMs) in complex table question answering, hindered by challen…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Degradation-Consistent Paired Training for Robust AI-Generated Image Detection

AI-generated image detectors suffer significant performance degradation under real-world image corruptions such as JPEG compression, Gaussi…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Where Hindsight Credit Can Reside: A Signed-Capacity View of Token Updates in RLVR

Reinforcement Learning with Verifiable Rewards (RLVR) improves the reasoning ability of Large Language Models (LLMs), but sparse outcome re…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliabi…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Reasoning Primitives in Hybrid and Non-Hybrid LLMs: Do Architectural Differences Yield Advantages in State-Tracking and Recall?

Reasoning in large language models is often discussed as a single capability, but some of its gains may stem from simpler underlying operat…

2026-05-27 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

When VLMs 'Fix' Students: Identifying and Penalizing Over-Correction in the Evaluation of Multi-line Handwritten Math OCR

Accurate transcription of handwritten mathematics is crucial for educational AI systems, yet current benchmarks fail to evaluate this capab…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning

Brain MRI underpins a wide range of neuroscientific and clinical applications, yet most learning-based methods remain task-specific and req…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Certified Purity for Cognitive Workflow Executors: From Static Analysis to Cryptographic Attestation

We present a certified purity architecture that converts governance enforcement in cognitive workflow systems from a runtime convention int…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

VIDA: A dataset for Visually Dependent Ambiguity in Multimodal Machine Translation

Ambiguity resolution is a key challenge in multimodal machine translation (MMT), where models must genuinely leverage visual input to map a…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

MultiSense-Pneumo: A Multimodal Learning Framework for Pneumonia Screening in Resource-Constrained Settings

Pneumonia remains a leading global cause of morbidity and mortality, particularly in low-resource settings where access to imaging, laborat…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection

Representation Engineering analyses often characterize refusal using static directions extracted from terminal or pooled representations. W…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Cryptographic Registry Provenance: Structural Defense Against Dependency Confusion in AI Package Ecosystems

Dependency confusion attacks exploit a structural gap in software distribution: once a package is installed, there is no cryptographic proo…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

PHALAR: Phasors for Learned Musical Audio Representations

Stem retrieval, the task of matching missing stems to a given audio submix, is a key challenge currently limited by models that discard tem…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Governed Metaprogramming for Intelligent Systems: Reclassifying Eval as a Governed Effect

AI systems increasingly synthesize executable structure at runtime: LLMs generate programs, agents construct workflows, self-improving syste…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations

Benchmarks like GSM8K are popular measures of mathematical reasoning, but leaderboard gains can overstate true capability due to memorizati…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

Post-training makes large language models less human-like

Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture h…

2026-05-27 13:00 JSTarXiv cs.AIエージェント

Tool Calling is Linearly Readable and Steerable in Language Models

When a tool-calling agent picks the wrong tool, the failure is invisible until execution: the email gets sent, the meeting gets missed. As…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning

Multi-modal learning has attracted great attention in visual-text tasks. However, visual-tabular data, which plays a pivotal role in high-s…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan

The diachronic evolution from Latin to the Romance languages involved a restructuring of the grammatical gender system from a tripartite co…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AIハードウェア/半導体

From Detection to Recovery: Operational Analysis on LLM Pre-training with 504 GPUs

Large-scale AI training is now fundamentally a distributed systems problem, and hardware failures have become routine operating conditions…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation

Recent think-answer approaches in VLMs, such as Qwen3-VL-Thinking, boost reasoning performance by leveraging intermediate thinking steps be…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

GraphIP-Bench: How Hard Is It to Steal a Graph Neural Network, and Can We Stop It?

Graph neural networks (GNNs) deployed as cloud services can be stolen through model-extraction attacks, which train a surrogate from query…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT ta…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict

The Context-Compliance Regime in Retrieval-Augmented Generation (RAG) occurs when retrieved context dominates the final answer even when it…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Access Timing as Scaffolding: A Reinforcement Learning Approach to GenAI in Education

In recent years, generative AI (GenAI) in educational settings has become ubiquitous in university students' daily lives, despite its poten…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

CitePrism: Human-in-the-Loop AI for Citation Auditing and Editorial Integrity

Editors and reviewers are expected to ensure that manuscripts cite relevant, accurate, current, and ethically appropriate literature, yet m…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Identifiable Token Correspondence for World Models

Token-based transformer world models have shown strong performance in visual reinforcement learning, but often suffer from temporal inconsi…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Echoes in Filter Bubble: Diagnosing and Curing Popularity Bias in Generative Recommenders

Recently, Generative Recommenders (GRs), characterized by a unified end-to-end framework, have exhibited astonishing potential in transform…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exh…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

Rubric-based reward shaping provides interpretable and editable reward signals for fine-tuning LLMs via reinforcement learning (RL), but ex…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

FLUIDSPLAT: Reconstructing Physical Fields from Sparse Sensors via Gaussian Primitives

Reconstructing continuous flow fields from sparse surface-mounted sensors is central to aerodynamic design, flow control, and digital-twin…

2026-05-27 13:00 JSTarXiv cs.AIエージェント

Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty

Simulation-based testing of self-driving cars (SDCs) typically relies on scripted pedestrian models that do not capture the heterogeneity a…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

A Sharper Picture of Generalization in Transformers

We study transformers' generalization behavior on boolean domains from the perspective of the Fourier spectra of their target functions. In…

2026-05-27 13:00 JSTarXiv cs.AILLM/生成AI

One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs

Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate ac…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

BioFormer: Rethinking Cross-Subject Generalization via Spectral Structural Alignment in Biomedical Time-Series

Cross-subject generalization in biomedical time-series refers to training on data from some subjects and testing on unseen subjects. The key…

2026-05-27 13:00 JSTarXiv cs.AI研究/論文

CogAdapt: Transferring Clinical ECG Foundation Models to Wearable Cognitive Load Assessment via Lead Adaptation

Real-time cognitive load assessment is essential for adaptive human-computer interaction but remains challenging due to limited labeled dat…

2026-05-27 12:30 JSTITmedia AI+ロボティクス

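Humanoid AI robot startup Atom (アトム) raises 3 billion yen, aiming to "raise Japan's GDP by 1%"

Atom (Koto Ward, Tokyo), which develops humanoid AI robots, announced on May 27 that, along with starting development, it has raised a total of 3 billion yen in a seed round. It will develop robots for use at manufacturing, logistics, and transportation sites, with the aim of future mass production.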

2026-05-27 11:55 JSTITmedia AI+エージェント

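Notion releases a new development platform, moving from knowledge sharing to a "platform for collaborating with AI"

Notion Labs has announced the Notion Developer Platform, a set of features for developers. It provides a CLI, the "Workers" runtime environment, an API for integrating external agents, and more, forming a development platform for building AI agents and business workflows in Notion.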

2026-05-27 10:39 JSTITmedia AI+その他

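Ensuring fair elections: measures against "false" social media posts revealed; ruling and opposition parties' proposal would require alteration labels on AI-generated video and images

The full outline of a draft bill to amend related laws became known on May 26. The bill had been under consideration by a ruling and opposition party council on election campaigning as a countermeasure against the spread of false and misleading information on social media during election periods.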

2026-05-27 10:30 JSTITmedia AI+その他

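小説家になろう (Shōsetsuka ni Narō) to make reporting of AI usage mandatory; posting blocked from September if the setting is not configured

The operator of the web novel posting site 小説家になろう announced on May 26 that it will make it mandatory to set the status of AI use in the creation of works. In a setting item to be added on June 9, authors will choose from four categories according to the degree of AI involvement. Works with heavy AI use will disclose this in the keyword field and elsewhere.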

2026-05-27 09:00 JSTITmedia AI+その他

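Mazda overhauls its integrated storage platform, reducing operational load with future AI use in view

Mazda has adopted Dell Technologies' "Dell PowerScale" storage and built an integrated storage platform for model-based development, CAD, and archive use. In addition to handling the growth in design and development data, it expanded total storage capacity to about 10 PB and cut the unit cost of capacity to about one tenth of the previous level.…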

2026-05-27 08:00 JSTITmedia AI+その他

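New work created by AI: 90% call it a "burden"; the "inconvenient reality" of AIOps

According to one survey, about 75% of information systems staff who have adopted AIOps feel their workload has been reduced, while about 90% feel that the tasks created by using AI are a "burden."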

2026-05-27 08:00 JSTITmedia AI+研究/論文

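"New graduate recruitment support market" expands amid a shrinking youth population: how will AI change hiring?

With the youth population shrinking, securing talented people has become a challenge. According to Yano Research Institute, the market for new graduate recruitment support services is on an expansion trend. How is AI changing this market?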

2026-05-27 08:00 JSTITmedia AI+エージェント

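Fujitsu develops AI agent technology that evolves with changes in business operations

Continuing to use AI agents in corporate operations that face ongoing legal revisions and specification changes has required continual adjustment by specialists. How does Fujitsu's "self-evolving multi-AI agent technology" change this premise?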

2026-05-27 07:32 JSTTechCrunch AIエージェント

DuckDuckGo installs are up 30% as users reject being ‘force-fed’ Google’s AI Search

Google overhauled Search at I/O 2026, replacing blue links with AI agents. The backlash has been swift. DuckDuckGo app installs spiked 30%…

2026-05-27 03:33 JSTTechCrunch AIビジネス/資金調達

OpenRouter more than doubles valuation to $1.3B in a year

OpenRouter has raised a $113 million Series B led by CapitalG. Its 5x growth in usage over six months indicates the multi-AI-model future i…

2026-05-27 03:08 JSTITmedia AI+LLM/生成AI規制/政策

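Japanese association of publishers, anime studios, and others issues statement on "rights infringement by AI": "a problem that cannot be overlooked"

The Content Overseas Distribution Association (CODA), a body made up of publishers, anime production companies, and others, has issued a statement on copyright infringement by generative AI. It calls on businesses that develop AI or provide AI services to protect rights, among other measures.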

2026-05-27 01:00 JSTTechCrunch AIロボティクス研究/論文

This startup is betting India’s gig economy can train the world’s robots

Human Archive, a startup founded by UC Berkeley and Stanford researchers, is paying gig workers in India to wear camera-equipped caps and s…

2026-05-26 (772 items)

2026-05-26 23:55 JSTTechCrunch AIその他

Universal Music Group and TikTok renew agreement to combat unauthorized AI music

For years, UMG has pushed platforms, streaming services, and AI companies to implement stricter content moderation policies.

2026-05-26 23:00 JSTTechCrunch AIその他

TechCrunch Disrupt 2026 Early Bird ticket rates end May 29

Save up to $410 on your TechCrunch Disrupt 2026 pass before prices increase on May 29 at 11:59 p.m. PT. Register here to join the tech epic…

2026-05-26 19:20 JSTITmedia AI+ハードウェア/半導体

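Huawei targets "1.4 nm-equivalent" semiconductors by 2031 and proposes a new law to replace Moore's Law

China's Huawei has proposed the "τ scaling law," a new law for semiconductor progress. In place of conventional miniaturization, it compresses signal delay and raises transistor density. The company says it will first apply its own circuit technology, LogicFolding, to the Kirin chip due this autumn, and aims for 1.4 nm-equivalent density in 2031.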

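2026-05-26 14:15 JST | ITmedia AI+ | Regulation/Policy, Research/Papers

Beware of "AI fake videos" of Konosuke Matsushita: PHP Institute reissues warning following Kenjiro Tsuda's lawsuit against TikTok

PHP Institute, which runs publishing and education businesses, posted a renewed warning on its official X account on May 26, saying that fake videos made by AI-synthesizing the image and voice of its late founder, Konosuke Matsushita, without permission continue to circulate.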

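2026-05-26 13:00 JST | ITmedia AI+ | LLM/Generative AI

"Flex" lets you use the Gemini API at "half price": What are the caveats, and how does it differ from "Priority"?

Google has added two new service tiers, "Flex" and "Priority," for the Gemini API. Flex is said to cost half as much as the standard service tier. How do the two differ, and how should each be used?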

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models

We are in the midst of large-scale industrial and academic efforts to automate the processes of scientific, technological and creative prod…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Confidence Calibration in Large Language Models

We investigate the calibration of large language models' (LLMs') confidence across diverse tasks. The results of our preregistered study sh…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and en…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction

We present Context, the intelligence layer of the Magarshak Architecture, which replaces reactive query-response chatbots with proactive go…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

Modern AI systems increasingly rely on workflows composed of multiple interacting agents, some powered by large language models (LLMs) and…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game

We introduce Quantum Frog, a two-player cooperative game built on a novel quantized-time mechanic in which the environment ad…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

BODHI: Precise OS Kernel Specification Inference

The formal verification of operating system kernels requires precise specifications that capture the intended behavior of system calls. Wri…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

Despite strong medical benchmark accuracy, LLMs can exhibit severe multi-turn sycophancy in clinical dialogue, abandoning initial correct d…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model

Quantum computing devices are recognized as powerful tools for solving NP-complete problems. However, the intricacy of their modeling prese…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems

Autonomous agent systems fail not only due to incorrect decisions, but due to executing decisions whose authority no longer holds at runtim…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Fuzzy, Neutrosophic, and Uncertain Graph Theory: Properties and Applications

This book presents a comprehensive and systematic survey of graph theory under uncertainty, with particular emphasis on the unifying role o…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

BoxLitE: A Faithful Knowledge Base Embedding Based on Convex Optimization

Knowledge base (KB) embeddings aim at combining the capability of classical knowledge graph embeddings to generalize the information presen…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors

Large language models (LLMs) increasingly fuse heterogeneous inputs in ubiquitous systems. Yet, how LLMs implicitly allocate authority when…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning

Web agents require both high-level reasoning (for task decomposition) and low-level interactions (for page-element manipulation) to conduc…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning

How do multi-turn reasoning systems fail? The expected answer is logical contradiction, in which the system's maintained state becomes unsa…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Robotics

MEMOR-E: In-Context and Fine-Tuned LLM Personalization for Alzheimer's Assistive Robotics

Alzheimer's disease is a neurodegenerative disorder marked by progressive declines in memory and language that reduce independence in daily…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

A Dynamical Framework for Cognitive Processes Based on Transformations and Semantic Equivalence

This paper proposes a structural and dynamical framework for modeling cognitive processes within a cybernetic perspective. Cognitive states…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Spacetime Formation under Requirements: Contextual Realization and Form-Dependent Probability

Quantum cognition often explains order effects, contextuality, and violations of the law of total probability by replacing classical probab…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Right-Sizing Communication and Recommendation Set Size in AI-Assisted Search

We model the interaction between a user and an AI driven recommendation system. The user initiates the process by conveying preference info…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Accelerating Long-Tail Generation in Synchronous RLHF Training via Adaptive Tensor Parallelism

Reinforcement Learning from Human Feedback (RLHF) has become a key post-training paradigm for improving model quality. However, the synchro…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

Stop Comparing LLM Agents Without Disclosing the Harness

This position paper argues that, for long-horizon tasks evaluated across models with comparable frontier capability, the agent execution ha…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Research/Papers

Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof

The companion paper introduced a four-level verification lattice on agent-skill manifests (unverified, declared, tested, formal) and left t…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Business/Funding

Machine Psychometrics: A Mathematical Psychology of Artificial Intelligence

Artificial agents now generate behavior rich enough to invite trust, surprise, and concern, yet our evaluation tools still privilege capabi…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

From Accuracy to Auditability: A Survey of Determinism in Financial AI Systems

Deploying machine learning in regulated financial environments -- credit risk, fraud detection, and anti-money laundering -- exposes critic…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

QUIVER: A Formal Framework for Quantifying Perturbation Propagation and Bifurcation in Compound AI Systems

Compound AI systems that chain multiple LLM calls into directed computation graphs are now the dominant architecture for production AI. Alt…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Low-Cost Labels, Reliable Choices: Rollout-Calibrated Hyper-Heuristics for Job Shop Scheduling

Learning-assisted hyper-heuristics can select among dispatching rules while preserving the feasibility and interpretability of constructive…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs

Large Language Models (LLMs) achieve strong performance on logical reasoning benchmarks, yet their reliability remains uncertain. Existing…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform

Large language models achieve strong performance in language generation and knowledge-intensive tasks, yet remain limited in settings requi…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Saturating Scaling Laws for Equational Discovery: A Phenomenology of Growth Dynamics in Three Toy Substrates with Two Real-World Replications

We investigate growth dynamics in deterministic equational discovery substrates. Across three toy domains (arithmetic, boolean, higher-orde…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Robotics, Hardware/Semiconductors

Beyond Predefined Learning Objects: A Thinking-Learning Interaction Model for Up-to-Date Autonomous Robot Learning

Autonomous robots operating in open and changing environments cannot always rely on predefined inputs, outputs, and action routines. Althou…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Agentic AI systems -- Large Language Models (LLMs) augmented with planning, tool use, memory, and long-horizon interactions -- can execute…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Reason--Imagine--Act: Closed-Loop LLM Decision Making with World Models for Autonomous Driving

Large language models (LLMs) are promising for autonomous driving, but semantics-only decision policies can yield physically unsafe behavio…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

The evolution of Large Language Model (LLM) reasoning is bottlenecked by the scarcity of high-quality process data. While self-alignment vi…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

EvoSci: A Bio-Inspired Multi-Agent Framework for the Evolution of Scientific Discovery

Large language models (LLMs) have shown strong potential in scientific discovery, yet existing methods still face substantial challenges i…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Breaking the Chains of Probability: Neutrosophic Logic as a New Framework for Epistemic Uncertainty in Large Language Models

Large Language Models (LLMs) are predominantly governed by probabilistic frameworks in which the sum of outcome probabilities is constraine…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Research/Papers

EvoCode-Bench: Evaluating Coding Agents in Multi-Turn Iterative Interactions

Coding agents are increasingly used as iterative development partners, but most benchmarks still evaluate one specification followed by one…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills

Large language model (LLM) agents accumulate rich episodic trajectories while solving real-world tasks, but it remains unclear whether such…

2026-05-26 13:00 JST | arXiv cs.AI | Business/Funding

MAPLE: Multi-State Aggregated Policy Evaluation for AlphaZero in Imperfect-Information Games

Imperfect-information games (IIGs) are challenging, as players must make decisions without fully observing the true game state. While Alpha…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

HyperGuide: Hyperbolic Guidance for Efficient Multi-Step Reasoning in Large Language Models

Multi-step reasoning remains a central challenge for large language models: single-pass generation is efficient but lacks accuracy; tree-se…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Neuro-Inspired Inverse Learning for Planning and Control

We present a neuro-inspired framework for embodied planning and control. Building on three principles that enable fast and highly effective…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Palette: A Modular, Controllable, and Efficient Framework for On-demand Authorized Safety Alignment Relaxation in LLMs

Current safety alignment of foundation models largely follows a one-size-fits-all paradigm, applying the same refusal policy across…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Inference Time Context Sparsity: Illusion or Opportunity?

Sparsity has long been a central theme in LLM efficiency, but its role in context processing remains unresolved. As LLM workloads shift tow…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

EPPC-OASIS: Ontology-Aware Adaptation and Structured Inference Refinement for Electronic Patient-Provider Communication Mining in Secure Messages

Secure patient-provider messages contain clinically important communication behaviors that are difficult to characterize manually at scale.…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

A Sober Look at Agentic Misalignment in Automated Workflows

We study a class of emergent misalignment in multi-agent systems (MAS), with a focus on automated workflows, which we refer to as agentic misa…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

Multi-agent LLM workflows route inference through specialized roles to lift end-task accuracy, but jointly training those roles with reinfo…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Identifying and Mitigating Systemic Measurement Bias in Production LLM Inference Benchmarks

As Large Language Models (LLMs) transition from research environments to production deployments, evaluating their performance against stric…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Large Language Models (LLMs) are increasingly deployed as autonomous agents that reason, use tools, and act over multiple steps. Yet most h…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

How Well Do Models Follow Their Constitutions?

Frontier AI developers now train models against long written behavioral specifications, such as Anthropic's constitution (Anthropic, 2025a)…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Toward Enactive Artificial Intelligence

In this paper, we advocate for incorporating enactive approaches to perception and cognition into artificial intelligence (AI). Enactive ap…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Safety-Oriented Routing Analysis of Mixtral MoE Under Benign and Harmful Prompts

Sparse mixture-of-experts (MoE) language models activate only a small subset of parameters for each token, making router behavior a central…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

When Does Synthetic Patent Data Help? Volume-Fidelity Trade-offs in Low-Resource Multi-Label Classification

We study when LLM-generated synthetic data helps low-resource multi-label patent classification, separating true synthetic value from the c…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Adaptive Human-AI Coordination via Hierarchical Action Disentanglement

Human-AI collaboration requires agents that can adapt to diverse partner behaviors and skill levels while remaining robust to unseen partne…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Partner-Aware Hierarchical Skill Discovery for Robust Human-AI Collaboration

Multi-agent collaboration, especially in human-AI teaming, requires agents that can adapt to novel partners with diverse and dynamic behavi…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Distilling Game Code World Model Generation into Lightweight Large Language Models

Large Language Models (LLMs) have shown great ability in generating executable code from natural language, opening the possibility of autom…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

A governance horizon for ethical-use constraints in open-weight AI models

Ethical constraints on open-weight AI models are both a reflection of societal concerns and a foundation for AI governance policy. They are…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

Long chains of thought (CoT) from current language models frequently contain logical gaps and unjustified leaps, limiting the gains from ad…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

ConceptM$^3$oE: Concept-Guided Multimodal Mixture of Experts for Interpretable Computational Pathology

Healthcare models are transitioning from unimodal prediction toward multimodal reasoning over heterogeneous diagnostic inputs. In computati…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Advancing Graph Few-Shot Learning via In-Context Learning

Graph few-shot learning, which aims to classify nodes from novel classes with only a few labeled examples, is a widely studied problem in g…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

The Model Is Not the Product: A Dual-Pillar Architecture for Local-First Psychological Coaching

Existing language model applications struggle to meet the demand for emotionally oriented support, primarily due to their inability to main…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

JT-SAFE-V2: Safety-by-Design Foundation Model with World-Context Data

We introduce JT-Safe-V2, a large language model designed to advance the safety and trustworthiness of foundation models, extending our prev…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Research/Papers

Benchmarking the Limits of In-Context Reinforcement Learning for Ad-Hoc Teamwork

In-Context Reinforcement Learning (ICRL) has enabled foundation agents to adapt instantaneously to novel tasks, yet its efficacy in Ad-Hoc…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent

Long-horizon agentic reasoning requires large language models to act over long interaction histories containing thoughts, tool calls, obser…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

SPACE: Unifying Symmetric and Asymmetric Routing Problems for Generalist Neural Solver

Generalist neural routing solvers have shown great potential in solving diverse vehicle routing problems (VRPs) with a unified model. Howev…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning

Recent progress on long-horizon agentic tasks has been driven largely by scaling up individual agents through stronger models, better tools…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

TIGER: Text-Informed Generalized Enzyme-Reaction Retrieval

Enzyme-reaction retrieval is a fundamental problem in computational biology, underpinning enzyme characterization, reaction mechanism eluci…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Market Regime Council for Dynamic Credit Assignment in Multi-Agent LLM Decision Systems

Multi-agent LLM decision systems for portfolio management still lack a principled way to assign credit across specialist agents, remain vul…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Reasoning as an Attack Surface: Adaptive Evolutionary CoT Jailbreaks for LLMs

Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in reasoning and generation tasks and are increasingly deployed in…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Hypothesis Generation and Inductive Inference in Children and Language Models

Real world decision-making requires constructing mental models under uncertainty over evidence, over the underlying causal rules, and over…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

DemoEvolve: Overcoming Sparse Feedback in Agentic Harness Evolution with Demonstrations

Agent harness evolution improves frozen language-model agents by modifying the executable structures around them. We study this paradigm as…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Emission-Aware Reinforcement Learning for Sustainable Electric Vehicle Charging and Carbon Dioxide Reduction Under Varying Renewable Penetration

The rapid growth of Electric Vehicle (EV) adoption challenges power distribution networks through peak load spikes, voltage instability, an…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Beyond Control-Flow: Integrating the Resource Perspective into Multi-Collaborative Process Modeling from Text

Process modeling is a sub-domain of Business Process Management (BPM) focused on the translation of process artifacts into formal models. T…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

PALoRA: Projection-Adaptive LoRA for Preserving Reasoning in Large Language Models

Efficiently updating Large Language Models (LLMs) with new or evolving factual knowledge remains a central challenge, as even parameter-eff…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Jailbreak to Protect: Buffering and Reinforcing via Temporary Jailbreaking for Safe Fine-Tuning in Large Language Models

Fine-tuning-as-a-Service (FaaS) enables personalization of large language models (LLMs), but it can weaken safety-alignment under harmful f…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Summoning the Oracle to Slay It: Mitigating Look-Ahead Bias in Financial Backtesting with Large Language Models

Backtesting large language models (LLMs) on historical financial data is unreliable because pre-training cuts off after the events happened…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Associations between echocardiographic traits and AI-ECG predictions of heart failure

Artificial intelligence-enabled electrocardiography (AI-ECG) can detect heart failure (HF), including disease not captured by left ventricu…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

HeartBeatAI: An Interpretable and Robust Deep Learning Framework for Multi-Label ECG Arrhythmia Detection

While Deep Learning (DL) enhances automated electrocardiogram (ECG) analysis, clinical deployment is hindered by class imbalance and the ge…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Learning to Reason Efficiently with A* Post-Training

Many applications of large language models (LLMs) require deductive reasoning, yet models frequently produce incorrect or redundant inferen…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Hera: Learning Long-Horizon Coordination for Device-Cloud Collaborative LLM Agents

Large language model (LLM) agents excel at solving complex long-horizon tasks through autonomous interaction with environments. However, th…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Hardware/Semiconductors

Agent-as-Peer-Debriefer: A Multi-Agent Framework with Perspective-Based Refinement for Qualitative Analysis

Large language models (LLMs) are increasingly used for qualitative data analysis (QDA), yet their outputs often miss the depth and nuance o…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Lattice theory and algebraic models for deep convolutional learning based on mathematical morphology

We develop a rigorous algebraic framework for deep convolutional architectures, CNNs, ResNets, and encoder--decoder networks such as UNet,…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

GlobalDentBench: A Multinational Benchmark for Evaluating LLM Clinical Reasoning in Dentistry with Expert Calibration

While large language models (LLMs) hold transformative potential for medicine, their reasoning robustness and safety in real-world clinical…

2026-05-26 13:00 JST | arXiv cs.AI | Business/Funding, Research/Papers

AVBench: Human-Aligned and Automated Evaluation Benchmark for Audio-Video Generative Models

Rapid advances in audio-video (AV) generation have enabled high-fidelity synthesis with synchronized sound, particularly for human-related…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Beyond Inference-Only Deployment: Comparing Weight-Based Consolidation Against Cascading Compaction

Major LLM platforms deploy models in an inference-only configuration: the model serves requests but never updates per-user weights. Users m…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework

LLMs have achieved remarkable success in complex reasoning tasks, yet current evaluation approaches predominantly rely on final-answer corr…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

When Mean CE Fails: Median CE Can Better Track Language Model Quality

Mean cross-entropy is the standard validation metric for language models, but it can fail to track model quality during training. We examin…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Exploration of Perceptual Speech Features for Clinical Decision-Support in Mental Health Care

Speech and language technologies offer valuable opportunities for supporting mental health assessment through objective and interpretable c…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Emotional intelligence in large language models is fragmented across perception, cognition, and interaction

As large language models (LLMs) are increasingly integrated into emotionally sensitive domains, the structural integrity of their emotional…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

MDIA: A Multi-Agent Diagnostic Intelligence Pipeline on HealthBench Professional

Most reported gains on agentic-LLM clinical benchmarks are often attributed to prompt engineering, yet our results suggest that larger impr…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Fundamental Limitation in Explaining AI

While large-scale models such as LLMs and diffusion models have achieved practical success, public institutions have emphasized the importa…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Hylos: Operability Contracts for Model-Native Spatial Intelligence

Foundation models can increasingly describe, reconstruct, and generate 3D objects, assemblies, scenes, and environments, but visually plaus…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Automated Detection and Classification of Delusion-related Content in Naturalistic Audio Diaries Using Multi-Agent Language Models

Speech monologues recorded in naturalistic settings provide opportunities to characterize mental illness phenomenology and detect symptom e…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Business/Funding

Proper Scoring Rules for Agentic Uncertainty Quantification

Language-model agents increasingly emit uncertainty signals throughout a trajectory, but existing agentic UQ evaluations often conflate ran…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Uncertainty Decomposition via Cyclical SG-MCMC and Soft-label Learning for Subjective NLP

Annotator disagreement in emotion classification reflects ambiguity intrinsic to emotion concepts and is essential for predictor-quality as…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding, Research/Papers

PRIMA: Operational Patterns for Resilient Multi-Agent Research with Verifiable Identity and Convergent Feedback

Operating LLMs as coordinated multi-agent research systems over multi-hour runs surfaces failure modes that single-shot evaluation cannot:…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

GRAIL: AI translation for scientists application workflow on satellite data

Domain scientists increasingly develop Python scripts to analyze satellite imagery, but these scripts do not scale to large datasets. This pape…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, off…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

CoRe-Code: Collaborative Reinforcement Learning for Code Generation

Large language models (LLMs) have achieved strong performance in code generation, but most methods rely on autoregressive decoding without…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Agent Manufacturing: Foundation-Model Agents as First-Class Industrial Entities

Manufacturing has passed through four widely recognized paradigms - mechanization, electrification, programmable automation, and Smart Manu…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Test-Time Deep Thinking to Explore Implicit Rules

With the continuous advancement of Large Language Models (LLMs), intelligent agents are becoming increasingly vital. However, these agents…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Geo-Expert: Towards Expert-Level Geological Reasoning via Parameter-Efficient Fine-Tuning

While general-purpose Large Language Models (LLMs) applied to Geology often hallucinate when reasoning about subsurface structures and deep…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Solving Combinatorial Counting Problems with Weighted First-Order Model Counting

Combinatorial counting problems pervade artificial intelligence, statistics, and discrete mathematics. Whether the task is enumerating subs…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Clustering as Reasoning: A $k$-Means Interpretation of Chain-of-Thought Graph Learning

Chain-of-Thought (CoT) prompting has shown promise in enhancing the reasoning capabilities of large language models (LLMs) on text-attribut…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

Inverting the Shield: Systematically Generating Safety Tests from Policy Specifications

The widespread integration of Large Language Models (LLMs) necessitates rigorous and systematic safety evaluation. Existing paradigms eithe…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

TaBIIC2: Interactive Building of Ontological Taxonomies using Weighted Self-Organizing Maps

Ontologies represent the conceptual knowledge of a domain. At the core of an ontology is the taxonomy of concepts and subconcepts that repr…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

ProActor: Timing-Aware Reinforcement Learning for Proactive Task Scheduling Agents

Proactive task-oriented agents must autonomously anticipate user needs, identify actionable opportunities, and trigger software actions at…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Noise-Robust Financial Numerical Entity Attribute Tagging

Financial Numerical Entity (FNE) understanding aims to recover the meaning of numerical mentions in financial reports. Existing studies pri…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Energy Shields for Fairness

Runtime fairness is not a one-time constraint but a dynamic property evaluated over a sequence of decisions. To ensure fairness at runtime,…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Towards Multi-Turn Dialog Systems for Industrial Asset Operations and Maintenance

Industrial asset operations and maintenance question answering is inherently multi-turn, iterative, and highly dependent on external tool i…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Mitigating Object Hallucinations in Vision-Language Models through Region-Aware Attention Recalibration

The generation of factually incorrect objects, commonly known as object hallucination, remains a persistent challenge in Large Vision-Langu…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

NeurIPS: Neuro-anatomical Inductive Priors for Sphere-based Brain Decoding

Current fMRI decoders face a performance-fidelity trade-off where efficient ID encoders outperform geometrically faithful surface-based mod…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Privacy-Preserving Local Language Models for Longitudinal Data Retrieval in Chronic Dermatologic Disease: Implementation in Pemphigus Patients

Chronic dermatologic diseases such as pemphigus require long-term follow-up, generating extensive longitudinal clinical documentation that…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

AION: Next-Generation Tasks and Practical Harness for Time Series

Time series research is moving beyond fixed forecasting benchmarks toward realistic tasks that combine prediction, contextual reasoning, to…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Evolutionary Enhanced Multi-Agent Reinforcement Learning for Cooperative Air Combat

As modern air combat evolves toward beyond-visual-range (BVR) multi-aircraft cooperative engagements, autonomous decision-making for unmann…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

RECTOR: Priority-Aware Rule-Based Reranking for Compliance-Aware Autonomous Driving Trajectory Selection

Autonomous driving stacks must pick one trajectory from a multi-modal candidate set; choosing by model confidence ignores safety, traffic-l…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Trust but Verify: Prover-Verifier Deliberation for Selective LLM Prediction

Reliably knowing when a language model is correct is almost as important as being correct. We introduce prover-verifier deliberation (PVD),…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Beyond the Frontier: Stochastic Backtracking for Efficient Test-Time Scaling

Test-time scaling improves language model reasoning by spending additional compute to explore multiple solution trajectories. The key chall…

2026-05-26 13:00 JST | arXiv cs.AI | Hardware/Semiconductors

Representation Without Control: Testing the Realization Effect in Language Models

Large language models are increasingly used as behavioral simulators, but it remains unclear when their outputs reflect human-like cognitiv…

2026-05-26 13:00 JSTarXiv cs.AIエージェント研究/論文

SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

Mobile GUI agents powered by large language models have progressed rapidly, creating urgent needs for realistic and comprehensive evaluatio…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

SpecAlign: A Semantic Alignment Framework for SystemVerilog Assertion Generation

Existing Large Language Model (LLM) approaches to SystemVerilog Assertion (SVA) generation primarily focus on syntactic validity and formal…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェントハードウェア/半導体

DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs

Multi-agent LLM systems improve reasoning by combining outputs from multiple agents, but interaction-heavy methods can introduce error prop…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Boosting Inference with Guided Reasoning: Stochastic Exploration for Recursive Models

Recent work on recursive architectures has shown that tiny neural networks can be surprisingly powerful on structured reasoning tasks. The…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Meta-Agent: From Task Descriptions to Verified Multi-Agent Systems

AI agents are increasingly used to solve complex, multi-step tasks, but existing multi-agent frameworks remain brittle as workflows grow in…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization

Large language models (LLMs) are increasingly used for optimization modeling and solver-code generation, yet practical operations research…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェントビジネス/資金調達

LipoAgent: Coordinating Fine-Tuned LLM Agents for Safer Lipid Design

Lipid nanoparticles (LNPs) are among the most clinically mature platforms for nucleic acid delivery, yet designing lipids that are both eff…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Whose Alignment? Comparing LLM Process Alignment Across Diverse Organizational Decision Contexts

Aligning AI systems with organizational decision-making is typically framed as a single-target problem: make the model behave like the orga…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

AI Cartography: Mapping the Latent Landscape of AI Benchmark Ecosystems

While aggregate leaderboard scores drive AI development, they contain substantial measurement noise whose sources and magnitudes remain unq…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Context-CoT: Enhancing Context Learning via High-Quality Reasoning Synthesis

While LLMs excel at reasoning over prompts using static pretrained knowledge, they struggle significantly with context learning, the ability…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Second Guess: Detecting Uncertainty Through Abstention and Answer Stability in Small Language Models

Large language models often generate confident but incorrect answers rather than abstaining when uncertain. This problem is particularly ac…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Towards end-to-end LLM-based censoring-aware survival analysis

Objective: Survival analysis is central to medical prediction, yet large language models (LLMs) are rarely used as end-to-end survival mode…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

CODESKILL: Learning Self-Evolving Skills for Coding Agents

Coding agents produce rich trajectories while solving software-engineering tasks. To enable agent self-evolution, these trajectories can be…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures

The rapid evolution of large language model (LLM)-driven autonomous agents has given rise to OpenClaw, a new class of open-source agent fra…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

A Signal-Language Foundation Model for Broad-Spectrum Cardiovascular Assessment from Routine Electrocardiography

Electrocardiography (ECG) is central to cardiovascular care, but conventional AI models are often restricted to common arrhythmias and may…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

ATWL: A Formal Language for Representing, Comparing, and Reusing Visual Analytics Workflows

Visual analytics (VA) workflows are inherently complex, involving data transformation, feature engineering, visual representation, and huma…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Credit Assignment with Resets in Language Model Reasoning

Contemporary reinforcement learning with verifiable reward methods post-train language models on multi-step reasoning by assigning a single…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

What Gets Cited: Competitive GEO in AI Answer Engines

AI answer engines generate answers from retrieved pages but cite only a few sources. This makes visibility depend not just on ranking, but…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

StructBreak: Structural Cognitive Overload-Induced Safety Failures in MLLMs

Multimodal Large Language Models (MLLMs) excel at structural reasoning yet suffer from a sharp logical brittleness in structural consistenc…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

Personalize-then-Store: Benchmarking and Learning Personalized Memory for Long-horizon Agents

Existing large language model (LLM) based memory systems apply universal, static policies that overlook a fundamental reality: the contexts…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

ADMFormer: An Adaptive-Decomposition Transformer with Time-Varying Masked Spatial Attention for Traffic Forecasting

Accurate traffic forecasting is essential for intelligent transportation systems, supporting a wide range of real-world applications. Howev…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

PHGNet: Prototype-Guided Hypergraph Construction for Heterogeneous Spatiotemporal Forecasting

As a core task in intelligent transportation systems, traffic forecasting plays a critical role in urban traffic management. Accurate traff…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Beyond Query Memorization: Large Language Model Routing with Query Decomposition and Historical Matching

Optimizing the trade-off between predictive performance and computational cost is a central focus in the deployment of Large Language Models…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Uncertainty Reasoning with Large Language Models for Explainable Disease Diagnosis

Clinical decision-making requires reasoning over incomplete, imprecise, and linguistically expressed patient narratives. While large langua…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy

Chain-of-thought (CoT) reasoning improves the problem-solving ability of large language models (LLMs), but generated reasoning traces may n…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Back to Parsimonious Latents: Learning Task-Centric World Models from Visual Foundations

World models enable agents to predict future dynamics conditioned on actions, making the choice of latent representation central to plannin…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Reinforcement learning with verifiable rewards (RLVR) has driven breakthroughs in domains such as math, tool-use, and software engineering,…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Insuring Every Action: An Authority Frontier Framework for Runtime Actuarial Control of Autonomous AI Agents

Autonomous AI agents increasingly issue side-effect-bearing actions: database mutations, refunds, payments, external commitments. We propos…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions

Autonomous computer use agents powered by multimodal large language models (MLLMs) are emerging as capable assistants for completing c…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

FLOATBench: A Dataset and Benchmark for Floating Offshore Wind Turbine Tower Fatigue

Most of the world's offshore wind resource lies in waters too deep for fixed-bottom foundations, making floating offshore wind turbines (FO…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Learning to Search and Searching to Learn for Generalization in Planning

Combinatorial generalization remains a central challenge in Deep Reinforcement Learning (DRL). Classical planning provides a simple yet cha…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

A Deep Dive into Axiomatic Design -- Part I: Problem Formulation

Problem formulation, translating customer needs and constraints into a minimum set of independent first-level functional requirements, is ar…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Agent-Centric Social Trajectory Prediction: A Free Energy Principle Perspective

Trajectory prediction methods have demonstrated remarkable capabilities in capturing complex motion patterns. However, existing methods rel…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

When Can We Trust Early Warnings? Leakage-Excluded Early Outcome Prediction from LMS Interaction Logs

Early-warning models built from Learning Management System (LMS) logs aim to predict end-of-course outcomes early enough to enable timely l…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent Collaboration Network

Agent-to-Agent (A2A) networks enable autonomous AI agents to collaborate by sharing reusable problem-solving instructions. However, how the…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

MuCRASP: Multimodal Chain-of-thought Reasoning aware Structured Pruning

Vision-language models (VLMs) increasingly rely on chain-of-thought (CoT) reasoning to solve complex multimodal tasks, but their large para…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

From Accounting to Coordination: A Virtual Water-Aware Electricity-Computation-Water Nexus Framework for Data Center Dispatch

The expansion of data centers (DCs) drives a sustained increase in electricity demand and associated water withdrawals at generation sites.…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

$D^2$-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing

Despite the emergence of diffusion large language models (D-LLMs) as an alternative to autoregressive large language models (AR-LLMs), safe…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Explore Before You Solve: The Speed-Depth Trade-off in Epistemic Agents for ARC-AGI-3

We systematically investigate all 25 public ARC-AGI-3 games and find that every one is reachable through non-intelligent strategies: 10 in…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

LECTOR: Joint Optimization of Scientific Reasoning Graphs and Introduction Generation

AI Scientists have shown promising progress across multiple stages of the research pipeline, among which automatic scientific paper writing…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Neural Scalable Symbolic Search Framework for Complex Logical Queries with Multiple Free Variables

Complex Query Answering (CQA) is a fundamental knowledge representation and reasoning task over incomplete knowledge graphs (KGs). Answerin…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェントビジネス/資金調達

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

CITYREP: A Unified Benchmark for Urban Representations Across Cities, Tasks, and Modalities

Urban representation learning encodes complex urban environments into general-purpose embeddings for diverse downstream tasks and emerging…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

L2IR: Revealing Latent Intent in Graph Fraud Detection

Graph fraud detection has long depended on Graph Neural Networks (GNNs) to propagate and aggregate information across relational data. A cr…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Retrying vs Resampling in AI Control

AI coding scaffolds like Claude Code and Codex use retrying: blocking actions flagged as risky and continuing the trajectory. We s…

2026-05-26 13:00 JSTarXiv cs.AIエージェント研究/論文

VeriTrace: Evolving Mental Models for Deep Research Agents

Deep research agents face vast, interdependent, and pervasively uncertain information. Existing systems explore what evolving intermediate…

2026-05-26 13:00 JSTarXiv cs.AIエージェント研究/論文

Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

Large language model agents are increasingly envisioned as always-on personal assistants with access to anything relevant in the user's dig…

2026-05-26 13:00 JSTarXiv cs.AIエージェント研究/論文

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent,…

2026-05-26 13:00 JSTarXiv cs.AIエージェント研究/論文

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

We present MobileGym, a browser-hosted, lightweight, fully controllable environment for everyday mobile use, targeting interaction fidelity…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

LETS Forecast: Learning Embedology for Time Series Forecasting

Real-world time series are often governed by complex nonlinear dynamics. Understanding these underlying dynamics is crucial for precise fut…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study

Tokenizer fertility varies 1.6x across foundation models on Ukrainian legal text, yet this cost-critical dimension is absent from model sel…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)

We show that singular value decomposition of the lm_head weight matrix of a transformer-based large language model -- requiring only five…

2026-05-26 13:00 JSTarXiv cs.AIビジネス/資金調達

AI-Driven Alpha Decay: Algorithmic Homogenization, Reflexive Signal Erosion, and the Paradox of Intelligent Markets

We show that AI-driven investment strategies are inherently self-defeating at scale. As AI adoption rises, three mutually reinforcing chann…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Document Classification Pattern Recognition via Information Fusion: A Systematic Review of Multimodal and Multiview Representation Approaches

Information fusion is used widely to improve document classification by the integration of multiple data sources (multimodal) or representa…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Raon-Speech Technical Report

We present Raon-Speech, a top-performing 9B-parameter speech language model (SpeechLM) for English and Korean speech understanding, answeri…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

VineLM: Trie-Based Fine-Grained Control for Agentic Workflows

Agentic workflows interleave configurable LLM stages with tool stages and often include retries or refinement loops. Existing workflow mana…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Agent-Facing Information Design in LLM Tool Registries

LLM tool registries function as unregulated advertising platforms: providers write free-text descriptions that agents use for selection, ye…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Artificial Effort

Real-effort tasks, in which participants perform cognitively costly activities whose outcomes depend on actual performance, are widely used…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Authority Signals in Claude AI Health Citations: A Descriptive Analysis Using the Authority Signals Framework

This study seeks to determine the authority signals used by Anthropic's Claude AI in its presentation of sources when answering consumer he…

2026-05-26 13:00 JSTarXiv cs.AI規制/政策

High-Risk AI Systems and the Problem of Identity in the European AI Act

The EU Artificial Intelligence Act (AIA) establishes a lifecycle governance regime for high-risk AI systems built around ex-ante conformity…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Catching The Correct Answer Trap: Characterising AI Tutor Blind Spots When Analysing Student Reasoning

Intelligent tutoring systems increasingly provide automated feedback on student work, but robust feedback requires assessing reasoning, not…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

KT4EQG: Personalized Exercise Question Generation via Knowledge Tracing

Educational Question Generation (EQG) aims to synthesize customized exercise questions that enhance student learning. An effective EQG syst…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

AI-Driven Controlled Environment Agriculture as Resilient Infrastructure for U.S. Fresh-Produce Supply Chains

Climate volatility, regional production concentration, labor constraints, cyber risk, and dependence on long-distance fresh-produce supply…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

SODE: Analyzing Social Dynamics in LLM Agents

As Large Language Models (LLMs) evolve into interactive agents, understanding their behavioral alignment within human social dynamics becom…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

EchoDistill: Alignment Noisy-to-Clean Self-Distillation for Robust Audio LLMs

Audio Large Language Models (ALLMs) are highly vulnerable to real-world noise, which often induces severe semantic drift and hallucinations…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

AI in the Enterprise: How People Use M365 Copilot Chat

M365 Copilot is used every week by millions of people across more than a million companies around the world as part of their workflows. Uni…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Multimodal Alignment and Preference Optimization for Zero-Shot Conditional RNA Generation

The design of RNA molecules that interact with specific proteins is a critical challenge in experimental and computational biology. Despite…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Multi-market value-stacking: Battery control for combined imbalance participation and non-uniform FCR bidding

The growing share of Renewable Energy Sources (RES) in modern power systems increases both grid imbalances and frequency deviations, reinfo…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

TriVAL: A Tri-Validation Framework for Faithful Automatic Optimization Modeling

Optimization modeling serves as the pivotal bridge between natural-language problem descriptions and optimization solvers, and remains a co…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Sensing Intelligence as a Trainable Metamaterial Property

In biological systems, sensing is not performed by the brain alone: the body deforms, vibrates, and filters external stimuli before they ar…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Metacognition Should Be the Scientific Framework for Bounded and Effective Self-Governance in Generative AI

Generative AI research increasingly confronts a shared problem: systems must sustain yet govern their own generative activity when uncertai…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Parameter Efficient Multi-Class Intelligent Scheduling for Multimodal Online Distributed Industrial Anomaly Detection

Industrial anomaly detection has attracted significant attention as a fundamental challenge in industrial systems. The rapid advancement of…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing

Memory is a fundamental component for enabling long-context LLM agents, supporting persistent state across interactions through a continuou…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

A World Model of Radiologist Reading for Medical Image Representation Learning

Radiologist eye-tracking data provide a rich record of how experts search, compare, and accumulate evidence during image reading; yet, exis…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Nano World Models: A Minimalist Implementation of Future Video Prediction

World models have become a central paradigm for learning predictive simulators that support generation, planning, and decision-making. Yet,…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

RAW: Robust Avatar Watermarking -- Benchmarking and Baseline

Digital avatar watermarking presents unique challenges: avatars are routinely post-processed with background replacement, reframing, and fo…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Task-Aligned Self-Supervised Learning for Medical Image Analysis: A Systematic Review and Practical Design Guidelines

Self-supervised learning (SSL) has emerged as a promising paradigm for addressing the annotation bottleneck in medical imaging by learning…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

IVR-R1: Refining Trajectories through Iterative Visual-Grounded Reasoning in Reinforcement Learning

Multimodal large language models via reinforcement learning (RL) have demonstrated remarkable capabilities in complex visual reasoning task…

2026-05-26 13:00 JSTarXiv cs.AI画像/動画生成

Diff-Instruct with Diffused Reward: Towards Principled One-step Generator RL

Recent advances in one-step text-to-image generation have enabled real-time synthesis with remarkable efficiency and quality. Previous rein…

2026-05-26 13:00 JSTarXiv cs.AIエージェント研究/論文

Harnessing AtomisticSkills for Agentic Atomistic Research

Computational materials science and chemistry span vast knowledge domains and fractured software ecosystems. Although large language models…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Remote sensing data imputation using deep learning for multispectral imagery

Remote sensing techniques have been increasingly utilised in aquatic applications in recent years. A common challenge in using optical sate…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models

Vision-Language-Action (VLA) models exhibit remarkable action generation for embodied intelligence, but their heavy compute makes deployment…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

SA-Kura: An Energy-Efficient Systolic Array Accelerator for Locally-Coupled Kuramoto Drift in Diffusion Sampling

Diffusion inference remains costly for edge deployment, yet existing accelerators focus almost exclusively on score networks because standa…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Machine Intelligence that Understands Visual and Linguistic Information and Interacts with Humans and Environments

Advancements at the intersection of computer vision and natural language processing are crucial for applications like assistive tech, multi…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

WTKO-CNN: Deep Learning Reveals Sequence Motifs Distinguishing Wild-Type and Knockout ATAC-seq Peaks

Chromatin regulators can alter transcriptional programs by modifying the accessibility of regulatory DNA elements. Understanding how regula…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Mode-as-Sequence: Translating Multimodal Motion Prediction into Unified Sequential Mode Modeling

Multimodal motion forecasting is inherently under-supervised: each training scene provides only one realized future, yet multiple plausible…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation

Neural operators serve as fast, data-driven surrogates for scientific modeling but typically rely on a monolithic, single-pass inference pr…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Hidden-State Privacy Has an Empty Middle

Of 1,536 Gaussian release covariances we tested for single-layer hidden-state privacy, zero achieve both moderate utility and moderate…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs

Scientific discovery is a closed-loop process in which hypotheses guide data acquisition and observations refine the hypothesis space. Yet…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

A Large-Scale Dataset and Benchmark: Do Protein-Ligand Models Learn Binding Sites or Just Binding Likelihood?

Protein-ligand modeling underpins computational drug discovery and molecular design. Existing protein-ligand benchmarks typically evaluate…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Mixture of Complementary Agents for Robust LLM Ensemble

Multi-AI collaboration, such as ensembling or debating large language models (LLMs), is a promising paradigm for aggregating information an…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

More Skills, Worse Agents? Skill Shadowing Degrades Performance When Expanding Skill Libraries

Skill libraries allow LLM agents to load task-specific instructions on demand, letting non-expert users solve domain-specific tasks through…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

To better serve users' demands in mobile applications (e.g., navigation), mobile crowdsourcing platforms can iteratively align large langua…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Cascade-KDE: Robust Time-Series Restoration under Out-of-Distribution Impulse Corruptions

Real-world time-series data in industrial sensing, healthcare, and energy systems is often corrupted by a mixture of Gaussian noise and occ…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Feature Lottery? A Bifurcation Theory of Concept Emergence

Neural networks acquire structured representations at specific moments during training, yet identifying these transitions typically relies…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning

On-device adaptation of large language models commonly keeps a quantized base model frozen while training and deploying a small, task-speci…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

We present a three-step recipe for identifying attention-head circuits in pretrained transformers. A per-head spectral signal -- the time-i…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Federated Learning over Human-Body Communication for On-Body Edge Intelligence: A Survey, Taxonomy, and BODYFED-HBC Scheduling Vignette

Human-body communication (HBC) is a promising physical substrate for wearable body-area networks because it can localize communication arou…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Generative Representation Learning on Hyper-relational Knowledge Graphs via Masked Discrete Diffusion

Hyper-relational knowledge graphs (HKGs) effectively represent complex facts. While inferring new knowledge in HKGs is a critical problem,…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents

The rise of tool-using Large Language Model (LLM) agents, standardized by protocols like the Model Context Protocol (MCP), has unlocked unp…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Not All Transitions Matter: Evidence from PPO

Training a reinforcement learning agent on-policy means collecting fresh experience at every update, and that experience comes with a hidde…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

TRACER: A Semantic-Aware Framework for Fine-Grained Contamination Detection in Code LLMs

Data contamination is a known threat to the reliability of model evaluation. However, it remains underexplored in code large language model…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Verified SHAP: Provable Bounds for Exact Shapley Values of Neural Networks

Shapley additive explanations (SHAP) are widely recognised as computationally intractable for neural networks, since they induce an exponen…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

The Time is Here for Just-in-Time Systems: Challenges and Opportunities

Core systems like key-value stores have historically taken years to build, and are designed to be general so as to amortize cost across dep…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Overcoming "Physics Shock" in Earth Observation: A Heteroscedastic Uncertainty Framework for PINN-based Flood Inference

Rapid and accurate flood extent mapping from Remote Sensing data, such as Synthetic Aperture Radar (SAR), is critical for operational disas…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

MASt3R-Nav: WayPixel Navigation in Relative 3D Maps

Visual navigation ability is strongly tied to its underlying representation of the world. Unlike classical 3D maps that require globally-co…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Empirical Analysis and Detection of Hallucinations in LLM-Generated Bug Report Summaries

Large Language Models (LLMs) are increasingly used to generate summaries of software bug reports, including sections such as Steps-to-Repro…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Understanding Conversational Patterns in Multi-agent Programming: A Case Study on Fibonacci Game Development

Large Language Models (LLMs) are increasingly applied to software engineering (SE), yet their potential for autonomous, role-oriented colla…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

An Interpretable CF-RL-TOPSIS Fusion Model for Skills-Aware Talent Recommendation

Effective skills-aware talent recommendation must balance behavioral transition patterns, trajectory-sensitive adaptation, and inspectable…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Knowledge Graph Modulated Deep Learning for Limited-Sample Clinical Data Analysis

Biological systems are governed by structured molecular interactions, where pathways, regulatory circuits, and functional gene relationship…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection

Large language models are increasingly used for vulnerability detection, yet their reliability under different prompt formulations remains…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Extracting Training Data from Diffusion Language Models via Infilling

Memorization in large language models has been studied almost exclusively through prefix-conditioned extraction, a natural choice for autor…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Human-AI Collaboration in Science at Scale: A Global Large-scale Randomized Field Experiment

Collaboration is the defining mode of modern science, yet its core mechanism -- feedback -- remains hard to observe, difficult to scale, an…

2026-05-26 13:00 JSTarXiv cs.AIエージェント研究/論文

AvalancheBench: Evaluating Enterprise Data Agents Through Latent World Recovery

We introduce AvalancheBench, a benchmark for evaluating enterprise data agents through latent world recovery. AvalancheBench improve…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Filtered Posterior Mean Collections: A Unified Framework for Analytical Models of Diffusion Generalization

The neural-network denoising functions which form the backbone of image diffusion models are remarkably consistent in their generalization…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Teaching Through Analogies: A Modular Pipeline for Educational Analogy Generation

Analogies help learners understand unfamiliar concepts by relating them to known concepts. Despite recent advances, large language models (…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Distributionally Robust Transfer Learning with Structurally Missing Covariates, with Application to Cross-National Cardiac Arrest Prediction

Deploying clinical prediction models across healthcare systems often fails when key training covariates are unavailable at deployment and l…

2026-05-26 13:00 JSTarXiv cs.AIビジネス/資金調達

Towards Evaluation Engineering: An Empirical Study of ML Evaluation Harnesses in the Wild

Evaluation harnesses are software systems that orchestrate model evaluation by managing model invocation, data loading, metric computation,…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

Monitoring autonomous large language model (LLM) agents for covert malicious behavior is challenging due to delayed, context-dependent, and…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Unlocking Apple's Private Cloud Compute: An Analysis of Privacy-Preserving Artificial Intelligence

Many existing Artificial Intelligence (AI) solutions on mobile devices rely on an extensive collection of sensitive data, raising privacy c…

2026-05-26 13:00 JSTarXiv cs.AIハードウェア/半導体

GIBLy: Improving 3D Semantic Segmentation through an Architecture-Agnostic Lightweight Geometric Inductive Bias Layer

In 3D scene understanding, deep learning models rely on large models and extensive training to capture basic geometric structures that are…

2026-05-26 13:00 JSTarXiv cs.AIビジネス/資金調達

Improving Labeling Consistency with Detailed Constitutional Definitions and AI-Driven Evaluation

Many automated labeling pipelines classify inputs into categories defined by a written specification, content moderation being a prominent…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Attested Tool-Server Admission: A Security Extension to the Model Context Protocol

The Model Context Protocol (MCP) standardizes how a large-language-model (LLM) agent and an external tool server exchange messages, but not…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

CRISP -- Clustering-Based Redundancy-Reduced Instance Sampling for Pathology Case Representation and Retrieval

Digital pathology archives increasingly contain multiple whole-slide images (WSIs) per case, capturing spatially distinct tumour regions an…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

An Interactive Paradigm for Deep Research

Recent advances in large language models (LLMs) have enabled deep research systems that synthesize comprehensive, report-style answers to o…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Concept Drift Adaptation Using Self-Supervised and Reinforcement Learning In Android Malware Detection

Android malware detectors often degrade after deployment because of concept drift, while full retraining at each maintenance step is costly…

2026-05-26 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

Benchmarking Patent Embeddings: A Multi-Task Evaluation of 22 Models Across Retrieval, Classification, and Clustering

Which fine-tuning signals improve patent embedding models, and do gains transfer across patent landscapes? We benchmark 22 embedding models…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

An Empirical Evaluation of LLM-Generated Code Security Across Prompting Methods

The growing use of Large Language Models (LLMs) for automated code generation has enhanced software development efficiency, but often at th…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Enhancing Reliability in LLM-Based Secure Code Generation

Large language models (LLMs) are widely used for code generation, but their security reliability remains inconsistent across languages and…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

ArtSplat: Feed-Forward Articulated 3D Gaussian Splatting from Sparse Multi-State Uncalibrated Views

Articulated object reconstruction from sparse-view images is an ill-posed problem that requires simultaneous inference of geometry and unde…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

ChaosBench-Logic v2: Evaluating LLM Logical Reasoning over Dynamical Systems at Scale

Standard accuracy on binary reasoning benchmarks hides critical failure modes: prior collapse, inconsistency under paraphrase, and inabilit…

2026-05-26 13:00 JSTarXiv cs.AIハードウェア/半導体

ScaleAcross Explorer: Exploring Communication Optimization for Scale-Across AI Model Training

The rapid scaling of large language model training requires distributing GPU resources across multiple data center buildings and regions. W…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Treatment Effect Estimation with Differentiated Networked Effect on Graph Data

Estimating individual treatment effect (ITE) from observational graph data is crucial for decision-making in fields such as commerce an…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Assessing the Operational Viability of Foundation Models for Time Series Forecasting

Time series forecasting drives operational decisions in areas like finance, transportation, and energy. While supervised learning approache…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Side-by-side Comparison Amplifies Dialect Bias in Language Models

Language models (LMs) can exhibit systematic biases against speakers based on variations in their dialects, even in the absence of a dialec…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

MX-SAFE: Versatile Inference- and Training-Proof Microscaling Format with On-the-Fly Exponent and Mantissa Bit Allocation

As the demand for deep learning grows, cost reduction through quantization has become essential for both training and inference. In 2022, t…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

VectorArk: Learning Practical Image Vectorization with Rounded Polygon Representation

Recent vision-language model (VLM)-based approaches have achieved impressive results on image vectorization tasks. However, they are typica…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Generative OOD-regularized Model-based Policy Optimization

We study sequential decision-making with offline reinforcement learning (RL). Traditional offline RL policies may result in out-of-distribu…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Batch Normalization Amplifies Memorization and Privacy Risks

Batch Normalization (BN) is widely adopted to enable faster convergence and more stable training of deep neural networks. However, its impa…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Momentum Streams for Optimizer-Inspired Transformers

The residual update of a pre-norm Transformer layer admits an interpretation as one step of a first-order optimizer acting on a surrogate t…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Temporal Concept Drift in Legal Judgment Prediction: Neural Baselines Across Three Epochs of Ukrainian Court Decisions

Legal NLP benchmarks evaluate models on randomly split data, implicitly assuming that legal language is stationary. We test this assumption…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Code2UML: Agentic LLMs with context engineering for scalable software visualization

Large Language Model (LLM)-based code analysis tools are adopted to automate software documentation tasks. However, the scalability of thes…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Balancing Fairness, Privacy, and Accuracy: A Multitask Adversarial Framework for Centralized Data-Driven Systems

The integration of fairness and privacy in centralized data-driven applications is critical, especially as these systems increasingly influ…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Coarse-to-Fine Domain Incremental Learning with Attentive Distillation for Mining Footprint Segmentation in Multispectral Imagery

Automatically mapping and segmenting global mining footprints using remote sensing and deep learning is critical for monitoring the socio-e…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Robust Fuzzy Multi-view Learning under View Conflict

Trusted multi-view classification aims to deliver reliable fusion for accurate predictions and has recently attracted substantial attention…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

FoodMonitor: Benchmarking MLLMs for Explainable Compliance Analysis

As AI-powered compliance monitoring becomes increasingly important in public governance and industrial safety, the ability to provide verif…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Φ-Noise: Training-Free Temporal Video Conditioning via Phase-Based Noise Manipulation

Latent video diffusion models generate videos by progressively transforming Gaussian noise into realistic samples conditioned on text or vi…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Adaptive Punishment for Cooperation in Mixed-Motive Games

Mixed-motive scenarios are ubiquitous in real-world multi-agent interactions, where self-interested agents often defect for immediate rewar…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Grammatically-Guided Sparse Attention for Efficient and Interpretable Transformers

The quadratic complexity of self-attention in Transformer models remains a significant bottleneck for processing long sequences and deployi…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

TRAFA: Anticipating User Actions to Reduce Errors in Procedural Tasks with Predictive Feedback

Interactive assistance systems typically provide feedback after an action has been completed, supporting error recovery but not preventing…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Is Decentralized AI Governable? From Regulative Policy to Constitutive Protocol

Every major framework for governing artificial intelligence presupposes an identifiable entity -- a developer, deployer, or operator -- who…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors

Text compression for large language model (LLM) systems is usually framed as token deletion, retrieval, summarization, or exact reconstruct…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

AI-Driven Adaptive Adversaries and the Erosion of Cryptographic Trust in Public Key Systems

This paper examines the erosion of Public Key Cryptography (PKC) security under adaptive adversarial optimisation driven by artificial inte…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Rethinking Federated Unlearning via the Lens of Memorization

Federated learning (FL) increasingly needs machine unlearning to comply with privacy regulations. However, existing federated unlearning ap…

2026-05-26 13:00 JSTarXiv cs.AIエージェント研究/論文

PEDESTRIANQA: A Benchmark for Vision-Language Models on Pedestrian Intention and Trajectory Prediction

Pedestrian intention and trajectory prediction are critical for the safe deployment of autonomous driving systems, directly influencing nav…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

PILOT: Policy-Informed Learned Optimization for Adaptive Deep Network Training

Despite the central role of optimization in deep learning, most optimizers rely on update structures whose functional form is fixed before…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m

Independently trained transformers compute the same function in residual-stream bases that differ by a uniform random rotation on $\mathrm{…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

LAPLEX: The FFT of Learnable Laplace Kernels

Fast linear algebra in deep learning usually comes with a choice: fixed geometry and exact computation, as in the Fourier transform, or ada…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Correcting Visual Blur Induced by Attention Distraction to Reduce Hallucinations: Algorithm and Theory

Multimodal large language models (MLLMs) frequently suffer from object hallucinations, yet the visual perceptual mechanism underlying this…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Catching MRI outliers: unsupervised detection and localization of MRI artefacts and clinical anomalies using deep learning

Artificial intelligence is increasingly integrated into radiotherapy workflows, yet such pipelines remain vulnerable to out-of-distribution…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Guarded Repair for Harm-Aware Post-hoc Replacement of LLM Mathematical Reasoning

Post-hoc repair of LLM mathematical reasoning introduces an asymmetric risk: fixing an incorrect reasoning trace is useful, but replacing a…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Measuring the Depth of LLM Unlearning via Activation Patching

Large language model (LLM) unlearning has emerged as a crucial post-hoc mechanism for privacy protection and AI safety, yet auditing whethe…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Phase-Aware Wavelet-Based-Scattering Encoder-Decoder for Dense Predictions

Scattering transforms achieve Lipschitz stability and translation invariance, but dense prediction tasks require preserving spatial structu…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Beyond Generative Priors: Minority Sampling with JEPA-Guided Diffusion

Minority sampling aims to generate low-density instances on a data manifold and is of central importance in applications such as medical di…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Demystifying the Mythos or Disrupting Bugonomics? From Zero-Day Asymmetry to Defender Remediation Throughput

Recent demonstrations of large language models producing candidate and confirmed vulnerabilities in production software have renewed the na…

2026-05-26 13:00 JSTarXiv cs.AIロボティクス

DisDop: Distillation with Domain Priors for Open-Vocabulary Aerial Object Detection

With the widespread application of drones in recent years, object detection of aerial images has attracted increasing attention, especially…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

On the Stability and Realizability of Recurrent Polynomial Surrogate Ternary Logic Gate Networks

Recurrent Neural Networks (RNNs) can learn to predict Signal Temporal Logic (STL) verdicts online from partial trajectories, but deploying…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

How Many Tools Should an LLM Agent See? A Chance-Corrected Answer

Before an LLM agent can use a tool, a retrieval system must decide which candidate tools to show to the agent. How long should that shortli…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

CyBOKClaw: Human-in-the-Loop CyBOK Mapping for Cybersecurity Curriculum

This paper presents CyBOKClaw, an interpretable human-in-the-loop retrieval framework for mapping cybersecurity keywords or phrases (KWoPs)…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

VaaWIT: Visual-Aware Adaptation of Large Language Models for Multilingual Web Image Translation

Translating text embedded in Web images is crucial for improving content accessibility and cross-lingual information retrieval, particularl…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs

Large Language Models (LLMs) have shown great promise in multilingual machine translation (MT), even with limited bilingual supervision. Ho…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Beyond the Aggregation Dilemma: Prior-Retaining Decoupled Learning for Multimodal Graphs

Multimodal Attributed Graph Learning (MAGL) integrates intrinsic node attributes with structural topology via graph aggregation. However, a…

2026-05-26 13:00 JSTarXiv cs.AIビジネス/資金調達

HoloFair: Unified T2I Fairness Evaluation and Fair-GRPO Debiasing

Text-to-Image (T2I) models have made significant strides in visual realism and semantic consistency, yet they often perpetuate and amplify…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

The Path Matters: Learning a Token-Commitment Policy for Diffusion Language Models

Diffusion large language models promise faster generation by refining many token positions in parallel, but this parallelism introduces a h…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

TS-Skill: A Benchmark for Evaluating Analytical Skills in Time-Series Question Answering

Large language models (LLMs) and time-series language models (TSLMs) are increasingly applied to time-series question answering (TSQA). Unl…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

World-State Transformations for Neuro-symbolic Interactive Storytelling

Large Language Models (LLMs) have changed the possibilities of Interactive Storytelling systems that process free-text user input. However,…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Who judges the judges? Governance from metrics: a runtime framework for continuous LLM compliance monitoring

Current approaches to AI compliance treat conformity as a binary, audit-time verdict rather than a continuous, measurable property of produ…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Bilevel Optimization of Synthetic Trajectories for Multi-Turn LLM Fine-Tuning

While LLMs excel at single-turn generation, they struggle with long-horizon, multi-turn interactions. Offline reinforcement learning (RL) o…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Motion-Compensated Weight Compression

Neural network weights are increasingly a bottleneck for deployment, yet most compression pipelines treat layers independently and overlook…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Spectral Retrieval: Multi-Scale Sinc Convolution over Token Embeddings for Localized Retrieval in LLM Multi-Agent Systems

[Abridged] - Spectral Retrieval is a plug-in re-ranking stage that interpolates between per-token MaxSim and mean-pool retrieval through a…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Leveraging pretrained RGB denoisers for hyperspectral image restoration

Hyperspectral image restoration faces several challenges, including limited training data, strong sensor specificity, and high spectral dim…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

From Theory to Decision Rule: Calibrating the Noisy-Label Crossover for Vision-Language Model Weak Supervision Across Three Medical-Imaging Benchmarks

Classical noisy-label theory predicts that downstream performance under weak supervision is bounded above by the labeler's accuracy, implyi…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Complement Submodular Information Measures for Balanced and Robust Data Selection

Submodular optimization has become a fundamental paradigm for data selection, retrieval, summarization, and representation learning due to…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIハードウェア/半導体

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM

Long-horizon LLM inference turns the key-value (KV) cache into the dominant GPU memory consumer and makes per-token attention increasingly…

2026-05-26 13:00 JSTarXiv cs.AI画像/動画生成

Parameter-Efficient VLMs for Gastrointestinal Endoscopy: Medical Image Generation and Clinical Visual Question Answering

The major limitations of gastrointestinal (GI) endoscopy AI systems arise from a shortage of annotated data, strict privacy policies, and s…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Divide-and-Conquer Inference for Large-Scale Visual Recognition with Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities across a wide range of vision language tasks. However, when…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Zero-Shot Parkinson's Disease Detection from Speech: Comparing Large Audio and Language Models

Large audio and language models have recently demonstrated zero-shot reasoning capabilities across various domains. However, it remains unc…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Disentangled Double Machine Learning for Accurate Causal Effect Estimation

Confounding bias is a key challenge in causal effect estimation from observational data. Double Machine Learning (DML) addresses this issue…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Cross-Domain Energy-Guided Diffusion Generation for Off-Dynamics Reinforcement Learning

Off-dynamics offline reinforcement learning seeks to learn a target-domain policy from a large source dataset and a limited target dataset…

2026-05-26 13:00 JSTarXiv cs.AIビジネス/資金調達

Multiscale Real-Time Object Detection in the NMS-Free Era: A Comparative Performance Evaluation of YOLOv8 and YOLO26

Non-Maximum Suppression (NMS) remains a key post-processing step in many real-time object detection pipelines, but it can introduce latency…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Reflect-Guard: Enhancing LLM Safeguards against Adversarial Prompts via Logical Self-Reflection

Large language model (LLM) safety classifiers such as Llama Guard are effective at detecting overtly harmful prompts but remain vulnerable…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Adversarial Error Correction for Visual Autoregressive Generation

Visual Autoregressive (VAR) models have emerged as a powerful paradigm for image synthesis by performing hierarchical next-scale prediction…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Tiny Brains, Giant Impact: Uncovering the Keystone Neurons of LLM with Just a Few Prompts

Large language models (LLMs) display strong comprehensive abilities, yet the internal mechanisms that support these behaviors remain insuff…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

The Concept Allocation Zone: Tracking How Concepts Form Across Transformer Depth

Concept formation in transformer language models is depth-extended, not a single-layer event: concepts emerge gradually across a contiguous…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

DBPnet: Damper Characteristics-Based Bayesian Physics-Informed Neural Network for Wheel Load Estimation

Advanced driver assistance systems (ADAS) play an important role in modern automotive intelligence, significantly enhancing vehicle safety…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Towards a Universal Causal Reasoner

Despite the importance of causal reasoning, training LLMs to reason causally remains underexplored. Existing data efforts mostly focus on b…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達研究/論文

When Reasoning Hurts: Source-Aware Evaluation of Frontier LLMs for Clinical SOAP Note Generation

Reasoning-enabled LLMs perform strongly on medical reasoning benchmarks, but it remains unclear whether these gains transfer to structured…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

On the Impact of Class Imbalance on the Learning Dynamics of Deep Neural Networks: An Intuitive Insight

Class imbalance in deep neural networks (DNNs) has witnessed a rapid increase in research attention in recent years. However, the varying a…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Factorize to Generalize: Retrieval-Guided Invariant-Dynamic Decomposition for Time Series Forecasting

Time series foundation models (TSFMs) have recently achieved strong zero-shot forecasting performance through large-scale pretraining and r…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Explainable Retinal Imaging for Prediction of Multi-Organ Dysfunction in Type 2 Diabetes

Background: Type 2 diabetes mellitus (T2DM) is increasingly recognised as a systemic disease characterised by coordinated dysfunction acros…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Explainable Multi-Task Retinal Imaging Reveals Microvascular Signals for Systemic Risk Stratification in Type 2 Diabetes: A Pilot Study

Retinal imaging provides a non-invasive window into systemic microvascular health and has emerged as a potential biomarker for systemic dis…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Quaternion Self-Attention with Shared Scores

Quaternion neural networks are parameter-efficient and model multidimensional dependencies by representing four related features as a singl…

2026-05-26 13:00 JSTarXiv cs.AIロボティクス

HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos

Human egocentric video captures rich manipulation demonstrations without any robot hardware, yet transferring these skills to robots remain…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Your Embedding Model is SMARTer Than You Think

Multimodal retrieval relies heavily on single-vector retrievers, which compress rich, sequential token sequences into one single global rep…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Riemannian-Manifold Steering: Geometry-Aware Generative Autoencoders for Label-Free Steering

Steering a language model - intervening on its internal activations to change downstream behaviour - has recently expanded beyond linear in…

2026-05-26 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

RealBench: Benchmarking Data-Driven Numerical Weather Forecasting Under Operational Conditions and Extreme Event Challenges

Accurate evaluation of weather forecasting models is critical for their reliable deployment in real-world applications. However, existing b…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

APT-Agent: Automated Penetration Testing using Large Language Models

Penetration testing is essential to securing modern web infrastructures, yet traditional manual methods struggle to keep pace with their sc…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

SEP-Attack: A Simple and Effective Paradigm for Transfer-Based Textual Adversarial Attack

Despite the strong performance of deep neural networks in modern Web and language applications, they remain vulnerable to adversarial attac…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Investigating the Interplay between Contextual and Parametric Chain-of-Thought Faithfulness under Optimization

Chain-of-Thought (CoT) faithfulness, i.e., whether CoTs genuinely reflect large language models' (LLM) underlying behavior, is typically ev…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Cross-Domain Generalization Limits of Vision Foundation Models in Facial Deepfake Detection

The rapid evolution of generative models has enabled the creation of hyper-realistic facial deepfakes, exposing a critical vulnerability in…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

OSDTW: Optimal Shared Depth and Task Weighting for Long-Tailed Recognition

Long-tailed recognition suffers from a persistent head-tail trade-off: improving tail performance often degrades head accuracy and can inc…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

TGFormer: Towards Temporal Graph Transformer with Auto-Correlation Mechanism

The growing interest in Temporal Graph Neural Networks (TGNNs) stems from their ability to model complex dynamics and deliver superior perf…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

MinerU-Popo: Universal Post-Processing Model for Structured Document Parsing

VLM-based OCR models have become the de facto choice for document parsing, as they can accurately extract page-level elements (e.g., paragr…

2026-05-26 13:00 JSTarXiv cs.AIロボティクス

Bridging the Gap: Enabling Soft Actor Critic for High Performance Legged Locomotion

Proximal Policy Optimization (PPO) has become the de facto standard for training legged robots, thanks to its robustness and scalability in…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Selective Test-Time Compute Scaling for Click-Through Rate Prediction via Uncertainty-Triggered Feature Path Exploration

Scaling test-time compute has proven highly effective for language models, yet this opportunity remains largely unexplored for industrial C…

2026-05-26 13:00 JSTarXiv cs.AIエージェントロボティクス

Scaling up Energy-Aware Multi-Agent Reinforcement Learning for Mission-Oriented Drone Networks with Individual Reward

Multi-agent reinforcement learning (MARL) has shown wide applicability in collaborative systems such as autonomous driving and smart cities…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Interpretation, Learning, and Empathy as One Constraint: A Residual-Adequacy Architecture with Accountable Abstention

An agent must act on the situation before it, learn what it cannot yet represent, and model other agents well enough to coordinate. These f…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Metropolis-Scale Resilient and Trustworthy Traffic Flow Inference Using Multi-Source Data

Inferring network-wide traffic states from sparse observations with high accuracy and trustworthy uncertainty quantification is essential f…

2026-05-26 13:00 JSTarXiv cs.AIエージェントロボティクス

Performance Comparison of Classical and Neural Sampling Algorithms for Robotic Navigation

Integrating artificial intelligence (AI) into sampling-based motion planning provides new possibilities for improving autonomous navigation…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

D3S2: Diffusion-Guided Dataset Distillation for Semantic Segmentation

Dataset distillation (DD) aims to compress large-scale datasets into compact synthetic sets while preserving training efficacy. However, ex…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Language Bias in LVLMs: From In-Depth Analysis to Simple and Effective Mitigation

Large Vision-Language Models (LVLMs) extend large language models with visual understanding, but remain vulnerable to hallucination, where…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

TinyFormer: Preserving Tiny Objects in YOLO-DETR Hybrid Real-time Detectors

YOLO-series and DETR-based detectors struggle with tiny-object detection. YOLO-style models benefit from efficient dense prediction, but th…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantizat…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Intent Signal Theory: A Computational Framework for Intent-State Control in Human-AI Interaction

Current AI interaction models treat the prompt as the primary object of exchange, omitting a critical layer: the user's latent source inten…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

GL-LFGNN: A Global-Local Dual-branch Causal Graph Neural Network Based on Liang-Kleeman Information Flow for EEG Emotion Recognition

EEG-based emotion recognition holds significant promise for objective diagnosis of mood disorders. Graph neural networks (GNNs) have emerge…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Cultivating Machine Intelligence: The OMEGA Shift from Top-Down Optimization to Autopoietic Cognitive Ecologies

The dominant artificial intelligence paradigm trains neural architectures via gradient descent against proxy objectives and reinforcement l…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

Security in the Fine-Tuning Lifecycle of Large Language Models: Threats, Defenses, Evaluation, and Future Directions

Background: Fine-tuning is central to adapting pre-trained Large Language Models (LLMs) to downstream tasks, but its reliance on training d…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression

We study the rate-distortion limits of online KV cache compression in autoregressive language models, formulating it as sequential Wyner-Zi…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Multi-Agent Specification-based Metamorphic Testing of FMU-Based Simulations

In many industrial domains, the Functional Mock-up Interface (FMI) is used to exchange simulation models as Functional Mock-up Units (FMUs)…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Leveraging Gauge Freedom for Learning Non-Gradient Population Dynamics of Stochastic Systems

Existing work on population dynamics inference often focuses on flows arising from vector fields that are the gradients of scalar potential…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Uncertainty-DTW for Sequences and Visual Tokens

Aligning structured data is a fundamental problem in computer vision and machine learning, underlying tasks such as time series analysis, h…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Courant: a State-Adaptive Perceiver-Based Neural Surrogate with Local Support and Interpretable Field Decomposition

We introduce "Courant", a Perceiver-based encoder-processor-decoder surrogate model that has latent features exhibiting adaptive specializa…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Trust-Aware Joint Feature-Prediction Discrepancy for Robust Domain Adaptation

Domain adaptation aims to mitigate performance degradation caused by distribution shifts between a labeled source domain and an unlabeled o…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Evidence-Linked Radiology Reporting: A Human-Supervised Reference Architecture for Structured Imaging Intelligence

Radiology reports remain the primary mechanism by which imaging findings are communicated to clinical teams. However, much of the structure…

2026-05-26 13:00 JSTarXiv cs.AIハードウェア/半導体

Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo

We study inference-time alignment for diffusion-based generative models, aiming to steer a base model toward high-reward outputs without up…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate

Sparse optimization is a fundamental challenge in various practical applications. A popular approach to sparse optimization is $\ell_p$ reg…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

ASTRO: Adaptive Spatio-Temporal Reinforcement Optimization for GNN Powered Anomaly Detection in Cyber Physical Systems

Anomaly detection in Industrial Internet of Things (IIoT) environments is essential to protect the Industrial Control Systems (ICS) and Cyb…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

LLM Agent Based Renewable Energy Forecasting Using Edge and IoT Data A Review of Solar Wind Weather and Grid Aware Decision Support

Reliable forecasting of renewable energy generation is a foundational requirement for grid stability, energy trading, battery scheduling, and…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Abduction-Deduction Entanglement: Domain Generalization via Representation Transplants

Prediction models trained under the source distribution do not generalize well to a different target distribution. A valid inference about…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

STREAM: A Data-Centric Framework for Mining High-Value Task-Oriented Dialogues from Streaming Media

Large language models for vertical domains are bottlenecked by the scarcity of complex, domain-specific task-oriented dialogues. Existing d…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

K-U-KAN: Koopman-Enhanced U-KAN for 3D Dental Reconstruction from a Single Panoramic X-ray Radiograph

A panoramic X-ray compresses a 3D jaw into a 2D strip; we aim to recover the missing depth cleanly and fast. Existing implicit neural repre…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

AME-TS: Anchored Mixture-of-Experts for Time Series Forecasting

Time series forecasting models are increasingly scaled through large Transformer backbones, yet most existing approaches process all series…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Methodology for Creating a Clinically Verified Dermoscopic Image Dataset

This study presents a methodology for constructing a clinically verified dataset of dermatoscopic images for medical informatics research.…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Grow-Prune-Freeze Networks: Adaptive & Continual Learning Technique for Olfactory Navigation

Training data for olfaction is scattered through disparate, non-standardized datasets that limit the ability to build representative world…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Knowledge Graph-Driven Expert-Level Reasoning for Neuroscience

A knowledge graph (KG) is an abstraction that can be extracted from text corpora and used for in-depth reasoning. Prior work has leveraged KG…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

By Their Fruits You Will Know Them: Comparing Formalizations of Law by the Decisions They Encode

Formalizing legal provisions promises machine-accessible law and automated legal reasoning, and recent LLMs make it tempting to generate su…

2026-05-26 13:00 JST · arXiv cs.AI · Agents, Robotics

Beyond Killer Robots: General AI Attitudes and Public Support for Military AI in Nine Countries

AI-enabled military systems are a fixture of modern military conflict. Applications vary from autonomous drones for surveillance and attack…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Hide to Guide: Learning via Semantic Masking

Reinforcement learning with verifiable rewards (RLVR) has become a powerful paradigm for improving language models on reasoning-intensive t…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Research/Papers

Influence-Inspired Spectral Rotations for Extreme Low-Bit LLM Quantization

We apply the influence-adaptive Walsh geometry of a companion theory paper (arXiv:2605.01637) to extreme low-bit weight-only LLM quantizati…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning

Diffusion models are increasingly used as powerful conditional generators, yet real deployments often involve multiple target distributions…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Continuous-Depth Field Theory for Transformer Patching and Mechanistic Interpretability

Mechanistic interpretability often uses activation patching, causal tracing, path patching, and steering directions to reveal behaviorally…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

Specification-Based Code-Text-Code Reengineering for LLM-Mediated Software Evolution

Direct Code2Code transformation remains challenging to control because it can preserve surface-level syntax while introducing semantic drif…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

On the Epistemic Uncertainty of Overparametrized Neural Networks

Epistemic uncertainty is often viewed as a reducible uncertainty that vanishes with increasing data. This perspective implicitly assumes pa…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Constraint-Anchored Attribution: Feasibility-Certified Counterfactuals and Bonferroni-PAC Sufficient Subsets for Neural CO Policies

We give an attribution method for neural combinatorial-optimisation (CO) policies that (i) decomposes a decision by constraint families via…

2026-05-26 13:00 JST · arXiv cs.AI · Business/Funding, Research/Papers

JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment

Two methodologies dominate current practices of benchmarking: rubric-based scoring evaluates items against predefined criteria, whereas com…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Quantifying Empirical Compute-Supervision Tradeoffs in RLVR

Reinforcement learning with verifiable rewards (RLVR) has become a standard paradigm for post-training language models, but in practice, ve…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Guess the Unified Model: How Much Can We Recover from Generated Images?

With unified model-generated images now widespread online, attributing their model of origin offers a path toward transparency and deeper i…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

First, do no harm: Breaking suicidogenic echo chambers in media recommendation

Recommender systems generally optimise user engagement, but this approach is dangerous in mental health contexts. When vulnerable users sh…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Mimir: Large-scale Multilingual Concept Modeling

Current language modeling approaches are built around tokens. Text corpora are split into tokens, and models are trained by performing comp…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Latent Q-Barrier Shielding for Safe In-Context Reinforcement Learning

Safe in-context reinforcement learning (ICRL) adapts online from interaction history without test-time parameter updates while controlling…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Positivity in classical enumerative geometry: a case study in synchronized AI-assisted mathematics

We study the symmetric polynomial $\prod_{\alpha\in A_{n,d}}\bigl(1+\alpha_1 x_1+\cdots+\alpha_n x_n\bigr)$ where $A_{n,d}:=\{\alpha\in\mat…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

READER: Reasoning-Enhanced AI-Generated Text Detection

Recent advances in large language models (LLMs) have made it increasingly difficult to distinguish human-written text from AI-generated con…

2026-05-26 13:00 JST · arXiv cs.AI · Agents

Neuromorphic LiDAR-based Bird's Eye View Object Detection using Energy-efficient Spiking Neural Networks

Autonomous driving perception demands accurate and efficient processing of three-dimensional sensor data under strict power constraints. Tr…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Eureka: Intelligent Feature Engineering for Enterprise AI Cloud Resource Demand Prediction

Effective features are crucial for predictive model performance, but creating them often requires domain expertise, limiting scalability ac…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

UWM-JEPA: Predictive World Models That Imagine in Belief Space

World models for partially observed environments must imagine multiple compatible hidden futures and steer between them under counterfactua…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

CausalFlow: Causal Attribution and Counterfactual Repair for LLM Agent Failures

Large language model (LLM) agents frequently fail on multi-step tasks involving reasoning, tool use, and environment interaction. While suc…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

A general tensor-structured compression scheme for efficient large language models

Large language models (LLMs) are dominated by dense linear transformations, whose storage, memory and computational overheads hinder effici…

2026-05-26 13:00 JST · arXiv cs.AI · Robotics

Parallel Differentiable Reachability for Learning and Planning with Certified Neural Dynamics and Controllers

Neural network (NN) dynamics models and control policies achieve strong performance in robotics, but providing sound guarantees under uncer…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Parameter-Efficient CT Reconstruction via Deep Graph Laplacian Regularization

Low-dose computed tomography (LDCT) reconstruction faces a critical tradeoff between reconstruction quality and resource requirements. Whil…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Certified Robustness from Approximate Gaussian Mixture Structures in Pretrained Latent Spaces

Deep learning models are vulnerable to adversarial perturbations, raising important concerns for safety-critical deployment. Empirical defe…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

AI-Associated Lexical Shifts Across 34 Languages: Cross-Lingual Convergence and Diachronic Uptake in News Writing

AI-associated lexical shifts have been documented mainly in Scientific English. We extend this work to 34 languages in the WMT News Crawl c…

2026-05-26 13:00 JST · arXiv cs.AI · Agents

KYA: A Framework-Agnostic Trust Layer for Autonomous Systems with Verifiable Provenance and Hierarchical Policy Composition

Observability tells operators when an agent is slow. KYA tells operators when an agent is wrong, drifting, leaking, or quietly going rogue.…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Adversarial Orthogonal Disentanglement for LVLM Hallucination Mitigation

Large Vision-Language Models (LVLMs) have advanced multimodal understanding, yet their reliability is limited by hallucination, where gener…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation

Customized image editing aims to equip pre-trained diffusion models with specific visual effects using limited paired data, typically via L…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Weakly Supervised Camouflaged Object Detection Based on the SAM Model and Mask Guidance

Camouflaged object detection (COD) from a single image is a challenging task due to the high similarity between objects and their surroundi…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

Evo-Attacker: Memory-Augmented Reinforcement Learning for Long-Horizon Tool Attacks on LLM-MAS

While Large Language Model-based Multi-Agent Systems (LLM-MAS) demonstrate remarkable capabilities in solving complex tasks by orchestratin…

2026-05-26 13:00 JST · arXiv cs.AI · Business/Funding

Subspace-Guided Semantic and Topological Invariant Registration for Annotation-Free Ultrasound Plane Quality Control

Reliable quality control (QC) of ultrasound images is essential for both real-time acquisition guidance and retrospective clinical audit, y…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Anatomy-Anchored Self-Supervision: Distilling Vision Foundation Models for Invariant Ultrasound Representation

The self-supervised pre-training paradigm has gained increasing prominence for learning transferable representations in medical imaging, yet ex…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Autoregression-Free Neural Operators for Time-Dependent PDEs

Neural operators learn mappings from function-dependent inputs to solutions, providing an effective framework for solving partial different…

2026-05-26 13:00 JST · arXiv cs.AI · Business/Funding

SomaliBench Eval: Measuring English-to-Somali Refusal Gaps in Open-Weight Language Models

Large language model safety evaluation remains heavily English-centered, leaving low-resource languages under-measured even when models are…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

A Token/KV-Cache Communication Media Selection and Resource Allocation Strategy for Multi-Agent Collaboration

The convergence of large language models (LLMs) with 6G networks is fostering a paradigm of autonomous multi-agent cooperation, which in tu…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

SeqRoute: Global Budget-Aware Sequential LLM Routing via Offline Reinforcement Learning

Existing LLM routing frameworks treat queries as independent events, neglecting the sequential nature of real-world user sessions constrain…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Binding Visual Features Point by Point

Despite success on standard benchmarks, vision language models display persistent failures on tasks involving processing of multi-object sc…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents, Business/Funding

A Multi-Agent LLM Framework for Rating the Quality of Surgical Feedback

Verbal feedback delivered by attending surgeons in the operating room plays a critical formative role in resident trainee skill acquisition…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

AI Content Moderation in Therapy Conversations

Large language models (LLMs) are increasingly being used for emotional support. They are also being developed for formal therapy purposes.…

2026-05-26 13:00 JST · arXiv cs.AI · Hardware/Semiconductors

From Simulation to Enaction: Post-trained language models recognize and react to their own generations

Language models are pretrained as passive predictors with no incentive to model the consequences of their own outputs. Post-training change…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

IndexMem: Learned KV-Cache Eviction with Latent Memory for Long-Context LLM Inference

Large Language Models (LLMs) are increasingly expected to operate over long contexts, yet standard softmax attention incurs a KV cache that…

2026-05-26 13:00 JST · arXiv cs.AI · Robotics

EXPO-FT: Sample-Efficient Reinforcement Learning Finetuning for Vision-Language-Action Models

The ability to efficiently and reliably learn new tasks has been a foundational challenge in robotics. Vision-Language-Action (VLA) models…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Test-Time Self-Adaptive Conditioning for Stable Audio-Driven Talking-Head Generation

Audio-driven talking-head generation has achieved remarkable progress with recent models such as AniTalker, FLOAT, and Sonic. Despite their…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

A Controlled Synthetic Benchmark for Educational Aspect-Based Sentiment Analysis

Educational aspect-based sentiment analysis (ABSA) can support course improvement, but public aspect-labeled student feedback remains scarc…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Generative AI impacts on intra-urban inequality and skill premium in Beijing

Generative artificial intelligence (GenAI) is the first automation wave to reach high-cognitive tasks at scale, yet its effects on intra-ur…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Cross-Stage Attention Multi-Expert Network for Radiologist-Inspired Breast Ultrasound Diagnosis

Breast ultrasound imaging is an important noninvasive method for early breast cancer diagnosis, but automatic benign/malignant classificati…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

A Tertiary Review of Large Language Model-Based Code Generating Tasks: Trends, Challenges, and Future Directions

Context. Large language models (LLMs) are increasingly applied to code-generating tasks (CGTs) in software engineering. While reported resu…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

TopoAlign: Topology-Aware Visual Representation Alignment

Neural networks encode inputs as high-dimensional vectors, known as representations, that capture how models process data by encoding task-…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

'Si'multaneous 'S'patial-'T'emporal Message Passing for Dynamic Graph Representation Learning

Dynamic graph neural networks (DGNNs) that operate on snapshot sequences typically fall into one of two categories. Temporal-first a…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data

High-quality expert chain-of-thought (CoT) data is one of the core bottlenecks in large language model (LLM) post-training. Existing data p…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Keep the Proof State Live: Snapshotting for Efficient Tactic Search in Lean 4

Automated theorem proving systems built on Lean 4 increasingly rely on parallel tactic search over partially specified proofs, such as thos…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

PennySynth: RAG-Driven Data Synthesis for Automated Quantum Code Generation

The growing complexity of quantum programming frameworks has exposed a critical limitation in existing large language model (LLM)-based cod…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Mosaic: Compositional Multi-Concept Erasure via Vector Field Blending

Concept erasure has emerged as a key research direction for ensuring safe and ethical image synthesis in Text-to-Image (T2I) models. While…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Geometric Flow Matching for Molecular Conformation Generation via Manifold Decomposition

The generation of accurate 3D molecular conformations is a pivotal challenge in computational chemistry and drug discovery. Recently, diffu…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Extreme Region Policy Distillation

Reinforcement learning for large language models faces a fundamental trade-off between sample efficiency and asymptotic performance: strict…

2026-05-26 13:00 JST · arXiv cs.AI · Robotics

Acting on the Unseen: Communication-Free Collaborative Filtering for Decentralized Multi-Robot Task Allocation

Multi-robot task allocation usually assumes some combination of communication, known task models, or a coordinator. We study the opposite e…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Toward a Benchmark for Controllable Simulation of Imperfect Students with Large Language Models

Teacher education requires deliberate practice with learners who exhibit identifiable strengths, weaknesses, and partial mastery. Large lan…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Towards the Connection between Activation Sparsity and Flat Minima

The observation that activation sparsity emerges in MLP blocks of standardly trained Transformers offers an opportunity to drastically redu…

2026-05-26 13:00 JST · arXiv cs.AI · Hardware/Semiconductors

Fine-Tuning and Serving Gemma 4 31B on Google Cloud TPU: A Technical Comparison with GPU Baselines

We present the first end-to-end demonstration of fine-tuning and serving Google's Gemma 4 31B model on TPU hardware, providing an empirical…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

AutoSG: LLM-Driven Solver Generation Solely from Task Prompts for Expensive Optimization

Expensive optimization tasks are ubiquitous in real-world applications, demanding highly specialized solvers. While LLM-driven automated so…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Posture Clip: Sit properly or I wont let you work

Poor posture is a significant concern due to its detrimental effects on health and productivity. This paper presents a collar-clipped devic…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

Meta-Engineering Harnesses for AI-Native Software Production: A Contract-Driven Adversarial Verification Architecture with Early Deployment Report

AI-native software development is often evaluated at the level of individual models, prompts, or generated artifacts. This framing is insuf…

2026-05-26 13:00 JST · arXiv cs.AI · Business/Funding

Referential Security as a New Paradigm for AI Evaluations

Security evaluations inherently depend on stable identifiers. Any finding, audit, or regulatory decision must remain attached to the specif…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Simulating Human Memory with Language Models

Language models are increasingly being deployed as user simulators, but their memory is far more reliable than that of real users. To measu…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Don't Retrain, Just Reuse: Recovering Dual-Target Molecules from Single-Target Diffusion Models

Designing a single molecule that modulates two targets is a promising strategy for polypharmacology, but it remains substantially harder th…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment

Distributing Transformer inference across embedded edge devices can alleviate individual memory and compute constraints, yet practical bene…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

How Should LLMs Consume High-Quality Data? Optimal Data Scheduling via Quality-Aware Functional Scaling Laws

High-quality data is scarce in large language model (LLM) training, yet how to schedule its use jointly with training dynamics lacks theore…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

Multi-Agent Coordination Adaptation via Structure-Guided Orchestration

As large language model (LLM)-based multi-agent systems scale to handle increasingly complex tasks, balancing structural stability and dyna…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

DeGRe: Dense-supervised Generative Reranking for Recommendation

In multi-stage recommender systems, reranking optimizes overall utility by capturing intra-list contextual dependencies, yet its central ch…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Benchmarking Pathology Foundation Models for Spatial Domain Understanding

Pathology foundation models (PFMs) have emerged as a core approach for learning transferable representations from whole slide images (WSIs)…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Concept Unlearning via Cross-Attention Activation Projection for Diffusion Models

Concept unlearning aims to erase a target concept from a pretrained text-to-image diffusion model without retraining. Closed-form methods a…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

MDGMIX: Boundary-Aware Subgraph Mixing for Multi-Domain Graph Pre-Training

Multi-domain graph pre-training is a crucial step in constructing foundational graph models with cross-domain generalization capabilities.…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Research/Papers

Efficient Benchmarking Is Just Feature Selection and Multiple Regression

Efficient benchmarking techniques aim to lower the computational cost of evaluating LLMs by predicting full benchmark scores using only a s…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

NPSolver: Neural Poisson Solver with Iterative Physics Supervision

Efficiently solving Poisson equations on complex, irregular domains remains a fundamental challenge in scientific computing, as classical i…

2026-05-26 13:00 JST · arXiv cs.AI · Agents

On the Benefits of Free Exploration for Regret Minimization in Multi-Armed Bandits

We study a stochastic multi-armed bandit problem where an agent is granted a free exploration budget before regret accumulates, a setting n…

2026-05-26 13:00 JST · arXiv cs.AI · Hardware/Semiconductors

SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness

Semantic-level watermarking (SWM) improves robustness against text modifications by treating sentences as the basic unit. However, robustne…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

Adaptive Graph Refinement and Label Propagation with LLMs for Cost-Effective Entity Resolution

Dirty entity resolution (ER), which identifies records referring to the same real-world entity from a single, messy dataset, is a fundament…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Fine-Tuning Over Architectural Complexity: Broad-Coverage PII Detection on PIIBench with DeBERTa

Personally identifiable information (PII) detection systems are frequently trained within narrow source or domain boundaries, limiting cove…

2026-05-26 13:00 JST · arXiv cs.AI · Robotics

OASIS: Observation-Action Space Alignment via SE(3) Trajectory Prediction for Robotic Manipulation

Recent vision-language-action (VLA) models and world action models (WAMs) advance robotic manipulation by enriching intermediate representa…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

Clarify, Abstain or Answer? Strategising in Conversation with Belief-Augmented Generation

Large language models (LLMs) define a distribution over text, which can be viewed as a probabilistic representation of uncertainty: samplin…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Robotics

When Search Becomes Memory: Turning Robot Design Trials into Transferable Skills

Large language models (LLMs) are increasingly used as proposal generators for evolutionary robot design, yet most loops remain memoryless:…

2026-05-26 13:00 JST · arXiv cs.AI · Business/Funding, Research/Papers

Context-Instrumental Data Distillation for Kubernetes Manifest Generation: Method and Experimental Evaluation

This paper examines the specialization of Small Language Models (SLMs) with up to 4 billion parameters for generating artifacts in domain-s…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

TTPrint: Evidence-Grounded TTP Extraction via Diverge-then-Converge Verification

Extracting MITRE ATT&CK techniques from cyber threat intelligence (CTI) reports is an open-set, multi-label problem requiring both high rec…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Geometric Evolution Maps: Extracting Stable Concept Probes from Transformer Residual Streams

Concept probes extracted from transformer residual streams are only as reliable as the layer from which they are extracted. The common prac…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Research/Papers

TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

This paper investigates large language model (LLM) abstention learning, specifically using a ternary reward, which incentivizes truthfulness i…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Explaining Too Much? Understanding How Large Language Model Reasoning Traces Influence Performance and Metacognition

Large Language Model interfaces are increasingly verbose, exposing intermediate reasoning traces alongside final answers. Traces are framed…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

MuNet: A Mutualistic Network for Joint 3D Human Mesh Recovery and 3D Clothed Human Reconstruction from Single Images

3D human mesh recovery and 3D clothed human reconstruction are inherently related, yet they have long been studied in isolation, thereby ov…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Hardware/Semiconductors

Causal Tongue-Tie: LLMs Can Encode Causal Direction, But Their Yes/No Outputs Fail to Express

We find a mismatch between what large language models encode about a causal question and what they answer. On anti-commonsense CLadder item…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Agents

Can LLMs Time Travel? Enhancing Temporal Consistency in Legal Agentic Search through Reinforcement Learning

While large language models (LLMs) augmented with agentic search capabilities show promise for legal reasoning, they overlook a fundamental…

2026-05-26 13:00 JST · arXiv cs.AI · Business/Funding

Quantitative Evaluation of the Severity of Posttraumatic Stress Disorder through Transfer Learning from Specific Phobia Data

Posttraumatic stress disorder (PTSD) is a prevalent and debilitating mental health condition with significant personal and societal impacts…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

From Latent Space to Training Data: Explainable Specialization in Minimal MLPs

We here study whether training biases can make hidden neurons specialize in minimal one-hidden-layer MLPs, and whether such specialization…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

EchoPilot: Training-Free Ultrasound Video Segmentation via Scale-Space Semantic Prompting and Reliability-Gated Memory

Ultrasound video segmentation is clinically valuable yet difficult due to speckle noise, weak boundaries, and rapid anatomical deformation.…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Small Models, Strong Priors: Architectural Inductive Bias for Parameter-Efficient Neural PDE Solvers

Neural PDE solvers have followed the scaling trajectory of vision and language, with recent foundation models reaching billions of paramete…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

VEN-VL: A Visual Ensemble MoE Framework for Effective and Efficient Multi-Modal Understanding

Despite the remarkable progress achieved by recent efficient methods in accelerating multimodal understanding, they still suffer from notic…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

Step-TP: A Grounded, Step-Level Dataset with Chain-of-Thought Reasoning for LLM-Guided Tensor Program Optimization

Despite the strong reasoning capabilities of large language models (LLMs), optimizing the execution efficiency of tensor programs remains c…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Business/Funding, Research/Papers

QUIET: A Multi-Blank Cascaded Story Cloze Benchmark for LLM Creative Generation Capability

Large language models (LLMs) face a dual challenge in creative capability evaluation: existing benchmarks (e.g., Story Cloze Test, HellaSwa…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Continual Speaker Identity Unlearning with Minimal Interference

Machine unlearning removes designated concepts or knowledge from pre-trained models. Recent work has extended this paradigm to speaker iden…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Creative Quality Alignment: Expert Tacit Knowledge Transfer via Chain-of-Thought Fine-Tuning

This paper provides an empirical implementation of the creative quality metric proposed in Calibrated Surprise (Zou & Xu, 2026a). The quest…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI

SafeCtrl-RL: Inference-Time Adaptive Behaviour Control for LLM Dialogue via RL-Driven Prompt Optimisation

Ensuring safe and contextually appropriate behaviour in Large Language Models (LLMs) remains a critical challenge for real-world deployment…

2026-05-26 13:00 JST · arXiv cs.AI · Business/Funding

AI-Assisted Systematization for Evaluating GenAI Systems

Evaluating generative AI (GenAI) systems is challenging because many targets of evaluation are broad, contested concepts, such as "reasonin…

2026-05-26 13:00 JST · arXiv cs.AI · Agents

Learning in Low-Dimensional Subspaces: Orthogonal Bottlenecks for Reinforcement Learning

Deep reinforcement learning (RL) agents commonly rely on high-dimensional neural representations, despite growing evidence that task-releva…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models

We introduce AdvantageFlow, a forward-process reinforcement learning algorithm for rectified flow models. Unlike Flow-GRPO, which optimizes…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Retrieval-Augmented Detection of Potentially Abusive Clauses in Chilean Terms of Service

Online Terms of Service often function as contracts of adhesion, creating asymmetries that may expose consumers to potentially abusive clau…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

A Multimodal 3D Foundation Model for Light Sheet Fluorescence Microscopy Enables Few-Shot Segmentation, Classification, and Deblurring

Light sheet fluorescence microscopy (LSM) enables high-resolution, three-dimensional (3D) imaging of biological specimens, providing rich v…

2026-05-26 13:00 JST · arXiv cs.AI · Image/Video Generation

Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution

Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

DRScaffold: Boosting Dense-Scene Reasoning in Lightweight Vision Language Models

Lightweight vision-language models perform competitively on standard benchmarks yet fail systematically in dense-scene reasoning, where mul…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals

Activation oracles aim to make the activations of other models legible to humans and yield promising results compared to white-box interpre…

2026-05-26 13:00 JST · arXiv cs.AI · LLM/Generative AI, Business/Funding

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneous…

2026-05-26 13:00 JST · arXiv cs.AI · Research/Papers

Neuronal Stochastic Attention Circuit (NSAC) for Probabilistic Representation Learning

Reliable quantification of uncertainty estimates in continuous-time (CT) representation learning remains nascent, particularly within CT at…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Conditional KRR: Injecting Unpenalized Features into Kernel Methods with Applications to Kernel Thresholding

Conditionally positive definite (CPD) kernels are defined with respect to a function class $\mathcal{F}$. It is well known that such a kern…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Rethinking Weak Supervision in Anomaly Detection: A Comprehensive Benchmark

Weakly supervised anomaly detection (WSAD) has developed in three primary directions: incomplete, inexact, and inaccurate supervision. Howe…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

StakeBench: Evaluating Language Understanding Grounded in Market Commitment

Existing financial NLP benchmarks often rely on labels supplied by outside observers, measuring how language is perceived rather than what…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Channel-wise Vector Quantization

We present Channel-wise Vector Quantization (CVQ), a novel image tokenization paradigm that replaces patch-wise tokens with channel-wise to…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

OrpQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization

The deployment of Large Language Models (LLMs) and Vision Transformers (ViTs) on edge devices is significantly constrained by memory limita…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Language Models Need Sleep

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models

Code review is a critical practice in software engineering, yet the growing scale and frequency of code patches in modern projects, togethe…

2026-05-26 13:00 JST | arXiv cs.AI | Image/Video Generation

Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation

Subject-driven image generation aims to synthesize new images that preserve the identity of the given subject while following textual instr…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Research/Papers

WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point

Recent progress in GUI agents has substantially improved visual grounding, yet robust planning remains challenging, particularly when the e…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

PCGRLLM: Large Language Model-Driven Reward Design for Procedural Content Generation Reinforcement Learning

Reward design plays a pivotal role in the training of game AIs, requiring substantial domain-specific knowledge and human effort. In recent…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Efficient and Scalable Neural Symbolic Search for Knowledge Graph Complex Query Answering

Complex Query Answering (CQA) is a crucial reasoning task over Knowledge Graphs (KGs), which aims to answer first-order logical queries fro…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

FloorplanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations

We introduce FloorplanQA, a diagnostic benchmark for evaluating spatial reasoning in large language models (LLMs). FloorplanQA is grounded…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Hide-and-Shill: A Reinforcement Learning Framework for Market Manipulation Detection in Symphony-a Decentralized Multi-Agent System

Decentralized finance (DeFi) has introduced a new era of permissionless financial innovation but also led to unprecedented market manipulat…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

From Multi-Agent Systems and the Semantic Web to Agentic AI: A Unified Narrative of the Web of Agents

The Web of Agents (WoA) transforms the document-centric Web into an environment of autonomous agents acting on users' behalf, a vision newl…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Research/Papers

Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning

Large reasoning models (LRMs) combined with retrieval-augmented generation (RAG) have enabled deep research agents capable of multi-step re…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Probing the Preferences of a Language Model: Integrating Verbal and Behavioral Tests of AI Welfare

We develop new experimental paradigms for measuring welfare in language models. We compare verbal reports of models about their preferences…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Teaching large language models to reason like expert diagnosticians

Differential diagnosis is an iterative process that integrates patient information with broader medical knowledge. Clinical case series suc…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Business/Funding, Research/Papers

Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents

Although recent tool-augmented benchmarks involve complex requests, evaluation remains limited to answer matching, neglecting critical traj…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Agent Learning via Early Experience

A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

False Fixed Points: Kantian Feedback, Stable Miscalibration, and Representational Compression in LLMs

High-confidence errors in large language models are often treated as fragile failures. We study an alternative: some errors may be false fi…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Voting with the Graph: Stable RLAIF via Topological Consistency Maximization

Reinforcement Learning from AI Feedback (RLAIF) relies on LLM judges as preference measurement instruments, yet these instruments are funda…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Business/Funding

From Prompt Optimization to Multi-Dimensional Credibility Evaluation: Enhancing Trustworthiness of Chinese LLM-Generated Liver MRI Reports -- with Preliminary Extension to Lung Cancer

Large language models (LLMs) have demonstrated promising performance in generating diagnostic conclusions from imaging findings, thereby su…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Rewarding Structural Conformance of Reasoning using Process Mining

Recent advances in sparse reward policy gradient methods have enabled effective reinforcement learning (RL)-based language model post-train…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Chain-of-Thought Hijacking

Large Reasoning Models (LRMs) improve task performance through extended inference-time reasoning. Although previous studies suggest that lo…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Optimizing Sensor Placement for Flow Reconstruction in Urban Drainage Networks: A Digital Twin-Based Sparse Sensing Approach

Urban flooding triggered by intense rainfall is becoming increasingly frequent and widespread. While flood prediction and monitoring in hig…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Learning to Trust: Bayesian Adaptation to Varying Suggester Reliability in Sequential Decision Making

Autonomous agents operating in sequential decision-making tasks under uncertainty can benefit from external action suggestions, which provi…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

IPR-1: Interactive Physical Reasoner

Humans learn by observing, interacting with environments, and internalizing physics and causality. Here, we aim to ask whether an agent can…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Actionable and diverse counterfactual explanations incorporating domain knowledge and plausibility constraints

Counterfactual explanations improve the actionable interpretability of machine learning models by identifying minimal changes required to a…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

AGI Requires a Coordination Layer on Top of Pattern Repositories

In this paper we argue that influential critiques dismissing Large Language Models (LLMs) as a dead end for AGI misidentify the bottleneck:…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Leveraging Spreading Activation for Improved Document Retrieval in Knowledge-Graph-Based RAG Systems

Despite initial successes and a variety of architectures, retrieval-augmented generation systems still struggle to reliably retrieve and co…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

SPARK: Search Personalization via Agent-Driven Retrieval and Knowledge-sharing

Personalized search demands the ability to model users' evolving, multi-dimensional information needs; a challenge for systems constrained…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Architecting Agentic Communities using Design Patterns

The rapid evolution of Large Language Models (LLM) and subsequent Agentic AI technologies requires systematic architectural guidance for bu…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MMUEChange: A Generalized LLM Agent Framework for Intelligent Multi-Modal Urban Environment Change Analysis

Understanding urban environment change is essential for sustainable development. However, current approaches, particularly remote sensing c…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

NSR-Boost: A Neuro-Symbolic Residual Boosting Framework for Industrial Legacy Models

Although the Gradient Boosted Decision Trees (GBDTs) dominate industrial tabular applications, upgrading legacy models in high-concurrency…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs

Large Language Models (LLMs) have enabled automated heuristic design (AHD) for combinatorial optimization problems (COPs), but existing fra…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

DropoutTS: Sample-Adaptive Dropout for Robust Time Series Forecasting

Deep time series models are vulnerable to noisy data ubiquitous in real-world applications. Existing robustness strategies either prune dat…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Business/Funding, Research/Papers

Why Your Deep Research Agent Fails? On Hallucination Evaluation in Full Research Trajectory

Diagnosing failure patterns in Deep Research Agents (DRAs) remains a critical challenge. Existing benchmarks predominantly rely on end-to-e…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MedBeads: An Agent-Native, Immutable Data Substrate for Trustworthy Medical AI

Background: As of 2026, Large Language Models (LLMs) demonstrate expert-level medical knowledge. However, deploying them as autonomous "Cli…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Emergent Analogical Reasoning in Transformers

Analogy is a central faculty of human intelligence, enabling abstract patterns discovered in one domain to be applied to another. Despite i…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

FLINGO -- Instilling ASP Expressiveness into Linear Integer Constraints

Constraint Answer Set Programming (CASP) is a hybrid paradigm that enriches Answer Set Programming (ASP) with numerical constraint processi…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Recent advances in large language model (LLM) have empowered autonomous agents to perform multi-turn interactions with tools and environmen…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization

Socially intelligent AI systems must entail reasoning across diverse human behavioral tasks, and generalization to new contexts. However, A…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures

Genomic Foundation Models (GFMs) typically rely on Masked Language Modeling (MLM) or Next-Token Prediction (NTP) to learn the "Laws of Natu…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection and Mitigation in LLM Backtesting

Backtesting LLMs on resolved events assumes models reason only from pre-cutoff knowledge, yet pretrained models inevitably leak post-cutoff…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System

Modern large-scale ranking systems operate within a sophisticated landscape of competing objectives, operational constraints, and evolving…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic

We introduce INDUCTION, a benchmark for finite structure concept synthesis in first order logic. Given small finite relational worlds with…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, where enabling long-horizon memory is…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs

Computational pathology demands both visual pattern recognition and dynamic integration of structured domain knowledge, including taxonomy,…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Characterizing Linear Alignment Across Language Models

Language models increasingly appear to learn similar representations, despite differences in training objectives, architectures, and data m…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Dynamic Dual-Granularity Skill Bank for Agentic RL

Agentic RL can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance an…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Business/Funding

OASES: Outcome-Aligned Search-Evaluation Co-Training for Agentic Search

Agentic search enables language models to solve knowledge-intensive tasks by adaptively acquiring external evidence over multiple steps. Re…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Research/Papers

SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment

Current LLM-based agents demonstrate strong performance in episodic task execution but remain constrained by static toolsets and episodic a…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Learning Preference-Based Objectives from Clinical Narratives for Dynamic Sepsis Treatment

Designing reward functions for reinforcement learning (RL) in healthcare remains challenging because clinically meaningful outcomes are spa…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents, Business/Funding

UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents

Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems through structured function c…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

The A-R Behavioral Space: Execution-Level Profiling of Tool-Using Language Model Agents in Organizational Deployment

Large language models (LLMs) are increasingly deployed as tool-augmented agents capable of executing system-level operations. While existin…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation

Skill-distillation pipelines learn reusable rules from LLM agent trajectories, but they lack a key signal: how much each step costs. Withou…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Generative structure search for efficient and diverse discovery of molecular and crystal structures

Predicting stable and metastable structures is central to molecular and materials discovery, but remains limited by the cost of searching h…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Reliable AI Needs to Externalize Implicit Knowledge: A Human-AI Collaboration Perspective

This position paper argues that reliable AI requires infrastructure for human validation of implicit knowledge. AI learns from both explici…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models

Time series (TS) reasoning models (TSRMs) have shown promising capabilities in general domains, yet they consistently fail on financial dom…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

MEMTIER: Tiered Memory Architecture and Retrieval Bottleneck Analysis for Long-Running Autonomous AI Agents

Long-running autonomous AI agents suffer from a well-documented memory coherence problem: tool-execution success rates degrade 14 percentag…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting

Long-term personalized memory for LLM agents is challenging on resource-limited edge devices due to high storage costs and multimodal compl…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games

While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace

As LLM agent systems take on more complex tasks, they increasingly rely on meta-agents: higher-order agents that operate on other agents, m…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

On-policy distillation (OPD) and on-policy self-distillation (OPSD) have emerged as promising post-training methods for large language mode…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing

Scientific data processing often requires task-specific algorithms or AI models, creating a barrier for domain scientists who need to analy…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Differentiable Learning of Lifted Action Schemas for Classical Planning

Classical planners can effectively solve very large deterministic MDPs represented in STRIPS or PDDL where states are sets of atoms over ob…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

CogniFold: Always-On Proactive Memory via Cognitive Folding

Existing agent memory remains predominantly reactive and retrieval-based, lacking the capacity to autonomously organize experience into per…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

A Two-Dimensional Framework for AI Agent Design Patterns: Cognitive Function and Execution Topology

Existing frameworks for LLM-based agent architectures describe systems from a single perspective: industry guides (Anthropic, Google, LangC…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows?

Computer-Using Agents (CUAs) are rapidly extending large language models (LLMs) beyond text-based reasoning toward action execution in more…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

How do Humans Process AI-generated Hallucination Contents: a Neuroimaging Study

While AI-generated hallucinations pose considerable risks, the underlying cognitive mechanisms by which humans can successfully recognize o…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Is VLA Reasoning Faithful? Probing Safety of Chain-of-Causation in Autonomous Driving Models

We present the first systematic study of faithfulness in Vision-Language-Action (VLA) driving models, analyzing 300 Alpamayo-R1-10B inferen…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Self-supervised Hierarchical Visual Reasoning with World Model

3D open-world environments with adversarial opponents remain a core challenge for reinforcement learning due to their vast state spaces. Ef…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs

Leveraging the universal representations of pre-trained LLMs and MLLMs offers a promising path toward brain foundation models. However, vis…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity

Agent Skills, structured packages of procedural knowledge loaded into an LLM agent at inference time, are widely reported to improve task p…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Research/Papers

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from…

2026-05-26 13:00 JST | arXiv cs.AI | Business/Funding

ECUAS$_n$: A family of metrics for principled evaluation of uncertainty-augmented systems

In high-stakes automated decision-making, access to predictive uncertainty is essential for enabling users -- human or downstream systems -…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

Many safety and alignment failures of large language models (LLMs) occur due to out-of-distribution (OOD) situations: unusual prompt or res…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks?

LLM agents have incredible potential for scientific discovery applications. However, the performance of LLM agents on real-world, small mol…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Knowledge Graph Re-engineering Along the Ontological Continuum (extended version)

Knowledge graphs have become the primary vehicle for data integration and are critical to the success of modern AI, but the diversity of KG…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression

The KV cache used in large language models has linearly growing time complexity, so LLMs face memory blow-up and reduced decoding efficienc…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors

AMEL: Accumulated Message Effects on LLM Judgments

Large language models are routinely used as automated evaluators: to review code, moderate content, or score outputs, often with many items…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

Autonomous agentic systems are largely static after deployment: they do not learn from user interactions, and recurring failures persist un…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Unbalanced Incomplete Multi-view Clustering via the Scheme of View Evolution: Weak Views are Meat; Strong Views do Eat

Incomplete multi-view clustering is an important technique to deal with real-world incomplete multi-view data. Previous works assume that a…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Double Self-weighted Multi-view Clustering via Adaptive View Fusion

Multi-view clustering has been applied in many real-world applications where original data often contain noises. Some graph-based multi-vie…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

As an increasingly popular task in multimedia information retrieval, video moment retrieval (VMR) aims to localize the target moment from a…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target moment semantically according to a sentence query. Alth…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms

Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for optimization,…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot

Large Language Models (LLMs) are reshaping knowledge work, yet their impact on voluntary, self-guided open innovation forums (contributors…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation

The thematic fit estimation task measures semantic arguments' compatibility with a given semantic role for a given predicate. We investigat…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

The Meme Is the Message: Generative Memesis and AI Visuals in the 2024 USA Presidential Elections

Visual content on social media has become increasingly influential in shaping political discourse and civic engagement, but it also limits…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Fusion Embedding for Pose-Guided Person Image Synthesis with Diffusion Model

Pose-Guided Person Image Synthesis (PGPIS) aims to generate human images in specified poses while preserving the identity and appearance of…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning

Molecular representation learning methods typically tokenize molecules as individual atoms or use rigid, rule-based fragment decompositions…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations

Graph Neural Networks (GNNs) have been proposed as a tool for learning sparse matrix preconditioners, which are key components in accelerat…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Kolmogorov-Arnold Fourier Networks

Although Kolmogorov-Arnold-based interpretable networks (KANs) possess strong theoretical expressiveness, they suffer from severe parameter…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

ExplainReduce: Generating global explanations from many local explanations

Most commonly used non-linear machine learning methods are closed-box models, uninterpretable to humans. The field of explainable artificia…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Pragmatic Reasoning improves LLM Code Generation

Pragmatic reasoning helps interlocutors infer intended meaning from ambiguous or underspecified messages by considering shared context and…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

How does Bayesian Sampling help Membership Inference Attacks?

Membership Inference Attacks (MIAs) aim to estimate whether a specific data point was used in the training of a given model. Existing state…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Subspace Aggregation Query and Index Generation for Multidimensional Resource Space Model

Organizing large-scale resources in a multidimensional semantic space is an approach to efficiently managing and querying resources from di…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Generalizable Vision-Language Few-Shot Adaptation with Predictive Prompts and Negative Learning

Few-shot adaptation of vision-language models remains fundamentally limited by how negative class signals are handled at inference. Existin…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

PhySense: Sensor Placement Optimization for Accurate Physics Sensing

Physics sensing plays a central role in many scientific and engineering domains, which inherently involves two coupled tasks: reconstructin…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Beyond the Proxy: Trajectory-Distilled Guidance for Offline GFlowNet Training

Generative Flow Networks (GFlowNets) excel at sampling diverse, high-reward objects. In many practical applications where active reward que…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MultiPhishGuard: An Explainable and Adaptive Multi-Agent LLM System for Phishing Email Detection

Phishing email detection faces significant challenges due to evolving adversarial tactics and heterogeneous attack patterns. Traditional ap…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Music Interpretation and Emotion Perception: A Computational and Neurophysiological Investigation

This study investigates emotional expression and perception in music performance using computational and neurophysiological methods. The in…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

PageLLM: A Multi-Grained Reward Framework for Whole-Page Optimization with Large Language Models

Whole-page optimization (WPO) decides how search and recommendation results are surfaced to users, and large language models (LLMs) open a…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models

Integrating Low-Rank Adaptation (LoRA) into federated learning offers a promising solution for parameter-efficient fine-tuning of Large Lan…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs

Direct Preference Optimization (DPO) has emerged as a cornerstone of reinforcement learning from human feedback (RLHF) due to its simplicit…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

From Reasoning to Code: GRPO Optimization for Underrepresented Languages

Generating accurate and executable code using Large Language Models (LLMs) remains a significant challenge for underrepresented programming…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning

Embodied Visual Reasoning (EVR) seeks to follow complex, free-form instructions based on egocentric video, enabling semantic understanding…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

SoK: A Comprehensive Security Analysis of Jailbreak Resilience in GPT and DeepSeek Models

The rapid proliferation of Large Language Models (LLMs) has heightened concerns regarding their exposure to jailbreak attacks, which craft…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation, yet existing samplers, which pick tokens to unma…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators

As psychometric surveys are increasingly used to assess the traits of large language models (LLMs), the need for scalable survey item gener…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

ToolRegistry: A Protocol-Agnostic Tool Management Library for Function-Calling LLMs

Every LLM tool call is structurally an RPC -- a function name, JSON arguments, and a serialized result -- yet each protocol (native Python,…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

QUTCC: Quantile Uncertainty Training and Conformal Calibration for Imaging Inverse Problems

While deep learning offers tremendous promise for scientific and medical imaging, any failures and hallucinations (predictions that do not…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Page image classification for content-specific data processing

Digitization projects in humanities often generate vast quantities of page images from historical documents, presenting significant challen…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation

Contrastive learning (CL) has become a dominant paradigm for self-supervised hypergraph learning, enabling effective training without costl…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Explainable Attention-Guided Stacked Graph Neural Networks for Malware Detection

Malware detection in modern computing environments demands models that are not only accurate but also interpretable and robust to evasive t…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Designing Singing Syllabi with Virtual Avatars: AI-Assisted Syllabus Reauthoring

Traditional syllabi often function as static reference documents rather than engaging introductions to a course. In practical teaching, we…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization

Min-max optimization arises in many domains such as game theory, adversarial machine learning, etc. For these problems, gradient-based meth…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

MCPXKIT: The Unified Toolkit for Analyzing Model Context Protocol Security

The Model Context Protocol (MCP) has emerged as a universal standard that enables AI agents to seamlessly connect with external tools, sign…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Tool calling has emerged as a critical capability for AI agents. In contrast to conventional tool calling frameworks that rely on static, p…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

HiGraph: A Large-Scale Hierarchical Graph Dataset for Malware Analysis

The advancement of graph-based malware analysis is critically limited by the absence of large-scale datasets that capture the inherent hier…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning

Pruning is a typical acceleration technique for compute-bound models by removing computation on unimportant values. Recently, it has been a…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Dynamic Relational Priming Improves Transformer in Multivariate Time Series

Standard attention mechanisms in transformers employ static token representations that remain unchanged across all pair-wise computations i…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Equip Pre-ranking with Target Attention by Residual Quantization

The pre-ranking stage in industrial recommendation systems faces a fundamental conflict between efficiency and effectiveness. While powerfu…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

What Happens Next? Anticipating Future Motion by Generating Point Trajectories

We consider the problem of forecasting motion from a single image, i.e., predicting how objects in the world are likely to move, without th…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors

HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

Mixture-of-Experts (MoE) architectures in large language models (LLMs) deliver exceptional performance and reduced inference costs compared…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence

Large language models (LLMs) are increasingly used to help security analysts manage the surge of cyber threats, automating tasks from vulne…

2026-05-26 13:00 JST | arXiv cs.AI | Hardware/Semiconductors

VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes

Is basic visual understanding really solved in state-of-the-art VLMs? We present VisualOverload, a slightly different visual question answe…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

INSIGHT: INference-time Sequence Introspection for Generating Help Triggers in Vision-Language-Action Models

Recent Vision-Language-Action (VLA) models show strong generalization capabilities, yet they lack introspective mechanisms for anticipating…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Go witheFlow: Real-time Emotion Driven Audio Effects Modulation

Music performance is a distinctly human activity, intrinsically linked to the performer's ability to convey, evoke, or express emotion. Mac…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI

Real-time speech-to-speech (S2S) models excel at generating natural, low-latency conversational responses but often lack deep knowledge and…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

Transformer-based large models excel in natural language processing and computer vision, but face severe computational inefficiencies due t…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

vAttention: Verified Sparse Attention

State-of-the-art sparse attention methods for reducing decoding latency fall into two main categories: approximate top-$k$ (and its extensi…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Membership Inference Attacks on Tokenizers of Large Language Models

Membership inference attacks (MIAs) are widely used to assess the privacy risks associated with machine learning models. However, when thes…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Local MAP Sampling for Diffusion Models

Diffusion Posterior Sampling (DPS) provides a principled Bayesian approach to inverse problems by sampling from $p(x_0 \mid y)$. While post…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

DeepEN: A Deep Reinforcement Learning Framework for Personalized Enteral Nutrition in Critical Care

Objective: Enteral nutrition (EN) delivery in the ICU remains suboptimal due to limited personalization and uncertainty regarding appropria…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Auditing medical multi-agent AI reveals risks of false consensus

Large language models are increasingly being assembled into medical multi-agent systems that emulate multidisciplinary consultation through…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

Fine-grained vision-language understanding requires precise alignment between visual content and linguistic descriptions, a capability that…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

AI-generated podcasts: Synthetic Intimacy and Cultural Mistranslation in NotebookLM's Audio Overviews

This paper analyses AI-generated podcasts produced by Google's NotebookLM, which generates audio podcasts with two chatty AI hosts discussi…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

BackWeak: Backdooring Knowledge Distillation Simply with Weak Triggers and Fine-tuning

Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-pa…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Bridging the Semantic-Action Gap in Visual Token Pruning for Efficient VLA Inference

Vision-Language-Action (VLA) models have shown great potential for embodied AI by integrating visual perception, language understanding, an…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Understanding, Accelerating, and Improving MeanFlow Training

MeanFlow promises high-quality generative modeling in few steps, by jointly learning instantaneous and average velocity fields. Yet, the un…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Asking LLMs to Verify First is Almost Free Lunch

To enhance the reasoning capabilities of Large Language Models (LLMs) without the high cost of training or extensive test-time sampling, we…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving

End-to-end autonomous driving (AD) systems increasingly adopt vision-language-action (VLA) models, yet they typically ignore the passenger'…

2026-05-26 13:00 JST | arXiv cs.AI | Business/Funding

Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction

Bitcoin mining hardware acquisition requires strategic timing due to volatile markets, rapid technological obsolescence, and protocol-drive…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Intrinsically Interpretable Attention via Sparse Post-Training

We introduce a simple post-training method that makes transformer attention sparse without sacrificing performance. Applying a flexible spa…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

AI as Equalizer or Amplifier? Task Complexity as the Moderating Factor for Human Expertise in Hybrid Intelligence Systems

A growing body of empirical research suggests that generative AI narrows performance gaps between novice and expert workers on routine task…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

DynaPURLS: Dynamic Refinement of Part-Aware Representations for Skeleton-Based Zero-Shot Action Recognition

Zero-shot skeleton-based action recognition (ZS-SAR) is fundamentally constrained by prevailing approaches that rely on aligning skeleton f…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Coupled Variational Reinforcement Learning for Language Model General Reasoning

While reinforcement learning has achieved impressive progress in language model reasoning, it is constrained by the requirement for verifia…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Selection-Induced Contraction of Innovation Statistics in Gated Kalman Filters

Validation gating is a fundamental component of classical Kalman-based tracking systems. Only measurements whose normalized innovation squa…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

$M^3-Verse$: A "Spot the Difference" Challenge for Large Multimodal Models

Modern Large Multimodal Models (LMMs) have demonstrated extraordinary ability in static image and single-state spatial-temporal understandi…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

DIVER-1: Scaling Intracranial EEG Foundation Models for Transferable Representations

Intracranial EEG (iEEG) provides direct, millisecond-scale recordings of human neural activity, but reusable representation learning is dif…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Multimodal Functional Maximum Correlation for Emotion Recognition

Emotional states manifest as coordinated yet heterogeneous physiological responses across central and autonomic systems, posing a fundament…

2026-05-26 13:00 JST | arXiv cs.AI | Image/Video Generation

A Comprehensive Dataset for Human vs. AI Generated Image Detection

Multimodal generative AI systems like Stable Diffusion, DALL-E, and MidJourney have fundamentally changed how synthetic images are created.…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts

Mixture-of-Experts (MoE) architectures scale large language models efficiently by employing a parametric "router" to dispatch tokens to a…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

FlowPlan-G2P: A Structured Generation Framework for Transforming Scientific Papers into Patent Descriptions

Generating patent descriptions from scientific papers is challenging due to fundamental rhetorical and structural disparities between the t…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

SentGraph: Hierarchical Sentence Graph for Multi-hop Retrieval-Augmented Question Answering

Traditional Retrieval-Augmented Generation (RAG) effectively supports single-hop question answering with large language models but faces si…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

AnatomiX, an Anatomy-Aware Grounded Multimodal Large Language Model for Chest X-Ray Interpretation

Multimodal medical large language models have shown substantial progress in chest X-ray interpretation but continue to face challenges in s…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Extreme-value forest fire prediction: A study of the Loss Function in an Ordinality Scheme

Wildfires are highly imbalanced natural hazards in both space and severity, making the prediction of extreme events particularly challengin…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning

Neologism-aware machine translation aims to translate source sentences containing neologisms into target languages. This field remains unde…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

PiXTime: A Model for Federated Time Series Forecasting with Heterogeneous Data across Nodes

While collaborative forecasting on distributed time series is highly desirable, directly pooling localized datasets is often impractical du…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

RiskBridge: Turning CVEs into Business-Aligned Patch Priorities

Enterprises are confronted with an unprecedented escalation in cybersecurity vulnerabilities, with thousands of new CVEs disclosed each mon…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors

SafeGPT: Preventing Data Leakage and Unethical Outputs in Enterprise LLM Use

Large Language Models (LLMs) are transforming enterprise workflows but introduce security and ethics challenges when employees inadvertentl…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

QASA: Quality-Aware Semantic Augmentation for Robust Multimodal Sentiment Analysis

Multimodal large language models have demonstrated strong ability in capturing semantic representations for multimodal sentiment analysis.…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Future-KL Regularized GRPO: Process-Level Credit Assignment from $f$-Divergence Regularization

Group Relative Policy Optimization (GRPO) is widely used for critic-free Large Language Model (LLM) post-training, but its KL regularizatio…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Pixelwise Uncertainty Quantification of Accelerated MRI Reconstruction

Parallel imaging techniques reduce magnetic resonance imaging (MRI) scan time but image quality degrades as the acceleration factor increas…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Delayed Assignments in Online Non-Centroid Clustering with Stochastic Arrivals

Clustering is a fundamental problem, aiming to partition a set of elements, like agents or data points, into clusters such that elements in…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed

Safe Reinforcement Learning (RL) algorithms are typically evaluated under fixed training conditions. We investigate whether training-time s…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs

Existing speech editing detection (SED) datasets are predominantly constructed using manual splicing or limited editing operations, resulti…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Dynamics Reveals Structure: Challenging the Linear Propagation Assumption

Neural networks adapt through first-order parameter updates, yet it remains unclear whether such updates preserve logical coherence. We inv…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs

Vision-Language Models (VLMs) achieve strong multimodal performance but are costly to deploy, and post-training quantization often causes s…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models

Recent years have seen a rapid surge in research leveraging Large Language Models (LLMs) for recommendation. These methods typically employ…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

RecGOAT: Graph Optimal Adaptive Transport for LLM-Enhanced Multimodal Recommendation with Dual Semantic Alignment

Integrating large language model (LLM) representations into multimodal recommendation has shown promise, yet a fundamental challenge remain…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Generative Visual Code Mobile World Models

Mobile Graphical User Interface (GUI) World Models (WMs) offer a promising path for improving mobile GUI agent performance at train- and in…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Agents

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These f…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Reward-free Alignment for Conflicting Objectives

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world align…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models

While Diffusion Language Models (DLMs) offer a flexible, arbitrary-order alternative to the autoregressive paradigm, their non-causal natur…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Research/Papers

Fine-Tuning Language Models to Know What They Know

Evaluating true metacognition in Large Language Models (LLMs) is difficult due to biases and heuristics. This paper presents a framework to…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems

While existing multi-agent systems (MAS) can handle complex problems by enabling collaboration among multiple agents, they are often highly…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems

Though Explainable AI (XAI) has made significant advancements, its inclusion in edge and IoT systems is typically ad-hoc and inefficient. M…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Counterfactual Explanations for Hypergraph Neural Networks

Hypergraph neural networks (HGNNs) effectively model higher-order interactions in many real-world systems but remain difficult to interpret…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

Reinforcement Learning with Verifiable Rewards (RLVR) is commonly based on group sampling to estimate advantages and stabilize policy updat…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned

DARPA's AI Cyber Challenge (AIxCC, 2023--2025) is the largest competition to date for building fully autonomous cyber reasoning systems (CR…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Prism: Spectral-Aware Block-Sparse Attention

Block-sparse attention is promising for accelerating long-context LLM pre-filling, yet identifying relevant blocks efficiently remains a bo…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Reinforcement Learning with Verifiable Rewards (RLVR) is an effective paradigm for improving the reasoning capabilities of large language m…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Krause Synchronization Transformers

Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. W…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Two-Sided Time-Independent Regret for Matching Markets with Limited Interviews

Two-sided matching platforms rely on preferences from both sides, yet participants can evaluate only a small fraction of potential partners…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Adversarial Network Imagination: Causal LLMs and Digital Twins for Proactive Telecom Mitigation

Telecommunication networks experience complex failures such as fiber cuts, traffic overloads, and cascading outages. Existing monitoring an…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens

Reinforcement Learning (RL) has significantly improved large language model reasoning, but existing RL fine-tuning methods rely heavily on…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

CARL-CXR: Continual Adapter-Based Routing for Task-Unknown Chest Radiograph Classification

Clinical deployment of chest radiograph classifiers requires models that can be updated as new datasets become available without retraining…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

MARS: Margin and Semantic-Aware Data Augmentation for Reward Modeling

Reward modeling is central to alignment pipelines such as RLHF, RLAIF, and PPO-based policy optimization, yet its reliability is constraine…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments

Current audio-visual large language models (AV-LLMs) are predominantly restricted to 2D perception, relying on RGB video and monaural audio…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments

Trusted Execution Environments (TEEs) (e.g., Intel SGX and Arm TrustZone) aim to protect sensitive computation from a compromised operating…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Any-Precision LLM

Dynamic runtime latency and memory constraints necessitate flexible large language model (LLM) deployment, where an LLM can be inferred wit…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Robotics

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a seq…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Topology-Driven Transferability Estimation of Medical Foundation Models for Segmentation

The advent of large-scale self-supervised learning (SSL) has produced a vast zoo of medical foundation models. However, selecting optimal m…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

MultiPUFFIN: A Multimodal Domain-Constrained Foundation Model for Molecular Property Prediction of Small Molecules

MultiPUFFIN is a domain-informed multimodal foundation model for predicting thermophysical properties of small molecules, addressing a crit…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Non-Invasive Reconstruction of Intracranial EEG Across the Deep Temporal Lobe from Scalp EEG based on Conditional Normalizing Flow

Although obtaining deep brain activity from non-invasive scalp electroencephalography (sEEG) is crucial for neuroscience and clinical diagn…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Grouter: Decoupling Routing from Representation for Accelerated MoE Training

Traditional Mixture-of-Experts (MoE) training typically proceeds without any structural priors, effectively requiring the model to simultan…

2026-05-26 13:00 JST | arXiv cs.AI | Business/Funding

RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation

Human uplift studies, or studies that measure the effects of AI access on human performance via randomized controlled trials (RCT) or simil…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Tasks

The success of a Large Language Model (LLM) task depends heavily on its prompt. Most use-cases specify prompts using natural language, whic…

2026-05-26 13:00 JST | arXiv cs.AI | Business/Funding

Is Human Annotation Necessary? Iterative MBR Distillation for Error Span Detection in Machine Translation

Error Span Detection (ESD) is a crucial subtask in Machine Translation (MT) evaluation, aiming to identify the location and severity of tra…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization

Post-training model compression is essential for enhancing the portability of Large Language Models (LLMs) while preserving their performan…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models

Unified multimodal models share a language model backbone for both understanding and generating images. Can DPO align both capabilities sim…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching

Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Discounted Beta-Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards

Reinforcement learning with verifiable rewards (RLVR) has emerged as an effective post-training paradigm for improving the reasoning capabi…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Procedural Refinement by LLM-driven Algorithmic Debugging for ARC-AGI-2

In high-complexity abstract reasoning, a system must infer a latent rule from a few examples or structured observations and apply it to uns…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Profiling learners' affective engagement: Emotion AI, intercultural pragmatics, and language learning

Learning another language can be a highly emotional process, typically characterized by numerous frustrations and triumphs, big and small.…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Beyond Static Uncertainty: Modeling Temporal Uncertainty Dynamics for Probabilistic Time Series Forecasting

Real-world time series exhibit temporally structured uncertainty: volatility clusters in turbulent regimes, dissipates in stable periods, a…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

UniRank: End-to-End Domain-Specific Reranking of Hybrid Text-Image Candidates

Reranking is a critical component in many information retrieval pipelines. Despite remarkable progress in text-only settings, multimodal re…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Robotics

AEROS: A Single-Agent Operating Architecture with Embodied Capability Modules

Robotic systems lack a principled abstraction for organizing intelligence, capabilities, and execution in a unified manner. Existing approa…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

EditCaption: Human-Refined SFT and HAE-DPO for Image Editing Instruction Synthesis

High-quality source-target image pairs with precise editing instructions are essential for instruction-guided image editing, yet constructi…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Temporal Dropout Risk in Learning Analytics: A Harmonized Survival Benchmark Across Dynamic and Early-Window Representations

Student dropout is a persistent concern in Learning Analytics, yet comparative studies frequently evaluate predictive models under heteroge…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

M$^\star$: Every Task Deserves Its Own Memory Harness

Large language model agents rely on specialized memory systems to accumulate and reuse knowledge during extended interactions. Recent archi…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Cooperative Memory Paging with Keyword Bookmarks for Long-Horizon LLM Conversations

When LLM conversations grow beyond the context window, old content must be evicted -- but how does the model recover it when needed? We pro…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Design Conditions for Intra-Group Learning of Sequence-Level Rewards: Token Gradient Cancellation

Reinforcement learning for multi-step reasoning with large language models (LLMs) typically relies on sparse terminal rewards, which create…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Federation over Text: Insight Sharing for Multi-Agent Reasoning

We propose a federated learning-like framework, Federation over Text (FoT), that enables multiple clients solving different tasks to collec…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Rethinking the Comparison Unit in Sequence-Level Reinforcement Learning: An Equal-Length Paired Training Framework from Loss Correction to Sample Construction

This paper investigates the length problem in sequence-level relative reinforcement learning. We observe that, although existing methods pa…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Depth Registers Unlock W4A4 on SwiGLU: A Reader/Generator Decomposition

We study post-training W4A4 quantization in a controlled 300M-parameter SwiGLU decoder-only language model trained on 5B tokens of FineWeb-…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI, Hardware/Semiconductors

Copy-as-Decode: Grammar-Constrained Parallel Prefill for LLM Editing

LLMs edit text and code by autoregressively regenerating the full output, even when most tokens appear verbatim in the input. We study Copy…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs

Hosted-LLM providers have a silent-substitution incentive: advertise a stronger model while serving cheaper replies. Probe-after-return sch…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

MoBayes: A Modular Bayesian Framework for Separating Reasoning from Language in Conversational Clinical Decision Support

Large language models (LLMs) are increasingly used for conversational clinical decision support, yet they conflate next token prediction wi…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations

Full-duplex spoken dialogue systems can model natural conversational behaviours such as interruptions, overlaps, and backchannels, yet such…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Talking Slide Avatars: Open-Source Multimodal Communication Approach for Teaching

Slide-based teaching is widely used in higher education, yet in online, hybrid, and asynchronous contexts, slides often lose instructor pre…

2026-05-26 13:00 JST | arXiv cs.AI | Agents, Research/Papers

ESIA: An Energy-Based Spatiotemporal Interaction-Aware Framework for Pedestrian Intention Prediction

Recent advances in autonomous driving have motivated research on pedestrian intention prediction, which aims to infer future crossing decis…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation

Iterative Retrieval-Augmented Generation (iRAG) has emerged as a powerful paradigm for answering complex multi-hop questions by progressive…

2026-05-26 13:00 JST | arXiv cs.AI | Robotics

VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) polic…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Soft-to-Hard Routing in Sparse Mixture-of-Experts Models

Softmax routing approaches hard top-1 routing as the temperature tends to zero, but the limiting passage is singular at router ties. This p…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Efficient Preference Poisoning Attack on Offline RLHF

Offline Reinforcement Learning from Human Feedback (RLHF) pipelines such as Direct Preference Optimization (DPO) train on a pre-collected p…

2026-05-26 13:00 JST | arXiv cs.AI | Agents

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

Embodied Artificial Intelligence (Embodied AI) integrates perception, cognition, planning, and interaction into agents that operate in open…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

From Muscle Bursts to Motor Intent: Self-Supervised Token Modeling for Heterogeneous EMG

Surface electromyography provides a practical way to infer human movement intention from wearable muscle recordings, but models trained und…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Auditing Stealth Sycophancy in Mental-Health Dialogue: Structured Clinical-State Diagnostics and Clean Matched Benchmarks

Mental-health dialogue models are increasingly evaluated by AI-based evaluators, yet these evaluators often treat surface empathy, supporti…

2026-05-26 13:00 JST | arXiv cs.AI | Image/Video Generation

BFORE: Butterfly-Firefly Optimized Retinex Enhancement for Low-Light Image Quality Improvement

Low-light images suffer from poor visibility, noise, and color distortion. Existing Retinex-based enhancement methods rely on manually tune…

2026-05-26 13:00 JST | arXiv cs.AI | LLM/Generative AI

LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy

LLMs' overconfidence, particularly when hallucinating, poses a significant challenge for the deployment of the models in safety-critical se…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment

TabPFN has recently gained attention as a foundation model for tabular datasets, achieving strong performance by leveraging in-context lear…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Aes3D: Aesthetic Assessment in 3D Gaussian Splatting

As 3D Gaussian Splatting (3DGS) gains attention in immersive media and digital content creation, assessing the aesthetics of 3D scenes beco…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

The central challenge of reinforcement learning for reasoning lies not only in the sparsity of outcome-level supervision, but more fundamen…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

E = T*H/(O+B): A Dimensionless Control Parameter for Mixture-of-Experts Ecology

We introduce E = T*H/(O+B), a dimensionless control parameter that predicts whether Mixture-of-Experts (MoE) models will develop a healthy…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation

Automated short answer scoring (ASAS) is shifting from discriminative, fine-tuned models to large language models (LLMs) used in few-shot s…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Intelligent Truck Matching in Full Truckload Shipments using Ping2Hex approach

Accurate truck-to-shipment matching using GPS data is foundational for full truckload supply chain visibility, enabling real-time tracking…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Flow-OPD: On-Policy Distillation for Flow Matching Models

Existing Flow Matching (FM) text-to-image models suffer from two critical bottlenecks under multi-task alignment: the reward sparsity induc…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

FactoryNet: A Large-Scale Dataset toward Industrial Time-Series Foundation Models

We introduce the first universal pretraining corpus for industrial time-series data: FactoryNet. 51M datapoints across 23k end-to-end task…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Memorize Theorems, Not Instances: Probing SFT Generalization through Mathematical Reasoning

Supervised Fine-Tuning (SFT) is widely used for task-specific adaptation, yet recent work shows it systematically undermines reasoning gene…

2026-05-26 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation

Estimating heterogeneous treatment effects with machine learning has attracted substantial attention in both academic research and industri…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

An Uncertainty-Aware Resilience Micro-Agent for Causal Observability in the Computing Continuum

Grey failures in the computing continuum produce ambiguous overlapping symptoms that existing approaches fail to diagnose reliably, either…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization

Recent studies show that gradient-based universal image jailbreaks on vision-language models (VLMs) exhibit little or no cross-model transf…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

Watermarking for large language models (LLMs) is a promising approach for detecting LLM-generated text and enabling responsible deployment.…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

The training of Binary Neural Networks (BNNs) is fundamentally based on gradient approximation for non-differentiable binarization operatio…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

Critical events in multivariate time series, from turbine failures to cardiac arrhythmias, demand accurate prediction, yet labeled data is…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models

Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, avoiding ex…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Persona-Model Collapse in Emergent Misalignment

Fine-tuning large language models on narrow data with harmful content produces broadly misaligned behavior on unrelated prompts, a phenomen…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning

Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing wo…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization

Zeroth-Order (ZO) optimization is pivotal for scenarios where backpropagation is unavailable, such as memory-constrained on-device learning…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

Model providers increasingly release open weights or allow users to fine-tune foundation models through APIs. Although these models are saf…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

SurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognition

Online surgical phase recognition (SPR) underpins context-aware operating-room systems and requires committing to a prediction at every fra…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Learning Selective Merge Policies for Deadline-Constrained Coded Caching via Deep Reinforcement Learning

In coded caching, the server uses the cached information at the users to serve multiple users in parallel with a single coded multi-cas…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Reducing Credit Assignment Variance via Counterfactual Reasoning Paths

Reinforcement learning for multi-step reasoning with large language models (LLMs) typically relies on sparse terminal rewards, which create…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

How Few-Shot Examples Add Up: A Causal Decomposition of Function Vectors in In-Context Learning

In-context learning (ICL) excels at new tasks from minimal examples, yet we still lack a mechanistic explanation of how few-shot prompts sh…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Position: State-of-the-Art Claims Require State-of-the-Art Evidence

State-of-the-Art (SOTA) claims pervade Artificial Intelligence (AI) and Machine Learning (ML) research. These claims rest on benchmark eval…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

L-Drive: Beyond a Single Mapping-Latent Context Drives Time Series Forecasting

Mainstream methods for multivariate time-series forecasting largely follow the Direct-Mapping paradigm. They learn a unified mapping from h…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

Quantitative backtesting is essential for evaluating trading strategies but remains hampered by high technical barriers and limited scalabi…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

AI agents such as OpenClaw are increasingly deployed in local workflows with access to external tools. This creates indirect prompt-injecti…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

A Simplex Witness Certificate for Constant Collapse in Variational Autoencoders

We study exact constant collapse in variational autoencoders: the deterministic encoder mean becomes independent of the input. The prior re…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture

Time Series Foundation Models (TSFMs) have demonstrated notable success in general-purpose forecasting tasks; however, their adaptation to…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary a…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Simply Stabilizing the Loop via Fully Looped Transformer

Scaling model performance typically requires increasing model size. Looped Transformer offers a compelling alternative by iteratively reusi…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Lying Is Just a Phase: The Hidden Alignment Transition in Language Model Scaling

Scaling laws predict loss from compute but not how capabilities interact. We measure the coupling between reasoning and truthfulness across…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next

Leaderboards rank frontier models on independent axes but do not reveal whether capabilities reinforce or trade off across releases -- and…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation

We investigate Counterfactual Video Foley Generation, which aims to adopt a sound-source identity that contradicts the visual evidence whil…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation

In this work, we propose HypergraphFormer, a novel and efficient approach to floor plan generation based on learning hypergraph representat…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Unlocking the Potential of Continual Model Merging: An ODE Perspective

Continual Model Merging (CMM) enables rapid customization of foundation models by sequentially incorporating task-adapted models without re…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Vision-Language Models (VLMs) have demonstrated remarkable capabilities in general video understanding, yet they often struggle with the fi…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents

Existing benchmarks for LLM coding agents primarily evaluate final outcomes. While useful for measuring overall capability, these metrics p…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

Long-form image captioning exposes a reward granularity problem in RL: captions are judged as whole sequences, while the important errors o…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure?

Gated Linear Units (GLU) and their variants are widely adopted in modern open-source large language model architectures and consistently ou…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Ordering Matters: Rank-Aware Selective Fusion for Blended Emotion Recognition

Blended emotion recognition is challenging because emotions are often expressed as mixtures of subtle and overlapping multimodal cues rathe…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming

Vision-Language Models (VLMs) have significantly advanced medical visual question answering, yet their performance in ultrasound remains su…

2026-05-26 13:00 JSTarXiv cs.AIロボティクス

Action with Visual Primitives

Vision-Language-Action (VLA) models have emerged as a promising paradigm for generalist robotic manipulation. A common design in current ar…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

TimeGuard: Channel-wise Pool Training for Backdoor Defense in Time Series Forecasting

Time Series Forecasting (TSF) is highly vulnerable to backdoor attacks, yet effective defenses remain underexplored due to challenges arisi…

2026-05-26 13:00 JSTarXiv cs.AIエージェント

Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents

Skills have become a practical packaging mechanism for agent instructions, workflows, scripts, and reference materials. In enterprise setti…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

As wearable and mobile devices become increasingly embedded in daily life, they offer a practical way to continuously sense human motion in…

2026-05-26 13:00 JSTarXiv cs.AILLM/生成AI

Understanding Data Temporality Impact on Large Language Models Pre-training

Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose t…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based…

2026-05-26 13:00 JSTarXiv cs.AI研究/論文

The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning

Robustness, domain adaptation, photometric/occlusion invariance, sensor drift, and alignment style are treated as separate literatures with…

2026-05-26 08:00 JSTITmedia AI+その他

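Controlling a PC and even inner speech by thought alone: China accelerates AI brain implant development, with general sales possibly close, Nature reports

According to Nature, Chinese startups are rapidly advancing the development and commercialization of AI-powered brain-computer interfaces (BCIs).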

2026-05-26 08:00 JSTITmedia AI+その他

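From small talk and greetings to snow shoveling: how Credit Saison's "group of 43 busybodies" changed "loneliness" in the AI era

Credit Saison, under its banner of "turning every employee into an AI worker," is pushing operational efficiency built on AI use. Known for its forward-looking image, the company is now putting effort into an activity nicknamed "Osekkai" (「おせっ会」), in which employees helpfully "meddle" to solve one another's small problems.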

2026-05-26 07:00 JSTITmedia AI+LLM/生成AI

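Dual management of the website hits its limit: Aomori Prefectural Government cuts costs by 70% by moving its chatbot to generative AI

The Aomori Prefectural Government has introduced a generative AI chatbot, "AI Digital Staff" (「AIデジタルスタッフ」), on its official website. It reviewed the operational burden that had been a problem with its previous scenario-based chatbot and reportedly cut operating costs by more than 70%.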

2026-05-26 07:00 JSTITmedia AI+その他

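"Where do we even start…" at 62%: the "five big anxieties" holding back AI adoption at small and medium-sized businesses

Leach has published the results of a survey on the state of AI adoption at small and medium-sized businesses. The AI adoption rate was only about 12%, and the biggest challenge cited was "not knowing where to start." Adoption centers on making routine work more efficient, such as document processing and data entry.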

2026-05-26 07:00 JSTITmedia AI+LLM/生成AI

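"When that person takes a day off, work tends to stop": I had AI make "IT terminology karuta"

IT terms are handy, but learning them from scratch is hard work. So might they become a little more approachable if recast in 5-7-5 verse? I tried making "IT terminology karuta" with ChatGPT and NotebookLM.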

2026-05-26 04:38 JSTITmedia AI+LLM/生成AI

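Anthropic's "Mythos Preview" finds more than 10,000 vulnerabilities in one month: initial "Project Glasswing" report

Anthropic has published an initial report on "Project Glasswing," a cybersecurity project that uses its unreleased AI model "Claude Mythos Preview." In cooperation with about 50 partner companies, in the first month after launch, more than 10,000 serious vulnerabilities in the world's critical software…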

2026-05-26 01:00 JSTTechCrunch AIエージェント

What ClickUp’s mass layoff tells us about the future of work

The nine-year-old startup is replacing hundreds of employees with thousands of AI agents.

2026-05-26 00:09 JSTTechCrunch AIその他

The pope’s AI encyclical isn’t really about AI

Pope Leo XIV's first encyclical uses AI as a lens to diagnose older problems: concentrated power, eroding democracy, and a tech elite that…

2026-05-25 (296 items)

2026-05-25 23:30 JSTTechCrunch AIその他

Startup Battlefield 200 applications close in days: Apply before May 27

The deadline to apply or nominate for Startup Battlefield 200 is May 27. This is your shot at VC access, global visibility, TechCrunch cove…

2026-05-25 23:00 JSTTechCrunch AIその他

5 days left: Save up to $410 on TechCrunch Disrupt 2026 passes before prices increase

Early Bird savings for TechCrunch Disrupt 2026 in San Francisco end May 29 at 11:59 p.m. PT. Register now to save up to $410 before prices…

2026-05-25 20:20 JSTITmedia AI+LLM/生成AI

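"Claude Mythos" finds more than 10,000 vulnerabilities, but fixes can't keep up: Anthropic report

On May 22 (local time), US-based Anthropic published an initial report on its security project "Project Glasswing." In addition to about 50 partner companies finding more than 10,000 high- and critical-severity vulnerabilities in one month, the results of the open-source software scans the company has been running on its own…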

2026-05-25 19:38 JSTITmedia AI+その他

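Nihon University: 10,000 faculty and staff to use "Google AI Pro"

Nihon University will adopt "Google AI Pro for Education," Google's AI subscription for educational institutions. It will be made available to 10,000 full-time faculty and staff, with the aim of further streamlining routine work. Google Cloud Japan made the announcement.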

2026-05-25 15:00 JSTITmedia AI+エージェント規制/政策

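ServiceNow announces a set of features to fill the "data gap" for AI agents

Siloed data and underdeveloped governance are the "wall" standing in the way of AI agent adoption. How is ServiceNow trying to solve this problem? This article introduces the new features announced at its annual event.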

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems

Compound AI systems route tasks through hierarchies of specialised components. Attribution is dominated by Shapley-based methods (SHAP), wh…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

NeuroNL2LTL: A Neurosymbolic Framework for Natural Language Translation of Linear Temporal Logic

Effectively translating between natural language (NL) and formal logics like Linear Temporal Logic (LTL) requires expertise that limits for…

2026-05-25 13:00 JSTarXiv cs.AIエージェント研究/論文

RMA: an Agentic System for Research-Level Mathematical Problems

We present Research Math Agents (RMA), an agentic framework for automated reasoning on research-level mathematical problems. Unl…

2026-05-25 13:00 JSTarXiv cs.AIエージェントハードウェア/半導体研究/論文

SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

The exponential growth of global academic output has confronted researchers and AI agents with an unprecedented "information explosion,"…

2026-05-25 13:00 JSTarXiv cs.AIエージェント研究/論文

Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems

Current AI energy benchmarks measure consumption at the granularity of a single model invocation or training run. For classical single-turn…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization

Formal mathematics libraries are rapidly expanding, creating a growing need to refactor verified proofs for maintainability and to improve…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Mediative Fuzzy Logic: From Type-1 Foundations to Type-2, Type-3 and Quantum Extensions

Mediative Fuzzy Logic was conceived as a practical scheme for reconciling hesitant or conflicting assessments in fuzzy control and decision…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

EVE-Agent: Evidence-Verifiable Self-Evolving Agents

Self-evolving agents should not train on examples they cannot justify. Data-free self-evolving search agents offer a scalable route to syst…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

Large language models now write software, draft legal documents, and produce clinical notes, yet fundamental limits, from Turing and Arrow…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning

The emergence of Large Reasoning Language Models (LRMs) has paved the way for tackling complex reasoning tasks through test-time scaling by…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

AI agents increasingly excel at generating, testing, and refining code. However, they fall short on tasks requiring formal guarantees of fu…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Redrawing the AI Map: A Theory of Accountability Boundaries in Agentic Ecosystems

Agentic AI orchestrators reduce the interface and assembly costs of composing information systems capabilities across organizational bounda…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery

Scientific research is being reshaped by AI systems that move beyond isolated assistance toward longer-horizon workflows spanning literatur…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Foundation Protocol: A Coordination Layer for Agentic Society

Autonomous agents are moving from tools into a layer of social infrastructure: they browse, purchase, deploy software, manage systems, and…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

GENSTRAT: Toward a Science of Strategic Reasoning in Large Language Models

Large language models (LLMs) are increasingly deployed as economic agents in marketplaces, auctions, and bidding settings. Anticipating the…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

Design and Report Benchmarks for Knowledge Work

The development of LLM agents has led to a growing body of work on knowledge-work AI, including coding, research, and healthcare. However,…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Parallel Context Compaction for Long-Horizon LLM Agent Serving

Long-horizon LLM agents accumulate growing conversation histories that eventually exceed the model's context window. Context compaction via…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Ontological Knowledge Blocks: Executable Compliance and Profile-Based Validation for Trustworthy AI Systems

AI-enabled services deployed in critical digital infrastructure are subject to governance obligations spanning transparency, accountability…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

DART: Semantic Recoverability for Structured Tool Agents

When a structured tool agent fails mid-execution, the runtime faces a dilemma: replaying the entire task is safe but wasteful, while restor…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Human-in-the-Loop Multi-Agent Ventilator Decision Support with Contextual Bandit Preference Learning

Ventilator decision support requires sequential decisions that track evolving physiology and disease trajectories while respecting safety b…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems

LLM-based multi-agent systems can fail even when planned actions are executed correctly because agents may misjudge their knowledge when ev…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

EDGE-OPD: Internalizing Privileged Context with Evidence Guided On-Policy Distillation

On-Policy Distillation (OPD) has gained wide traction as an LLM post-training paradigm due to its effectiveness in improving capabilities…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

CP or DP? Why Not Both: A Case Study in the Partial Shop Scheduling Problem

Dynamic Programming (DP) and Constraint Programming (CP) are well-established paradigms for solving combinatorial optimization problems. Us…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents

ReAct-style agents for search-intensive, multi-step reasoning tasks rely largely on their own internal judgment to decide what evidence to…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Solving the Aircraft Disassembly Scheduling Problem

Dismantling aircraft reaching their end of life is a complex endeavour that is necessary in terms of sustainability but yields small incom…

2026-05-25 13:00 JSTarXiv cs.AIエージェント研究/論文

One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

On a 300-persona life-simulation benchmark, pcsp achieves compositional zero-shot persona identification up to 17x above chance, Spearman r…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection

Large language model agents increasingly rely on persistent memory to store past interactions, retrieve relevant demonstrations, and improv…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Agentic Proving for Program Verification

Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Beyond Binary Edits Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment

Multimodal large language models (MLLMs) need efficient mechanisms to update knowledge without degrading existing capabilities. While intri…

2026-05-25 13:00 JSTarXiv cs.AIハードウェア/半導体

SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need to produce numerical outputs such as action…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

Language agents increasingly improve by reusing skills -- structured procedural artifacts distilled from past experience. In particu…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

An AI-Driven Framework for Energy-Efficient Environmental Monitoring in Smart Cities Using Edge Intelligence

Environmental monitoring is a crucial component of the smart city infrastructure. It enables informed decision making which enhances sustai…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

KPI2KVI: A Multi Agent Workflow for Calculating Key Value Indicators from Service Descriptions

Key Value Indicators (KVIs) provide a decision oriented view of a service by summarizing how operational performance translates into stakeh…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Evaluating Large Language Models in a Complex Hidden Role Game

Quantifying the deceptive potential of Large Language Models (LLMs) is critical for AI safety, yet difficult to achieve in uncontrolled env…

2026-05-25 13:00 JSTarXiv cs.AIエージェントハードウェア/半導体

Computable Fairness: Boltzmann-Softmax Control for AI Resource Allocation

In large-scale AI systems, allocating scarce resources such as GPU compute time and bandwidth among multiple agents is a critical challenge…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

LFRAG: Layout-oriented Fine-grained Retrieval-Augmented Generation on Multimodal Document Understanding

Multimodal Retrieval-Augmented Generation (RAG) has emerged as an effective paradigm for enhancing Large Language Models (LLMs) with extern…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

RAG4Outcome: A Retrieval-Augmented Multimodal Framework for Prognostic Prediction in Chronic Osteomyelitis

Chronic osteomyelitis presents substantial prognostic challenges due to its high recurrence risk and complex postoperative recovery traject…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

The Cognitive Kardashev Scale: Quantifying the Material Envelope of Civilisational Computation

How much thinking can a civilisation do? Kardashev's (1964) typology ranks civilisations by total power: planetary (Type I, ~10^16 W), stel…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Strategic Coercion Within Alliances: The Greenland Sovereignty Game as an AI Stress Test

What happens when the strongest alliance member pressures a weaker member over territory and strategic control? We examine the Greenland so…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems

Multi-agent AI pipelines typically assume that agent misconduct originates from model misalignment. We identify a structural failure in thi…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

ObjectCache: Layerwise Object-Storage Retrieval for KV Cache Reuse

Prefix KV caching has become a key mechanism in LLM serving: it reduces time to first token (TTFT) by avoiding redundant computation across…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Expressive Power of Deep Homomorphism Networks over Relational Databases

The expressive limitations of message-passing Graph Neural Networks (GNNs) have motivated a wide range of more powerful graph learning arch…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations

Personalized pricing negotiations are a challenging testbed for LLM agents because successful interaction does not guarantee profitable dec…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

PilotWiMAE: Pilot-Native Representation Learning for Wireless Channels

Channel foundation models assume access to fully observed channels, an assumption that fails in deployment. We introduce PilotWiMAE, a self…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules

Automated sleep staging is commonly approached as a supervised machine learning problem, with deep learning methods dominating recent resea…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

Chain-of-thought (CoT) prompting is necessary for arithmetic in small language models, yet shuffling its steps preserves most performance.…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Approximate Machine Unlearning through Manifold Representation Forgetting Guided by Self Mode Connectivity

Machine unlearning is a fundamental mechanism that enforces the right to be forgotten. Existing unlearning studies that rely on label manip…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

MedExpMem: Adapting Experience Memory for Differential Diagnosis

Experienced physicians develop diagnostic expertise through clinical practice, acquiring not only disease knowledge but also the ability to…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

Chain-of-thought (CoT) reasoning has become the default strategy for enhancing LLM capabilities, yet its application raises a fundamental q…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

As large language model (LLM)-based agents increasingly participate in online discourse, red-teaming their capacity to support political in…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Tensor Cache: Eviction-conditioned Associative Memory for Transformers

Autoregressive Transformer KV caches grow linearly with context length; sliding-window caching bounds memory but discards evicted tokens en…

2026-05-25 13:00 JSTarXiv cs.AIエージェントロボティクス

Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models

Vision-Language-Action (VLA) models have emerged as a promising paradigm for robotic manipulation by leveraging pre-trained vision-language…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models

Generative Vision-Language Models (VLMs) perform well on multimodal reasoning, but how visual inputs are transformed to text remains poorly…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision?

Benchmark accuracy is often implicitly assumed to reflect grounded visual understanding in vision-language models (VLMs), yet it remains un…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations

Understanding and monitoring human behavior in metro stations play an important role in supporting suicide prevention efforts, where early…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Human-Centered Learning Mechanics: A Dynamical Framework for Entropy-Regulated Representation Learning

Deep learning is increasingly viewed as a dynamical process in parameter space, yet many existing theories still treat training as a closed…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Graph Alignment Topology as an Inductive Bias for Grounding Detection

Large Language Models (LLMs) are optimized to produce distributionally plausible continuations rather than to explicitly verify whether gen…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

A mathematical theory of balancing relational generalization and memorization

Humans, animals, and modern machine learning models exhibit impressive abilities to learn complex behaviors and generalize these behaviors…

2026-05-25 13:00 JSTarXiv cs.AIビジネス/資金調達

Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection

Many novel unsupervised feature selection methods are proposed each year, yet their empirical evaluation is limited to supervised and unsup…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

LLM Code Smells: A Taxonomy and Detection Approach

Large Language Models (LLMs) are increasingly integrated into software systems for diverse purposes, due to their versatility, flexibility,…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Memorization Dynamics of Fill-in-the-Middle Pretraining

Fill-in-the-middle (FIM) is a pretraining objective widely used to equip causal language models with infilling ability, yet its effect on v…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Test-Time Training Undermines Safety Guardrails

Test-Time Training (TTT) is an emerging paradigm that enables models to adapt their parameters during inference, improving performance on t…

2026-05-25 13:00 JSTarXiv cs.AIロボティクス

Robots That Know What to Ask: Recovering Misaligned Rewards through Targeted Explanations

Learning reward functions from demonstrations assumes that demonstrations provide adequate supervision over all features -- or task-relevan…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism

Characteristic linguistic behaviors associated with Social Language Disorder (SLD) in autism spectrum disorder, including echoic repetition…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Whose Good, Whose Place? The Moral Geography of Agentic AI for Social Good

Agentic AI systems are increasingly proposed for social-good domains, often invoking the United Nations Sustainable Development Goals (SDGs…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

MadEvolve: Evolutionary Optimization of Trading Systems with Large Language Models

We explore the application of LLM-driven algorithm optimization to several common tasks in quantitative finance. MadEvolve, a general-purpo…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Brain-LLM Alignment Tracks Training Data, Not Typology

Brain-LLM alignment is well established in English, yet the brain's language network is neuroanatomically universal across languages. Does…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Uncovering the Latent Potential of Deep Intermediate Representations

Foundational Models pretrained on a huge amount of data learn representations that evolve across depth, forming a hierarchy of embeddings wit…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography

Intermediate layers of large language models (LLMs) best predict human brain responses to language, one of the most robust findings in comp…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Do Language Models Know What Not to Say? Causal Evidence for Statistical Preemption in LLMs

How do learners acquire knowledge of what is unacceptable without negative evidence? Construction Grammar proposes statistical preemption:…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

The TIME Machine: On The Power of Motion for Efficient Perception

Video representation learning has seen tremendous progress in recent years. This has been driven by many factors, including the scale of tr…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

DreamerNLplus: Interpretable Modeling of Mental Health Dynamics from Social Media Timelines using Hybrid Rule-Based and RAG Methods

We present DreamerNLplus, a hybrid framework for modeling mental health dynamics from social media timelines in the CLPsych 2026 shared tas…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIハードウェア/半導体

Model Collapse as Cultural Evolution

Model collapse, the progressive degradation of LLMs trained on their own outputs, has been characterized statistically but lacks a linguist…

2026-05-25 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

Decomposing and Measuring Evaluation Awareness

Frontier language models sometimes recognize that they are being evaluated and adjust their behavior, undermining validity of benchmark res…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

DRL-Driven Edge-Aware Utility Optimization for Multi-Slice 6G Networks

Virtual Reality (VR) services delivered over 6G networks demand ultra-low latency and high bandwidth to ensure seamless user experiences. T…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

A measurement substrate for agentic Kubernetes operations: Methodology and a case study in retrieval-compounding falsification

Empirical claims about autonomous Kubernetes operations agents are largely unfalsifiable. Published work reports observational results with…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Anytime Training with Schedule-Free Spectral Optimization

Standard neural network training relies on learning-rate schedules tied to a fixed horizon, leading to strong path dependence and costly re…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering

Vision foundation models are widely used as frozen backbones across many downstream tasks, making them a single point of failure under adve…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

KAPLAN: Kolmogorov-Arnold Prognostic Learnable Activation Networks for Survival Analysis

Survival analysis aims to model how covariates and time jointly shape the time-to-event distribution under right censoring. Classical metho…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Dreaming Smoothly and Sample Efficiently with Gradient Penalized Latent Dynamics

Model-based reinforcement learning improves sample efficiency by learning a world model. However, existing latent world models such as Drea…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Security of LLM-generated Code: A Comparative Analysis

The majority of software developers use or are planning to use Artificial Intelligence (AI) tools in their development processes. Their top…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Do Synthetic Brain MRIs Reliably Improve Tumour Classification? A StyleGAN2-ADA Class-Plane Augmentation Study on BRISC 2025

Generative augmentation is often proposed as a remedy for small medical-image datasets, but synthetic images are only useful when they impr…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

A Fine-Tuned BERT Classifier for Personal-Letter Titles in Late-Ming and Early-Qing Collected Works

I present Lepton (Letter Prediction), a fine-tuned BERT classifier that predicts whether a title in a Classical Chinese wenji table of cont…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Philosophical Dispositions as Behavioral Constraints for AI-Assisted Code Review: An Empirical Study

AI-assisted code review tools typically operate as generic "expert reviewer" agents, producing homogeneous findings regardless of the analy…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection

Existing Video Anomaly Detection (VAD) methods typically rely on task-specific training, leading to strong domain dependency and high train…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Exploiting Longitudinal Context in Clinician-Verified Interactive Lesion Tracking

Tracking tumor lesions across serial CT scans is essential for oncological response assessment. Existing automated methods face a fundament…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Defining AI Fatigue in Academic Contexts: Dimensions, Indicators, and a Stage-Based Model Using Grounded Theory

The integration of AI tools in academic settings has introduced a distinct form of strain that existing frameworks like technostress and di…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning

Variational Quantum Algorithms (VQAs) potentially offer a pathway to practical quantum advantage, but their optimization is heavily hindere…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

CALAD: Channel-Aware contrastive Learning for multivariate time series Anomaly Detection

Multivariate time series anomaly detection has become increasingly important in real-world applications, where labeled data are often scarc…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Infra-Bayesian Reinforcement Learning Agents Outperform Classical RL For Worst-Case Robustness

Classical reinforcement learning assumes the agent interacts with a fixed environment whose behavior does not depend on the agent's policy.…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

As X, Do Y: How Persona and Task Combine in Instruction-Tuned LLMs

Role prompts of the form "As X, do Y" admit a clean linear decomposition at one specific site in the residual stream: the prompt-to-answer tr…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Generative AI and the Reorganization of Labor Demand

Generative artificial intelligence (AI) is expected to transform work, but less is known about how firms reorganize labor demand as the tec…

2026-05-25 13:00 JSTarXiv cs.AIエージェントロボティクス

Autonomous Frontier-Based Exploration with VLM Guidance

Autonomous robotic exploration of unknown and hazardous environments, a long-standing challenge, can be significantly improved by leveragin…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

When practitioners fine-tune LLMs on unvetted datasets, an adversary can exploit the data supply chain through task-level poisoning: insert…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達研究/論文

Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

Position-controlled evaluation is standard for retrieval tasks such as Needle-in-a-Haystack and RULER, but mainstream reasoning benchmarks…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning

Recent advancements in instructional fine-tuning have injected noise into embeddings, with NEFTune (Jain et al., 2024) setting benchmarks u…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Scalable Heterogeneous Graph Foundation Models for Data-Driven Optimal Power Flow in Smart Grids

Fast and reliable optimal power flow (OPF) approximation is essential for reliable smart-grid operation, yet many learning-based surrogates…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Adaptive Mass-Segmented KV Compression for Long-Context Reasoning

The linear growth of the Key-Value (KV) cache is a critical bottleneck in long-form LLM inference. Existing KV compression methods mitigate…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Lipschitz Optimization for Formal Verification of Homographies

The adoption of vision neural networks in regulated industries requires formal robustness guarantees, especially in safety-critical domains…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェントハードウェア/半導体研究/論文

FastKernels: Benchmarking GPU Kernel Generation in Production

LLM-based agents for GPU kernel generation are advancing rapidly, yet their progress is fundamentally constrained by the benchmarks they op…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

PaP-NF: Probabilistic Long-Term Time Series Forecasting via Prefix-as-Prompt Reprogramming and Normalizing Flows

Time series forecasting plays a central role in many real-world applications and has been extensively studied. Most existing approaches rel…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

Are Frontier LLMs Ready for Cybersecurity? Evidence for Vertical Foundation Models from Dual-Mode Vulnerability Benchmarks

We evaluate whether frontier LLMs are ready for cybersecurity through a dual-mode benchmark: white-box function-level vulnerability detecti…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion

Video object insertion requires ensuring spatio-temporal coherence and interactive realism, extending far beyond simple content placement.…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Enhancing Deep Neural Network Reliability with Refinement and Calibration

Although deep neural networks (DNNs) achieve high predictive accuracy, their confidence estimates are often unreliable, potentially comprom…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Multi-Gate Residuals

While Attention Residuals has shown some effectiveness in addressing the widespread issue of unbounded activation growth across deep residu…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

6G Communication Networks Enabling Embodied Agents: Architecture and Prototype

Embodied agents, which couple intelligent decision-making with physical actuation in the real world, impose far more stringent and heteroge…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution

Generative priors in Image Super-Resolution (SR) often compromise faithful restoration; we attribute this limitation to a fundamental spect…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

ChainFlow-VLA: Causal Flow Planning with Vision-Language Models

Current end-to-end autonomous driving systems are fundamentally limited by a mismatch between temporal causal reasoning and global trajecto…

2026-05-25 13:00 JSTarXiv cs.AI画像/動画生成研究/論文

EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation

The rapid evolution of generative video foundation models has propelled the field toward professional-grade cinematic synthesis. To achieve…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

When Good Equations Get Bad Scores: Improving Symbolic Regression Through Better Parameter Optimization

Symbolic Regression (SR) plays a central role in scientific knowledge discovery by distilling mathematical equations from observational dat…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Reinforcement Learning for Microcanonical Graph Ensemble with Assortativity Constraints

How network structure determines function is a fundamental question, and it can be investigated by graph ensembles with precisely controlle…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning

Large language models trained under diverse objectives and architectures have been shown to develop increasingly similar internal represent…

2026-05-25 13:00 JSTarXiv cs.AIロボティクス

Sparse Compositional Flow Matching by geometric assembly from motion primitives

Embodied trajectories, such as the executable motion sequences of robotic manipulators, underwater vehicles, and mobile robots, are a funda…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

CHASD: Language Increment-Calibrated Contrastive Decoding against Hallucination in LVLMs

Large Vision-Language Models have shown strong multimodal reasoning capabilities, yet they remain susceptible to object hallucinations when…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

XWind: A Cross-site Router for Large Language Model Inference Serving at Renewable Energy Farms

AI power demand is growing at an unprecedented rate while power grids are often ailing and struggle to keep up. Grid expansion comes with h…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Score-Based One-step MeanFlow Policy Optimization

Diffusion and flow matching have emerged as expressive policy classes in reinforcement learning, but their reliance on multi-step denoising…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Curriculum reinforcement learning with measurable task representation learning

In curriculum reinforcement learning (CRL), an agent incrementally accumulates knowledge over a sequence of tasks (i.e., a curriculum), and…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals

Recent RL methods have substantially improved the reasoning abilities of LLMs. Existing reward designs mainly follow two paradigms: (1) Rei…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition

Mechanistic interpretability of transformers requires identifying not just which components matter but how they compose into the computatio…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Parametric Prior Mapping Framework for Non-stationary Probabilistic Time Series Forecasting

Effectively modeling non-stationary dynamics in probabilistic multivariate time series (MTS) forecasting requires balancing expressiveness w…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Online Hand Gesture Recognition Using 3D Convolutional Neural Networks

In human computer interaction, real-time detection and classification of dynamic hand gestures is challenging as: 1) the system must run in…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Reflex: Reinforcement Learning with Reflection Symmetry Exploitation in State-Based Continuous Control

Reinforcement learning has long struggled with poor sample efficiency. One promising approach to mitigate this problem is leveraging group-…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Socially fluent AI decouples conversational signals from source identity in online interaction

Socially fluent agentic AI can now participate in online interaction in ways that resemble ordinary human conversation, potentially weakeni…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

SSDAU: Structured Semantic Data Augmentation for Joint Entity and Relation Extraction

Joint Entity and Relation Extraction (JERE) is highly susceptible to weak generalization due to low-quality training data. Data augmentatio…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

AI Security Research Should Better Incentivize Defense Research

This work examines an imbalance in artificial intelligence (AI) security research: the field tends to produce more work on attacking AI sys…

2026-05-25 13:00 JSTarXiv cs.AI画像/動画生成

One-Forcing: Towards Stable One-Step Autoregressive Video Generation

Recent advances have substantially improved real-time interactive video generation in the autoregressive regime. However, most existing few…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

AI Assurance: A Comprehensive Testing Strategy for Enterprise AI Systems

Enterprise AI systems, built on large language models, retrieval pipelines and autonomous agents, introduce a class of risks that tradition…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Learning Individual Dynamics from Sparse Cross-Sectional Snapshots

Predicting how a dynamical unit evolves over time - how an individual ages, an epidemic spreads, or a physical system degrades - typically…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

CBANet: A Compact Attention-Based CNN-BiLSTM Network for Aggressive Driving Event Detection

Aggressive driving is a major cause of traffic accidents and poses a serious threat to road safety. Although deep learning methods have sho…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Automated Random Embedding for Practical Bayesian Optimization with Unknown Effective Dimension

Bayesian optimization is widely employed for optimizing complex black-box functions but struggles with the curse of dimensionality. Random…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

PhenoYieldNet: Learning Crop-Aware Phenological Responses for Multi-Crop Yield Prediction

Accurate crop yield prediction is crucial for sustainable agriculture and global food security. While existing methods are predominantly de…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Multimodal Distribution Matching for Vision-Language Dataset Distillation

Dataset distillation compresses large training sets into compact synthetic datasets while preserving downstream performance. As modern syst…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test

Recently, Reinforcement Learning with Verifiable Rewards (RLVR) and Test-Time Scaling (TTS) have advanced LLM code generation through execu…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

VACE: Learning Geometrically Structured Representations for Time Series Anomaly Detection

Anomaly detection in multivariate time series is a critical task across a wide range of real-world applications, where abnormal behaviour i…

2026-05-25 13:00 JSTarXiv cs.AI画像/動画生成

DrawVideo: Generating Long Video from Storyboard Keyframe Sketches

Long video generation requires high-fidelity synthesis, coherent narrative structure, and user control over extended time spans. Existing t…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models

Reinforcement learning (RL) has become an effective way to improve prompt alignment and perceptual quality in diffusion and flow-matching g…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

RA-DCA: A Randomized Active-Set DCA for Directional Stationarity in Max-Structured DC Programs

We study nonsmooth difference-of-convex programs whose subtracted convex term is a finite maximum of smooth convex functions. In this setti…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Goal-Conditioned Agents that Learn Everything All at Once

A goal-conditioned reinforcement learning agent exploring an environment will see a wealth of information throughout a trajectory, most of…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA

Whole-slide image visual question answering (WSI-VQA) frames pathology as an extreme-context search problem: to answer a free-form clinical…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

ARMS: Automatic Reward Shaping for Sparse-Reward Multi-Agent Reinforcement Learning

Sparse rewards are a major bottleneck in multi-agent reinforcement learning (MARL), where simultaneous learning induces non-stationarity an…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Understanding Goal Generalisation in Sequential Reinforcement Learning

Reinforcement learning agents often exhibit unintended goal-directed behaviour outside their training distribution, but we currently lack a…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval

In the competitive landscape of sponsored search, balancing retrieval quality with production latency is a critical challenge. While large…

2026-05-25 13:00 JSTarXiv cs.AIビジネス/資金調達

Cost-Effective Model Evaluation with Meta-Learning

The rapid growth of machine learning has produced an ever-expanding ecosystem of models, making it increasingly challenging to verify the r…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Preisach Attention: A Hysteretic Model of Sequential Memory

We introduce the Preisach Attention Layer (PAL), a novel sequence modelling architecture grounded in the classical Preisach hysteresis oper…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling

Diffusion language models intrinsically fail to capture correlations between decoded tokens, which leads to a harsh trade-off between sampl…

2026-05-25 13:00 JSTarXiv cs.AI画像/動画生成

EM-Vid: Training-Free Entity-Centric Memory for Efficient and Consistent Multi-Shot Video Generation

Multi-shot video generation requires maintaining a consistent appearance of recurring entities across shots while remaining faithful to sho…

2026-05-25 13:00 JSTarXiv cs.AIビジネス/資金調達

Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

We present a longitudinal, drift-aware evaluation of adversarial robustness across more than a decade of Android applications using static…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

DualMem: Bypassing the Objectness Bottleneck for Calibrated Unknown-Stream Filtering in Open-World Object Detection

Open-world object detection (OWOD) requires detectors to localize known classes while identifying unknown objects for future incremental le…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Learning Through Noise: Why Subliminal Learning Works and When It Fails

In the context of artificial neural networks, subliminal learning refers to the transfer of task-relevant knowledge or unintended biases fr…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

High-resolution (HR) image perception presents a key bottleneck for multimodal large language models (MLLMs). While visual search offers a…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations

Although large language model (LLM) conversational systems process millions of multi-turn dialogues daily, they remain fundamentally reacti…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Weierstrass Positional Encoding for Vision Transformers

Vision Transformers have achieved remarkable success in computer vision, but their common use of learnable one-dimensional positional encod…

2026-05-25 13:00 JSTarXiv cs.AIロボティクス

Any2Any: Efficient Cross-Embodiment Transfer for Humanoid Whole-Body Tracking

Whole-body tracking (WBT) models have become a key foundation for humanoid robots, enabling them to imitate diverse motions with high fidel…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

PhotoFlow: Agentic 3D Virtual Photography Missions

Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot fr…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Not Too Generative, Not Too Discriminative: The Human Alignment Sweet Spot

A central question in computational vision is whether human-like visual representations are better explained by discriminative or generativ…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

It has generally been assumed that geopolitical bias in language models originates from the training data used during the pre-training phas…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Leveraging Foundation Models for Causal Generative Modeling

Causal generative modeling is essential for developing reliable and transparent AI systems capable of counterfactual reasoning. While exist…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Human Decision-Making with Persuasive and Narrative LLM Explanations

Large language models (LLMs) have the potential to aid and improve human decision-making in classification tasks, not only by providing fai…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle with fine-grained understanding tasks.…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edg…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

Visual geometry transformers have become powerful architectures for multi-view 3D reconstruction, enabling joint prediction of multiple 3D…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

ETCHR: Editing To Clarify and Harness Reasoning

Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions t…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomen…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Robust Counterfactual Inference in Markov Decision Processes

This paper addresses a key limitation in existing counterfactual inference methods for Markov Decision Processes (MDPs). Current approaches…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Epistemic Skills: Reasoning about Knowledge and Oblivion

This paper presents a class of epistemic logics that captures the dynamics of acquiring knowledge and descending into oblivion, while incor…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Interactive Query Answering on Knowledge Graphs with Soft Entity Constraints

Methods for query answering over incomplete knowledge graphs retrieve entities that are likely to be answers, which is particularly useful…

2026-05-25 13:00 JSTarXiv cs.AIビジネス/資金調達

Forget What's Sensitive, Remember What Matters: Token-Level Differential Privacy in Memory Sculpting for Continual Learning

Continual Learning (CL) models, while adept at sequential knowledge acquisition, face significant and often overlooked privacy challenges d…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

FATHOMS-RAG: A Framework for the Assessment of Thinking and Observation in Multimodal Systems that use Retrieval Augmented Generation

Retrieval-augmented generation (RAG) has emerged as a promising paradigm for improving factual accuracy in large language models (LLMs). We…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

MedSAE: Dissecting MedCLIP Representations with Sparse Autoencoders

Artificial intelligence in healthcare requires models that are accurate and interpretable. We advance mechanistic interpretability in medic…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

MUSEKG: A Knowledge Graph Over Museum Collections

Digitisation in the cultural heritage sector has produced large but fragmented repositories of museum collection data, spanning structured…

2026-05-25 13:00 JSTarXiv cs.AIエージェント研究/論文

MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks

While multi-agent systems (MAS) promise elevated intelligence through coordination of agents, current approaches to automatic MAS design un…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Deja Vu in Plots: Leveraging Cross-Session Evidence with Retrieval-Augmented LLMs for Live Streaming Risk Assessment

The rise of live streaming has transformed online interaction, enabling massive real-time engagement but also exposing platforms to complex…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning

Fine-Tuning-as-a-Service (FTaaS) facilitates the customization of Multimodal Large Language Models (MLLMs) but introduces critical backdoor…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Scaling-Aware Adapter for Structure-Grounded LLM Reasoning

Large language models (LLMs) are enabling reasoning over 2D and 3D structures, yet existing methods remain modality-specific and typically…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

ALIVE: Awakening LLM Reasoning via Adversarial Learning and Instructive Verbal Evaluation

The quest for expert-level reasoning in Large Language Models (LLMs) has been hampered by a persistent reward bottleneck: traditio…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

VGAS: Value-Guided Action-Chunk Selection for Few-Shot Vision-Language-Action Adaptation

Vision-Language-Action (VLA) models bridge multimodal reasoning with physical control, but adapting them to new tasks with scarce demonst…

2026-05-25 13:00 JSTarXiv cs.AIエージェント研究/論文

GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory

Frontier AI systems are increasingly capable and deployed in high-stakes multi-agent environments. However, existing AI safety benchmarks l…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

NeuroWeaver: An Autonomous Evolutionary Agent for Exploring the Programmatic Space of EEG Analysis Pipelines

Although foundation models have demonstrated remarkable success in general domains, the application of these models to electroencephalograp…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Bridging AI and Clinical Reasoning: Abductive Explanations for Alignment on Critical Symptoms

Artificial intelligence (AI) has demonstrated strong potential in clinical diagnostics, often achieving accuracy comparable to or exceeding…

2026-05-25 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

AI Evaluation Should Require Standardized Item-Level Data Releases

This position paper argues that standardized item-level benchmark data should become the default infrastructure for AI evaluation. Current…

2026-05-25 13:00 JSTarXiv cs.AIエージェントビジネス/資金調達

IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents

Computer-Use Agents (CUAs) leverage large language models to execute GUI operations on desktop environments, yet they generate actions with…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Agentivism: a learning theory for the age of artificial intelligence

Learning theories have historically changed when the conditions of learning evolved. Generative and agentic AI create a new condition by al…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure

Organizational knowledge used by AI agents typically lacks epistemic structure: retrieval systems surface semantically relevant content wit…

2026-05-25 13:00 JSTarXiv cs.AIエージェント研究/論文

QED: An Open-Source Multi-Agent System for Generating Mathematical Proofs on Open Problems

We present QED, an open-source multi-agent system that turns human-provided research questions into complete mathematical proofs w…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Model Spec Midtraining: Improving How Alignment Training Generalizes

Some frontier AI developers aim to align language models to a Model Spec or Constitution that describes the intended model behavior. Howeve…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning

Large language models (LLMs), especially reasoning models, generate extended chain-of-thought (CoT) reasoning that often contains explicit…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

How Mobile World Model Guides GUI Agents?

Recent advances in vision-language models have enabled mobile GUI agents to perceive visual interfaces and execute user instructions, but r…

2026-05-25 13:00 JSTarXiv cs.AIエージェントビジネス/資金調達

WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games

Coding agents are increasingly used as application builders, yet many evaluations still focus on source code, repository-level tests, or in…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

The advancement of Large Reasoning Models (LRMs) has catalyzed a paradigm shift from reactive "fast thinking" text generation to systemat…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

We document inverse scaling in LLMs on forecasting problems whose underlying time series exhibit superlinear growth and tail risk of regime…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Moonwalk: Inverse-Forward Differentiation

Backpropagation's main limitation is its need to store intermediate activations (residuals) during the forward pass, which restricts the de…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection

Post-hoc out-of-distribution (OOD) detection has garnered intensive attention in reliable machine learning. Many efforts have been dedicate…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

Vision-Language Models (VLMs) are increasingly susceptible to sophisticated adversarial attacks, including adaptive strategies specifically…

2026-05-25 13:00 JSTarXiv cs.AIビジネス/資金調達規制/政策

XAttnMark: Learning Robust Audio Watermarking with Cross-Attention

The rapid proliferation of generative audio synthesis and editing technologies has raised serious concerns about copyright infringement, da…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Representational Alignment with Chemical Induced Fit for Molecular Relational Learning

Molecular Relational Learning (MRL) is widely applied in natural sciences to predict relationships between molecular pairs by extracting st…

2026-05-25 13:00 JSTarXiv cs.AI画像/動画生成

Diffusion and Flow Matching Models for Tabular Data: A Survey

Deep generative models have made rapid progress in image, text, audio, and video generation, and are increasingly being applied to structur…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior

How can cognitive science build generalizable theories that span the full scope of natural situations and behaviors? We argue that progress…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

GlyTwin: Digital Twin for Glucose Control in Type 1 Diabetes Through Optimal Behavioral Modifications Using Patient-Centric Counterfactuals

Frequent and long-term exposure to hyperglycemia increases the risk of chronic complications, including neuropathy, nephropathy, and cardio…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Spectral-inspired Operator Learning with Limited Data and Unknown Physics

Learning PDE dynamics from limited data with unknown physics is challenging. Existing neural PDE solvers either require large datasets or r…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Through the Stealth Lens: Attention-Aware Defenses Against Poisoning in RAG

Retrieval-augmented generation (RAG) systems are vulnerable to attacks that inject poisoned passages into the retrieved context, even at lo…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

PLACE: Prompt Learning for Attributed Community Search in Large Graphs

In this paper, we propose PLACE (Prompt Learning for Attributed Community Search), an innovative graph prompt learning framework for ACS. E…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction

Recently, spatio-temporal time-series prediction has developed rapidly, yet existing deep learning methods struggle with learning complex l…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

GeoMAE: Masking Representation Learning for Spatio-Temporal Graph Forecasting with Missing Values

The ubiquity of missing data in urban intelligence systems, attributable to adverse environmental conditions and equipment failures, poses…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Online Learning with Multiple Fairness Regularizers via Graph-Structured Feedback

There is an increasing need to enforce multiple, often competing, measures of fairness within automated decision systems. The appropriate w…

2026-05-25 13:00 JSTarXiv cs.AIロボティクス

A drone-based framework for coral habitat mapping via weakly supervised segmentation

Obtaining pixel-level annotations over large spatial extents remains a major bottleneck for deploying machine learning in ecological applic…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Disentangling Interaction and Bias Effects in Opinion Dynamics of Large Language Models

Large Language Models are increasingly used to simulate human opinion dynamics, yet the effect of genuine interaction is often obscured by…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning

Knowledge-graph retrieval-augmented generation (KG-RAG) couples large language models (LLMs) with structured, verifiable knowledge graphs (…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers

Reinforcement Learning with Verifiable Rewards (RLVR) replaces costly human labeling with automated verifiers. To reduce verifier hacking,…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning

Graph Neural Networks (GNNs) are powerful tools for processing relational data but often struggle to generalize to unseen graphs, giving ri…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Controlled Personalization in Legacy Media Online Services: A Case Study in News Recommendation

Personalized news recommendations have become a standard feature of large news aggregation services, optimizing user engagement through aut…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

RAG-Pull: Turning Retrieval into a Code-Injection Channel via Invisible Unicode Perturbations

Retrieval-Augmented Generation (RAG) increases the reliability and trustworthiness of the LLM response and reduces hallucination by elimina…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Sparser Block-Sparse Attention via Token Permutation

Scaling the context length of large language models (LLMs) offers significant benefits but is computationally expensive. This expense stems…

2026-05-25 13:00 JSTarXiv cs.AIロボティクス

LACY: A Vision-Language Model-based Language-Action Cycle for Self-Improving Robotic Manipulation

Learning generalizable policies for robotic manipulation increasingly relies on large-scale models that map language instructions to action…

2026-05-25 13:00 JSTarXiv cs.AIエージェントロボティクス

Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures

Imitation learning-based robot control policies are enjoying renewed interest in video-based robotics. However, it remains unclear whether…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

DocVAL: Validated Chain-of-Thought Distillation for Grounded Document VQA

Document visual question answering requires models not only to answer questions correctly, but also to precisely localize answers within co…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

We explore efficient strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource co…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Bridging Data and Physics: A Graph Neural Network-Based Hybrid Twin Framework

Simulating complex unsteady physical phenomena relies on detailed mathematical models, simulated for instance by using the Finite Element M…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Operator-Based Generalization Bound for Deep Learning: Insights on Multi-Task Learning

This paper presents novel generalization bounds for vector-valued neural networks and deep kernel methods, focusing on multi-task learning…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

On the Koopman-Based Generalization Bounds for Multi-Task Deep Learning

The paper establishes generalization bounds for multitask deep neural networks using operator-theoretic techniques. The authors propose a t…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

Patterns vs. Patients: Evaluating LLMs against Mental Health Professionals on Personality Disorder Diagnosis through First-Person Narratives

Growing reliance on LLMs for psychiatric self-assessment raises questions about their ability to interpret qualitative patient narratives.…

2026-05-25 13:00 JSTarXiv cs.AIロボティクス

V-VLAPS: Value-Guided Planning for Vision-Language-Action Models

Vision-language-action (VLA) models provide strong action priors for robotic manipulation, but their reactive behavior can fail under distr…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

R³L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification

Reinforcement learning drives recent advances in LLM reasoning and agentic capabilities, yet current approaches struggle with both explorat…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Information Access of the Oppressed: Freirean Design for Emancipatory Information Access

Online information access (IA) platforms are targets of authoritarian capture. We explore the question of how to safeguard our platforms an…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding

Large language models (LLMs) have shown growing promise in biomedical research, particularly for knowledge-driven interpretation tasks. How…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling

While Mixture-of-Experts (MoE) architectures substantially bolster the expressive power of large-language models, their prohibitive memory…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

The Surprising Difficulty of Search in Model-Based Reinforcement Learning

This paper investigates search in model-based reinforcement learning (RL). Conventional wisdom holds that long-term predictions and compoun…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

CoFrGeNet: Continued Fraction Architectures for Language Generation

Transformers are arguably the preferred architecture for language generation. In this paper, inspired by continued fractions, we introduce…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

GradingAttack: Exposing Security Vulnerabilities in LLM Based Educational Grading Agents

Large language models (LLMs) are increasingly deployed as educational agents for automatic short answer grading (ASAG) in real-world educat…

2026-05-25 13:00 JSTarXiv cs.AIエージェントビジネス/資金調達

TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning

The design of environments plays a critical role in shaping the development and evaluation of cooperative multi-agent reinforcement learnin…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

PipeMFL-240K: A Large-scale Dataset and Benchmark for Object Detection in Pipeline Magnetic Flux Leakage Imaging

Pipeline integrity is critical to industrial safety and environmental protection, with Magnetic Flux Leakage (MFL) detection being a primar…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

ArcMark: Distortion-Free Multi-Byte LLM Watermark via Optimal Transport

Watermarking is an important tool for promoting the responsible use of large language models (LLMs). Existing watermarks insert a signal in…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

On the Infinite Width and Depth Limits of Predictive Coding Networks

Predictive coding (PC) is a biologically plausible alternative to standard backpropagation (BP) that minimises an energy function with resp…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos

In long-video understanding, conventional uniform frame sampling often fails to capture key visual evidence, leading to degraded performanc…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling

Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computa…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a dominant paradigm for enhancing Large Language Models (LLMs) reasoni…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Empowering 9-1-1 Calltaking Training with Generative AI: Experiences and Lessons Learned

Emergency call-takers form the first operational link in public safety response, handling over 240 million calls annually while facing a su…

2026-05-25 13:00 JSTarXiv cs.AIビジネス/資金調達

A Systematic Evaluation of Co-folding Model Representations for Small-Molecule Learning

Small-molecule foundation models are typically pretrained on standalone molecular data, unlike vision and language models that often benefi…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

BarrierSteer: LLM Safety via Learning Barrier Steering

Despite the strong performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling

Crystal modeling spans a family of conditional and unconditional generation tasks, including crystal structure prediction (CSP) and de novo…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

Muon has recently shown promising results in LLM training. In this work, we study how to further improve Muon. We argue that Muon's orthogo…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Adapting Dijkstra for Buffers and Unlimited Transfers

In recent years, RAPTOR based algorithms have been considered the state-of-the-art for path-finding with unlimited transfers without prepro…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Anatomy-Guided Vision-Language Learning with Angular Prototype Separation for Multi-Label Video Capsule Endoscopy Classification Under Class Imbalance

This work presents a multi-label temporal event detection framework for video capsule endoscopy (VCE) that addresses the extreme class imba…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Understanding Task Aggregation for Generalizable Ultrasound Foundation Models

Foundation models promise to unify multiple clinical tasks within a single framework, but recent ultrasound studies report that unified mod…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels

Reinforcement learning has emerged as a powerful paradigm for improving large language model (LLM) reasoning, where rollouts are sampled fr…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Safe Reinforcement Learning with Preference-based Constraint Inference

Safe reinforcement learning (RL) is a standard paradigm for safety-critical decision making. However, real-world safety constraints can be…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIビジネス/資金調達

Tabular PDF Information Extraction with Local LLMs and Layout-Aware Parsing: A Reliability Evaluation

Extracting structured information from academic PDF documents is nontrivial: a single page typically combines free-text metadata with tabu…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Visually-Guided Policy Optimization for Multimodal Reasoning

Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning ability of vision-language models (VLMs). Ho…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

Temporal credit assignment in reinforcement learning has long been a central challenge. Inspired by the multi-timescale encoding of the dop…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees

Statement autoformalization acts as a critical bridge between human mathematics and formal mathematics by translating natural language prob…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtim…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

Skill Retrieval Augmentation for Agentic AI

As large language models (LLMs) evolve into agentic problem solvers, they increasingly rely on external, reusable skills to handle tasks be…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

A Comparative Analysis on the Performance of Upper Confidence Bound Algorithms in Adaptive Deep Neural Networks

Edge computing environments impose strict constraints on energy consumption and latency, making the deployment of deep neural networks a si…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

SUDP: Secret-Use Delegation Protocol for Agentic Systems

Agentic systems increasingly act with user secrets for APIs, messaging platforms, and cloud services. Today's agent runtimes typically impl…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Ceci n'est pas une explication: Evaluating Explanation Failures as Explainability Pitfalls in Language Learning Systems

AI-powered language learning tools increasingly provide instant, personalised feedback to millions of learners worldwide. However, this fee…

2026-05-25 13:00 JSTarXiv cs.AIビジネス/資金調達研究/論文

ProtDBench: A Unified Benchmark of Protein Binder Design and Evaluation

Recent advances in de novo protein binder design have enabled increasing experimental validation, yet reported in silico metrics remain dif…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

State-of-the-art model-based Reinforcement Learning (RL) approaches either use gradient-free, population-based methods for planning, learne…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

Recent advances in foundation models have transformed LLMs from passive conversational systems into autonomous agents capable of reasoning…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

VISD: Enhancing Video Reasoning via Structured Self-Distillation

Training VideoLLMs for complex reasoning remains challenging due to sparse sequence-level rewards and the lack of fine-grained credit assig…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント研究/論文

Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing

LLM-based agents are increasingly applied to the "last mile" of Electronic Design Automation (EDA): repairing residual sign-off Design Rule…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

The AI-Native Large-Scale Agile Software Development Manifesto

Despite the widespread adoption of agile methods, achieving true agility at scale remains elusive. Large-scale agile frameworks remain larg…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェントビジネス/資金調達

Content-Aware Attack Detection in LLM Agent Tool-Call Traffic: An Empirical Study of Features, Architectures, and Evaluation Protocols

The Model Context Protocol (MCP) has become a widely adopted interface for LLM agents to invoke external tools, yet learned monitoring of M…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIハードウェア/半導体

ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

Pre-training large language models on massive GPU clusters has made hardware faults routine rather than rare, driving the need for resilien…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Bridging Silicon and the Hippocampus: Algebro-Deterministic Memory "VaCoAl" as a Substrate for Vector-HaSH and TEM

Vector-HaSH and the Tolman-Eichenbaum Machine (TEM) propose the hippocampal-entorhinal circuit factorizes memory via a grid-cell scaffold f…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

Block attention, which processes the input as separate blocks that cannot attend to one another, offers significant potential to improve KV…

2026-05-25 13:00 JSTarXiv cs.AIエージェント

Towards Trustworthy and Explainable AI for Perception Models: From Concept to Prototype Vehicle Deployment

Deep Neural Networks have become the dominant solution for Autonomous Driving perception, but their opacity conflicts with emerging Trustwo…

2026-05-25 13:00 JSTarXiv cs.AIビジネス/資金調達

Can the Recovery Mechanism Survive AI? Skill Formation, Labor, and What Current Measurement Misses

Throughout the modern era, when new technologies displaced workers, societies adapted through the same mechanism: education raised the cogn…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Cross-Domain Molecular Relational Learning: Leveraging Chemical Structure-Activity Analysis

Recent advances in molecular representation integrate molecular topological and visual modalities, opening new avenues for precise Molecul…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェント

S-Bus: Automatic Read-Set Reconstruction for Multi-Agent LLM State Coordination

We address concurrency control for LLM agents sharing mutable state over HTTP, where agents cannot be modified to declare read sets. S-Bus…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

DynMuon: A Dynamic Spectral Shaping View of Muon

In recent years, Muon has emerged as the dominant method for training large language models, and transformers more broadly. The essential d…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

IVF-TQ: Calibration-Free Streaming Vector Search via a Codebook-Free Residual Layer

Approximate nearest neighbor (ANN) indexes deployed against streaming corpora silently lose recall over weeks. The standard diagnosis is di…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

An Interpretable Closed-Loop Intelligent Tutoring System for Multimodal Affective Feedback in Asynchronous Presentation Training

This paper presents an interpretable closed-loop Intelligent Tutoring System (ITS) that supports feedback-guided practice for developing on…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AIエージェントビジネス/資金調達研究/論文

TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing

LLM routing matters most in long-horizon applications such as coding agents, deep research systems, and computer-use agents, where a single…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Does Your Wildfire Prediction Model Actually Work, or Just Score Well?

Wildfire prediction is important for early warning and resource allocation, yet existing Earth foundation models (Earth FMs) are pretrained…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic

Task vector composition has emerged as a promising paradigm for editing pre-trained models, enabling model merging through addition and unl…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

Code-switching -- the natural alternation between two languages within a single utterance -- remains one of the most challenging and under-…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions

Conversational AI has now reached billions of users, yet existing datasets capture only what people say, not what they think. We introduce…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

Recent large language models support inputs of up to 10 million tokens, yet they perform poorly on long-context tasks that require complex…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor

MXFP4 arithmetic can dramatically accelerate reinforcement learning (RL) post-training of large language models (LLMs), yet the quantizatio…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

Codec-Robust Attacks on Audio LLMs

Prior attacks on Audio Large Language Models (Audio LLMs) demonstrated that carefully crafted waveform-domain perturbations can force targe…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

GenAI-Driven Threat Detection with Microsoft Security Copilot

Defending against today's increasingly sophisticated cyberattacks requires security analysts to continuously translate evolving attacker tr…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures

Sutra is a typed, purely functional programming language whose compiled forward pass is a PyTorch neural network. The compiler beta-reduces…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI研究/論文

Fine-grained Claim-level RAG Benchmark for Law

The rapid progress of large language models (LLMs) is shifting semantic search toward a question-answering paradigm, where users ask questi…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Variance Reduction for Expectations with Diffusion Teachers

Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data at…

2026-05-25 13:00 JSTarXiv cs.AILLM/生成AI

OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

Reinforcement learning with verifiable rewards has become the standard recipe for improving LLM reasoning, but the dominant algorithm GRPO…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Atom-level Protein Representation Learning Improves Protein Structure Prediction

Recent advances in generative modeling show that pretrained representations can improve generation as conditioning features or alignment ta…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

Detecting Schwartz values in political text is difficult because implicit cues often depend on surrounding arguments and fine-grained disti…

2026-05-25 13:00 JSTarXiv cs.AI研究/論文

Proxy-Based Approximation of Shapley and Banzhaf Interactions

Shapley and Banzhaf interactions capture the complex dynamics inherent in modern machine learning applications. However, current estimators…

2026-05-25 06:39 JSTTechCrunch AIその他

Everyone is navigating AI security in real time — even Google

We're in the transition period -- all of us.

2026-05-25 00:00 JSTTechCrunch AIその他

I tried Amazon’s Bee wearable and am both intrigued and slightly creeped out

Like other AI wearables, Amazon's Bee offers an odd combination of convenience and privacy anxiety.