NVIDIA

Company articles: 14 | 2026-05-25 to 2026-05-30

May 2026

2026-05-30 02:27 JST | TechCrunch AI | Hardware/Semiconductors | Business/Funding

After Nvidia’s $20B not-acqui-hire, AI chip startup Groq reportedly raising $650M

Chipmaker Groq is looking to raise $650 million in internal funding as it pivots from hardware to focus more on AI inference, the process o…

2026-05-28 13:00 JST | arXiv cs.AI | LLM/Generative AI | Agents | Hardware/Semiconductors | Claude

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

LLM-based agents are increasingly used to generate GPU kernels, but they often know what optimizations to try without knowing when those optimizations are sound. We introduce KLineage, which learns this missing "when" knowledge from expert kernels: instead of relying on forward rollouts, KLineage walks expert implementations backward through validation-gated simplifications and reverses each accepted step into a reusable optimization skill. Each skill records not only the optimization intent, but also where it applies in code, what conditions made it valid, what effect it had, and what failures its assumptions avoid. A downstream LLM materializes these skills on new code surfaces under the same compile/correctness/profile gate. On five expert workloads across two NVIDIA architectures, these lineage-derived skills serve as an effective optimization curriculum, exceeding recent memory-based LLM-kernel baselines in both final kernel quality and optimization efficiency under the same fixed budget. We additionally use a separate 22-instance held-out check as a sanity test against source-case memorization.
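
The skill record and the gate described in the abstract can be pictured with a minimal sketch. Everything named here is hypothetical: the field names, the `failed_gates` helper, and the string-matching stub gates are illustrative stand-ins, not KLineage's actual data model or tooling. Only the five recorded aspects (intent, code location, validity conditions, effect, avoided failures) and the compile/correctness/profile gate come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class OptimizationSkill:
    """One lineage-derived skill; field names are hypothetical."""
    intent: str                            # what the optimization tries to achieve
    code_site: str                         # where in the code it applies
    validity_conditions: Tuple[str, ...]   # conditions that made it valid
    observed_effect: str                   # what effect it had
    failures_avoided: Tuple[str, ...]      # failures its assumptions avoid


# A gate takes candidate kernel source and reports pass/fail.
Gate = Callable[[str], bool]


def failed_gates(candidate_source: str, gates: Dict[str, Gate]) -> List[str]:
    """Return the names of gates the candidate fails; an empty list means accepted."""
    return [name for name, check in gates.items() if not check(candidate_source)]


tiling_skill = OptimizationSkill(
    intent="stage tiles of the input in shared memory",
    code_site="inner reduction loop of a matrix-multiply kernel",
    validity_conditions=("tile fits in shared memory", "threads synchronize before reuse"),
    observed_effect="fewer global-memory loads per output element",
    failures_avoided=("reading a tile before all threads have written it",),
)

# Stub gates: simple string checks standing in for a real compiler,
# correctness test suite, and profiler (assumption, for illustration only).
stub_gates: Dict[str, Gate] = {
    "compile": lambda src: "__global__" in src,
    "correctness": lambda src: "__syncthreads()" in src,
    "profile": lambda src: "__shared__" in src,
}

candidate = "__global__ void matmul() { __shared__ float tile[16][16]; __syncthreads(); }"
print(failed_gates(candidate, stub_gates))
```

Keeping the validity conditions next to the intent is what would let a downstream model reject a skill on a new code surface before spending budget on it.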

2026-05-28 13:00 JST | arXiv cs.AI | Agents | Hardware/Semiconductors

The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution

Agentic AI workloads - where a single user goal triggers multi-step orchestration, tool calls, retries, and failure recovery - are being ta…

2026-05-28 05:10 JST | TechCrunch AI | Hardware/Semiconductors

In more good news for Amazon, Snowflake signs $6B deal with AWS for AI CPU chips

Snowflake has signed a new, enormous five-year deal with Amazon to secure chips for AI usage. Nvidia is once again being put on notice.

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Edge AI Deployment Beyond Models: A BSP-Aware Systems Framework for Industrial Embedded Platforms

Industrial Edge AI programs often begin with the model and only later confront the platform. That sequencing is attractive because it allows early demonstrations, but it breaks down when the deployment target is an embedded system with long product lifecycles, vendor-specific kernels, heterogeneous accelerators, safety constraints, and nontrivial I/O paths. In that environment, a model is only one component of a larger execution chain that begins at the sensor, traverses the board support package (BSP), and ends in a production service loop. This paper argues that robust Edge AI deployment must be treated as a systems problem rather than a late-stage application packaging exercise. The paper presents a BSP-aware framework for industrial embedded platforms organized around five layers: hardware, BSP/operating-system adaptation, runtime and acceleration, application/inference, and operations/validation. The discussion is grounded in vendor architecture documentation for Android, NXP i.MX, NVIDIA Jetson, ONNX Runtime, and TensorRT, and in systems literature on embedded AI benchmarking, device instability, and heterogeneous edge fleets. The result is a practical framework that connects low-level platform work to measurable deployment outcomes such as reproducibility, diagnosability, sustained throughput, and field reliability.

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient

We present the stochastic decoupled policy gradient (SDPG), a lightweight visual reinforcement learning (RL) method that trains diverse vis…

2026-05-27 13:00 JST | arXiv cs.AI | Research/Papers

JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search

We introduce JetViT, a novel family of hybrid-architecture Vision Transformer (ViT) models that match the accuracy of state-of-the-art full…

2026-05-27 13:00 JST | arXiv cs.AI | LLM/Generative AI | Hardware/Semiconductors

From Detection to Recovery: Operational Analysis on LLM Pre-training with 504 GPUs

Large-scale AI training is now fundamentally a distributed systems problem, and hardware failures have become routine operating conditions…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers | Llama | Mistral AI | Qwen

Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study

Tokenizer fertility varies 1.6x across foundation models on Ukrainian legal text, yet this cost-critical dimension is absent from model selection practice. We benchmark seven models from five providers on 273 validated court decisions from Ukraine's state registry (EDRSR), measuring tokenizer fertility and zero-shot performance on three tasks. Four findings emerge. (1) Qwen 3 models consume 60% more tokens than Llama-family models on identical input, making tokenizer analysis a prerequisite for cost-efficient deployment. (2) NVIDIA Nemotron Super 3 (120B) achieves the highest composite score (83.1), outperforming Mistral Large 3 (5.6x more total parameters) at one-third the API cost; model scale is a poor proxy for domain performance. (3) Few-shot prompting degrades performance by up to 26 percentage points; stratified and prompt-sensitivity ablations confirm this is intrinsic to Ukrainian-language demonstrations, not an artifact of example selection. (4) A cross-temporal generalization experiment reveals that classifiers trained on pre-war court decisions (2008-2013) lose 27.9 percentage points when applied to full-scale invasion era decisions (2022-2026), with a pronounced forward-backward asymmetry: newer models transfer backward (+14.6 pp above forward transfer), but older models fail catastrophically on wartime legal language. For practitioners: tokenizer analysis should precede model selection, and zero-shot is a more reliable default than few-shot for morphologically rich languages. To support reproducibility and address the absence of Ukrainian from legal NLP benchmarks, we release a public dataset of 14,452 court decisions spanning 2008-2026, annotated with seven outcome labels across three temporal epochs that capture the impact of armed conflict on judicial proceedings.
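
To make the fertility comparison concrete, here is a minimal sketch that assumes fertility means average tokens per whitespace-delimited word, a common definition that the abstract does not spell out. The two fixed-width chunk tokenizers are toy stand-ins for real subword tokenizers, and the sample sentences are invented for illustration.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def fertility(tokenize: Tokenizer, texts: List[str]) -> float:
    """Average number of tokens per whitespace-delimited word (assumed definition)."""
    total_tokens = sum(len(tokenize(text)) for text in texts)
    total_words = sum(len(text.split()) for text in texts)
    if total_words == 0:
        raise ValueError("no words in input")
    return total_tokens / total_words


def chunk_tokenizer(chunk_size: int) -> Tokenizer:
    """Toy tokenizer: splits every word into fixed-width character chunks."""
    def tokenize(text: str) -> List[str]:
        tokens: List[str] = []
        for word in text.split():
            tokens.extend(word[i:i + chunk_size] for i in range(0, len(word), chunk_size))
        return tokens
    return tokenize


# Invented sample sentences, not taken from the EDRSR data.
sample = ["суд постановив рішення", "позов задоволено повністю"]

coarse = fertility(chunk_tokenizer(4), sample)  # longer chunks, fewer tokens
fine = fertility(chunk_tokenizer(2), sample)    # shorter chunks, more tokens

# With per-token pricing, the fertility ratio is also the relative input cost.
print(f"coarse={coarse:.2f} fine={fine:.2f} relative cost={fine / coarse:.2f}x")
```

Running the same measurement with each candidate model's real tokenizer on a sample of the target corpus gives the per-model cost multiplier before any accuracy benchmark is run.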

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Cross-Domain Generalization Limits of Vision Foundation Models in Facial Deepfake Detection

The rapid evolution of generative models has enabled the creation of hyper-realistic facial deepfakes, exposing a critical vulnerability in…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment

Distributing Transformer inference across embedded edge devices can alleviate individual memory and compute constraints, yet practical bene…

2026-05-26 13:00 JST | arXiv cs.AI | Research/Papers

JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures

Genomic Foundation Models (GFMs) typically rely on Masked Language Modeling (MLM) or Next-Token Prediction (NTP) to learn the "Laws of Natu…

2026-05-25 13:00 JST | arXiv cs.AI | Research/Papers

The Cognitive Kardashev Scale: Quantifying the Material Envelope of Civilisational Computation

How much thinking can a civilisation do? Kardashev's (1964) typology ranks civilisations by total power: planetary (Type I, ~10^16 W), stellar (Type II, ~10^26 W), galactic (Type III). This paper builds an analogous Cognitive Kardashev Scale: how much sustained AI-grade computation each tier could support. Four ingredients enter the calculation: total power P (watts), the share f of it devoted to cognition, the efficiency $\eta$ at which energy becomes compute (operations per joule), and the brain's own processing rate $C_{\mathrm{brain}}$ as a reference unit. Anchoring on 2024-2026 hardware (El Capitan, NVIDIA Blackwell, Vera Rubin) gives $\eta_{2026} = 10^{12}$ FLOP/J. Contemporary humanity sits at $K \approx 0.73$, three-quarters of the way to Type I. At Type I and $f = 1\%$, available compute is, within an order of magnitude, one personal AI's worth of cognition per human inhabitant; at Type II it is essentially incomprehensible. Three trajectories for frontier compute through 2035 are reported as conditional projections, not predictions. Whether the long-run binding constraint is energy or efficiency depends on engineering choices not yet made; the political economy of who has access may matter more than either.
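
The arithmetic behind these figures can be checked with a short sketch. It assumes Sagan's interpolation K = (log10 P - 6) / 10, which is consistent with the quoted tier powers but is not stated in the abstract. The present-day power figure (about 2 x 10^13 W) and the population (8 x 10^9) are also illustrative assumptions. Only the Type I power, f = 1%, and the 10^12 FLOP/J efficiency come from the abstract.

```python
import math


def kardashev_k(power_watts: float) -> float:
    """Sagan-style interpolated rating (assumed formula, not given in the abstract)."""
    return (math.log10(power_watts) - 6.0) / 10.0


def sustained_compute(power_watts: float, share: float, eta_flop_per_joule: float) -> float:
    """Compute rate in FLOP/s: power times share devoted to cognition times efficiency."""
    return power_watts * share * eta_flop_per_joule


HUMANITY_POWER_W = 2e13  # assumption: rough present-day human power use
TYPE_I_POWER_W = 1e16    # from the abstract
ETA_2026 = 1e12          # FLOP/J, from the abstract
POPULATION = 8e9         # assumption: rough world population

k_now = kardashev_k(HUMANITY_POWER_W)
type_i_compute = sustained_compute(TYPE_I_POWER_W, 0.01, ETA_2026)
per_person = type_i_compute / POPULATION

print(f"K today         = {k_now:.2f}")
print(f"Type I, f = 1%  = {type_i_compute:.2e} FLOP/s")
print(f"per inhabitant  = {per_person:.2e} FLOP/s")
```

Under these assumptions the per-inhabitant figure comes out on the order of 10^16 FLOP/s. Whether that amounts to one personal AI's worth of cognition depends on the value chosen for $C_{\mathrm{brain}}$, which the abstract does not give.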

2026-05-25 13:00 JST | arXiv cs.AI | Research/Papers

HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval

In the competitive landscape of sponsored search, balancing retrieval quality with production latency is a critical challenge. While large retrieval models based on Small Language Models (SLMs) such as Qwen3-Embedding-4B/8B set strong upper bounds on public benchmarks, their deployment in high-throughput, latency-sensitive environments remains impractical. In this paper, we present HARNESS-LM (HLM), a three-phase training framework for transferring the capabilities of large-scale retrievers into compact, cost-efficient models. The approach comprises: (1) training a high-performance reference ("teacher") retriever by fine-tuning a billion-parameter-scale SLM; (2) aligning query representations via an L2 objective to distill knowledge into a sub-600M parameter student encoder; and (3) applying a final contrastive refinement stage to optimize the student for retrieval performance. We also present a comprehensive empirical study of key design choices, including alignment objectives, embedding dimensionality, model scale, architecture, and optimization strategies, to identify configurations that are most effective in production settings. On a real-world Bing Ads evaluation benchmark, HLM recovers over 98% of the reference retriever's precision across multiple settings, while delivering up to 27x lower online query-encoder latency and 20x higher throughput on NVIDIA A100 GPUs. Online A/B testing on Bing Ads further shows a +1% Revenue, +0.6% Impression, and +0.4% Click uplift over the current ensemble of retrievers running in production with the deployed 190M parameter model, clearly highlighting the practical efficacy of the HLM recipe in a real-world sponsored search setting.
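
Phases (2) and (3) of the recipe can be illustrated with two small loss functions. This is a sketch under stated assumptions, not the paper's implementation: the abstract names an L2 alignment objective and a contrastive refinement stage but does not give their exact forms. The mean squared distance, the in-batch InfoNCE loss, and the temperature value below are all assumptions, and the embeddings are toy vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_alignment_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Mean squared L2 distance between paired student and teacher query embeddings."""
    total = 0.0
    for s_vec, t_vec in zip(student, teacher):
        total += sum((s - t) ** 2 for s, t in zip(s_vec, t_vec))
    return total / len(student)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(queries: List[Vector], docs: List[Vector], temperature: float = 0.05) -> float:
    """In-batch contrastive loss (assumed form): docs[i] is the positive for queries[i]."""
    loss = 0.0
    for i, query in enumerate(queries):
        logits = [dot(query, doc) / temperature for doc in docs]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        loss += log_denominator - logits[i]
    return loss / len(queries)


# Toy 2-dimensional embeddings, invented for illustration.
teacher_queries = [[1.0, 0.0], [0.0, 1.0]]
student_queries = [[0.9, 0.1], [0.2, 0.8]]
documents = [[1.0, 0.0], [0.0, 1.0]]

print("alignment loss:", l2_alignment_loss(student_queries, teacher_queries))
print("contrastive loss:", info_nce_loss(student_queries, documents))
```

In this reading, aligning only the query side would let document embeddings from the reference retriever be reused while the smaller encoder serves online queries, which fits the reported query-encoder latency gains. That interpretation is an inference from the abstract, not a confirmed detail.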