GSI Technology Reports 3-Second Time-to-First-Token for Edge Multimodal LLM Inference on Gemini-II

Benchmark Results Demonstrate Fast Multimodal Edge Inference with Up to ~300% Better Performance per Watt versus Competitive Solutions

SUNNYVALE, Calif., Jan. 29, 2026 (GLOBE NEWSWIRE) -- GSI Technology, Inc. (Nasdaq: GSIT), the inventor of the Associative Processing Unit (APU), a paradigm shift in artificial intelligence (AI) and high-performance compute processing, providing true compute-in-memory technology, today announced preliminary benchmark results for the Gemini-II Compute-in-Memory processor. These results demonstrated 3-second time-to-first-token (“TTFT”) performance for multimodal large language models operating at the edge with video and text inputs.

Using the Gemma-3 12B vision-language model on GSI’s production Gemini-II processor, GSI achieved the 3-second TTFT while consuming approximately 30 watts at the AI sub-system, including the chip. To GSI’s knowledge, this 3-second TTFT at approximately 30 watts at the AI sub-system is the lowest publicly reported result for a multimodal 12B model running on an embedded edge processor.

Independent third-party testing of the same workload on competitive embedded platforms reported TTFT measurements of roughly 12 seconds on Qualcomm Snapdragon X Elite at 30 W, and 3 seconds on NVIDIA Jetson Thor at over 100 W. With performance on par with or superior to competitive platforms at lower power levels, GSI concludes that Gemini-II offers a favorable responsiveness and power-efficiency profile for power- and thermally-constrained edge environments.

“These benchmark results highlight what compute-in-memory can enable for phy...
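The release does not define “performance per watt.” One plausible reading, assumed here and not confirmed by GSI, is responsiveness (1 / TTFT) divided by power, which is the inverse of the energy spent before the first token (TTFT × power). The sketch below applies that assumed definition to the reported figures. The function names are hypothetical, and because Jetson Thor is reported only as “over 100 W,” 100 W is used as a lower bound.

```python
# Hedged sketch: assumes "performance per watt" = (1 / TTFT) / power, i.e. the
# inverse of energy consumed before the first token. This definition is an
# assumption; the press release does not state how the metric was computed.


def energy_to_first_token(ttft_s: float, power_w: float) -> float:
    """Joules consumed before the first output token (TTFT x average power)."""
    return ttft_s * power_w


def perf_per_watt_gain(ref: tuple[float, float], other: tuple[float, float]) -> float:
    """Percent improvement of `ref` over `other` in (1 / TTFT) / W."""
    return (energy_to_first_token(*other) / energy_to_first_token(*ref) - 1.0) * 100.0


# (TTFT in seconds, power in watts) as reported in the release.
# Jetson Thor is reported as ">100 W", so 100 W is only a lower bound here.
platforms = {
    "Gemini-II": (3.0, 30.0),
    "Snapdragon X Elite": (12.0, 30.0),
    "Jetson Thor (>=100 W)": (3.0, 100.0),
}

for name, (ttft, watts) in platforms.items():
    print(f"{name}: {energy_to_first_token(ttft, watts):.0f} J to first token")

gemini = platforms["Gemini-II"]
for name in ("Snapdragon X Elite", "Jetson Thor (>=100 W)"):
    gain = perf_per_watt_gain(gemini, platforms[name])
    print(f"Gemini-II vs {name}: +{gain:.0f}%")
```

Under this assumed definition, the reported numbers give about 90 J for Gemini-II, 360 J for Snapdragon X Elite and at least 300 J for Jetson Thor. The Snapdragon comparison works out to +300%, which appears consistent with the headline “up to ~300%” figure. The Jetson Thor comparison gives at least roughly +233%, since only a lower bound on its power is available.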