Penguin Solutions Introduces Industry's First Production-Ready CXL-Based KV Cache Server

Penguin Solutions MemoryAI KV cache server, an 11 TB memory appliance, enables efficient deployment of enterprise-scale AI inference

Penguin Solutions, Inc. | March 16, 2026

FREMONT, Calif.--(BUSINESS WIRE)--Penguin Solutions, Inc. (Nasdaq: PENG), the AI factory platform company, today announced the industry's first production-ready KV cache server that uses CXL memory technology to address the critical "memory wall" challenge in AI inferencing: the Penguin Solutions MemoryAI™ KV cache server. The solution delivers up to 11 TB of CXL-based memory engineered to optimize the performance of enterprise-scale inference, including agentic AI. The result is lower latency, higher throughput, more efficient use of GPU clusters, consistent achievement of stringent service-level agreements (SLAs), and faster time-to-first-token (TTFT).

This press release features multimedia. View the full release here: https://www.businesswire.com/news/home/20260316416248/en/

While model training and tuning are primarily compute-bound and occur episodically, the continuous, memory-bound, and latency-sensitive workloads required for inference and agentic AI are complex and fundamentally different. Inference demands are typically 30% compute-driven (GPU) and 70% memory-driven (RAM), elevating the need for greater memory capacity and causing performance bottlenecks and GPU idle time.

To accelerate memory-dependent AI processes, Penguin’s MemoryAI KV cache server increases memory capacity by integrating 3 TB of DDR5 main memory and up to eight 1 TB CXL Add-in Cards (AICs).

“CXL-enabled KV cache technology delivers faster time-to-first-token, reduced time per output token, and increased overall end-to-end token throughput,” said Phil Pokorny, chief technology officer at Penguin Solutions. “These critical performance improvements enable enterprise-scale inferencing across many users who expect low latency and timely access to AI-generated insights. The introduction of Penguin’s MemoryAI KV cache server is designed to...
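
The capacity figures quoted in the announcement (3 TB of DDR5 plus up to eight 1 TB CXL add-in cards) can be checked with a short calculation. The Python sketch below is purely illustrative: the capacity constants come from the release, while the model shape (80 layers, 8 KV heads, head dimension 128, 16-bit values), the decimal reading of "TB", and the idea that the whole capacity is available for KV cache are assumptions made here, not figures from Penguin Solutions. The per-token formula is the standard transformer KV cache estimate, so the result is a rough upper bound, not a vendor specification.

```python
# Capacity figures taken from the announcement.
DDR5_TB = 3            # DDR5 main memory
MAX_CXL_CARDS = 8      # maximum number of CXL add-in cards (AICs)
CXL_TB_PER_CARD = 1    # capacity of each card


def total_capacity_tb(cxl_cards: int = MAX_CXL_CARDS) -> int:
    """Total memory in TB for a given number of installed CXL cards."""
    if not 0 <= cxl_cards <= MAX_CXL_CARDS:
        raise ValueError(f"cxl_cards must be between 0 and {MAX_CXL_CARDS}")
    return DDR5_TB + cxl_cards * CXL_TB_PER_CARD


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Standard KV cache estimate: one key and one value vector
    per KV head per layer, hence the leading factor of 2."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_cached_tokens(capacity_tb: int, bytes_per_token: int) -> int:
    """Upper bound on cached tokens, ASSUMING decimal terabytes (10**12 bytes)
    and that the entire capacity is available for KV cache."""
    return capacity_tb * 10**12 // bytes_per_token


if __name__ == "__main__":
    capacity = total_capacity_tb()
    # Hypothetical model shape, chosen only for illustration.
    per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
    print(f"Total capacity: {capacity} TB")
    print(f"KV cache per token: {per_token} bytes")
    print(f"Upper bound on cached tokens: {max_cached_tokens(capacity, per_token):,}")
```

With the maximum of eight cards installed, the total matches the 11 TB quoted in the headline. The token estimate changes directly with the assumed model shape and numeric precision.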
