
F5 and NVIDIA Advance AI Factory Economics With New Capabilities for Accelerated AI Inference

F5 BIG-IP Next for Kubernetes accelerated with BlueField DPUs improves token throughput, reduces cost per token, and enables secure multi-tenant AI

F5, Inc. | March 17, 2026


F5 BIG-IP Next for Kubernetes accelerated with BlueField DPUs improves token throughput, reduces cost per token, and enables secure multi-tenant AI infrastructure, transforming AI factories for the agentic era

SEATTLE--(BUSINESS WIRE)--F5 (NASDAQ: FFIV), the global leader in delivering and securing every app and API, today announced expanded capabilities in its ongoing collaboration with NVIDIA to accelerate and optimize AI inference infrastructures.

The expanded integration combines F5 BIG-IP Next for Kubernetes with NVIDIA BlueField-3 DPUs, creating an intelligent, telemetry-aware infrastructure layer that increases token throughput with better GPU utilization, reduces latency, and enables secure multi-tenant AI platforms at scale.

In AI systems, tokens represent the measurable unit of AI output: the words, symbols, or data fragments generated and processed during inference. The volume and velocity of token production ultimately determine user experience, infrastructure efficiency, and revenue per accelerator.

As enterprises and GPUaaS providers race to monetize AI and move from AI experimentation to revenue-generating services, infrastructure efficiency has become a defining metric. Success is increasingly measured not simply by deployed GPU capacity, but by token economics, sustained token throughput, time to first token (TTFT), cost per token, and revenue per GPU accelerator. The F5 and NVIDIA joint solution is designed to directly address these metrics.

Optimizing tokenomics through intelligent AI infrastructure

The shift from application-centric inference to agent-driven AI workflows demands new architectural approaches to optimize token throughput and reduce costs. BIG-IP Next for Kubernetes now leverages NVIDIA NIM statistics, Dynamo runtime signals, and GPU telemetry to make inference-aware routing decisions before execution. By matching workloads to the most appropriate accelerators in real time, the solution increases sustained utilization while reducing latency and re-compute.

“AI infrastructure is no longer just about access to GPU or scaling their deployments. It has evolved into maximizing economic output per accelerator,” said Kunal Anand, Chief Product Officer, F5. “Together with NVIDIA, we are enabling AI factories to treat token production as a measurable business metric...
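
For readers who want a concrete picture of the telemetry-aware routing and cost-per-token ideas described above, the following is a minimal illustrative sketch in Python. It is not F5 or NVIDIA code and does not use any BIG-IP, NIM, or Dynamo API: the `Backend` fields, the scoring weights, and the example prices and throughput figures are all hypothetical assumptions chosen only to show the general shape of the technique.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Backend:
    """Hypothetical telemetry snapshot for one inference endpoint."""
    name: str
    gpu_utilization: float   # fraction of GPU in use, 0.0 to 1.0 (assumed signal)
    queue_depth: int         # requests waiting on this endpoint (assumed signal)
    has_cached_prefix: bool  # prompt prefix already cached, so less re-compute


def score(backend: Backend) -> float:
    """Return a load penalty; lower is better. Weights are illustrative only."""
    penalty = backend.gpu_utilization + 0.1 * backend.queue_depth
    if backend.has_cached_prefix:
        penalty -= 0.5  # favor endpoints that can skip recomputing the prefix
    return penalty


def pick_backend(backends: Sequence[Backend]) -> Backend:
    """Choose the endpoint with the lowest penalty before dispatching a request."""
    if not backends:
        raise ValueError("no backends available")
    return min(backends, key=score)


def cost_per_million_tokens(gpu_cost_per_hour: float,
                            tokens_per_second: float) -> float:
    """Convert an hourly GPU cost and sustained throughput to cost per 1M tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    fleet = [
        Backend("gpu-a", gpu_utilization=0.9, queue_depth=4, has_cached_prefix=False),
        Backend("gpu-b", gpu_utilization=0.6, queue_depth=2, has_cached_prefix=True),
        Backend("gpu-c", gpu_utilization=0.4, queue_depth=1, has_cached_prefix=False),
    ]
    print(pick_backend(fleet).name)
    # Hypothetical numbers: a 2.00/hour GPU at 1,000 vs 2,000 tokens per second.
    print(round(cost_per_million_tokens(2.0, 1000.0), 4))
    print(round(cost_per_million_tokens(2.0, 2000.0), 4))
```

Under these assumed figures, doubling sustained throughput on the same hardware halves the cost per million tokens, which is the economic relationship the announcement refers to when it ties GPU utilization to cost per token.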
