Ultra Ethernet gains momentum as tech giants join for AI and HPC network innovation
Newcomers to the group include IBM, Nokia, Dell, Baidu, Huawei, Lenovo, Supermicro, and Tencent

By Erika Morphy

Forward-looking: Ultra Ethernet aims to deliver a comprehensive architecture that optimizes Ethernet for high performance in AI and HPC networking, surpassing the capabilities of today's specialized technologies.

As data centers continue to evolve and the push for AI becomes universal, tech companies have been flocking to join the Ultra Ethernet Consortium, which launched last summer and is hosted by the Linux Foundation. The UEC focuses on enhancing Ethernet to meet the low-latency and high-bandwidth requirements of advanced AI and HPC (high-performance computing) applications, making it a competitive alternative to other high-performance networking technologies.

Forty-five new members have joined the Ultra Ethernet Consortium since November 2023, when the organization began accepting new members, underscoring industry demand for a complete Ethernet-based communication stack architecture for high-performance networking. The interest from all these tech companies highlights a need in the industry that the UEC is meeting, says J Metz, Chair of the UEC Steering Committee.

UEC's membership originally consisted of 10 steering members, so the new arrivals bring the total to 55 today, more than fivefold growth in just a few months. Its founding members are AMD, Arista, Broadcom, Cisco, Eviden, HPE, Intel, Meta, and Microsoft. The newcomers include Baidu, Dell, Huawei, IBM, Nokia, Lenovo, Supermicro, and Tencent.

Since its founding last year, the Ultra Ethernet Consortium has built up a deep bench of talent: a total of 715 industry experts are engaged in eight working groups covering the physical layer, link layer, transport layer, software layer, storage, compliance, management, and performance & debug.

UEC notes that many large clusters, including hyperscale deployments of GPUs used for AI training, already operate on Ethernet-based IP networks. The advantages are significant: a broad, multi-vendor ecosystem of interoperable Ethernet switches, NICs, cables, transceivers, optics, management tools, and software; the proven routing scale of IP networks; and the established IEEE Ethernet standards.

"We expect these advantages to become table-stakes requirements, and that Ethernet networks will increasingly dominate AI and HPC workloads of all sizes in the future," the consortium says.

The UEC wants to minimize communication stack changes while maintaining and promoting Ethernet interoperability. To that end, it is developing specifications, API interfaces, and source code to define protocols, electrical and optical signaling characteristics, link-level and end-to-end network transport protocols, management mechanisms, and software, storage, and security constructs. In short, it wants to optimize AI and HPC workloads by modernizing remote direct memory access (RDMA) operation over Ethernet. It is pushing to replace the legacy RoCE protocol with Ultra Ethernet Transport (UET), an open protocol specification designed to run over IP and Ethernet (a rough sketch of what that could mean for applications follows below).

The industry will soon see exactly what the UEC has been developing. Work on the spec has followed an aggressive timeline, with version 1.0 slated for release by Q3 2024.
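
The UEC has not yet published its specification, but its stated goal of minimizing communication stack changes is easiest to picture at the API level. The sketch below is purely illustrative: it assumes the open libfabric interface that many RDMA applications already program against, and the notion of a future UET-backed provider is an assumption for this example, not anything the consortium has released. The application requests RDMA-style capabilities and lets a named provider supply the transport underneath, which is the kind of arrangement that would let a transport swap happen without touching application code.

```c
/*
 * Illustrative sketch only -- not UEC code. Assumes the open libfabric API
 * (https://ofiwg.github.io/libfabric/). The application asks for RDMA-style
 * capabilities and a provider by name; a UET-backed provider name here would
 * be hypothetical, while "verbs" (RoCE/InfiniBand hardware) exists today.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <rdma/fabric.h>
#include <rdma/fi_domain.h>
#include <rdma/fi_endpoint.h>

int main(void)
{
    /* Describe what the application needs: reliable datagram endpoints
       with message and remote-memory-access (RMA) semantics. */
    struct fi_info *hints = fi_allocinfo();
    if (!hints)
        return EXIT_FAILURE;
    hints->ep_attr->type = FI_EP_RDM;
    hints->caps = FI_MSG | FI_RMA;

    /* Pin the transport provider. "verbs" exists today; swapping in a
       future UET-based provider would ideally be the only change. */
    hints->fabric_attr->prov_name = strdup("verbs");

    struct fi_info *info = NULL;
    int ret = fi_getinfo(FI_VERSION(1, 18), NULL, NULL, 0, hints, &info);
    if (ret) {
        fprintf(stderr, "fi_getinfo failed: %d\n", ret);
        fi_freeinfo(hints);
        return EXIT_FAILURE;
    }

    /* Open the fabric, a domain, and an endpoint -- the same sequence
       regardless of which transport the provider implements. */
    struct fid_fabric *fabric = NULL;
    struct fid_domain *domain = NULL;
    struct fid_ep *ep = NULL;
    if (!fi_fabric(info->fabric_attr, &fabric, NULL) &&
        !fi_domain(fabric, info, &domain, NULL) &&
        !fi_endpoint(domain, info, &ep, NULL)) {
        printf("RDMA endpoint created via provider: %s\n",
               info->fabric_attr->prov_name);
    }

    /* Tear down in reverse order. */
    if (ep)     fi_close(&ep->fid);
    if (domain) fi_close(&domain->fid);
    if (fabric) fi_close(&fabric->fid);
    fi_freeinfo(info);
    fi_freeinfo(hints);
    return EXIT_SUCCESS;
}
```

If the UEC delivers on its goal, the only line that should need to change in a setup like this is the provider name, which is exactly what "minimizing communication stack changes" would mean in practice.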