The requirements of the AI data center network architecture: 400/800G optical modules

With the continuous development of AI technology and its applications, large models, big data and AI computing power are becoming increasingly important to the progress of AI. Large models and datasets form the software foundation of AI research, while AI computing power is its key infrastructure. In this article, we explore how AI development affects data center network architecture.

Fat-Tree Data Center Network Architecture

As large AI model training spreads across industries, traditional networks struggle to meet the bandwidth and latency requirements of large-model cluster training. Distributed training of large models requires heavy communication between GPUs, and this traffic pattern differs from that of traditional cloud computing: it greatly increases east-west (server-to-server) traffic inside AI/ML data centers. In traditional network architectures, these bursty, high-volume AI data flows cause network latency and reduce training performance. Fat-Tree networks have therefore become a natural choice for handling such short, high-volume data exchanges. In a traditional tree topology, bandwidth converges layer by layer, so the bandwidth near the root is much smaller than the combined bandwidth of all the leaf nodes. A Fat-Tree, by contrast, looks more like a real tree, whose branches grow thicker toward the root: network bandwidth increases from the leaves to the root, which improves network efficiency and accelerates training. This is the basic premise of the Fat-Tree architecture, and it is what allows it to achieve a non-blocking network.
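The article does not specify how a Fat-Tree is dimensioned. As a minimal sketch, assuming the common three-tier k-ary fat-tree built entirely from identical k-port switches (a construction chosen here for illustration, not taken from any solution in this article), the switch and host counts follow directly from k. The `oversubscription` helper is likewise hypothetical: it compares host-facing bandwidth with uplink bandwidth on one switch, where a ratio of 1.0 corresponds to the non-blocking behavior described above and larger values correspond to the layer-by-layer convergence of a traditional tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatTreeSize:
    pods: int
    edge_switches: int
    aggregation_switches: int
    core_switches: int
    hosts: int


def k_ary_fat_tree(k: int) -> FatTreeSize:
    """Size a three-tier k-ary fat-tree built only from k-port switches.

    Assumed construction: k pods, each with k/2 edge and k/2 aggregation
    switches; every edge switch uses k/2 ports for hosts and k/2 ports for
    uplinks, so no layer is oversubscribed.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return FatTreeSize(
        pods=k,
        edge_switches=k * half,
        aggregation_switches=k * half,
        core_switches=half * half,
        hosts=k * half * half,  # equals k**3 / 4
    )


def oversubscription(downlink_gbps: float, uplink_gbps: float) -> float:
    """Host-facing vs. uplink bandwidth on one switch; 1.0 means non-blocking."""
    return downlink_gbps / uplink_gbps
```

For example, `k_ary_fat_tree(4)` yields 4 pods, 8 edge, 8 aggregation and 4 core switches serving 16 hosts, while 64-port switches would support 65,536 hosts. An edge switch with 48 x 25G host ports and only 4 x 100G uplinks has a ratio of 3.0, the kind of convergence a Fat-Tree avoids.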

Data Center Network Rate Upgrade and Evolution

As data center applications grow more complex, the demand for network speed keeps rising. From 1G, 10G and 25G in the past to the 100G widely deployed today, data center network upgrades are accelerating. For large-scale artificial intelligence workloads, however, 400G and 800G transmission rates have become the next key steps in the evolution of data center networks.
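To make the rate progression concrete, the sketch below computes the ideal time needed to move a fixed payload across a single link at each generation's rate. The 10 GB payload and the name `transfer_time_s` are hypothetical, and the calculation deliberately ignores protocol overhead, FEC, congestion and propagation delay, so it shows only how transfer time scales with line rate, not real-world throughput.

```python
# Link rates discussed in the text, in Gb/s.
RATES_GBPS = [1, 10, 25, 100, 400, 800]


def transfer_time_s(payload_bytes: float, rate_gbps: float) -> float:
    """Ideal time (seconds) to push payload_bytes through one link at rate_gbps.

    Ignores overhead, congestion and propagation delay, so real transfers
    take longer; only the relative scaling between rates is meaningful.
    """
    bits = payload_bytes * 8
    return bits / (rate_gbps * 1e9)


if __name__ == "__main__":
    payload = 10e9  # hypothetical 10 GB of training data or gradients
    for rate in RATES_GBPS:
        print(f"{rate:>4}G: {transfer_time_s(payload, rate):8.3f} s")
```

Under these assumptions, the same 10 GB takes 80 s at 1G, 0.8 s at 100G and 0.1 s at 800G: each step up in rate cuts the time spent waiting on the network by the same factor.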

AI Data Centers Drive the Development of 400G/800G Optical Modules

Reasons for the Continuous Growth in Demand for 400G/800G Optical Modules

Demand for large-scale data processing

Training and inference of AI algorithms require large datasets, so data centers must be able to move large volumes of data efficiently. 800G optical modules provide greater bandwidth and help solve this problem. An upgraded data center network typically has two levels extending from the switch to the server, with 400G at the bottom layer. Upgrading to 800G will therefore also drive growth in demand for 400G.

Real-time requirements

In some AI application scenarios, real-time data processing is crucial. In autonomous driving systems, for example, the massive data generated by sensors must be transmitted and processed quickly, and minimizing system latency becomes a key factor in ensuring timely responses. High-speed optical modules help meet these real-time requirements by shortening data transmission time, thereby improving the system’s responsiveness.

Concurrent multi-tasking

Modern AI data centers often need to handle multiple tasks simultaneously, such as image recognition and natural language processing. High-speed 400G/800G optical modules provide the bandwidth needed to support such multi-task workloads.

The 400G/800G Optical Module Market Has Broad Prospects

Currently, demand for 400G and 800G optical modules has not yet grown significantly, but it is expected to increase sharply in 2024, driven by growing AI computing demand. According to Dell’Oro’s prediction, demand for 400G optical modules will increase in 2024. The growing need for high-speed data transmission driven by AI, big data and cloud computing is also expected to accelerate the growth of the 800G optical module market. This trend highlights the bright prospects of the 400G/800G optical module market, and adoption is expected to increase as data centers respond to the changing demands of advanced computing applications.
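The point that a two-level network with 400G at the bottom layer ties 400G demand to 800G upgrades can be illustrated with a rough module count. This is a minimal sketch under loudly hypothetical assumptions: every server has a single 400G port, every leaf switch connects to every spine switch with one 800G link, and each end of every link uses one pluggable optical module. Real deployments may use breakout cables, copper or different port counts, so the names and numbers here illustrate the relationship rather than any specific design.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoTierDesign:
    servers: int                # hypothetical: one 400G port per server
    server_ports_per_leaf: int  # 400G leaf ports facing servers
    spines: int                 # hypothetical: one 800G link per leaf-spine pair


def module_counts(d: TwoTierDesign) -> dict[str, int]:
    """Count optical modules, assuming one pluggable module at each link end."""
    leaves = math.ceil(d.servers / d.server_ports_per_leaf)
    links_400g = d.servers          # server <-> leaf links
    links_800g = leaves * d.spines  # leaf <-> spine links
    return {
        "leaves": leaves,
        "400G": 2 * links_400g,
        "800G": 2 * links_800g,
    }
```

With 256 servers, 32 server ports per leaf and 16 spines, this gives 8 leaves, 512 400G modules and 256 800G modules. Each leaf then has 32 x 400G = 12.8T of downlink and 16 x 800G = 12.8T of uplink bandwidth, so the design is non-blocking; in this model, scaling out the 800G layer also scales the 400G layer beneath it.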

Typical 400G/800G Optical Module Solutions for Data Centers

This figure shows a solution for upgrading to an 800G data center. QDD-FR4-400G optical modules, operating at a 400G interface rate, form high-bandwidth links between the MSN4410-WS2FC switches in the backbone layer and the high-performance 800G switches in the core layer. Because these modules use the compact QSFP-DD package, many of them can be deployed in a single switch, which increases the transmission capacity of each switch. In addition, the modules use PAM4 modulation, which carries two bits per symbol and therefore doubles the data rate at a given symbol rate, together with retiming technology that preserves signal integrity at these high rates. Together, these shorten transmission time and improve overall system performance.
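PAM4 signals two bits per symbol using four amplitude levels, whereas NRZ signals one bit with two levels. The sketch below shows a Gray-coded PAM4 mapping (adjacent levels differ by a single bit, so a one-level decision error corrupts only one bit) and the lane arithmetic behind a 400G link. The lane count and symbol rate used in the example are assumptions: 4 lanes at 53.125 GBd are commonly cited for 400G FR4 optics, but the article does not state the internal parameters of the QDD-FR4-400G module, and the function names are hypothetical.

```python
# Gray-coded PAM4: two bits per symbol, neighbouring levels differ by one bit.
PAM4_GRAY = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def pam4_encode(bits: list[int]) -> list[int]:
    """Map a bit stream onto PAM4 amplitude levels (-3, -1, +1, +3)."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    # Take bits two at a time and look up the corresponding level.
    return [PAM4_GRAY[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def line_rate_gbps(lanes: int, baud_gbd: float, bits_per_symbol: int) -> float:
    """Aggregate raw line rate = lanes x symbol rate x bits per symbol."""
    return lanes * baud_gbd * bits_per_symbol
```

Here `pam4_encode([0, 0, 0, 1, 1, 1, 1, 0])` returns `[-3, -1, 1, 3]`, carrying 8 bits in 4 symbols where NRZ would need 8, and `line_rate_gbps(4, 53.125, 2)` gives a raw 425 Gb/s, which covers the 400 Gb/s payload plus coding and FEC overhead.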

The New Era of 800G/400G Optical Modules

As demand for faster and more efficient data transmission continues to grow, the era of 800G/400G optical modules has arrived. These modules are favored for their outstanding bandwidth, advances in linear-drive pluggable optics (LPO) technology, and cost benefits, and they are expected to transform the AI field and redefine data centers. With high-speed optical modules, large-scale AI development and training is no longer just an idea.