This video explains the Tofu interconnect ("6-dimensional mesh/torus" topology network technology).
(6 minutes 20 seconds)
In the K computer, a tremendously large system containing more than 80,000 CPUs, the network that exchanges data such as computational results between CPUs plays a very important role. The K computer's network, called Tofu, uses an innovative structure called "6-dimensional mesh/torus" topology. This enables the mutual interconnection of more than 80,000 CPUs.
How fast can CPUs exchange data?
Each CPU in the K computer can perform numerous computations in a short period of time. CPUs sometimes need to exchange data with one another during a computation. If these exchanges are slow, the high computational power of all those CPUs cannot be fully utilized. The 6-dimensional mesh/torus topology of the K computer provides many communication routes between neighboring CPUs. Data can therefore travel between CPUs over the shortest route and in the shortest time, so the network can fully draw out the computational power of these world-class CPUs.
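As a rough illustration of what "shortest route" means in a mesh/torus network, the sketch below counts the minimum number of hops between two nodes addressed by six coordinates. On a torus axis the two ends are joined, so traffic can go around either way; on a mesh axis it cannot. The function names, the axis sizes, and the choice of which axes wrap around are illustrative assumptions, not the K computer's actual configuration.

```python
def axis_hops(a, b, size, wraps):
    """Hops needed along one axis to get from coordinate a to coordinate b."""
    direct = abs(a - b)
    # A torus axis wraps around, so the shorter of the two directions is used.
    return min(direct, size - direct) if wraps else direct


def hop_distance(src, dst, sizes, wraps):
    """Minimum hop count between two nodes: the sum of the per-axis hop counts."""
    return sum(
        axis_hops(s, d, n, w) for s, d, n, w in zip(src, dst, sizes, wraps)
    )


# Hypothetical 6-dimensional network (sizes and wrap flags are assumptions).
SIZES = (4, 4, 4, 2, 3, 2)
WRAPS = (True, False, True, False, True, False)

print(hop_distance((0, 0, 0, 0, 0, 0), (3, 3, 3, 1, 2, 1), SIZES, WRAPS))
```

In this example the wrap-around links shorten the trip on three of the six axes, which is the kind of saving a torus offers over a plain mesh.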
Immediate detection of CPU failure and organized data traffic
For the K computer to maintain its highest performance at all times, it is important that failures do not occur. Further, even if a partial failure does occur, its overall impact must be minimized. The K computer's network has alternate routes between CPUs and a mechanism that bypasses failed CPUs so that data exchange can continue. This means computational processing does not have to stop.
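The idea of bypassing a failed CPU can be sketched with a breadth-first search that simply refuses to pass through failed nodes. This is only a conceptual model on a small, hypothetical 4 x 4 torus; it is not how the Tofu hardware actually selects routes, and the names `neighbors` and `route` are made up for this example.

```python
from collections import deque


def neighbors(node, sizes):
    """Yield the nodes one hop away on a torus (every axis wraps around)."""
    for axis, n in enumerate(sizes):
        for step in (-1, 1):
            nxt = list(node)
            nxt[axis] = (nxt[axis] + step) % n
            yield tuple(nxt)


def route(src, dst, sizes, failed=frozenset()):
    """Return a shortest path from src to dst that avoids failed nodes, or None."""
    if src in failed or dst in failed:
        return None
    prev = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in neighbors(cur, sizes):
            if nxt not in failed and nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None


sizes = (4, 4)                                   # hypothetical 2-D torus
print(route((0, 0), (2, 0), sizes))              # no failures
print(route((0, 0), (2, 0), sizes, {(1, 0)}))    # one node on the way has failed
```

With the node at (1, 0) marked as failed, the search still finds a two-hop path by going the other way around the ring, which illustrates why a topology with many alternate routes tolerates partial failures well.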
Management functions to ensure efficient and maximum CPU performance
The K computer is expected to process computations (jobs) from many users simultaneously. It therefore selects, from its more than 80,000 CPUs, the CPUs that each job requires, based on what the job processes. When assigning CPUs to many jobs, it is more efficient to ensure there is no unnecessary data communication between the CPUs. In the K computer, the job management software assigns jobs to the CPUs and controls the order in which they are processed. This resembles packing as many boxes as possible into a warehouse, leaving as little wasted space as possible, even when the boxes are of different shapes and sizes. Because the 6-dimensional mesh/torus topology network provides many communication routes between neighboring CPUs, the shape of the CPU group assigned to each job can be changed flexibly. In other words, multiple jobs can be assigned very flexibly and efficiently within the overall CPU population.
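The warehouse analogy can be made concrete with a toy first-fit allocator. The sketch below places each job as a rectangular block of CPUs in a small grid and, if the requested shape does not fit, tries the same block in other orientations. The grid size, the `allocate` function, and the first-fit policy are all assumptions made for illustration; the actual job management software of the K computer is not described here, and wrap-around links are ignored for simplicity.

```python
from itertools import permutations, product


def allocate(grid, occupied, shape):
    """Place a block of the given shape in the first free position found.

    Tries every orientation of the block. On success, marks the cells as
    occupied and returns (origin, dims); returns None if nothing fits.
    """
    for dims in sorted(set(permutations(shape))):
        # Every origin at which a block with these dimensions stays inside the grid.
        ranges = [range(g - d + 1) for g, d in zip(grid, dims)]
        for origin in product(*ranges):
            cells = {
                tuple(o + k for o, k in zip(origin, offset))
                for offset in product(*(range(d) for d in dims))
            }
            if not cells & occupied:
                occupied |= cells
                return origin, dims
    return None


grid = (4, 3)          # hypothetical 4 x 3 group of CPUs
occupied = set()
print(allocate(grid, occupied, (4, 2)))   # fits as requested
print(allocate(grid, occupied, (1, 4)))   # only fits after being rotated to 4 x 1
print(allocate(grid, occupied, (1, 1)))   # grid is now full
```

Here the second job cannot be placed as a 1 x 4 block, but the same four CPUs fit as a 4 x 1 block in the remaining column, so the grid ends up fully used. This is the sense in which flexible job shapes reduce wasted space.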
The K computer, using the features of the 6-dimensional mesh/torus topology, can maximize the computational power of its 80,000 or more CPUs without waste.