1. Introduction
A Network-on-Chip (NoC) application is a collection of tasks that run on different processing elements (PEs) and communicate with each other as defined by a directed acyclic graph (DAG).
Once PEs are allocated to tasks, messages must be routed over the network so that the tasks can communicate with each other. Transmitting data requires an amount of energy proportional to the distance between the PEs that host the sending and receiving tasks.
The network can handle several message transmissions at the same time if they use different nodes, and the transmissions can also go over different routes. The network manager should look for the best routes for the required transmissions at a given time.
In this network, a (communication) task is the transmission of a message from its source node to its destination, and a resource is the route followed by the message packets to reach that destination.
Choosing the routes that consume the least power requires assigning the communication tasks pending at a given instant to the resources that minimize the overall distance of the transmission routes. In other words, minimizing the energy consumption of communications in a NoC requires solving an assignment problem every time a new set of tasks arises. This problem consists in assigning communication tasks to resources so that the cost of the assignment, $C$, is minimum:

$$C = \sum_{i} \sum_{j} a_{ij} \, b_{ij}$$

where $a_{ij}$ is the cost of assigning task $i$ to resource $j$, $b_{ij} \in \{0, 1\}$ equals 1 only if task $i$ is assigned to resource $j$, and a task is a transmission from a source to a destination, according to the DAG of the application. The assignments must be unique, i.e.:

$$\sum_{j} b_{ij} = 1 \quad \forall i, \qquad \sum_{i} b_{ij} \le 1 \quad \forall j$$
In this case, $n \le m$, where $n$ is the number of pending tasks and $m$ is the number of resources, since there is at least one route for each task. The right values for the assignment variables $b_{ij}$ can be found iteratively by looking for the minimum value in each row of the cost matrix $A$. This greedy approach works well for $n \ll m$, but its results depend strongly on row ordering. The Hungarian algorithm (HA) [1], in contrast, does not depend on the order of the rows, at the price of a higher polynomial time complexity. Fortunately, with $n \ll m$, its complexity gets closer to that of the greedy approach.
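The row-order dependence of the greedy approach can be illustrated with a minimal Python sketch. The cost matrix `A` below is hypothetical (it is not taken from Table 1), and the function names `greedy_assign` and `total_cost` are introduced here only for illustration.

```python
from typing import List, Sequence


def greedy_assign(cost: Sequence[Sequence[int]], row_order: Sequence[int]) -> List[int]:
    """Visit rows (tasks) in row_order; give each its cheapest still-free column (resource)."""
    free = set(range(len(cost[0])))
    choice = [-1] * len(cost)
    for i in row_order:
        # Ties are broken in favor of the lowest column index.
        j = min(free, key=lambda col: (cost[i][col], col))
        choice[i] = j
        free.remove(j)
    return choice


def total_cost(cost: Sequence[Sequence[int]], choice: Sequence[int]) -> int:
    """Sum of a_ij over the selected (task, resource) pairs."""
    return sum(cost[i][j] for i, j in enumerate(choice))


# Hypothetical costs: 3 tasks (rows) x 4 resources (columns), so n <= m.
A = [
    [2, 3, 7, 7],
    [2, 7, 6, 7],
    [7, 3, 4, 5],
]

first = greedy_assign(A, [0, 1, 2])   # task 0 grabs resource 0 first
second = greedy_assign(A, [1, 0, 2])  # task 1 grabs resource 0 first

print(first, total_cost(A, first))    # [0, 2, 1] 11
print(second, total_cost(A, second))  # [1, 0, 2] 9
```

Both runs apply the same rule to the same matrix, yet the total cost changes from 11 to 9 only because the rows are visited in a different order.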
For example, an application with the DAG shown in Figure 1 (left) has 9 edges, and thus 9 possible communication tasks. The application tasks can be placed on a 3×3 NoC as shown in Figure 1 (right), which allows them to communicate in a single hop. In this example, however, we shall assume that the NoC uses wormhole communication, that the PEs' network interfaces have dual ports to connect to the nearby horizontal and vertical lines, and that the routers configure the corresponding crossbars in accordance with the communication tasks' paths. Note that entire horizontal or vertical bus lines are occupied for the duration of a transmission, regardless of which sections are actually used.
For this example, there are 28 different paths that can be grouped into 13 resources. Each resource is a set of paths whose routes use only lines that are also used by the longest path in the set.
At some given moment, the application requires communication tasks , , , and , thus the network manager needs to map them to appropriate routes to minimize bus usage.
Table 1 shows the cost of each of these tasks, that is, the number of lines used to perform it, for the available resources (only the significant ones are shown). If a given resource does not contain a communication path for a given task, the cost is set to the maximum value (the number of lines) plus one (7, in this case). The assignment problem is thus to bind these communication tasks to resources such that the sum of the costs is minimum. A binding with a cost of 7 implies that the corresponding task remains pending for a later assignment.
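The penalty convention described above can be sketched in a few lines of Python. The helper names `cost_row` and `split_pending`, the sample costs, and the use of `None` for a missing path are assumptions made for illustration. The value of 6 lines is inferred from the stated penalty of 7.

```python
from typing import Dict, List, Optional, Sequence, Tuple

N_LINES = 6            # assumed: inferred from the penalty "number of lines plus one" = 7
PENALTY = N_LINES + 1  # cost of a binding that offers no usable path


def cost_row(lines_used: Sequence[Optional[int]], penalty: int = PENALTY) -> List[int]:
    """Build one row of the cost matrix; None marks a resource with no path for the task."""
    return [penalty if c is None else c for c in lines_used]


def split_pending(
    cost: Sequence[Sequence[int]], choice: Sequence[int], penalty: int = PENALTY
) -> Tuple[Dict[int, int], List[int]]:
    """Separate routed tasks from those bound at penalty cost, which stay pending."""
    routed: Dict[int, int] = {}
    pending: List[int] = []
    for task, resource in enumerate(choice):
        if cost[task][resource] >= penalty:
            pending.append(task)
        else:
            routed[task] = resource
    return routed, pending


# Hypothetical example: two tasks, three resources.
cost = [cost_row([2, None, 3]), cost_row([None, None, 4])]
routed, pending = split_pending(cost, choice=[0, 1])
print(cost)             # [[2, 7, 3], [7, 7, 4]]
print(routed, pending)  # {0: 0} [1]
```

Using a finite penalty rather than infinity keeps every binding comparable, so an assignment algorithm can deliberately put one task on hold when that lowers the total cost.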
The greedy approach and the HA assign tasks and to the same resources, but they differ in the rest of the assignments. While the greedy algorithm assigns to , the HA puts the task on hold by making an assignment that costs 7 to . In doing so, the HA can assign to , which saves 2 lines for all communications. In effect, the transmission power required for these communications is proportional to in the case of using the greedy algorithm and to when using the HA.
Running the greedy procedure on all row permutations could lead to finding the minimum, but the number of permutations grows factorially with the number of rows, which makes the HA the better option.
In this work, we shall explore how the HA can be used by the network manager to dynamically route communication tasks so that they are made as efficient as possible. In this case, efficiency is measured in terms of the number of lines used per communication frame and the number of waits.
Because pending communication tasks vary over time, the matrix $A$ varies and so does the assignment matrix $B$. However, $A$ can be calculated offline for all possible tasks (Section II), so that only the assignment problem (on the submatrix of $A$ with the rows corresponding to the pending tasks) must be solved in real time.
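The split between offline and real-time work can be sketched as follows. The table `FULL_A`, the task labels, and the function `route` are hypothetical. The solver is a common potentials-based formulation of the HA for rectangular matrices with no more rows than columns, written in software, and is not the state-machine version developed in this work.

```python
from typing import Dict, List, Sequence


def hungarian(cost: Sequence[Sequence[int]]) -> List[int]:
    """Minimum-cost assignment of every row to a distinct column (needs rows <= columns)."""
    n, m = len(cost), len(cost[0])
    assert n <= m, "each task needs at least one candidate resource"
    INF = float("inf")
    u = [0] * (n + 1)    # row potentials (1-based)
    v = [0] * (m + 1)    # column potentials (1-based, index 0 is a dummy column)
    p = [0] * (m + 1)    # p[j]: row matched to column j, 0 if the column is free
    way = [0] * (m + 1)  # predecessor columns along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0:           # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    choice = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            choice[p[j] - 1] = j - 1
    return choice


# Offline: cost of every possible task on every resource (hypothetical values).
FULL_A: Dict[str, List[int]] = {
    "t0": [2, 3, 7, 7],
    "t1": [2, 7, 6, 7],
    "t2": [7, 3, 4, 5],
    "t3": [7, 7, 3, 2],
}


def route(pending: Sequence[str]) -> Dict[str, int]:
    """Real time: solve only the submatrix formed by the rows of the pending tasks."""
    sub = [FULL_A[t] for t in pending]
    return dict(zip(pending, hungarian(sub)))


print(route(["t0", "t1", "t2"]))
```

Only the row selection and the solver run when the set of pending tasks changes. The table itself is never recomputed.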
Calculating $B$ might require parallel implementations or hardware accelerators. In the latter case, either a customized processor can run a program or specific hardware can be used.
In our case, we shall build specific hardware on an FPGA from a state-machine version of the HA (Section III). Indeed, modeling the HA with state machines enables early verification of the system and makes the generation of its hardware description straightforward.
The resulting model can be synthesized on a Field-Programmable System-on-Chip (FPSoC) and further refined to reduce the constant factors in execution time, so as to suit stringent real-time requirements (Conclusion).