2. Materials and Methods
Understanding BONN’s architecture requires a good grasp of the backpropagation algorithm [12, 13], which is used to train neural networks.
Figure 1 presents a visual representation of this fundamental algorithm. It depicts a neural network comprising two input nodes and four output nodes interconnected by synapses, along with the mathematical representations of both types of nodes. The weighted input of each node is introduced because it is used in the theoretical explanation of the backpropagation algorithm, and the quantity referred to as the sensitivity represents the gradient of the cost function with respect to the weighted input. The sensitivity of a layer is linked to the sensitivities of subsequent layers through the weights connecting them. Hence, the sensitivities of the last layer propagate backward until they reach the first layer. The sensitivities with respect to the weighted input are then used to compute the sensitivities with respect to the weights and biases, which in turn are used to adjust the current weights and biases so as to minimize the error between the final output values and the target values.
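As an illustration of the update described above, the following minimal sketch performs one gradient-descent step for a single fully connected layer with two inputs and four outputs. The sigmoid activation, the quadratic cost, the learning rate, and all names (`train_step`, `z`, `delta`, `lr`) are assumptions made for this example and are not taken from Figure 1.

```python
import math


def sigmoid(z):
    """Logistic activation (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_step(W, b, x, target, lr=0.5):
    """One gradient-descent step for a single layer.

    W[j][k] is the weight from input k to output j.
    Returns (new_W, new_b, cost), where cost is evaluated before the update.
    """
    # Forward pass: weighted input z_j = sum_k W[j][k] * x[k] + b[j].
    z = [sum(w * xk for w, xk in zip(row, x)) + bj for row, bj in zip(W, b)]
    a = [sigmoid(zj) for zj in z]
    # Quadratic cost between outputs and targets (assumed cost function).
    cost = 0.5 * sum((aj - tj) ** 2 for aj, tj in zip(a, target))
    # Sensitivity: gradient of the cost with respect to the weighted input.
    delta = [(aj - tj) * aj * (1.0 - aj) for aj, tj in zip(a, target)]
    # Gradients: dC/dW[j][k] = delta_j * x_k and dC/db_j = delta_j.
    new_W = [[w - lr * dj * xk for w, xk in zip(row, x)]
             for row, dj in zip(W, delta)]
    new_b = [bj - lr * dj for bj, dj in zip(b, delta)]
    return new_W, new_b, cost


# Hypothetical values: two inputs, four outputs.
weights = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2], [0.0, 0.1]]
biases = [0.0, 0.0, 0.0, 0.0]
weights, biases, cost_before = train_step(weights, biases, [0.5, 1.0],
                                          [1.0, 0.0, 0.0, 1.0])
```

Repeated calls to `train_step` on the same example should lower the cost, which is the behavior the sensitivities are meant to produce.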
Figure 2(a) shows the concept of the proposed BONN. The neural network depicted in Figure 1 has been transformed into a hardware schematic, incorporating components such as laser diodes (LDs), lenses, an SLM, detectors, and electronics. However, the order of the output nodes has been reversed, and the number of output nodes has been reduced to two. BONN shares similarities with the LCOE, and the main distinction lies in the inclusion of additional LDs adjacent to the photodiodes (PDs) on the second substrate, and extra PDs adjacent to the LDs on the first substrate. These supplementary LDs and PDs facilitate the creation of backward light paths that are required for implementing the backpropagation algorithm.
For the forward direction, the light paths and functions of components are similar to those of the LCOE architecture proposed in Ref. [11]. An LD replaces the input node, and it emits two rays directed at lens array 1. The rays are collimated by the lens array, and the collimated rays reach the SLM. Each SLM pixel transmits a ray with a preset weight that passes through lens array 2. This lens array focuses the incident rays, with the angle of each ray depending on the position of the pixel that emitted the ray. Lens 3 sorts incoming rays into spots according to the equal-inclination rule. If the distance between the SLM and lens array 2 equals the focal length of lens 2 and the detector plane is at the focus of lens 3, the SLM and detector planes are conjugate. In this case, ray illumination areas can be clearly defined and channel crosstalk is reduced. Detectors collect rays with equal angles from different LDs or inputs with preset weights, forming an output that is a linear combination of the input signals. This optical system, known as the LCOE, performs parallel, one-step calculations at the speed of light for inputs with preset weights. In the neural network community, this forward computation is termed “inference.” Although the SLM-based LCOE is reconfigurable, it is better suited to inference tasks when the switching speed of the SLM is low.
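The linear combination formed at the detectors can be sketched numerically as follows. This is only an idealized model under stated assumptions: each SLM pixel is represented by a transmission value between 0 and 1, LD outputs are non-negative intensities, and crosstalk and losses are ignored. The function name `lcoe_forward` and the sample values are hypothetical.

```python
def lcoe_forward(transmission, inputs):
    """Idealized detector readings: y[j] = sum_i transmission[j][i] * inputs[i].

    transmission[j][i] models the SLM pixel linking input LD i to detector j.
    """
    if any(p < 0.0 for p in inputs):
        raise ValueError("optical intensities must be non-negative")
    for row in transmission:
        if len(row) != len(inputs):
            raise ValueError("each detector needs one pixel per input")
        if any(not 0.0 <= t <= 1.0 for t in row):
            raise ValueError("transmission values must lie in [0, 1]")
    # Each detector sums the weighted rays that arrive at the same angle.
    return [sum(t * p for t, p in zip(row, inputs)) for row in transmission]


# Two inputs and two outputs, as in the reduced schematic.
pixels = [[0.25, 0.75],
          [0.50, 0.50]]
print(lcoe_forward(pixels, [1.0, 2.0]))  # prints [1.75, 1.5]
```

In the optical system all of these sums are formed simultaneously, whereas the sketch evaluates them one after another.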
For the backward direction, light is emitted by the LDs located on the second substrate, which also contains the PDs for the forward light propagation. To differentiate the LDs and PDs used for the backward direction from those used for the forward direction, they are labeled separately in Figure 2(a). The rays for the backward direction are depicted by thicker dashed lines (magenta and green). The backward LDs receive information such as the sensitivity from the nearby output node or, more specifically, the electronic processor. In the backward direction, the ray emitted by each of these LDs passes through lens 3, lens 2, and the SLM pixel. The SLM pixels for the backward direction are denoted by a primed weight parameter, and they are positioned next to those for the forward direction with the same transmission values or weight factors. It should be noted that the backward rays pass through the SLM pixels at an angle, while the forward rays are horizontal. This characteristic can be explained by considering the properties of Köhler illumination, since BONN can be considered a modified version of the LCOE, which is based on Köhler illumination [14, 15]. In BONN, lenses 2 and 3 form a projection system where the SLM and the second substrate are conjugate. Since the images of SLM pixels are
and
, the distance between SLM pixels is magnified into the distance between the LDs and PDs on the second substrate. Suppose that two rays
and
start from
and that they cross at the pixel labeled as
. An auxiliary or fictitious ray
is introduced to explain the actual light path from
. Since
follows a path similar to that of forward rays, it is horizontal at the SLM pixel. The angle between the two rays at the SLM plane and the second substrate is related to the magnification of the projection system. Therefore, if
has a certain angle from
on the second substrate, it will have a certain fixed slope at the SLM plane, resulting in a fixed position of
on the first substrate and maintaining a fixed distance from the corresponding
. In this manner, backward rays from different
can be collected by
on the first substrate. Through the backward light paths, the sensitivity information can be transferred to the electronic processors in the previous layer. This backward information transfer renders BONN suitable for use in the backpropagation algorithm.
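The backward transfer described above corresponds, in the standard backpropagation relation, to summing the next-layer sensitivities through the same weights in transposed order. The sketch below is a simplified model under assumptions: the primed pixels are taken to hold exactly the forward transmission values, the signals are treated as plain numbers without regard to sign, and the multiplication by the activation derivative is assumed to be done by the electronic processor. All function names are hypothetical.

```python
def bonn_backward(transmission, sensitivities):
    """Idealized readings of the backward PDs on the first substrate.

    transmission[j][i] is the weight between input i and output j, reused
    here for the primed (backward) pixels.
    Returns s[i] = sum_j transmission[j][i] * sensitivities[j].
    """
    if len(transmission) != len(sensitivities):
        raise ValueError("one sensitivity is needed per output node")
    n_inputs = len(transmission[0])
    return [sum(row[i] * d for row, d in zip(transmission, sensitivities))
            for i in range(n_inputs)]


def previous_layer_sensitivity(backward_sums, activation_derivatives):
    """Electronic step: scale each optical sum by the local derivative."""
    return [s * g for s, g in zip(backward_sums, activation_derivatives)]


pixels = [[0.25, 0.75],
          [0.50, 0.50]]
sums = bonn_backward(pixels, [1.0, 2.0])
print(sums)  # prints [1.25, 1.75]
print(previous_layer_sensitivity(sums, [0.5, 0.5]))
```

Because the backward sum runs over the columns rather than the rows of the weight array, every backward PD collects light from all backward LDs, which matches the ray geometry described in the text.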
Since the values of and are required for the calculation of , the detectors and electronic processor are better located near or pixels on the SLM substrate. The portion of light transmitted through these two pixels can be used to obtain the values of and . The values obtained by the detectors are provided to the electronic processor for the calculation of .
More detailed features of the LD for the backward direction are shown in Figure 2(b). The use of a grating and a prism facilitates the emission of multiple rays and the control of their directions. The same control of the light source can also be achieved by using diffractive optical elements (DOEs) [16], which allow for the generation of an arbitrary number of beams in arbitrary directions. To control the beam divergence, a lens can be inserted between the LD and the grating. The function of this lens can also be incorporated into a DOE.
The proposed BONN system can be extended to implement multilayer neural networks in the direction of beam propagation since multiple BONN systems can be cascaded. For both forward and backward directions, the signal from the output node is directly provided to the corresponding input of the LD. Thus, two LDs for the two directions, two PDs for the two directions, and the corresponding electronic processor can form a synaptic node. The red dashed oval in Figure 2(a) shows a synaptic node. An example of a multilayer BONN is depicted in Figure 2(c). If the system has M inputs, N outputs, and L layers, M × N × L calculations can be performed in parallel in a single step. For forward calculations, this significantly increases the throughput, which is beneficial for real-time inference applications with continuous input flow. The increase in throughput resulting from the introduction of additional layers also applies to backward calculations, provided the input is continuous.
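The throughput argument can be illustrated with a small counting sketch. It assumes that every layer holds M × N weights and that a new input sample enters the first layer at every time step, so that each layer works on a different sample once the pipeline is full. Both function names are hypothetical.

```python
def parallel_products(m_inputs, n_outputs, n_layers):
    """Weighted products evaluated per step when every layer is busy."""
    return m_inputs * n_outputs * n_layers


def busy_layers(step, n_layers, n_samples):
    """Indices of layers that hold a sample at a given time step.

    Sample s is assumed to enter layer 0 at step s and to advance one
    layer per step, so layer l processes sample (step - l).
    """
    return [layer for layer in range(n_layers)
            if 0 <= step - layer < n_samples]


print(parallel_products(2, 2, 3))  # prints 12
print(busy_layers(0, 3, 10))       # prints [0]
print(busy_layers(2, 3, 10))       # prints [0, 1, 2]
```

The full M × N × L figure is therefore reached only while the input flow is continuous; during the first and last steps some layers are idle.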
Figure 2(d) presents a 3D view of a sample system with a 2 × 2 input and a 3 × 3 output to make the BONN setup easier to visualize.
Notably, the use of incoherent light in BONN does not allow for the representation of negative weights, which are necessary for inhibitory connections in neural networks. While the use of coherent light and interference effects can enable the system to represent subtraction between inputs, it may make the system more complex and increase noise. To address this challenge, we employed two optical channels for one output, and we separated inputs associated with positive weights from those linked to negative weights, as depicted in Figure 3. This method for handling negative weights has previously been used in forward-direction optical computers such as the LCOE [11] and lenslet array processors [17]. Each channel optically sums the products of input values and their weights, which is followed by electronic subtraction between the two channels. When a weight is positive, the corresponding weight in the negative channel is set to zero, and when a weight is negative, the corresponding weight in the positive channel is set to zero. This subtraction approach streamlines the structure at the cost of an additional channel. The use of separate channels for positive and negative weights is termed “difference mode,” while the use of a single channel (seen in Figure 2(a)) is termed “nondifference mode.” Since two channels are used for the forward direction and two more channels are used for the backward direction, four channels or four SLM pixels are required to implement a single bidirectional channel, as shown in Figure 3.
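The difference mode can be sketched as follows for the forward direction. The sketch assumes signed weights with magnitudes that fit the SLM transmission range and an ideal, noise-free optical sum; the function names are hypothetical and the subtraction stands in for the electronic processor.

```python
def split_weights(weights):
    """Split signed weights into non-negative positive and negative channels.

    For each weight exactly one channel carries its magnitude; the other
    channel is set to zero, as described in the text.
    """
    positive = [[max(w, 0.0) for w in row] for row in weights]
    negative = [[max(-w, 0.0) for w in row] for row in weights]
    return positive, negative


def optical_sum(transmission, inputs):
    """Idealized optical summation performed by one channel."""
    return [sum(t * p for t, p in zip(row, inputs)) for row in transmission]


def difference_mode_forward(weights, inputs):
    """Signed outputs: positive-channel sum minus negative-channel sum."""
    positive, negative = split_weights(weights)
    return [p - n for p, n in zip(optical_sum(positive, inputs),
                                  optical_sum(negative, inputs))]


signed_weights = [[0.5, -0.25],
                  [-0.75, 0.25]]
print(difference_mode_forward(signed_weights, [1.0, 2.0]))  # prints [0.0, -0.25]
```

The result equals the ordinary signed weighted sum, while each optical channel only ever handles non-negative values. Applying the same split to the backward pixels gives the four SLM pixels per bidirectional connection mentioned above.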