On the role of information transfer’s speed in technological and biological computations

Information is commonly considered a mathematical quantity that forms the basis of computing. In mathematics, information can propagate instantly, so its transfer speed is not a subject of information science. In all kinds of implementations of computing, whether technological or biological, some material carrier for the information exists, so the information's propagation speed cannot exceed the speed of its carrier. Because of this limitation, for any implementation, one must consider the transfer time between computing units. We need a different mathematical method to take this limitation into account: classic mathematics can only describe infinitely fast and infinitely small computing system implementations. The difference between the mathematical handling methods leads to different descriptions of the systems' behavior. The correct handling also explains why biological implementations can have lifelong learning and technological ones cannot. The conclusion about learning matches others' experimental evidence, both in technological and biological computing.


Introduction
The computing model proposed by von Neumann in his famous "First Draft" [1] is bio-inspired, despite the common fallacies: he discussed the computing process implemented in both technological (vacuum tubes) and biological (neurons) computing systems. However, because von Neumann made omissions valid only for [the timing relations of] vacuum tubes, his simplified classic paradigm cannot be applied to other technologies [2,3]. Because of the omission that the transfer time can be neglected beside the computing time, the implementations based on the classic paradigm are not biology-mimicking ones. Von Neumann, of course, could not foresee the dawn of modern computing technologies, but he warned that the computing paradigm must be revised when the technology changes and that it would be unsound (sic) to apply his simplified paradigm (not to be confused with his model of computation!) to neural computing. Given that he clearly outlined that his proposal was about the logical structure of a computing implementation, it is not a very usable classification criterion whether the (otherwise undefined) von Neumann architecture is biomorphic (for a review see [4]) or not.

No doubt that von Neumann's model is valid for all kinds of computations: the operand(s) (in the form of some physical carrier) must be delivered to the operating unit where the computation takes place. The computation cannot even start (as pointed out by von Neumann [1]) until the operands are completely delivered to the [...] technology must adapt itself to the interface defined by von Neumann three-quarters of a century ago, and for vacuum tubes only. We surely know that the simplified model is not valid for our current technology. As the technology develops, it becomes evident that the classic paradigm cannot describe real-world implementations, neither technological (electronic) nor biological (neural) ones.
Furthermore, in technology, it leads directly [6] to the idea of unlimited computing capacity and workload-independent processing time. In electronics, mainly the issues experienced in connection with building so-called neuromorphic computers led the researchers to the idea that "More physics and materials needed. Present-day electronics are not enough" [7]. We can add: present-day computing science is not enough: more physics is needed.

However, computing science does not want to admit that computing's physical implementations must also include physics, despite the existence of the ready-made mathematics [8]. In some sense, a hundred years after the invention of Minkowski's mathematics, it is still a scandal [9] to consider that the theory of the technological implementation of computing must also include some physics. The important consequences include (but are not limited to) the inefficient processor chips [10], the enormous power consumption [11], the experience of "dark silicon" [12], the stalled supercomputer performance [13,14], the stalled Artificial Intelligence (AI) development [15][6], and the failed brain simulation [16]. Interestingly, the contemporary "disciplinary analysis of the reception of Minkowski's Cologne lecture reveals an overwhelmingly positive response on the part of mathematicians, and a decidedly mixed reaction on the part of physicists" [9] has turned to the exact opposite: the description is generally accepted in physics (and resulted in the birth of a series of modern science disciplines), but completely refused in mathematics-based computing science.

In biology, it was evident from the beginning that the transfer (conduction) time cannot be neglected beside the computing (synaptic) time.
The name "spatiotemporal" and a (separated) time dependence are commonly used [17], in the sense that a Precise Firing Sequence (PFS) "tended to be correlated with the animal's behavior"; furthermore, "the results suggest that relevant information is carried out by the fine temporal structure of cortical activity" [18]. The "neural dynamics" was studied and "spatiotemporal spreading of population activity was mapped" [19] by methods used to describe static behavior: interspike interval histograms, autocorrelation, and crosscorrelation. The correct method of description is still missing, given that the major item of the behavior is missed: the [...] etc. The kind of carrier and the transfer mechanism limit the speed of the information transfer, so the physical size of the computing system matters.

In electronic technology, the transfer speed is on the order of 10^8 m/s; in biology (the speed of neural transfer), it is 10^1...10^2 m/s; that is, the speed of electronic signals is several million times higher than the speed of neural transfer. At dozens of centimeters of physical size, the transfer time is also several million times lower than in neural systems (such as our brain). Because of this difference, in biological computing systems, a "spatiotemporal behavior" was assumed from studying the neural operation. In contrast, in (electronic) technological computing systems, the "instant interaction" seemed to be a good approximation initially. However, we have good reasons to introduce a finite interaction speed both in science and in computing technology [20]. Initially, both technological and biological computing worked on the same msec time scale [21]. In contrast, the information transfer speed was several million times higher for technological computing than for the biological one. This difference made the classic paradigm a proper approximation for technological computing.
However, the evolution of technological computing systems quickly invalidated the classic paradigm's assumption: the processing time moved to the sub-nanosecond region, significantly increasing the transfer-to-computing ratio [14]. Although the component density of processors [...] computers are much closer to those of our brain than those assumed in the abstract model. According to the classic paradigm's inventor, it is unsound to use his simplified paradigm for those timing relations.

Given that a physical carrier delivers the information (in all implementations), the transfer time is crucial in all implementations of computing. As the computing model requires, the operand must reach the operator unit's input section, and for all physical carriers, the transfer speed is finite. Whether the transfer time can be neglected beside the processing time depends on the implementation technology.

In general, in a somewhat simplified view (see Fig. 1), the information must be transferred between places (x1, y1) and (x2, y2), which requires transfer time. After that, the computation can be done at place (x2, y2), which takes computing time. The known speed of [...] [20,23].
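The transfer-to-computing ratio described in the text can be illustrated numerically. A minimal sketch follows; the distances, speeds, and computing times are illustrative orders of magnitude taken from the text (1e8 m/s electronic signals, ~10 m/s neural conduction, msec vs. sub-nanosecond processing), not measured values.

```python
# Sketch: ratio of transfer time (distance / carrier speed) to computing time.
# When the ratio is tiny, transfer time can be neglected (classic paradigm);
# when it is near or above 1, the transfer time dominates the operation.

def transfer_to_computing_ratio(distance_m, speed_m_per_s, computing_time_s):
    return (distance_m / speed_m_per_s) / computing_time_s

# Vacuum-tube era: msec computing, ~1 m wiring -> transfer negligible
vacuum_tube = transfer_to_computing_ratio(1.0, 1e8, 1e-3)

# Modern chip: sub-ns computing, ~0.1 m path -> transfer dominates
modern_chip = transfer_to_computing_ratio(0.1, 1e8, 1e-10)

# Biology: msec synaptic time, ~0.1 m axon at ~10 m/s -> never negligible
biological = transfer_to_computing_ratio(0.1, 10.0, 1e-3)

print(vacuum_tube, modern_chip, biological)
```

The sketch reproduces the paper's argument: for vacuum tubes the ratio is vanishingly small, which is exactly the omission the classic paradigm relies on; for modern electronics and for biology it is not.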

The figure shows the picturesque difference between the discussed paradigms: [...] That is, the arrival of a spike is a synchronization signal as well: the zero time of the synaptic conductance function gsyn(t) [25] is set to the arrival of the spike. Notice that while receiving a spike, the conductance gsyn(t) of the receiver synapse changes: [...] After the membrane reaches its threshold potential, the firing period begins, and the computing time ends. However, it takes time (the spike's duration) until the signal reaches the neuron's output section. This mechanism provides the auto-synchronization of biological computing. This aspect is consequently neglected in spiking neural networks' technological implementations; for a review, see [26].

In contrast, in technological implementations, due to the lack of a start signal, a gate (or another digital object) starts to "compute" its result as soon as any of its inputs changes, [...] internally undefined states (and is the main reason for the inefficient processor chips [10] and the enormous power consumption [11]), as shown in the operating diagram of a one-bit adder in Fig. 2. The corresponding code in SystemC is shown in Listing 1.
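The role of the spike's arrival as a synchronization signal can be sketched with a simple model. The alpha-function conductance, its parameters, and the crude charge-accumulation rule below are illustrative assumptions, not the paper's model; the point is only that the response of the receiving synapse is anchored to the spike's arrival time.

```python
# Sketch (assumed model): the arrival of a spike resets the zero time of the
# synaptic conductance g_syn(t); the membrane integrates the synaptic input
# until it reaches its threshold, when the "computing time" ends.
import math

def g_syn(t, g_max=1.0, tau=2e-3):
    """Alpha-function conductance; t is measured from the spike's arrival."""
    if t < 0:
        return 0.0  # nothing happens before the spike arrives
    return g_max * (t / tau) * math.exp(1 - t / tau)

def time_to_threshold(spike_arrival, threshold=0.5e-3, dt=1e-5):
    """Accumulate conductance-driven charge until the threshold is reached."""
    v, t = 0.0, 0.0
    while v < threshold and t < 50e-3:
        v += g_syn(t - spike_arrival) * dt  # crude charge accumulation
        t += dt
    return t

# A later spike arrival shifts the whole response later by the same amount:
# the spike acts as the synchronization signal of the receiving synapse.
t_early = time_to_threshold(spike_arrival=0.0)
t_late = time_to_threshold(spike_arrival=2e-3)
print(t_late - t_early)
```

With these (assumed) parameters the threshold is reached about 1 ms after arrival, and delaying the spike by 2 ms delays the threshold crossing by the same 2 ms.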

Notice how the pure logical dependence, a consequence of the time-unaware paradigm (also called the "von Neumann style" [27]), is converted to temporal dependence by the technological implementation, and that the adder performs payload (computing) work only in the periods denoted by thick green arrows; the rest is non-payload (transfer) time.

Notice also that both computing and transfer times are partly parallel (overlapping). [...]

Figure 2. The operation of a technological one-bit adder, with "pointless" synchronization (red circles), see Listing 1. The input signals a, b and c_i are aligned along axis y (the input section), the computation takes place in gates aligned along axis x, and the output signals c_o and sum are aligned again along axis y (the output section).
At the red circles at the bottom of the red arrows, one operand (the result of a previous computation) arrives at its target gate (green arrowhead in the red circle), and (after the computing time) it may change the state of the target gate (depending on its previous state). However, that state may or may not be the final result of the operation: the second operand is still missing (although the input section has a well-defined signal level). After some time (the red arrowhead in the upper red circles), the second operand also arrives, and it may change the state of the target gate (depending on its previous state). In the time fraction corresponding to the red vector, the gate's output value is undefined, and so is the result of the one-bit addition. Also, notice that after the arrival of the second operand and performing the computation the gate represents, the signal still must arrive at the adder's output section. Without synchronization, especially in a several-bit adder, the information available in the output section of the adder may change several times during the operation [20]. Also, notice that unlike in the biological implementation, the gates take power even if their input signals are missing.

[Figure: transfer times (Proc, Bus, Cache) and Proc dispersion, compared to processing time, versus year of production [20,28,29].]

In current technological implementations, it has already been noticed that the advantages of using central synchronization are lost, so recently, the idea of asynchronous operation was proposed [30]. In our current technologies, the dispersion is not negligible even within processors, and especially not in computing systems, from the several-centimeter-long buses in our PCs [11] through the wafer-scale systems [31] to the hundred-meter-long cables in supercomputers [13].
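The transient, logically meaningless output states described above can be reproduced in a toy simulation. The unit-delay gate model and the arrival times below are illustrative assumptions (not the paper's SystemC listing): each gate re-evaluates one time step after its inputs change, so when the operands arrive at different times, the sum output glitches before settling.

```python
# Sketch (assumed unit-delay timing model): a one-bit full adder whose gate
# outputs update one time step after their inputs. Operands arriving at
# different times produce transient, "undefined" output values.

def simulate(a_arrival, b_arrival, cin_arrival, steps=12):
    """Inputs are 0 until their operand arrives as 1; returns (t, sum, cout)."""
    x1 = x2 = x3 = s = cout = 0  # internal gate outputs, initially 0
    history = []
    for t in range(steps):
        a = int(t >= a_arrival)
        b = int(t >= b_arrival)
        cin = int(t >= cin_arrival)
        # each gate computes from the values its inputs had one step earlier
        x1_n = a ^ b          # half-sum
        s_n = x1 ^ cin        # sum output
        x2_n = a & b          # carry-generate
        x3_n = x1 & cin       # carry-propagate
        cout_n = x2 | x3      # carry output
        x1, s, x2, x3, cout = x1_n, s_n, x2_n, x3_n, cout_n
        history.append((t, s, cout))
    return history

# a=1, b=1, cin=1 arrive at times 0, 2, 4: the sum output flips several
# times before settling at the correct result (sum=1, cout=1).
history = simulate(0, 2, 4)
sum_values = [s for _, s, _ in history]
print(sum_values)
```

Without a synchronization (start) signal there is no way to tell, from the output alone, which of these transient values is the final result; this is the behavior Fig. 2 depicts.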
In biology, it is obvious from the beginning that the measurable quantities change with time and space: their behavior is called "spatiotemporal". It is similarly evident that the brain, unlike technological computing systems, does not feature a unique, perfectly synchronous clock to regulate communication and computing [32]. The common experience [25] shows that the outputs of biological neurons depend not only on their inputs (they compute with their inputs) but also on their internal state (they store informa[tion ...] of learning: use present information to adjust a circuit, to improve future performance" [33]. However, the definition also needs to explain what the "present information" is which [...]

Time, information storage and learning in biological implementation
The total operation cycle of the neuron is given as

T_Computing = T_Triggering + T_Charging + T_Refractory

Here we assume that the computation will not fail. T_Triggering is the time the Target needs to collect the potential contributions from its synapses to reach the threshold, T_Charging is the time needed to charge the membrane to its maximum potential, and T_Refractory is needed to reset the neuron to its initial state. Usually, some idle time T_Idle (no neural input) also follows between the "computing operations", which can make T_Computing longer. That is, the operating frequency (the firing rate, the inverse of T_Computing) of the cyclic operation is

f = 1 / T_Computing = 1 / (T_Triggering + T_Charging + T_Refractory + T_Idle)

Given that T_Charging and T_Refractory are defined anatomically, the only way to adjust neu[ronal timing ...] One mechanism is to collect more charge from the input spike that hits its synapse: [...] The "unused" (not transmitted to its membrane) ions must be pumped out, which is a rather energy-consuming action. However, tuning gsyn() is a fast process: increasing the transmitters' concentration can be implemented in a few-msec period. As observed [36], "elementary channel properties can be rapidly modified by synaptic activity".
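The cycle-time relation above translates directly into code. The symbol names follow the text; the concrete millisecond values are illustrative assumptions.

```python
# Sketch of f = 1 / T_Computing, with T_Computing the sum of the neuron's
# cycle components. Only T_Triggering is tunable; T_Charging and T_Refractory
# are anatomically fixed. All millisecond values are assumed for illustration.

def firing_rate(t_triggering, t_charging, t_refractory, t_idle=0.0):
    t_computing = t_triggering + t_charging + t_refractory + t_idle
    return 1.0 / t_computing

base = firing_rate(t_triggering=5e-3, t_charging=1e-3, t_refractory=2e-3)
tuned = firing_rate(t_triggering=3e-3, t_charging=1e-3, t_refractory=2e-3)
print(base, tuned)  # shortening T_Triggering raises the firing rate
```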
Another mechanism is to decrease the time needed to transfer a spike from A_i to the Target: biology can change the corresponding axon's thickness. Given that the transfer speed (conduction velocity) on thicker axons gets higher, the spike arrives in a shorter time and in this way contributes to the membrane's potential at an earlier time (i.e., it reaches its threshold potential earlier, too). This mechanism is less expensive in energy consumption but needs significantly longer times to implement. The aspects that changes in conduction velocity "could have profound effects on neuronal network function in terms of spike-time arrival, oscillation frequency, oscillator coupling, and propagation of brain waves" [37], and that "Node of Ranvier length [can act] as a potential regulator of myelinated axon conduction speed" [38,39] have been noticed, but the role of time in storing information and learning has not yet been discussed.

Given that both mechanisms result in shorter T_Triggering times, they cause the same effect: the computing time (or, in other words, the firing rate) changes. When the firing rate changes, one cannot tell for sure which mechanism caused it. This equivalence is why nature can combine the two mechanisms: the neuron can increase its firing rate quickly (as a trial), and (on success, that is, if the learned condition is durable) it may decide to reimplement ("remodel" [40]) its knowledge in a less expensive way: it makes the corresponding axon thicker. After that, the weight W_i, which was increased for the short-term learning (experimental) period, can be decreased to conserve the learned long-term (stable) knowledge and to reduce the energy consumption. The effect (the learned knowledge) remains the same.
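The equivalence of the two mechanisms can be illustrated with a simple decomposition. The split of T_Triggering into an axonal transfer term and a charge-collection term, and all numeric values, are assumptions for illustration; the point is that either change yields the same T_Triggering, so the firing rate alone cannot distinguish them.

```python
# Sketch (assumed decomposition): T_Triggering = axonal transfer time +
# charge-collection time at the synapse. Raising the synaptic weight speeds
# up charge collection; a thicker axon speeds up conduction.

def t_triggering(axon_length_m, conduction_velocity, charge_needed, current):
    transfer = axon_length_m / conduction_velocity  # spike travel time
    collect = charge_needed / current               # charge-collection time
    return transfer + collect

baseline = t_triggering(0.1, 10.0, 1.0, 250.0)     # 10 ms transfer + 4 ms collect
short_term = t_triggering(0.1, 10.0, 1.0, 500.0)   # doubled synaptic current
long_term = t_triggering(0.1, 12.5, 1.0, 250.0)    # faster, thicker axon

# both mechanisms give the same shortened T_Triggering (~12 ms)
print(short_term, long_term)
```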

Our findings seem to be supported by anatomical evidence that "individual anatom- [...] processing" and that "the internode length decreases and the node diameter increases progressively towards the presynaptic terminal, and . . . these gradations are crucial for precisely timed depolarization" [38]. However, the change of axons' thickness is measurable only after weeks or months; presumably, so is the decrease in the transfer time to the synapse. It was observed [40] that "neuronal activity can rapidly tune axonal diameter" and that "activity-regulated myelin formation and remodeling that significantly change axonal conduction properties are most likely to occur over time-scales of days to weeks".

This mechanism reveals that short-term and long-term learning perform the same action: they reduce processing time using two different biological implementations. [...]

The lack of the possibility of storing information through adequately changing timing led to the need to use a separate "memory" unit, where special signals (unlike "stored and retrieved directly" [33] in biology) are used to store and retrieve the information.

2 An interesting historical parallel: the EDVAC computer used delay lines with msec processing time for information storage.
3 The major goal of that investigation was to prove that learning is, to some measure, "pre-wired".

The input and output sections of the model are implemented as numbered storage cells.

Given that computing systems are assembled from pre-fabricated functional blocks [43], those sections must be wired to the processing unit. The synaptic weights W_i are stored in those memory cells and accessed through a shared medium: the bus. The shared medium must be made "private" for the time of transferring data. Most of the time is spent contending for the right of owning the bus, see Fig. 5, and in detail [20]. [...] Storing information in synapses has two major advantages: it is directly wired to the computing unit, and all synaptic weights can be reached quickly and simultaneously [33].

In a technological implementation, this would require storing all synaptic weights in processor registers, which not only can be accessed "instantly" (without addressing contention), but [...] is shared between all participating synapses, whether they contributed to learning or not. This latter effect greatly contributes to the experienced unreasonably long training times and over-fitting [48], accompanied by the lack of explicit time parameters and mis-fitting. [...] does not enable implementing learning mechanisms similar to those known in biology, and machine learning requires introducing "training" or "test" mode switches; for a review see [4]. (Changing the weights of synaptic inputs is arbitrary and has no effect [...]) pointed out both experimentally [49] and theoretically [6], and is a major reason why AI development stalled [15]. As the model requires, the computation (with a vector or a matrix) cannot even start until the last element is delivered to its corresponding input [...] cycle" [53]. In brackets, however, it is fairly added that (not counting the clock cycles that may be required to fetch and store the input and output data). Yes, all operands of the memristor array must be transferred to its input (and previously, they must be produced), and [...]. This handling is believed to enhance the apparent efficiency of the system.
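The cost of reaching weights through a shared bus, versus synapse-local (directly wired) storage, can be sketched as follows. The timing model (a per-access arbitration cost plus a transfer cost) and all nanosecond values are illustrative assumptions.

```python
# Sketch (assumed timing model): N weights reached over a shared bus are
# serialized (win the bus, then transfer, per weight), while directly wired
# synapse-local storage is reachable simultaneously.

def bus_access_time(n_weights, t_arbitration, t_transfer):
    """Shared medium: each access must first own the bus, then transfer."""
    return n_weights * (t_arbitration + t_transfer)

def local_access_time(n_weights, t_transfer):
    """Directly wired storage: all weights reachable in parallel."""
    return t_transfer  # independent of n_weights

n = 1000
print(bus_access_time(n, 5e-9, 1e-9))  # grows linearly with the weight count
print(local_access_time(n, 1e-9))      # constant
```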

Given that the physically distant artificial neurons will have a longer delivery time, they will go to the end of the queue of input events, so they have good chances to be dropped (in biology, in many cases, distant neurons control local neural assemblies; so in their technical implementation the remote control will be missing).

5 Five decades ago, even memristance was introduced as a fundamental electrical component, meaning that the memristor's electrical resistance is not constant but depends on the history of the current that had previously flowed through the device [35]. There are, however, some serious doubts as to whether a genuine memristor can actually exist in physical reality [52]. In the light of our analysis, some temporal behavior definitely exists; the question is how much it is related to material or biological features, if our time-aware computing method is followed.

If they are not dropped because of their late technological delivery, they will arrive at a simulated time that has already passed. Given that the simulated time at the time of [...] "(several weeks on modern computers, for some problems), the potential for over-fitting (whereby the learned function is too specific to the training data and generalizes poorly to unseen data), and more technically, the vanishing gradient problem" [48]. In the light of our analysis, we [...]

In biology, the observer starts his/her observation "in medias res": in a living organism, where a stationary, stable "parameter space" exists. However, learning may definitely change the pre-wired settings [41], both in the synaptic strengths and in the number of participating network nodes. Studying learning (and especially pre-wired states) needs extreme caution and is challenging not only from the technical point of view.
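The fate of late-delivered events can be sketched with a toy event-driven scheduler. The scheduler below is hypothetical (not any particular simulator's API): events are processed in order of actual arrival (simulated time plus technological delivery delay), and an event whose simulated time has already passed is dropped.

```python
# Sketch (hypothetical scheduler): spikes from physically distant neurons
# carry a longer delivery delay; by the time they arrive, the simulated time
# may have moved past them, so they are dropped (or arrive mis-timed).
import heapq

def run(events):
    """events: (simulated_time, delivery_delay, label); processed in order of
    actual arrival time = simulated_time + delivery_delay."""
    queue = [(t + d, t, label) for t, d, label in events]
    heapq.heapify(queue)
    sim_time, processed, dropped = 0.0, [], []
    while queue:
        _, t_sim, label = heapq.heappop(queue)
        if t_sim < sim_time:
            dropped.append(label)  # its simulated time has already passed
            continue
        sim_time = t_sim
        processed.append(label)
    return processed, dropped

# A distant neuron's spike (long delivery delay) loses to later local spikes.
events = [(1.0, 5.0, "distant"), (2.0, 0.1, "local-1"), (3.0, 0.1, "local-2")]
processed, dropped = run(events)
print(processed, dropped)
```

The distant spike, although it belongs to the earliest simulated time, arrives last and is discarded, which is exactly how the remote control exercised by distant neurons in biology gets lost in the technical implementation.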

In technology, the synaptic weights, connections, and other factors influencing neural operation must be set up by the experimenter, requiring pre-wiring the system.

The lack of information about the initial parameter-space settings, and the need to produce a valid initial space setting for the beginning of the training, forces a compromise. Usually, the system is allowed to learn (i.e., set its weights according to the rules) from a random, uniform, or otherwise pre-set initial state. Given that no synchronization signal is provided, the system assumes all signals to be valid, including the different feedback, recurrent relations, and other signals (for a review see [4,26]).

This also means that the computed feedback may reach the neurons in the previous layer [...] is topped by the difference that biology naturally implements a limitation on the speed of change of its biological weights. The "need for speed" in technological computing does not implement such moderation.

The role of time in learning comes to light demonstratively when analyzing video recordings. This corresponds to the case where the training is based on a series of slightly different samples, in which different objects vary at different speeds from frame to frame. One time constraint is how frequently the sample object changes; the other is how quickly its analysis is performed. One can expect that a slow change gives more time for the system to learn; rapid changes cannot be learned: they are seen too few times.
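The two time constraints above reduce to a simple ratio. The sketch below uses assumed illustrative timings (a hypothetical 50 ms analysis pass) to show that an object changing faster than the analysis time yields no complete learning pass at all.

```python
# Sketch (illustrative numbers): how many analysis passes fit into one stable
# appearance of an object. A slowly changing object is seen many times; one
# that changes faster than the analysis time is effectively never learned.

def updates_per_appearance(change_period_s, analysis_time_s):
    return int(change_period_s // analysis_time_s)

analysis = 0.05  # assumed: 50 ms per learning pass
print(updates_per_appearance(1.0, analysis))   # slow change: many passes
print(updates_per_appearance(0.04, analysis))  # faster than analysis: none
```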

Our expectation about the role of time (mismatching) is confirmed directly by investigations in the time domain. "The CNN models are more sensitive to low-frequency channels than high-frequency channels" [55]: the feedback can follow the slow changes more easily than the faster ones.