Power and Obstacle Aware 3D Clock Tree Synthesis

Clock distribution network (CDN) design is a critical step in the design of any integrated circuit (IC) and has a vital influence on the performance of the entire circuit. As device scaling continues, 3D ICs stacked with through-silicon vias (TSVs) are gaining importance as a way to continue Moore's law. A TSV provides the vertical interconnection between two dies and allows electrical signals to flow between them. 3D ICs have many advantages over conventional 2D planar ICs, such as reduced power, area, cost, and wire-length. The proposed work focuses mainly on power reduction and obstacle avoidance for 3D ICs. Various techniques have already been introduced for minimizing clock power within the specified clock constraints of a 3D CDN. The proposed 3D clock tree synthesis (CTS) combines several algorithms with the objective of reducing power while avoiding obstacles, or blockages, when routing the clock signal from one sink to another. These blockages, such as RAM, ROM, and PLL blocks, are fixed during the placement process. The work is carried out in three main steps: first, generation of the 3D clock tree while avoiding the blockages; then, buffering and embedding; and finally, validation of the results by SPICE simulation. The experimental results show that our CTS approach achieves a significant 9% reduction in power compared to the existing work.


Introduction
In the past few decades, integrated circuits (ICs) have become more complex, so supplying accurate and synchronized clocks to all the circuits on the chip has become a major issue. The demand for high performance and complex functionality in integrated circuits has primarily been met through aggressive device scaling. Advances in semiconductor technology have had a deep impact on the performance of VLSI circuits. At present, with 7 nm transistors under development, the trend of transistor scaling seems to be nearing saturation. Further scaling of transistors may run into unavoidable physical limitations and may not be cost effective. In addition, device scaling results in large interconnect delay. This large interconnect delay, i.e. the RC delay, degrades the overall performance of the circuits and increases power consumption. Power consumption is a major challenge in modern VLSI design, to the point that it has pushed performance to a secondary concern.
The three-dimensional integrated circuit (3D IC) is one of the most promising technologies for continuing Moore's law. A typical view of a 3D IC is shown in Figure 1. On many design criteria, 3D integration technology provides better performance than current 2D integration. In a 3D design, the entire chip is divided into a number of blocks, and each of these blocks is placed on a silicon layer; electrical connections between the different layers allow the clock signal to be distributed among the blocks. Clock tree synthesis (CTS) is an important step in any integrated circuit and controls the performance of the entire circuit. The main objective of CTS is to distribute the clock signal to all the clock pins of the sequential elements on the die with minimum wire-length, reduced chip area, constrained timing parameters, and low power. 3D ICs consist of multiple dies with through-silicon vias (TSVs) stacked in between. A TSV provides the vertical interconnection between two dies and thus allows the electrical signal, or the clock signal, to flow from one die to the other [11]. The area occupied by a TSV on a die is much larger than that of any gate. There are also reliability concerns with TSVs, which is a very important factor for industrial use [1,2]. Only a few industry standards are available for the manufacturing and packaging of TSV-based ICs.
In a 3D clock distribution network, it is imperative to understand how TSV resistance and capacitance affect the performance of the network. A larger number of TSVs results in less wire-length, but TSVs also have a higher capacitance, which increases total power dissipation [3,4]. For this reason, we have focused on a single TSV.
2. We propose an efficient methodology to design a 3D clock distribution network with obstacle avoidance.
3. Buffering and embedding are performed on the clock network, considering TSV resistance and capacitance, to calculate the timing parameters and total power dissipation [8,12,13].
4. Ngspice simulation is performed after the buffer insertion process to check for slew and skew constraint violations.

Results
The work presented focuses mainly on the design of a 3D clock tree network for two-die stacked 3D ICs based on a single-TSV model. As our essential concern is power reduction, we first demonstrate the efficiency of the proposed procedure. Additionally, we investigate the impact on clock power when a TSV is added to a 3D IC package with multiple dies. Benchmarks with blockages are also considered while routing the clock signal. A check must be performed for all nodes or sinks lying near the region where blockages are placed. An effective procedure is used to route around these rectangular blockages or obstacles. After that, buffers are inserted depending on the length of the wire segment. Figures 2 to 9 show the designed 3D clock tree networks and transient waveforms for benchmarks ISPD-09f22, ISPD-09f33, ISPD-09f34 and ISPD-09f35, respectively.

Different Equations used for construction of clock tree network
The connecting wire is modeled using the π model. According to Tsay [25], the condition that ensures zero skew for the above problem, as shown in the figure, is given by equation (1). In that equation, r1 and r2 are the resistances and c1 and c2 are the capacitances of the wires connecting the two leaf nodes. These parameters are obtained using equation (2).
The delays for the signal to travel from the tapping point to the respective leaf nodes are denoted t1 and t2. These delays are computed by the Elmore delay method, as explained earlier. In equation (2), α is the per-unit-length resistance and β is the per-unit-length capacitance. These are industry-specified values available from the standard benchmark circuits. The location of the tapping point on the wire segment connecting the two leaf nodes is calculated by equation (3).
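Equations (1) and (2) are not reproduced in the text here. As a hedged reconstruction, the commonly cited form of Tsay's zero-skew condition under the π wire model is sketched below. The symbols C_1 and C_2 (downstream load capacitances of the two subtrees), l (length of the connecting wire) and x (fraction of l between the tapping point and the first subtree) are assumptions introduced for this sketch and are not notation confirmed by the source; t_1 and t_2 are taken as the delays from each subtree's own tapping point to its leaves.

```latex
% (1) Zero-skew condition: both branches see the same Elmore delay
r_1\left(\frac{c_1}{2} + C_1\right) + t_1 \;=\; r_2\left(\frac{c_2}{2} + C_2\right) + t_2

% (2) Wire parasitics on either side of the tapping point
r_1 = \alpha\, x\, l, \qquad c_1 = \beta\, x\, l, \qquad
r_2 = \alpha\,(1 - x)\, l, \qquad c_2 = \beta\,(1 - x)\, l
```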
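The tapping-point computation described above can be illustrated with a minimal Python sketch. It assumes the standard closed form obtained by solving the zero-skew condition for the fractional position x; the function names, the load capacitances `cap1` and `cap2`, and the unit choices (ohm, fF, um, with delays in ohm·fF, i.e. femtoseconds) are assumptions made for illustration and are not taken from the source.

```python
def tapping_point(t1, t2, cap1, cap2, length, alpha, beta):
    """Fraction x of the wire, measured from subtree 1, where the delays balance.

    Assumed closed form (standard zero-skew merge under the pi model):
        x = (t2 - t1 + alpha*l*(cap2 + beta*l/2)) / (alpha*l*(beta*l + cap1 + cap2))
    """
    if length <= 0:
        raise ValueError("wire length must be positive")
    wire_r = alpha * length  # total wire resistance
    wire_c = beta * length   # total wire capacitance
    return (t2 - t1 + wire_r * (cap2 + wire_c / 2)) / (wire_r * (wire_c + cap1 + cap2))


def elmore_delay(frac, t_sub, cap_sub, length, alpha, beta):
    """Elmore delay through a wire piece of length frac*length, then the subtree."""
    r = alpha * frac * length
    c = beta * frac * length
    return r * (c / 2 + cap_sub) + t_sub


# Example with the wire parameters quoted in the simulation settings
# (0.1 ohm/um, 0.2 fF/um); the loads and subtree delays are made-up inputs.
x = tapping_point(0.0, 500.0, 35.0, 50.0, 1000.0, 0.1, 0.2)
left = elmore_delay(x, 0.0, 35.0, 1000.0, 0.1, 0.2)
right = elmore_delay(1 - x, 500.0, 50.0, 1000.0, 0.1, 0.2)
```

In this sketch `left` and `right` agree up to floating-point rounding. A value of x outside the interval [0, 1] would mean the two subtrees cannot be balanced on the direct wire; that case typically calls for wire elongation and is not handled here.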

Discussions
All results are validated by SPICE simulations. The benchmarks under test are the standard ISPD-09 benchmark circuits. Because these benchmark circuits are designed for 2D ICs, the sinks in each benchmark are arbitrarily distributed across two dies for the study and analysis of 3D systems.

Simulation Settings
The technology-dependent parameters used in the design are taken from the 45 nm Predictive Technology Model [29]. The per-unit-length resistance α is 0.1 Ω/um, and the per-unit-length capacitance β is 0.2 fF/um. Buffers from the buffer library are characterized by a capacitance of 35 fF and a resistance of 61.2 Ω. TSV parameters are reported over a wide range; for this analysis we pick a TSV with a resistance of 53 mΩ and a capacitance of 27.9 nF. The clock frequency used in the simulation is 1 GHz and the supply voltage is 1.2 V. As mentioned in the sections above, the parameter CMAX is fixed at 250 fF to control the slew and skew values.
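To give a feel for these numbers, the following Python sketch derives a few first-order quantities from the wire parameters, CMAX, the supply voltage and the clock frequency quoted above. The function names are hypothetical, and the C·V²·f expression is a generic switching-power estimate for a net that toggles every cycle; it is an assumption for illustration, not the SPICE-based power measurement used in this work.

```python
# Values quoted in the simulation settings.
ALPHA_OHM_PER_UM = 0.1   # per-unit-length wire resistance
BETA_FF_PER_UM = 0.2     # per-unit-length wire capacitance
CMAX_FF = 250.0          # load limit used for buffering
VDD_V = 1.2              # supply voltage
FCLK_HZ = 1e9            # clock frequency


def wire_rc(length_um):
    """Lumped resistance (ohm) and capacitance (fF) of a wire of the given length."""
    return ALPHA_OHM_PER_UM * length_um, BETA_FF_PER_UM * length_um


def max_unbuffered_length_um():
    """Longest bare wire whose capacitance alone stays within CMAX."""
    return CMAX_FF / BETA_FF_PER_UM


def switching_power_w(cap_ff):
    """First-order C * V^2 * f estimate (assumption: the net toggles every cycle)."""
    return cap_ff * 1e-15 * VDD_V ** 2 * FCLK_HZ


resistance_ohm, capacitance_ff = wire_rc(1000.0)
```

Under these assumptions, a 1000 um wire has about 100 Ω and 200 fF, and a bare wire reaches the CMAX limit after roughly 1250 um, which indicates the spacing at which capacitance-driven buffering starts to insert buffers on long segments.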

Observation
The proposed work is compared with reference [10]. For a fair comparison, we use the same benchmark circuits. The comparison covers total wire-length, total power dissipation, clock skew, and delay. Table 1 shows that wire-length increases, because the proposed work uses a single TSV whereas reference [10] uses multiple TSVs. Table 2 shows that delay is reduced owing to the Capacitance-Driven Buffering (CDB) methodology. Table 3 shows that skew increases, which is the cost of the power reduction; however, the skew remains within the range specified by ITRS. Table 4 shows that power is reduced by 9% compared to the reference work, since each TSV has its own resistance and capacitance, which add to the power dissipation. The slew is calculated for all the benchmarks used in the proposed work and is well under the limit specified by ISPD.

Overview
In our paper, 3D clock tree synthesis is completed in three main steps: 1) generating the 3D abstract clock tree topology using a single TSV, 2) routing while avoiding the obstacles, and 3) buffering and embedding for clock skew and slew control. Generation of the 3D clock tree topology with obstacle avoidance is completed in further steps. Consider a set of sinks Z given in a benchmark circuit. Initially, the sinks are divided into two equal groups, one for each die. Every sink has an x-coordinate, a y-coordinate and a z-coordinate, where the z-coordinate corresponds to the number of the die to which the sink belongs. Then, following the Method of Means and Medians (MMM) algorithm, the sinks are partitioned and then merged in a bottom-up manner using the tapping points calculated by the exact zero-skew algorithm. The wires are connected so as to minimize wire-length, because interconnecting wires have their own resistance and capacitance, which add to the total power dissipation. Finally, the final tapping points of the two dies are connected through the TSV.
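The die assignment and the MMM-style partitioning described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Sink` class, the function names, the seeded random split and the choice to alternate median cuts between x and y are all hypothetical details, and the sketch builds only the abstract topology of one die (it does not compute tapping points or place the TSV).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sink:
    x: float
    y: float
    z: int  # die number (0 or 1)


def assign_to_dies(points, seed=0):
    """Arbitrarily split 2D sink locations into two equal halves, one per die."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return ([Sink(x, y, 0) for x, y in shuffled[:half]]
            + [Sink(x, y, 1) for x, y in shuffled[half:]])


def mmm_topology(sinks, cut_x=True):
    """Recursively bisect the sinks at the median, alternating the cut direction.

    Returns a nested pair structure whose leaves are Sink objects; merging
    this structure bottom-up is where tapping points would be computed.
    """
    if not sinks:
        raise ValueError("at least one sink is required")
    if len(sinks) == 1:
        return sinks[0]
    ordered = sorted(sinks, key=(lambda s: s.x) if cut_x else (lambda s: s.y))
    mid = len(ordered) // 2
    return (mmm_topology(ordered[:mid], not cut_x),
            mmm_topology(ordered[mid:], not cut_x))


def count_leaves(topology):
    """Number of sinks contained in a topology returned by mmm_topology."""
    if isinstance(topology, Sink):
        return 1
    return sum(count_leaves(child) for child in topology)


points = [(0, 0), (1, 5), (2, 3), (7, 1), (8, 8), (4, 4), (6, 2), (3, 9)]
sinks = assign_to_dies(points)
die0_topology = mmm_topology([s for s in sinks if s.z == 0])
```

Each die gets its own topology in this sketch; the roots of the two per-die trees are the points that the text connects through the single TSV.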

Obstacle aware algorithm
The blockages or obstacles present on the dies are macro blocks such as RAM, ROM, and PLL, which are fixed during the placement process. These blocks become fixed obstacles when routing the clock signal from one sink to another. The blockage information provided by the ISPD benchmark circuit is stored in a matrix through data structures. To avoid the blockages, a binary search is performed. Sinks located near or around the obstacles must be routed so that their routing paths do not cross the boundaries of the obstruction [27][28].

Figure 12: Interconnection of sequential components avoiding the obstacle (macros as obstacles/blockages)
Building the clock tree network in the presence of obstacles involves three primary steps. The first step is to store the blockage information, the second is to apply the binary search algorithm to find the nodes near the obstacle, and the third is to route the wires while avoiding the blockages. While the binary search algorithm is applied, the sink locations are examined and the blockage information is placed in the matrix. At each point, the boundaries of the obstacle are compared with the routing paths of the various sinks, and a final decision is made as to whether the computed routing path clears the limits of the blockage, so as to prevent any conflict with the blockages considered [7].
Figure 3 shows how the blockage/obstacle is avoided while routing the clock signal.
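The three steps of the obstacle-aware procedure (store the blockage, search for nearby sinks, route clear of the blockage) can be sketched in Python as below. The representation of a blockage as an axis-aligned rectangle `(xmin, ymin, xmax, ymax)`, the `margin` parameter, the function names, and the use of two candidate L-shaped Manhattan routes are assumptions made for this illustration; the source does not specify these details.

```python
from bisect import bisect_left, bisect_right


def sinks_near_obstacle(sorted_sinks, rect, margin):
    """Binary-search sinks (sorted by x) that lie within `margin` of the rectangle."""
    xmin, ymin, xmax, ymax = rect
    xs = [s[0] for s in sorted_sinks]
    lo = bisect_left(xs, xmin - margin)
    hi = bisect_right(xs, xmax + margin)
    return [s for s in sorted_sinks[lo:hi]
            if ymin - margin <= s[1] <= ymax + margin]


def segment_hits_rect(p, q, rect):
    """True if the axis-aligned segment p-q passes through the rectangle's interior."""
    xmin, ymin, xmax, ymax = rect
    seg_xmin, seg_xmax = sorted((p[0], q[0]))
    seg_ymin, seg_ymax = sorted((p[1], q[1]))
    return (seg_xmin < xmax and seg_xmax > xmin
            and seg_ymin < ymax and seg_ymax > ymin)


def route_l_shape(a, b, rect):
    """Return an L-shaped route from a to b that clears the rectangle, else None."""
    for corner in ((b[0], a[1]), (a[0], b[1])):
        legs = ((a, corner), (corner, b))
        if not any(segment_hits_rect(p, q, rect) for p, q in legs):
            return [a, corner, b]
    return None  # both L-shapes are blocked; a longer detour would be needed


blockage = (4, 4, 6, 6)
sinks = sorted([(0, 0), (3, 5), (5, 10), (9, 9), (20, 5)])
nearby = sinks_near_obstacle(sinks, blockage, margin=2)
route = route_l_shape((0, 0), (5, 10), blockage)
```

In this example the route that first runs horizontally would cut through the blockage, so the sketch falls back to the route that runs vertically first. Routes that touch only the boundary of the rectangle are treated as legal here, which is a design choice of the sketch.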

Buffer Insertion algorithm
After the abstract clock tree with its final tapping point is generated, buffering is the next essential step in designing a clock tree network. In the proposed methodology, buffering is based on capacitive load, and the strategy is named Capacitance-Driven Buffering (CDB). The task of CDB is to estimate the ideal locations of the buffers. The input to the CDB procedure is the clock tree with the exact locations of the internal nodes. The CDB technique relies on setting CMAX to a fixed value; for our work we fix it at 250 fF. The tree is processed from each root in a top-down manner. If the load crosses CMAX, a buffer is inserted, and the solution set of the corresponding subtree is updated and added to the tree; otherwise, the traversal moves on to the child and the same process is repeated [8,23]. The tree is then updated for that node, and for the updated clock tree the sinks are routed to the clock source in a bottom-up manner.
The buffering strategy proceeds in a top-down fashion. The tree is traversed from the parent node to its child nodes while keeping count of the parasitic load accumulated along the wire segment. This load increases as we move down the wire segment. At the point where the accumulated parasitic load exceeds the fixed value of CMAX, a buffer is inserted. From that location, the parasitic load count is reset to zero and the same procedure continues. This is repeated until all the sink nodes are reached, so that buffer locations are found for the whole tree. The buffer locations are then passed to the clock tree, which is updated so that the 3D clock tree contains the exact location of each buffer.
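The top-down traversal described above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the `Node` class and function name are hypothetical, only wire capacitance is counted toward the load (sink and buffer input capacitances are ignored), and each branch inherits the load count accumulated along its path from the root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    wire_len_um: float = 0.0  # length of the wire from the parent to this node
    children: list = field(default_factory=list)


def insert_buffers(root, cmax_ff=250.0, beta_ff_per_um=0.2):
    """Walk the tree top-down and return buffer positions.

    Each position is (node_name, distance_um), where distance_um is measured
    along the wire from the parent towards the named node.
    """
    buffers = []
    stack = [(root, 0.0)]  # (node, load counted so far at the parent end)
    while stack:
        node, load = stack.pop()
        travelled = 0.0
        remaining = node.wire_len_um
        # Place a buffer wherever the running load would cross CMAX.
        while load + remaining * beta_ff_per_um > cmax_ff:
            step = (cmax_ff - load) / beta_ff_per_um
            travelled += step
            remaining -= step
            buffers.append((node.name, travelled))
            load = 0.0  # reset the count after each inserted buffer
        load += remaining * beta_ff_per_um
        for child in node.children:
            stack.append((child, load))
    return buffers


# Small made-up tree: root -> A (3000 um), A -> B (1000 um), A -> C (500 um).
tree = Node("root", children=[
    Node("A", 3000.0, children=[Node("B", 1000.0), Node("C", 500.0)]),
])
placed = insert_buffers(tree)
```

With CMAX at 250 fF and 0.2 fF/um, the long wire to A receives two buffers in this sketch, the wire to B receives one because it inherits 100 fF of load from the tail of A's wire, and the wire to C needs none.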

Conclusion
The proposed work is a methodology for designing the clock tree network of 3D integrated circuits with a single through-silicon via (TSV) while avoiding fixed obstacles. The procedure combines several algorithms, applied step by step to obtain the desired result. The inputs to the flow are the standard ISPD benchmark circuits and the technology parameters. The output is a fully connected 3D clock network designed with power and obstacle awareness as the main concerns. The results are validated by SPICE simulation. The methodology yields a significant reduction in overall power dissipation, which is one of the major concerns nowadays. Additionally, the quality and strength of the signal are maintained through the buffer insertion process. The timing parameters, namely skew, slew and delay, are also observed to be well under control and within the standard defined by ITRS [30]. Along with this, we have studied the impact of the CMAX value.

Future Work
The implemented work focuses on the single-TSV model of 3D clock systems, with obstacles placed on only one of the dies. Existing studies show that multiple TSVs help reduce total wire-length [8,19]. At the same time, multiple TSVs bring other difficulties. A TSV, being a hole created in the silicon, occupies a large silicon area, and TSVs have their own resistance and capacitance, which increase total power consumption.
The chance of error in TSV fabrication is also very high. The TSV count and TSV area are therefore critical issues. These issues will be considered in our future work on different TSV models of the 3D clock network. The work will also be extended by placing obstacles on both dies, or on multiple dies.

Figure 1: Typical view of two-die stacked 3D ICs CDN

[Table 1: columns Benchmark, # Sinks, # Obstacles, Existing ...]
Figure 10: Power comparison (reference power and proposed power) for different benchmarks