# Design of On-Chip Testing Memory for High Speed Circuits

M.V. Sushumna CVR College of Engineering/ECE, Hyderabad, India Email: sushumna451@gmail.com

Abstract— Mixed-signal processing systems, especially data converters, can be reliably tested at high frequencies using memory-based on-chip testing schemes. In this thesis, an on-chip testing strategy based on shift registers/memory (2 k bits) is proposed for digital-to-analog converters (DACs) operating at 5 GHz. The proposed design uses an output word length of 8 bits in order to test the DAC at 5 GHz. The testing strategy has been designed in a standard 90 nm CMOS technology with a 1-V supply, and it has been implemented in the Cadence IC design environment.

An additional advantage of the proposed testing strategy is that it requires fewer I/O pins and avoids a large number of high-speed I/O pads. It therefore also solves the bandwidth limitation associated with I/O transmission paths. The memory-based on-chip tester contains no analog block and is implemented entirely in the digital domain. In the proposed design, a low frequency of 1 MHz is used outside the chip to load the data into the memory during the write mode. During the read mode, a frequency of 625 MHz is used to read the data from the memory. A multiplexing system reuses the stored data during the read mode to test the intended functionality and performance.

A serializer is used at the memory output to convert the parallel data into serial data at high frequency. Using the intermediate frequencies of 1.25 GHz and 2.5 GHz, the serializer speeds up the data from the lowest read frequency of 625 MHz to the highest frequency of 5 GHz in order to test the DAC at 5 GHz.

*Index Terms*— CMOS, on-chip memory, Testing of high-speed MSPS circuits, Shift register, Clock divider, Clock multiplexing, Serializer, Multiplexer, Synchronous sequential circuits.

## **I. INTRODUCTION**

Complementary metal-oxide-semiconductor (CMOS) technology is considered the dominant technology for very-large-scale integration (VLSI) chip design [1],[2]. It has become the basis of modern digital integrated circuits because continuous scaling has steadily increased its speed. Moreover, it provides high-speed, low-cost transistors on the same chip.

Because digital circuits are very cheap, or almost free, in ultra-deep submicron CMOS technologies, interest in on-chip testing is growing as the complexity of VLSI digital circuits increases. On-chip testing is less costly than testing based on external instrumentation because of the increased performance requirements of the chip in terms of high-speed operation. Moreover, it is becoming less practical to manufacture the tester on a separate semiconductor chip when the device and the tester can easily be manufactured on the same chip, especially in ultra-deep submicron technologies where the transistor has been scaled down considerably. Furthermore, system-on-chip (SoC) design allows digital, analog, and mixed-signal integrated circuits to be designed and fabricated on the same chip.

As the complexity of mixed-signal processing systems (MSPSs) increases, new testing challenges emerge. In high-speed testing, it is difficult to connect an external test instrument to the chip without loss or distortion. Testing integrated circuits (ICs) with external instruments has therefore become very complicated because of the high performance requirements.

Measurements at very high frequencies suffer from degraded core circuit performance because of bandwidth limitations. These limitations are caused by the physical nature of the I/O pads and the physical length of the transmission path [3].

The on-chip memory tester overcomes these problems. The high frequency is generated by another on-chip circuit. The proposed design includes a clock divider that derives four clock frequencies from this clock, which are used to perform the high-frequency tests. In this way, the bandwidth limitation imposed by the I/O transmission paths is avoided. The on-chip memory (2 k bits) is included in the proposed design in order to avoid a large number of very high-speed I/O pads, so the design can use as few pins as possible [4]. The low-frequency clock used during the write operation can be driven from outside the chip to write the data into the memory. This reduces the cost and complexity of the design. In addition, a serializer is included in the design in order to test the intended device at high frequency.

When choosing the word length and depth of the memory, several aspects have been taken into account, such as total area consumption, power consumption, and design complexity. These aspects are affected by the memory, the serializer, and the clock distribution network needed in the design. To choose the optimum design parameters, the tradeoff between the frequency of the first stage of the serializer and the word length of the memory must also be considered. The higher the frequency of the first stage of the serializer, the fewer bits need to be read out of the memory unit per cycle (lower word length). At the same time, increasing the frequency of the first stage of the serializer increases the overall area and power consumption of the design, as well as its complexity. Therefore, a lower first-stage frequency of 625 MHz and a larger word length of 64 bits with a depth of 32 have been chosen.
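The tradeoff between word length and first-stage frequency can be illustrated with a short calculation. The following is a minimal sketch and not part of the original design flow: the function names and the list of candidate word lengths are illustrative assumptions, and it assumes that the memory read-out must sustain the same aggregate bit rate as the 8-bit output at 5 GHz.

```python
# Illustrative sketch: function names and candidate word lengths are assumptions.
OUTPUT_RATE_HZ = 5_000_000_000   # 5 GHz test rate (from the text)
OUTPUT_BITS = 8                  # output word length (from the text)
MEMORY_BITS = 2048               # 2 k bits of on-chip memory (from the text)


def first_stage_frequency_hz(word_length):
    """Read clock needed so that word_length bits per cycle match 8 bits at 5 GHz."""
    return OUTPUT_RATE_HZ * OUTPUT_BITS // word_length


def memory_depth(word_length):
    """Number of words that fit in the memory for a given word length."""
    return MEMORY_BITS // word_length


if __name__ == "__main__":
    for word_length in (16, 32, 64):
        freq_mhz = first_stage_frequency_hz(word_length) / 1e6
        print(f"{word_length:2d} bits: {freq_mhz:7.1f} MHz, depth {memory_depth(word_length)}")
```

Under these assumptions, a 64-bit word corresponds to the 625 MHz first stage and the depth of 32 chosen in the text, while shorter words push the first-stage clock toward 1.25 GHz or 2.5 GHz.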

## **II. OVERVIEW OF THE DESIGN**

The proposed design is composed of six main units: serial to parallel conversion, memory, serializer, clock divider, control, and clock distribution network, as shown in Figure 1. The serial to parallel conversion unit consists of 64 memory elements, and the memory unit consists of 2048 memory elements. The memory unit also contains the multiplexing system. It stores the data at low frequency and reuses the stored data at high frequency (625 MHz). The serializer unit uses four frequencies (625 MHz, 1.25 GHz, 2.5 GHz, and 5 GHz) to speed up the stored data from 625 MHz to 5 GHz for high-frequency testing. The clock divider unit divides the 5 GHz clock into four clocks that are used in the different parts of the design. The control unit generates all the control signals that the design needs. The clock distribution network consists of multiple buffer stages that drive each signal from its source to its terminal port.

The word length of the memory (64 bits) can be regarded as 8 groups of 8 bits. The output of each group is connected to one 8:1 serializer unit, so the entire serializer design consists of 8 pages of 8:1 serializer units.



Figure 1: Block diagram of proposed design

### A. Serial to Parallel Conversion Unit

The serial to parallel conversion unit converts the serial input data into parallel output data, which are the input data of the memory unit. This unit is active during the write mode. The conversion from the serial format to the parallel format is done using serial-in/parallel-out (SIPO) shift registers. These registers work at a frequency of 1 MHz (clk1M signal). In the proposed design, the memory unit has a word length of 64 bits and a depth of 32. Thus, 64 registers are needed to convert the data into parallel format. Each data bit is shifted during one clock cycle of the clk1M signal, so after 64 clock cycles the 64 bits of data are stored in these registers. It is important that each new set of 64 data bits is first stored completely in these registers and only afterwards shifted into the memory.

The advantage of this operation is that each set of 64 bits data is going to be valid at the output of the memory at the same time. After the memory is filled, the clock signal (clk1M) of the serial to parallel conversion unit is turned off. This means that the serial to parallel conversion unit is off and thus no more data enter into the memory.
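The SIPO behavior described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the function name `sipo_shift` and the shift direction (new bits enter at index 0) are choices made here, not details taken from the schematic.

```python
# Behavioral sketch of a serial-in/parallel-out shift register.
# The name sipo_shift and the shift direction are illustrative assumptions.

def sipo_shift(serial_bits, width=64):
    """Shift serial_bits in one bit per clock cycle and return the register contents."""
    register = [0] * width
    for bit in serial_bits:
        # On each clk1M cycle every stage takes the value of its predecessor.
        register = [bit] + register[:-1]
    return register


if __name__ == "__main__":
    word = sipo_shift([1, 0] * 32)
    print(word[:8])
```

After exactly 64 clock cycles the first bit received sits in the last stage and the most recent bit in the first stage, at which point the whole word can be transferred to the memory in parallel.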

### B. Memory Design

The memory unit is one of the most important units in the proposed design. According to the design specification, a 2 k bit memory is required to store the data at low frequency and to read these data at high frequency in order to test the intended device. Integrated circuit design involves tradeoffs among many factors, such as speed, power consumption, chip area, and cost. In this paper, the main objective is to generate the on-chip input data at 5 GHz for testing high-speed mixed-signal circuits. Therefore, in this particular case, the most important design requirements on the memory are speed and operational robustness. As explained in the following, robust and low-power shift registers are utilized as memory cells, and the high-speed memory readout is enabled by a 3-step successive multiplexing of 64 bits at 625 MHz to 8 bits at 5 GHz.

The proposed memory (2 k bits) is a two-dimensional array of shift registers (64x32 cells). One register represents one memory cell (one bit), and the transistors of the register can be small. The operating concept of the memory is very simple: the data are transferred from one DFF to the adjacent DFF during a single clock cycle, and so on. Thus, the first data bit written into the memory is the first data bit read out of the memory. The memory requires two clocks to perform write and read operations. The clock port of the memory passes the write clock (low frequency) during the write mode. After the memory is filled, the clock port passes the read clock (high frequency) in order to read the stored data during the read mode. Therefore, a clock multiplexer is needed to switch from the write clock to the read clock. Once the whole memory is filled, a multiplexing system in the memory unit is needed in order to reuse the stored data for testing during the read mode, so 64 multiplexers are required. Moreover, a control unit is needed to manage the multiplexers during write and read operations. The whole memory unit with the serial to parallel conversion unit is shown in Figure 2. Both units use traditional static master-slave positive edge-triggered registers based on multiplexers, distributed as shown in Figure 2. Note that there is a buffer stage after each register in order to fulfill the timing requirements between two adjacent registers. In contrast to the serial to parallel conversion unit, the memory has two modes of operation: write and read.



Figure 2: The whole memory unit with the serial to parallel conversion unit.

When the memory operates in write mode, the clk1M-625M signal is high for the whole period of 64 clk1M clock cycles except the last half clock cycle, during which it is low. This means that after each set of 64 data bits has been converted from serial to parallel, these data are shifted simultaneously into the memory. Note that the data stored inside the memory are not shifted to the next column until a new set of 64 data bits is available at the output of the serial to parallel converter. The memory is filled after 32x64 clock cycles of clk1M. The write mode then ends and the serial to parallel conversion unit is deactivated, while the read mode starts by activating the multiplexing system.
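The write and read behavior of the shift-register memory can be summarized in a behavioral model. The sketch below is an illustration only: the class name `ShiftRegisterMemory`, the method `clock`, and the convention that passing a word selects write mode are assumptions introduced here, and the model ignores all timing, buffering, and clock gating.

```python
# Behavioral sketch of the 64x32 shift-register memory with recirculation.
# Class and method names are illustrative assumptions, not taken from the design.

class ShiftRegisterMemory:
    def __init__(self, rows=64, depth=32):
        # One shift register (list) per bit of the word; index 0 is the input side.
        self.rows = [[0] * depth for _ in range(rows)]

    def clock(self, word=None):
        """Advance one clock cycle and return the word at the memory output.

        With a word given, the memory is in write mode and shifts the word in.
        Without a word, it is in read mode: each row feeds its own output back
        to its input through the multiplexer, so the stored data are reused.
        """
        output = [row[-1] for row in self.rows]
        for i, row in enumerate(self.rows):
            new_bit = word[i] if word is not None else row[-1]
            self.rows[i] = [new_bit] + row[:-1]
        return output


if __name__ == "__main__":
    memory = ShiftRegisterMemory()
    for k in range(32):
        memory.clock([k % 2] * 64)          # write 32 words
    print([memory.clock()[0] for _ in range(8)])  # first bit of the first 8 words read
```

In this model the first word written is the first word read, and after 32 read cycles the same sequence repeats, which mirrors the reuse of stored data during the read mode.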

The multiplexer consists of two pass transistors forming a transmission gate, inverters, and buffers. To overcome charge-sharing problems, inverters are added before and after the transmission gates. They also make the output signal of the multiplexer more robust, because transmission gates alone produce a degraded signal. Buffers may be needed to satisfy the timing requirements of the circuit. This type of multiplexer consumes very low power and offers high-speed performance [5]. It is faster than a multiplexer built from logic gates because charging and discharging are slower in the latter approach. The proposed multiplexer is therefore better suited to high-speed circuit design, and it also uses fewer transistors. This schematic is used not only in the memory unit but also in other units of the proposed design, such as the control unit and the serializer.

### C. Clock Divider Unit

The clock divider is needed to generate the required clocks from the original frequency (EXclock signal) which is 5 GHz. The output signals (clocks) of the clock divider are clk5G, clk2.5G, clk1.25G, and clk625M, respectively.

There are two design topologies for the clock divider unit: the asynchronous counter and the synchronous counter. The asynchronous ripple counter presented in [6],[7] reduces the power consumption because of the small capacitance at the high-frequency node. However, jitter increases because it accumulates stage by stage. The synchronous counter, on the other hand, increases the power consumption because of the large capacitance at the high-frequency node, but it reduces jitter accumulation [7]. In addition, the synchronous divider topology eliminates any cumulative time delay because all the DFFs are connected to the same clock. Therefore, all output clocks change simultaneously at the rising edge of the clock, and the maximum frequency of the synchronous counter is significantly higher than that of the asynchronous ripple counter. The proposed clock divider schematic is based on a synchronous 3-bit counter, as shown in Figure 3.



The clock divider schematic consists of three T flip-flops (TFFs), one AND gate, and buffers. Each TFF consists of one DFF and one non-inverting buffered 2:1 multiplexer unit, as shown in Figure 4.



Figure 4: T flip-flop.

The DFFs operate at the high frequency (5 GHz). The buffer before the clk5Ga node drives the gates after this node and makes the signal more robust, which matters because the rise time and fall time of the output signal (clk5G) should be less than 20 ps across different process corners. In addition, the rise time and fall time of the output signals (clk625M, clk1.25G, and clk2.5G) should be less than 100 ps, 80 ps, and 20 ps, respectively, across different process corners. The rising edges of all clock signals should occur at the same time. To achieve this condition, the  $\bar{Q}$  node of the first TFF is connected to the input of the second TFF, and the  $\bar{Q}$  node of the second TFF is connected to the input of the third TFF. The output signals of the clock divider unit are shown in Figure 5.



Figure 5: Output signals of the clock divider unit.

Additional buffers are introduced in order to achieve proper synchronization between the clocks. Synchronizing buffers on the different clocks are useful at this stage of the design because they help minimize the skew after the clock distribution. In this way, four clock signals (clk5G, clk2.5G, clk1.25G, and clk625M) have been generated, and the required performance has been achieved. The clk5G, clk2.5G, and clk1.25G signals are connected directly to the serializer unit, while clk625M (Rclk signal) is connected to the clock multiplexing unit inside the control unit.
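The edge alignment described in this section can be checked with a behavioral model of a synchronous 3-bit counter built from T flip-flops. The sketch below is an interpretation rather than the actual schematic: it assumes that the second TFF toggles when $\bar{Q}$ of the first is high and that the third TFF toggles when the AND of both $\bar{Q}$ outputs is high. The function name `clock_divider` is hypothetical.

```python
# Behavioral sketch of a synchronous divide-by-2/4/8 counter using T flip-flops.
# Assumed toggle inputs: T0 = 1, T1 = not Q0, T2 = (not Q0) AND (not Q1).

def clock_divider(n_edges):
    """Return (q0, q1, q2) after each rising edge of the 5 GHz input clock.

    q0, q1 and q2 stand for clk2.5G, clk1.25G and clk625M respectively.
    """
    q0 = q1 = q2 = 0
    trace = []
    for _ in range(n_edges):
        t1 = 1 - q0
        t2 = (1 - q0) & (1 - q1)
        # All three flip-flops are clocked by the same edge (synchronous counter).
        q0, q1, q2 = q0 ^ 1, q1 ^ t1, q2 ^ t2
        trace.append((q0, q1, q2))
    return trace


if __name__ == "__main__":
    for state in clock_divider(8):
        print(state)
```

With these toggle conditions, every rising edge of the slower clocks coincides with a rising edge of the 2.5 GHz output, which is the alignment property the text asks for.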

### D. Control Unit

The control unit is one of the most important units in the design because it fetches the CLOCK, ENABLE, RESET, and DATA signals from outside the chip and uses them, together with the clock divider unit, to generate all the control signals needed in the design. All the input signals of the control unit come from outside the chip except the Rclk signal (read clock), which comes from the clock divider unit.

The proposed control unit design consists of the following units: interface, clock multiplexing, non-inverting buffered 2:1 multiplexers, pulse generation, and enable element units. The overview of the control unit is shown in Figure 6.



Figure 6: Control unit components.

#### *i) Interface Unit*

The interface unit consists of master-slave negative edge-triggered registers based on multiplexers, drivers, and buffers. It brings the CLOCK, DATA, ENABLE, and RESET signals from outside the chip into the proposed design. The interface unit schematic is shown in Figure 7. The Wclk signal is derived from the CLOCK signal through a driver, and it is used to synchronize the other control signals. Thus, the output signals of the interface unit are synchronized at the falling edge of the Wclk signal. The driver before the reset signal increases the fan-out of the reset signal so that it can drive the gates in the pulse generation unit. All the timing regions of the interface unit are also shown in the same figure.



Figure 7: Interface unit schematic.

#### ii) Clock Multiplexing Unit

The memory has two operation modes, write and read. During the write mode, the WRclk signal follows the Wclk signal (1 MHz), while during the read mode, the WRclk signal switches to the read clock signal Rclk (625 MHz). Thus, the clock multiplexing unit is needed to generate the write-read clock signal (WRclk), which is in turn used to generate the memory clock signal (clk1M-625M). During the clock multiplexing operation, the clock multiplexer should switch from one clock to the other without introducing any glitch at its output.



Figure 8: The glitch free clock switching for unrelated clocks technique

The clocks may be multiples of each other or totally unrelated to each other. Two different methods of implementing glitch-free clock multiplexing for these two cases are presented and discussed in detail in [8]. The write and read clocks may be unrelated because they are generated from different sources. Therefore, the glitch-free clock switching technique for unrelated clocks is used, as shown in Figure 8. Hence, no data are missed during the clock switching operation.

A glitch may occur at the output of the multiplexer if the output switches directly from the current clock to the next clock when the select signal changes. To prevent this when the clocks are multiples of each other, two negative edge-triggered DFFs are first added in the selection path, with feedback from the selection of each clock to the selection of the other. This prevents the current clock from propagating directly to the output and waits for the next clock before propagation, so any glitches are avoided when the clocks are multiples of each other. To extend this approach to completely unrelated clocks, two positive edge-triggered DFFs are added in the selection path. The selection signal or the feedback selection signal may arrive asynchronously, and the metastability caused by these signals is avoided after adding these DFFs [8].

The read mode is activated after the memory is filled. This means that, after (64x32xTWclk), the ENABLEOUT signal goes low. Then, the output clock (WRclk) is switched to the read clock after a certain time. In order to avoid missing data, during clock multiplexing operation caused by the waiting state, the ENABLEOUT signal should be taken from the node. The output clock (WRclk) is considered a write-read clock. In other words, the WRclk signal is equal to the Wclk signal during the write mode and equal to the Rclk signal during the read mode.
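The enable-gating idea behind glitch-free clock switching can be illustrated with a sampled behavioral model. The sketch below is a simplification under stated assumptions: it models only the two negative edge-triggered enable flip-flops with cross-coupled feedback, omits the additional positive edge-triggered synchronizer DFFs, and uses integer sample periods (16 and 6 samples) that are arbitrary stand-ins for unrelated clocks. All function names are hypothetical.

```python
# Sampled behavioral sketch of glitch-free clock multiplexing (simplified).
# Only the negative edge-triggered enable DFFs with feedback are modeled.

def square_wave(period, n_samples):
    """Square wave that is high during the first half of each period."""
    return [1 if (t % period) < period // 2 else 0 for t in range(n_samples)]


def glitch_free_mux(clk_a, clk_b, select):
    """Select clk_a when select is 0 and clk_b when select is 1, without glitches."""
    en_a = en_b = 0
    prev_a, prev_b = clk_a[0], clk_b[0]
    out = []
    for a, b, s in zip(clk_a, clk_b, select):
        fall_a = prev_a == 1 and a == 0
        fall_b = prev_b == 1 and b == 0
        # Each enable updates only on the falling edge of its own clock and
        # only turns on once the other clock's enable has been released.
        next_a = (1 - s) & (1 - en_b) if fall_a else en_a
        next_b = s & (1 - en_a) if fall_b else en_b
        en_a, en_b = next_a, next_b
        out.append((a & en_a) | (b & en_b))
        prev_a, prev_b = a, b
    return out


def run_lengths(samples):
    """Lengths of consecutive runs of equal samples."""
    runs, count = [], 1
    for prev, cur in zip(samples, samples[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


if __name__ == "__main__":
    n = 200
    wclk, rclk = square_wave(16, n), square_wave(6, n)
    select = [0] * 50 + [1] * (n - 50)
    print(run_lengths(glitch_free_mux(wclk, rclk, select)))
```

Because each enable changes only while its own clock is low, no high pulse is ever truncated, and the output stays low during the hand-over until the newly selected clock produces a full pulse.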

#### iii) Pulse Generation Unit

After each 64 clock cycles of clk1M signal, the 64 bits data are stored first into the serial to parallel conversion unit before shifting these data into the memory.



Figure 9: Schematic of the pulse generation unit

The pulse generation unit creates a single pulse in every 64 clock cycles of clk1M. It is based on a 6-bit counter with a synchronous reset. In addition, a 6-input NAND gate generates the required counter pulse signal from the outputs of the 6-bit counter. The schematic of the pulse generation unit is shown in Figure 9.
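The counter-plus-NAND arrangement can be sketched behaviorally as follows. This is a minimal illustration, assuming that the NAND gate decodes the all-ones state of the counter and therefore produces an active-low pulse; the function name `pulse_generator` and the exact count at which the pulse occurs are assumptions, not details confirmed by the schematic.

```python
# Behavioral sketch of the pulse generation unit: 6-bit counter + 6-input NAND.
# Assumption: the NAND decodes the all-ones state, giving an active-low pulse.

def pulse_generator(n_cycles):
    """Return the NAND output for each clk1M cycle, starting from a reset counter."""
    count = 0
    pulses = []
    for _ in range(n_cycles):
        bits = [(count >> i) & 1 for i in range(6)]
        pulses.append(0 if all(bits) else 1)   # 6-input NAND of the counter bits
        count = (count + 1) % 64               # 6-bit counter wraps after 63
    return pulses


if __name__ == "__main__":
    out = pulse_generator(128)
    print([i for i, value in enumerate(out) if value == 0])
```

In this model exactly one cycle out of every 64 carries the pulse, which is the event that transfers a completed 64-bit word into the memory.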

#### iv) Enable Element Unit

The enable circuit element generates the enable signal of the memory unit. It consists of one DFF and a buffer, as shown in Figure 10. The enable signal is derived from the enable SER signal in each clock cycle of the WRclk signal, and it activates the multiplexing system in the memory unit during the read mode. During the write mode, the enable signal is high, so the multiplexing system of the memory unit is deactivated until all the data are stored in the memory. During the read mode, the enable signal is low, so the multiplexing system in the memory unit is activated and allows the memory to reuse the stored data to test the intended device after the serializer unit.

The most important requirement is that the data are completely stored in the memory before the multiplexing system is activated. The multiplexing system may lose one data bit in each row of the memory when the enable selection goes from high to low. This happens if the enable signal goes low before the clk1M-625M signal goes high (after the distribution circuits), in which case unwanted data enter the memory from the multiplexing system. The cause is that the propagation delay of the enable signal is much smaller than that of the clk1M-625M signal. To solve this problem, buffers are introduced before the enable distribution to ensure that the enable signal goes low after the rising edge of clk1M-625M (after the distribution circuits).



Figure 10: Schematic of the enable element.



Figure 11: Output signals of the control unit.

The control unit generates the control signals of the proposed design. It has been implemented successfully, and its functionality has been verified. The input signals of the control unit are CLOCK, ENABLE, RESET, DATA, and Rclk. The output signals are DATAOUT, clk1M, clk1M-625M, enable, and enable SER. The control signals of the control unit are shown in Figure 11.

### E. Serializer Unit

The serializer unit is used to convert the parallel data at low frequency into serial data at high frequency. In other words, it is responsible for speeding up the data from low frequencies to high frequencies. Thereby, the high speed issues and the robustness are considered as the most important keys of designing the serializer unit.

The proposed architecture of the serializer unit is based on the traditional tree structure and is implemented as an arrangement of 2:1 serializer units [9]-[12]. In order to convert 64 bits of parallel data at low frequency into 8 data streams at high frequency, 8 pages of 8:1 serializers with 5 GHz serial data at the output are used. Each 8:1 serializer consists of seven 2:1 serializer units (four Unit625M, two Unit1.25G, and one Unit2.5G). Each set of 8 data bits is separated into even and odd data, which are serialized separately into 1-bit streams at 2.5 GHz. The data are then resynchronized at 5 GHz at the output, so each set of 8 bits is interleaved into a single bit stream at 5 GHz. In order to obtain the correct sequence [b7, b6, b5, b4, b3, b2, b1, b0] at the output, the input data sequence should be b0, b4, b2, b6, b1, b5, b3, and b7, respectively. The proposed 8:1 serializer unit is shown in Figure 12.



Figure 12: Proposed 8:1 serializer unit.
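The input ordering required by the tree structure can be verified with an idealized model of the 8:1 serializer. The sketch below treats each 2:1 serializer unit as a pure interleaver and ignores retiming latches and clock phases; the function names `mux2` and `serialize_8to1`, and the convention that the first input of each 2:1 unit is sent first, are assumptions made for illustration.

```python
# Idealized sketch of an 8:1 tree serializer built from seven 2:1 interleavers.
# Names and the "first input is sent first" convention are assumptions.

def mux2(stream_a, stream_b):
    """Interleave two equal-length streams: a[0], b[0], a[1], b[1], ..."""
    merged = []
    for bit_a, bit_b in zip(stream_a, stream_b):
        merged.extend([bit_a, bit_b])
    return merged


def serialize_8to1(word):
    """Serialize 8 parallel inputs through three stages of 2:1 units.

    Stage 1 uses four units (625 MHz), stage 2 two units (1.25 GHz),
    and stage 3 one unit (2.5 GHz), giving one stream at 5 Gb/s.
    """
    streams = [[bit] for bit in word]
    while len(streams) > 1:
        streams = [mux2(streams[i], streams[i + 1]) for i in range(0, len(streams), 2)]
    return streams[0]


if __name__ == "__main__":
    inputs = ["b0", "b4", "b2", "b6", "b1", "b5", "b3", "b7"]
    print(serialize_8to1(inputs))
```

With the inputs applied in the order b0, b4, b2, b6, b1, b5, b3, b7, the second stage of this model carries the even bits on one branch and the odd bits on the other, and the final stage emits b0 through b7 in time order, consistent with the ordering stated in the text.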

## **III. RESULTS**

In the proposed design (implemented in a 90 nm process), the transistors were set to the minimum channel length, and the NMOS:PMOS width ratios were 1:2.5 for the inverter and 1:2 for the transmission gate. This makes the rise time and fall time at the output node as equal as possible. The simulation results satisfy the process corners TT at 50 °C, SS at 100 °C, and FF at 0 °C. For simulation purposes, a write frequency of 156.25 MHz is used instead of 1 MHz in order to reduce the simulation time of the design.

The test-bench of the complete system is shown in Figure 13. The parameters of the test-bench are the following: TEXclock = 200 ps, tr = tf = 100 ps, TCLOCK = 6.4 ns, TDATA = 12.8 ns, TRESET = 6.4 ns, TENABLE = 64x32xTCLOCK = 13107.2 ns. The data input sequence is the following: [0 1 0 1 0 1 0 1 0 .....]. The functionality of the output data is shown in Figure 14.
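The test-bench timing parameters follow directly from the clock frequencies, as the short check below illustrates. The helper name `period_ns` is hypothetical, and reading TDATA as two write-clock periods (one full cycle of the alternating 0/1 pattern) is an interpretation of the listed values rather than a statement from the original text.

```python
# Consistency check of the test-bench timing parameters (illustrative sketch).

def period_ns(frequency_hz):
    """Clock period in nanoseconds for a given frequency in hertz."""
    return 1e9 / frequency_hz


WRITE_CLOCK_HZ = 156.25e6   # write frequency used in simulation (from the text)
EXCLOCK_HZ = 5e9            # external 5 GHz clock (from the text)

t_clock_ns = period_ns(WRITE_CLOCK_HZ)            # TCLOCK
t_exclock_ps = period_ns(EXCLOCK_HZ) * 1000.0     # TEXclock
t_data_ns = 2 * t_clock_ns                        # one 0/1 cycle of the data pattern
t_enable_ns = 64 * 32 * t_clock_ns                # time needed to fill the memory

if __name__ == "__main__":
    print(t_clock_ns, t_exclock_ps, t_data_ns, t_enable_ns)
```

The computed values agree with TCLOCK = 6.4 ns, TEXclock = 200 ps, TDATA = 12.8 ns, and TENABLE = 13107.2 ns as listed for the test-bench.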



Figure 13: Test-bench of the complete system.



Figure 14: Simulation results of the complete system.

According to the  $\delta$  calculation tables for the different clocks in the design, the acceptable ranges of positive and negative  $\delta W$  are shown in Table I.

TABLE I ACCEPTABLE RANGE OF POSITIVE AND NEGATIVE  $\delta W$  FOR DIFFERENT CLOCKS

| Low clock  | High clock | Positive δW (ps) | Negative δW (ps) |
|------------|------------|------------------|------------------|
| clk1M      | clk1M-625M |                  |                  |
| clk1M-625M | clk625M    |                  |                  |
| clk625M    | clk1.25G   | 663.5            | 35               |
| clk1.25G   | clk2.5G    | 265.5            | 33.5             |
| clk2.5G    | clk5G      | 115              | 34               |

TABLE II POWER DISSIPATION OF THE PROPOSED DESIGN ACROSS DIFFERENT PROCESS CORNERS

Power dissipation of the main parts of the design in mW, obtained from the average current as  $P_{dissipation} = I_{av} \cdot V_{dd} = I_{av}$  for the 1-V supply.

| Main part of the design                            | TT, 50 °C | SS, 100 °C | FF, 0 °C |
|----------------------------------------------------|-----------|------------|----------|
| Control unit                                       | 0.10      | 0.08       | 0.16     |
| Clock divider unit                                 | 1.13      | 1.07       | 1.15     |
| Serial to parallel conversion unit + memory unit   | 3.65      | 2.93       | 6.59     |
| Serializer unit                                    | 0.50      | 0.39       | 1.04     |
| Total power dissipation                            | 5.38      | 4.47       | 8.94     |

## **IV. CONCLUSION**

The on-chip memory-based tester can be considered a new building block for testing MSPS devices at high frequency in an SoC environment. The enormous evolution of CMOS technology has greatly improved the performance of digital ICs in terms of cost, speed, and area. Moreover, during high-speed operation, the on-chip tester methodology overcomes the bandwidth limitation that I/O transmission paths impose on the older, instrumentation-based tester methodology. The on-chip memory is implemented in the tester in order to avoid a large number of very high-speed I/O pads [4].

As a result, the on-chip tester has become the dominant choice in deep submicron technologies compared with the older instrumentation-based tester. However, designing a tester in a 90 nm CMOS process involves tradeoffs among many factors, such as area, power consumption, timing requirements, robustness, and simplicity. In addition, digital circuit designers face significant challenges, such as bringing the data from outside the chip into the on-chip memory, designing the on-chip memory with low area consumption, speeding up the data from low frequencies to high frequencies without losing any data, and designing the serializer with low power consumption.

To overcome these challenges, a serial to parallel conversion unit based on shift registers has been designed to load the data into the on-chip memory at low frequency, shift registers based on master-slave DFFs are used as memory elements in the on-chip memory, an arrangement of 2:1 serializer units based on a traditional tree structure [9]-[12] is used to speed up the data, and different clocks are generated by the clock divider unit for use in the different parts of the design. This thesis has addressed the implementation of an on-chip memory design for testing MSPS devices with a word length of 8 bits at a rate of 5 GS/s. The schematic-level simulation results show that the on-chip tester design has been successfully implemented and that the specification requirements of the design have been achieved.

## REFERENCES

- [1] P. Larsson-Edefors, "High-Speed CMOS Design Bit-Serial Arithmetic Applications and Technology Mapping of Combinational Boolean Equations," Ph.D. dissertation, Physics and Measurement Technology Dept., Linkoping Univ., Linkoping, Sweden, 1995.
- [2] N. H. E. Weste and K. Eshraghian, *Principles of CMOS VLSI Design: A Systems Perspective*, MA: Addison-Wesley, Second Edition, 1993.
- [3] R. Rashidzadeh, "An Embedded Tester Core for Mixed-Signal System-on-Chip Circuits," Ph.D. dissertation, Electrical and Computer Engineering Dept., Windsor Univ., Windsor, Canada, 2007.
- [4] J. Muller, B. Stefanelli, A. Frappe, L. Ye, A. Cathelin, A. Niknejad, and A. Kaiser, "A 7-Bit 18<sup>th</sup> Order 9.6 GS/s FIR Up-Sampling Filter for High Data Rate 60-GHz Wireless Transmitters," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 7, pp. 1743-1756, July 2012.
- [5] K. Runge and P. B. Thomas, "5Gbit/s 2:1 multiplexer fabricated in 0.35μm CMOS and 3Gbit/s 1:2 demultiplexer fabricated in 0.5µm CMOS technology," *IEEE Electronic Letters*, vol. 35, no. 19, pp. 1631-1633, September 1999.

- [6] F. Tobajas, R. Esper-Chaín, R. Regidor, O. Santana, and R. Sarmiento. A Low Power 2.5 Gbps 32:1 Serializer in SiGe BiCMOS Technology [Online]. Available: http://www.iuma.ulpgc.es/publicaciones/2005.html
- [7] M. Wong, "A 2.5Gbps CMOS Serial Link Transceiver Design," M.S. thesis, Electrical Engineering Dept., National Central Univ., Taiwan, China, 2002.
- [8] R. Mahmud. (2003, June 26). Techniques to make clock switching glitch free [Online]. Available: http://www.eetimes.com/document.asp?doc_id=1202359
- [9] D. F. Tondo, and R. R. López, "A Low-Power, High-Speed CMOS/CML 16:1 Serializer," *Proceedings of the Argentine School of Micro-Nanoelectronics, Technology and Applications,* San Carlos de Bariloche, pp. 81-86, October 2009.
- [10] J. H. Shim, S. Byun, J. C. Lee, K. Kim, and C. S. Kim, "A low-power 10-Gb/s 0.13-μm CMOS Transmitter for OC-192/STM-64 Applications," 50th Midwest Symposium on Circuits and Systems, Montreal, Que., pp. 1165-1168, August 2007.
- [11] K. Ishii, H. Nakajima, H. Nosaka, M. Ida, K. Kurishima, S. Yamahata, T. Enoki, and T. Shibata, "Over 40 Gbit/s 16:1 multiplexer IC using InP/InGaAs HBT technology," *IEEE Electronic Letters*, vol. 39, no. 12, June 2003.
- [12] F. Znidarsic, E. Mullner, and R. Strunz "16:1 retiming multiplexer for 10 Gbit/s in Si production technology," *IEEE Electronic Letters*, vol. 32, no. 3, pp. 207-209, February 1996.