Full Custom Design of an Arbitrary Waveform Gate Driver With 10-GHz Waypoint Rates for GaN FETs

Active gate driving of power devices seeks to shape switching trajectories via the gate, for example, to reduce EMI without degrading efficiency. To this end, driver ICs with integrated arbitrary waveform generators have been used to achieve complex gate signals. This article describes, for the first time, the implementation details of a digitally programmable arbitrary waveform gate driver capable of a 10-GHz waypoint rate, including comprehensive design considerations for critical high-speed subsystems that codify the tradeoff in flexibility, speed, and area. The design, which is taped out in a 180-nm high-voltage CMOS process, utilizes buffers that switch up to ten times in a single clock cycle to overcome the limited achievable clock speed of high-voltage silicon integrated circuits and a fully digital architecture to provide robustness under high slew rates of the ground rail. The driver IC has networks of 100-ps delay elements that are configured prior to a switching transient, to selectively control an array of fast, parallel-connected drivers with different output impedances. Key to the high timing resolution are high-speed asynchronous circuits for memory readout, output buffering, and pulse generation. The driver IC is experimentally evaluated to have a 100-ps resolution and to operate reliably in a 400-V gallium nitride (GaN) bridge leg, under ground-rail voltage slew rates peaking at over 100 V/ns. Design rules are provided to obtain an architecture with the least area for a given set of timing and impedance resolution requirements. The reported design methods enable complex driving waveforms to be applied during nanosecond-scale transients of GaN power devices and demonstrate how digitally programmable active gate drivers for GaN power FETs can be designed to meet a given set of application requirements.


I. INTRODUCTION
IN HARD-SWITCHED power converters, fast power-circuit transients are targeted in order to reduce switching loss, but faster transients tend to increase undesirable circuit behaviors such as overshoots, ringing, and EMI. Traditionally, this has led to a tradeoff between efficiency and these undesirable circuit behaviors: with conventional driving, the gate-drive strength is reduced in order to bring overshoots and ringing to an acceptable level, but this increases loss. Active driving seeks to break this tradeoff to deliver the fastest possible transients in hard-switched converters to reduce loss, whilst simultaneously minimizing overshoots and ringing. In recent years, gate drivers with predefinable output waveforms that can shape switching transients in silicon and SiC power electronic converters have been reported [1]-[5] to reduce, for example, EMI or ringing. During the switching transients of each power device, these active gate drivers modulate their output impedance according to a sequence of preprogrammed waypoints (see Fig. 1).
For gallium nitride (GaN) based circuits where power-circuit transients are measured in units of nanoseconds, a key specification of these drivers is the rate at which waypoints are processed and output. Most drivers are capable of providing only two [6], [7] or three [8], [9] waypoints during a 10-ns transient. A sub-GHz waypoint rate results in limited waveform shaping capability, but rates greater than 1 GHz are beyond achievable clock speeds in high-voltage (HV) CMOS silicon processes that are typically used for gate drivers.
Active voltage control using closed-loop analog feedback drivers has been demonstrated for Si devices [10]-[12]. Here, the main challenge is to achieve sufficient analog bandwidth to make the power waveforms follow a reference waveform with nanosecond transient features, whilst maintaining an acceptable power consumption. Such closed-loop techniques are beginning to be developed for GaN [13]; however, achieving the required speeds is a challenge, as some of the unwanted features in switching transients (such as the temporary inductive drop in drain−source voltage during turn-ON) have frequency components upwards of tens of GHz [14].
Previously, we demonstrated the first active gate driver IC capable of multi-GHz waypoint rates and the resulting reduction in EMI [15] and cross-talk [16] in GaN bridge legs. The improvements presented in these papers required a subnanosecond spacing between waypoints. Whilst the primary purpose of this gate driver is to act as a research tool to inform the design and implementation of active gate driving in fast GaN-based converters, applications that could ultimately benefit from active gate driving range from low-voltage point-of-load modules to mains-voltage circuits such as power factor correction, solar PV dc-dc converters and inverters, etc.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this article, we report the implementation details of the driver IC for the first time. The top-level architecture, circuit designs, tradeoff analyses, and experimental results presented in this article demonstrate how digitally programmable active gate drivers for GaN power FETs can be designed to meet a given set of application requirements.
Section II briefly reviews prior art of high-speed active driver architectures and then introduces the architecture of our gate driver including the high-speed asynchronous circuits for memory readout, output buffering, and pulse generation that are critical to reach 10-GHz waypoint rates. Quantified, high-level design rules and tradeoffs are provided.
Section III introduces circuit designs for the key functions that enable fast update rates, with supporting simulation results. Design insights gained over taping out and characterizing two generations of the IC have been used to optimize the high-speed circuits and produce a third-generation IC, where the stability of the driver has been greatly improved, while achieving rates of 104 waypoints in 10 ns.
In Section IV, a set of experiments validates the operation of the three high-speed subsystems and demonstrates the ability of the driver to perform under high ground-rail slew rates peaking at over 100 V/ns, with a voltage swing of 400 V.

A. Prior Art of High-Speed and Programmable Active Gate Drivers
Active gate drivers for GaN FETs that provide more than a simple step function can be divided into two families: threshold triggered and digitally programmable. Threshold-triggered active gate drivers have their driving strength updated through predefined behaviors at specific gate voltage levels; for example, when the gate reaches a specific threshold voltage [6], [7] or a combination of reaching the gate threshold voltage and Miller plateau voltages [8]. In principle, this analog method could place gate drive profiles exactly where needed; however, in practice, the latency of the sense and trigger circuits limits switching speed, and the power consumption of the control circuitry can be prohibitively high [9].
Digitally programmable active gate drivers use predefined waypoint sequences, generated using data from previous switching transients. It is challenging, however, to implement driver ICs fast enough to switch GaN devices. This is partly due to speed limitations in HV CMOS processes and partly due to the large common-mode voltage disturbance created by the high slew rate, typically up to 100 V/ns, of the driver's ground reference [14]. These slew rates are a challenge for both families of drivers, and make accurate in-circuit sensing difficult. Due to these challenges, digitally controlled arbitrary waveform drivers reported to date (e.g., [1]-[5]) can only be used to switch slower silicon power devices. Their architecture is illustrated in Fig. 2.
The architecture comprises an FPGA/MCU that stores the waypoint sequences and a driver IC with parallel buffers. The communication between FPGA and IC limits the speed; however, even if a sequence were preloaded onto on-chip memory, the waypoint rate would still be limited by the clock period. With current HV fabrication processes, this imposes a time between waypoints of greater than 1 ns. In addition, the floating, high-side drivers of a GaN converter require isolated level shifters, which further limit the data loading speed.

B. Requirement for High Speed, and the Timing Challenge
In the 600−700 V power device range, the move from silicon IGBTs or super-junction MOSFETs to GaN FETs has seen device switching transients drop from several hundred nanoseconds to 5−10 ns. Thus, waveform features that cause EMI contain significantly higher frequency spectral components, such as sub-ns current overshoot at turn-ON and ringing at hundreds of MHz. In order to have ten waypoints within such undesirable waveform features, a waypoint rate of at least 10 GHz is required.
This requirement for high speed presents the timing challenge illustrated in Fig. 3. An undesired drain current overshoot, lasting less than 1 ns, is depicted. To combat the overshoot, the gate-drive strength is varied up to ten times per nanosecond. Further, the gate driver output resistance needs to be modulated during the entire 5−10 ns switching transient, covering the transient periods of both current and voltage waveforms, and the delay from when gating activity starts to when this activity becomes visible in the power waveforms. However, typical clock frequencies for gate drivers lie between 100 MHz and 1 GHz. That is one to two orders of magnitude slower than required, if only one change in gate-drive strength were to occur in a clock cycle. Thus, gate modulation must occur several times within one clock cycle and span several clock cycles.
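The gap between clock rate and waypoint rate can be made concrete with a little arithmetic. The following Python sketch (the function names are illustrative and not part of the driver design) computes how many drive-strength changes must fit into one clock cycle, and how many clock cycles one transient spans, for the figures quoted above.

```python
def changes_per_clock(waypoint_rate_hz: float, clock_hz: float) -> float:
    """Drive-strength changes that must occur within one clock cycle."""
    return waypoint_rate_hz / clock_hz


def cycles_spanned(transient_s: float, clock_hz: float) -> float:
    """Number of clock cycles covered by one switching transient."""
    return transient_s * clock_hz


WAYPOINT_RATE_HZ = 10e9  # ten changes per nanosecond (from the text)
TRANSIENT_S = 10e-9      # upper end of the 5-10 ns transient (from the text)

if __name__ == "__main__":
    for clock_hz in (100e6, 1e9):  # typical gate-driver clock range
        print(
            f"{clock_hz / 1e6:.0f} MHz clock: "
            f"{changes_per_clock(WAYPOINT_RATE_HZ, clock_hz):.0f} changes/cycle, "
            f"{cycles_spanned(TRANSIENT_S, clock_hz):.0f} cycles/transient"
        )
```

At the upper end of the clock range, ten changes per cycle are still needed, which is why a purely clocked output stage cannot reach the target rate.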
The architecture presented in this article meets these speed and timing challenges. This is accomplished primarily through the use of asynchronous techniques to achieve pulse durations that are a fraction of the clock period and custom high-speed circuits for memory readout and output buffering. It is anticipated that the speed-up techniques of this article will contribute to both families of active driver: threshold triggered and digitally programmable; this article focuses on the latter.

C. Top-Level Architecture of Multi-GHz Gate Driver
The architecture of the gate driver is shown in Fig. 4. One of the main innovations to achieve a 10× improvement in waypoint rate is a pulse-forming circuit that, once triggered, runs asynchronously during a single clock cycle. The resulting pulses act as trigger signals for parallel drivers, at a rate much faster than the internal clock. Other innovations include custom-designed on-chip memory with high-speed readout and an output stage comprising buffers with different timing and impedance resolutions: a clocked "coarse" driver is augmented with a faster, asynchronous "fine" driver. The coarse and fine drivers both consist of parallel subdrivers of increasing strength. The fine driver is able to pull in either direction during a driving transient, providing the important ability to create a momentary "dip" (for turn-ON) or "peak" (for turn-OFF) in the gate voltage and/or current and the additional benefit of even finer effective drive-strength modulation than is possible if all drivers can only pull in the same direction. The coarse driver is too slow to instantaneously change pull direction; so when creating momentary dips or peaks in the output trajectory, the fine driver can be pulling in opposition, leading to internal overlapping of pull-up and pull-down. This would be avoided if the output stage consisted entirely of fine drivers, but this requires a large silicon area, as explained in the following sections. With the proposed solution, any internal shoot-through duration is extremely brief, causing currents that lie below the electromigration limit and at most mW-levels of heating. This is observable in the power consumption of the IC, as presented in Section IV-D.
The on-chip memory holds approximately 1 kb of timing and drive-strength data for both drivers, for both the turn-ON and turn-OFF transitions of the power device. Prior to power-circuit operation, the memory is written serially through the data pin SIN using an external clock SCLK [15], [16]. On each edge of the pulsewidth modulation (PWM) signal (see Fig. 4, left), data for eight consecutive gate-drive cycles are read from memory. The gate-drive settings for the eighth cycle are maintained for each subsequent cycle, until the next PWM transition. Following the edge of the PWM signal, each rising edge of the internal clock signal CLK triggers the parallel readout of one cycle worth of data from memory, comprising multiple waypoints. This includes one pull-up or one pull-down resistance value (S N P and S N N ) for the coarse driver, and an array D P of resistance, timing, and pull-direction settings for the fine driver.

D. Fine-Driver Architecture
The operation of the fine driver is illustrated in Fig. 5. The fine-driver settings for a given clock cycle are read in parallel from memory during the preceding clock cycle to trigger the pulse forming circuit. The fine-driver control logic creates multiple gating pulses, one for each parallel subdriver, with specified delay and length. Combining the single-shot output pulses of each subdriver in parallel provides the high-frequency modulated fine-driver output illustrated in Fig. 5.
The number of fine subdrivers represents a tradeoff between flexibility and chip area and should be chosen to achieve the desired profile complexity. For example, a fine driver with five subdrivers (referred to as a 5-b driver) of decreasing strength (1 Ω, 2 Ω, …, 16 Ω) provides a driving strength from 0.5 to 16 Ω (for an $M$-b driver, $2^M - 1$ strength settings are possible). Since fine drivers can only be supplied with their settings and triggered once per clock cycle, ten waypoints per clock cycle require ten drivers. In the case of a 1-GHz clock, this would provide the flexibility to select any of the strength settings every 100 ps. This is illustrated in Fig. 6(a). A total of 50 subdrivers are needed, and each requires only a binary flag per clock cycle to determine if it should be triggered, as the timing is inherent in this architecture.
An alternative with less flexibility but much less chip area is illustrated in Fig. 6(b). Here, only one 5-b driver is used, but each subdriver has 4-b delay and pulsewidth settings. As a result, all of the strength settings (i.e., $2^M - 1$ for an $M$-b driver) are still available throughout the whole clock cycle. However, once a subdriver has completed its pulse, it is no longer available for that clock cycle. This reduces the flexibility in achievable strength profiles but retains the waypoint rate achievable with ten 5-b drivers, with a tenfold reduction in the chip area occupied by the output stages.
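The strength settings and subdriver counts quoted for the two options of Fig. 6 can be checked numerically. The sketch below is illustrative only: it assumes ideal resistive subdrivers connected in parallel, and the helper names are hypothetical.

```python
from itertools import combinations


def parallel(resistances):
    """Equivalent resistance of ideal resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def strength_settings(subdriver_ohms):
    """All distinct output resistances from non-empty subsets of subdrivers."""
    values = set()
    for k in range(1, len(subdriver_ohms) + 1):
        for combo in combinations(subdriver_ohms, k):
            values.add(round(parallel(combo), 9))
    return sorted(values)


M = 5
subdrivers = [1.0 * 2**k for k in range(M)]  # 1, 2, 4, 8, 16 ohm
settings = strength_settings(subdrivers)

# Option (a): one M-b driver per 100-ps slot; option (b): a single M-b driver.
subdrivers_option_a = 10 * M
subdrivers_option_b = M

if __name__ == "__main__":
    print(len(settings), "settings from",
          f"{settings[0]:.3f} to {settings[-1]:.0f} ohm")
    print("subdrivers:", subdrivers_option_a, "vs", subdrivers_option_b)
```

With binary-weighted subdrivers every subset gives a different conductance, so the number of settings equals the number of non-empty subsets.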
The second option was chosen to keep the driver area below 5 mm² and parasitic output capacitance low. Whilst this reduces the degrees of freedom in creating gate profiles, experience with testing across the three generations of chips shows that a few well-timed pulses can dampen typical noise artifacts such as 400 MHz ringing in the power waveforms (compact GaN layouts tend to ring at frequencies between 100 and 800 MHz due to small power-loop inductance and device capacitance). Further, as the application involves the charging or discharging of power-device gate capacitance, a desired change in gate voltage in a given time interval can be achieved by adjusting either strength levels or pulse durations.

E. Achieving the Required Waypoint Rate
For the architecture of Fig. 6(b), the exact number of subdrivers to use in the fine driver (i.e., the value of $M$; the previous section used $M = 5$ for illustration) depends on the required waypoint rate. $M$ subdrivers provide $2^M - 1$ strength levels and $M$ individual driving pulses per clock cycle. Each pulse provides two changes in driving strength, and adding these to the step provided by the coarse driver results in a total of
$$N_{\mathrm{Clk}} = 2M + 1 \tag{1}$$
step changes per clock cycle. Over a complete transient, the number of available steps is therefore
$$N_{\mathrm{Trans}} = (2M + 1)\, f_{\mathrm{Clk}}\, T_{\mathrm{Trans}} \tag{2}$$
where $f_{\mathrm{Clk}}$ is the clock frequency and $T_{\mathrm{Trans}}$ is the duration of the power device transient. This article targets transients lasting up to 10 ns, requiring $N_{\mathrm{Trans}} = 100$ steps. Substituting these values into (2) and isolating the clock frequency gives
$$f_{\mathrm{Clk}} = \frac{N_{\mathrm{Trans}}}{(2M + 1)\, T_{\mathrm{Trans}}} = \frac{10\ \mathrm{GHz}}{2M + 1}. \tag{3}$$
This relationship between fine-driver bit-depth $M$ and clock frequency is plotted in Fig. 7, representing the tradeoff between these two parameters for a given required driving flexibility. The figure illustrates an upper bound on the frequency due to technology, and a bound on the number of fine subdrivers due to chip area constraints, leaving the allowable design space. The area bound shown is 3 mm², and the clock bound is 800 MHz, for illustration. The plot assumes the use of fine subdrivers of increasing strengths in a binary sequence starting from 64 Ω (i.e., followed by 32 Ω, 16 Ω, etc.) due to increasing output transistor size. A 6-b fine driver (i.e., having six parallel subdrivers of strengths from 64 to 2 Ω) occupies 0.48 mm² in the chosen technology (see Section II-F).
The number of fine subdrivers for the IC has been derived based on the following considerations.
1) The driver output voltage is specified by the power device's driving requirements, which, in turn, dictates the fabrication process.
2) This process dictates the maximum reliable clock frequency indicated by the upper bound in Fig. 7.
3) The chosen clock speed then provides the minimum required number of fine subdrivers via the curve of Fig. 7.
The minimum required output voltage in this case is 5 V and the chosen fabrication technology is a 180-nm HV CMOS process by AMS. The design clock frequency for memory readout, based on experience from two previous fabrication runs, is 800 MHz. Using (3), the resulting minimum number of fine subdrivers is M = 6.
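The selection of M can be reproduced with a short calculation. The sketch below assumes, as described in this section, that each of the M fine pulses contributes two strength changes per clock cycle and the coarse driver contributes one, giving 2M + 1 steps per cycle; the function names are hypothetical.

```python
def steps_per_clock(m: int) -> int:
    """Strength changes per clock cycle: two per fine pulse plus one coarse step."""
    return 2 * m + 1


def min_fine_subdrivers(n_trans: int, t_trans_s: float, f_clk_hz: float) -> int:
    """Smallest M for which (2M + 1) * f_clk * T_trans reaches n_trans steps."""
    m = 1
    while steps_per_clock(m) * f_clk_hz * t_trans_s < n_trans:
        m += 1
    return m


def required_clock_hz(n_trans: int, t_trans_s: float, m: int) -> float:
    """Clock frequency needed to deliver n_trans steps with M fine subdrivers."""
    return n_trans / (steps_per_clock(m) * t_trans_s)


if __name__ == "__main__":
    m = min_fine_subdrivers(n_trans=100, t_trans_s=10e-9, f_clk_hz=800e6)
    print("minimum M at 800 MHz:", m)
    print(f"clock needed for M = {m}: "
          f"{required_clock_hz(100, 10e-9, m) / 1e6:.0f} MHz")
```

With M = 5 the 800-MHz clock yields only 88 steps in 10 ns, which is why six subdrivers are the minimum for the 100-step target.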

F. Final Drive-Strength Range and Memory Requirement
The 6-b fine driver with subdrivers ranging from 64 to 2 Ω has a maximum drive strength of around 1 Ω, when all drivers are simultaneously active. This is in line with commercial single-step drivers for GaN transistors at the upper end of available transistor current ratings (around 100 A; the gate capacitance scales with power-device current rating).
The clocked coarse driver is much simpler than the asynchronous fine driver. Adding coarse-driver capacity, therefore, allows the total driving strength to be increased at a much smaller area penalty. It operates in parallel with the fine driver, allowing, for example, 1-Ω driving to be maintained continuously for a clock cycle, while the fine driver can exercise its full range in a rapid sequence during the same clock cycle. A coarse driver consisting of eight subdrivers with increasing strength in a binary sequence, 36−0.28 Ω, has been designed to permit (coarse) driving strengths up to 0.14 Ω. This enables the coarse driver to initiate faster switching transients than commercial drivers, leading to lower switching loss, whilst the fine driver combats undesirable EMI and ringing brought about by the increased switching speed. The resulting coarse driver occupies 1.11 mm². Fig. 8 shows the total area required for the output stages of the architecture presented in this article, as a function of how much of the total maximum drive strength is provided by the clocked coarse driver: 100% of the drive strength being provided by the fine driver requires an area of 3.65 mm², whereas 100% of the drive strength being provided by the coarse driver requires only 1.29 mm². The implemented design is highlighted. For specific GaN devices and applications, smaller driver stages are viable, and it may be appropriate to trade off some of the coarse driver for fine-driver area or vice versa, depending on the degree of flexibility required.
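The quoted maximum drive strengths follow directly from the binary-weighted subdriver values. The sketch below treats each subdriver as an ideal resistor (an idealization; the real output stages are transistors) and uses hypothetical helper names.

```python
def binary_ladder(weakest_ohm: float, count: int):
    """Subdriver resistances halving from the weakest to the strongest."""
    return [weakest_ohm / 2**k for k in range(count)]


def parallel(resistances):
    """Equivalent resistance with all subdrivers active at once."""
    return 1.0 / sum(1.0 / r for r in resistances)


fine = binary_ladder(64.0, 6)    # 64, 32, ..., 2 ohm
coarse = binary_ladder(36.0, 8)  # 36, 18, ..., 0.28125 ohm

if __name__ == "__main__":
    print(f"fine driver, all on:   {parallel(fine):.3f} ohm")
    print(f"coarse driver, all on: {parallel(coarse):.3f} ohm")
    print(f"strongest coarse subdriver: {coarse[-1]:.2f} ohm")
```

The results (about 1.02 Ω and 0.14 Ω) agree with the rounded figures given in the text.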
With the subdrivers defined, and the timings known, the memory requirements can be calculated. Since the target clock period is 1.25 ns (800 MHz), eight clock cycles are required for a total active control period of 10 ns. Thus, control data for eight clock cycles is needed in memory, both for turn-ON and turn-OFF of the GaN device. A 100-ps resolution has been obtained by using a 6-b fine driver, and, therefore, each clock period allows up to 13 step changes [see (1)], resulting in a maximum of 104 impedance changes per switching transient.

A. High-Speed Memory Readout
The memory readout circuit is shown in Fig. 9. The driver IC is capable of loading switching patterns for eight complete cycles (corresponding to eight clock cycles) in one go. The data set comprising all control values for the fine and coarse drivers, for both turn-ON and turn-OFF for eight cycles, is 1024 b, stored in an on-chip sequence memory. At the start of each of these eight clock cycles, a new data set comprising 64 b is loaded to control the output stages. These 64 bits define the delays, pulse lengths, and pull direction for the fine driver's six subdrivers and the coarse driver's driving strength. In order to facilitate dynamic control of a power converter, the delay between the arrival of the PWM switching command and the beginning of the actual switching transient should be as short as possible. Therefore, the selection of data, readout, and writing to the output registers should take as few clock cycles as possible. The final write step to the output registers should take only one cycle to enable successive cycles to use different fine-driver settings. In addition, the circuits need to be designed to enable a high clock frequency approaching 1 GHz, the highest possible frequency with which the memory can be operated in the chosen process.
Using full parallel readout, the propagation delay for readout, including sampling of the PWM load command, is four clock cycles or 5 ns at 800 MHz. This compares favorably with the propagation delay of many commercially available GaN gate drivers [17]-[19], whilst other drivers with lower delay are available [20]. The implementation of this approach is illustrated in Fig. 9. The sequence selection circuit selects the appropriate eighth of the contents of the sequence memory (128 b, referred to as a segment) in each successive clock cycle, using an 8-b synchronous counter. Each 128-b segment contains the data for both switching polarities, i.e., turn-ON and turn-OFF. Therefore, the sequence stream control circuitry in Fig. 9 selects the half (64 b) for the relevant switching polarity and writes it in parallel to the output register in one clock cycle. Two clock cycles are required to detect the PWM switching command (bottom left of Fig. 9). As the PWM command invokes a simultaneous change in pull direction of all coarse subdrivers, a safety dead-time of one clock cycle is introduced where these drivers are open circuit (high impedance) to avoid overlap and driver-internal shoot-through currents (accomplished by the "High-Imp. sequence" block in Fig. 9).
A synchronous counter is chosen over asynchronous types, as it typically enables a higher clock frequency. All circuits have been designed using a full custom approach, which is detailed in Section III-D, in order to achieve high-speed operation. Fig. 10 shows how the simulated main control signals for memory segment selection transition over two 8-cycle sequences, with each sequence triggered by a transition in the PWM signal. Signals S0−S7 are outputs of the segment counter in Fig. 9, which select the corresponding data segments 0−7 via an 8:1 multiplexer. The signal DT controls dead-time, setting the output to high impedance for one clock cycle and also resetting the segment counter at the start of each switching transient.
The sequence readout starts from the first CLK cycle after the PWM signal is sampled (PWM_S). After DT is set, counter outputs S0−S7 select successive data segments, after which Segment 7 remains selected until a DT pulse is triggered by a new external PWM transition.
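The segment selection described above can be captured in a small behavioral model. This is a sketch, not the circuit itself: the memory is modeled as a flat list of 1024 entries, and the assumption that the turn-ON half of each segment is stored first is purely illustrative, as are the function names.

```python
SEGMENTS = 8        # one segment per clock cycle of the transient
SEGMENT_BITS = 128  # turn-ON and turn-OFF data for one cycle
WORD_BITS = 64      # data written to the output register per cycle


def segment_sequence(n_cycles: int):
    """Yield the selected segment for each clock cycle after the DT pulse.

    Segments 0..7 are selected in turn; segment 7 then stays selected
    until the next PWM transition restarts the sequence.
    """
    for cycle in range(n_cycles):
        yield min(cycle, SEGMENTS - 1)


def select_word(memory, segment: int, turn_on: bool):
    """Return the 64-b word for one segment and one switching polarity."""
    if len(memory) != SEGMENTS * SEGMENT_BITS:
        raise ValueError("sequence memory must hold 1024 bits")
    start = segment * SEGMENT_BITS
    seg = memory[start:start + SEGMENT_BITS]
    # Assumed ordering: turn-ON half first, turn-OFF half second.
    return seg[:WORD_BITS] if turn_on else seg[WORD_BITS:]


if __name__ == "__main__":
    memory = [0] * (SEGMENTS * SEGMENT_BITS)
    print(list(segment_sequence(10)))
    print(len(select_word(memory, 0, turn_on=True)), "bits per cycle")
```

Holding the last segment corresponds to the gate-drive settings of the eighth cycle being maintained until the next PWM edge.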

B. 100-ps Resolution Pulse-Forming Circuit
The pulse-forming circuit block in Fig. 4 creates six pulses in each clock cycle. Each pulse controls one fine subdriver, as illustrated in Fig. 5. The delays and durations of the pulses are contained in 56 of the 64 bits of data in the output register (see Fig. 9) and are refreshed every clock cycle. The pulse-forming circuit is shown in Fig. 11. The programmable delay blocks (local and global delay and pulse-duration control) are implemented as pipelines of 100-ps delay cells. These delay cells are composed of 1.8-V CMOS transistors and use the same topology as the 150-ps delay circuit reported in [15].
A combinational circuit Posedge Detect generates a return-to-zero pulse of width 200 ps at node A on a rising edge of CLK. Delay blocks local delay and global delay allow the pulses at their inputs to be delayed by any value between 0 and 600 ps in multiples of 100 ps, set by bits $D_7$−$D_4$ (local delay) and bits $D_{G3}$−$D_{G0}$ (global delay). Fig. 12(a) illustrates the desired functionality of the local and global delay blocks, using pipelined 100-ps delay cells to control the delay from 0−600 ps. Fig. 12(b) shows the actual implementation, which uses interleaving or duplication to control the capacitive loading of the output net by the switches. Two 0−300 ps delay control circuits are cascaded to give 0−600 ps controllability. The number of control bits is increased to four; however, the loading of the output node is significantly reduced, and the pulse slew rate is more than doubled.
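The cascaded delay block can be illustrated with a simple model. The sketch below assumes that each of the two 0−300 ps stages is set by a 2-b binary code selecting 0 to 3 nominal 100-ps cells; this encoding and the function names are assumptions for illustration, since the actual bit mapping is defined by the circuit of Fig. 12(b).

```python
CELL_PS = 100  # nominal delay of one delay cell


def stage_delay_ps(select: int) -> int:
    """Delay of one 0-300 ps stage for a 2-b tap selection (0..3)."""
    if not 0 <= select <= 3:
        raise ValueError("stage selection must be a 2-b value")
    return select * CELL_PS


def block_delay_ps(code: int) -> int:
    """Total delay of two cascaded stages for a 4-b control code.

    Assumed encoding: upper two bits set the first stage,
    lower two bits set the second stage.
    """
    if not 0 <= code <= 0b1111:
        raise ValueError("control code must be a 4-b value")
    return stage_delay_ps(code >> 2) + stage_delay_ps(code & 0b11)


if __name__ == "__main__":
    delays = sorted({block_delay_ps(code) for code in range(16)})
    print(delays)
```

Under this assumed encoding, the 16 codes map onto seven distinct delays, so some delays can be reached with more than one code.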
The turn-ON of a fine subdriver is defined by a pulse generated by the output of a NAND SR latch (node E in Fig. 13). The timing of the set (C) and reset (D) signals to this latch is controlled by selecting the appropriate tap from a delay line of the pulse-duration control block. The duration is controlled by bits $D_2$ and $D_1$ via a 4:1 Mux, permitting durations of 300, 400, 600, or 800 ps to be selected. Shorter pulses have not been implemented for reasons of reliability; to obtain shorter pulses of 100 or 200 ps in the gate-drive output, two fine-subdriver pulses can be overlapped accordingly. Finally, $D_3$ (see Fig. 11) enables or disables the driver stage, while $D_0$ (Fig. 11) is used to select the pull direction for this individual fine driver.

C. Fine-Driver Output Unit Cell
The output pulses for each fine subdriver require buffering to different degrees to create the required output strengths (2, 4, …, 64 Ω). The propagation delays of these different buffers should be matched as closely as possible. The pulses have a minimum duration of 300 ps and a 5-V swing, and it is important to obtain slew times that are significantly shorter than the timing resolution of 100 ps. In order to achieve this, core 1.8-V FETs and 5-V HV CMOS FETs are used to form the 64-Ω buffer unit cell shown in Fig. 14, from which the different buffers are assembled. The 64-Ω resistance is created by choosing appropriate width/length ratios of the 5-V transistors.
Each 64-Ω cell has pull-up and pull-down inputs, to pull the output up to the 5-V rail or down to ground. The pull-up signal requires level-shifting with the shortest possible propagation time. This cell has a totem-pole output in cascode configuration, where fast 1.8-V transistors (MN1 and MP1) provide the [...] 0 and 1.8 V). The downside of this configuration is the weaker inversion compared to directly driven transistors, leading, in part, to the 2.86 area ratio between the fine and coarse drivers for a given strength (see Section II-F). A useful characteristic of this cascode output stage is that the source nodes of MHP1 and MHN1 are low impedance and have a high output current bandwidth [21], which supports the generation of 300-ps current pulses.
Fig. 11. Pulse-forming circuit for one fine subdriver realized with 1.8-V logic. As per Fig. 5, the control bits $D_0$−$D_8$ and $D_{G0}$−$D_{G3}$ are read from memory during the previous internal clock cycle, ready for the blocks depicted here to be activated upon the rising edge of CLK.
The level shifter for the pull-up side is a third cascode comprising two 1.8-V NMOS FETs with a resistor load. The upper gate is biased at 3 V, resulting in each 1.8-V NMOS blocking 2.5 V under static conditions, which is well below the specified breakdown voltage of 5 V. This cascode draws current for the duration that the pull-up driver needs to remain on, but this is typically a narrow, sub-ns pulse, unlike in the coarse driver that remains on for the duration of the clock cycle. This level shifter is shown in postlayout simulations to have an 85-ps delay, providing a good margin to the required 100-ps delay. One advantage of this topology is that it eliminates the requirement for an extra power rail that is 1.8 V below the main supply rail.

D. Layout Requirements
The fabricated gate driver from the first-generation design had a maximum operating frequency of 625 MHz and unreliable fine-driver control. The limiting factor was determined to be the 8-b synchronous segment counter and associated circuits. This had been designed using Verilog HDL, and after the logic synthesis, the layout was automatically placed and routed.
In the third-generation driver, a custom layout of the memory readout and pulse-forming circuits was carried out, with a design frequency of 800 MHz. The main design goals were to increase speed and minimize interference between circuit blocks. The layout was iterated based on results from simulation using transistor-level SPICE models plus postlayout extracted parasitic capacitances and resistances. The resulting layout floor plan is illustrated in Fig. 15.
The circuits are built into an isolation tub that is connected to 1.8 V to reduce substrate noise injection to adjacent analog circuits, including bandgap references and biasing circuits. A guard ring is placed around the isolation ring to further reduce interference, at the expense of a small increase in layout area.
The counter is placed centrally, surrounded only by decoupling capacitors, which reduces capacitive cross-coupling from other digital circuit blocks. In order to reduce cross-coupled capacitance between adjacent selection signals, whilst maintaining the drive strength needed to select a 128-b sequence in one clock cycle, the selection signals are spaced out with interleaved ordering of S0, S3, S6, S4, S7, S2, S5, and S1 and placed on alternating metal layers, as illustrated in Fig. 15. The counter output signals are buffered and split into four, to address four 8:1 multiplexers, which together read 128 b per clock cycle. This maintains the driving ability and minimizes signal skew. The following 2:1 multiplexer and output register are placed as close to the 8:1 multiplexers as possible to keep the connection wires short.
A full custom layout is also used to guarantee that the clock tree is deployed with minimal cross-coupling between different clock buffers and with matched delays to minimize clock skew. The starting point of the clock tree is placed between the readout and pulse-forming circuits to obtain minimum routing wire lengths and to minimize parasitic capacitance.
Decoupling capacitors fill any layout voids to reduce voltage ripple and ringing, thus improving the quality of the 1.8-V power supply. This improves the ability of the pulse-forming circuit to produce fast narrow pulses. The total on-chip decoupling capacitance amounts to 2.6 nF for the 1.8-V rail and 1.4 nF for the 5-V rail.

IV. MEASUREMENT RESULTS
The chief aim of this section is to validate the three high-speed subsystems and to determine if an appropriate balance has been struck in the tradeoff between flexibility and simplicity. It also aims to demonstrate reliable operation under realistic EMI and common-mode interference due to the slewing reference potential. The results focus: 1) on the timing resolution of the fine driver, and 2) on the control of one of the most dynamic causes of EMI in GaN circuits, namely the current overshoot and ringing at turn-ON. Here, it is shown how carefully timed pull-up and pull-down transitions of the fine driver lead to improved waveforms. The derivation of switching sequences is carried out manually [15]; automated control of all switching events in a bridge leg (e.g., turn-OFF of control device) is future work that is beyond the scope of this article.
The driver was fabricated using the AMS H18 HV 180-nm CMOS process. The 2.7 mm × 1.85 mm die is shown in Fig. 16. The die was placed centrally into a QFN32 package, with multiple bond wires on each output-stage power supply and output pad to accommodate an expected peak output current capability of over 20 A.

A. Timing Resolution and Variability
The measured fine-grained 100-ps resolution achieved by the programmable delay cells in the fine driver is shown in Fig. 17. In this experiment, the driver IC is in a socket, driving an RC load (3.3 Ω in series with 1 nF) that emulates the gate of a GaN FET. A total of 15 waveforms, measured with a Rohde & Schwarz RTO1044 operating in equivalent-time sampling mode with an effective sample rate of 4 TSa/s, are shown, synchronized to a common trigger point, and overlaid on top of each other. In all cases, the coarse driver is pulling up; the thick black line shows the output for the case where the fine driver is not activated. For the thinner colored lines, the fine driver is first activated in pull-up, and its delay is incremented in steps of one delay cell, while the coarse driver retains the same weak pull-up setting. This is then repeated with the fine driver pulling down (in opposition to the coarse driver). Accurate and high-resolution time measurements between the overlaid captures can be obtained due to the stable trigger point early in the transient, highly repeatable circuit behavior, and high oscilloscope sample rate. The delay increments are seen to vary approximately from 70 to 100 ps. The oscillation between 12 and 14 ns in the cases where the fine driver pulls up is due to the high inductance of the socket resonating with the capacitive load. Fig. 18 shows the variability of the delay cell due to process variation (a) across a single die and (b) across randomly chosen dies. The delay lies in the range of 60−100 ps; therefore, the nominal maximum delay of 100 ps is never exceeded. In practice, shorter delays are compensated for by the programmability of the pulse strength, delay, and duration, which allows for calibration based on measured readings. Reprogramming as part of a closed-loop feedback system is likely to be necessary anyway, as GaN FET characteristics (saturation current, gate voltage threshold, etc.) are known to change significantly with temperature and operating conditions.
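The calibration idea mentioned above can be illustrated with a minimal sketch. It assumes that the delay cells are chained so that their delays add, and that per-cell delays have been measured beforehand; the function names and the listed delay values are hypothetical and are not measurements from this work.

```python
def cumulative_delays(cell_delays_ps):
    """Delay at each tap when cells are chained: tap n uses the first n cells."""
    total, taps = 0.0, [0.0]
    for delay in cell_delays_ps:
        total += delay
        taps.append(total)
    return taps

def best_tap(cell_delays_ps, target_ps):
    """Return (tap index, achieved delay) closest to the requested delay."""
    taps = cumulative_delays(cell_delays_ps)
    index = min(range(len(taps)), key=lambda n: abs(taps[n] - target_ps))
    return index, taps[index]

# Made-up per-cell delays inside the 60-100 ps range reported in the text.
measured_ps = [95.0, 70.0, 88.0, 62.0, 100.0, 81.0]
tap, achieved = best_tap(measured_ps, target_ps=300.0)
print(tap, achieved)  # prints: 4 315.0
```

With cells shorter than 100 ps, a requested delay is reached by programming more cells than the nominal count, which is the sense in which shorter delays can be compensated by reprogramming.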

B. High-Resolution Shaping of a GaN Device's Gate Signal
Two fabricated driver ICs are operated in double-pulse mode in a 400-V GaN bridge-leg circuit using GS66508P devices [22] (Fig. 19), where high switching dv/dt and di/dt rates subject the drivers to high radiative and conducted interference. Fig. 20 corresponds to the time interval indicated by the bold, red portions of the gate waveform in Fig. 19. It is the point at which the lower GaN FET is turned on, which causes it to take over the load current I_L = −10 A flowing into the switch node. Three situations are captured: one where the gate signal is shaped (active) and two where the output resistance of the active driver is kept constant, at 9 and 18 Ω, respectively, throughout the switching transition. Fig. 20 shows the ground-referenced gate signal (measured using an R&S RT-ZP10 500 MHz passive probe), the drain current (measured using an Infinity sensor [23]), the drain−source voltage (measured using a PMK PHV1000 400 MHz passive probe), the switching loss values (calculated from de-skewed current and voltage waveforms), and a representation of the programmed driver output impedance sequence. The output sequence is plotted on a nonlinear scale with the upper bound (high Z) representing an open circuit.
Fig. 20. Active driving compared to fixed-strength 18- and 9-Ω driving. The top three graphs show the measured switching waveforms at the gate (v_GS1) and drain (i_D1 and v_DS1) of the active device (Q_1). Note that in the bottom graph, which shows a representation of the programmed impedance sequence, the scale is logarithmic and the impedance axis is reversed for the pull-up direction. This means that vertically higher points for the pull-up have lower resistance (higher drive strength). "High Z" refers to a driver being in a high-impedance state.
Fig. 21. Demonstration of high-side driver actively controlling the upper device to shape the drain current, while its ground reference is slewing with a peak of over 100 V/ns. "High Z" refers to a driver being in a high-impedance state.
The driver is seen to shape the gate voltage in a way that simultaneously reduces current overshoot by 29% and damps the ringing, whilst slightly increasing the current slew rate and slightly reducing the switching loss. As a result, the driver has curtailed an important source of EMI without incurring additional power losses. Note that the driver pull-up sequence contains coarse and fine steps, whereas the pull-down sequence only contains fine steps. The sequence uses the ability of the fine driver to transition quickly between pull-up and pull-down.
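As an illustration of how switching loss values can be obtained from de-skewed current and voltage waveforms, the following sketch integrates the instantaneous power over a capture. It is not the authors' processing script: the function names, the skew handling, and the synthetic linear waveforms are assumptions made for the example.

```python
def deskew(samples, shift):
    """Shift a waveform later by `shift` samples (earlier if negative).

    Edge values are held so that the output keeps the original length.
    """
    n = len(samples)
    return [samples[min(max(k - shift, 0), n - 1)] for k in range(n)]

def switching_energy(dt, v_ds, i_d):
    """Trapezoidal integral of instantaneous power v_ds * i_d, in joules."""
    power = [v * i for v, i in zip(v_ds, i_d)]
    return dt * (sum(power) - 0.5 * (power[0] + power[-1]))

# Synthetic turn-on: voltage falls linearly 400 V -> 0 V while current rises
# linearly 0 A -> 10 A over 10 ns (illustrative values, not measured data).
N = 1000
DT = 10e-9 / N
v_ds = [400.0 * (1 - k / N) for k in range(N + 1)]
i_d = [10.0 * (k / N) for k in range(N + 1)]

skew_samples = 0  # hypothetical probe skew; zero means already aligned
energy = switching_energy(DT, v_ds, deskew(i_d, skew_samples))
print(f"{energy * 1e6:.3f} uJ")  # analytic value is V*I*T/6, about 6.667 uJ
```

An uncorrected skew between the current and voltage probes shifts the overlap of the two waveforms and therefore changes the computed energy, which is why the waveforms are aligned before integration.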

C. Operation at 10 GHz Waypoint Rate Under 100 V/ns Slewing
While the driver's reference potential remains static for the experiments used to measure the waveforms shown in Fig. 20, Fig. 21 shows the driver carrying out high-resolution waveform shaping, while its reference potential is slewing with a peak of over 100 V/ns. This test is important as common-mode currents experienced under fast slewing can cause high-speed driver circuits to fail. The measurements of Fig. 21 relate to the upper driver and the top interval indicated in Fig. 19. Here, the high-side driver initiates switching, whilst the current I_L = +10 A is flowing out of the switch node. The driver is seen to actively shape the switching waveforms to simultaneously reduce the duration of current ringing and speed up the switching transient. A relatively slow 36-Ω driving scenario has been included to emphasize that fast switching and the curbing of ringing cannot be simultaneously achieved using a fixed gate resistance.
Fig. 22. Measured gate driver power consumption, when using the gate-drive sequence from Fig. 20 to drive a 1.5-Ω + 1-nF load (power delivered to load not included; so this is power consumption of the gate driver only).
The gate signal has been omitted as it is no longer ground referenced and therefore hard to measure. The driving sequence has been re-optimized with respect to the previous figure, as the parasitic impedances surrounding each GaN device are different, necessitating different control sequences. In this circuit configuration, drive strengths greater than that of the 18-Ω setting (i.e., output resistances below 18 Ω) resulted in very large current overshoot and ringing; therefore, compared to Fig. 20, the overall gate-drive strength used is lower. This results in a slower current transition.

D. Gate Driver Power Consumption
To measure the gate driver power consumption, a test board is used where it is possible to insert ammeters into the 1.8- and 5-V supply lines to the driver. The driver is loaded with a 1-nF capacitor in series with a 1.5-Ω resistor, to emulate the gate of the GS66508P (1 nF resulting in similar charge displacement as a turn-ON event for the GS66508P with V_DS of 400 V and V_GS of 5 V [22]). The gate driver power consumption is then calculated as the total power drawn from the supply rails minus the power drawn by the load, using V_x and I_x, the voltage and current of each power rail of the gate driver, as measured with 6.5-digit multimeters; f_sw, the frequency of the input PWM demand signal; and C, the value of the load capacitor (1.0 nF, measured). In this way, the power draw due to the load is removed, leaving the power consumption of the driver itself. The power consumption has been measured for two scenarios: the first using the drive sequence of Fig. 20 unaltered and the second using this sequence, but with the fine drivers turned OFF. The results are shown in Fig. 22. The second scenario confirms that pull-up of the coarse drivers with simultaneous pull-down of the fine drivers results in mW-levels of heating.
Table I shows reported digital active gate drivers that adjust their output during the switching transient. The driver reported here is the only one with on-chip sequence memory and the only one that allows more than five interactions per switching transient, namely 104 changes in drive strength. It would, therefore, appear to be the only driver with a sufficiently high timing resolution to interact with unwanted features in ns-scale GaN switching transients, based on the fact that, to date, all use cases of this driver have required fine-driver pulses to obtain the desired improved switching waveforms. The driver is also the only one to permit changing of the pull direction during a single transient, which is required to maintain control during the entire turn-OFF transient [15]. The number of available drive strengths is over an order of magnitude higher than the next-best digital driver due to the binary arrangement of unit cells. This driver's output impedance is significantly lower than that of existing digital drivers, at the expense of chip area, in order to permit faster switching, the driving of larger devices, and cross-talk prevention without the need for negative gate-voltage bias [16]. It is worth noting that with any driver with programmable gate signals, the most desirable sequence is likely to change with different devices, power-circuit layouts, and changing operating conditions, and, therefore, further work will be needed to make sequences a function of these parameters.
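The load-power subtraction used in the power-consumption measurement of this subsection can be sketched as follows. The sketch assumes that the capacitive load draws C·V² from the output-stage supply in every switching cycle (charged once and discharged once), so the load power is f_sw·C·V²; this form, the function name `driver_power`, and the numerical readings are assumptions for illustration and are not taken from the measurements.

```python
def driver_power(rails, f_sw, c_load, v_drive):
    """Supply power minus the power drawn by the capacitive load, in watts.

    rails:   list of (voltage, current) pairs, one per supply rail.
    f_sw:    frequency of the input PWM demand signal, in hertz.
    c_load:  load capacitance, in farads.
    v_drive: voltage the load is charged to, in volts.
    Assumption: the load draws c_load * v_drive**2 per switching cycle.
    """
    supply = sum(v * i for v, i in rails)
    load = f_sw * c_load * v_drive ** 2
    return supply - load

# Made-up readings: 1.8 V at 20 mA, 5 V at 10 mA, 100 kHz PWM, 1 nF load.
p = driver_power([(1.8, 0.020), (5.0, 0.010)],
                 f_sw=100e3, c_load=1e-9, v_drive=5.0)
print(f"{p * 1e3:.1f} mW")  # prints: 83.5 mW
```

Because the load term scales with f_sw, the subtraction matters most at high switching frequencies, where the load would otherwise dominate the measured supply power.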

V. CONCLUSION
Key internal design rules and subcircuits for a 10-GHz waypoint rate (100-ps resolution), digitally programmable gate driver have been presented. These aspects enable an order-of-magnitude speed-up compared to reported drivers with different architectures but could likely be applied to these other drivers.
The driver is optimized to be a research tool with high speed and flexibility, to investigate high-resolution, closed-loop, automated gate-pattern generation in GaN power converters. It is anticipated that this flexibility exceeds what a commercial implementation will need, once driving algorithms have been developed for EMI reduction, switching efficiency optimization, device stress minimization, etc. Once the required subset in flexibility is understood, the chip area could be reduced to significantly below 5 mm². Experimentation has identified that the 100-ps resolution is useful when countering ringing, and almost the whole resistance range has been used. However, for a universal research driver, the resistance resolution at the higher (weaker drive strength) end could be increased for driving smaller devices.
At present, initial results show that key roadblocks to the adoption of GaN, such as parasitic ringing, could be addressed with active driving. Importantly, power switching waveforms have been shaped without increasing switching loss. Future work will include simplifying the drive sequence, researching the relation between drive sequence and GaN FET operating conditions, and developing algorithms for optimal drive sequences [24], [25] to realize closed-loop self-adaptive control of switching waveforms. This work will also include identifying and processing suitable feedback signals to serve as the input parameters of the algorithms, e.g., input dc-link voltage, load current, switching current, switching voltage, and temperature. These feedback signals could be detected directly or indirectly [26], with sensing circuits integrated on-chip.