# A Novel STT-MRAM Cell With Disturbance-Free Read Operation

Safeen Huda, Student Member, IEEE, and Ali Sheikholeslami, Senior Member, IEEE

Abstract—This paper presents a three-terminal Magnetic Tunnel Junction (MTJ) and its associated two transistor cell structure for use as a Spin Torque Transfer Magnetoresistive Random Access Memory (STT-MRAM) cell. The proposed cell is shown to have guaranteed read-disturbance immunity; during a read operation, the net torque acting on the storage cell always acts in a direction to refresh the data stored in the cell. A simulation study is then performed to compare the merits of the proposed device against a conventional 1-Transistor-1-MTJ (1T1MTJ) cell, as well as a differential 2-Transistor 2-MTJ (2T2MTJ) cell. We also investigate In-Plane Anisotropy (IPA) and Perpendicular-to-Plane Anisotropy (PPA) versions of the proposed device. Simulation results confirm that the proposed device offers disturbance-free read operation while still offering significant performance advantages over the conventional 1T1MTJ cell in terms of average access time. The proposed cell also shows superior performance to the 2T2MTJ cell, particularly when the cells are targeted for read-mostly applications.

*Index Terms*—Magnetic memory, magnetic multilayers, magnetic tunnel junction, nonvolatile memory, read disturbance, spin transfer torque, VLSI memory.

# I. INTRODUCTION

Recent years have seen considerable research interest in Spin Torque Transfer Magnetoresistive Random Access Memory (STT-MRAM). This technology has been presented as a universal memory [1], as it combines many of the desired characteristics of the different memory technologies currently available in the marketplace. Specifically, STT-MRAM offers non-volatility, high density, and high-speed access [1]. A number of papers have presented STT-MRAM test chips with both high speed access and high density [1]–[4]. In [2], the authors demonstrated a high-speed STT-MRAM chip fabricated in 0.13 $\mu m$ CMOS with a read access time of 8 ns and write access time of 12 ns. In [3], the authors presented a 64 Mb STT-MRAM test chip with a 30 ns cycle time. Furthermore, in [5], the authors presented an analysis showing how a 1 Gb STT-MRAM chip with 10 ns read/write access is achievable in today's technology. These works have made use of the standard 1-Transistor-1-MTJ (1T1MTJ) cell.

S. Huda (corresponding author) and A. Sheikholeslami are with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4 Canada (e-mail: safeen@eeeg.toronto.edu; ali@eeeg.toronto.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2012.2220458

Fig. 1. Conventional 1T1MTJ cell.

While the 1T1MTJ cell offers small area by virtue of comprising a minimal number of circuit components, this cell architecture has a number of significant drawbacks; chief among these are the cell's inherent problems with read disturbance. To alleviate concerns of read disturbance in the 1T1MTJ cell, the read sense current must be restricted, which results in reduced sense margin. On the other hand, to ensure disturbance-free read operation and large sense margin, the critical current of the MTJ must be increased, which results in the need for larger access transistors (and thus larger cell area), increased write access power, and potentially increased write access times. In this paper, we propose a novel memory cell for STT-MRAM, which comprises a penta-layer MTJ (as opposed to the conventional tri-layer MTJ) and offers differential read operation. The cell is shown to offer guaranteed disturbance-free read operation, improved performance especially for read-mostly applications, and improved tolerance to process variation. The rest of the paper is organized as follows: Section II provides background on device physics and the problem of read disturbance, Section III describes the proposed device and cell structure, Section IV describes the comparative study and its results, and Section V concludes the paper.

# II. BACKGROUND

## A. Conventional 1T1MTJ Cell and Device Physics

Fig. 1 shows a conventional 1T1MTJ cell; the cell consists of a transistor in series with a Magnetic Tunnel Junction (MTJ). An MTJ is comprised of two ferromagnetic thin film layers, the Free Layer (FL) and the Pinned Layer (PL), with an oxide-tunneling barrier in between the two magnetic layers, as shown in Fig. 1. In steady state, the magnetization vector of a ferromagnetic thin film is aligned along an axis which is most favorable from the standpoint of minimum potential energy; this axis is known as the easy axis [6]. The orientation of the PL/FL easy axis with respect to the geometry of the ferromagnetic layers gives rise to two distinct types of MTJs: In-Plane Anisotropy (IPA) devices, whose easy axis lies within the plane of the ferromagnetic layers, and Perpendicular-to-Plane Anisotropy (PPA) devices, where the easy axis is oriented perpendicular to the plane of the layers.

Manuscript received April 17, 2012; revised July 09, 2012; accepted July 24, 2012. Date of publication March 07, 2013; date of current version May 23, 2013. This paper was recommended by Associate Editor M. H. Khellah.

The two different directions which the FL magnetization can assume (either parallel or antiparallel to the PL magnetization) determine the state of the MTJ. When in a parallel state, the resistance between PL and FL is low, while when in an antiparallel state, the PL to FL resistance is high; the difference between the resistances of the two states is characterized by the Tunneling Magnetoresistance (TMR) ratio of the device, which is defined as [7]:

$$TMR = \frac{R_{AP} - R_P}{R_P}, \tag{1}$$

where $R_{AP}$ and $R_{P}$ are the resistances of the MTJ when in the antiparallel and parallel states, respectively. In previous generations of MRAM, an external magnetic field was required to switch the state of the FL magnetization, a scheme known as Field Induced Magnetic Switching (FIMS) [8]; STT-MRAM instead employs the spin transfer torque effect [9]. Fig. 2(a) shows Write-0 operation, which aligns the FL magnetization in parallel to the PL magnetization (antiparallel-to-parallel switching). Fig. 2(b) shows Write-1 operation, where the FL magnetization is switched to become antiparallel to the PL magnetization (parallel-to-antiparallel switching). In antiparallel-to-parallel switching, a positive current is passed from the FL to the PL. This causes electrons, which have become spin polarized to the magnetization of the PL, to tunnel to the FL and exert a torque on the FL magnetization, thus causing switching. In parallel-to-antiparallel switching, a current is passed from the PL to the FL; as electrons tunnel from the FL to the PL, the minority spin electrons, which are of opposite spin to the PL magnetization, are reflected back to the FL, and subsequently exert a torque on the FL magnetization, causing it to switch. In either case, the FL magnetization switches only when the torque exceeds some critical value (governed by the magnetization parameters and geometry of the device). The dynamics of the FL magnetization vector are governed by the Landau-Lifshitz-Gilbert-Slonczewski (LLGS) equation [9]:

$$\begin{aligned}
\frac{d\vec{\mathbf{m}}}{dt} = {} & -\frac{\gamma}{1+\alpha^2}\vec{\mathbf{m}} \times \vec{\mathbf{H}}_{EFF} \\
& -\frac{\gamma\alpha}{1+\alpha^2}\vec{\mathbf{m}} \times (\vec{\mathbf{m}} \times \vec{\mathbf{H}}_{EFF}) \\
& -\frac{\gamma\hbar\eta(\theta)}{(1+\alpha^2)2eM_SV}i(t)\vec{\mathbf{m}} \times (\vec{\mathbf{m}} \times \vec{\mathbf{m}}_{PL}),
\end{aligned} \tag{2}$$

where $\alpha$ is the Gilbert damping parameter, $\gamma$ is the gyromagnetic ratio, $\vec{\mathbf{H}}_{EFF}$ is the effective field within the magnetic film, $M_S$ is the saturation magnetization, $V$ is the volume of the FL, $\vec{\mathbf{m}}$ is a unit vector describing the magnetization of the FL, $\vec{\mathbf{m}}_{PL}$ is a unit vector describing the magnetization of the PL, and $i(t)$ is the current passing from the PL to the FL. The spin torque transfer efficiency term, $\eta(\theta)$, is given by [10]:

$$\eta(\theta) = \frac{P_S}{1 + P_S^2 \cos(\theta)} \tag{3}$$

where $\theta$ is the relative angle between the FL and PL magnetization vectors, and $P_S$ is the tunneling Spin Polarization factor (TSP), given by $\sqrt{TMR/(TMR+2)}$ [11]. Under the macrospin approximation [12], the effective field in (2), $\vec{\mathbf{H}}_{EFF}$, is primarily comprised of the anisotropy field $\vec{\mathbf{H}}_{ANI}$ and, for IPA devices, the out-of-plane demagnetizing field $\vec{\mathbf{H}}_D$ as well.
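As an illustrative sketch (not part of the original work), the right-hand side of (2) together with (3) can be evaluated numerically as follows. The function names, the numerical value of $\gamma$, and the unit convention are assumptions made here: the effective field is passed as $\mu_0 H$ in tesla, $M_S$ in A/m, and $V$ in m$^3$, so that every term has units of 1/s.

```python
import math

GAMMA = 1.76e11             # gyromagnetic ratio in rad/(s*T), assumed value
HBAR = 1.054571817e-34      # reduced Planck constant, J*s
E_CHARGE = 1.602176634e-19  # elementary charge, C


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def spin_polarization(tmr):
    """P_S = sqrt(TMR / (TMR + 2)); TMR is a fraction (1.5 means 150%)."""
    return math.sqrt(tmr / (tmr + 2.0))


def eta(theta, p_s):
    """Spin torque transfer efficiency of Eq. (3)."""
    return p_s / (1.0 + p_s ** 2 * math.cos(theta))


def llgs_rhs(m, b_eff, m_pl, current, p_s, alpha, m_s, volume):
    """dm/dt from Eq. (2) for a unit vector m (assumed units: see lead-in)."""
    theta = math.acos(max(-1.0, min(1.0, dot(m, m_pl))))
    pre = GAMMA / (1.0 + alpha ** 2)
    m_x_b = cross(m, b_eff)                 # precession direction
    m_x_m_x_b = cross(m, m_x_b)             # damping direction
    m_x_m_x_pl = cross(m, cross(m, m_pl))   # spin torque direction
    # hbar * eta * i / (2 e M_S V) has units of tesla under the stated convention
    stt_field = HBAR * eta(theta, p_s) * current / (2.0 * E_CHARGE * m_s * volume)
    return tuple(-pre * m_x_b[k]
                 - pre * alpha * m_x_m_x_b[k]
                 - pre * stt_field * m_x_m_x_pl[k]
                 for k in range(3))
```

Because each term in (2) is a cross product with $\vec{\mathbf{m}}$, the returned derivative is perpendicular to $\vec{\mathbf{m}}$, so the magnitude of the magnetization is preserved; this is a convenient sanity check for any integrator built on top of such a function.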


Fig. 2. Write operations for conventional cell (a) "Write-0" operation for conventional cell (b) "Write-1" operation for conventional cell.

## B. Read-Write Tradeoff in STT-MRAM

During a read operation, the current drawn by an STT-MRAM cell—resulting either from the current due to the application of a fixed voltage across the cell, or from the application of a fixed current to the cell—can potentially disturb the data stored in the cell. In order to ensure that a read operation is non-destructive, the current applied through the MTJ during a read operation is limited to be significantly less than the write critical current. As shown in [13], even if the applied current is less than the critical current, the data in the cell may still be destroyed as a consequence of thermal noise processes. A stochastic model for predicting the likelihood of switching the state of an MTJ given a read current,  $I_{read}$ , which is less than the MTJ's critical current,  $I_{C0}$ , is:

$$P_{write} = 1 - \exp\left[-\frac{t_P}{\tau_{(P \to AP)}}\right], \tag{4}$$

where  $t_P$  is the duration in time when  $I_{read}$  is applied to the MTJ, while  $\tau_{(P \rightarrow AP)}$  is given by:

$$\tau_{(P \to AP)} = \tau_0 \exp\left[\frac{K_U V}{k_B T} \left(1 - \frac{I_{read}}{I_{C0}}\right)\right], \tag{5}$$

where $\tau_0$ is the nominal switching time when a current of magnitude equal to $I_{C0}$ is applied to the cell, $K_U$ is the anisotropy constant, $V$ is the volume of the MTJ's FL, $k_B$ is the Boltzmann constant, and $T$ is the temperature in kelvin. The term $K_U V / k_B T$ is also known as the *thermal stability factor*, $\Delta$, and is the ratio between the magnetic energy stored in the cell $(K_U V)$ and the thermal energy $(k_B T)$. Equation (4) indicates that at some finite temperature $T$, and for some read current $I_{read} < I_{C0}$, there exists a finite probability for the cell to be switched, or in other words, for the data to be destroyed. Herein lies the fundamental tradeoff between read stability and critical write current. For instance, if a high performance read operation is desired, a natural way to achieve this is to increase the sensing margin, which requires increasing the read sense current. Using (4) and (5), it can be shown that there is a rapid increase in read disturb rate as the read sense current is increased (and $\Delta$ is kept constant). Therefore, to increase read sense margin while keeping the read disturb rate constant, $I_{C0}$ must be increased (which can be achieved by altering the FL size and/or magnetization parameters), such that the ratio $I_{read}/I_{C0}$ is kept constant. As such, improving read performance by increasing the read sense current inevitably results in increasing the critical current, and thus degrading write power and potentially write speed. Furthermore, since the critical write current decreases for smaller process nodes, the read sense current must also decrease, which potentially makes read sensing more difficult at smaller process nodes.
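To make the sensitivity implied by (4) and (5) concrete, the following minimal sketch evaluates the read disturb probability for several values of $I_{read}/I_{C0}$. The values chosen for $\Delta$, $\tau_0$, and $t_P$ are hypothetical and serve only as an illustration; they are not taken from this work.

```python
import math


def relaxation_time(i_ratio, delta, tau0):
    """Eq. (5): tau = tau0 * exp(Delta * (1 - I_read / I_C0))."""
    return tau0 * math.exp(delta * (1.0 - i_ratio))


def read_disturb_probability(i_ratio, delta, tau0, t_pulse):
    """Eq. (4): P_write = 1 - exp(-t_P / tau)."""
    tau = relaxation_time(i_ratio, delta, tau0)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x
    return -math.expm1(-t_pulse / tau)


# Hypothetical parameters, for illustration only
DELTA = 60.0     # thermal stability factor K_U*V / (k_B*T)
TAU0 = 1e-9      # nominal switching time, s
T_PULSE = 10e-9  # read pulse duration t_P, s

for ratio in (0.2, 0.4, 0.6, 0.8):
    p = read_disturb_probability(ratio, DELTA, TAU0, T_PULSE)
    print(f"I_read/I_C0 = {ratio:.1f}: P_write = {p:.3e}")
```

With $\Delta$ fixed, each increment in $I_{read}/I_{C0}$ multiplies the disturb probability by a factor exponential in $\Delta$, which is the rapid increase referred to above.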

Several cells have been proposed previously, where circuit and/or device level innovations were pursued in a bid to reduce the risk of read disturbance. Circuit level solutions, such as in [4] and [14], attempt to specifically restrict the read current below some threshold (which can be determined for some targeted read disturb rate using (4)), although this will result in degraded read performance. In [14], the authors propose a novel 2-Transistor 1-MTJ (2T1MTJ) STT-MRAM cell, where the two access transistors are turned on in parallel during a write operation, thereby allowing for large access currents through the MTJ, while only one access transistor is turned on during a read operation, which limits the current and thus reduces the risk of read disturbance. Note that while this approach allows for both read and write access at a low error rate, it does not decouple the inherent read-write performance tradeoff characteristic of the conventional 1T1MTJ cell. Recent device level approaches to solving the problems of read disturbance include the device and cell proposed in [15], where a three terminal MTJ provides separate read and write ports; the tunnel junctions corresponding to the two ports can then be optimized independently for read and write access. However, while this cell offers improved read performance and read stability, it suffers from degraded write access performance because the FL of the device was enlarged to accommodate multiple contacts. In contrast, in this work we present a cell which offers unconditional immunity to read disturbance, regardless of the magnitude of the applied read sense current; furthermore, the proposed cell employs various device and circuit level techniques to optimize write performance.

# III. PROPOSED DEVICE

The structure of the proposed device is shown in Fig. 3, with Fig. 3(a) and 3(b) illustrating the IPA and PPA versions of the device, respectively. As shown in the figure, the MTJ is comprised of two PLs: Top Pinned Layer (TPL) and Bottom Pinned Layer (BPL). These layers are stacked vertically above and below the FL, respectively, with a tunneling barrier in between each PL and the shared FL. This device is envisioned to make use of the same processing steps described in [15], [16] to allow for the fabrication of the metallic contact attached to the FL. The fixed magnetizations of the two PLs are antiparallel to one another. In contrast to the device presented in [13], this device also requires metal contacts to all three ferromagnetic layers. Note that when the FL magnetization is parallel to TPL, the resistance between TPL and FL is low, while the resistance between BPL and FL is high. The opposite is true when the FL magnetization is parallel to BPL. In this work, we assign a state of logical "1" when the FL magnetization is parallel to TPL, and a state of logical "0" when the FL magnetization is parallel to BPL. Circuit level symbols for the IPA and PPA devices are shown in Fig. 4(a) and 4(b), respectively, while Fig. 4(c) shows the proposed cell. The cell consists of two transistors, $M_1$ and $M_2$, which are connected to the TPL and BPL of the proposed device, respectively. Note that the FL of the proposed device is connected to the Select Line (SL), and the two transistors are connected to the same Word Line (WL). Finally, Fig. 5 shows the top level organization of a hypothetical memory chip comprising the proposed cells.

Fig. 3. Structure of proposed device (a) IPA version of proposed device (b) PPA version of proposed device.

Fig. 4. Proposed device symbols and proposed cell schematic (a) Circuit symbol for IPA version (b) Circuit symbol for PPA version (c) Proposed cell shown with IPA device.

Fig. 5. Top level chip diagram.



Fig. 6. Write operations for proposed cell (a) "Write-0" operation for proposed cell (b) "Write-1" operation for proposed cell.

## A. Cell Write Operation

The proposed cell's write operation is illustrated in Fig. 6(a) and 6(b). During a write operation, current is passed from the FL to either TPL or BPL; as is shown in Fig. 6(a) and 6(b), this allows us to *always* perform a *parallelizing* write operation in switching the state of the cell. By selecting which path carries current during a write operation, the FL magnetization is switched to be parallel to either TPL or BPL. Since a parallelizing write operation requires less current than an antiparallelizing write operation [10], this scheme allows for reduced critical current on average. This statement may be quantified by examining the *spin torque efficiency gain*, which we define here as the ratio between the *spin torque transfer efficiency factor*, $\eta$, for an antiparallel to parallel write operation, and the *spin torque transfer efficiency factor* for a parallel to antiparallel write operation. The *spin torque efficiency gain* is therefore:

$$G_{\eta} = \frac{\eta_{AP \to P}}{\eta_{P \to AP}} = \frac{1 + P_S^2}{1 - P_S^2}. \tag{6}$$

Since $0 < P_S \leq 1$, $G_{\eta} \geq 1$, implying that the worst case current required to switch the magnetization vector in the proposed device is always less than that of a conventional MTJ with similar FL dimensions and magnetization parameters. For a typical MTJ with $TMR \approx 150\%$, $P_S \approx 0.65$, and so $G_{\eta} \approx 2.46$; this implies a reduction in worst-case critical current over a conventional MTJ of almost 60%. Timing diagrams for a "Write-0" and a "Write-1" operation are shown in Fig. 7(a) and 7(b), respectively. During a "Write-0" operation, since current must pass from FL to BPL, SL is raised to $V_{DD}$ while $BL_2$ is grounded. Since $M_1$ and $M_2$ share the same WL, $BL_1$ must also be raised to $V_{DD}$ to prevent current passing from FL to TPL, which ensures that current flows only from FL to BPL during a "Write-0" operation. Due to the symmetry of the cell, a "Write-1" operation is similar, except now $BL_1$ is grounded while $BL_2$ is raised to $V_{DD}$. For both cases, the WL signal is intentionally delayed until after the SL, $BL_1$, and $BL_2$ signals have settled, as this prevents the case where a current from the FL flows to the wrong PL during the write operation. This delay is not expected to represent any additional timing overhead, since the WL signal is naturally delayed with respect to other signals due to delays incurred during word line decoding.
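The figures quoted for the efficiency gain can be reproduced with a short calculation. Substituting $P_S^2 = TMR/(TMR+2)$ into (6) gives $G_{\eta} = 1 + TMR$, so a TMR of 150% corresponds to a gain of 2.5 when $P_S$ is not rounded, and to about 2.46 when $P_S$ is rounded to 0.65 as in the text. The sketch below, with function names chosen here for illustration, checks both values.

```python
import math


def spin_polarization(tmr):
    """P_S = sqrt(TMR / (TMR + 2)); TMR is a fraction (1.5 means 150%)."""
    return math.sqrt(tmr / (tmr + 2.0))


def efficiency_gain(p_s):
    """Eq. (6): G_eta = (1 + P_S^2) / (1 - P_S^2), valid for 0 < P_S < 1."""
    return (1.0 + p_s ** 2) / (1.0 - p_s ** 2)


tmr = 1.5
p_s = spin_polarization(tmr)
gain_exact = efficiency_gain(p_s)     # uses the unrounded P_S
gain_rounded = efficiency_gain(0.65)  # uses P_S rounded as in the text
reduction = 1.0 - 1.0 / gain_exact    # fractional drop in worst-case critical current

print(f"P_S = {p_s:.4f}")
print(f"G_eta (unrounded P_S) = {gain_exact:.3f}")
print(f"G_eta (P_S = 0.65)    = {gain_rounded:.3f}")
print(f"critical current reduction = {reduction:.1%}")
```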



Fig. 7. Timing diagrams for write operations for proposed cell (a) Timing diagram for a "Write-0" operation (b) Timing diagram for a "Write-1" operation.
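The line biasing described for the two write operations can be summarized in a small behavioural sketch. The supply value, the dictionary layout, and the function names are hypothetical; the model only captures which path conducts once the WL is asserted, not the device dynamics or the WL delay.

```python
V_DD = 1.0  # hypothetical supply voltage, volts


def write_bias(bit, v_dd=V_DD):
    """Line voltages for writing `bit`, following the description in the text."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    bias = {"SL": v_dd, "WL": v_dd}
    if bit == 0:
        # Write-0: current flows FL -> BPL, so BL2 is grounded and BL1 is held at V_DD
        bias.update(BL1=v_dd, BL2=0.0)
    else:
        # Write-1: current flows FL -> TPL, so BL1 is grounded and BL2 is held at V_DD
        bias.update(BL1=0.0, BL2=v_dd)
    return bias


def conducting_paths(bias):
    """Pinned layers that receive current from the FL for a given bias."""
    paths = []
    if bias["WL"] > 0.0:
        if bias["SL"] > bias["BL1"]:
            paths.append("TPL")  # M1 connects BL1 to TPL
        if bias["SL"] > bias["BL2"]:
            paths.append("BPL")  # M2 connects BL2 to BPL
    return paths


for bit in (0, 1):
    print(bit, write_bias(bit), conducting_paths(write_bias(bit)))
```

In both cases exactly one path conducts, and the FL is driven parallel to the pinned layer on that path, which matches the assignment of logical "0" to the BPL-parallel state and logical "1" to the TPL-parallel state.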



Fig. 8. Proposed read scheme.

## B. Cell Read Operation

During a read operation, both $M_1$ and $M_2$ are turned on, FL is connected to GND (through SL), and a sense amplifier is used to compare the TPL-to-FL resistance to the BPL-to-FL resistance. In this way, the cell behaves much like a differential memory element. Fig. 8 shows a current-based read operation for the proposed cell. Here (in steady state) identical currents are applied from TPL to FL and from BPL to FL. By comparing the resulting voltages at TPL and BPL (or the drains of $M_1$ and $M_2$), the resistance difference between the two paths can be detected, and thus the state of the FL magnetization can be inferred. Timing diagrams for a read operation are shown in Fig. 9(a) and 9(b). As shown in the timing diagrams, the Bit Line (BL) voltages are precharged to $V_{DD}$; when the WL is raised to $V_{DD}$, the nodes $BL_1$ and $BL_2$ are discharged through the TPL-to-FL and BPL-to-FL paths, respectively. The different resistances of these two paths also give rise to different time constants for these nodes. The steady state voltages of these two nodes are given by the product of the current source magnitude, $I_{READ}$, and the respective path resistances between $BL_1$ and GND and between $BL_2$ and GND. As such, in steady state, when a logical "0" is stored in the cell, the voltage of $BL_1$ is larger than that of $BL_2$, and when a logical "1" is stored in the cell, the voltage of $BL_2$ is larger than that of $BL_1$. Also note that, as shown in the timing diagrams, because the time constants for the nodes $BL_1$ and $BL_2$ are different, the transient current waveforms for $I_{TPL}$, the current flowing from TPL to FL, are different from those for $I_{BPL}$, the current flowing from BPL to FL. However, in steady state during a read operation, $I_{TPL} = I_{BPL} = I_{READ}$.
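The steady state sensing rule described above can be sketched as follows. The resistance and current values, the `r_access` term for the access transistor, and the function name are hypothetical and serve only to illustrate the comparison of the two bit line voltages.

```python
def read_cell(r_tpl_fl, r_bpl_fl, i_read, r_access=0.0):
    """Steady-state bit line voltages and the inferred bit.

    Each bit line settles to I_READ times its path resistance to GND.
    r_access is a hypothetical on-resistance, assumed equal for M1 and M2.
    """
    v_bl1 = i_read * (r_tpl_fl + r_access)
    v_bl2 = i_read * (r_bpl_fl + r_access)
    bit = 1 if v_bl2 > v_bl1 else 0
    return bit, v_bl1, v_bl2


# Hypothetical device values, for illustration only
R_P = 2000.0             # parallel-state resistance, ohms
TMR = 1.5                # 150%
R_AP = R_P * (1.0 + TMR)
I_READ = 20e-6           # amperes

# Logical "1": FL parallel to TPL, so the TPL path is low resistance
print(read_cell(R_P, R_AP, I_READ, r_access=1000.0))
# Logical "0": FL parallel to BPL, so the BPL path is low resistance
print(read_cell(R_AP, R_P, I_READ, r_access=1000.0))
```

Under these assumptions the difference between the two bit line voltages is $I_{READ} R_P \, TMR$, and an access resistance common to both paths cancels in the comparison.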



Fig. 9. Timing diagrams for read operations for proposed cell (a) Timing diagram for a "Read-0" operation (b) Timing diagram for a "Read-1" operation.

There are two key advantages to the proposed read scheme presented here. First, since the read scheme is differential in nature, it inherently provides a two-fold improvement in sense margin as compared to a conventional 1T1MTJ cell given the same read current and the same ratio between $R_P$ and $R_{AP}$. Second, the cell offers improved tolerance to process variation and guaranteed immunity to read disturbance. We show these benefits in the next two subsections.

1) Increased Tolerance to Process Variation: In a conventional 1T1MTJ based STT-MRAM, when a voltage/current stimulus is applied to the cell to measure the cell's resistance, the resulting current/voltage which is sensed is compared to that of a reference cell, which is typically not in close proximity to the cell being read. As a result of inherent process variation, there could be a degradation in read sense margin. In cases of excessive process variation, the cell may be incorrectly read. The nominal Read Sense Margin (RSM) of the conventional cell, $RSM_{CONV0}$, which excludes the effects of process variation on the MTJ resistances of the cell being read and the reference cell, is equal to $(I_{READ} R_P \, TMR)/2$. If the effects of process variation are included, the degradation to the RSM of the conventional cell is equal to $I_{READ} |r_{CELL} - r_{REF}|$, where $r_{CELL}$ and $r_{REF}$ are the random offsets to the MTJ resistances of the cell being read and the reference cell, respectively. For a differential cell on the other hand, the RSM under process variation is equal to $RSM_{DIFF0} - I_{READ} |r_{CELL1} - r_{CELL2}|$, where $RSM_{DIFF0}$ is the nominal read sense margin of a differential cell under zero process variation, while $r_{CELL1}$ and $r_{CELL2}$ are the random offsets to the tunnel resistances of the two MTJs comprising a differential cell. If we assume that the variation in tunnel resistance is spatially correlated, then in the worst case for the conventional cell, $r_{CELL}$ and $r_{REF}$ are uncorrelated; this would correspond to the case when the reference cell and the cell being read are distant from one another. For the differential cell however, $r_{CELL1} \approx r_{CELL2}$, since the two tunnel junctions comprising the cell are in close proximity to one another. As such, the worst case RSM for the conventional cell is:

$$RSM_{CONV} = RSM_{CONV0} - I_{READ} \left( |r_{CELL}| + |r_{REF}| \right), \tag{7}$$

while for the differential cell, the worst case RSM is simply:

$$RSM_{DIFF} = RSM_{DIFF0}. \tag{8}$$



Fig. 10. Transient current waveforms during a read operation.



Fig. 11. Transient currents and voltages during a read operation.

As such, under the assumption that variation in tunneling resistance is spatially correlated, the read operation for the proposed cell is expected to show immunity to variation—owing to the fact that the proposed cell is differential in its structure and is *self-referenced*—while in the worst case, the conventional cell is expected to have degraded RSM as a result of variation between the tunneling resistances of the cell being read and the reference cell.
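A brief numerical sketch of (7) and (8) is given below. It assumes $RSM_{DIFF0} = I_{READ} R_P \, TMR$, i.e., twice the nominal conventional margin, in line with the two-fold improvement stated earlier; the device values and resistance offsets are hypothetical.

```python
def rsm_conventional_worst(i_read, r_p, tmr, r_cell, r_ref):
    """Eq. (7): nominal margin minus I_READ * (|r_CELL| + |r_REF|)."""
    nominal = i_read * r_p * tmr / 2.0
    return nominal - i_read * (abs(r_cell) + abs(r_ref))


def rsm_differential(i_read, r_p, tmr, r_cell1, r_cell2):
    """RSM_DIFF0 - I_READ * |r_CELL1 - r_CELL2|.

    Assumption: RSM_DIFF0 = I_READ * R_P * TMR (twice the conventional nominal).
    """
    nominal = i_read * r_p * tmr
    return nominal - i_read * abs(r_cell1 - r_cell2)


# Hypothetical values, for illustration only
I_READ = 20e-6  # amperes
R_P = 2000.0    # ohms
TMR = 1.5
OFFSET = 200.0  # ohms of resistance offset due to process variation

conv = rsm_conventional_worst(I_READ, R_P, TMR, OFFSET, -OFFSET)
diff = rsm_differential(I_READ, R_P, TMR, OFFSET, OFFSET)
print(f"worst-case conventional RSM = {conv * 1e3:.1f} mV")
print(f"differential RSM            = {diff * 1e3:.1f} mV")
```

When the two junctions of the differential cell share the same offset, the offset term vanishes and (8) is recovered, whereas the conventional margin is reduced by the sum of the two offset magnitudes.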

2) Read-Disturbance Immunity: The advantages of improved RSM and greater tolerance to process variation are inherent to the fact that the cell is differential in nature. Indeed, a differential 2-Transistor 2-MTJ (2T2MTJ) cell, which effectively comprises two 1T1MTJ cells which store complementary data, would have the same advantages listed above. However, perhaps the most significant advantage of the proposed cell which would not be offered in a 2T2MTJ cell is the possibility of absolute immunity to read disturbance, regardless of the magnitude of the applied read sense current. To show this immunity, let us consider the net torque acting on the FL during a read operation. If we consider the current waveforms during a read operation for $I_{TPL}$ and $I_{BPL}$, as depicted in Fig. 10, we observe that we may divide the waveforms generally into two phases: the *transient phase*, where $I_{TPL}$ and $I_{BPL}$ are both increasing with time and $I_{TPL} \neq I_{BPL}$, and the *steady state phase*, where $I_{TPL}$ and $I_{BPL}$ have reached (or for all practical purposes are near) their steady state values, and $I_{TPL} = I_{BPL} = I_{READ}$. Thus, for our analysis, we begin by considering the net torque acting on the FL during the transient phase of the read operation where $I_{TPL}$ and $I_{BPL}$ are not equal, and then we consider the net torque acting on the FL in the steady state phase during a read operation. Fig. 11 shows the transient voltages and currents of relevant nodes and branches during a read operation for a cell storing a logical "1", i.e., the TPL magnetization vector is parallel to the FL magnetization vector.
In the figure,  $V_{INT1}$  is the voltage between TPL and FL,  $V_{INT2}$  is the voltage between BPL and FL,  $V_1$  is the voltage of  $BL_1$ ,  $V_2$  is the voltage of  $BL_2$ ,  $I_1$  is the current through  $M_1$ ,  $I_2$  is the current through  $M_2$ , and as before,  $I_{TPL}$  is the current from TPL to FL while  $I_{BPL}$  is the current from BPL to FL. During the transient phase of a read operation, given that the initial precharge voltages of  $BL_1$  and  $BL_2$  (i.e.,  $V_1(0)$  and  $V_2(0)$ ) are equal, it can be shown that  $V_2(t) \ge V_1(t)$  for t > 0. It can further be shown that this implies that  $V_{INT2}(t) \ge V_{INT1}(t)$ . Intuitively, this is true because the path resistance from  $BL_2$  to SL is larger than the resistance from  $BL_1$  to SL, because the cell is storing a logical "1" and the BPL-to-FL resistance is larger than the TPL-to-FL resistance. We are now in a position to comment on the net torque on the FL during the transient phase of a read operation. We first note that the torque acting on a magnetic body subjected to a current induced spin transfer torque is given by [10]:

$$\frac{d\tau_{\parallel}}{dV} = \frac{\hbar}{4e} \frac{2P_S}{1+P_S^2} \sin(\theta) G_P, \tag{9}$$

where $\tau_{\parallel}$ is the component of the torque acting on the magnetic body which is in the same plane as the magnetization vectors of the magnetic body and STT source, $P_S$ is the TSP factor, $G_P$ is the conductance of the parallel state MTJ, $\theta$ is the angle between the magnetization vector of the magnetic body and the source of the spin transfer torque, while $V$ is the applied voltage across the FL and PL of an MTJ. Equation (9) indicates that the torque applied by a layer is a monotonically increasing function of applied voltage; this is supported by experimental results which show the dependence of spin transfer torque on voltage [17], [18], as well as measurements of the hysteresis characteristics of MTJs which show that $P \rightarrow AP$ and $AP \rightarrow P$ switching occur at approximately the same applied voltage [19]–[21]. Therefore, since the applied voltage across the BPL to FL path, $V_{INT2}$, is larger than the voltage across the TPL to FL path, $V_{INT1}$, the net torque on the FL must be in the same direction as the torque transferred by the BPL. Because of the polarity of the currents applied to the FL, this torque must act in a direction which is antiparallel to the BPL magnetization vector, which in turn means that the torque acts to refresh the data in the cell. It follows from the symmetry of the proposed cell that, if a "0" were initially stored in the cell, the torque acting on the FL from the TPL would be larger than that from the BPL, thus again acting to "refresh" the existing data in the cell.

After the currents $I_{TPL}$ and $I_{BPL}$ have settled to their steady state values (both equal to $I_{READ}$), the net torque acting on the FL continues to serve to refresh the existing data in the cell in the steady state phase of a read operation. In fact, in the steady state phase we can find a simple closed form expression for the net torque acting on the FL. First, we rewrite (9) in terms of the applied current between the magnetic layers of an MTJ. Using the Julliere model [22], and under the assumption that the parallel state conductance is approximately constant with applied bias (as is consistent with experimental results [1]), (9) can be rearranged to give a torque term which is only a function of the reduced magnetization vectors of the magnetic body, $\vec{\mathbf{m}}_B$, and the STT source, $\vec{\mathbf{m}}_S$, as well as the applied current, $I_B$, which flows from the magnetic body to the spin torque transfer source, to yield the following equation:

$$\vec{\boldsymbol{\tau}}_{\parallel} = \frac{\hbar}{4e} \eta(\theta) I_B [\vec{\mathbf{m}}_B \times (\vec{\mathbf{m}}_B \times \vec{\mathbf{m}}_S)], \tag{10}$$

where  $\eta(\theta)$  is equal to  $P_S/(1+P_S^2\cos(\theta))$ . Now, given (10), we can determine the net torque acting on the FL during the steady state phase of a read operation, given that the net torque acting on the FL is simply the sum of the individual torques contributed by the TPL and BPL:

$$\begin{aligned}
\vec{\boldsymbol{\tau}}_{total} &= \vec{\boldsymbol{\tau}}_{TPL\_TO\_FL} + \vec{\boldsymbol{\tau}}_{BPL\_TO\_FL} \\
&= -\frac{\hbar}{4e} \eta(\theta_1) I_{TPL} [\vec{\mathbf{m}}_{FL} \times (\vec{\mathbf{m}}_{FL} \times \vec{\mathbf{m}}_{TPL})] \\
&\quad -\frac{\hbar}{4e} \eta(\theta_2) I_{BPL} [\vec{\mathbf{m}}_{FL} \times (\vec{\mathbf{m}}_{FL} \times \vec{\mathbf{m}}_{BPL})] \\
&= \frac{\hbar}{4e} [\eta(\pi - \theta_1) - \eta(\theta_1)] I_{READ} [\vec{\mathbf{m}}_{FL} \times (\vec{\mathbf{m}}_{FL} \times \vec{\mathbf{m}}_{TPL})].
\end{aligned} \tag{11}$$

In the above, $\theta_1$ represents the angle between the magnetization vectors of TPL and FL, and $\theta_2$ represents the angle between the magnetization vectors of BPL and FL. Note that the following relations were used in reaching (11): $\vec{\mathbf{m}}_{BPL} = -\vec{\mathbf{m}}_{TPL}$, $\theta_2 = \pi - \theta_1$, and $I_{TPL} = I_{BPL} = I_{READ}$.

For our analysis, we must consider the net effective torque on the FL for both states of the FL. First we consider the case where the FL magnetization is parallel to the TPL magnetization, $\theta_1 \approx 0$. Given that $\eta(\theta)$ is an increasing function of $\theta$ (over the range $0 \le \theta \le \pi$), clearly $\eta(\pi - \theta_1) - \eta(\theta_1)$ must be greater than zero. As such, it becomes evident that the net torque acting on the FL during a read operation, $\vec{\boldsymbol{\tau}}_{total}$, must be in a direction which pulls the FL magnetization towards TPL. Next, we consider the case where the FL magnetization is parallel to the BPL magnetization, $\theta_1 \approx \pi$. Now, $\eta(\pi - \theta_1) - \eta(\theta_1)$ must be less than zero. As such, the net torque acting on the FL is in the direction of $-\vec{\mathbf{m}}_{TPL}$, or equivalently, in the direction of $\vec{\mathbf{m}}_{BPL}$; thus the net torque pulls the FL magnetization towards BPL. Therefore, as was the case in the transient phase of the read operation, in the steady state phase of the read operation the net torque acting on the FL will always be in a direction which reinforces the data stored in the cell, and so the proposed read operation for this cell offers guaranteed immunity to read disturbance.
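The sign argument above can be checked numerically. Using (3), the scalar factor in (11) simplifies to $\eta(\pi - \theta_1) - \eta(\theta_1) = 2 P_S^3 \cos\theta_1 / (1 - P_S^4 \cos^2\theta_1)$, which is positive for $\theta_1 < \pi/2$ and negative for $\theta_1 > \pi/2$ whenever $0 < P_S < 1$. The sketch below, with function names and the value of $P_S$ chosen here for illustration, evaluates both forms.

```python
import math


def eta(theta, p_s):
    """Spin torque transfer efficiency, eta = P_S / (1 + P_S^2 cos(theta))."""
    return p_s / (1.0 + p_s ** 2 * math.cos(theta))


def net_torque_factor(theta1, p_s):
    """Scalar factor eta(pi - theta1) - eta(theta1) appearing in Eq. (11)."""
    return eta(math.pi - theta1, p_s) - eta(theta1, p_s)


def net_torque_factor_closed_form(theta1, p_s):
    """Equivalent closed form: 2 P_S^3 cos / (1 - P_S^4 cos^2)."""
    c = math.cos(theta1)
    return 2.0 * p_s ** 3 * c / (1.0 - p_s ** 4 * c ** 2)


P_S = 0.65  # value quoted in the text for TMR of about 150%

for theta1 in (0.05, 0.5, math.pi - 0.5, math.pi - 0.05):
    f = net_torque_factor(theta1, P_S)
    state = "FL near TPL" if theta1 < math.pi / 2 else "FL near BPL"
    print(f"theta1 = {theta1:.2f} rad ({state}): factor = {f:+.4f}")
```

The factor does not depend on $I_{READ}$, so its sign, and hence the direction of the net torque in (11), is the same for any magnitude of the read current.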

#### C. Device Parameter Optimization

In conventional 1T1MTJ cells, due to problems of read disturbance, there is an inherent tradeoff between read stability, read performance, and the critical write current. To ensure read stability, the read current is restricted to be less than the critical write current. The tradeoff here is that high speed read access requires large sense currents, which in turn necessitates a *large* critical current (for read stability); on the other hand, high speed write access for a given constrained write current and/or low power write access is achieved through a *low* critical current. With the proposed cell offering guaranteed immunity to read disturbance, there is no tradeoff between the read access and write access currents. Indeed, the read access current can even exceed the cell's critical current. We are therefore in a position to optimize certain device parameters, namely the oxide thickness and the strength of the anisotropy field.

1) Oxide Thickness Optimization: The oxide thickness plays a crucial role in the read/write tradeoff for an MTJ, as the oxide thickness sets the parallel and antiparallel state resistances of the MTJ, in addition to the TMR value. A thicker oxide results in not only a larger MTJ resistance, but also for a range of oxide thicknesses, a larger TMR [23], [24]. Since the RSM is proportional to both the tunneling resistance and TMR values, for conventional cells a larger oxide thickness is favoured, as this allows for high speed read access. On the other hand, the increased resistance resulting from a thicker oxide results in difficulty during a write operation. For a given critical current, the increased resistance resulting from a thicker oxide causes the voltage between the two terminals of the MTJ to increase during a write operation, thereby potentially necessitating a larger access transistor and/or a higher supply voltage. Since the proposed cell offers disturb free read operation for any applied read sense current, the degradation to sense margin by using a thinner oxide may be offset by using larger read sense currents. This allows us to optimize the oxide thickness for write access, and we compensate the detrimental effects to the RSM by using a larger read sense current during a read operation. We highlight the choice of oxide thicknesses for the devices considered in this study in Section IV-B.

2) Magnetization Parameter Optimization: The thermal stability factor of a cell governs the data retention capabilities of the cell both during standby operation and during a read operation. As explained in Section II-B, even for currents smaller than the critical current, it is possible that the cell's contents will be disturbed during a read operation. The probability of a cell flip for a current less than the critical current is given by (4). To ensure a low read disturb rate (on the order of $10^{-15}$), $\Delta = (K_U V)/(k_B T)$ is set to be greater than 55 [13], where $K_U = M_S H_K/2$, and $H_K$ is the strength of the anisotropy field. However, for our proposed device, we are able to reduce $\Delta$ since read disturbance is no longer a concern. The only constraint on $\Delta$ which remains is the 10 year data retention requirement [13]; using (4), a $\Delta$ of 43 results in a probability of greater than 99% that the data will be retained in a cell after 10 years. A smaller value for $\Delta$ results in a decrease in the cell's critical current; this is intuitive because a large value for $\Delta$ indicates a large magnetostatic potential energy, and as such a larger torque must be applied to switch the state of the cell. We can estimate the potential reduction in critical current that can be brought about by a reduction in the value of $\Delta$ using a common approximation for the critical current of IPA devices [25]:

$$I_{C0} = \frac{2e\alpha M_S V (H_K + 2\pi M_S)}{\hbar\eta} \tag{12}$$

and for PPA devices [26]:

$$I_{C0} = \frac{2e\alpha M_S V H_K}{\hbar\eta}. \tag{13}$$

Reducing the value of $\Delta$ for a fixed volume entails reducing either $H_K$ or $M_S$. Since (12) shows a quadratic relationship between critical current and $M_S$, we would estimate that reducing $M_S$ by 22% (the reduction needed to bring $\Delta$ from 55 to 43) would result in an almost 40% reduction in critical current (since the $2\pi M_S$ term dominates over the $H_K$ term, the critical current scales as $0.78^2 \approx 0.61$). However, much of the materials optimization of conventional MTJs has targeted a reduction in $M_S$ already, and therefore it may be difficult to further reduce this value. Therefore, it is more probable that a reduction in $\Delta$ will be achieved through a reduction in $H_K$. For PPA devices, since $I_{C0}$ is linearly related to $H_K$, a 22% reduction in $H_K$ would result in a 22% reduction in $I_{C0}$; however, for

TABLE I Estimated $I_{C0}$ Reduction Through Minimization of Anisotropy Field Strength ($H_K$)

| FL Dimensions (nm × nm × nm) | Conv. $H_K$ (Oe) | Prop. $H_K$ (Oe) | IPA $I_{C0}$ Reduction | PPA $I_{C0}$ Reduction |
|------------------------------|------------------|------------------|------------------------|------------------------|
| 90 × 90 × 1                  | 535              | 419              | 1.6%                   | 21.8%                  |
| 65 × 65 × 1                  | 1027             | 803              | 2.9%                   | 21.8%                  |
| 45 × 45 × 1                  | 2142             | 1674             | 5.3%                   | 21.8%                  |
| 32 × 32 × 1                  | 4235             | 3311             | 8.5%                   | 21.8%                  |
| 22 × 22 × 1                  | 8961             | 7006             | 12.6%                  | 21.8%                  |
| 16 × 16 × 1                  | 16942            | 13246            | 15.7%                  | 21.8%                  |

IPA devices, this is not the case, since $I_{C0}$ is a linear function of $(H_K + 2\pi M_S)$ and not just $H_K$. For large device volumes the term $(H_K + 2\pi M_S)$ is dominated by the $2\pi M_S$ term, and so reductions in $H_K$ have negligible impact on $I_{C0}$. However, since $H_K$ increases for small device volumes (to maintain a given targeted value for $\Delta$), the gains in current reduction for IPA devices from a reduction in $H_K$ become more prominent. Estimated reductions in $I_{C0}$ for various FL volumes are shown in Table I.
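The entries of Table I can be approximately reproduced from $\Delta = (K_U V)/(k_B T)$ with $K_U = M_S H_K/2$, together with the $H_K$ dependence in (12) and (13). The sketch below assumes CGS units, $M_S = 1050$ emu/cm$^3$, and a temperature of 300 K; the temperature is an assumption, chosen because it matches the tabulated values to within rounding.

```python
import math

K_B = 1.380649e-16  # Boltzmann constant in erg/K (CGS units)
T_KELVIN = 300.0    # assumed operating temperature
M_S = 1050.0        # saturation magnetization in emu/cm^3

def anisotropy_field(delta, length_nm, width_nm, thickness_nm):
    """H_K (Oe) needed for a given Delta, from Delta = M_S * H_K * V / (2 k_B T)."""
    volume_cm3 = length_nm * width_nm * thickness_nm * 1e-21  # nm^3 -> cm^3
    return 2.0 * delta * K_B * T_KELVIN / (M_S * volume_cm3)

def ic0_reduction(side_nm, delta_conv=55.0, delta_prop=43.0):
    """Return (conv H_K, prop H_K, IPA reduction, PPA reduction) for a square 1 nm thick FL."""
    hk_conv = anisotropy_field(delta_conv, side_nm, side_nm, 1.0)
    hk_prop = anisotropy_field(delta_prop, side_nm, side_nm, 1.0)
    demag = 2.0 * math.pi * M_S
    ipa = 1.0 - (hk_prop + demag) / (hk_conv + demag)  # I_C0 ~ (H_K + 2*pi*M_S), per (12)
    ppa = 1.0 - hk_prop / hk_conv                      # I_C0 ~ H_K, per (13)
    return hk_conv, hk_prop, ipa, ppa

for side in (90, 65, 45, 32, 22, 16):
    hk_c, hk_p, ipa, ppa = ic0_reduction(side)
    print(f"{side}x{side}x1: {hk_c:8.0f} Oe {hk_p:8.0f} Oe  IPA {ipa:6.1%}  PPA {ppa:6.1%}")
```

All other factors in (12) and (13) ($\alpha$, $M_S$, $V$, $\eta$) are held fixed, so they cancel in the ratios.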

# IV. COMPARATIVE STUDY

In order to assess the merits of the proposed cell, we performed a simulation study to compare the proposed cell against a conventional 1T1MTJ cell and against a 2T2MTJ cell, which in this study is simply two 1T1MTJ cells which hold complementary data. The MTJs in both the 1T1MTJ and 2T2MTJ cells are modeled as top-pinned devices [27]. The simulation study made use of STM's 65 nm process kit. In the following sections, we describe the details of the simulation study: how the devices in each of our cells were modeled, what simulation parameters were used, and how the devices were individually optimized. Finally, we present the results of the simulation study.

#### A. Device Modeling

For the present study, models for the conventional and proposed devices were developed and written in Verilog-A. This has allowed for the co-simulation of MTJs with transistors, thus allowing transient analysis at both device and transistor level. For the devices studied in this work, the device models can be divided into two components; one component models the tunnel conductance as a function of various device parameters and the relative orientation of the FL and PL magnetization vectors, while the second component models the magnetodynamics of the magnetic layers. The details of the general modeling approaches are discussed below.

1) Tunnel Model: We used the Julliere model [22] and modeled the bias dependent decay of TMR using the approach followed in [28] to yield the following equation for the MTJ resistance as a function of  $\theta$ —the relative angle between the magnetization vector of two adjacent layers—and the applied bias, V:

$$R_{MTJ}(\theta, V) = \frac{2R_P\left(TMR_0 + 1 + \left(\frac{V}{V_H}\right)^2\right)}{2\left(1 + \left(\frac{V}{V_H}\right)^2\right) + TMR_0(1 + \cos(\theta))} \tag{14}$$

where $TMR_0$ is the TMR at zero applied bias, and $V_H$ is the voltage at which the TMR drops to half the value of $TMR_0$. In addition to modeling the dependence of the MTJ resistance on $\theta$ and bias voltage, it is also imperative to model the effect of the oxide barrier thickness on tunneling resistance and TMR. In lieu of a quantitative model relating oxide thickness to TMR, for this work we used recent experimental data which presents both tunneling resistance and TMR over a range of oxide thicknesses [29]. In modeling resistance versus oxide thickness, we fit an exponential model of the form $R(t_{OX}) = R_0 e^{t_{OX}}$ to the experimental data, while for modeling TMR versus oxide thickness, we built a simple piece-wise linear model directly from the experimental data, instead of attempting to find a functional relationship between TMR and oxide thickness.
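As a quick sanity check of (14), the following sketch evaluates the resistance in the parallel and antiparallel states and confirms that the effective TMR halves at $V = V_H$. The parameter values ($R_P = 2$ k$\Omega$, $TMR_0 = 100\%$, $V_H = 0.5$ V) are hypothetical and are not taken from the devices in this study.

```python
import math

def mtj_resistance(theta, v_bias, r_p, tmr0, v_h):
    """MTJ resistance from (14) for relative angle theta (rad) and bias v_bias (V)."""
    x = (v_bias / v_h) ** 2
    numerator = 2.0 * r_p * (tmr0 + 1.0 + x)
    denominator = 2.0 * (1.0 + x) + tmr0 * (1.0 + math.cos(theta))
    return numerator / denominator

def effective_tmr(v_bias, r_p, tmr0, v_h):
    """(R_AP - R_P) / R_P at the given bias, computed from (14)."""
    r_parallel = mtj_resistance(0.0, v_bias, r_p, tmr0, v_h)
    r_antiparallel = mtj_resistance(math.pi, v_bias, r_p, tmr0, v_h)
    return (r_antiparallel - r_parallel) / r_parallel

# Hypothetical illustration values, not parameters from the paper.
R_P, TMR0, V_H = 2000.0, 1.0, 0.5

print(mtj_resistance(0.0, 0.0, R_P, TMR0, V_H))      # parallel state, zero bias
print(mtj_resistance(math.pi, 0.0, R_P, TMR0, V_H))  # antiparallel state, zero bias
print(effective_tmr(V_H, R_P, TMR0, V_H))            # TMR at V = V_H
```

The parallel-state resistance is independent of bias in this model; only the antiparallel resistance, and hence the TMR, decays with applied voltage.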

2) Magnetodynamics Model: The Verilog-A model implemented for this work solves the LLGS equation (2) to model the dynamics of the FL magnetization vector. Note that (2) models the dynamics of the FL magnetization vector as it is subjected to a current-induced spin transfer torque, in addition to device anisotropies. For this work, we assume the FL behaves as a single magnetic domain, and as such we use the macrospin approximation for the FL [12]. As mentioned in Section II-A, under this approximation $\vec{\mathbf{H}}_{ANI}$ and $\vec{\mathbf{H}}_D$ are the principal contributors to $\vec{\mathbf{H}}_{EFF}$. Assuming a coordinate system where the easy axis of the FL is along the z-axis, the y-z plane is the easy plane of the FL, and the x-axis is the hard axis, simple models for $\vec{\mathbf{H}}_{ANI}$ and $\vec{\mathbf{H}}_D$ are [30]:

$$\vec{\mathbf{H}}_{ANI} = H_K \cos(\theta) \vec{\mathbf{z}} \tag{15}$$

$$\vec{\mathbf{H}}_D = -M_S \sin(\theta) \cos(\phi) \vec{\mathbf{x}} \tag{16}$$

where $H_K$ is the strength of the anisotropy field, $\theta$ is the angle between the magnetization of the FL and the easy axis, $\phi$ is the azimuthal angle between the projection of the magnetization vector of the FL on the x-y plane and the x-axis, and $\vec{\mathbf{x}}$ and $\vec{\mathbf{z}}$ are unit vectors along the x and z axes, respectively.
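The two field terms in (15) and (16) can be evaluated as in the short sketch below. It assumes that $\vec{\mathbf{H}}_{EFF}$ is simply the sum of these two principal contributions; the function name and the sample values of $H_K$ and $M_S$ are illustrative, and this is not the Verilog-A implementation used in this work.

```python
import math

def effective_field(theta, phi, h_k, m_s):
    """Return (H_x, H_y, H_z) as the sum of the anisotropy field (15) and H_D (16).

    theta: angle between the FL magnetization and the easy (z) axis, in radians.
    phi:   azimuthal angle of the x-y projection measured from the x-axis, in radians.
    """
    h_ani_z = h_k * math.cos(theta)                   # (15): along the easy axis
    h_d_x = -m_s * math.sin(theta) * math.cos(phi)    # (16): along the hard axis
    return (h_d_x, 0.0, h_ani_z)

# Magnetization along the easy axis: only the anisotropy term is non-zero.
print(effective_field(0.0, 0.0, h_k=535.0, m_s=1050.0))
# Magnetization tilted fully onto the hard axis: H_D opposes it.
print(effective_field(math.pi / 2, 0.0, h_k=535.0, m_s=1050.0))
```

The hard-axis term vanishes whenever the magnetization lies in the y-z easy plane ($\phi = \pi/2$), which is why it only acts when the magnetization is pushed out of that plane.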

3) Device Dimensions: For the conventional MTJs considered in this study (found in both the conventional 1T1MTJ cell and the differential cell), we chose the FL dimensions to be: 60 nm (length), 60 nm (width), and 1 nm (thickness). For the proposed device, however, because an additional metallic contact is required adjacent to the tunneling barriers, the FL must be larger than that of a conventional MTJ. This is similar to the case of a previously proposed 3-terminal MTJ [15]. We use the same lambda design rule based analysis used in previous work [15] to estimate the volume of the FL in the proposed device, given the same process node as the conventional MTJ. Since the spacing between the additional metallic contact and the tunneling barriers is half the width of the contact/oxide area, adding a contact adjacent to the oxide area increases the FL area to 250% of that of a conventional MTJ.

4) Material Parameters: Material parameters for the devices were chosen to match the parameters and characteristics of existing MTJs. The main material parameters to be set are the FL's saturation magnetization, $M_S$, thermal stability factor, $\Delta$, and the Gilbert damping constant, $\alpha$, of the FL. For this study, $M_S$ was chosen to be $1050 \text{ emu/cm}^3$, which is in line with the saturation magnetization of CoFeB alloys [25]. For $\Delta$, we chose a value of 55 for the conventional and differential devices, as this allows for a read disturb rate of less than $10^{-15}$ given a read sense current equal to 40% of the critical current, while for the proposed device we chose a value of 43 to ensure 10 year data retention, as previously discussed. The Gilbert damping constant, $\alpha$, was tuned to yield critical current densities similar to recent experimental results. For IPA devices $\alpha$ was set to 0.001 and for PPA devices $\alpha$ was set to 0.002, yielding critical current densities of 2-3 $MA/cm^2$ and 2.1 $MA/cm^2$, respectively, which are consistent with experimental results presented in [31] and [26], respectively.

#### B. Device Optimization

For all versions of the proposed and conventional devices, oxide thickness and access transistor sizes were optimized for the sake of optimal performance and area efficiency. We targeted these devices for two classes of applications: read-write applications which on average would have an equal number of read and write operations and read-mostly applications which on average have more read operations than write operations. To quantify the two classes of applications, we say that in the former class of applications, read operations occur 50% of the time while write operations occur 50% of the time. Thus the average access time—which is a weighted average of the read and write access times—for a read-write application is:

$$T_{AVE}^{50-50} = 0.5T_{READ} + 0.5T_{WRITE}. \tag{17}$$

For a read-mostly application, we chose to set the average number of read operations to be 90%. Thus for the read-mostly applications targeted in this work, the average operation time is:

$$T_{AVE}^{90-10} = 0.9T_{READ} + 0.1T_{WRITE}. \tag{18}$$
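The weighted averages in (17) and (18) amount to a one-line calculation, sketched below. The function name is illustrative; the sample read and write times are those reported for the conventional IPA cell in Tables III and IV, and the results agree with the average access times listed in Table V.

```python
def average_access_time(t_read_ns, t_write_ns, read_fraction):
    """Weighted average of read and write access times, as in (17) and (18)."""
    return read_fraction * t_read_ns + (1.0 - read_fraction) * t_write_ns

# Conventional IPA cell optimized for read-write applications (50% reads).
t_ave_rw = average_access_time(t_read_ns=3.58, t_write_ns=7.07, read_fraction=0.5)
# Conventional IPA cell optimized for read-mostly applications (90% reads).
t_ave_rm = average_access_time(t_read_ns=2.4, t_write_ns=10.69, read_fraction=0.9)

print(f"T_AVE(50-50) = {t_ave_rw:.2f} ns")
print(f"T_AVE(90-10) = {t_ave_rm:.2f} ns")
```

Because the write time carries only a 10% weight in the read-mostly case, a cell can tolerate a markedly slower write if doing so shortens its read.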

We begin device optimization by choosing the optimal oxide thickness for all the devices for the two target application classes mentioned. The resistance of tunneling barriers is exponentially related to the oxide thickness; therefore, increasing oxide thickness reduces the amount of current an access transistor can provide to the device, thus increasing switching times. On the other hand, increasing oxide thickness results in increased RSM (for the same read sense current), thus improving read access time. By taking weighted sums of these read and write access times for all the devices studied in this work, we were able to find optimal values of oxide thickness for the two different application classes targeted here. Sample plots of the variation of average operation time over oxide thickness for IPA devices are shown in Fig. 12. Note that the conventional and differential cells have different curves because the differential cell offers superior read access performance; this is because its differential read access gives it a natural two-fold improvement in RSM over the conventional cell.

These plots show that while the average access times for the proposed cell are optimized at small values for oxide thickness, the oxide thicknesses for the conventional and differential cells cannot be set to such small values. In fact, for the conventional and differential cells, the average access times are optimized when the oxide thickness is between 0.85 nm and 1 nm. Small oxide thicknesses degrade the performance of the conventional and differential cells because the read sense current is limited to 40% of the critical current; as the oxide thickness is decreased, the RSM, which is effectively the product of the read sense current and the oxide barrier resistance, must decrease as well. In fact, given a minimum value for RSM (chosen to be 50 mV in this study), there is also a lower limit on the minimum resistance of the tunneling oxide (and thus a minimum value for the oxide thickness). For the proposed device however, we are





Fig. 12. Average operation time versus oxide thickness for IPA devices (a) Read-Write Applications (b) Read-Mostly Applications.

free to increase the read sense current as the tunneling oxide thickness is decreased (to compensate for the decreased tunneling resistance), and thus a desired RSM can be maintained as oxide thickness is reduced. This explains why the average operation time continues to decrease as oxide thickness decreases for the proposed device: the read access time does not degrade as oxide thickness is reduced (again because RSM does not degrade), while the write access time improves; as such, the overall average operation time improves. The plots also show that the proposed cell shows inferior performance compared to the conventional and differential cells as the oxide thickness is increased; this is because, given identical oxide thicknesses, the proposed device has inferior write performance compared to the conventional and differential cells (as will be discussed in Section IV-C). For large oxide thicknesses, due to the limited output swing of the read circuitry, the read sense current for the proposed cell cannot be increased to improve read performance as a means to compensate for the cell's degraded write performance.
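The resistance floor described above follows from treating the RSM as the product of the read sense current and the barrier resistance. The sketch below is a back-of-the-envelope illustration only: the critical current of 100 µA and the 500 Ω barrier are hypothetical values, not parameters of the simulated devices, and the function names are illustrative.

```python
def min_barrier_resistance(rsm_min_v, i_c0_a, read_fraction=0.4):
    """Smallest barrier resistance (ohms) that still yields rsm_min_v
    when the read current is capped at read_fraction * I_C0."""
    return rsm_min_v / (read_fraction * i_c0_a)

def read_current_for_rsm(rsm_target_v, resistance_ohm):
    """Read sense current (A) needed to develop rsm_target_v across resistance_ohm."""
    return rsm_target_v / resistance_ohm

I_C0 = 100e-6    # hypothetical critical current, in amperes
RSM_MIN = 0.05   # 50 mV minimum sense margin used in this study

# Conventional/differential cells: read current limited to 40% of I_C0.
r_floor = min_barrier_resistance(RSM_MIN, I_C0)
# Proposed cell: a thinner (500 ohm) barrier is usable if the read current is raised.
i_needed = read_current_for_rsm(RSM_MIN, 500.0)

print(f"minimum resistance with capped read current: {r_floor:.0f} ohm")
print(f"read current needed at 500 ohm: {i_needed * 1e6:.0f} uA "
      f"({i_needed / I_C0:.0%} of I_C0)")
```

With these hypothetical numbers the thinner barrier requires a read current equal to the critical current, which is acceptable only for a cell that is immune to read disturbance.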

In addition to optimizing oxide thickness, we need to optimize access transistor size for each cell. The access transistor width affects the read and write access times in different ways: as transistor width is increased write access time improves (since a larger current can be applied to the cell during a write operation), however read access time is degraded due to the increased capacitive loading on the BLs. Fig. 13 shows sample plots of the average operation time versus access transistor width for IPA devices; note that the access transistors have minimum length.

These plots show that while the differential and conventional cells show degradation in performance as access transistor width is increased, the proposed cell shows improved performance as access transistor width is increased. This is because for the differential and conventional cells, as the access

Fig. 13. Average operation time versus transistor width for IPA devices (a) Read-Write Applications (b) Read-Mostly Applications.

transistor width is increased past a certain width (twice the minimum width for both cases for the plot shown), the degradation in read access time cannot be overcome by the improvement in write access time, and as such the overall average access time is degraded. However, for the proposed cell, the effects of degraded read access time from the increased BL loading can again be compensated for by increasing the read sense current, which increases RSM; to reiterate, this was not a viable option for the conventional and differential cells due to the necessary restriction on read sense current for these two cells. Note that Fig. 13(b) shows that the performance of the proposed cell eventually begins to degrade as the access transistor width is increased past four times the minimum transistor width. This is because limitations on the output swing of the sense node (the swing is dictated by $V_{DD}$ and the overdrive voltage of the current source providing the read sense current) limit the extent to which the read sense current can be increased; as such, for the proposed cell the penalty to read access time can only be compensated for to a certain extent. Nonetheless, we see that overall the proposed cell allows the read and write access operations to be independently optimized, and presents a potentially greater optimization space (and thus design flexibility) than the differential and conventional cells.

For this study, we chose to normalize cell areas between the three cells considered; as such, we first optimized the access transistor size of the conventional cells, and then chose access transistor sizes for the differential and proposed cells that would minimize the difference in cell areas between all three cells. Table II shows the optimal oxide thicknesses and the transistor widths chosen for the simulation study. Finally, Fig. 14(a)–14(d) show layouts of the cells in this study; all WLs are routed on Metal-1/Poly, while SLs and BLs are routed on Metal-2.



Fig. 14. Cell layouts (a) Conventional Cell, 120 nm transistor width (b) Conventional Cell, 240 nm transistor width (c) Differential Cell (d) Proposed Cell.

TABLE II Optimal Oxide Thicknesses and Chosen Transistor Widths for Cells in This Study

| Cell             | Optimal Oxide Thickness, R/W Appl. | Optimal Oxide Thickness, R-M Appl. | Chosen Transistor Width, R/W Appl. | Chosen Transistor Width, R-M Appl. |
|------------------|------------------------------------|------------------------------------|------------------------------------|------------------------------------|
| Conventional IPA | 0.95 nm                            | 0.97 nm                            | 240 nm                             | 120 nm                             |
| Conventional PPA | 1.01 nm                            | 1.05 nm                            | 240 nm                             | 120 nm                             |
| Differential IPA | 0.875 nm                           | 0.925 nm                           | 120 nm                             | 120 nm                             |
| Differential PPA | 0.9 nm                             | 0.95 nm                            | 120 nm                             | 120 nm                             |
| Proposed IPA     | 0.7 nm                             | 0.76 nm                            | 120 nm                             | 120 nm                             |
| Proposed PPA     | 0.7 nm                             | 0.78 nm                            | 120 nm                             | 120 nm                             |

#### C. Simulation Results: Write Performance

Fig. 15(a)-15(d) show comparisons of the write performance between the different cells considered in this study; the plots show the time evolution of the *z*-component of the FL magnetization vectors during an antiparallelizing write operation (as this is the worst case) for the devices being compared. We measure the switching time as the time taken for the *z*-component of the FL magnetization vector to equal the switching threshold, which we define to be equal to zero. The switching times for all of the cells considered in this study are summarized in Table III.

The results show that for IPA devices optimized for Read-Write applications, the conventional cell offers the best write performance because its access transistor width is twice that of the other cells, thus allowing for increased current during a write operation. For IPA devices optimized for Read-Mostly applications, however, we first note that in general all of the write access times have degraded compared to the devices optimized for Read-Write applications; this is expected since the oxide thicknesses for these cells must be increased to improve read performance (which compromises write performance). We also note that the conventional cell now offers the worst performance, while the differential and proposed cells have similar switching times, from which we conclude that due to the lower RSM of the conventional cell compared to the proposed and differential cells, when optimizing for Read-Mostly applications the conventional cell oxide resistance must be increased by a greater degree than that of the proposed and differential cells.

Fig. 15. Write operation comparison (a) IPA Devices, Read-Write application (b) IPA Devices, Read-Mostly application (c) PPA Devices, Read-Write application (d) PPA Devices, Read-Mostly application.

Another interesting result shown in the table is that while the PPA versions of the differential and proposed cells show an improvement in switching time over their IPA counterparts, the

TABLE III Write Performance Summary

|                          | IPA Conv. Cell | IPA Diff. Cell | IPA Prop. Cell | PPA Conv. Cell | PPA Diff. Cell | PPA Prop. Cell |
|--------------------------|----------------|----------------|----------------|----------------|----------------|----------------|
| Read-Write Applications  | 7.07 ns        | 8.65 ns        | 8.39 ns        | 8.72 ns        | 6.5 ns         | 7.45 ns        |
| Read-Mostly Applications | 10.69 ns       | 9.15 ns        | 9.2 ns         | 13.62 ns       | 6.96 ns        | 8.03 ns        |

TABLE IV Read Performance Summary

|                          | IPA Conv. Cell | IPA Diff. Cell | IPA Prop. Cell | PPA Conv. Cell | PPA Diff. Cell | PPA Prop. Cell |
|--------------------------|----------------|----------------|----------------|----------------|----------------|----------------|
| Read-Write Applications  | 3.58 ns        | 1.46 ns        | 1.08 ns        | 4.65 ns        | 1.61 ns        | 1.08 ns        |
| Read-Mostly Applications | 2.4 ns         | 1.2 ns         | 0.76 ns        | 2.96 ns        | 1.33 ns        | 0.6 ns         |

conventional cell shows degradation. Since PPA devices have lower critical current than IPA devices, larger oxide thicknesses are required in both the conventional and differential cells in order to obtain similar RSM compared to the IPA devices. Since the differential cell naturally has larger read sense margin than the conventional cell (by virtue of the differential nature of the read operation), the increase in oxide thickness between the IPA and PPA embodiments of the differential cell was not as large as compared to the conventional cell (an increase of 0.025 nm for the differential cell versus an increase of over 0.05 nm for the conventional cell). While the increase in oxide thickness for the differential cell offsets the gains in write performance from a reduced critical current, we still observe an improvement in switching time over the IPA version of the cell; however this is not the case for the conventional cell. For the proposed cell however, since the read sense margin is completely unrelated to the cell's critical current, the oxide thickness does not have to increase compared to the IPA version of the cell to maintain similar levels of read performance.

In comparison to the differential cell, the proposed cell offers very similar switching times for the case where the devices have IPA. When the devices have PPA, however, the proposed cell shows an average increase in switching time over the differential cell of approximately 15%. Compared to the conventional cell, the proposed cell shows an increase in switching time of approximately 19% for the case where the cells are comprised of IPA devices optimized for read-write applications. However, for all other cases, the proposed cell shows an improvement in write access time, with an average decrease of 23%. While the proposed device offers various means to reduce switching time (reduced thermal stability and optimized oxide thickness, in addition to ensuring that the current-driven torque always originates from a PL whose magnetization vector is antiparallel to the magnetization vector of the FL), the simulation results show that in certain cases the conventional and differential cells offer faster switching. This is mainly because, despite the measures employed to reduce the switching time of the proposed cell, the degradation in spin transfer torque caused by the FL of the proposed cell being 250% the size of the FLs in the conventional and differential cells can, in some cases, result in larger switching times. However, potential fabrication techniques to reduce the volume of the FL, such as reducing the



Fig. 16. Read operation comparison (a) IPA Devices, Read-Write application (b) IPA Devices, Read-Mostly application (c) PPA Devices, Read-Write application (d) PPA Devices, Read-Mostly application.

thickness of the FL in the area beneath the metallic contact or by fabricating part of the FL with a non-magnetic material, can result in increases in write performance of the proposed cell. Exploring such techniques in detail is left as future work.

#### D. Simulation Results: Read Performance

The main advantage of the proposed cell is in its read performance, as will be shown in this section. In Fig. 16(a)-16(d), we monitor the read sense signal developed across relevant nodes during a read operation for the conventional, differential, and proposed cells. In memories, these signals are typically sensed and latched by a sense amplifier circuit—these circuits typically

TABLE V Results Summary

| Devices | Applications | Cell       | Average Access Time [ns] | Read Op. Energy [pJ] | Write Op. Energy [pJ] | Read Op. EDP [pJ·ns] | Write Op. EDP [pJ·ns] | Cell Area [$\mu m^2$] |
|---------|--------------|------------|--------------------------|----------------------|-----------------------|----------------------|-----------------------|-----------------------|
| IPA     | Read-Write   | Conv. Cell | 5.32                     | 0.19                 | 1.4                   | 0.67                 | 9.93                  | 0.567                 |
| IPA     | Read-Write   | Diff. Cell | 5.06                     | 0.15                 | 2.54                  | 0.22                 | 22                    | 0.643                 |
| IPA     | Read-Write   | Prop. Cell | 4.74                     | 0.3                  | 1.51                  | 0.32                 | 12.67                 | 0.643                 |
| IPA     | Read-Mostly  | Conv. Cell | 3.23                     | 0.13                 | 1.33                  | 0.3                  | 14.17                 | 0.352                 |
| IPA     | Read-Mostly  | Diff. Cell | 2                        | 0.12                 | 2.47                  | 0.15                 | 22.57                 | 0.643                 |
| IPA     | Read-Mostly  | Prop. Cell | 1.6                      | 0.2                  | 1.6                   | 0.15                 | 14.74                 | 0.643                 |
| PPA     | Read-Write   | Conv. Cell | 6.69                     | 0.18                 | 1.48                  | 0.83                 | 12.89                 | 0.567                 |
| PPA     | Read-Write   | Diff. Cell | 4.06                     | 0.12                 | 1.83                  | 0.2                  | 11.87                 | 0.643                 |
| PPA     | Read-Write   | Prop. Cell | 4.27                     | 0.3                  | 1.33                  | 0.32                 | 9.91                  | 0.643                 |
| PPA     | Read-Mostly  | Conv. Cell | 4.03                     | 0.12                 | 1.42                  | 0.34                 | 19.39                 | 0.352                 |
| PPA     | Read-Mostly  | Diff. Cell | 1.89                     | 0.11                 | 1.88                  | 0.14                 | 13.09                 | 0.643                 |
| PPA     | Read-Mostly  | Prop. Cell | 1.49                     | 0.16                 | 1.4                   | 0.1                  | 11.24                 | 0.643                 |

require some minimum voltage to develop across their input terminals to overcome effects of input offset, variation, and noise, so that the state of the memory cells can be detected with a low error rate. For this study, we set the required threshold to be 50 mV, and so our definition of read sense time is the time it takes for the read sense signals to reach this 50 mV threshold. The read times for all of the cells considered in this study are summarized in Table IV.

Since PPA devices have lower critical currents than their IPA counterparts, the conventional and differential cells require reduced read sense currents to ensure a low read disturbance rate. This comes at the cost of increased read access times, as is reflected in the results when comparing the simulations in Fig. 16(c) and 16(d) to those in Fig. 16(a) and 16(b). This is not the case for the proposed cell, however, since the critical current does not limit how much current can be supplied to the cell during a read operation.

Overall, we see that the proposed cell offers improvements in read access times over the conventional and differential cells for all cases. Compared to a differential cell, the proposed cell is able to achieve an average reduction in read access time of approximately 38% over all of the different cases considered in this study. Compared to a conventional cell, the proposed cell achieves an average reduction in read access time of approximately 74% over all cases. The substantially reduced read access time for the proposed cell is attributed to two factors: an increased read sense current, which allows for a larger read sense margin (larger than the targeted 50 mV sense margin for the conventional cell), and reduced oxide resistances, which allow for faster overall time constants for the sensing operation.

As a verification of the immunity to read disturbance, we also plot the simulated normalized net torque applied to the FL during a read operation for the different versions of the proposed device studied in this work; this is shown in Fig. 17. The plot shows the value of  $I_{TPL}\eta(\theta_{TPL}) - I_{BPL}\eta(\theta_{BPL})$ , which is equal to the total spin transfer torque acting on the FL, normalized by the term  $[-\gamma\hbar/((1 + \alpha^2)2eM_SVol)]\vec{\mathbf{m}}_{FL} \times (\vec{\mathbf{m}}_{FL} \times \vec{\mathbf{m}}_{TPL})$ , during the course of a read operation. The net torque acts to pull  $\vec{\mathbf{m}}_{FL}$  towards  $\vec{\mathbf{m}}_{BPL}$  when the normalized spin transfer torque term is positive, and pulls  $\vec{\mathbf{m}}_{FL}$  towards  $\vec{\mathbf{m}}_{TPL}$  when this term is negative. Thus these plots provide insight into the direction of the net torque during a read operation. In the figure, it is clear that for all versions of the proposed device, during a read operation, when a "1" is stored in the cell (i.e., the FL magnetization vector is parallel to the TPL magnetization vector), the net spin transfer torque pulls  $\vec{\mathbf{m}}_{FL}$  towards  $\vec{\mathbf{m}}_{TPL}$ . Similarly, when a "0" is stored in the cell, the net spin transfer torque pulls  $\vec{\mathbf{m}}_{FL}$  towards  $\vec{\mathbf{m}}_{BPL}$ . As such, these simulations show that the net spin transfer torque acting on the FL during a read operation serves to refresh the existing data in the cell, thus guaranteeing disturbance-free read operation.

Fig. 17. Simulated normalized net torque acting on FLs of all versions of proposed device during a read operation.
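The sign convention for the normalized net torque can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the function names are hypothetical, and the currents and spin-torque efficiency values in the example are made-up numbers rather than values from the simulations in this work.

```python
def normalized_net_torque(i_tpl, eta_tpl, i_bpl, eta_bpl):
    """Evaluate I_TPL * eta(theta_TPL) - I_BPL * eta(theta_BPL)."""
    return i_tpl * eta_tpl - i_bpl * eta_bpl


def torque_direction(net):
    """Map the sign of the normalized net torque to the layer the FL is pulled towards."""
    if net > 0:
        return "BPL"  # positive term: m_FL is pulled towards m_BPL
    if net < 0:
        return "TPL"  # negative term: m_FL is pulled towards m_TPL
    return "none"


def read_refreshes_data(stored_bit, net):
    """Check that the read torque reinforces the stored state.

    A stored "1" (FL parallel to TPL) is refreshed by a pull towards TPL;
    a stored "0" is refreshed by a pull towards BPL.
    """
    expected = "TPL" if stored_bit == 1 else "BPL"
    return torque_direction(net) == expected


# Hypothetical read currents (A) and efficiencies (assumptions for illustration).
net_for_one = normalized_net_torque(i_tpl=20e-6, eta_tpl=0.2, i_bpl=30e-6, eta_bpl=0.4)
net_for_zero = normalized_net_torque(i_tpl=30e-6, eta_tpl=0.4, i_bpl=20e-6, eta_bpl=0.2)

print(torque_direction(net_for_one), read_refreshes_data(1, net_for_one))
print(torque_direction(net_for_zero), read_refreshes_data(0, net_for_zero))
```

A read is disturbance-free in this sense only if `read_refreshes_data` holds for both stored states over the whole read operation, which is what the plotted torque is used to confirm.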

# E. Results Summary

Table V summarizes the performance achieved (comparing average access times), energy per operation, energy delay product (EDP) per operation, and cell sizes of the different variants of the conventional, differential, and proposed cells considered in this study. As can be seen in the table, by measure of average access time, the proposed device offers the greatest performance in three of the four cases presented in this study, and shows clear superiority for read-mostly applications (as is expected given its superior read performance). One point to note is that the read operation energy for the proposed device is larger in all cases than that of the conventional and differential cells; this is primarily because the proposed cell makes use of a larger read sense current, which results in increased power dissipation during a read operation. However, note that as part of the design philosophy employed in optimizing the proposed device, a larger read sense current enables us to reduce oxide thickness, and this leads to optimized write performance. The write operation energy of the proposed device is competitive against the conventional and differential cells. As such, this design methodology enables us to trade increased read operation energy for reduced write operation energy. Since write operation energy is approximately an order of magnitude larger than read operation energy (over all cases as shown in the table), it is favourable from the point of view of *overall* energy consumption to sacrifice the energy efficiency of a read operation in a bid to improve write operation energy.
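The energy argument above can be made concrete with a small Python sketch. It assumes that the overall energy per access is a simple average of read and write energies weighted by the fraction of accesses that are reads; this weighting, the function names, and the example energies (in arbitrary units) are assumptions for illustration and are not taken from Table V.

```python
def average_energy(read_fraction, e_read, e_write):
    """Average energy per access for a given fraction of read accesses."""
    return read_fraction * e_read + (1.0 - read_fraction) * e_write


def break_even_read_fraction(read_penalty, write_saving):
    """Read fraction at which a read-energy penalty cancels a write-energy saving.

    Below this fraction, accepting the read penalty lowers the average energy.
    """
    return write_saving / (write_saving + read_penalty)


# Hypothetical energies (arbitrary units): the alternative design spends more
# energy per read but less per write than the baseline.
baseline = {"e_read": 0.1, "e_write": 3.0}
alternative = {"e_read": 0.2, "e_write": 2.0}

for read_fraction in (0.5, 0.9):
    e_base = average_energy(read_fraction, **baseline)
    e_alt = average_energy(read_fraction, **alternative)
    print(f"reads={read_fraction:.0%}: baseline={e_base:.3f}, alternative={e_alt:.3f}")

r_star = break_even_read_fraction(
    read_penalty=alternative["e_read"] - baseline["e_read"],
    write_saving=baseline["e_write"] - alternative["e_write"],
)
print(f"break-even read fraction: {r_star:.3f}")
```

Under this assumed weighting, when write energy is roughly ten times read energy, a modest read-energy penalty pays off unless the workload is very heavily dominated by reads.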

From the point of view of EDP, we can see that for IPA devices, the proposed cell is very close to the conventional cell, and the differential cell offers very poor overall EDP. For PPA cells, we see the proposed cell offers the best overall EDP. In addition to these benefits, it should again be stated that the proposed cell offers guaranteed read disturbance immunity and improved tolerance to process variation over the conventional 1T1MTJ cell (the 2T2MTJ cell is also anticipated to have improved tolerance to variation), although the proposed cell does incur the cost of increased cell area over the conventional 1T1MTJ cell.

#### V. CONCLUSION

In this work, we have proposed a novel MTJ structure and memory cell for STT-MRAM. To the best of our knowledge, the proposed cell is the first to offer truly disturbance-free read operation. Simulation results show that the proposed cell offers superior performance over the conventional 1T1MTJ cell, and for most cases, offers superior performance over the 2T2MTJ cell. We believe that with these characteristics, the proposed cell will be ideally suited for applications which allow sacrificing density for high performance, such as the emerging embedded applications for which STT-MRAM has recently been targeted [32].

## ACKNOWLEDGMENT

The authors thank Aynaz Vatankhahghadim and the anonymous reviewers whose feedback was crucial in strengthening this paper. The authors also acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for funding and the Canadian Microelectronics Corporation (CMC) for tools support and equipment.

#### REFERENCES

- [1] T. Kawahara, R. Takemura, K. Miura, J. Hayakawa, S. Ikeda, Y. Min Lee, R. Sasaki, Y. Goto, K. Ito, T. Meguro, F. Matsukura, H. Takahashi, H. Matsuoka, and H. Ohno, "2 Mb SPRAM (SPin-transfer torque RAM) with bit-by-bit bi-directional current write and parallelizing-direction current read," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 109–120, Jan. 2008.
- [2] D. Halupka, S. Huda, W. Song, A. Sheikholeslami, K. Tsunoda, C. Yoshida, and M. Aoki, "Negative-resistance read and write schemes for STT-MRAM in 0.13 μm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC)*, Feb. 2010, pp. 256–257.
- [3] K. Tsuchida, T. Inaba, K. Fujita, Y. Ueda, T. Shimizu, Y. Asao, T. Kajiyama, M. Iwayama, K. Sugiura, S. Ikegawa, T. Kishi, T. Kai, M. Amano, N. Shimomura, H. Yoda, and Y. Watanabe, "A 64 Mb MRAM with clamped-reference and adequate-reference schemes," in *Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC)*, Feb. 2010, pp. 258–259.
- [4] J. P. Kim, T. Kim, W. Hao, H. M. Rao, K. Lee, X. Zhu, X. Li, W. Hsu, S. H. Kang, N. Matt, and N. Yu, "A 45 nm 1 Mb embedded STT-MRAM with design techniques to minimize read-disturbance," in *Proc. Symp. VLSI Circuits (VLSIC)*, Jun. 2011, pp. 296–297.

- [5] A. Driskill-Smith, S. Watts, D. Apalkov, D. Druist, X. Tang, Z. Diao, X. Luo, A. Ong, V. Nikitin, and E. Chen, "Non-volatile spin-transfer torque RAM (STT-RAM): An analysis of chip data, thermal stability and scalability," in *Proc. IEEE Int. Memory Workshop (IMW)*, May 2010, pp. 1–3.
- [6] N. A. Spaldin, *Magnetic Materials: Fundamentals and Device Applications*. Cambridge, U.K.: Cambridge Univ. Press, 2003.
- [7] D. D. Tang and Y. J. Lee, *Magnetic Memory: Fundamentals and Technology*. Cambridge, U.K.: Cambridge Univ. Press, 2010.
- [8] T. W. Andre, J. J. Nahas, C. K. Subramanian, B. J. Garni, H. S. Lin, A. Omair, and W. L. Martino, Jr., "A 4-Mb 0.18 µm 1T1MTJ toggle MRAM with balanced three input sensing scheme and locally mirrored unidirectional write drivers," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 301–309, Jan. 2005.
- [9] J. C. Slonczewski, "Current-driven excitation of magnetic multilayers," *J. Magn. Magn. Mater.*, vol. 159, no. 1-2, pp. L1–L7, 1996.
- [10] J. Z. Sun and D. C. Ralph, "Magnetoresistance and spin-transfer torque in magnetic tunnel junctions," *J. Magn. Magn. Mater.*, vol. 320, no. 7, pp. 1227–1237, 2008.
- [11] J. Slonczewski and J. Sun, "Theory of voltage-driven current and torque in magnetic tunnel junctions," *J. Magn. Magn. Mater.*, vol. 310, no. 2, pp. 169–175, 2007.
- [12] J. Xiao, A. Zangwill, and M. D. Stiles, "Macrospin models of spin transfer dynamics," *Phys. Rev. B*, vol. 72, no. 1, pp. 014446-1-014446-13, 2005.
- [13] Y. Huai, M. Pakala, Z. Diao, and Y. Ding, "Spin-transfer switching current distribution and reduction in magnetic tunneling junction-based structures," *IEEE Trans. Magn.*, vol. 41, no. 10, pp. 2621–2626, Oct. 2005.
- [14] J. Li, H. Liu, S. Salahuddin, and K. Roy, "Variation-tolerant spin-torque transfer (STT) MRAM array for yield enhancement," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2008, pp. 193–196.
- [15] N. N. Mojumder, S. K. Gupta, S. H. Choday, D. E. Nikonov, and K. Roy, "A three-terminal dual-pillar STT-MRAM for high-performance robust memory applications," *IEEE Trans. Electron Devices*, vol. 58, no. 5, pp. 1508–1516, May 2011.
- [16] P. M. Braganca, J. A. Katine, N. C. Emley, D. Mauri, J. R. Childress, P. M. Rice, E. Delenia, D. C. Ralph, and R. A. Buhrman, "A three-terminal approach to developing spin-torque written magnetic random access memory cells," *IEEE Trans. Nanotechnol.*, vol. 8, no. 2, pp. 190–195, Mar. 2009.
- [17] J. C. Sankey, Y.-T. Cui, R. A. Buhrman, D. C. Ralph, J. Z. Sun, and J. C. Slonczewski, "Measurement of the spin-transfer-torque vector in magnetic tunnel junctions," *Nature Phys.*, vol. 4, no. 1, pp. 67–71, 2007.
- [18] H. Kubota, A. Fukushima, K. Yakushiji, T. Nagahama, S. Yuasa, K. Ando, H. Maehara, Y. Nagamine, K. Tsunekawa, D. D. Djayaprawira, N. Watanabe, and Y. Suzuki, "Quantitative measurement of voltage dependence of spin-transfer torque in MgO-based magnetic tunnel junctions," *Nature Phys.*, vol. 4, no. 1, pp. 37–41, 2007.
- [19] K.-T. Nam, S. C. Oh, J. E. Lee, J. H. Jeong, I. G. Baek, E. K. Yim, J. S. Zhao, S. O. Park, H. S. Kim, U.-I. Chung, and J. T. Moon, "Switching properties in spin transfer torque MRAM with sub-50 nm MTJ size," in *Proc. 7th Annu. Non-Volatile Memory Technol. Symp. (NVMTS)*, Nov. 2006, pp. 49–51.
- [20] M. Nakayama, T. Kai, N. Shimomura, M. Amano, E. Kitagawa, T. Nagase, M. Yoshikawa, T. Kishi, S. Ikegawa, and H. Yoda, "Spin transfer switching in TbCoFe/CoFeB/MgO/CoFeB/TbCoFe magnetic tunnel junctions with perpendicular magnetic anisotropy," *J. Appl. Phys.*, vol. 103, no. 7, pp. 07A710–07A710-3, 2008.
- [21] C. J. Lin, S. H. Kang, Y. J. Wang, K. Lee, X. Zhu, W. C. Chen, X. Li, W. N. Hsu, Y. C. Kao, M. T. Liu, Y. C. Lin, M. Nowak, N. Yu, and L. Tran, "45 nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell," in *Proc. IEEE Int. Electron Devices Meeting (IEDM)*, Dec. 2009, pp. 1–4.
- [22] M. Julliere, "Tunneling between ferromagnetic films," *Phys. Lett. A*, vol. 54, no. 3, pp. 225–226, 1975.
- [23] S. Yuasa, T. Nagahama, A. Fukushima, Y. Suzuki, and K. Ando, "Giant room-temperature magnetoresistance in single-crystal Fe/MgO/Fe magnetic tunnel junctions," *Nature Mater.*, vol. 3, no. 12, pp. 868–871, 2004.
- [24] J. Hayakawa, S. Ikeda, F. Matsukura, H. Takahashi, and H. Ohno, "Dependence of giant tunnel magnetoresistance of sputtered CoFeB/MgO/CoFeB magnetic tunnel junctions on MgO barrier thickness and annealing temperature," *Jpn. J. Appl. Phys.*, vol. 44, no. 19, pp. L587–L589, 2005.

- [25] Z. Diao, Z. Li, S. Wang, Y. Ding, A. Panchula, E. Chen, L.-C. Wang, and Y. Huai, "Spin-transfer torque switching in magnetic tunnel junctions and spin-transfer torque random access memory," *J. Phys.: Condens. Matter*, vol. 19, no. 16, pp. 165209-1–165209-13, 2007.
- [26] T. Kishi, H. Yoda, T. Kai, T. Nagase, E. Kitagawa, M. Yoshikawa, K. Nishiyama, T. Daibou, M. Nagamine, M. Amano, S. Takahashi, M. Nakayama, N. Shimomura, H. Aikawa, S. Ikegawa, S. Yuasa, K. Yakushiji, H. Kubota, A. Fukushima, M. Oogane, T. Miyazaki, and K. Ando, "Lower-current and fast switching of a perpendicular TMR for high speed and high density spin-transfer-torque MRAM," in *Proc. IEEE Int. Electron Devices Meet. (IEDM)*, Dec. 2008, pp. 1–4.
- [27] Y. M. Lee, C. Yoshida, K. Tsunoda, S. Umehara, M. Aoki, and T. Sugii, "Highly scalable STT-MRAM with MTJs of top-pinned structure in 1T/1MTJ cell," in *Proc. Symp. VLSI Technol. (VLSIT)*, Jun. 2010, pp. 49–50.
- [28] M. Madec, J.-B. Kammerer, F. Pregaldiny, L. Hebrard, and C. Lallement, "Compact modeling of magnetic tunnel junction," in *Proc. Joint 6th Int. IEEE Northeast Workshop Circuits Syst. and TAISA Conf. (NEWCAS-TAISA)*, Jun. 2008, pp. 229–232.
- [29] Z. M. Zeng, P. Khalili Amiri, G. Rowlands, H. Zhao, I. N. Krivorotov, J. P. Wang, J. A. Katine, J. Langer, K. Galatsis, K. L. Wang, and H. W. Jiang, "Effect of resistance-area product on spin-transfer switching in MgO-based magnetic tunnel junction memory cells," *Appl. Phys. Lett.*, vol. 98, no. 7, pp. 072512-1–072512-3, 2011.
- [30] M. D. Stiles and J. Miltat, "Spin transfer torque and dynamics," *Spin Dynamics in Confined Structures III*, vol. 101, pp. 1–58, Sep. 2006.
- [31] Y. Huai, D. Apalkov, Z. Diao, Y. Ding, A. Panchula, M. Pakala, L.-C. Wang, and E. Chen, "Structure, materials and shape optimization of magnetic tunnel junction devices: Spin-transfer switching current reduction for future magnetoresistive random access memory application," *Jpn. J. Appl. Phys.*, vol. 45, no. 5A, pp. 3835–3841, 2006.
- [32] K. Lee and S. H. Kang, "Development of embedded STT-MRAM for mobile system-on-chips," *IEEE Trans. Magn.*, vol. 47, no. 1, pp. 131–136, Jan. 2011.



**Safeen Huda** (S'13) received the B.A.Sc. and M.A.Sc. degrees in electrical engineering from the University of Toronto, Toronto, ON, Canada in 2009 and 2012, respectively. He is currently working toward the Ph.D. degree in computer engineering at the University of Toronto.

His research interests include spintronic circuits, low power circuit design, development of CAD tools for digital circuit optimization, and FPGAs.

Mr. Huda has held the Natural Sciences and Engineering Research Council of Canada (NSERC) Canada Graduate Scholarship and the University of Toronto Fellowship.



**Ali Sheikholeslami** (S'98–M'99–SM'02) received the B.Sc. degree from Shiraz University, Shiraz, Iran, in 1990 and the M.A.Sc. and Ph.D. degrees from the University of Toronto, Toronto, ON, Canada, in 1994 and 1999, respectively, all in electrical and computer engineering.

In 1999, he joined the Department of Electrical and Computer Engineering, University of Toronto, as an Assistant Professor. He was promoted to the rank of Associate Professor in 2004 and to the rank of Full Professor in 2010. His research interests are in the areas of analog and digital integrated circuits, high-speed signaling, and VLSI memory design (including STT-MRAM). He spent his 2005–2006 research sabbatical with Fujitsu Labs of Japan and Fujitsu Labs of America. He is currently spending his 2012–2013 research sabbatical with Analog Devices in Toronto, Canada.

Dr. Sheikholeslami served on the Memory Subcommittee of the IEEE International Solid-State Circuits Conference (ISSCC) from 2001 to 2004, and on the Technology Directions Subcommittee of the same conference from 2002 to 2005. He currently serves on the Wireline Subcommittee of ISSCC and on the executive committee of the same conference as its tutorial chair. He presented a tutorial on ferroelectric memory design at ISSCC 2002 and a tutorial on high-speed signaling at ISSCC 2008. In 2010 and 2011, he was an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS. He was the program chair for the 34th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2004) held in Toronto, Canada. He is a registered professional engineer in the province of Ontario, Canada. Dr. Sheikholeslami has received the Best Professor of the Year Award four times (in 2000, 2002, 2005, and 2007) by the popular vote of the undergraduate students in the Department of Electrical and Computer Engineering, University of Toronto. He received the 2005–2006 Early Career Teaching Award and the 2010 Faculty Teaching Award, both from the Faculty of Applied Science and Engineering at the University of Toronto, in "Recognition of Superb Accomplishment in Teaching."