
Andreas Moshovos (alternate spellings: Moschovos, Μόσχοβος)
Computer Engineering Group
Engineering Annex 310 (map)
Department of Electrical and Computer Engineering
Department of Computer Science (courtesy)
University of Toronto
FAX: 416-946-8734





Micro-Architecture Group / Research Interests


2013 Summer CUDA workshop, Thursdays in GB244. Thursday May 16: Seminar by Prof. Tor Aamodt, UBC.

2012 Summer CUDA workshop: the website is up.


The AENAO group is addressing emerging challenges in digital system design with an emphasis on power, complexity and performance optimizations for general purpose processors. Our research has produced several key innovations, including:


  • Predictor Virtualization: On-chip caches have been used to store instructions and data. As their capacity increases, we are looking at using these caches to also store program metadata, that is, information about program behavior that is collected dynamically or provided through a software interface to hardware. This opens up several new opportunities for improving modern computer systems. We have been successful in demonstrating significant area savings for implementing prefetchers (instead of 60KB the same predictor requires just 1KB with predictor virtualization) and for allowing the implementation of otherwise impractical predictors (we get the accuracy of a 4K entry Branch Target Buffer with just 1K entries). An overview of this research direction can be found here.
  • Power-Aware Snoop Coherence Filters: Our group was the first to draw attention to the opportunities and the need for power optimizations for cache coherence mechanisms. We proposed Jetty, a simple, layered extension over snoop coherence that stops remotely-induced snoops from accessing the local cache tag arrays, thus saving power and reducing bandwidth on the tag arrays. In Jetty we also introduced hardware counting Bloom-like filters, a structure that provides fast and power-efficient membership tests. Other researchers have since used similar structures for other optimizations such as load/store queue complexity reduction and hit/miss prediction. Recently, we proposed a novel implementation of these filters that further improves their power and speed. In more recent work, our group introduced RegionScout, a technique that avoids broadcasts in snoop-coherent systems. RegionScout improves upon Jetty in that the source node knows a priori that a request will miss in all other nodes. Jetty has influenced the design of commercial snoop filters. An overview of some snoop filtering work can be found here.
  • Checkpoint Prediction and Intelligent Management: Our group was the first to draw attention to the lack of scalability in existing checkpoint/restore mechanisms that are used to support speculation in modern processors. We proposed checkpoint prediction along with intelligent checkpoint management methods to sustain high performance with far fewer checkpoints.
  • Memory Dependence Prediction: In our earlier work while at the University of Wisconsin-Madison, we proposed a novel solution to the decades-old problem of memory aliasing. Memory dependence prediction dynamically predicts dependences amongst memory operations. We proposed several optimizations. Some of these optimizations have since been implemented in commercial designs.
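
As a rough illustration of the counting Bloom-like filters mentioned under snoop filtering, the sketch below shows a generic counting Bloom filter used as a membership test for locally cached blocks. It is not the Jetty organization itself: the class name, the table size, the number of hashes, and the hash function are all assumptions chosen for illustration. The property it demonstrates is that a "no" answer is always safe (the block is definitely not cached, so the tag lookup can be skipped), while a "yes" answer may be a false positive.

```python
class CountingBloomFilter:
    """Generic counting Bloom filter over block addresses (illustrative only)."""

    def __init__(self, num_counters=1024, num_hashes=3):
        # Sizes are arbitrary assumptions, not values from any real design.
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _indices(self, block_addr):
        # Hypothetical hash family: multiply by distinct odd constants,
        # drop some low bits, and reduce modulo the table size.
        return [
            ((block_addr * (2 * i + 1) * 0x9E3779B1) >> 12) % self.num_counters
            for i in range(self.num_hashes)
        ]

    def insert(self, block_addr):
        # Called when a block is allocated in the local cache.
        for idx in self._indices(block_addr):
            self.counters[idx] += 1

    def remove(self, block_addr):
        # Called when a block is evicted or invalidated.
        if not self.may_contain(block_addr):
            raise ValueError("block was never inserted")
        for idx in self._indices(block_addr):
            self.counters[idx] -= 1

    def may_contain(self, block_addr):
        # False means "definitely not cached"; True means "possibly cached".
        return all(self.counters[idx] > 0 for idx in self._indices(block_addr))


def snoop_needs_tag_lookup(snoop_filter, block_addr):
    """Return True only if the snooped block might be in the local cache."""
    return snoop_filter.may_contain(block_addr)
```

Counters, rather than single bits, are what make removal possible: evicting one block decrements only its own contribution, so other blocks that hash to the same entries are still reported as possibly present.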

Our research currently focuses on two important design challenges: (i) the ever-growing gap between processor and memory performance, and (ii) the increasing complexity and reduced reliability of existing performance-enhancing techniques coupled with the prohibitive levels of power dissipation. In addition, our research also considers how enhanced semiconductor technologies can be used to enhance functionality. Our research focuses primarily on developing techniques for addressing perceived, long-term design challenges. A common theme amongst the techniques that we develop is that they are behavior-centric (i.e., they exploit aspects of the behavior of "typical" applications), programmer-transparent (i.e., they require no changes to existing software) and often layered extensions (i.e., they can be incorporated into existing designs with minimal changes).

Talk on Recent Research Results given at several places including: IBM T.J. Watson, UIUC, Northwestern, Intel Oregon and Santa Clara, EPFL, and CMU.

An introduction to programming graphics processors given at the Sunnybrook Hospital.

A wiki that keeps track of developments in the snoop filtering area.

We thank for their continuing support through an academic license for SimICS.


Group Members



    • Kaveh Aasaraai (Ph.D. Candidate)
    • Ian Katsuno (M.A.Sc. Candidate)
    • Goran Narancic (M.A.Sc. Candidate)
    • Myrto Papadopoulou (Ph.D. Candidate)
    • Jason Zebchuk (Ph.D. Candidate)
    • Vitaly Zakharenko (M.A.Sc. Candidate)
    • Islam Atta (Ph.D. Candidate)
    • Alhassan Khedr
    • Michel Nacouzi
    • Patrick Judd
    • Xin Tong
    • Di Wu


    • Amirali Baniasadi, Faculty at the University of Victoria.
    • Gaurav Mittal, M.Sc. Dec. 2001.
    • Christopher Thomas, M.A.Sc., currently at York Univ.
    • Won-Ho Park, M.A.Sc.
    • Navid Azizi, Co-Advised w/ Prof. Farid Najm
    • Patrick Akl, M.A.Sc., currently with AMD/ATI, Markham, ON.
    • Elham Safi, Ph.D., currently with SecureKey, Toronto, ON.
    • Elias Ferzli, M.A.Sc., Altera, Toronto, ON
    • Maryam Sadooghi-Alvandi, M.A.Sc.
    • Ioana Burcea, Ph.D. “Predictor Virtualization”, IBM T.J. Watson








Please respect all applicable copyrights. Please check the ACM and IEEE websites for details. At the time of this writing their policy was: Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage. To copy otherwise, or to republish requires a fee and/or specific permission of the ACM/IEEE





    • Temporal Instruction Fetch Streaming,
      Michael Ferdman, Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos,
      In Proc. International Symposium on Microarchitecture, Nov. 2008.
    • A Physical Level Study and Optimization of CAM-Based Checkpointed Register Alias Tables,
      Elham Safi, Andreas Moshovos, and Andreas Veneris,
      In Proc. International Symposium on Low Power Electronics and Design, August 2008 (short paper).
    • Temporal Streams in Commercial Server Applications,
      Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos,
      International Symposium on Workload Characterization.
    • Predictor Virtualization,
      Ioana Burcea, Stephen Somogyi, Andreas Moshovos and Babak Falsafi,
      In Proc. ACM Intl. Conference on Architectural Support for Programming Languages and Operating Systems, March 2008.
    • Turbo-ROB: A Low-Cost, Simple Checkpoint/Restore Accelerator,
      Patrick Akl and Andreas Moshovos,
      In Proc. 2008 International Conference on High Performance Embedded Architectures & Compilers, Jan. 2008.















Implementing Non-Numerical Algorithms On An Access Decoupled Architecture That Supports Software Pipelining
Andreas Moshovos, Advisor: M. Katevenis.
M.Sc. Thesis, Aug. 1992.


Technical Program Committees


  • Sub-chair on Architecture, Intl. Conference on Parallel Processing, 2010
  • IEEE MICRO Top Picks, 2010
  • IEEE Intl. Symposium on High-Performance Computer Architecture (HPCA), 2010.
  • Design and Automation Europe (DATE), 2010.
  • SAMOS Workshop, 2009.
  • International Conference on High-Performance Computing (HiPC 2009).
  • Design and Automation Europe (DATE), 2009.
  • International Workshop on Data Management on New Hardware (DaMoN), June 2009.
  • Fourth Conference on High-Performance Embedded Architectures and Compilers  (HiPEAC), 2009.
  • 12th Pan-Hellenic Conference on Informatics (PCI 2008)
  • International Conference on Computer Design (ICCD), 2008.
  • Design and Automation Europe (DATE), 2008.
  • SAMOS Workshop, 2008.
  • International Conference on Parallel Processing, 2008
  • IEEE Intl. Symposium on High-Performance Computer Architecture (HPCA), 2008.
  • Third Conference on High-Performance Embedded Architectures and Compilers  (HiPEAC), 2008.
  • Design and Automation Europe (DATE), 2007.
  • ACM/IEEE Conference on Parallel Architectures and Compilation Techniques (PACT), 2007.
  • ACM/IEEE Intl. Symposium on Microarchitecture (MICRO), 2007.
  • ACM/IEEE Intl. Symposium on Computer Architecture (ISCA), 2007.
  • IEEE International Parallel & Distributed Processing Symposium (IPDPS), March 2007.
  • ACM Annual International Conference on Supercomputing, July 2006.
  • Second International Workshop on Data Management on New Hardware (DaMoN), June 2006.
  • Workshop on Architectural Support for Gigascale Integration (ASGI), June 2006.
  • ACM/IEEE Intl. Symposium on Microarchitecture (MICRO), Nov. 2005.
  • First HiPEAC Conference, 2005.
  • Intl. Conference on Parallel Processing (ICPP), June 2005.
  • Workshop on Power-Aware Computer Systems, 2004
  • IEEE Symposium on the Performance Analysis of Systems and Software, ISPASS 2001
  • 4th International Symposium on High Performance Computing (ISHPC-IV), 2002.
  • Sub-chair on Architecture, Intl. Conference on Parallel Processing, 2002.
  • Workshop on Power-Aware Computer Systems, 2001.
  • Workshop on Power-Aware Computer Systems 2000.








