Research

Until recently, we have been able to use the growing number of transistors available on an integrated circuit chip, now in the hundreds of millions, to build ever more powerful processors capable of improving the performance of any program without extra effort from the programmer. However, we have recently reached limitations that leave only two possibilities for continuing to exploit increasing on-chip transistor resources: multicore processors, where multiple processors are incorporated on a single chip, and custom hardware accelerators, where a chip organized as a Field Programmable Gate Array (FPGA, a form of programmable integrated circuit) can be programmed to implement any form of hardware. For both multicores and FPGAs, the key to improving the performance of a single application is parallelism; however, exploiting parallelism requires that programmers transform their programs, either by introducing parallel threads and synchronization to target multicores, or by partitioning the software and describing parts of it in a hardware description language (HDL) to target FPGAs. The difficulty is that the vast majority of programmers are trained only in sequential programming, and do not understand the pitfalls of threaded programming or the subtleties of hardware design. The goal of this research program is to ease the extraction of parallelism from sequential programs so that non-expert programmers can exploit parallel hardware such as multicores and FPGAs, enabling the software industry to continue to enjoy dramatic improvements in computer system performance.

PaCRaT: internal page (restricted access)

Exploiting Multicores via Dynamic Parallelization

As multicore processors become ubiquitous, we are faced with the challenge of making this raw parallelism accessible to average programmers. However, the current approach of requiring the programmer to write explicit hand-optimized parallel programs (EHOPPs) is far from a good solution for several reasons.
First, EHOPPs obfuscate the underlying algorithms, making software maintenance and support extremely difficult. Second, EHOPPs may not scale easily to larger numbers of processors or port well to new multicore architectures. Finally, EHOPPs are too challenging for average programmers to develop, and expert parallel programmers are in short supply. This parallel programming gap makes evident the need for automatic or mostly-automatic parallelization and optimization. Previous work on optimistic parallelization is a step in the right direction, freeing the compiler from worrying about correctness while attempting to exploit parallelism.

The next step is to develop new parallel programming systems that allow the programmer to easily expose the parallelism in algorithms using simple extensions to existing programming languages. There are three key parts to such a system. First, we need compilers that can perform deep analyses (such as probabilistic pointer analysis) that can help identify potential parallelism, and that can also alert the programmer to any roadblocks to parallelism so that ambiguity, unnecessary restrictions, and indirection can be reduced. The compiler should also provide more information about the program beyond that available in a conventional binary. Second, we require a runtime system that can match the parallelism available in a program to the specific granularity and scale of the target parallel hardware. In particular, such a runtime system should help discover and implement many forms of parallelism (e.g., do-all, pipeline, and optimistic parallelism), optimize data locality, granularity, and synchronization, and provide fault tolerance and load balancing as necessary. Finally, in addition to supporting new modes of execution such as transactional memory (TM) and thread-level speculation (TLS), hardware should provide feedback to the runtime system to enable a search over task and data distributions, parallelization techniques, and limits on speculation.
With such a system, the programmer is concerned only with specifying the algorithm and limiting ambiguity, while the system optimizes performance.
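As a rough illustration of the runtime's role described above, the following sketch shows a do-all loop whose chunk granularity is chosen at run time from the number of available workers rather than hard-coded by the programmer. The names `do_all` and `chunk_bounds` are hypothetical and do not belong to any system described here. Python threads are used only to keep the sketch short; in CPython the global interpreter lock limits real speedup for CPU-bound work.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, chunks):
    """Split range(n) into at most `chunks` contiguous, near-equal (start, stop) pairs."""
    chunks = max(1, min(chunks, n))
    base, extra = divmod(n, chunks)
    bounds, start = [], 0
    for i in range(chunks):
        # The first `extra` chunks take one additional element each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def do_all(body, items, workers=None):
    """Apply `body` to every item, assuming the iterations are independent (do-all).

    Hypothetical runtime entry point: the programmer only states that the loop
    is a do-all; the worker count and chunk size are chosen here.
    """
    items = list(items)
    if not items:
        return []
    workers = workers or os.cpu_count() or 1
    bounds = chunk_bounds(len(items), workers)

    def run_chunk(bound):
        start, stop = bound
        return [body(x) for x in items[start:stop]]

    with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
        parts = list(pool.map(run_chunk, bounds))  # map preserves chunk order
    return [y for part in parts for y in part]


if __name__ == "__main__":
    print(do_all(lambda x: x * x, range(10), workers=4))
```

The caller never mentions threads or chunk sizes, so the same call could in principle be retargeted to a different core count by changing only the runtime's choice of `workers`.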
Current Projects:
Soft Systems

FPGAs are increasingly being used instead of ASICs in embedded systems and, more recently, as co-processors in high-performance systems. For FPGAs, as with multicores, the key to improving the performance of a single application is parallelism. However, exploiting parallelism requires that programmers transform their programs by partitioning the software and describing parts of it in a hardware description language (HDL) to target FPGAs. The difficulty is that the vast majority of programmers are trained only in sequential programming, and do not understand the subtleties of hardware design. Building on our initial work on soft processor architecture, our goal is to empower sequential programmers to exploit FPGAs through automatically generated systems that fully exploit available parallelism and are customized to match application requirements. For example, we have begun to investigate the potential for an FPGA-based vector processor, where vector dimensions and capabilities are customized to match those of the application. Such an architecture can allow a programmer to enjoy the data parallelism provided by an FPGA, while the underlying system is easily programmable and can scale to match the resources of larger FPGA devices. We are also exploring FPGA-based multiprocessors, in particular for network packet processing and for accelerating real-time ray-tracing. In both cases our goal is to create a development environment where programmers who are experts in neither hardware design nor parallel programming can create applications that will scale well and efficiently exploit the resources of the FPGAs. The key will be to capitalize on the ability of soft processors and systems to match the memory system, processing, communication, and synchronization needs of the applications. (with Jonathan Rose)
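To make the idea of customizing vector dimensions concrete, here is a toy model rather than the actual architecture: an element-wise add on a vector unit with a configurable number of lanes, counting how many lane-wide steps the operation takes. `SoftVectorUnit`, `lanes`, and the one-step-per-lane-group cost model are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SoftVectorUnit:
    """Toy model of a vector unit whose lane count is fixed at configuration time."""
    lanes: int  # hypothetical parameter: elements processed per step

    def __post_init__(self):
        if self.lanes < 1:
            raise ValueError("lanes must be >= 1")

    def steps_for(self, n):
        """Lane-wide steps needed for n elements (ceiling division)."""
        return -(-n // self.lanes)

    def vadd(self, a, b):
        """Element-wise add; returns (result, steps) under the toy cost model."""
        if len(a) != len(b):
            raise ValueError("vector lengths differ")
        result = []
        steps = 0
        for start in range(0, len(a), self.lanes):
            stop = start + self.lanes
            # One step handles up to `lanes` elements at once.
            result.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
            steps += 1
        return result, steps


if __name__ == "__main__":
    data = list(range(10))
    for lanes in (1, 4, 16):
        _, steps = SoftVectorUnit(lanes=lanes).vadd(data, data)
        print(lanes, "lanes:", steps, "steps")
```

Under this model the program is unchanged as the lane count grows; only the step count falls. A real design would also have to weigh logic area, memory bandwidth, and clock frequency, none of which this sketch captures.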
Current Projects:
Past Projects: