257 Computer and Systems Research Laboratory, University of Illinois at Urbana-Champaign, 1308 W Main St, Urbana, IL 61801-2307, USA
Office: 217-244-5929, Mobile: 217-621-3995, mif@illinois.edu, http://ipa.ece.illinois.edu
Computer system architecture, concurrent microarchitecture, automatic parallelization, runtime systems and compilers for parallel and embedded computing. |
| PhD, Computer Science, Massachusetts Institute of Technology | May, 2003 |
| SM, Computer Science, Massachusetts Institute of Technology | January, 1997 |
| BS, Computer Science and Mathematics, University of Wisconsin, Madison | May, 1994 |
| National Science Foundation CAREER Award | 2008 |
| National Science Foundation Graduate Research Fellowship | 1994-1997 |
| Assistant Professor of Electrical and Computer Engineering, Research Assistant Professor in the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL | June 2003-Present |
| Visiting Lecturer, University of Illinois at Urbana-Champaign, Urbana, IL | August 2002-May 2003 |
| Research Assistant, Massachusetts Institute of Technology | 1994-2002 |
| Undergraduate Research Assistant, University of Wisconsin, Madison | 1992-1994 |
| Software Engineer, Software Publishing Corporation, Madison, WI | 1990-1992 |
| Independent Consultant, Madison, WI | 1988-1989 |
| Database Applications Programmer, SoftCraft, Inc., Madison, WI | 1986-1987 |
PolyFlow is a dynamic dataflow processing system that automatically and dynamically converts sequential binaries into multithreaded code. One of the primary insights we've gained from the experience is that such systems are most effective when they create threads to maximize what we call "branch mispredict level parallelism," or BLP. Because each thread of control has its own program counter, branch mispredicts in one thread don't necessarily stall fetch and execution in other threads. On the other hand, we were surprised to discover that branch speculation within threads is at least as important as in the sequential domain. This is because branches require data (often data produced by another thread) in order to resolve. Branches that can be correctly predicted no longer depend on the latency of inter-thread data communication and synchronization.
I have been one of three advisors for the University of Illinois's ION (Illinois Observing Nanosat) Cubesat project, an interdisciplinary undergraduate design course sponsored by the College of Engineering. In Cubesat, undergraduate students build their own "nano-satellite." The first product of this class, the ION-1, was a 2-kilogram, 2-liter atmospheric observing satellite. It was delivered to the launch company, Kosmotras, in Fall 2005 and launched in Summer 2006. The design and fabrication of the satellite were accomplished entirely by students, mostly undergraduates with an occasional Master's student. I supervised the students designing the operating system and control software.
Based on our experience with testing and integrating ION-1, several students and I began the design of a general-purpose, reusable platform that allows modular integration of multiple sensors and actuators. In the Fall 2006 semester I guided a dozen undergraduates through the design and fabrication of a 10 x 14 cm "motherboard" that included several Microchip PIC processors that drove both an I2C bus and a 9600-baud digital communication system. We also developed a software protocol so that future Cubesat students can easily design instruments that plug directly into the bus and use the communication and storage facilities of the motherboard. During the Spring 2007 semester we built the supporting infrastructure for, and launched, a simple weather balloon as a platform for testing student designs in preparation for the future launch of ION-2.
I was one of the lead students (with Mike Taylor, Jon Babb and Walter Lee) on the Raw Microprocessor project from 1996 to 2002. Raw was one of the first general-purpose multicore processor chips. It had 16 32-bit cores, each with 64 Kbytes of cache. I specifically contributed to the design of the compilers, the memory system and the deadlock management system in the interconnect. I also supervised several projects researching memory system micro-optimization, including a software-based instruction cache management system and FlexCache, a method of improving data cache efficiency by using the compiler to eliminate unneeded cache tag checks.
I assisted in the design and implementation of the MIT Fugu scalable shared memory multiprocessor system. I implemented and maintained portions of the operating system and system simulator and contributed to the design and evaluation of scheduling algorithms to maximize performance of interactive jobs.
I designed and wrote the initial version of a flexible and detailed microarchitectural simulator toolkit that has served as my group's research infrastructure for the past five years. The toolkit has been the primary experimental vehicle for two PhD theses and over a dozen master's theses, in addition to scores of semester projects in the graduate architecture and microarchitecture classes at the University of Illinois since 2003.
The toolkit simulates a 64-bit MIPS-like ISA, supports simulation at several levels of detail and can checkpoint machine state. The toolkit's primary design goal was instructional. Simulated instructions flow through the software modules in the order that they would flow through the microarchitectural pipeline stages. This makes it easier for students to understand the simulator/microarchitecture mapping. To identify bugs quickly, the simulator executes instructions out-of-order (in the order they would execute on the microarchitecture being simulated) and checks instruction results against a dynamically generated trace as instructions retire. At its highest level of detail the toolkit is still relatively fast, able to simulate a two-wide out-of-order processor at about 100K cycles/second.
I wrote and still maintain a small, not-quite-standard C library that is notable for its support of machines that have only 32-bit floating point hardware (common in academic microprocessor design projects). Parts are derived from Cygnus Newlib (an open-source C standard library for embedded systems), but most of the code is hand-rolled, in particular the code for scanning and printing floating point strings. It also includes an efficient best-fit malloc implementation. This was the C library used on the Raw Microprocessor at MIT, for the PolyFlow project at Illinois and now for the 10 TFlop Rigel accelerator chip being designed at Illinois.
I designed and built the SUDS Runtime system, a complete software transactional memory system that ran on the Raw multicore. SUDS was designed to enable speculative loop transformations. The parallelism enabled by those transformations often provided speedups despite a relatively large software overhead of about 20 machine cycles per speculative load operation.
The SUDS compiler, built as a set of modules on the Stanford SUIF 1.3 compiler system, performed parallelism-enhancing loop transformations. By performing loop distribution, scalar expansion and reassociation on a program dependence graph representation of the program, the compiler could find and expose parallelism in a wide variety of loops, including loops that worked on sparse data structures or contained complex internal control.
I was also the primary local maintainer of version 1.3b of the SUIF compiler infrastructure for a group of about 12 students and faculty at MIT after Stanford stopped supporting version 1 to work on the (abortive) SUIF version 2. I completely rewrote the build system and fixed dozens of bugs in the core libraries.
As a teaching assistant I designed and implemented the semester project for the Spring 1999 offering of MIT's Laboratory in Software Engineering, a junior-level course. Gizmoball was a pinball game that included an editor that allowed the user to place bumpers and flippers, and connect flippers to triggers to create various "Rube Goldberg" contraptions. I prototyped the project, wrote the assignment (basically a specification), and implemented, documented, and supported a physics library that used spatial hashing to perform collision detection and then calculated the forces from impacts. The project was reused, with variations, every semester for the following four years.
I worked with a team of chemical engineering students implementing parallel N-body codes on the Thinking Machines CM-5 at the University of Wisconsin. In addition to writing application code, I was responsible for debugging and maintaining the local installation of the experimental Berkeley Split-C programming tools.
I designed and implemented testing and debugging tools for the team developing Software Publishing Corporation's Harvard Draw program. Among other projects, I implemented a straightforward but effective memory leak detector.
I was responsible for designing and maintaining all the information management systems at SoftCraft, a small (40-employee) company in Madison, Wisconsin.
Mayank Agarwal; Nitin Navale; Kshitiz Malik; Matthew I. Frank: Fetch Criticality Reduction for Control Independence, Int'l Symp Computer Architecture, (ISCA-35), June, 2008.
Kshitiz Malik; Mayank Agarwal; Sam S. Stone; Kevin M. Woley; Matthew I. Frank: Branch-mispredict Level Parallelism (BLP) for Control Independence, Int'l Symp High-Performance Comp Arch, (HPCA-14), February, 2008.
Kshitiz Malik; Mayank Agarwal; Vikram Dhar; Matthew I. Frank: PaCo: Probability-based Path Confidence Prediction, Int'l Symp High-Performance Comp Arch, (HPCA-14), February, 2008.
Kshitiz Malik; Mayank Agarwal; Matthew I. Frank: Adaptive Memory Synchronization (AMS): Balancing the Risks and Benefits of Inter-thread Load Speculation, Second Annual Reconfigurable and Adaptive Architecture Workshop (RAAW-2), December, 2007.
Wen-mei W. Hwu; Shane Ryoo; Sain-Zee Ueng; John H. Kelm; Isaac Gelado; Sam S. Stone; Robert E. Kidd; Sara Sadeghi Baghsorkhi; Aqeel A. Mahesri; Stephanie Tsao; Nacho Navarro; Steve S. Lumetta; Matthew I. Frank; Sanjay J. Patel: Implicitly Parallel Programming Models for Thousand-Core Microprocessors, Design Automation Conference, (DAC-44), 2007.
Mayank Agarwal; Kshitiz Malik; Kevin M. Woley; Sam S. Stone; Matthew I. Frank: Exploiting Postdominance for Speculative Parallelization, Int'l Symp High-Performance Computer Architecture, (HPCA-13):295-305, Feb, 2007.
Shane Ryoo; Sain-Zee Ueng; Christopher I. Rodrigues; Robert E. Kidd; Matthew I. Frank; Wen-mei W. Hwu: Automatic Discovery of Coarse-Grained Parallelism in Media Applications, Trans. High-Performance Embedded Architectures and Compilers, 1(3), 2006.
Sam S. Stone; Kevin M. Woley; Matthew I. Frank: Address-Indexed Memory Disambiguation and Store-to-Load Forwarding, Int'l Symp Microarchitecture, (MICRO-38):171-182, Nov, 2005.
Michael Bedford Taylor; Walter Lee; Jason Miller; David Wentzlaff; Ian Bratt; Ben Greenwald; Henry Hoffmann; Paul Johnson; Jason Kim; James Psota; Arvind Saraf; Nathan Shnidman; Volker Strumpen; Matthew I. Frank; Saman Amarasinghe; Anant Agarwal: Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams, Int'l Symp Computer Architecture, (ISCA-31):2-13, Jun, 2004.
Michael Bedford Taylor; Jason Kim; Jason Miller; David Wentzlaff; Fae Ghodrat; Ben Greenwald; Henry Hoffmann; Paul Johnson; Jae-Wook Lee; Walter Lee; Albert Ma; Arvind Saraf; Mark Seneski; Nathan Shnidman; Volker Strumpen; Matthew I. Frank; Saman P. Amarasinghe; Anant Agarwal: The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs, IEEE Micro, 22(2):25-35, 2002.
Csaba Andras Moritz; Matthew I. Frank: LoGPC: Modeling Network Contention in Message-Passing Programs, IEEE Trans. Parallel and Distributed Systems, 12(4), 2001.
Csaba Andras Moritz; Matthew Frank; Saman Amarasinghe: FlexCache: A Framework for Flexible Compiler Generated Data Caching, 2nd Workshop on Intelligent Memory Systems, Springer-Verlag Lecture Notes in Computer Science, 2107:135-146, Nov, 2000.
Jonathan Babb; Martin C. Rinard; Csaba Andras Moritz; Walter Lee; Matthew I. Frank; Rajeev Barua; Saman P. Amarasinghe: Parallelizing Applications Into Silicon, Field-Programmable Custom Computing Machines, (FCCM-7):70-80, Apr, 1999.
Walter Lee; Rajeev Barua; Matthew I. Frank; Devabhaktuni Srikrishna; Jonathan Babb; Vivek Sarkar; Saman P. Amarasinghe: Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine, Architectural Support for Programming Languages and Operating Systems, (ASPLOS-VIII):46-57, Oct, 1998.
Csaba Andras Moritz; Matthew I. Frank: LoGPC: Modeling Network Contention in Message-Passing Programs, SIGMETRICS 1998:254-263.
Kenneth Mackenzie; John Kubiatowicz; Matthew I. Frank; Walter Lee; Victor Lee; Anant Agarwal; M. Frans Kaashoek: Exploiting Two-Case Delivery for Fast Protected Messaging, Int'l Symp High-Performance Computer Architecture, (HPCA-4):231-242, 1998.
Elliot Waingold; Michael Taylor; Devabhaktuni Srikrishna; Vivek Sarkar; Walter Lee; Victor Lee; Jang Kim; Matthew I. Frank; Peter Finch; Rajeev Barua; Jonathan Babb; Saman Amarasinghe; Anant Agarwal: Baring It All to Software: Raw Machines, IEEE Computer, 30(9):86-93, Sep, 1997.
Matthew I. Frank; Anant Agarwal; Mary K. Vernon: LoPC: Modeling Contention in Parallel Algorithms, Principles and Practice of Parallel Programming, (PPoPP-6):276-287, 1997.
Walter Lee; Matthew I. Frank; Victor Lee; Kenneth Mackenzie; Larry Rudolph: Implications of I/O for Gang Scheduled Workloads, Job Scheduling Strategies for Parallel Processing, Springer-Verlag, Lecture Notes in Computer Science 1291:215-237, 1997.
Jonathan Babb; Matthew I. Frank; Victor Lee; Elliot Waingold; Rajeev Barua; Michael Bedford Taylor; Jang Kim; Devabhaktuni Srikrishna; Anant Agarwal: The RAW Benchmark Suite: Computation Structures for General Purpose Computing, Field-Programmable Custom Computing Machines, (FCCM-5):134-144, 1997.
Jonathan Babb; Matthew I. Frank; Anant Agarwal: Solving Graph Problems with Dynamic Computation Structures, Conf on Reconfigurable Technology for Rapid Product Development and Computing, at PhotonicsEast 96, November 1996.
Frank Traenkle; Matthew I. Frank; Mary K. Vernon; Sangtae Kim: Solving Microstructure Electrostatics with MIMD Parallel Supercomputers and Split-C, Journal of Non-Newtonian Fluid Mechanics, 53:197-213, 1994.
Matthew I. Frank; Mary K. Vernon: A Hybrid Shared Memory/Message Passing Parallel Machine, Int'l Conf on Parallel Processing, Vol I, pp. 232-236, 1993.
Sam S. Stone; Matthew I. Frank: Forwarding Cache: Eliminating Address Multiversioning in the Store Queue, submitted for review to ACM Transactions on Architecture and Code Optimization, April, 2008.
Matthew I. Frank: System Support for Implicitly Parallel Programming, University of Illinois Center for Reliable and High-Performance Computing Technical Report CRHC-07-06, October 8, 2007.
Sam S. Stone; Kevin M. Woley; Kshitiz Malik; Mayank Agarwal; Vikram Dhar; Matthew I. Frank: Synchronizing Store Sets (SSS): Balancing the Benefits and Risks of Inter-thread Load Speculation, University of Illinois Center for Reliable and High-Performance Computing Technical Report UILU-ENG-06-2221, Nov 17, 2006.
Kshitiz Malik; Kevin M. Woley; Sam S. Stone; Mayank Agarwal; Vikram Dhar; Matthew I. Frank: Confidence Based Out-of-Order Renaming for Speculatively Multithreaded Processors, University of Illinois Center for Reliable and High-Performance Computing Technical Report UILU-ENG-06-2208, June 9, 2006.
Matthew I. Frank; Saman Amarasinghe: Scalar Queue Conversion: Dynamic Single Assignment For Concurrent Scheduling, University of Illinois Center for Reliable and High-Performance Computing Technical Report UILU-ENG-03-2215, August 2003.
Matthew I. Frank: SUDS: Automatic Parallelization for Raw Processors, Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 23, 2003.
Matthew I. Frank; Walter Lee; Saman Amarasinghe: A Software Framework for Supporting General Purpose Applications on Raw Computation Fabrics, MIT-LCS Technical Memo MIT-LCS-TM-619, July 20, 2001.
Matthew I. Frank; Csaba Andras Moritz; Benjamin Greenwald; Saman Amarasinghe; Anant Agarwal: SUDS: Primitive Mechanisms for Memory Dependence Speculation, MIT/LCS Technical Memo MIT-LCS-TM-591, January 6, 1999.
| Computer Systems Engineering | Fall 05, Spring 07, Fall 07 |
| Digital Systems Laboratory | Spring 08 |
| Computer Engineering I | Fall 02, Spring 05 |
| Computer Architecture | Fall 03, Fall 04, Fall 06 |
| Microarchitecture | Spring 06 |
| Multithreaded Computer Architecture | Spring 04 |
| Interdisciplinary Cubesat Laboratory | Fall 03-present |
Wei-Ping (Thomas) Soong, December 2003, Intel, Austin, TX; Thomas R. Novak, May 2004, AMD, Sunnyvale, CA; Daniel Hodges, December 2004, EDS, Dallas, TX; Benjamin J. Miller, January 2005, AMD, Boxborough, MA; Michael D. Tucknott, January 2005, Intel, Hillsboro, OR; Michael J. Dabrowski, May 2005, SpaceX; Kevin M. Woley, December 2005, Microsoft, Redmond, WA; Snehal Sanghavi, January 2006, AMD, Sunnyvale, CA; Kevin J. Stephano, May 2006, AMD, Austin, TX; Hing Lim Chan, May 2006; Kshitiz Malik, December 2006, (continued); Mayank Agarwal (CS), December 2006, (continued); Brandon K. Swamy, May 2007, AMD, Austin, TX; Christopher R. Burke, May 2007, AMD, Austin, TX; Sam S. Stone, July 2007, Harvard Law; Nitin S. Navale, December 2007, AMD, Sunnyvale, CA; Idan Lupinsky, December 2007, Goldman Sachs; Vikram Dhar, May 2008, NVIDIA, Santa Clara, CA; James S. Pike, May 2008, Microsoft, Redmond, WA; Lukasz R. Lempart, May 2008, Riverbed; Gene Wu, July 2008; Ali A. Hussain, Fall 2008 (expected); Nicholas R. Weaver, Spring 2009 (expected).
| Intel/Microsoft Universal Parallel Computing Research Center at University of Illinois (PIs: Marc Snir and Wen-mei Hwu, ~20 co-PIs). | ~$350,000 | March 2008-Feb 2013 |
| National Science Foundation, CAREER Award: System Support for Implicitly Parallel Programming. | $400,000 | May 2008-Apr 2013 |
| Microelectronics Advanced Research Corporation (MARCO), Gigascale Systems Research Center (PI: Jan Rabaey, UCB, ~40 co-PIs). | $252,500 | Jan 2006-Aug 2009 |
| National Science Foundation, PolyFlow: An Architectural Model for Highly Concurrent Instruction Execution (1 co-PI). | $425,000 | Nov 2004-Nov 2007 |