Date  May 2, 2008 
Speaker  Nadya Bliss (MIT  Lincoln Lab.) 
Topic  SMaRT: Sparse Mapping and Routing Toolbox 
Abstract:  Over the last few years, MIT Lincoln Laboratory has been developing techniques for automatic algorithmtoarchitecture mapping of signal processing applications. Due to the nature of signal processing operations, modeling, program analysis, and mapping techniques have been limited to dense matrix computations. As more postprocessing algorithms move to the sensor front end, efficient parallel implementations of those algorithms for sparse data are becoming necessary. Postprocessing algorithms are typically represented in graphical form, but can also be formulated using sparse matrix notation. The efficiency of the graph algorithms is often only a small fraction (< 0.01) of the peak throughput of a conventional architecture, and the efficiency deteriorates as the problem size increases. Furthermore, graph algorithms are not well suited for parallelization. Sparse matrix formulations expose intrinsic algorithm parallelism; however, efficient mappings are still needed to exploit the parallelism and deliver desirable efficiency.
This talk will motivate the need for efficient sparse matrix processing with an example from the graph algorithm community and highlight a key computational kernel: sparse matrixmatrix multiply. We will first introduce a highlevel overview of our dense mapping framework and highlight its limitations with regards to parallelization of sparse computations. We will then show how SMaRT overcomes these limitations using finegrain architecture modeling, dependency analysis, mapping and routing techniques. Finally, we will present results for a number of implementations of the matrix multiply kernel. Our results show over an order of magnitude performance improvement over traditional sparse 2D blockcyclic mappings. This performance improvement can be attributed to detailed runtime analysis of the computation, sensitivity to the sparsity of the data, cooptimized map and route determination, and the detailed hardware modeling capability. This is joint work with UnaMay O'Reilly at CSAIL and Sanjeev Mohindra at Lincoln Laboratory. This work is sponsored by the Department of the Air Force under Air Force contract FA872105C0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government
