|Date||May 2, 2008|
|Speaker||Nadya Bliss (MIT - Lincoln Lab.)|
|Topic||SMaRT: Sparse Mapping and Routing Toolbox|
Over the last few years, MIT Lincoln Laboratory has been developing techniques for automatic algorithm-to-architecture mapping of signal processing applications. Due to the nature of signal processing operations, modeling, program analysis, and mapping techniques have been limited to dense matrix computations. As more post-processing algorithms move to the sensor front end, efficient parallel implementations of those algorithms for sparse data are becoming necessary. Post-processing algorithms are typically represented in graphical form, but can also be formulated using sparse matrix notation. The efficiency of the graph algorithms is often only a small fraction (< 0.01) of the peak throughput of a conventional architecture, and the efficiency deteriorates as the problem size increases. Furthermore, graph algorithms are not well suited for parallelization. Sparse matrix formulations expose intrinsic algorithm parallelism; however, efficient mappings are still needed to exploit the parallelism and deliver desirable efficiency.
This talk will motivate the need for efficient sparse matrix processing with an example from the graph algorithm community and highlight a key computational kernel: sparse matrix-matrix multiply. We will first introduce a high-level overview of our dense mapping framework and highlight its limitations with regards to parallelization of sparse computations. We will then show how SMaRT overcomes these limitations using fine-grain architecture modeling, dependency analysis, mapping and routing techniques. Finally, we will present results for a number of implementations of the matrix multiply kernel. Our results show over an order of magnitude performance improvement over traditional sparse 2D block-cyclic mappings. This performance improvement can be attributed to detailed runtime analysis of the computation, sensitivity to the sparsity of the data, co-optimized map and route determination, and the detailed hardware modeling capability.
This is joint work with Una-May O'Reilly at CSAIL and Sanjeev Mohindra at Lincoln Laboratory.
This work is sponsored by the Department of the Air Force under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government
We thank the generous support of MIT IS&T, CSAIL, and the Department of Mathematics for their support of this series.