COMPUTATIONAL RESEARCH in BOSTON and BEYOND (CRIBB)

Date: Mar. 14, 2014
Speaker: Morris Jette (CTO, SchedMD LLC)
Topic: Slurm Workload Manager
Abstract: Slurm is an open-source, fault-tolerant, and highly scalable workload management framework. Slurm includes an extensive suite of plugins to support a wide range of architectures and use cases, from managing the processes and cores on a single microprocessor to managing the workload on many of the largest computers in the world. Some of Slurm's advanced features include resource allocations optimized for network topology, gang scheduling (time-slicing of parallel jobs), hot-spare resources for failure management, energy management, and the ability to resize running jobs. An overview of Slurm's architecture and capabilities will be presented, along with future development plans to satisfy the needs of exascale computing.

Acknowledgements

We thank MIT IS&T, CSAIL, and the Department of Mathematics for their generous support of this series.

MIT Math CSAIL EAPS Lincoln Lab Harvard Astronomy
