COMPUTATIONAL RESEARCH in BOSTON and BEYOND (CRIBB)
Date | Mar. 14, 2014 |
---|---|
Speaker | Morris Jette (CTO, SchedMD LLC) |
Topic | Slurm Workload Manager |
Abstract | Slurm is an open-source, fault-tolerant, and highly scalable workload management framework. Slurm includes an extensive suite of plugins to support a wide range of architectures and use cases, ranging from managing the processes and cores on a single microprocessor to managing the workload on many of the largest computers in the world. Some of Slurm's advanced features include resource allocations optimized for network topology, gang scheduling (time-slicing of parallel jobs), hot-spare resources for failure management, energy management, and the ability to resize running jobs. An overview of Slurm's architecture and capabilities will be presented, along with future development plans to satisfy the needs of exascale computing. |
Acknowledgements
We thank MIT IS&T, CSAIL, and the Department of Mathematics for their generous support of this series.