COMPUTATIONAL RESEARCH in BOSTON and BEYOND (CRIBB)
Date | Mar. 14, 2014 |
---|---|
Speaker | Morris Jette (CTO, SchedMD LLC) |
Topic | Slurm Workload Manager |
Abstract: | Slurm is an open-source, fault-tolerant and highly scalable workload management framework. Slurm includes an extensive suite of plugins to support a wide range of architectures and use cases ranging from managing the processes and cores on a single microprocessor to managing the workload on many of the largest computers in the world. Some of Slurm's advanced features include resource allocations optimized for network topology, gang scheduling (time-slicing of parallel jobs), hot-spare resources for failure management, energy management, and the ability to resize running jobs. An overview of Slurm's architecture and capabilities will be presented along with future development plans to satisfy the needs of exascale computing. |
Archives
Acknowledgements
We thank the MIT Department of Mathematics, Student Chapter of SIAM, ORCD, and LLSC for their generous support of this series.