COMPUTATION RESEARCH in BOSTON and BEYOND (CRiBB)

Date: Mar. 14, 2014
Speaker: Morris Jette (CTO, SchedMD LLC)
Topic: Slurm Workload Manager
Abstract: Slurm is an open-source, fault-tolerant, and highly scalable workload management framework. Slurm includes an extensive suite of plugins to support a wide range of architectures and use cases, ranging from managing the processes and cores on a single microprocessor to managing the workload on many of the largest computers in the world. Some of Slurm's advanced features include resource allocations optimized for network topology, gang scheduling (time-slicing of parallel jobs), hot-spare resources for failure management, energy management, and the ability to resize running jobs. An overview of Slurm's architecture and capabilities will be presented, along with future development plans to satisfy the needs of exascale computing.


