Job Schedulers

There are myriad job managers available for batch scheduling, and I'm only going to mention a few of the more common ones here. The notes I've written on the various schedulers are my own opinion, and you should treat them as such. That said, these days there's honestly no good reason not to use Slurm for your cluster. Slurm is free to download and use, and has commercial support available if you need it (or if your management insists on paying for things that are free). It uses a modular architecture, so if you need something that isn't already available, it's not difficult to extend and customize for your own needs.


Arguably the Mother of All Job Schedulers, Portable Batch System (PBS) was originally designed for NASA in the early 1990s. Many later schedulers are direct descendants of the original PBS, either by using significant portions of the code, or at least by duplicating the concepts and functionality.

PBS has gone through many forks and merges over the years (PBS, OpenPBS, PBS Pro), and is still available today as PBS Professional. As of 2016, the commercial version of PBS Pro is available as open source, although its popularity has declined somewhat in recent years.


The combination of Maui and Torque used to be the de facto standard for job scheduling on research clusters. Maui is officially the scheduler, and Torque acts as a resource manager for the individual compute nodes, although Torque can be used as a full-fledged job manager as well, albeit with fairly limited scheduling capabilities.

In the mid-2000s, Cluster Resources, Inc. (now known as Adaptive Computing) stopped accepting and integrating changes into Maui, effectively halting any further development. As of 2018, Torque is no longer open source. Adaptive Computing is instead directing people toward its commercial scheduler, Moab. Nevertheless, there is still a small but loyal following for Maui and Torque.


Moab started as the commercial version of Maui, and the two products were very similar up until the early 2010s. Adaptive Computing then started expanding its market share beyond academic and research markets and into the commercial space. As part of that change in marketing, development on Moab picked up considerably. Even though the base product still shares some similarity with the final free version of Maui, it is considerably more advanced.


I'll say again, currently there's no good reason not to use Slurm. Slurm is a full-featured job manager with unique features that were developed directly from user input. From an administrative point of view, it's a dream to work with. From a user perspective, it's easy to use, easy to migrate to from other job managers, and it makes jobs easy to monitor and debug. Slurm has an active user community that contributes to the rapidly evolving code base. The code is modular, so if you're not happy with the way some portion of the code works, you can replace it or extend it easily. Slurm is probably the fastest-growing job manager today.
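To give a feel for why migration tends to be painless, here is a minimal sketch that rewrites a few common PBS-style batch directives into their usual Slurm equivalents. The helper name translate_directive and the small mapping table are hypothetical and cover only a handful of options; treat it as an illustration of how closely the concepts line up, not as a complete conversion tool.

```python
# Hypothetical helper: maps a few common "#PBS" directives to the
# "#SBATCH" options that are normally used for the same purpose.
PBS_TO_SLURM = {
    "-N": "--job-name",   # job name
    "-q": "--partition",  # PBS queue roughly corresponds to a Slurm partition
    "-o": "--output",     # stdout file
    "-e": "--error",      # stderr file
}


def translate_directive(line):
    """Return the Slurm form of a PBS directive, or the line unchanged."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "#PBS":
        return line  # not a directive this sketch understands
    flag, value = parts[1], parts[2]
    if flag == "-l" and value.startswith("walltime="):
        return "#SBATCH --time=" + value[len("walltime="):]
    if flag in PBS_TO_SLURM:
        return "#SBATCH {}={}".format(PBS_TO_SLURM[flag], value)
    return line  # anything else is left for a human to convert


pbs_script = [
    "#!/bin/bash",
    "#PBS -N myjob",
    "#PBS -q batch",
    "#PBS -l walltime=01:00:00",
    "./run_simulation",
]
slurm_script = [translate_directive(line) for line in pbs_script]
print("\n".join(slurm_script))
```

Resource requests beyond walltime (node counts, memory, and so on) differ more between the two systems and are deliberately left untouched here.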


Of all the job managers listed here, Grid Engine is probably the least similar to the original PBS in its structure, using a "ticket" system to assign weights to jobs, and then determining execution order based on highest weight. Originally developed in Germany shortly after the original PBS, it has a haphazard history of open source and commercial development, with owners such as Sun Microsystems, Oracle, and (currently) Univa Corporation. Although interest in Grid Engine waned considerably after the demise of Sun Microsystems, there is still a loyal base of users, and Grid Engine (of one flavor or another) has had some high-profile installations.
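The idea of ordering jobs by weight can be sketched in a few lines. This is only an illustration of "highest weight runs first"; the Job class, the weight values, and the execution_order function are all made up for the example, and how Grid Engine actually computes a job's weight is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    weight: float  # hypothetical combined weight assigned to the job


def execution_order(jobs):
    """Sort pending jobs so that the highest weight comes first.

    sorted() is stable, so jobs with equal weight keep their
    original (submission) order.
    """
    return sorted(jobs, key=lambda job: job.weight, reverse=True)


pending = [
    Job("align", 120.0),
    Job("render", 300.0),
    Job("backup", 45.0),
]
order = [job.name for job in execution_order(pending)]
print(order)
```

With these made-up weights, "render" is dispatched first, then "align", then "backup".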


Platform LSF (Load Sharing Facility) is a commercial job manager that was acquired by IBM in 2012 and renamed IBM Spectrum LSF. LSF has been around for a long time, but has struggled to keep up with modern advancements in job schedulers. In my opinion, LSF is still an overpriced, cumbersome, and temperamental job manager.


There are many other job managers, some of which are directed at specific markets. If you're in the market for a job manager, Slurm should be your first choice for evaluation. The other job managers listed above are also viable candidates (LSF excepted, of course).