Initial Software

There's typically a fairly large software stack required to run a cluster. We need a job manager to schedule and manage jobs, some extra software for general cluster management, libraries, and assorted applications. The specific software you use will depend on your personal preferences and on the needs of your cluster's users. For starters, here are some software packages that we will use:

  • pdsh: a parallel distributed shell for running commands in parallel across multiple nodes
  • Modules: a system for managing software in user environments
  • SLURM: the job scheduler for our cluster
  • munge: a fast authentication module for compute nodes (required for SLURM)
  • Open MPI: an implementation of the Message Passing Interface (MPI) used by nearly all parallel programs

Before we can start building software, we have to decide where to install our software. We can install all the software on a common shared filesystem, which makes management and maintenance a lot easier, and ensures that every node is running the same software stack. However, with a shared file system, when the file server goes down, every node in the cluster loses all its software and libraries.

The alternative is to install the software on each individual node, but then we would have to update every node whenever we updated a package or installed new software. This quickly becomes unwieldy as the number of nodes grows. The advantage of this approach is that we're not dependent on a file server to run jobs. If your software is installed locally and your file server goes away, every job in your cluster keeps running. It also means you don't need to set up a file server, or an NFS shared file system. This may be easier in the short term if you just want to get something up and running quickly, but keep in mind that it will be a lot more work further down the road.

Using a shared file system is still the preferred way of managing the shared software, and I recommend you use this approach.

For starters, let's set up an NFS file system that our compute nodes can mount to access the common software. Since we're trying to build a "pure Pi" cluster, I'll create a new directory on our head node and export it using NFS. Unfortunately there are no real standards for naming this directory. Typically, /opt is used for optional applications, but sometimes a commercial (licensed) package is installed under that path and not suitable for export to other nodes. Traditionally, /usr/local was used for local software installations, but since it's intended for software and data local to the node, we'll avoid that as well. The Modules package will manage the software paths for us anyway, so users don't need to remember where software is installed. Just to make sure we don't use a path that's already claimed by someone or something else, let's use a mostly arbitrary name that still conveys the purpose of the file system, and call it /apps.
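To see where this is heading, one common convention is a versioned directory per package under the shared file system, which lets old and new versions coexist and gives Modules one directory per version to point user environments at. This layout is my own convention, not something any installer enforces; to keep the sketch runnable anywhere it builds the tree under a scratch prefix, but on the real cluster the prefix would be /apps itself.

```shell
# Sketch only: on the cluster, APPS would simply be /apps.
APPS=$(mktemp -d)

# One directory per package per version (e.g. /apps/openmpi/3.1.2),
# so several versions can be installed side by side.
mkdir -p "$APPS"/pdsh/2.33 "$APPS"/modules/4.2.0 \
         "$APPS"/munge/0.5.13 "$APPS"/slurm/19.05 \
         "$APPS"/openmpi/3.1.2

ls "$APPS"
```

With this layout, installing a newer Open MPI later just adds another version directory alongside the old one, and Modules can switch a user's environment between them.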

Our JeOS install doesn't include NFS services by default, so let's start by installing that. From the command line, we can install the NFS server and related software like this:

baker:~ # zypper install nfs-kernel-server
Retrieving repository 'openSUSE-Ports-Leap-15.0-Update' metadata .........[done]
Building repository 'openSUSE-Ports-Leap-15.0-Update' cache ..............[done]
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 6 NEW packages are going to be installed:
  keyutils nfs-client nfsidmap nfs-kernel-server rpcbind system-user-nobody

6 new packages to install.
Overall download size: 592.1 KiB. Already cached: 0 B. After the operation,
additional 2.2 MiB will be used.
Continue? [y/n/...? shows all options] (y): y
Retrieving package keyutils-1.5.10-lp150.3.3.aarch64
                                           (1/6),  83.5 KiB (256.8 KiB unpacked)
Retrieving: keyutils-1.5.10-lp150.3.3.aarch64.rpm ........................[done]
Retrieving package nfsidmap-0.26-lp150.1.3.aarch64
                                           (2/6),  42.2 KiB (275.4 KiB unpacked)
Retrieving: nfsidmap-0.26-lp150.1.3.aarch64.rpm ............[done (161.9 KiB/s)]
Retrieving package system-user-nobody-20170617-lp150.3.4.noarch
                                           (3/6),  10.4 KiB (  116   B unpacked)
Retrieving: system-user-nobody-20170617-lp150.3.4.noarch.rpm .............[done]
Retrieving package rpcbind-0.2.3-lp150.2.2.aarch64
                                           (4/6),  66.9 KiB (216.6 KiB unpacked)
Retrieving: rpcbind-0.2.3-lp150.2.2.aarch64.rpm ..........................[done]
Retrieving package nfs-client-2.1.1-lp150.4.3.1.aarch64
                                           (5/6), 260.5 KiB (  1.1 MiB unpacked)
Retrieving: nfs-client-2.1.1-lp150.4.3.1.aarch64.rpm ........[done (45.6 KiB/s)]
Retrieving package nfs-kernel-server-2.1.1-lp150.4.3.1.aarch64
                                           (6/6), 128.5 KiB (359.7 KiB unpacked)
Retrieving: nfs-kernel-server-2.1.1-lp150.4.3.1.aarch64.rpm ..............[done]
Checking for file conflicts: .............................................[done]
(1/6) Installing: keyutils-1.5.10-lp150.3.3.aarch64 ......................[done]
(2/6) Installing: nfsidmap-0.26-lp150.1.3.aarch64 ........................[done]
(3/6) Installing: system-user-nobody-20170617-lp150.3.4.noarch ...........[done]
Additional rpm output:
groupadd -r -g 65533 nogroup
groupadd -r -g 65534 nobody
useradd -r -s /sbin/nologin -c "nobody" -g nobody -d /var/lib/nobody -u 65534 nobody
usermod -a -G nogroup nobody

(4/6) Installing: rpcbind-0.2.3-lp150.2.2.aarch64 ........................[done]
Additional rpm output:
Updating /etc/sysconfig/rpcbind ...

(5/6) Installing: nfs-client-2.1.1-lp150.4.3.1.aarch64 ...................[done]
Additional rpm output:
Updating /etc/sysconfig/nfs ...
setting /sbin/mount.nfs to root:root 4755. (wrong permissions 0755)

(6/6) Installing: nfs-kernel-server-2.1.1-lp150.4.3.1.aarch64 ............[done]
baker:~ #

You'll note that installing the NFS server also installed the NFS client. We'll need the client later when we move the NFS file system to a different server. We'll leave the NFS server software on the head node for now so we can build a "pure Pi" cluster, but in the long term, we don't want to add load to the head node by running a file server on it.
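Looking ahead, each compute node will use that NFS client to mount the shared file system, typically with a one-line /etc/fstab entry like the sketch below. The head node name baker comes from our prompt, and the mount options are deliberately left at their defaults for now; we'll revisit tuning later.

```
# /etc/fstab on a compute node: mount the head node's /apps export at boot
baker:/apps   /apps   nfs   defaults   0 0
```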

Now let's create the file system for sharing our software and export it so we can access it remotely. Initially, we're going to be a bit lazy in how we set this up and ignore some security and performance settings. Right now, we just want to get things up and running, and we'll come back and clean things up a bit later.

baker:~ # mkdir /apps
baker:~ # echo "/apps 192.168.0.0/24(rw,root_squash,no_subtree_check)" >> /etc/exports
baker:~ # systemctl enable nfs-server
Created symlink /etc/systemd/system/ → /usr/lib/systemd/system/nfs-server.service.
baker:~ # systemctl start nfs-server
baker:~ # showmount -e
Export list for baker:
/apps 192.168.0.0/24
baker:~ #

What we've done above is create a directory to export and add it to the NFS exports list. We export the file system to any machine on the subnet, so any host with an address of 192.168.0.x will be able to see and mount the file system. This gives us a bit (but only a bit) of security, since this is a non-routable address range and we can presumably control which hosts connect to our network. We also export with the root_squash option explicitly set, so no files will appear to be owned by root and no host will be able to write files to the file system as root. Again, just a bit of security. After that, we set the NFS server to start on reboot, and then start it. The final command, showmount -e, shows all file systems exported by a particular server, in this case the server we're logged into. We can see that /apps is exported, so everything looks good so far.
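For reference, each line in /etc/exports is a directory followed by one or more client specifications, each with its own option list. Spelled out, an export like ours looks like the fragment below; the rw option is my assumption for a usable shared file system, and the rest match what was described above.

```
# /etc/exports format: <directory> <client>(<options>)
#   192.168.0.0/24    only hosts on our private subnet may mount it
#   rw                clients may read and write (assumed here)
#   root_squash       remote root is mapped to an unprivileged user
#   no_subtree_check  skip per-file subtree checks on the server
/apps   192.168.0.0/24(rw,root_squash,no_subtree_check)
```

If you edit /etc/exports while the NFS server is already running, exportfs -ra tells it to re-read the file without a restart.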

We have one more task before we can start building software. Up to this point, we've been performing all our configuration as the root user. Now we're going to create a regular, non-privileged user to build the software. It's good general practice to run as root as little as possible to prevent catastrophic accidents, and it will also protect us a bit from potentially harmful software.

Let's create an account called admin to download and build our software. We'll also create a subdirectory under /apps to hold our source code and the build trees.

baker:~ # useradd -m admin
Group 'mail' not found. Creating the user mailbox file with 0600 mode.
baker:~ # ls -a ~admin
.   .bash_history  bin      .emacs  .inputrc  .profile
..  .bashrc        .config  .fonts  .local
baker:~ # mkdir /apps/source
baker:~ # chown admin /apps/source
baker:~ #

The -m flag to the useradd command creates a home directory for the user. You can see that the system sets up some default files for us when it creates the home directory. Now we can set a password and log in as the admin user.

baker:~ # passwd admin
New password: 
Retype new password: 
passwd: password updated successfully
baker:~ # logout
Connection to closed.
[227]shikoku::gpike> ssh -l admin

Now we can start downloading and building software. Software typically develops fairly quickly, so the version numbers may change from what I download. You should usually grab the latest version of any software package you're building unless there's a particular reason you need an older version. As your cluster develops over time, you'll need to upgrade software regularly, but you'll usually need to keep older versions around as well, for software that depends on them. This is where the Modules software comes in handy for managing the different versions. I'm going to start by downloading all the software packages I listed at the top of this page.

admin@baker:~> cd /apps/source
admin@baker:/apps/source> wget
--2018-10-21 03:43:00--
Resolving (,
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: [following]
--2018-10-21 03:43:02--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 588869 (575K) [application/octet-stream]
Saving to: ‘pdsh-2.33.tar.gz’

pdsh-2.33.tar.gz    100%[===================>] 575.07K  2.41MB/s    in 0.2s    

2018-10-21 03:43:04 (2.41 MB/s) - ‘pdsh-2.33.tar.gz’ saved [588869/588869]

admin@baker:/apps/source> wget
--2018-10-21 03:45:56--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: [following]
--2018-10-21 03:45:58--
Resolving (, 2607:f748:10:12::5f:2
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1477102 (1.4M) [application/x-gzip]
Saving to: ‘modules-4.2.0.tar.gz’

modules-4.2.0.tar.g 100%[===================>]   1.41M  2.68MB/s    in 0.5s    

2018-10-21 03:46:01 (2.68 MB/s) - ‘modules-4.2.0.tar.gz’ saved [1477102/1477102]

admin@baker:/apps/source> git clone
Cloning into 'slurm'...
remote: Enumerating objects: 153, done.
remote: Counting objects: 100% (153/153), done.
remote: Compressing objects: 100% (100/100), done.
remote: Total 471286 (delta 77), reused 84 (delta 53), pack-reused 471133
Receiving objects: 100% (471286/471286), 196.16 MiB | 2.64 MiB/s, done.
Resolving deltas: 100% (372149/372149), done.
Checking out files: 100% (2505/2505), done.
admin@baker:/apps/source> wget
--2018-10-21 04:04:11--
Resolving (,
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: [following]
--2018-10-21 04:04:13--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 389952 (381K) [application/octet-stream]
Saving to: ‘munge-0.5.13.tar.xz’

munge-0.5.13.tar.xz 100%[===================>] 380.81K  1.72MB/s    in 0.2s    

2018-10-21 04:04:22 (1.72 MB/s) - ‘munge-0.5.13.tar.xz’ saved [389952/389952]

admin@baker:/apps/source> wget
--2018-10-21 04:06:34--
Resolving (,,, ...
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9461841 (9.0M) [binary/octet-stream]
Saving to: ‘openmpi-3.1.2.tar.bz2’

openmpi-3.1.2.tar.b 100%[===================>]   9.02M  6.31MB/s    in 1.4s    

2018-10-21 04:06:38 (6.31 MB/s) - ‘openmpi-3.1.2.tar.bz2’ saved [9461841/9461841]

admin@baker:/apps/source> ls -aCF
./   modules-4.2.0.tar.gz  openmpi-3.1.2.tar.bz2  slurm/
../  munge-0.5.13.tar.xz   pdsh-2.33.tar.gz

That's a lot of output for downloading a few packages. I've listed the contents of the directory as the last step, and there are a few things to note. The first is that our SLURM source code has no version number. This is common for source code cloned from GitHub: when you clone a source tree, you get the latest code available at the time, and since the code is continually updated, there's no single version number to track. The second thing to note is that the rest of our source code is compressed with various compression tools. Fortunately, even the minimal software image we used includes these, so we're okay.

Before we begin, let's see if we can figure out which version of SLURM we're running. You can usually find the base version number in the release notes, README file, or sometimes in a special file called .version in the source code directory. A quick look at the top of the RELEASE_NOTES file shows that the base version is 19.05. Let's rename the SLURM directory to reflect the version number.

admin@baker:/apps/source> head slurm/RELEASE_NOTES 
16 August 2018

If using the slurmdbd (Slurm DataBase Daemon) you must update this first.

NOTE: If using a backup DBD you must start the primary first to do any
database conversion, the backup will not start until this has happened.

The 19.05 slurmdbd will work with Slurm daemons of version 17.11 and above.
admin@baker:/apps/source> mv slurm slurm-19.05

For consistency, let's make an archive of the SLURM source code as it is, so we can recreate a fresh copy if needed. That way, we'll have a fresh, unconfigured version of every software package, as well as a source code tree where we will configure and build the software. Keep in mind that our version of SLURM is very likely a little newer than version 19.05, though, since we cloned it from the development repository and it probably has patches beyond the "official" version 19.05 release.

admin@baker:/apps/source> tar -czf slurm-19.05.tar.gz slurm-19.05
admin@baker:/apps/source> ls -l
total 229304
-rw-r--r--  1 admin users   1477102 Oct 18 04:59 modules-4.2.0.tar.gz
-rw-r--r--  1 admin users    389952 Sep 26  2017 munge-0.5.13.tar.xz
-rw-r--r--  1 admin users   9461841 Aug 22 15:17 openmpi-3.1.2.tar.bz2
-rw-r--r--  1 admin users    588869 Jun 29  2017 pdsh-2.33.tar.gz
drwxr-xr-x 10 admin users      4096 Oct 21 04:01 slurm-19.05
-rw-r--r--  1 admin users 222856199 Oct 21 04:24 slurm-19.05.tar.gz
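Before trusting an archive like this as your pristine copy, it's worth checking that it lists cleanly: tar -tzf prints an archive's contents without extracting anything, and a read error or truncated file list would mean the archive is damaged. A throwaway sketch, using a mock directory rather than the real SLURM tree:

```shell
# Build a small stand-in archive and list it without unpacking.
work=$(mktemp -d)
cd "$work"
mkdir -p slurm-demo
echo "notes" > slurm-demo/RELEASE_NOTES
tar -czf slurm-demo.tar.gz slurm-demo

# -t lists, -z decompresses gzip, -f names the archive.
tar -tzf slurm-demo.tar.gz
```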

Now let's unpack all the other software distributions so we can start configuring and building them.

admin@baker:/apps/source> tar -xzf modules-4.2.0.tar.gz 
admin@baker:/apps/source> xzcat munge-0.5.13.tar.xz | tar -xf -
admin@baker:/apps/source> bzcat openmpi-3.1.2.tar.bz2 | tar -xf -
admin@baker:/apps/source> tar -xzf pdsh-2.33.tar.gz 
admin@baker:/apps/source> ls -l
total 230504
drwxr-xr-x  8 admin users      4096 Oct 18 04:23 modules-4.2.0
-rw-r--r--  1 admin users   1477102 Oct 18 04:59 modules-4.2.0.tar.gz
drwxr-xr-x  6 admin users      4096 Sep 26  2017 munge-0.5.13
-rw-r--r--  1 admin users    389952 Sep 26  2017 munge-0.5.13.tar.xz
drwxr-xr-x 10 admin users      4096 Aug 22 15:02 openmpi-3.1.2
-rw-r--r--  1 admin users   9461841 Aug 22 15:17 openmpi-3.1.2.tar.bz2
drwxr-xr-x  7 admin users      4096 Jun 29  2017 pdsh-2.33
-rw-r--r--  1 admin users    588869 Jun 29  2017 pdsh-2.33.tar.gz
drwxr-xr-x 10 admin users      4096 Oct 21 04:01 slurm-19.05
-rw-r--r--  1 admin users 222856199 Oct 21 04:24 slurm-19.05.tar.gz
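Incidentally, the explicit xzcat and bzcat pipelines above aren't strictly required: any reasonably recent GNU tar sniffs the compression format when extracting, so a plain tar -xf handles .tar.gz, .tar.xz, and .tar.bz2 alike. A small self-contained demonstration with a throwaway archive, not the real source tarballs:

```shell
# Make a gzip-compressed archive, then extract it without any
# compression flag: GNU tar detects the format on its own.
demo=$(mktemp -d)
cd "$demo"
mkdir pkg
echo "hello" > pkg/README
tar -czf pkg.tar.gz pkg      # create a gzip'd archive
rm -r pkg                    # remove the original tree
tar -xf pkg.tar.gz           # no -z needed on extraction
cat pkg/README               # prints "hello"
```

The single-command form is a convenience; the pipelines shown earlier work on any tar and make the decompression step explicit.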

That's exactly how I want the source directory to look. Let's move on and start building the software.