Working with System Administrators

By request, I'm adding this section on how to work with your HPC System Administrators. As I've written about before, working with scientists can be a challenge, and requires a particular type of patience. If you are a scientist, working with your support staff, such as Help Desk personnel and System Administrators, also requires a certain amount of delicacy and appreciation. Don't underestimate the role that your support staff plays in advancing science.

Part of understanding the role of System Administrators is understanding the environment they work in. The IT department in your organization likely has a staff of people who handle desktop support (your first-line Help Desk), your networking, email, monitoring, DNS, DHCP, and lots of other acronyms you may not even have heard of. A computing cluster is like a microcosm of an IT department. It typically has its own dedicated Help Desk, email, DNS, DHCP, etc., and it's all handled by a very small number of people. In fact, it's not uncommon for a single person to handle all the responsibility alone. That's a lot of knowledge to master, and a lot of knobs and switches to be responsible for, every day.

Speaking of days, your System Administrator is usually responsible for their job duties 24x7, 365 days per year. When something breaks, they regularly work until it's fixed. The Help Desk goes home at the end of the day. Your System Administrator may not have that luxury. Because these systems represent a very large investment for your organization, there is tremendous pressure to have them running at 100% all the time.

The job of your System Administrator is to be the local expert on HPC, and that also takes a lot of time. Leverage that expertise. Ask your System Administrator how to solve problems. They should, and usually do, have a lot of knowledge of the latest technologies. Just because your colleagues are using a particular GPU doesn't mean that a better model isn't available now, or that FPGAs might not be a better solution to your particular problem. Approach your System Administrator with problems, not predetermined solutions.

Help your System Administrator help you. If there's an advisory panel available, either volunteer to be a member, or send a staff member who will take it seriously. If you add up the time required for a meeting in terms of the salaries and lost productivity of everyone there, meetings are exceptionally expensive, so make that time count. Use it to explore new technologies that you may have heard about, and discuss how they might be integrated into your resource. This is a good opportunity to educate yourself on how the cluster is performing, problems that are being dealt with, emerging technologies, and future plans for expansion.