Introduction to High Performance Computing

West Virginia University

July 24, 26, 31, and August 2, 2024

9:00 - 16:00

Instructor: Guillermo Avendano-Franco

Helpers: Daniel Turpen, Irene Nelson, Jared Frick

General Information

WVU Research Computing is a team within the WVU Research Office whose mission is to promote High-Performance Computing (HPC) in research across all domains in which WVU is active. WVU Research Computing manages several research infrastructures, including three HPC clusters used by faculty, students, and academic staff across campus.

Our annual summer workshop teaches our users the basic concepts and skills they need to get the most out of our HPC facilities.

For the 2024 edition of the summer workshop, we have divided the topics into a main morning workshop and two afternoon mini-workshops. These workshops are independent; you can choose which ones to attend according to your interests.

Morning Sessions: In the main workshop, we will cover the basic skills that every user needs to work effectively on an HPC cluster. We start by explaining the main concepts of supercomputing, the components of an HPC cluster, and other concepts needed to follow the rest of the material.

Most new users are unfamiliar with Unix/Linux operating systems, so we introduce terminals, command-line interfaces, and text editors. From that foundation, we continue with essential topics specific to HPC, such as how to access the different software packages installed on the cluster and how to use the resource manager (Slurm), the tool that allows you to run jobs on one or several computers in the cluster. We also introduce tmux, a terminal multiplexer that is a convenient tool for serious work on HPC clusters.

We conclude on the fourth day of the workshop with a quick introduction to Machine Learning. Artificial Intelligence, and more specifically Machine Learning, offers computational techniques that are now widely used to solve challenging problems in all areas of science. Because these methods are not restricted to a single discipline, they also illustrate how an HPC cluster can become a powerful research tool once you acquire the skills to use it effectively.

Python Scripting for HPC: The first mini-workshop will explore Python as a powerful scripting language. We introduce the basics of the language, focusing on the situations users commonly face when working with HPC clusters. Python is also a full-featured programming language. We mention some elements in that direction, but in general the programs we use for demonstration will remain small and targeted to simple use cases.

Partial Differential Equations: The second mini-workshop will explore how HPC systems can be used to compute approximate solutions to the equations governing fluid dynamics, general relativity, and electromagnetism, using ANSYS Fluent and the Einstein Toolkit. The focus will be on using the cluster to carry out scientific investigations. No prior knowledge of these physical principles or of the underlying mathematics is required; however, basic coding skills are recommended.


Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Online via Zoom.

When: July 24, 26, 31, and August 2, 2024.

Requirements: Participants must have a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below).

Accessibility: We are committed to making this workshop accessible to everybody.

Materials will be provided in advance of the workshop, and large-print handouts are available on request if you notify the organizers in advance. If there is anything we can do to make learning easier for you (e.g., sign-language interpreters or lactation facilities), please get in touch using the contact details below and we will attempt to provide it.

Contact: Please email gufranco@mail.wvu.edu, djturpen@mail.wvu.edu, pnelson@mail.wvu.edu, or jared.frick@mail.wvu.edu for more information.

Roles: To learn more about the roles at the workshop (who will be doing what), refer to our Workshop FAQ.


Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.


Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey


Schedule

Day 1 (Morning)

09:00 Introduction to Supercomputing
11:00 Command Line Interface
12:00 Adjourn

Day 1 (Afternoon)

14:00 Python Programming 1
15:00 Python Programming 1
16:00 Adjourn

Day 2 (Afternoon)

14:00 Python Programming 2
15:00 Python Programming 2
16:00 End of mini-workshop

Day 3 (Morning)

09:00 Workload Management (Slurm)
11:00 Terminal Multiplexing (tmux)
12:00 Adjourn

Day 3 (Afternoon)

14:00 Partial Differential Equations (CFD)
16:00 Adjourn

Day 4 (Afternoon)

14:00 Partial Differential Equations (Relativistic)
16:00 End of mini-workshop

Setup

To participate in a Research Computing workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is intended as a reference for instructors, but you may find it useful as well.

Accessing WVU's HPC Clusters

Today, there are two ways to access WVU's HPC clusters: through a secure remote shell (SSH) or through a web browser using the Open OnDemand web application. A secure remote shell is the traditional way of accessing a High-Performance Computing (HPC) cluster. A remote shell allows you to execute commands on another machine as if you were sitting in front of it. It is also convenient because each session consumes very few resources, so many users can work on the same machine simultaneously.

All you need on your computer is a terminal emulator and an SSH client. A terminal emulator is a program that mimics the behavior of the "dumb" terminals of a few decades ago. An SSH client is a program on your computer that allows you to connect to an SSH server running on another computer. In the 1980s, people accessed remote shells using Telnet. This method is obsolete today because all information travels unencrypted and can easily be intercepted and modified. SSH provides a secure channel over an unsecured network such as the Internet. It offers capabilities similar to Telnet but adds encryption, so that only your computer and the remote host can read the data exchanged between them. To learn more about Secure Shell, see its Wikipedia article.

As of 2024, WVU has two HPC clusters: Thorny Flat, for general HPC applications, and Dolly Sods, a specialized cluster for GPU-intensive processing. Both can be accessed using SSH.

Regardless of which operating system you use, there is always a way to get the terminal emulator and SSH client you need to connect to an HPC cluster. Both Linux and macOS include an SSH client by default, so you only have to open a terminal. On macOS, the Terminal application is located in the Utilities folder inside your Applications folder. The terminal is so central on Linux that most distributions place an icon for it directly on the desktop or taskbar for easy access.

    On Windows machines, you need to install an external application. One option is a free application called PuTTY, which offers a simple SSH client that is enough for this lesson. Another option is MobaXterm, which offers a full-featured SSH client plus the ability to open X11 windows from the remote machine.

    Using PuTTY: [PuTTY screenshots]

    Using MobaXterm: [MobaXterm screenshots]

    More recently, Microsoft products have also become an option for getting a terminal and an SSH client. One option is to install Visual Studio Code. Instructions for adding an SSH client to VS Code can be found here and here. Visual Studio Code gives you not only remote terminal access but also a remote file manager.

    Another alternative is the Windows Subsystem for Linux (WSL). WSL allows you to run a full Linux OS inside Windows, with all the commands you would find on a typical Linux machine. With WSL 2 on recent versions of Windows, you can even run Linux GUI applications alongside other Windows applications, and you can access the files on your Windows machine from inside WSL. Running a complete Linux OS inside Windows gives you access not only to an SSH client and a terminal but also to the commands you will learn during this lesson, just as on a native Linux machine. Instructions for installing and configuring WSL can be found here.

On macOS, the Terminal application is located in the "Utilities" folder inside "Applications".

An SSH client is also included with the OS by default.

[Screenshot: Terminal on macOS]

Follow the steps below to connect to the gateway ssh.wvu.edu and from there to Thorny Flat or Dolly Sods.

On virtually any Linux distribution, the terminal is a central part of the operating system, even for desktop use. An SSH client is also part of the default installation, so there is usually no need to install anything.

[Screenshot: Terminal on Linux]

Follow the steps below to connect to the gateway ssh.wvu.edu and to the HPC cluster.

Connecting to Thorny Flat and Dolly Sods

Connecting to either Thorny Flat or Dolly Sods is a two-step process. First, use your SSH client to connect to ssh.wvu.edu like this:

ssh <username>@ssh.wvu.edu

Complete the Duo authentication, and you will get a shell on the SSH gateway. Then execute one of the following commands to access the respective cluster.

For Thorny Flat:

ssh tf.hpc.wvu.edu

For Dolly Sods:

ssh ds.hpc.wvu.edu
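If you connect often from macOS, Linux, or WSL, the two-step login can be collapsed into a single command with OpenSSH's ProxyJump option (available in OpenSSH 7.3 and newer). The fragment below is a sketch for your ~/.ssh/config file, not an officially documented WVU setup. The host aliases wvu-gateway, thorny, and dolly are hypothetical names chosen for this example, and the sketch assumes that the gateway permits jump-host forwarding and that your username is the same on the gateway and on the clusters. If either assumption does not hold, use the two-step procedure above.

```
# Hypothetical ~/.ssh/config entries (requires OpenSSH 7.3 or newer).
# Replace <username> with your WVU username in every entry.
# Assumption: ssh.wvu.edu allows ProxyJump forwarding to the clusters.

# The gateway: this is where the Duo authentication happens.
Host wvu-gateway
    HostName ssh.wvu.edu
    User <username>

# Thorny Flat, reached through the gateway.
Host thorny
    HostName tf.hpc.wvu.edu
    User <username>
    ProxyJump wvu-gateway

# Dolly Sods, reached through the gateway.
Host dolly
    HostName ds.hpc.wvu.edu
    User <username>
    ProxyJump wvu-gateway
```

With these entries in place, ssh thorny (or ssh dolly) should open a shell on the cluster after you authenticate; you may be asked to authenticate on both the gateway and the cluster. The same jump can be requested once, without a config file, as ssh -J <username>@ssh.wvu.edu <username>@tf.hpc.wvu.edu.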

Once you are logged in, you can start typing commands. You can open several connections simultaneously. Each connection is independent of the others, and you must authenticate on each new terminal. In one of our lectures, we will learn about a terminal multiplexer, a convenient tool for keeping multiple terminals open that saves time when working with HPC clusters.