2018 Summer Workshop

West Virginia University

July 16-19, 2018

9:00 am - 4:00 pm

Instructor: Guillermo Avendano-Franco

This workshop focuses on the skills needed to use a High-Performance Computing (HPC) cluster effectively, in particular the WVU clusters "Mountaineer" and "Spruce Knob" and the forthcoming cluster "Thorny Flat", to be deployed in the fall.

This Webpage: https://bit.ly/2NgRBoS
The streaming video: Zoom

General Information

The WVU Research Computing team helps researchers optimize their interaction with our High-Performance Computing clusters. This workshop teaches the basic skills needed to work effectively with HPC clusters. It covers basic concepts and tools, including accessing the cluster, submitting jobs, containers, version control, data management, and parallel programming. Participants will be encouraged to help one another and to apply what they have learned to their own research problems. This workshop is not aimed at HPC system administrators or advanced scientific software developers. The concepts are presented to offer a pragmatic overview of several topics, with knowledge that can be transferred directly to current research projects.

For a practical overview of scientific computing best practices, we recommend the document "Best Practices for Scientific Computing".

Who: The course is aimed at WVU graduate students and researchers, as well as other West Virginia students and researchers who can get access to WVU resources via collaborations with WVU researchers. External attendees can also benefit from these lectures, especially if they have access to their own HPC cluster or can reproduce the exercises on their own machines. Some familiarity with Unix/Linux and basic programming is recommended. The first lesson, however, lowers this requirement for new users willing to get access and explore the advantages that High-Performance Computing (HPC) infrastructure can offer to boost their research.

Where: One Waterfront Place. Get directions with OpenStreetMap or Google Maps.

When: July 16-19, 2018. Add to your Google Calendar.

Requirements: Most of the examples and exercises will be executed on Spruce Knob, and access will be provided for the workshop. The only requirement for desktop computers and personal laptops is the ability to connect to the cluster via SSH. Both Linux and macOS offer that natively. Windows users can download and install PuTTY for a basic SSH client or MobaXterm for an SSH client with an X11 server.

Contact: Please email helpdesk@hpc.wvu.edu for more information.



Schedule

Day 1 (Using an HPC cluster)

09:00 Getting Access to the cluster
10:00 Basics of Command Line Interface
11:00 Text Editors (vi, emacs and nano)
11:30 Environment Modules
12:00 Lunch break
13:00 Job Submission (Torque/Moab)
14:00 Version Control (Git)
14:30 Transferring Files
15:00 Software Containers (Singularity)
16:00 END

Day 2 (Languages for Scientific Computing)

09:00 Sieve of Eratosthenes
10:00 Interpreted Languages (Python and R)
11:00 C/C++: Traditional computing
11:30 Fortran: Intensive numerics
12:00 Lunch break
13:00 Julia: The best of both worlds
14:00 Cython: Accelerated Python
15:00 R: Accelerated with compiled code
16:00 END

Day 3 (Data Processing)

09:00 Processing Text Files with grep and awk
10:00 Using Regular Expressions with Python
10:30 Structured Text (XML and JSON)
11:30 Binary Formats (NetCDF and HDF5)
12:00 Lunch break
13:00 Creating simple Databases with SQLite
14:00 NoSQL databases with MongoDB
15:00 Machine Learning: Scikit-Learn, Keras, and TensorFlow
16:00 END

Day 4 (Parallel Programming)

09:00 Introduction to Parallel Computing
10:00 Embarrassingly Parallel Jobs
11:00 Multithreading (OpenMP)
12:00 Lunch break
14:00 Multiprocessing (MPI)
15:00 HPC Accelerators (CUDA)
16:00 END

Syllabus

Using an HPC cluster

  • Accessing an HPC Cluster
  • Files, Folders, and Symbolic Links
  • Pipes and Redirection
  • Bash Scripting
  • Job Submission and Control
  • Job Prologue and Epilogue. Using Job Arrays
  • Singularity Containers

Languages for Scientific Computing

  • R
  • Python
  • Julia
  • Java
  • C/C++
  • Fortran
  • Assembler

Data Processing

  • Text files
  • Compressed Files
  • XML and JSON
  • NetCDF and HDF5
  • SQL (SQLite)
  • NoSQL (MongoDB)
  • Machine Learning (Scikit-Learn, Keras and TensorFlow)

Parallel Programming

  • Embarrassingly Parallel Jobs (GNU parallel)
  • Multithreading (OpenMP)
  • Multiprocessing (MPI)
  • HPC Accelerators (CUDA)