Transfer files in and out of the cluster

Overview

Teaching: 20 min
Exercises: 10 min
Questions
  • How do I send files in and out of the cluster?

Objectives
  • Transfer files using Globus Online, sftp, and scp

Overview

Now that you have access to the HPC system, you need to be able to transfer files and data to and from it. Each HPC system you encounter will offer its own ways to transfer files, but we will focus on a few of the core methods.

Globus Online

Globus Online is the preferred method for transferring files to, from, and between Research Computing resources. WVU is also a Globus subscriber, which provides additional services over the basic/free subscription. Globus Online offers the following advantages over traditional transfer methods (e.g. scp, sftp, rsync):

WVU’s Globus Subscription adds the following features:

Globus Online Account Setup

Globus Online Demo

Transferring files interactively with sftp

SFTP (the SSH File Transfer Protocol, often called Secure File Transfer Protocol) uses SSH to transfer files through an FTP-style interface. It is an interactive way of downloading and uploading files. Let’s connect to a cluster using sftp. You’ll notice that connecting works the same way as with SSH:

sftp yourUsername@remote.computer.address

This will start what appears to be a bash shell (though our prompt says sftp>). However, we only have access to a limited number of commands. We can see which commands are available with help:

sftp> help
Available commands:
bye                                Quit sftp
cd path                            Change remote directory to 'path'
chgrp grp path                     Change group of file 'path' to 'grp'
chmod mode path                    Change permissions of file 'path' to 'mode'
chown own path                     Change owner of file 'path' to 'own'
df [-hi] [path]                    Display statistics for current directory or
                                   filesystem containing 'path'
exit                               Quit sftp
get [-afPpRr] remote [local]       Download file
reget [-fPpRr] remote [local]      Resume download file
reput [-fPpRr] [local] remote      Resume upload file
help                               Display this help text
lcd path                           Change local directory to 'path'
lls [ls-options [path]]            Display local directory listing
lmkdir path                        Create local directory
ln [-s] oldpath newpath            Link remote file (-s for symlink)
lpwd                               Print local working directory
ls [-1afhlnrSt] [path]             Display remote directory listing

# omitted further output for clarity

Notice the presence of multiple commands that make mention of local and remote. We are actually connected to two computers at once (with two working directories!).

To show our remote working directory:

sftp> pwd
Remote working directory: /global/home/yourUsername

To show our local working directory, we add an l in front of the command:

sftp> lpwd
Local working directory: /home/jeff/Documents/teaching/hpc-intro

The same pattern holds for the other commands. For example, ls lists the remote directory and lls lists the local one.

To upload a file, we type put some-file.txt (tab-completion works here).

sftp> put config.toml
Uploading config.toml to /global/home/yourUsername/config.toml
config.toml                                   100%  713     2.4KB/s   00:00 

To download a file we type get some-file.txt:

sftp> get config.toml
Fetching /global/home/yourUsername/config.toml to config.toml
/global/home/yourUsername/config.toml                               100%  713     9.3KB/s   00:00

We can also put or get whole directories recursively by adding -r. Note that when uploading, the target directory needs to exist on the remote system beforehand, which is why we run mkdir first.

sftp> mkdir content
sftp> put -r content/
Uploading content/ to /global/home/yourUsername/content
Entering content/
content/scheduler.md              100%   11KB  21.4KB/s   00:00
content/index.md                  100% 1051     7.2KB/s   00:00
content/transferring-files.md     100% 6117    36.6KB/s   00:00
content/.transferring-files.md.sw 100%   24KB  28.4KB/s   00:00
content/cluster.md                100% 5542    35.0KB/s   00:00
content/modules.md                100%   17KB 158.0KB/s   00:00
content/resources.md              100% 1115    29.9KB/s   00:00

To quit, we type exit or bye.
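The commands typed at the sftp> prompt above can also be collected in a file and run non-interactively with sftp's -b (batch file) option. The following is a minimal sketch, not a site-specific recipe: the file names transfer.batch and config-copy.toml are hypothetical, the host is the same placeholder used throughout this lesson, and batch mode generally assumes non-interactive authentication (such as SSH keys) is already set up.

```shell
# Write the same commands we typed at the sftp> prompt into a batch file.
# transfer.batch is a hypothetical file name chosen for this sketch.
cat > transfer.batch <<'EOF'
put config.toml
get config.toml config-copy.toml
bye
EOF

# Review the commands before running them.
cat transfer.batch

# To actually run the batch against a cluster, uncomment the next line and
# replace the placeholder username and address with real ones.
# sftp -b transfer.batch yourUsername@remote.computer.address
```

Keeping the sftp line commented out lets you check the batch file first. In batch mode, sftp stops at the first command that fails.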

Exercise 1

Using Spruce Knob as your client, retrieve the download.txt file from 149.165.169.156 and display the contents of the file to screen.

Note: Login information will be provided during class.

Secure Copy (scp)

To copy a single file to or from the cluster, we can use scp. The syntax can be a little complex for new users, but we’ll break it down here:

To transfer to another computer:

[local]$ scp /path/to/local/file.txt yourUsername@remote.computer.address:/path/on/remote/computer

To download from another computer:

[local]$ scp yourUsername@remote.computer.address:/path/on/remote/computer/file.txt /path/to/local/

Note that we can simplify this by shortening our paths. On the remote computer, any path after the : that does not start with / is relative to our home directory. If we don’t care where the file goes, we can end the target with a : and leave it at that, and the file will land in our home directory.

[local]$ scp local-file.txt yourUsername@remote.computer.address:
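To make the structure of an scp target easier to see, the sketch below splits a string of the form user@host:path into its three parts using plain shell parameter expansion. The function name describe_target is hypothetical and exists only for illustration. Nothing here contacts a remote machine.

```shell
# Split an scp-style remote target (user@host:path) into its parts.
# describe_target is a hypothetical helper for illustration only.
describe_target() {
    target="$1"
    user="${target%%@*}"        # text before the first @
    hostpath="${target#*@}"     # text after the first @
    host="${hostpath%%:*}"      # text before the first :
    path="${hostpath#*:}"       # text after the first : (may be empty)
    # An empty path means scp will use the remote home directory.
    echo "user=$user host=$host path=${path:-<home directory>}"
}

describe_target "yourUsername@remote.computer.address:/path/on/remote/computer"
describe_target "yourUsername@remote.computer.address:"
```

The first call reports the absolute path given after the colon. The second call has nothing after the colon, so it reports the home directory as the destination.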

To recursively copy a directory, we just add the -r (recursive) flag:

[local]$ scp -r some-local-folder/ yourUsername@remote.computer.address:target-directory/

Exercise 2

Copy (i.e. download) the directory class from 149.165.169.156 and display the contents of the directory to screen.

Note: Login information will be provided during class.

Additional Options not Covered

In addition to the methods mentioned above, several other popular tools are available but not covered. Feel free to research these in your own time.

Google Drive

Google Drive provides WVU students and faculty free unlimited storage through their WVU MIX account. This is a great solution for archiving data sets and is integrated directly with Globus Online! Note: This does not work well for active data sets because retrieving the data is slow.

Key Points

  • Transfer files using Globus Online, sftp, and scp