Research Computing

Demography Servers

in progress

Demography and Population Sciences maintains its own Unix servers, which are in the process of being relocated to the Berkeley Data Center (TBD Summer/Fall 2023). Among other services, the current servers host RStudio environments for various data projects.

rstudio.demog.berkeley.edu is used for general-purpose research computing.

rstudio-coale.demog.berkeley.edu is a backup instance of rstudio.demog.berkeley.edu.

Savio HPC (High Performance Computing)

The Savio High Performance Computing (HPC) Linux cluster is designed and maintained by Lawrence Berkeley National Laboratory and Berkeley Research Computing, and has more than 600 nodes, 15,000 cores, and 540 peak teraFLOPS (tl;dr: it's powerful). This valuable shared computing resource is used for the most computationally intensive workloads that require performant CPUs, RAM, and disk, as well as machine learning tasks that necessitate the use of graphics processing units (GPUs).

The lab manages an allocation of HPC credits under the account fc_demog intended for computational research in the Demography department. HPC systems operate on a fair-share model, which encourages users to plan their job resource requests in advance to make efficient use of system resources and to avoid inadvertently impacting or crashing other users' jobs by consuming more resources than are available. Savio uses a scheduler called Slurm to manage jobs.

Creating Savio Account

  • Initial steps to access Savio are to create a BRC account, accept the terms of use, and join the fc_demog project.

    • More detailed instructions on joining Savio.

    • You will also need to create a PIN + one-time password (OTP) combination for logging into Savio. The PIN is a fixed four-digit number and the OTP is a six-digit number generated by an authenticator app. These two sets of numbers together, without spaces, form your Savio password.

    • With your credentials you can ssh into Savio’s login node:

     ssh <your-savio-username>@hpc.brc.berkeley.edu
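
    • Optionally, you can add a host entry to your ~/.ssh/config so that future logins are shorter (the alias savio below is just an example, not an official name):

       Host savio
           HostName hpc.brc.berkeley.edu
           User <your-savio-username>

      After that, ssh savio will prompt for your PIN + OTP password as usual.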

Running Savio Jobs

  • Savio is composed of several types of nodes, each with its own specialized purpose and resources. When you first access Savio, you land on a login node, which is simply a place to keep and edit scripts and to schedule computation to run on compute nodes.

    ⚠️ Do not run your analyses on the login node ⚠️. When you are ready to schedule your job, identify the partition you want to run the computation on (savio2, savio2_gpu, savio3, etc.) and specify the length of time it should run.

  • At a high level there are 3 ways to run a session on Savio:

    1. Interactive, terminal: This allows you to run programs interactively as you normally might in your own laptop's terminal. To queue an interactive terminal session, use the srun command, specifying, at a minimum, the Savio account you belong to, the partition, and the time to allocate. Additional srun options.

      • Example srun command that schedules an interactive session under the fc_demog account, on the savio2 partition, for one hour:

        srun -A fc_demog -p savio2  -t 01:00:00 --pty bash

    2. Interactive, web browser using Open OnDemand: This is the best way to run programs in the RStudio, Jupyter Notebook, and VS Code IDEs.

      • After logging in with your Savio credentials, navigate to Interactive Apps, select your environment (Jupyter, RStudio, MATLAB, etc.), and specify the time and compute resources required for your session.

    3. Batch, terminal: A batch job runs in the background via the terminal. You use the sbatch command to submit a script that contains your instructions to Savio.

      • Batch processing is best for longer-running processes, parallel processes, or for running large numbers of short jobs simultaneously. If your code can be left running for a significant amount of time without any interaction, consider running it as a batch job (see the example batch script at the end of this list).

      • Additional templates for how batch scripts are organized.

  • Other useful commands:

    • scancel to cancel a job
    • sinfo to get current information on nodes and partitions
    • More here
  • To troubleshoot a job or to see how long it will take to run, see this documentation.
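
  • Example batch script (a minimal sketch; the job name, script name, module name, and resource requests are hypothetical and should be adjusted for your own job):

        #!/bin/bash
        # Job name as it appears in the queue
        #SBATCH --job-name=my_analysis
        # Account (allocation) to charge
        #SBATCH --account=fc_demog
        # Partition to run on
        #SBATCH --partition=savio2
        # Wall-clock time limit (hh:mm:ss)
        #SBATCH --time=02:00:00

        # Load the software the job needs, then run the analysis script
        module load r
        Rscript my_analysis.R

    Save this as, say, my_analysis.sh, then submit it and check its status with:

        sbatch my_analysis.sh
        squeue -u $USER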

Storing and transferring data on Savio

  • Storage: To store your data, it is recommended to use Savio's scratch space, which is several petabytes in size. Each user has their own scratch space that is inaccessible to other users.

    Your scratch directory will be at: /global/scratch/users/<your-savio-username>

    ⚠️ Please note that files in scratch storage that have not been accessed in 120 days will be purged ⚠️. More on Savio Storage

    To transfer data in and out of Savio you will need to move the data through a dedicated data transfer node (DTN). An example of transferring a file from your laptop to Savio using scp:

    scp <file> <your-savio-username>@dtn.brc.berkeley.edu:/global/scratch/users/<your-savio-username>

  • Demography has a shared directory /global/home/groups/fc_demog limited to 30 GB. This directory is intended only for temporarily sharing files between fc_demog researchers. Do not read files directly from this directory; instead, copy them into your own scratch directory to work with (see the example at the end of this section).

  • Savio is approved for P2/P3 level data, but check with the computing director before working with any sensitive data.
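
  • Example of staging a shared file from the group directory into your own scratch space before working with it (the file name is a placeholder):

    cp /global/home/groups/fc_demog/<shared-file> /global/scratch/users/<your-savio-username>/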

Software on Savio

  • Modules: To see a list of available software, type:

    module avail

    to load software into your environment:

    module load <name-of-software>/<version-number>
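
    For example, to load a specific version of R and confirm which modules are loaded (the version shown here is hypothetical; check module avail for what is actually installed):

    module load r/4.2.1
    module list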

Analytics Environments on Demand (AEoD) & Secure Research Data and Computing (SRDC)

Analytics Environments on Demand (AEoD) is a virtual machine service for researchers who need to run analytic software packages (Python, RStudio, Stata, MATLAB, etc.) on a platform that is scaled up from a standard laptop or workstation. This allows for custom analysis environments best suited to short- to medium-term project timelines. AEoD virtual machines are available running Windows or Linux, and may be used with moderately sensitive (P3) data. The AEoD Service is offered as a partnership between Research IT, Demography, and the BPC.

The Secure Research Data and Computing (SRDC) platform is designed for highly sensitive data (P3/P4). Through the SRDC, we provision custom servers for restricted-use and sensitive data. These machines range from 2 to 16 cores with 8 to 128 GB of RAM each, depending on the application and data protection level. The security architecture of the SRDC leverages campus information security services such as intrusion detection, multi-factor authentication, and access control systems. The SRDC platform has been assessed and reviewed by private cybersecurity firms to meet or exceed administrative, technical, and physical controls based on requirements in the NIST Risk Management Framework as well as EU laws such as the GDPR. For secure data transfer, the SRDC uses a dedicated data transfer node (dtn.srdc.berkeley.edu) with file transfer utilities such as sftp, rsync, and the Globus GridFTP server under a HIPAA-compliant subscription to ensure encrypted transfers.
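
As an illustration, a transfer to the SRDC data transfer node might use rsync over ssh (the file and destination path below are placeholders; your SRDC project directory will be assigned to you):

rsync -avz <file> <your-username>@dtn.srdc.berkeley.edu:<destination-path>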

Data Transfer

scp

scp (secure copy) is an elegant and simple way to move smaller files over a network. Its general usage:

To copy a file from your local machine to a remote server:

scp <file> <user>@<host>:<dest>

To copy a file from a remote server to your local machine:

scp <user>@<host>:<src> <dest>
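
To copy an entire directory, add the -r (recursive) flag, for example:

scp -r <directory> <user>@<host>:<dest>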

rclone

rclone is a command-line tool that can connect to cloud storage services such as Dropbox, Box, Google Drive, sftp servers, and many other file services. It preserves timestamps and verifies checksums at all times. Transfers over limited bandwidth or intermittent connections, or transfers subject to a quota, can be restarted from the last good file transferred. rclone has rich documentation and supports many different cloud services. The first step is to configure rclone:

rclone config

From there, you can run basic commands to list and copy files between locations.

List content of remote directory

rclone ls remote:path

Copy /local/path to the remote

rclone copy /local/path remote:path
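
For example, assuming you have configured a remote named box (the remote and path names here are illustrative), you could copy a local results directory to it and then list what was transferred:

rclone copy ./results box:project-data
rclone ls box:project-data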

Globus

Globus is ideal for securely transferring large files (gigabytes, terabytes, petabytes) and/or large numbers of files at high speed. It allows you to start an unattended transfer and have confidence that it will be completed reliably. To get started, check out the step-by-step guide.

Data Storage

in progress