CGRL User Guide for Computing Workshop



Introduction

As a CGRL user, you have access to computing resources in the Berkeley Research Computing (BRC) high-performance computing (HPC) environment, including a wide range of genomics and bioinformatics programs. Learning to use these resources efficiently can be challenging, even for those comfortable on the command line. In this workshop, we'll introduce the resources available to CGRL users, including new computing resources for your largest genomics projects, and recommend best practices.


Starting point: Berkeley Research Computing (BRC) supercluster high-performance computing (HPC) user guide

http://research-it.berkeley.edu/services/high-performance-computing/user-guide


CGRL-specific user guide for the Vector cluster and Rosalind condo in Savio

http://research-it.berkeley.edu/services/high-performance-computing/cgrl-vectorrosalind-user-guide
  • How do I sign up for an account, link my account to BRC HPC, and log in?
  • Where should I store data and how should I move it around?
  • What compute nodes are available?
  • How do I run my jobs?


Getting data into the BRC HPC environment

# log in to the DTN
ssh username@dtn.brc.berkeley.edu
 
# log in to the Genomic Sequencing Laboratory's FTP server
lftp ftp://gslftp@gslserver.qb3.berkeley.edu
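
Data can also be pushed from your own computer through the DTN. The following is a minimal sketch, assuming your files sit in a local folder (the name my_reads is hypothetical), that your BRC username matches your local one, and that you want the data in your scratch directory. The command is only printed so that you can review it before running it yourself.

```shell
# Hypothetical local folder holding your sequence files; change as needed.
SRC_DIR="my_reads"
# Your BRC username; assumed here to match your local username.
BRC_USER="${USER:-username}"
# Destination: your scratch directory, reached through the data transfer node.
DEST="${BRC_USER}@dtn.brc.berkeley.edu:/global/scratch/${BRC_USER}/"

# Build the transfer command: -a preserves file attributes, -v lists files as they copy.
TRANSFER_CMD="rsync -av --progress ${SRC_DIR} ${DEST}"

# Print the command for review; paste it into your terminal to start the transfer.
echo "$TRANSFER_CMD"
```

Because rsync skips files that are already up to date at the destination, rerunning the same command after an interrupted transfer picks up roughly where it left off.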


Example interactive Slurm session

# see what Slurm jobs are running on CGRL's Vector cluster
squeue -p vector
 
# see all of my jobs running at the moment
squeue -u $USER
 
# start an interactive bash session as a Slurm job with 1 CPU on a Savio2 HTC node in CGRL's Rosalind condo
srun --pty --partition=savio2_htc --account=co_rosalind --qos=rosalind_htc2_normal --time=00:30:00 bash -i
 
# see information about the job you're currently running
echo $SLURM_JOB_ID
scontrol show job $SLURM_JOB_ID
 
# see all of the jobs running on the HTC node partition of the Savio2 cluster
squeue -p savio2_htc
 
# exit your bash session
exit
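
The SLURM_JOB_ID variable used above is only set inside a Slurm job, so it gives a quick way to tell whether your current shell is running inside a job or on a login node. A small sketch (the function name in_slurm_job is made up for illustration):

```shell
# Return success only when the shell is running inside a Slurm job.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "Running inside Slurm job $SLURM_JOB_ID"
    # Show the full record for the current job.
    scontrol show job "$SLURM_JOB_ID"
else
    echo "Not inside a Slurm job (SLURM_JOB_ID is unset)"
fi
```

A check like this can help you avoid starting heavy computation on a login node by accident.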
 


Example Slurm batch job: RNA-Seq quantification with kallisto

# download some example data to your Savio (Rosalind condo) scratch folder
cd /global/scratch/$USER
curl -L "https://www.dropbox.com/sh/0abnf67z8m9iv02/AADbq28QEqBXmfPFe7jVvbiLa?dl=1" > download.zip
unzip download.zip
 
# edit the batch script
vim kallisto_for_workshop.sh
 
# run the Slurm batch job with 4 CPUs on a Savio2 HTC node in CGRL's Rosalind condo
sbatch kallisto_for_workshop.sh test_genome_index test 40 test_R1.fastq test_R2.fastq
 
# check output
head test/abundance.tsv
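
The contents of kallisto_for_workshop.sh are not reproduced in this guide. As a rough sketch only, a batch script that accepts the same five arguments might look like the one below. The file name kallisto_sketch.sh, the meaning given to each positional argument (in particular, that the third is the bootstrap count), the job name, and the 30-minute time limit are all assumptions; the partition, account, and QoS are copied from the interactive example.

```shell
# Write a hypothetical batch script to a new file (does not touch the workshop script).
cat > kallisto_sketch.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=kallisto_sketch
#SBATCH --partition=savio2_htc
#SBATCH --account=co_rosalind
#SBATCH --qos=rosalind_htc2_normal
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00

# Assumed argument order: index, output folder, bootstrap count, read 1, read 2
INDEX="$1"
OUT_DIR="$2"
BOOTSTRAPS="$3"
READS_1="$4"
READS_2="$5"

# Make kallisto available here if your environment needs it (not covered in this sketch).

# Quantify paired-end reads, using as many threads as Slurm allocated CPUs
kallisto quant -i "$INDEX" -o "$OUT_DIR" -b "$BOOTSTRAPS" \
    -t "${SLURM_CPUS_PER_TASK:-1}" "$READS_1" "$READS_2"
EOF

# Check the script for shell syntax errors without running it
bash -n kallisto_sketch.sh && echo "syntax OK"
```

Under these assumptions it would be submitted in the same way as the workshop script, for example: sbatch kallisto_sketch.sh test_genome_index test 40 test_R1.fastq test_R2.fastq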
 
 


Customizing your environment and installing packages locally

# setting a local library directory for installing R packages
cd ~
vim .bashrc
# add something like the following: export R_LIBS_USER="/global/home/users/$USER/R"
source ~/.bashrc
 
module load r/3.2.5
R
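
R only adds the R_LIBS_USER directory to its library search path if that directory already exists, so it is worth creating it before installing anything. A minimal sketch, assuming the library lives in an R folder under your home directory (adjust the path if you chose a different one in .bashrc):

```shell
# Use the value from .bashrc if it is set; otherwise fall back to an assumed default.
export R_LIBS_USER="${R_LIBS_USER:-$HOME/R}"

# Create the library directory; R skips R_LIBS_USER when the folder is missing.
mkdir -p "$R_LIBS_USER"

if command -v Rscript >/dev/null 2>&1; then
    # The personal library should appear in the list of library paths.
    Rscript -e '.libPaths()'
else
    echo "Rscript not found; run 'module load r/3.2.5' first"
fi
```

Once the directory shows up in the output of .libPaths(), calling install.packages() from within R should be able to install into it without needing write access to the system-wide library.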