Being a user of the CGRL gives you access to computing resources in the Berkeley Research Computing (BRC) high-performance computing (HPC) environment, including a large variety of genomic and bioinformatic programs. Getting to know how to use these resources efficiently is a challenge, even for those familiar with command-line use. In this workshop, we'll introduce the CGRL resources available to users, including new computing resources for your largest genomics projects and recommendations for best practices.
Starting point: Berkeley Research Computing (BRC) supercluster high-performance computing (HPC) user guide
How do I sign up for an account, link my account to BRC HPC, and log in?
Where should I store data and how should I move it around?
What compute nodes are available?
How do I run my jobs?
Getting data into the BRC HPC environment
# log in to the DTN
ssh username@dtn.brc.berkeley.edu
# log in to the Genomic Sequencing Laboratory's FTP server
lftp ftp://gslftp@gslserver.qb3.berkeley.edu
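Once you can log in to the DTN, copying files from your own computer to scratch is usually done with a tool such as rsync. The following is a minimal sketch, not a BRC-provided tool: `brc_push` is a hypothetical helper, and it assumes your scratch folder is /global/scratch/<username>, the path used later in this workshop. By default it only prints the command so you can check it before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: brc_push is a hypothetical helper, not part of the BRC software.
# ASSUMPTION: your scratch folder is /global/scratch/<username>.
brc_push() {
    local src="$1"    # local file or directory to copy
    local user="$2"   # your BRC username
    local dest="${user}@dtn.brc.berkeley.edu:/global/scratch/${user}/"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # default: only print the command so it can be checked first
        echo "rsync -avP ${src} ${dest}"
    else
        # -a preserves file attributes, -v is verbose,
        # -P shows progress and keeps partially transferred files
        rsync -avP "$src" "$dest"
    fi
}

# print the command that would copy ./fastq_files to your scratch folder
brc_push fastq_files username
```

Setting DRY_RUN=0 before calling the function (for example, `DRY_RUN=0 brc_push fastq_files username`) performs the transfer instead of printing it.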
Example interactive Slurm session
# see what Slurm jobs are running on CGRL's Vector cluster
squeue -p vector
# see all of my jobs running at the moment
squeue -u $USER
# start an interactive bash session as a Slurm job with 1 CPU on a Savio2 HTC node in CGRL's Rosalind condo
srun --pty --partition=savio2_htc --account=co_rosalind --qos=rosalind_htc2_normal --time=00:30:00 bash -i
# see information about the job you're currently running
echo $SLURM_JOB_ID
scontrol show job $SLURM_JOB_ID
# see all of the jobs running on the HTC node partition of the Savio2 cluster
squeue -p savio2_htc
# exit your bash session
exit
Example Slurm batch job: RNA-Seq quantification with kallisto
# download some example data to your Savio (Rosalind condo) scratch folder
cd /global/scratch/$USER
curl -L "https://www.dropbox.com/sh/0abnf67z8m9iv02/AADbq28QEqBXmfPFe7jVvbiLa?dl=1" > download.zip
unzip download.zip
# edit the batch script
vim kallisto_for_workshop.sh
# run the Slurm batch job with 4 CPUs on a Savio2 HTC node in CGRL's Rosalind condo
sbatch kallisto_for_workshop.sh test_genome_index test 40 test_R1.fastq test_R2.fastq
# check output
head test/abundance.tsv
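The contents of kallisto_for_workshop.sh are not reproduced here, so the following is only a rough sketch of what a comparable batch script could look like. It is not the workshop's script. It reuses the partition, account, and QoS from the interactive example, and it makes two assumptions that should be checked before use: that the third argument (40 in the call above) is the number of bootstrap samples, and that kallisto is provided by a module named `kallisto`. The file name kallisto_sketch.sh is hypothetical.

```shell
# Sketch only: this is NOT the workshop's kallisto_for_workshop.sh.
# Write a hypothetical batch script to kallisto_sketch.sh.
cat > kallisto_sketch.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=kallisto_sketch
#SBATCH --partition=savio2_htc
#SBATCH --account=co_rosalind
#SBATCH --qos=rosalind_htc2_normal
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00

# positional arguments, in the order used by the sbatch call above
INDEX="$1"       # kallisto index file
OUTDIR="$2"      # output directory
BOOTSTRAPS="$3"  # ASSUMPTION: number of bootstrap samples
READS1="$4"      # forward reads (FASTQ)
READS2="$5"      # reverse reads (FASTQ)

# ASSUMPTION: kallisto is available through a module with this name
module load kallisto

# use as many threads as CPUs were requested from Slurm
kallisto quant -i "$INDEX" -o "$OUTDIR" -b "$BOOTSTRAPS" \
    -t "$SLURM_CPUS_PER_TASK" "$READS1" "$READS2"
EOF

# check the generated script for shell syntax errors without running it
bash -n kallisto_sketch.sh && echo "syntax OK"
```

The `#SBATCH` lines sit directly under the shebang because Slurm stops reading directives at the first executable command. Options given on the `sbatch` command line override the values in the script.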
Customizing, locally installing
# setting a local library directory for installing R packages
cd ~
vim .bashrc
# add something like the following: export R_LIBS_USER="/global/home/users/$USER/R"
source ~/.bashrc
module load r/3.2.5
R
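The same setup can be scripted instead of edited by hand. This is a minimal sketch, assuming that $HOME is your Savio home directory (/global/home/users/<username>) and that you want the library in a folder named R there. It also creates the directory, since R only adds R_LIBS_USER to its library search path when the directory already exists.

```shell
# Sketch only. ASSUMPTION: $HOME is /global/home/users/<username> on Savio.
R_LIB_DIR="$HOME/R"
BASHRC="$HOME/.bashrc"

# create the personal library directory if it does not exist yet
mkdir -p "$R_LIB_DIR"

# append the export line only if R_LIBS_USER is not already mentioned in .bashrc
if ! grep -q 'R_LIBS_USER' "$BASHRC" 2>/dev/null; then
    echo 'export R_LIBS_USER="$HOME/R"' >> "$BASHRC"
fi

# make the variable available in the current shell as well
export R_LIBS_USER="$R_LIB_DIR"
echo "R packages will be installed to: $R_LIBS_USER"
```

After this, running `.libPaths()` inside R should list the new directory first, and `install.packages()` will install into it by default.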
CGRL User Guide for Computing Workshop
Introduction
Starting point: Berkeley Research Computing (BRC) supercluster high-performance computing (HPC) user guide
http://research-it.berkeley.edu/services/high-performance-computing/user-guide
CGRL-specific user guide for the Vector cluster and Rosalind condo in Savio
http://research-it.berkeley.edu/services/high-performance-computing/cgrl-vectorrosalind-user-guide