High Performance Computing - Baobab Hello World

🕓 Dec 4, 2019 · ☕ 4 min read
🏷️ #computational statistics

    Motivation & Introduction

    Researchers at the University of Geneva may request access to Baobab, the university's high-performance computing (HPC) cluster. This cluster is particularly well suited to massively parallel computations. Several resources already exist to help users; they are listed at the end of this tutorial. However, the existing documentation can be challenging for a new user to grasp. The aim of this tutorial is therefore to provide a clear and concise introduction to the use of the Baobab HPC cluster.

    Before looking at how to run tasks on Baobab, it is necessary to define a few notions and install some applications.

    • The administrative procedure required to get access to Baobab is presented here.
    • Depending on your OS, you may access Baobab and transfer files between your computer and your Linux session on Baobab using different software. Read about and install the required software here.
    • As indicated above, to work with Baobab you will type bash commands at a Linux command prompt. This tutorial lists the bash commands most frequently used when working with Baobab.
    • Regarding its architecture, Baobab is composed of different partitions; each partition is composed of different nodes, and each node is composed of a number of CPUs/GPUs. Further details about partitions and their various limits can be found here.
    • Baobab schedules tasks using the Slurm cluster management and job scheduling system.

    Useful bash commands

    Command syntax        Description
    ls -l                 list the contents of the current directory
    pwd                   print the working directory
    cd ~                  navigate to the home directory
    cd ..                 navigate up one directory
    cp oldfile newfile    make a copy of a file
    mv oldfile newfile    rename a file
    rm file               delete a file
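
    As a quick way to practise, the commands in the table can be tried in a throwaway directory. The following is only a sketch: the scratch directory created with mktemp and the file names simu.R, simu_backup.R and old.R are illustrative assumptions, and rm -r (used for cleanup) is an addition that is not in the table.

```shell
#!/bin/bash
# Work in a temporary directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

pwd                        # print the working directory
echo "x <- 1" > simu.R     # create a small file to play with
cp simu.R simu_backup.R    # make a copy of the file
mv simu_backup.R old.R     # rename the copy
ls -l                      # list the directory: simu.R and old.R
rm old.R                   # delete the renamed copy
remaining=$(ls)            # only simu.R should be left
echo "$remaining"

cd ..                      # navigate up one directory
rm -r "$workdir"           # cleanup: remove the scratch directory and its contents
```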

    Useful slurm commands

    Command syntax               Description
    sbatch                       submit a job script for later execution
    scontrol show jobid 12345    display the Slurm state of a given job
    scancel                      cancel a running or pending job
    squeue -u username           display the pending and running jobs of username

    Your first bash script to execute an R script

    To execute a given R script on Baobab, one needs to submit a bash script via the command sbatch. Let’s look at an example of a simple bash script that launches a given R script.

    
    #!/bin/bash
    #SBATCH --job-name=simu_R
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1
    #SBATCH --time=0:15:0
    #SBATCH --partition=debug-EL7
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=firstname.lastname@unige.ch
    
    module load GCC/8.2.0-2.31.1 OpenMPI/3.1.3 R/3.6.0
    
    INFILE=simu.R
    OUTFILE=report_simu.Rout
    
    srun R CMD BATCH $INFILE $OUTFILE

    Interpreting the above bash script

    • The first line is called a shebang (or hashbang) and indicates to the shell which program should interpret the script.
    • Lines 2 to 8 are sbatch options, passed to Slurm as #SBATCH directives. The following table presents these options.
    Command syntax        Description
    --job-name            job name
    --ntasks-per-node     number of tasks to launch on each node
    --cpus-per-task       number of CPUs required per task
    --time                wall clock time limit
    --partition           partition(s) on which to run the job
    --mail-type           event types for which the user is notified
    --mail-user           user to receive email notifications
    • Line 10 loads the modules required to run R.
    • Lines 12 and 13 specify the input and output files.
    • Line 15 launches the execution of the task.
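
    Before submitting, it can help to review which options a job script actually requests. The sketch below recreates the header of the script above in a temporary file (purely for illustration) and uses the standard tools grep and cut to list the #SBATCH directives and extract the requested time limit. On Baobab you would point grep at your own script instead.

```shell
#!/bin/bash
# Recreate the header of the job script in a temporary file (illustration only).
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=simu_R
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=0:15:0
#SBATCH --partition=debug-EL7
EOF

# Print every Slurm directive the script declares.
grep '^#SBATCH' "$script"

# Count the directives and extract the requested wall clock limit.
n_directives=$(grep -c '^#SBATCH' "$script")
time_limit=$(grep '^#SBATCH --time=' "$script" | cut -d= -f2)
echo "$n_directives directives, time limit $time_limit"

rm "$script"   # remove the temporary file
```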

    Imagine that you create the following R script and want to run it on Baobab:

    
    #Load libraries
    library(foreach)
    library(iterators)  # provides icount()
    
    #Simulate population
    mysd = 15
    mymean = 50
    pop = rnorm(10e5, mean = mymean, sd = mysd)
    
    #Define nbr of iterations
    B = 10e3
    samplesize = 100
    myresults = foreach(b = icount(B), .combine = rbind)%do%{
      mysample = sample(pop, size = samplesize)
      mean(mysample)
    }
    
    #Theoretical xbar variance
    mysd^2 / samplesize
    
    #Observed xbar variance
    var(myresults[,1])
    
    #Save results
    save(myresults, file = "xbar_simulation_results.rda")
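
    The theoretical value printed by the script is the variance of a sample mean. As a worked check, using the script's mysd = 15 and samplesize = 100, and ignoring the negligible effect of sampling without replacement from a population of 10e5 values:

```latex
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{15^2}{100} = \frac{225}{100} = 2.25
```

    The observed variance computed from the simulated means should be close to 2.25, though not exactly equal, since both the population and the samples are random.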

    To get it onto your Linux session, you can either write it on your computer and then transfer it using, for example, FileZilla, or write it directly on your Linux session using vim. The following command opens a new file named simu.R, which you can then edit and save using vim commands.

    
    vim simu.R

    Once this R script is saved on your Linux session as simu.R, you can run it on Baobab with the previously discussed bash script. Assuming that you saved that bash script as launch_simu.sh, you can launch the job with the following command.

    
    sbatch launch_simu.sh
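
    After submission, the Slurm commands listed earlier can be combined to follow the job. The following is a minimal sketch, not a definitive workflow: job_id_from_message is a hypothetical helper (not part of Slurm) that assumes sbatch confirms a submission with a line of the form "Submitted batch job 12345". The script only contacts the scheduler when sbatch and launch_simu.sh are present, and otherwise just demonstrates the helper on a sample message.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): pull the job ID out of the
# confirmation line printed by sbatch, e.g. "Submitted batch job 12345".
job_id_from_message() {
    echo "${1##* }"    # keep only the text after the last space
}

# Only talk to the scheduler when Slurm and the job script are available.
if command -v sbatch >/dev/null 2>&1 && [ -f launch_simu.sh ]; then
    message=$(sbatch launch_simu.sh)       # submit the job
    jobid=$(job_id_from_message "$message")
    squeue -u "$USER"                      # your pending and running jobs
    scontrol show jobid "$jobid"           # detailed state of this job
    # scancel "$jobid"                     # uncomment to cancel the job
else
    # Away from the cluster: show the helper on a sample message.
    jobid=$(job_id_from_message "Submitted batch job 12345")
fi
echo "job id: $jobid"
```

    Once the job has finished, the console output of R is available in report_simu.Rout (the OUTFILE set in the bash script), and the simulated means are stored in xbar_simulation_results.rda.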

    Useful resources