Simple Parallelism (GNU Parallel, R parallel and Python Multiprocessing)
Overview
Teaching: 45 min
Exercises: 15 minQuestions
What is embarrassing parallelism?
Objectives
Learn how to quickly explore parametric spaces using GNU parallel
GNU Parallel
On 4.Parallel/1.BasicParallel
there is a copy of the file output.log
used on a previous lesson.
We will use that file to mimic the creation of a parametric study involving 1000 simulations.
The command GNU parallel is very practical executing the same operation for a given list of instances.
The command will compute in parallel up to the limit of cores available or the limit given with the argument -j
$ seq -w 0 1000 | parallel mkdir {}
$ seq -w 0 1000 | parallel ln -s ../output.log {}
time grep "Time step" */output.log > /dev/null
seq -w 0 1000 | time parallel -j 4 grep \"Time step\" {}/output.log
R in parallel
R offers one option to run some commands in parallel, the most basic form offers replacements to commands like lapply
or sapply
with parLapply
and parSapply
.
> lapply(1:3, function(x) c(x, x^2, x^3))
The library is called parallel
and most embarrassing parallelism can be implemented with it.
> library(parallel)
> ncores<- detectCores()
> ncores<-4
> cl <- makeCluster(ncores)
> ans <- parLapply(cl, 2:4, function(x) 2^x + 3^x)
> stopCluster(cl)
> cl <- makeCluster(ncores)
> ans <- parSapply(cl, 2:4, function(x) 2^x + 3^x)
> stopCluster(cl)
RNGkind("L'Ecuyer-CMRG")
nsamples<-1e8
pois.lambda<-10
mc.reset.stream()
system.time({
jobs<-list()
jobs[[1]]<-mcparallel(rpois(nsamples, pois.lambda), "pois", mc.set.seed=TRUE)
jobs[[2]]<-mcparallel(runif(nsamples), "unif", mc.set.seed=TRUE)
jobs[[3]]<-mcparallel(rnorm(nsamples), "norm", mc.set.seed=TRUE)
jobs[[4]]<-mcparallel(rexp(nsamples), "exp", mc.set.seed=TRUE)
random3 <- mccollect(jobs)
})
Python multiprocessing
from multiprocessing import Pool
def f(x):
return x*x
if __name__ == '__main__':
p = Pool(5)
print(p.map(f, [1, 2, 3]))
from multiprocessing import Process
def f(name):
print 'hello', name
if __name__ == '__main__':
p = 10*[None]
for i in range(10):
p[i] = Process(target=f, args=('bob',))
p[i].start()
p[i].join()
Key Points
GNU parallel is just a tool to exploit embarrassing parallel problems. You can equally use multiprocessing in python or R parallel libraries for the same effect.