Parallel MATLAB: Parallel For Loops
Introduction
QUAD Example
Executing a PARFOR Program
Classification of variables
MD Example
PRIME Example
ODE SWEEP Example
FMINCON Example
Conclusion
INTRO: Execution
INTRO: ITHACA
QUAD Example
QUAD: Comments
The function quad_fun estimates the integral of a particular
function over the interval [a, b].
It does this by evaluating the function at n evenly spaced points
and multiplying each value by the weight (b-a)/n.
These quantities can be regarded as the areas of little rectangles
that lie under the curve, and their sum is an estimate for the total
area under the curve from a to b.
We could compute these subareas in any order we want.
We could even compute the subareas at the same time, assuming
there is some method to save the partial results and add them
together in an organized way.
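The rectangle rule described above can be sketched as a short script. This is not the slides' quad_fun: the integrand f(x) = 4/(1+x^2), the interval [0, 1], the use of subinterval midpoints as the n evenly spaced points, and n = 1000 are all illustrative assumptions. With this integrand the exact integral is pi, which makes the estimate easy to check.

```matlab
% Rectangle-rule estimate of the integral of f over [a, b].
% Assumed, hypothetical integrand: its exact integral over [0, 1] is pi.
f = @(x) 4 ./ (1 + x.^2);
a = 0.0;
b = 1.0;
n = 1000;

w = (b - a) / n;              % weight (width) of each little rectangle
q = 0.0;
for i = 1 : n
  x = a + (i - 0.5) * w;      % midpoint of the i-th subinterval (assumption)
  q = q + w * f(x);           % add the area of one rectangle
end

fprintf(1, 'Estimate = %.10f, error = %.2e\n', q, abs(q - pi));
```

Each pass through the loop touches only its own x, so the iterations could be done in any order.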
QUAD: Comments
The parallel version of quad_fun does the same calculations.
The parfor statement changes how this program does the
calculations. It asserts that all the iterations of the loop are
independent, and can be done in any order, or in parallel.
Execution begins with a single processor, the client. When a parfor
loop is encountered, the client is helped by a pool of workers.
Each worker is assigned some iterations of the loop. Once the loop
is completed, the client resumes control of the execution.
MATLAB ensures that the results are the same whether the
program is executed sequentially or with the help of workers (up to
round-off in summed quantities, since the order of the additions may change).
The user can wait until execution time to specify how many
workers are actually available.
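A parallel rectangle rule can be sketched as follows. It is a hypothetical stand-in for the parallel quad_fun, not the slides' code: the integrand 4/(1+x^2) on [0, 1] (exact integral pi), the midpoints, and n = 1000 are assumptions. The comments mark how parfor classifies each variable. Depending on the parallel preferences, MATLAB may open a pool automatically when it reaches parfor; without the Parallel Computing Toolbox the loop simply runs serially on the client.

```matlab
% Parallel rectangle rule: the loop body is unchanged, only for -> parfor.
f = @(x) 4 ./ (1 + x.^2);   % assumed integrand (broadcast variable)
a = 0.0;
b = 1.0;
n = 1000;
w = (b - a) / n;            % broadcast variable: read, never assigned in the loop

q = 0.0;
parfor i = 1 : n
  x = a + (i - 0.5) * w;    % temporary variable: private to each iteration
  q = q + w * f(x);         % reduction variable: partial sums are combined
end

fprintf(1, 'Parallel estimate = %.10f\n', q);
```

Because q is only ever updated by adding to it, MATLAB can let each worker keep a partial sum and add the partial sums together at the end.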
Executing a PARFOR Program
Classification of variables
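Classification   Description
Loop             Serves as a loop index for arrays.
Sliced           An array whose segments are operated on by different iterations of the loop.
Broadcast        A variable defined before the loop whose value is used inside the loop, but never assigned inside the loop.
Reduction        Accumulates a value across iterations of the loop, regardless of iteration order.
Temporary        A variable created inside the loop, but unlike sliced or reduction variables, not available outside the loop.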
Each of these variable classifications appears in the slide's code fragment, which labels a temporary variable, a reduction variable, a sliced output variable, the loop variable, a sliced input variable, and a broadcast variable.
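A fragment of the following shape exhibits all of the roles listed above in one loop. It is a sketch modeled on those labels, not necessarily the slide's exact code; the variable names, the loop length of 10, and the bound c = pi are assumptions.

```matlab
a = 0;
c = pi;                  % broadcast variable: read inside the loop, never assigned
z = 0;
r = rand(1, 10);         % sliced input variable: indexed by the loop variable
b = zeros(1, 10);
parfor i = 1 : 10        % i is the loop variable
  a = i;                 % temporary variable
  z = z + i;             % reduction variable
  b(i) = r(i);           % sliced output variable
  if (i <= c)
    d = 2 * a;           % temporary variable (assigned only when i <= 3)
  end
end
fprintf(1, 'z = %d\n', z);
```

After the loop, z holds 1 + 2 + ... + 10 and b is a copy of r, whichever worker handled each iteration.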
Loop Variable
NB: The range of a parfor statement must be increasing consecutive integers.
The following restriction is required: the loop variable i may not be assigned inside the parfor body, because changing i in the parfor body invalidates the assumptions MATLAB makes about communication between the client and workers.
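Because the range of a parfor loop must be increasing consecutive integers, a loop over other values is usually written over an integer index, with the value looked up inside the body. A minimal sketch; the values 0.1, 0.2, ..., 1.0 and the squaring are hypothetical placeholders for real work.

```matlab
vals = 0.1 : 0.1 : 1.0;        % values of interest (not integers)
out = zeros(size(vals));
% Not allowed as a parfor range: 0.1 : 0.1 : 1.0
parfor k = 1 : numel(vals)     % increasing, consecutive integers
  x = vals(k);                 % sliced input; x is a temporary variable
  out(k) = x^2;                % sliced output; k itself is never assigned
end
disp(out);
```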
Trick question: what values do a and d have after exiting the loop?
Sliced variables:
MD Example
Step   Potential Energy   Kinetic Energy   (P+K-E0)/E0 Energy Error
   1      498108.113974         0.000000         0.000000e+00
   2      498108.113974         0.000009         1.794265e-11
 ...                ...              ...                  ...
   9      498108.111972         0.002011         1.794078e-11
  10      498108.111400         0.002583         1.793996e-11
Profile Summary
Generated 27-Apr-2009 15:37:30 using cpu time.
Function Name                 Calls   Total Time   Self Time
md                                     415.847 s     0.096 s
compute                          11    415.459 s   410.703 s
repmat                        11000      4.755 s     4.755 s
timestamp                                0.267 s     0.108 s
datestr                                  0.130 s     0.040 s
timefun/private/formatdate        2      0.084 s     0.084 s
update                           10      0.019 s     0.019 s
datevec                                  0.017 s     0.017 s
now                                      0.013 s     0.001 s
datenum                                  0.012 s     0.012 s
datestr>getdateform                      0.005 s     0.005 s
initialize                               0.005 s     0.005 s
etime                                    0.002 s     0.002 s
Self time is the time spent in a function excluding the time spent in its child functions. Self time also includes overhead resulting from the process of profiling.
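The profile shows that compute accounts for about 415.5 of the 415.8 seconds, so its loop over particles is the natural place for parfor. The following is a hypothetical sketch of such a loop, not the slides' md code: the particle count np, the dimension nd, the random positions, and the sin^2 pair potential truncated at pi/2 are all assumptions. Each iteration writes one column of f (a sliced output variable) and adds into pot (a reduction variable); pos is a broadcast variable.

```matlab
np = 200;                       % hypothetical number of particles
nd = 3;                         % spatial dimension
pos = 10 * rand(nd, np);        % hypothetical positions in a 10 x 10 x 10 box
d_cut = pi / 2;                 % assumed cutoff: the potential is flat beyond it

f = zeros(nd, np);              % force on each particle (sliced output)
pot = 0.0;                      % potential energy (reduction variable)
parfor i = 1 : np
  Ri = pos - repmat(pos(:, i), 1, np);   % displacement from particle i to every particle
  D = sqrt(sum(Ri .^ 2, 1));             % distances, 1 x np
  mask = D > 0.0;                        % drop particle i itself
  Ri = Ri(:, mask);
  D = D(mask);
  D2 = min(D, d_cut);                    % truncate distances at the cutoff
  pot = pot + 0.5 * sum(sin(D2) .^ 2);   % each pair is visited twice, hence 0.5
  f(:, i) = Ri * (sin(2.0 * D2) ./ D)';  % sum of pairwise forces on particle i
end
fprintf(1, 'Potential energy = %f\n', pot);
```

Since every pair contributes equal and opposite forces, the columns of f should sum to zero up to round-off, which is a cheap sanity check on a parallelized force loop.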
MD: Speedup
Replacing for i by parfor i, here is our speedup:
MD: Speedup
PRIME Example
function total = prime ( n )
%% PRIME returns the number of primes between 1 and N.
  total = 0;
  for i = 2 : n
    prime = 1;                  % assume i is prime until a divisor is found
    for j = 2 : i - 1
      if ( mod ( i, j ) == 0 )
        prime = 0;              % j divides i, so i is not prime
      end
    end
    total = total + prime;
  end
  return
end
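A parallel version only needs the outer loop to become parfor, with total as a reduction variable; the inner loop stays an ordinary serial for loop inside each iteration. This is a sketch written as a script, not the slides' prime_number_parfor; n = 500 is an arbitrary choice, and the flag is renamed is_prime so that it does not share the function's name.

```matlab
n = 500;                        % count the primes between 1 and n
total = 0;                      % reduction variable
parfor i = 2 : n
  is_prime = 1;                 % temporary variable
  for j = 2 : i - 1             % serial trial division inside each iteration
    if (mod(i, j) == 0)
      is_prime = 0;
    end
  end
  total = total + is_prime;
end
fprintf(1, '  %8d  %8d\n', n, total);
```

Note that the work per iteration grows with i, so iterations are unequal in cost; this is one reason small problems gain little from extra workers.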
matlabpool ( 'open', 'local', 4 )   % function form
% (matlabpool was removed in R2015a; newer releases use parpool ( 'local', 4 ).)
n = 50;
while ( n <= 500000 )
  primes = prime_number_parfor ( n );
  fprintf ( 1, '  %8d  %8d\n', n, primes );
  n = n * 10;
end
matlabpool ( 'close' )
PRIME: Timing
PRIME_PARFOR_RUN
Run PRIME_PARFOR with 0, 1, 2, and 4 labs ("1+k" means the client plus k workers). Times in seconds.

       N      1+0       1+1       1+2       1+4
      50    0.067     0.179     0.176     0.278
     500    0.008     0.023     0.027     0.032
    5000    0.100     0.142     0.097     0.061
   50000    7.694     9.811     5.351     2.719
  500000  609.764   826.534   432.233   222.284
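A little arithmetic on the table makes the trend concrete. Taking speedup as the 1+0 time divided by the 1+k time, and efficiency as speedup divided by k (the usual definitions, not stated on the slides):

```latex
S_k = \frac{T_{1+0}}{T_{1+k}}, \qquad E_k = \frac{S_k}{k}

% N = 500000
S_1 = \frac{609.764}{826.534} \approx 0.74, \quad
S_2 = \frac{609.764}{432.233} \approx 1.41, \quad
S_4 = \frac{609.764}{222.284} \approx 2.74, \quad
E_4 = \frac{2.74}{4} \approx 0.69

% N = 50
S_4 = \frac{0.067}{0.278} \approx 0.24
```

So at N = 500000 four workers run the job about 2.7 times faster, while at N = 50 the same four workers make it about four times slower.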
There are many thoughts that come to mind from these results!
Why does 500 take less time than 50? (It doesn't, really.)
How can 1+1 take longer than 1+0?
(It does, but it's probably not as bad as it looks!)
This data suggests two conclusions:
Parallelism doesn't pay until your problem is big enough;
AND
Parallelism doesn't pay until you have a decent number of workers.
ODE SWEEP Example
$$ m \frac{d^2 x}{dt^2} + b \frac{dx}{dt} + k x = f(t) $$
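ode45 integrates first-order systems, so a second-order equation like this one is usually rewritten in terms of y_1 = x and y_2 = dx/dt. Assuming the sweep's odesystem orders its state this way, the initial vector [0, 1] means x(0) = 0 and dx/dt(0) = 1, and Y(:,1) is the displacement x:

```latex
y_1 = x, \qquad y_2 = \frac{dx}{dt}

\frac{dy_1}{dt} = y_2, \qquad
\frac{dy_2}{dt} = \frac{f(t) - b\,y_2 - k\,y_1}{m}
```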
m = 5.0;
bVals = 0.1 : 0.05 : 5;
kVals = 1.5 : 0.05 : 5;
% meshgrid ( bVals, kVals ) puts the b values in its FIRST output, so here
% kGrid holds damping values and bGrid holds stiffness values; check this
% against the argument order of odesystem.
[ kGrid, bGrid ] = meshgrid ( bVals, kVals );
peakVals = nan ( size ( kGrid ) );
tic;
parfor ij = 1 : numel ( kGrid )
  [ T, Y ] = ode45 ( @( t, y ) odesystem ( t, y, m, bGrid ( ij ), kGrid ( ij ) ), ...
    [0, 25], [0, 1] );
  peakVals ( ij ) = max ( Y ( :, 1 ) );
end
toc;
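The sweep calls a function odesystem that the slides do not show. Below is a self-contained, hypothetical sketch of the whole sweep: it assumes zero forcing, f(t) = 0, and the argument order (t, y, m, b, k), uses a much coarser grid so that it runs quickly, and names the meshgrid outputs so that bGrid really holds the damping values.

```matlab
% Hypothetical right-hand side for m*x'' + b*x' + k*x = 0, state y = [x; dx/dt].
odesystem = @(t, y, m, b, k) [y(2); (-b * y(2) - k * y(1)) / m];

m = 5.0;
bVals = [0.5, 1.0, 2.0];                   % coarse, assumed damping grid
kVals = [1.5, 3.0, 5.0];                   % coarse, assumed stiffness grid
[bGrid, kGrid] = meshgrid(bVals, kVals);   % first output varies with bVals
peakVals = nan(size(kGrid));

parfor ij = 1 : numel(kGrid)
  b = bGrid(ij);                           % sliced inputs copied to temporaries
  k = kGrid(ij);
  rhs = @(t, y) odesystem(t, y, m, b, k);
  [~, Y] = ode45(rhs, [0, 25], [0, 1]);    % x(0) = 0, dx/dt(0) = 1
  peakVals(ij) = max(Y(:, 1));             % peak displacement (sliced output)
end
disp(peakVals);
```

Each grid point is an independent ODE solve, which is what makes this a natural parfor loop. For an underdamped oscillator started this way, the peak cannot exceed 1/sqrt(k/m), a quick check on the results.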
%
%  Display the results.
%
figure;
surf ( bVals, kVals, peakVals, ...
  'EdgeColor', 'Interp', 'FaceColor', 'Interp' );
title ( 'Results of ODE Parameter Sweep' )
xlabel ( 'Damping B' );
ylabel ( 'Stiffness K' );
zlabel ( 'Peak Displacement' );
view ( 50, 30 )
FMINCON Example
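FMINCON attempts to solve problems of the form:
  min fun(X)
subject to:
  A*X <= B,   Aeq*X = Beq      (linear constraints)
  C(X) <= 0,  Ceq(X) = 0       (nonlinear constraints)
  LB <= X <= UB                (bounds)
If no derivative or Hessian information is supplied by the user, then
FMINCON uses finite differences to estimate these quantities. With
forward differences, each gradient estimate costs about one extra
evaluation of fun per component of X, so if fun is expensive to
evaluate, the finite differencing can dominate the execution.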
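The finite-difference evaluations for different components of X are independent of one another, so they can be spread across workers; fmincon offers this through its UseParallel option (set, for example, with optimoptions). The sketch below shows the idea with a hand-written forward-difference gradient in a parfor loop. It is not fmincon's internal code: the objective fun, the point x0, and the step-size rule are illustrative assumptions.

```matlab
% Hypothetical objective standing in for an expensive function of several variables.
fun = @(x) sum(x .^ 2);
x0 = [1.0, 2.0, 3.0];

n = numel(x0);
f0 = fun(x0);                            % one evaluation at the base point
g = zeros(1, n);                         % gradient estimate (sliced output)
parfor i = 1 : n
  h = sqrt(eps) * max(1.0, abs(x0(i)));  % forward-difference step for component i
  xp = x0;                               % temporary copy of the base point
  xp(i) = xp(i) + h;                     % perturb only the i-th component
  g(i) = (fun(xp) - f0) / h;             % independent evaluation of fun
end
disp(g);
```

With n components this costs n + 1 evaluations of fun, and the n perturbed evaluations are the part that parallelizes.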
Conclusion