Worked Example Grain Size & Speedup

Consider adding m numbers using N processors.
Processors 1 -> N-1 compute the partial sum of m / N-1 numbers. These partial sums are passed on to processor PN to calculate the final sum i.e. add N-1 partial sums.

i.e. There are N-1 communications ( Assume PN can read input simultaneously )

Say each floating point addition takes 1 unit of time and it takes 100 units of time to pass the partial sums from P1 .... PN-1 to PN i.e.TOTAL communication time = 100 units.

Parallel time is

Tp = (m / N-1) + ( 100 ) + ( N -1 )

where (m / N-1) is time to add (m / N-1) numbers P1....PN-1
and ( 100 ) is the total communication time
and ( N -1 ) is the time to add N-1 numbers in PN

i.e. TP = M  +  N * N + 98N - 99
	  --------------------------
			N - 1
Time for sequential algorithm Ts is m units (since no communication needed)

therefore Consider now the grain size i.e. amount of work done in each processor between synchronizations, in this case, the size of m.

if m= 1000 and use 10 processors i.e. N = 10

      SPEEDUP = 10 000 - 1 000		=   4.54
		----------------------
		1 000 + 100 + 980 - 99
and	
      EFFICIENCY = 4.54	     = 0.454	=	45%
		   ----
		    10
If m= 10 000 and use 10 processors ( N = 10)
	SPEEDUP = 100 000  -  10 000		= 8.20
		  ------------------------
		  10 000 + 100 + 980 - 99

	EFFICIENCY = 8.20    =   0.82	 =  82%
		    ------   
		      10
So we can see that increasing the grain size, in this case by adding more numbers in each processor, increases the speedup and improves the efficiency.

Worked Example Tcomm and Tcalc

If we generalize the results of the previous example (above) to include Tcomm and Tcalc we get

	Tp  =  ( (m / N-1) + N-1 )Tcalc  +  Tcomm

	Ts  =  m Tcalc

Therefore % efficiency is given by

	100 m ( N-1 ) Tcalc  			=	E .......  ( 1 )
	---------------------------------------
	N (m + (N-1)*(N-1))Tcalc + N(N-1)Tcomm
Say we require to attain at least 80% efficiency using N = 10 then rearranging ( 1 ) we get :
	m > EN( N-1 ) [ ( N-1)Tcalc + Tcomm ]
	      -----------------------------------
		Tcalc [ 100(N-1) - EN ]
If Tcomm / Tcalc = 1 on machine A then m > 720
However if Tcomm / Tcalc = 10 on machine B then m needs to be > 1368