Next: Interconnection Networks
Up: Interprocessor Communication

Shared Memory and Message Passing

Interprocessor Communication

Where there are N processors each with its own individual data stream i.e. SIMD. and MIMD. , it is usually necessary to communicate data between processors.
This is done in two ways
  1. Using SHARED MEMORY and SHARED VARIABLES or
  2. via an INTERCONNECTION NETWORK

Shared Memory and Message Passing

Here each processor has access to any variable residing in the shared memory. So if processor x wishes to pass a number to processor y it must be done in two steps:-
processor x writes the number into the shared memory at a given location accessible to processor y, which then reads the number from that location.

During the execution of a parallel algorithm, the N processors can access shared memory for reading / writing data and / or results.

All processors can gain access to the shared memory simultaneously if the memory locations they are trying to read from or write into are different.
However we can get problems when two or more processors require access to the same memory location simultaneously.


Let x be a variable that can be accessed by two processors P1 and P2
Now consider the assignment x := x + 1, which is normally done in three stages.
  1. copy value of x into some register
  2. add 1 to value on register
  3. store value on register at the address for x
Say now that P1 and P2 both execute such an assignment, assume x=0 initially

What is the final result ?

P1 copies value of x (= 0) into its register
P2 copies value of x (= 0) into its register

P1 adds 1 to its register --- these two at the same time
P1 stores value of x (= 1)

P2 adds 1 to its register --- these two at the same time
P2 stores value of x (= 1)

Giving a result of 1 rather than 2. This is because P2 reads the value of x (=0 ) before P1 has updated it.

Therefore depending on whether 2 or more processors can gain access to the same memory location simultaneously, we have 4 subclasses of shared memory computers :-

1. Exclusive Read, Exclusive Write (EREW) SM Computers

Access to memory locations is exclusive i.e. no 2 processors are allowed to simultaneously read from or write into the same location.

2. Concurrent Read, Exclusive Write (CREW) SM Computers

Multiple processors are allowed to read from the same location but write is still exclusive. .i.e. no 2 processors are allowed to write into the same location simultaneously

3. Exclusive Read, Concurrent Write (ERCW) SM Computers

Multiple processors are allowed to write into the same memory location but read access remains exclusive.

4. Concurrent Read, Concurrent Write (CRCW) SM Computers

Both multiple read and multiple write privileges are allowed.

NOTES
Allowing concurrent read access to the same address should pose no problems ( except perhaps to the result of a calculation; as in the example )

Conceptually, each of the several processors reading from that location makes a copy of its contents and stores it in its own register ( RAM )
Problems arise however, with concurrent write access.
If several processors are trying to simultaneously store ( potentially different ) data at the same address, which of them should succeed ?
i.e. we need a deterministic way of specifying the contents of a memory location after a concurrent write operation.

Some ways of resolving write conflicts include :-
A ) Assign priorities to the processors and accept value from highest priority processor

B ) All the processors are allowed to write, provided that the quantities they are attempting to store are equal, otherwise access is denied to ALL processors.

C ) The max / min / sum / average of the value is stored (numeric data ).

Despite this, it is only feasible to allow P processors to access P memory locations simultaneously for relatively small P (< 30 )
Usually because the cost of the communication hardware is too high and physical size of the device used for a memory location.

Generally SIMD machines, because they can use very simple processors ( since have no control unit ), typically need to have large numbers of processors ( > 1000 ) to achieve high performance.

So shared memory SIMD machines are unrealistic and no commercial machines exist with this design.

However in MIMD machines, which use much more powerful processors, shared memory systems are in existance which have small numbers of processors ( 2 - 30 ).

To illustrate the theoretical potential of the four different subclasses of shared memory consider the following example.

We have N processors to search a list S = { L1, L2, .... Ln } for the index of a given element x. 1 < N < = n assume x may appear several times in the list and any index will do.

ALGORITHM :


	Procedure SM search (S, x, k)
	
	Step 1: FOR i=1 to N do in parallel
read x ENDFOR Step 2: FOR i=1 to N do in parallel Si := {L(i-1)n/N + 1) , .........L in/N } perform sequential search on sublist (returns Ki = -1 if not in list or index if it is in the list ENDFOR Step 3: FOR i=1 to N do in parallel if Ki > 0 then K = Ki endif ENDFOR
Assuming the sequential search procedure takes O(n/N) time in the worst case what is the time complexity of running this algorithm on the four subclasses of shared memory machine ?

EREW : 	Step 1 takes O(N) time (N reads, one at a time)
	Step 2 takes O(n/N) time (time for reading list & sequential search)
	Step 3 takes O(N) time 
		
	i.e. overall O(N) + O(n/N) time 

ERCW : Step 1 takes O(N) time Step 2 takes O(n/N) time Step 3 takes constant time ( with an appropriate conflict resolution rule ) i.e. overall O(N) + O(n/N)

CREW : Step 1 takes constant time Step 2 takes O(n/N) time Step 3 takes O(N) time i.e. overall O(N) + O(n/N)

CRCW : Step 1 takes constant time Step 2 takes O(n/N) time Step 3 takes constant time i.e. overall O(n/N)