processor x writes the number into the shared memory at a given location accessible to processor y, which then reads the number from that location.
During the execution of a parallel algorithm, the N processors can access shared memory for reading / writing data and / or results.
All processors can gain access to the shared memory simultaneously if the memory locations they are trying to read from or write into are different.
However we can get problems when two or more processors require access to the same memory location simultaneously.
Giving a result of 1 rather than 2. This is because P2 reads the value of x (=0 ) before P1 has updated it.
Let x be a variable that can be accessed by two processors P1 and P2
Now consider the assignment x := x + 1, which is normally done in three stages.Say now that P1 and P2 both execute such an assignment, assume x=0 initially
- copy value of x into some register
- add 1 to value on register
- store value on register at the address for x
What is the final result ?
P1 copies value of x (= 0) into its register
P2 copies value of x (= 0) into its register
P1 adds 1 to its register --- these two at the same time
P1 stores value of x (= 1)
P2 adds 1 to its register --- these two at the same time
P2 stores value of x (= 1)
Therefore depending on whether 2 or more processors can gain access to the same memory location simultaneously, we have 4 subclasses of shared memory computers :-
1. Exclusive Read, Exclusive Write (EREW) SM Computers
Access to memory locations is exclusive i.e. no 2 processors are allowed to simultaneously read from or write into the same location.
NOTES
Allowing concurrent read access to the same address should pose no problems ( except perhaps to the result of a calculation; as in the example )
Conceptually, each of the several processors reading from that location makes a copy of its contents and stores it in its own register ( RAM )
Problems arise however, with concurrent write access.
If several processors are trying to simultaneously store ( potentially different ) data at the same address, which of them should succeed ?Despite this, it is only feasible to allow P processors to access P memory locations simultaneously for relatively small P (< 30 )
i.e. we need a deterministic way of specifying the contents of a memory location after a concurrent write operation.Some ways of resolving write conflicts include :-
A ) Assign priorities to the processors and accept value from highest priority processorB ) All the processors are allowed to write, provided that the quantities they are attempting to store are equal, otherwise access is denied to ALL processors.
C ) The max / min / sum / average of the value is stored (numeric data ).
Generally SIMD machines, because they can use very simple processors ( since have no control unit ), typically need to have large numbers of processors ( > 1000 ) to achieve high performance.
So shared memory SIMD machines are unrealistic and no commercial machines exist with this design.
However in MIMD machines, which use much more powerful processors, shared memory systems are in existance which have small numbers of processors ( 2 - 30 ).
To illustrate the theoretical potential of the four different subclasses of shared memory consider the following example.
We have N processors to search a list S = { L1, L2, .... Ln } for the index of a given element x. 1 < N < = n assume x may appear several times in the list and any index will do.
ALGORITHM :
Procedure SM search (S, x, k)
Step 1: FOR i=1 to N do in parallel
read x
ENDFOR
Step 2: FOR i=1 to N do in parallel
Si := {L(i-1)n/N + 1) , .........L in/N }
perform sequential search on sublist
(returns Ki = -1 if not in list or index if it is in the list
ENDFOR
Step 3: FOR i=1 to N do in parallel
if Ki > 0 then K = Ki endif
ENDFOR
Assuming the sequential search procedure takes O(n/N) time in the worst case what is the time complexity of running this algorithm on the four subclasses of shared memory machine ?
EREW : Step 1 takes O(N) time (N reads, one at a time) Step 2 takes O(n/N) time (time for reading list & sequential search) Step 3 takes O(N) time i.e. overall O(N) + O(n/N) timeERCW : Step 1 takes O(N) time Step 2 takes O(n/N) time Step 3 takes constant time ( with an appropriate conflict resolution rule ) i.e. overall O(N) + O(n/N)
CREW : Step 1 takes constant time Step 2 takes O(n/N) time Step 3 takes O(N) time i.e. overall O(N) + O(n/N)
CRCW : Step 1 takes constant time Step 2 takes O(n/N) time Step 3 takes constant time i.e. overall O(n/N)