To illustrate this consider the following.
Say we have two processors A and B. A produces a sequence of numbers a1, a2 .. B inputs ai and performs some calculation F which uses ai.
Say that B runs much faster than A.
The idea in using asynchronous algorithms is that all processors are kept busy and never remain idle (unlike synchronous algorithms ) so speedup is maximized.
A drawback is that they are difficult to analyse ( because we do not know what data is being used ) and also an algorithm that is known to work ( e.g. converge) in synchronous mode may not work (e.g diverge) in asynchronous mode.
[ Examples of calculations that do run successfully as asynchronous algorithms are : Gauss Seidel and Jacobi methods for solving linear systems of equations. ]

Say we have 3 processors
P1 : given x, P1 calculates F (x ) in time t1, units and sends it to P3
P2 :given y, P2 calculates F'(y) in time t2 units and sends it to P3
P3 : given a, b, c, P3 calculates d = a - b/c in time t3 units; if | d-a | > Epsilon then d is sent to P1 and P2 otherwise d is output.
Again k iterations will be necessary so total time is k [ max( t1, t2) + t3]
X1 = X0 - F(X0)/F'(X0) ...etc
P3 computes a new value using (1) as soon as EITHER P1 OR P2 provide a new input i.e. (1) is now of the form
Xn+1 = Xn - F(SXi)/F'(Xj)
Time per iteration is at most min( t1, t2) + t3 [but may be as low as t3 - e.g. if P1 and P2 complete at consecutive time steps]
we cannot predict how many iterations will be necessary
Say t1 = 2, t2 = 3 and t3 = 1.
2 units 3 units 1 unit TIME P1 P2 P3 2 F(X0) C0 - 3 - F'(X0) - 4 - - X1 = X0 - F(X0)/F'(X0) 5 C1 C1 - 6 F(X1) C1 - 7 - F'(X1) X2 = X1 - F(X1)/F'(X0) 8 C2 C2 X3 = X2 - F(X1)/F'(X1) 9 F(X2) C2 - 10 C3 F'(X2) X4 = X3 - F(X2)/F'(X1) 11 F(X3) C3 or C4 X5 = X4 - F(X2)/F'(X2) 12 C4 or C5 X6 = X5 - F(X3)/F'(X2) 13 . . . 14 . . . etc- indicates processor is idle
NOTE
At time 11 P2 has the choice of using X3 to calculate F'(X3) or X4 to calculate F'(X4), i.e. omit X3. Which choice is made should be determined experimentally to see which gives the best results.
we could relax the parameter i.e. use X4 or we could synchronise with the parameter i.e. using X3