Outline for February 6, 2001 1. Greetings and felicitations! a. Friday times good, also Tuesday 3-4:30. Please send me your preferences! 2. Global state a. Show problem of slicing state when something is in transit b. Define local state; send(mij) ? LSi iff time of send(mij) < current time of LSi; similar for receive c. transit(LSi, LSj); inconsistent(LSi, LSj); consistent state is one with inconsistent set empty for all pairs LSi, LSj d. Consistent global state: Chandry-Lamport 3. Termination detection a. Haung 4. Differences with non-distributed algorithms a. no shared memory, no common clock b. unpredictable message delays 5. Types of algorithms a. non-token algorithms: Lamport, Ricart and Agrawala, Maekawa b. token-based: Singhal, Raymond c. detection and recovery 6. System model a. states: requesting (entry section), executing (critical section), idle (remainder section), idle token (like idle, but you have the token) b. site: many others requesting entry, all are queued and served one at a time 7. Solution assumptions a. process names can be integers b. messages received in the order sent, in a finite amount of time, and correctly c. any process can communicate with any other process 8. Solution requirements a. mutual exclusion b. no deadlocks (progress) c. no starvation (bounded wait) d. fairness (requests executed in the order made, or in the order they arrive in system) e. fault tolerance (if a system fails, the algorithm can recover and continue to function) 9. Performance measures a. Performance under varying loads (low, high) b. Best, average, worst case performance 10. Terminology for non-token based protocols a. Request set Ri for a process pi:set of nodes from which pi must obtain permission to enter critical section b. Requests ordered bytimestamps: (time, pid); the pid is used to disambiguate equal timestamps c. Request sets satisfy the following conditions: i. pairwise non-null intersection property: for all 1 i, j n with i j, Ri Ť Rj ii. equal effort rule: for all 1 i n, | Ri | = K iii. equal responsibility rule: pj is contained in D number of Ri. iv. for all 1 i n, pi ? Ri 11. Obvious solution: pick a single controlling site a. Advantages: 3 messages per request (site REQUEST, controller REPLY, site RELEASES) b. Disadvantages: single point of failure, congestion, controller does all the work 12. Lamport's a. Request set is all processes b. Performance: 3(n-1) messages 13. Ricart and Agrawala's a. Request set is all processes b. Performance: 2(n-1) messages Chandy-Lamport Global State Recording Protocol Introduction The goal of this distributed algorithm is to capture a consistent global state. It assumes all communication channels are FIFO. It uses a distinguished message called a marker to start the algorithm. Protocol Pi sends marker 1. Pi records its local state LSi 2. For each Cij on which Pi has not already sent a marker, Pi sends a marker before sending other messages. Pi receives marker from Pj 1. If Pi has not recorded its state: a. Record the state of Cji as empty b. Send the marker as described above 2. If Pi has recorded its state LSi a. Record the state of Cji to be the sequence of messages received between the computation of LSi and the marker from Cji. Example Here, all processes are connected by communications channels Cij. Messages being sent over the channels are repre- sented by arrows between the processes. Snapshot s1: P1 records LS1, sends markers on C12 and C13 P2 receives marker from P1 on C12; it records its state LS2, records state of C12 as empty, and sends marker on C21 and C23 P3 receives marker from P1 on C13; it records its state LS3, records state of C13 as empty, and sends markers on C31 and C32. P1 receives marker from P2 on C21; as LS1 is recorded, it records the state of C21 as empty. P1 receives marker from P3 on C31; as LS1 is recorded, it records the state of C31 as empty. P2 receives marker from P3 on C32; as LS2 is recorded, it records the state of C32 as empty. P3 receives marker from P2 on C23; as LS3 is recorded, it records the state of C23 as empty. Snapshot s2: now messages are in transit on C12 and C21. P1 records LS1, sends markers on C12 and C13 P2 receives marker from P1 on C12 after the message from P1 arrives; it records its state LS2, records state of C12 as empty, and sends marker on C21 and C23 P3 receives marker from P1 on C13; it records its state LS3, records state of C13 as empty, and sends markers on C31 and C32. P1 receives marker from P2 on C21; as LS1 is recorded, and a message has arrived since LS1 was recorded, it records the state of C21 as containing that message. P1 receives marker from P3 on C31; as LS1 is recorded, it records the state of C31 as empty. P2 receives marker from P3 on C32; as LS2 is recorded, it records the state of C32 as empty. P3 receives marker from P2 on C23; as LS3 is recorded, it records the state of C23 as empty. Huang's Termination Detection Protocol Introduction The goal of this protocol is to detect when a distributed computation terminates. Notation ˇ n processes ˇ Pi process; without loss of generality, let P0 be the controlling agent ˇ Wi. weight of process Pi; initially, W0 = 1 and for all other i, Wi = 0. ˇ B(W) computation message with assigned weight W ˇ C(W) control message sent from process to controlling agent with assigned weight W Protocol Pi sends a computation message to Pj 1. Set Wi' and Wj to values such that Wi' + Wj = Wi, Wi > 0, Wj > 0. (Wi' is the new weight of Pi.) 2. Send B(Wj) to Pj Pj receives a computation message B(W) from Pi 1. Wj = Wj + W 2. If Pj is idle, Pj becomes active Pi becomes idle: 1. Send C(Wi) to P0 2. Wi = 0 3. Pi becomes idle Pi receives a control message C(W): 1. Wi = Wi + W 2. If Wi = 1, the computation has completed. Example The picture shows a process P0, designated the controlling agent, with W0 = 1. It asks P1, P2, and P3 to do some computation. It sets W1 to 0.3, W2 to 0.2, and W3 to 0.5. P2 in turn asks P4 and P5 to do some computa- tions. It sets W4 to 0.1 and W5 to 0.1. When P5 terminates, it sends C(W3) = C(0.1) to P0, which changes W0 to 0 + 0.1 = 0.1. When P3 terminates, it sends C(W3) = C(0.5) to P0, which changes W0 to 0.1 + 0.5 = 0.6. When P4 terminates, it sends C(W4) = C(0.1) to P0, which changes W0 to 0.6 + 0.1 = 0.7. When P1 terminates, it sends C(W1) = C(0.3) to P0, which changes W0 to 0.7 + 0.3 = 1. P0 thereupon concludes that the computation is finished. Total number of messages passed: 9 (one to start each computation, one to return the weight to the controlling node). Lamport's Distributed Mutual Exclusion Protocol Introduction Lamport's scheme uses distributed clocks. Every process notifies all others when it wants to enter the region of mutual exclusion. The process desiring to go in, enters when all others trying to get in made their request later. Notation ˇ n processes p1, -, pn ˇ tj timestamp Protocol 1. To enter the critical section, pi sends REQUEST(ti, i) to all sites and puts the request on its queue. 2. When pi receives a REQUEST(tj, j) message, it returns a REPLY(t'j, j) to pj and puts the request on its queue. 3. When pi receives a RELEASE(tj, j) message, it deletes pj's request from the queue. 4. pi enters the critical section when both of the following conditions hold: a. pi has received a message with timestamp larger than (ti, i) from each of the other sites b. pi's request is at the front of its request queue 5. When pi leaves the critical section, it removes its request from the top of the queue and sends RELEASE(ti, i) to all sites and puts the request on its queue. Example There are three processes, p1, p2, and p3. p1 and p3 seek mutually exclusive access to a shared resource. who action what whom C1 C2 C3 Q1 Q2 Q3 10 4 4 p1 sends Q(10,1) all 11 Q(10,1) p3 sends Q(4,3) all 5 Q(4,3) p2 receives Q(10,1) p1 10 Q(10,1) p2 sends P(10,2) p1 11 p2 receives Q(4,3) p3 11 Q(4,3)Q(10,1) p2 sends P(11,2) p3 12 p1 receives Q(4,3) p3 11 Q(4,3)Q(10,1) p1 sends P(11,1) p3 12 p3 receives Q(10,1) p1 10 Q(4,3)Q(10,1) p3 sends P(10,3) p1 11 p1 receives P(10,3) p3 12 Q(4,3)Q(10,1)P(10,3) p3 receives P(11,1) p1 12 Q(4,3)Q(10,1)P(11,1) p3 receives P(11,2) p3 13 Q(4,3)Q(10,1)P(11,1)P(11,2) p3 enters Q(4,3)Q(10,1) p1 receives P(10,2) p2 12 Q(4,3)Q(10,1)P(10,2)P(10,3) p3 leaves Q(10,1) p3 sends R(13,3) p1,p2 14 p2 receives R(13,3) p3 13 Q(10,1) p1 receives R(13,3) p3 13 Q(10,1)P(10,2)P(10,3) p1 enters Q(10,1) p1 leaves - p1 sends R(13,1) p1,p2 14 p2 receives R(13,1) p1 14 - p3 receives R(13,1) p1 15 - Ricart and Agrawala's Distributed Mutual Exclusion Protocol Introduction Ricart and Agrawala's protocol is an optimization of Lamport's. They piggyback the release message onto the reply. Notation ˇ n processes p1, -, pn ˇ tj timestamp Protocol 1. To enter the critical section, pi sends REQUEST(ti, i) to all sites. 2. When pi receives a REQUEST(tj, j) message: a. if it is not trying to enter the region of mutual exclusion, it returns REPLY(t'j, j) to pj. b. if it is trying to enter the region of mutual exclusion, and if (ti, i) ? (tj, j), it retains the REQUEST. c. otherwise it returns a REPLY(t'j, j) to pj.. 3. When pi has received a REPLY message from every other process, it enters the region of mutual exclusion. 4. When pi leaves the region of mutual exclusion, it sends REPLY(ti, i) to all processes with deferred requests. Example There are three processes, p1, p2, and p3. p1 and p3 seek mutually exclusive access to a shared resource. who action what whom C1 C2 C3 Q1 Q2 Q3 10 4 4 p1 sends Q(10,1) all 11 Q(10,1) p3 sends Q(4,3) all 5 Q(4,3) p2 receives Q(10,1) p1 10 p2 sends P(10,2) p1 11 p2 receives Q(4,3) p3 p2 sends P(11,2) p3 12 p1 receives Q(4,3) p3 Q(10,1) p1 sends P(11,1) p3 12 p3 receives Q(10,1) p1 10 Q(4,3)Q(10,1) p3 receives P(11,1) p1 11 Q(4,3)Q(10,1)P(11,1) p3 receives P(11,2) p2 12 Q(4,3)Q(10,1)P(11,1)P(11,2) p3 enters Q(4,3)Q(10,1) p1 receives P(10,2) p2 Q(10,1)P(10,2) p3 leaves Q(10,1) p3 sends P(12,3) p1 13 - p1 receives P(12,3) p3 13 Q(10,1)P(10,2)P(12,3) p1 enters Q(10,1) p1 leaves -