References
Google I/O 2009 - Transactions Across Datacenters
youtube video
slide
Two Phase Commit Protocol (2PC)
http://www.cs.fsu.edu/~xyuan/cop5611/lecture15.html
http://www.cs.iastate.edu/~cs554/NOTES/Ch8-5.pdf
Three Phase Commit Protocol (3PC)
http://courses.cs.vt.edu/~cs5204/fall00/distributedDBMS/sreenu/3pc.html
Two-Phase Commit (2PC)
Assumptions
One node is designated the coordinator (the master site); the remaining nodes in the network are called cohorts (the participants in the pseudocode that follows). The protocol assumes stable storage at each site and that each node uses a write-ahead log. It also assumes that no node crashes forever and that any two nodes can eventually communicate with each other. The latter is not a big deal, since network communication can typically be rerouted. The former is a much stronger assumption: suppose the machine blows up!
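The write-ahead log assumption can be sketched as follows. This is a minimal illustration and not part of the original protocol description: the class name `WriteAheadLog`, its methods, and the one-record-per-line file format are all hypothetical. The only point is that a record is forced to stable storage before the node acts on it, so the node can find its last state after a crash.

```python
import os
from typing import Optional


class WriteAheadLog:
    """Hypothetical append-only log with one record per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, record: str) -> None:
        # Force the record to disk before the caller sends any message.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())

    def last_record(self) -> Optional[str]:
        # Read back the most recent record, e.g. during crash recovery.
        if not os.path.exists(self.path):
            return None
        with open(self.path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        return lines[-1] if lines else None
```

A node would call `append("GLOBAL_COMMIT")` before multicasting the decision, and `last_record()` on restart to decide which state to recover into.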
Finite state machine (top half: coordinator; bottom half: participants)
Actions by coordinator:
    write START_2PC to local log;
    multicast VOTE_REQUEST to all participants;
    while not all votes have been collected {
        wait for any incoming vote;
        if timeout {
            write GLOBAL_ABORT to local log;
            multicast GLOBAL_ABORT to all participants;
            exit;
        }
        record vote;
    }
    if all participants sent VOTE_COMMIT and coordinator votes COMMIT {
        write GLOBAL_COMMIT to local log;
        multicast GLOBAL_COMMIT to all participants;
    } else {
        write GLOBAL_ABORT to local log;
        multicast GLOBAL_ABORT to all participants;
    }
When the coordinator crashes in state S, and then recovers to S:
- S=WAIT: retransmit VOTE_REQUEST
- S=ABORT: retransmit GLOBAL_ABORT
- S=COMMIT: retransmit GLOBAL_COMMIT
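The coordinator's decision rule and its recovery table can be sketched in Python as below. This is a simplified model under stated assumptions, not a networked implementation: the function names are hypothetical, a timed-out vote is represented by `None`, the log is a plain list, and the WAIT state is taken to mean that `START_2PC` is the last log record.

```python
from typing import Dict, List, Optional


def coordinator_decide(votes: Dict[str, Optional[str]],
                       own_vote: str,
                       log: List[str]) -> str:
    """Return the global decision, appending log records along the way.

    votes maps a participant name to "VOTE_COMMIT", "VOTE_ABORT", or
    None when that vote timed out (an assumption of this sketch).
    """
    log.append("START_2PC")
    if any(v is None for v in votes.values()):
        decision = "GLOBAL_ABORT"  # a timeout aborts the transaction
    elif own_vote == "COMMIT" and all(v == "VOTE_COMMIT" for v in votes.values()):
        decision = "GLOBAL_COMMIT"
    else:
        decision = "GLOBAL_ABORT"
    log.append(decision)  # write the decision to the log before multicasting it
    return decision


def coordinator_recover(log: List[str]) -> str:
    """Return the message to retransmit after a crash, from the last log record."""
    resend = {
        "START_2PC": "VOTE_REQUEST",       # state WAIT
        "GLOBAL_ABORT": "GLOBAL_ABORT",    # state ABORT
        "GLOBAL_COMMIT": "GLOBAL_COMMIT",  # state COMMIT
    }
    return resend[log[-1]]
```

For example, if one participant never answers, `coordinator_decide({"p1": "VOTE_COMMIT", "p2": None}, "COMMIT", [])` returns `"GLOBAL_ABORT"`.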
Actions by participant:
    write INIT to local log;
    wait for VOTE_REQUEST from coordinator;
    if timeout {
        write VOTE_ABORT to local log;
        exit;
    }
    if participant votes COMMIT {
        write VOTE_COMMIT to local log;
        send VOTE_COMMIT to coordinator;
        wait for DECISION from coordinator;
        if timeout {
            multicast DECISION_REQUEST to other participants;
            wait until DECISION is received; /* remain blocked */
            write DECISION to local log;
        }
        if DECISION == GLOBAL_COMMIT {
            write GLOBAL_COMMIT to local log;
        } else if DECISION == GLOBAL_ABORT {
            write GLOBAL_ABORT to local log;
        }
    } else {
        write VOTE_ABORT to local log;
        send VOTE_ABORT to coordinator;
    }
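The participant's side can be modeled the same way. The sketch below is an illustration under assumptions rather than the original algorithm verbatim: `participant_run` and its parameters are hypothetical, timeouts are modeled by passing `None`, and the decision is logged once instead of twice. It returns `"BLOCKED"` when neither the coordinator nor any other participant supplies a decision, which is the blocking case of 2PC.

```python
from typing import List, Optional


def participant_run(got_vote_request: bool,
                    votes_commit: bool,
                    coordinator_decision: Optional[str],
                    peer_decision: Optional[str],
                    log: List[str]) -> str:
    """Return the participant's final state: "COMMIT", "ABORT", or "BLOCKED"."""
    log.append("INIT")
    if not got_vote_request:
        # Timed out waiting for VOTE_REQUEST: abort unilaterally.
        log.append("VOTE_ABORT")
        return "ABORT"
    if not votes_commit:
        # Here the participant would also send VOTE_ABORT to the coordinator.
        log.append("VOTE_ABORT")
        return "ABORT"
    log.append("VOTE_COMMIT")  # state READY; VOTE_COMMIT would be sent now
    decision = coordinator_decision
    if decision is None:
        # Timed out waiting for the coordinator: ask the other participants.
        decision = peer_decision
    if decision is None:
        return "BLOCKED"  # nobody knows the outcome yet
    log.append(decision)
    return "COMMIT" if decision == "GLOBAL_COMMIT" else "ABORT"
```

Calling `participant_run(True, True, None, None, [])` returns `"BLOCKED"`: the participant has voted to commit, so it can neither commit nor abort on its own.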
Actions for handling decision requests: (executed by a separate thread)
    while true {
        wait until any incoming DECISION_REQUEST is received; /* remain blocked */
        read most recently recorded STATE from the local log;
        if STATE == GLOBAL_COMMIT {
            send GLOBAL_COMMIT to requesting participant;
        } else if STATE == INIT or STATE == GLOBAL_ABORT {
            send GLOBAL_ABORT to requesting participant;
        } else {
            skip; /* participant remains blocked */
        }
    }
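The reply rule for decision requests is small enough to write out as a pure function. The following is a sketch with hypothetical names (`answer_decision_request`, `resolve_from_peers`); peers are modeled as a list of their most recently logged states instead of as network endpoints.

```python
from typing import Iterable, Optional


def answer_decision_request(state: str) -> Optional[str]:
    """Reply to a DECISION_REQUEST given this node's most recent logged state."""
    if state == "GLOBAL_COMMIT":
        return "GLOBAL_COMMIT"
    if state in ("INIT", "GLOBAL_ABORT"):
        # A node still in INIT has not voted, so commit cannot have been decided.
        return "GLOBAL_ABORT"
    return None  # any other state: no reply, the requester stays blocked


def resolve_from_peers(peer_states: Iterable[str]) -> Optional[str]:
    """Return the first decision any peer can supply, or None if all are silent."""
    for state in peer_states:
        reply = answer_decision_request(state)
        if reply is not None:
            return reply
    return None
```

If every other participant is also waiting in `VOTE_COMMIT`, `resolve_from_peers` returns `None`, and all of them stay blocked until the coordinator recovers.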
When a participant crashes in state S, and then recovers to S:
- S=INIT: abort and inform coordinator
- S=READY: contact other participants
- S=ABORT: enter into ABORT state
- S=COMMIT: enter into COMMIT state
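The recovery table for a participant can be read directly off the last log record. The sketch below assumes a mapping from log records to states that the notes do not spell out: `VOTE_COMMIT` as the last record means READY, and either `VOTE_ABORT` or `GLOBAL_ABORT` means ABORT. The function name and the returned action strings are hypothetical.

```python
from typing import List


def participant_recover(log: List[str]) -> str:
    """Return the recovery action implied by the last record in the log."""
    last = log[-1] if log else "INIT"  # an empty log is treated as INIT
    if last == "INIT":
        return "ABORT_AND_INFORM_COORDINATOR"
    if last == "VOTE_COMMIT":
        # State READY: the outcome is unknown locally, so ask the others.
        return "CONTACT_OTHER_PARTICIPANTS"
    if last in ("VOTE_ABORT", "GLOBAL_ABORT"):
        return "ABORT"
    if last == "GLOBAL_COMMIT":
        return "COMMIT"
    raise ValueError(f"unexpected log record: {last!r}")
```

Only the READY case needs communication; in every other state the participant can finish recovery from its own log.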
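Three-Phase Commit (3PC)
The three-phase commit protocol was introduced to eliminate the extra thread that handles DECISION_REQUEST in two-phase commit, and to avoid having to ask the other participants for their current state.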
Assumptions
Each site uses the write-ahead-log protocol.
At most one site can fail during the execution of the transaction.
Finite state machine (left: participants; right: coordinator)