Database/Distributed Systems

Lecture 6: Fault Tolerance: Raft (1) ,(2)

Tony Lim 2022. 2. 26. 14:21

we should avoid split brain 

1. client needs to talk to both servers to get some progress (worst idea, worst than single server communication)

2. client can talk to single server to get some progress , but we might have some network issue , each server will hold wrong result and move on

 

 

majority vote

more than half of server should response to get some progress (leader election.. etc)

always calculate with total number of server eventhough you know some severs are dead but you still count total

2f+1 -> f failure can be tolerated, if total is 3 than f==1 

 

 

Raft

http://thesecretlivesofdata.com/raft/

 

Raft

 

thesecretlivesofdata.com

we can learn raft by following this well made animation tutorial

 

key ,value are data that are stored in database

raft is consensus algorithm

client send operation to leader -> leader sends operation to replica(append entries ,next heart beat)
and when majority has got that operation -> leader commit -> replica commit -> send client it is commited

 


 

 

example of raft log

consider S3 as leader and it sends prevLogIndex and prevLogTerm

other server receive it and checks if those 2 matches. for example S2's prevLongIndex is 12 but prevLogTerm is 4 which is different

S1 and S2 will send reject signal

 

S3 will now decrement nextIndex for S2 and S1 as 12 , prevLogIndex=11 , prevLogterm=3 and send log entries again

S2 will agree and change 12index to term 5

S1 is still mismatching so S3 decrement nextIndex for S2 as 11 and prevllogindex,term

S1 agree , everybody is same now ->  nextIndex for S1,2 becomes 14

 

notice we deleted term 4 at S2 12index why is it okay? -> it wasn't recorded to majority of server -> leader didn't send postive reply to client that it is commited -> okay to ignore

 

 

 

 

why longest log as leader is a bad idea

how this situation happend

value == term == at every term there is leader (if leader is dead , then term +1 and we do leader election)

1. at term 5 leader died -> term+1 =6 -> S1 got that and majority (either S2, S3) agreed that it can be leader at term 6.

2. but suddenly S1 died (so couldn't send append entries to add term to S2, S3) and quickly recover and selected as leader again at term 7

3. same things again -> and S1 died -> S2 or S2 selected as leader at term 8 and send append entiries

 

if S1 becomes leader due to having long log S2,S3 term 8 will be erased which is bad because at term 8 majority(S2,S3) agreed and comitted their value

 

 

election restriction

1. vote "yes" only if higher term in last log entry or
2. if same term in last log entry vote "yes" for longer log length

S1 election timer is short and it sends heartbeat to S2,S3 for voting 
but neither S2 nor S3 will going to vote for S1

S2 , S3 will not vote for each other because their last term is same and has same log length

 

 

Fast backup

there are other servers , only showing 1 leader and 1follower

case1 = S2 is leader S1's 5term is missing in leader

case2 = S1 receive other 4 from older leader but new leader S2 didn't see them

case3 = S1 did't even received 4 from older leader

 

append entries reply (follower sends back)

xterm =  term of conflcting entry

xindex = index of first entry of xterm

xlen = length of log

 

case1 = leader doesn't have xterm(5) -> leader will back up to xindex(here 1) -> S1 will be 4 6 6 6

case2 = leader have xterm(4)  -> S1 will be 4 6 6 6

case3 = S1 will be 4 6 6 6 

 

snapshot

log compaction = we can remove raft log from 1 to 3 and throw it away , if it has snapshot -> save it in disk

 

if snapshot happen follower that has only log index 1,2 and miss index 3 will never catch up because it is gone

leader will send snapshot instead of that index 3 log entry

 

 

Linearizability (happens instantaneously )

execution history is linearizable if total order of operations in history
that matches real time for non-concurrent(time doesn't overlap each other perfectly) operations 
each read sees most recent write in order

https://stackoverflow.com/questions/9762101/what-is-linearizability

 

What is "Linearizability"?

Can anyone out there help me understand what Linearizability is? I need an explanation that is simple and easy to understand. I'm reading The Art of Multiprocessor Programming by Maruice Herilihy...

stackoverflow.com

this example is linearizable , 

this one has cycle -> not linearizable there is no lienar order that can satisify that cycle?

this example is linearizable

 

this case = client didn't get response for 1st response and send 2nd(same) response then get response

it depends on design(or kind of app service, if request was asking for current time it should return 4) whether server should give 3 or 4 to client