we should avoid split brain
1. client needs to talk to both servers to get some progress (worst idea, worst than single server communication)
2. client can talk to single server to get some progress , but we might have some network issue , each server will hold wrong result and move on
majority vote
more than half of server should response to get some progress (leader election.. etc)
always calculate with total number of server eventhough you know some severs are dead but you still count total
2f+1 -> f failure can be tolerated, if total is 3 than f==1
Raft
http://thesecretlivesofdata.com/raft/
Raft
thesecretlivesofdata.com
we can learn raft by following this well made animation tutorial
key ,value are data that are stored in database
raft is consensus algorithm
client send operation to leader -> leader sends operation to replica(append entries ,next heart beat)
and when majority has got that operation -> leader commit -> replica commit -> send client it is commited
example of raft log
consider S3 as leader and it sends prevLogIndex and prevLogTerm
other server receive it and checks if those 2 matches. for example S2's prevLongIndex is 12 but prevLogTerm is 4 which is different
S1 and S2 will send reject signal
S3 will now decrement nextIndex for S2 and S1 as 12 , prevLogIndex=11 , prevLogterm=3 and send log entries again
S2 will agree and change 12index to term 5
S1 is still mismatching so S3 decrement nextIndex for S2 as 11 and prevllogindex,term
S1 agree , everybody is same now -> nextIndex for S1,2 becomes 14
notice we deleted term 4 at S2 12index why is it okay? -> it wasn't recorded to majority of server -> leader didn't send postive reply to client that it is commited -> okay to ignore
why longest log as leader is a bad idea
how this situation happend
value == term == at every term there is leader (if leader is dead , then term +1 and we do leader election)
1. at term 5 leader died -> term+1 =6 -> S1 got that and majority (either S2, S3) agreed that it can be leader at term 6.
2. but suddenly S1 died (so couldn't send append entries to add term to S2, S3) and quickly recover and selected as leader again at term 7
3. same things again -> and S1 died -> S2 or S2 selected as leader at term 8 and send append entiries
if S1 becomes leader due to having long log S2,S3 term 8 will be erased which is bad because at term 8 majority(S2,S3) agreed and comitted their value
election restriction
1. vote "yes" only if higher term in last log entry or
2. if same term in last log entry vote "yes" for longer log length
S1 election timer is short and it sends heartbeat to S2,S3 for voting
but neither S2 nor S3 will going to vote for S1
S2 , S3 will not vote for each other because their last term is same and has same log length
Fast backup
there are other servers , only showing 1 leader and 1follower
case1 = S2 is leader S1's 5term is missing in leader
case2 = S1 receive other 4 from older leader but new leader S2 didn't see them
case3 = S1 did't even received 4 from older leader
append entries reply (follower sends back)
xterm = term of conflcting entry
xindex = index of first entry of xterm
xlen = length of log
case1 = leader doesn't have xterm(5) -> leader will back up to xindex(here 1) -> S1 will be 4 6 6 6
case2 = leader have xterm(4) -> S1 will be 4 6 6 6
case3 = S1 will be 4 6 6 6
snapshot
log compaction = we can remove raft log from 1 to 3 and throw it away , if it has snapshot -> save it in disk
if snapshot happen follower that has only log index 1,2 and miss index 3 will never catch up because it is gone
leader will send snapshot instead of that index 3 log entry
Linearizability (happens instantaneously )
execution history is linearizable if total order of operations in history
that matches real time for non-concurrent(time doesn't overlap each other perfectly) operations
each read sees most recent write in order
https://stackoverflow.com/questions/9762101/what-is-linearizability
What is "Linearizability"?
Can anyone out there help me understand what Linearizability is? I need an explanation that is simple and easy to understand. I'm reading The Art of Multiprocessor Programming by Maruice Herilihy...
stackoverflow.com
this example is linearizable ,
this one has cycle -> not linearizable there is no lienar order that can satisify that cycle?
this example is linearizable
this case = client didn't get response for 1st response and send 2nd(same) response then get response
it depends on design(or kind of app service, if request was asking for current time it should return 4) whether server should give 3 or 4 to client
'Database > Distributed Systems' 카테고리의 다른 글
Lecture 10: Cloud Replicated DB, Aurora (0) | 2022.03.06 |
---|---|
Lecture 8: Zookeeper , More Replication, CRAQ (0) | 2022.03.05 |
Lecture 4: Primary-Backup Replication (0) | 2022.02.11 |
Lecture 3: GFS(Google File System) (0) | 2022.01.31 |
Lecture 2: RPC and Threads (0) | 2022.01.30 |