Database/Distributed Systems

Lecture 8: ZooKeeper, More Replication, CRAQ

Tony Lim 2022. 3. 5. 13:53

why focus on ZooKeeper

1. does N * servers -> N * performance?

2. what is the API for a general-purpose coordination service? (Raft is just a library)

 

 

ZK runs on top of ZAB, an atomic broadcast protocol similar to Raft

a replica should never serve writes on its own, because its data might be inconsistent with the others: the leader can commit with just a majority, so some replicas may not be up to date

writes must be served through the leader

 

ZooKeeper guarantees

1. linearizable writes (not reads)

2. FIFO client order (per client, not across clients)
a single client's reads and writes happen in the order that client issued them

but it doesn't guarantee that a read returns the latest data.

1. if there is only one client doing the writes and reads, ZK guarantees order, so it can always read the latest data

2. with two or more clients: client 1 writes, then client 2 reads; client 2 might not read the latest data

 

read/write example

no problem exists in this example

 

reading f2 might return stale data; the ZooKeeper API has a "watch" feature

watch = before we get the result of reading f2, the replica sends a notification that a write was triggered
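The watch behavior can be sketched with a tiny in-memory store. The `WatchStore` class below is invented for illustration; it is not the real ZooKeeper client API, just a model of one-shot watches:

```python
# Minimal in-memory sketch of ZooKeeper-style one-shot watches.
# WatchStore and its method names are made up, not the real client API.

class WatchStore:
    def __init__(self):
        self.data = {}
        self.watches = {}   # path -> callbacks, each fired once on the next write

    def getdata(self, path, watch=None):
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def setdata(self, path, value):
        self.data[path] = value
        # fire the one-shot watches that were registered before this write
        for cb in self.watches.pop(path, []):
            cb(path)

events = []
store = WatchStore()
store.setdata("/f2", "v1")
store.getdata("/f2", watch=lambda p: events.append(f"{p} changed"))
store.setdata("/f2", "v2")   # triggers the notification
print(events)                # -> ['/f2 changed']
```

Note the watch is one-shot: it is consumed when it fires, so the client must re-register it on the next read if it still cares.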

 

 


 

file (/master, /workers) == znode

ZK handles leader election if /master dies

if one of the workers dies, ZK notifies the clients watching that znode

 

the state stored in znodes lets the new master know where the old master left off

 

ZK ensemble = a group of ZK servers, which prevents a single point of failure

 

ZK quorum = the threshold for how many servers must agree before a request is actually committed

if there is a network partition and the quorum is 2, then the situation above would keep on running

problem (split brain)
1. since the quorum is 2, requests committed among those 2 servers will not be replicated to the other 3 servers
2. if a client sends a request to the other 3 servers it will get an unexpected response

solution = set the quorum to 3, i.e. more than half
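The majority rule can be checked in a couple of lines of plain Python (no ZK involved): any two majorities of the same ensemble must overlap, which is exactly what rules out two sides of a partition committing independently.

```python
# Why the quorum must be a majority: two disjoint quorums must overlap,
# otherwise both sides of a partition could commit (split brain).

def majority(n):
    return n // 2 + 1

n = 5
q = majority(n)          # -> 3

# any two groups of size q out of n must share at least one server:
assert q + q > n

# a quorum of 2 would allow two disjoint "committing" groups among 5 servers:
assert 2 + 2 <= n

print(q)   # -> 3
```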

 


 

ZooKeeper is based on ZAB (similar to Raft)

list of APIs

create(path, data, flags) = create a znode (file) exclusively; multiple clients may attempt to create the same file, but ZooKeeper makes sure only one client creates "the file" and gives the other clients an error

delete(path, version) = a znode has a version; delete only if the specified version matches

exists(path, watch) = a watch is applied to the znode; if it gets modified, ZooKeeper tells the client

getdata(path, watch) = a watch is applied to the content of that file

list(directory) = list all the files in the directory
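The exclusive-create behavior can be simulated with an in-memory stand-in. The `MiniZK` class below is made up for illustration (the real client raises a "node exists" error rather than returning False):

```python
# Sketch of exclusive create: of many clients racing to create the same
# znode, exactly one succeeds and the rest get an error.

class MiniZK:
    def __init__(self):
        self.znodes = {}

    def create(self, path, data):
        if path in self.znodes:
            return False          # stands in for ZK's "node exists" error
        self.znodes[path] = data
        return True

zk = MiniZK()
results = [zk.create("/master", f"client-{i}") for i in range(1000)]
print(results.count(True))        # -> 1  (exactly one client becomes master)
```

This race is the basis of the /master leader-election pattern above: whoever wins the create is the master, and the losers set a watch on /master instead.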

 

while true
	x, v = getdata("f")        # read the value and its version
	if setdata("f", x+1, v)    # succeeds only if the version still matches
		break

this API pattern turns the non-atomic get-then-put sequence into a simplified (mini) transaction (an atomic increment)

e.g.) 1000 clients call getdata("f", watch=true) -> only 1 of them succeeds in setdata("f", x+1, v) -> the other 999 get notified that the data was modified and that their version is now too low
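The retry loop can be sketched against an in-memory versioned store. The `VersionedStore` class is invented for illustration (real ZK calls take a path and the failure is an error, not False), but the compare-and-swap logic is the same:

```python
# Sketch of the getdata/setdata retry loop: setdata succeeds only if the
# caller's version matches the current one (compare-and-swap).

class VersionedStore:
    def __init__(self):
        self.value, self.version = 0, 0

    def getdata(self):
        return self.value, self.version

    def setdata(self, new_value, version):
        if version != self.version:
            return False          # someone wrote in between; caller retries
        self.value, self.version = new_value, self.version + 1
        return True

def atomic_increment(store):
    while True:
        x, v = store.getdata()
        if store.setdata(x + 1, v):
            return

store = VersionedStore()
for _ in range(1000):             # 1000 increments, each a mini-transaction
    atomic_increment(store)
print(store.value)                # -> 1000
```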

 

lock API

try to acquire the lock; return true on success, false otherwise

if the file is already locked, wait for a watch notification

 

why this doesn't suffer from the herd effect = every client waits on just "one" particular file

1000 clients attempt to acquire the lock -> 500 succeed, the other 500 fail -> each failed client waits on its own predecessor among files 1 to 500

why goto 2 = the predecessor released the lock and deleted its file, or the predecessor died without ever holding the lock and ZK deleted its (ephemeral) file
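The predecessor-watching idea can be sketched like this. The `SeqLock` class is a made-up in-memory model of sequential znodes, not real ZK; the point is that each waiter depends on exactly one other file, so a release (or a crash that deletes an ephemeral file) wakes one client, not the whole herd:

```python
# Sketch of the herd-free lock: each client creates a sequential file and
# waits only on its immediate predecessor.

class SeqLock:
    def __init__(self):
        self.next_seq = 0
        self.holders = []         # live sequence numbers, in creation order

    def enqueue(self):
        seq = self.next_seq       # like create("/lock/f", flags=sequential)
        self.next_seq += 1
        self.holders.append(seq)
        return seq

    def predecessor(self, seq):
        prev = [s for s in self.holders if s < seq]
        return max(prev) if prev else None   # None -> we hold the lock

    def release(self, seq):
        self.holders.remove(seq)  # real ZK would notify only the watcher

lock = SeqLock()
a, b, c = lock.enqueue(), lock.enqueue(), lock.enqueue()
assert lock.predecessor(a) is None    # a holds the lock
assert lock.predecessor(c) == b       # c watches only b, never a
lock.release(a)
assert lock.predecessor(b) is None    # now b holds the lock
```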

 

scalable lock

a caveat: a lock holder can still fail partway through its critical section, leaving partially updated state for the next holder

 

chain replication

the client sends a write to the head -> the head forwards it down the chain -> the tail sends the response to the client

the client sends a read to the tail -> the tail sends the response

if an intermediate node fails we just remove it and re-link the chain; if the head or tail fails, the next node becomes the head or tail
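The write/read flow can be sketched in a few lines; this is an in-memory model for illustration, not a real chain-replication implementation:

```python
# Sketch of chain replication: writes enter at the head and flow down the
# chain; the tail acknowledges; reads go to the tail only.

class Node:
    def __init__(self):
        self.store = {}
        self.next = None          # successor in the chain

    def write(self, key, value):
        self.store[key] = value
        if self.next is not None:
            return self.next.write(key, value)   # forward down the chain
        return "ok"               # tail acknowledges to the client

def make_chain(n):
    nodes = [Node() for _ in range(n)]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes

head, *middle, tail = make_chain(3)
head.write("x", 42)               # client writes at the head
print(tail.store["x"])            # client reads at the tail -> 42
```

Reading only from the tail is what makes reads consistent: by the time the tail has a value, every node before it in the chain has it too.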

but if the network fails -> split brain can happen -> the head thinks the 2nd node is dead while, at the same time, the 2nd node thinks it should become the head

 

to avoid this, don't let the nodes decide who becomes the head -> ZK manages that kind of configuration from the outside

 

 

 
