
Lecture 3: GFS (Google File System)

Tony Lim 2022. 1. 31. 14:10

Why is it hard to build Big Storage?

want performance -> do sharding -> faults (some storage servers might fail) -> we want fault tolerance -> so we use replication -> now there can be inconsistency -> we need lots of network talking to achieve consistency -> low performance

so a trade-off happens between consistency and performance

 

bad replication design

we want to keep these 2 tables (one per server) identical

we haven't set up anything that makes the 2 servers handle requests in the same order, so a client reading the same key from each server might end up with different values
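A tiny toy (my own Go sketch, not from the lecture) of what goes wrong: both replicas see the same two writes to key "x", but nothing forces them to apply the writes in the same order, so a read can return different values depending on which replica answers.

```go
package main

import "fmt"

func main() {
	replica1 := map[string]int{}
	replica2 := map[string]int{}

	// Two concurrent client writes to the same key; no agreed-upon order.
	writes := []struct {
		key string
		val int
	}{{"x", 1}, {"x", 2}}

	// Replica 1 happens to apply write(x=1) then write(x=2).
	for _, w := range writes {
		replica1[w.key] = w.val
	}
	// Replica 2 happens to apply them in the opposite order.
	for i := len(writes) - 1; i >= 0; i-- {
		replica2[writes[i].key] = writes[i].val
	}

	fmt.Println("read x from replica1:", replica1["x"]) // 2
	fmt.Println("read x from replica2:", replica2["x"]) // 1 -> inconsistent
}
```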

 

GFS

big sequential access (not random)

Single data center, internal use

 

chunk handle = identifier that tells where to find the data

the primary is only allowed to act as primary for a certain lease time (until lease expiration)

nv = non-volatile (needs to be written to disk)
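Roughly, the master state described here could look like the Go structs below (my own sketch, not GFS source): the file table and the per-chunk version number are the nv parts, while the chunkserver list, the current primary, and the lease expiration are volatile and can be rebuilt or re-granted after a restart.

```go
package main

import (
	"fmt"
	"time"
)

type ChunkHandle uint64

// ChunkInfo is the per-chunk metadata kept by the master.
type ChunkInfo struct {
	Version     uint64    // nv: written to disk; identifies up-to-date replicas
	Servers     []string  // volatile: chunkservers currently holding the chunk
	Primary     string    // volatile: current primary, empty if none
	LeaseExpire time.Time // volatile: primary stops being primary after this
}

// MasterState is the master's view of the file system.
type MasterState struct {
	Files  map[string][]ChunkHandle   // nv: filename -> ordered chunk handles
	Chunks map[ChunkHandle]*ChunkInfo // chunk handle -> metadata
}

func main() {
	m := MasterState{
		Files:  map[string][]ChunkHandle{"/logs/web.log": {1, 2}},
		Chunks: map[ChunkHandle]*ChunkInfo{1: {Version: 3}, 2: {Version: 7}},
	}
	fmt.Println("files:", len(m.Files), "chunks:", len(m.Chunks))
}
```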

 

read

1. the client sends a file name and offset to the master

2. the master sends back the chunk handle and the list of chunkservers that hold the chunk the client wants
the client caches this answer, so it doesn't need to ask the master again for the same chunk

3. the client receives the data from one of those chunkservers (rough sketch below)
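The client-side read path as a hedged Go sketch; lookupChunk and readChunk are made-up stand-ins for the real RPCs, and the 64 MB chunk size is the one from the paper.

```go
package main

import (
	"errors"
	"fmt"
)

const chunkSize = 64 << 20 // 64 MB chunks, as in the GFS paper

type ChunkLocation struct {
	Handle  uint64
	Servers []string // chunkservers that hold this chunk
}

// Client wires the two RPCs as plain functions so the sketch stays self-contained.
type Client struct {
	lookupChunk func(filename string, chunkIndex int64) (ChunkLocation, error)
	readChunk   func(server string, handle uint64, off int64, n int) ([]byte, error)
	cache       map[string]ChunkLocation
}

func (c *Client) Read(filename string, offset int64, n int) ([]byte, error) {
	chunkIndex := offset / chunkSize
	key := fmt.Sprintf("%s#%d", filename, chunkIndex)

	// Steps 1+2: ask the master for (chunk handle, server list) only on a cache
	// miss; the cached answer lets later reads of this chunk skip the master.
	loc, ok := c.cache[key]
	if !ok {
		var err error
		loc, err = c.lookupChunk(filename, chunkIndex)
		if err != nil {
			return nil, err
		}
		c.cache[key] = loc
	}
	if len(loc.Servers) == 0 {
		return nil, errors.New("no chunkserver holds this chunk")
	}

	// Step 3: fetch the bytes from any one replica (here: the first one).
	return c.readChunk(loc.Servers[0], loc.Handle, offset%chunkSize, n)
}

func main() {
	c := &Client{
		cache: map[string]ChunkLocation{},
		lookupChunk: func(f string, i int64) (ChunkLocation, error) {
			return ChunkLocation{Handle: 42, Servers: []string{"cs1", "cs2", "cs3"}}, nil
		},
		readChunk: func(server string, h uint64, off int64, n int) ([]byte, error) {
			return []byte("hello")[:n], nil
		},
	}
	data, _ := c.Read("/logs/web.log", 0, 5)
	fmt.Println(string(data))
}
```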

 

write

if there is no primary, the master needs to find the chunkservers holding the latest version of the chunk (data)

but the server that has the latest data might be down, and the master might treat the second-latest data as up to date, which is bad

version number = even if the master itself crashes, it can still tell which chunkservers hold up-to-date replicas (and so which are safe to serve as primary and secondaries), because the version number is non-volatile (written to disk)
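A tiny toy (simplified, my own) of how the non-volatile version number helps: only a chunkserver whose reported version matches the master's recorded version is safe to promote; a stale replica is skipped even if it is alive and answers first.

```go
package main

import "fmt"

// pickPrimary returns a chunkserver whose reported chunk version matches the
// master's recorded (non-volatile) version, i.e. a replica that is up to date.
func pickPrimary(masterVersion uint64, reported map[string]uint64) (string, bool) {
	for server, v := range reported {
		if v == masterVersion {
			return server, true
		}
	}
	// No up-to-date replica responded: better to wait than to promote stale data.
	return "", false
}

func main() {
	reported := map[string]uint64{
		"cs1": 17, // up to date
		"cs2": 16, // stale: was down when the version was last bumped
	}
	fmt.Println(pickPrimary(17, reported)) // cs1 true
}
```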

the primary picks an offset -> all replicas write the data at that offset -> if every replica answers "yes we did", the primary replies "success" to the client, otherwise "failed (no)" -> the client then needs to reissue (resend) the append request
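A rough Go sketch of that decision at the primary (my own simplification; real GFS also handles padding across chunk boundaries): it picks the offset, asks every replica to write the record there, and reports success only if all of them did.

```go
package main

import "fmt"

// recordAppend: the primary chooses the offset, every replica writes the record
// at that exact offset, and the client hears "success" only if all of them did.
func recordAppend(replicas []string,
	writeAt func(server string, off int64, rec []byte) bool,
	nextOffset *int64, rec []byte) (int64, bool) {

	off := *nextOffset
	*nextOffset += int64(len(rec)) // the offset advances even if a replica fails

	allOK := true
	for _, r := range replicas {
		if !writeAt(r, off, rec) {
			allOK = false // one "no" is enough to make the client reissue
		}
	}
	return off, allOK
}

func main() {
	next := int64(0)
	// Pretend replica3 drops record "B" (it crashed or the message was lost).
	write := func(server string, off int64, rec []byte) bool {
		return !(server == "replica3" && string(rec) == "B")
	}
	replicas := []string{"replica1", "replica2", "replica3"}
	for _, rec := range []string{"A", "B", "C"} {
		off, ok := recordAppend(replicas, write, &next, []byte(rec))
		fmt.Printf("append %q at offset %d, success=%v\n", rec, off, ok)
	}
	// "B" fails but the offset still advances, so when the client re-sends "B"
	// it lands at a later offset -- which is how the holes described below arise.
}
```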

 

B failed on one replica, and since the primary chooses the offset (where to append),

C gets appended at a weird place on replica 3 (leaving a gap where B should have been), and the same happens for D.

 

 

 

GFS ++

https://www.youtube.com/watch?v=eRgFNW4QFDc&ab_channel=DefogTech 

 

 

 
