Database/Distributed Systems

Lecture 10: Cloud Replicated DB, Aurora

Tony Lim 2022. 3. 6. 14:02

History

EC2 = rent a bunch of virtual machines on a single physical node + locally attached storage (the hard drive the virtual machine guest sees)

for a web server, since it is stateless, it is okay if EC2 instances go down; the client can just fire up a new web server

but for a DB, if the EC2 instance goes down there is no way to access its attached storage -> AWS provides S3 to store snapshots of the DB, but writes made between snapshots are still vulnerable

 

EBS (Elastic Block Store)

a pair of EBS servers exists -> the DB sends a write to EBS and chain replication happens -> the last EBS server sends the response back to the DB

now if the DB's EC2 instance goes down we can just fire up a new EC2 + DB and attach it to the EBS volume

only one EC2 instance can mount a given EBS volume (not sharable)

the EBS pair is stored in the same data center for performance, so if both of them fail there is no way to recover

DB writes use the network to send data to EBS -> the network gets overloaded
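The EBS write path above can be sketched as a small chain: the DB hands a write to the first replica, each replica persists it and forwards it, and only the tail acknowledges. All class and server names here are illustrative, not the real EBS protocol.

```python
# Minimal sketch of EBS-style chain replication (illustrative names only).
class Replica:
    def __init__(self, name, next_replica=None):
        self.name = name
        self.next = next_replica
        self.blocks = {}  # block_id -> data; stands in for the local disk

    def write(self, block_id, data):
        self.blocks[block_id] = data              # persist locally first
        if self.next is not None:
            return self.next.write(block_id, data)  # forward down the chain
        return "ack"                              # tail acks the whole chain

tail = Replica("ebs-2")
head = Replica("ebs-1", next_replica=tail)

assert head.write(7, b"page-contents") == "ack"
assert head.blocks[7] == tail.blocks[7]   # both replicas hold the write
```

Note that the ack only returns after every replica in the chain has stored the data, which is exactly why each DB write costs a full network round trip through the chain.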

 

DB tutorial - transactions, crash recovery

before actually committing to disk -> the log must be written first -> the DB cache is updated to 510, 740 -> the disks get updated (once the pages reach disk, the written log space can be reused; no need to write it again)

each log entry has its transaction id -> the recovery software can know which transactions need to be undone
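The write-ahead logging and undo idea above can be sketched as a toy: every update is logged (with its transaction id and old value) before the data page is touched, so recovery can undo any transaction that never logged a COMMIT record. The values 510 and 740 echo the lecture's example; the structure is illustrative, not a real engine.

```python
# Toy write-ahead log with undo-based crash recovery (illustrative only).
log = []                        # append-only log on "disk"
pages = {"x": 500, "y": 700}    # data pages on "disk"

def update(txn_id, key, new_value):
    log.append(("UPDATE", txn_id, key, pages[key], new_value))  # log old+new first
    pages[key] = new_value                                      # then write the page

def commit(txn_id):
    log.append(("COMMIT", txn_id))

update(1, "x", 510); update(1, "y", 740); commit(1)
update(2, "x", 9999)            # txn 2 crashes before committing

def recover():
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in reversed(log):   # undo uncommitted updates, newest first
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, key, old_value, _ = rec
            pages[key] = old_value

recover()
assert pages == {"x": 510, "y": 740}   # txn 1 survives, txn 2 is undone
```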

 

RDS (Relational Database Service)

separate availability zones (different data centers) -> more fault tolerance

but it still sends full writes over the network -> expensive and slow

 

Aurora

3 availability zones, 6 replicas -> very high fault tolerance

1. sends only log entries, not whole data pages -> faster than RDS

2. uses quorums

3. read-only replica DBs

 

fault tolerance goals for Aurora

1. should still be able to write with 1 dead AZ

2. should still be able to read with 1 dead AZ + 1 dead server

3. should not stall on transiently slow servers

4. fast replica regeneration if 1 server fails
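The goals above can be checked arithmetically against Aurora's actual numbers from the Aurora paper: N=6 replicas, write quorum W=4, read quorum R=3.

```python
# Aurora's quorum parameters and the fault-tolerance goals, checked as arithmetic.
N, W, R = 6, 4, 3
assert R + W > N                  # any read quorum overlaps any write quorum

dead_after_one_az = 2             # one dead AZ kills 2 of the 6 replicas
assert N - dead_after_one_az >= W # goal 1: writes still possible

dead_az_plus_one = 3              # one dead AZ plus one more dead server
assert N - dead_az_plus_one >= R  # goal 2: reads still possible
```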

 

Quorum replication

total N replicas

if we want to write -> W (a number <= N) replicas need to acknowledge the write

same for reads (R <= N)

R + W = N + 1 -> any read quorum and any write quorum overlap in at least 1 server

writes carry version numbers: when a client reads and sees 2 servers (S2, S3) holding different versions, the client uses the highest (latest) versioned data

no need to wait for slow or dead servers; the client just waits for the threshold number of servers

we can make reads faster or writes faster by choosing the numbers R and W
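The quorum rules above can be sketched in a few lines, assuming N=3, W=2, R=2 (so R + W = N + 1 and every read quorum overlaps every write quorum). Servers are plain dicts, and a slow or dead server is simply left out of the reachable set.

```python
# Sketch of quorum reads/writes with version numbers (illustrative only).
N, W, R = 3, 2, 2
servers = [{"version": 0, "value": None} for _ in range(N)]

def quorum_write(value, reachable):
    assert len(reachable) >= W            # need at least W live servers
    new_version = max(servers[i]["version"] for i in reachable) + 1
    for i in list(reachable)[:W]:         # wait for W acks, ignore the rest
        servers[i] = {"version": new_version, "value": value}

def quorum_read(reachable):
    assert len(reachable) >= R
    replies = [servers[i] for i in list(reachable)[:R]]
    return max(replies, key=lambda r: r["version"])["value"]  # highest version wins

quorum_write("a", reachable=[0, 1, 2])
quorum_write("b", reachable=[1, 2])            # server 0 is slow; skip it
assert quorum_read(reachable=[0, 1]) == "b"    # overlap guarantees the latest value
```

The last read contacts server 0 (which still holds the stale "a") and server 1, and the version comparison is what makes it return the latest value.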

 

 

storage servers (6 of them)

a storage server keeps old data pages, and incoming log entries get attached to those old pages

when the DB evicts data pages from its cache and later needs to read them again:
the DB sends a read -> the SS needs to bring the old pages up to date -> the storage server applies the pending log entries
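The read path above can be sketched as: a storage server holds stale page images plus the log entries not yet applied to them, and replays the relevant entries when the DB asks for a page. The apply function and page contents are stand-ins, not Aurora's real redo format.

```python
# Sketch of a storage server applying pending log entries on read (illustrative).
pages = {1: "old-page-1", 2: "old-page-2"}          # stale data pages
pending_log = {1: ["append A", "append B"], 2: []}  # per-page unapplied entries

def apply(page, entry):
    return page + " +" + entry        # stand-in for a real redo apply

def read_page(page_id):
    for entry in pending_log[page_id]:   # bring the page up to date
        pages[page_id] = apply(pages[page_id], entry)
    pending_log[page_id] = []            # entries are now folded into the page
    return pages[page_id]

assert read_page(1) == "old-page-1 +append A +append B"
```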

 

what if the DB needs more storage than one storage server can hold -> split the data across multiple SSs

each PG (protection group) stores some fraction of the data pages + all the log records that apply to those pages (the subset of the log relevant to those pages)

each node holds data fragments from many different Aurora instances

if one node fails, we can recover it by fetching data in parallel from the other nodes of each affected PG
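The sharding above can be sketched as a page-to-PG mapping, where each PG keeps only the log records touching its own pages. The modulo placement rule is an assumption for illustration, not Aurora's actual scheme.

```python
# Sketch of protection-group sharding of pages and their log records (illustrative).
NUM_PGS = 6

def pg_for_page(page_id):
    return page_id % NUM_PGS      # assumed placement rule: which PG owns this page

pg_logs = {pg: [] for pg in range(NUM_PGS)}

def append_log(page_id, entry):
    # route each log record only to the PG that owns the affected page
    pg_logs[pg_for_page(page_id)].append((page_id, entry))

append_log(10, "update x")    # pages 10 and 4 land in the same PG
append_log(4, "update y")
append_log(3, "update z")     # page 3 lands in a different PG
assert [e for _, e in pg_logs[4]] == ["update x", "update y"]
assert len(pg_logs[3]) == 1
```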

 

reads happen more often than writes

Aurora has read replica DBs, but only 1 write DB

read replicas cache data -> the write DB sends its log to the read replicas -> they update their cached data pages
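The replica path above can be sketched as: the single write DB streams log entries to a read replica, which applies an entry only when the affected page is in its cache; uncached pages can always be fetched fresh from storage later. The names and values are illustrative.

```python
# Sketch of a read replica applying the write DB's log stream to its cache (illustrative).
replica_cache = {"x": 500}        # pages this replica happens to cache

def on_log_entry(key, new_value):
    if key in replica_cache:      # update cached pages in place
        replica_cache[key] = new_value
    # uncached pages are ignored; a later read fetches them from storage

on_log_entry("x", 510)            # cached -> applied
on_log_entry("y", 740)            # not cached -> skipped
assert replica_cache == {"x": 510}
```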
