Algorithm/Design and Analysis of Algorithms

Randomization: Universal & Perfect Hashing

Tony Lim 2021. 6. 4. 22:19

Universal Hashing

choose a random hash function h(function) from H(set)

require H(set) tobe a universal hashing family such that

Theorem = For n arbitrary distinct keys and random h(function) from H(set) , where H(set) is a universal hashing family

here I is indicator random variable. by computing expectation of I(i,j) is less than 1/m because h is from universal hashfunction family(set) 

E[I(i,i)] is always 1 because it will always map to same slot. 

 

Dot product hash family 

m is prime , u = m^r where r is integer.

computing ha(k) takes constant time because we are assuming word RAM model

Word RAM model = any integer that is size of word , and with constant number of words we can do any operation in constant time.

by choosing a randomly we can achieve universality of hash funciton 

 

we are using expectation to remove other randomness (a0 ,a1 a2.. ar). we are missing mod m when we use expectation.

with Mod m  , 1/u becomes 1/m because we are choosing a randomly.

 

Perfect Hashing (no insert or delete)

O(1) for search , O(n) space both in worst case , polynomial build time With High Probability

Step1 = Pick h1 from universal hash family that maps {0,1,... u-1} to {0,1,...,m-1} 

Step 1.5 = if  sum of squared number of items in slot j (0~m-1) is bigger than cn (c is chosen constant) we redo Step1

Step 2 = for each slot j (0~m-1) pick h(2,j) from universal hash family that maps {0,1, ... u-1} to {0, 1 , ... mj} where mj size is simliar to squared # of items in slot j.

Step 2.5 = if h(2,j) makes collision repick h(2,j) and do step2 again

 

first we prove how much time it takes to do step 2.5

l(j) = number of items in slot j.  union bound , because there might be many other keys that will collide beside key(i) so summing all probability of that key colliding will be bigger.

we choose among possilbe collsion key == (l(j) 2) 

by universality == 1/m == 1/ l(j) ^2

zero collsion will happen at least 50% ,  expected number of trial until we get zero collsion hash function h(2,j) is 2. 

 

just like in skip List (WITH HIGH PROBABILITY)

Randomization: Skip Lists (tistory.com)

 

Randomization: Skip Lists

Search(x) in case for level 2 walk right in top linked list walk down to bottom linked list walk right if it is last row linked list In general case Search will take O(logn) Insert(x) do Search(x)..

tonylim.tistory.com

then total time to do step 2 and step 2.5 is  O(N) [size of first hash table , we need to h2 for all slot] * O(logn) [size of l(j)] * O(logn) [expected nubmer of trial] == O(Nlog^2N)

 

n == for same i and i` 

(n 2) == choosing 2 from n keys but orders matter so we multiply by 2 

1/m == universality

 

picture is wrong the first inequality sign need to be reversed

probability of seocnd hash table size is bigger than cn is less than half

expected number of trial until we get table size less than cn is O(logn)

total time cost to do step 1 and step 1.5 == O(n) [time to hash all n keys] * O(nlogn) [repeat time] == O(nlogn)