AI/NLP (cs224n)

Lec10) Question Answering

Tony Lim 2021. 5. 11. 15:02

SQuAD evaluation

The F1 measure is the primary metric. Exact match (EM): 1/0 accuracy on whether the prediction matches one of the 3 gold answers.

F1 = 2*precision*recall / (precision + recall)

precision = tp/(tp+fp) , recall = tp/(tp+fn)

tp = true positive: "true" (the model's prediction is actually correct), "positive" (the model said positive)
here: the number of tokens shared between the correct answer and the prediction.

fp = false positive: "false" (the model's prediction is actually wrong), "positive" (the model said positive)
here: the number of tokens that are in the prediction but not in the correct answer.

fn = false negative: "false" (the model's prediction is actually wrong), "negative" (the model said negative)
here: the number of tokens that are in the correct answer but not in the prediction.

tn = true negative: "true" (the model's prediction is actually correct), "negative" (the model said negative)
here this does not make sense (it would be the number of tokens that are in neither the correct answer nor the prediction), and the F1 formula does not use it.

F1 relies less on choosing exactly the same span that the human annotators chose; exact span boundaries are susceptible to various effects, including line breaks.
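The token counts above can be turned into a small scoring sketch. This is a minimal illustration, not the official SQuAD evaluation script: the function names (token_f1, exact_match, score) are made up for this note, tokens are just lowercased whitespace-separated words, and the punctuation and article stripping that a real evaluator would normally do is left out. It also assumes the per-question score is the best score over the gold answers.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between one prediction and one gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # tp: tokens shared between prediction and gold (multiset intersection)
    tp = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if tp == 0:
        return 0.0
    # len(pred_tokens) = tp + fp, len(gold_tokens) = tp + fn
    precision = tp / len(pred_tokens)
    recall = tp / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, gold: str) -> int:
    """1 if the token sequences are identical, else 0."""
    return int(prediction.lower().split() == gold.lower().split())


def score(prediction: str, gold_answers: list[str]) -> tuple[int, float]:
    """Assumption: take the best EM and best F1 over all gold answers."""
    em = max(exact_match(prediction, g) for g in gold_answers)
    f1 = max(token_f1(prediction, g) for g in gold_answers)
    return em, f1


if __name__ == "__main__":
    print(score("Eiffel Tower in Paris", ["Eiffel Tower", "Tower"]))
```

For the prediction "Eiffel Tower in Paris" against the gold answer "Eiffel Tower", tp = 2, fp = 2, fn = 0, so precision = 0.5, recall = 1.0 and F1 is about 0.67, while exact match is 0. This shows how F1 gives partial credit where EM gives none.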

 

SQuAD limitations

The model is not actually understanding the paragraph; it is solving a sort of matching problem.

Answers are span-based only (no yes/no, counting, or implicit "why" questions).

 

 
