First, using demonstrations significantly outperforms the no-demonstrations baseline
even with small k (k = 4), and the performance drop
from using gold labels to using random labels is
consistently small across varying k, in the range of ….
Interestingly, model performance does
not increase much as k increases when k ≥ 8, both
with gold labels and with random labels.
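The comparison above varies two factors: the number of demonstrations k, and whether each demonstration keeps its gold label or gets a random one. The sketch below shows how the two conditions could be built for a given k. It is a minimal illustration, not the authors' code. It assumes that random labels are drawn uniformly and independently from the label space. The function names and the prompt template are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, label)


def build_demonstrations(
    train: Sequence[Example],
    label_space: Sequence[str],
    k: int,
    use_gold: bool,
    seed: int = 0,
) -> List[Example]:
    """Sample k demonstrations; optionally replace each gold label with a random one.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Same seed -> same sampled inputs in both conditions, so only the labels differ.
    sampled = rng.sample(list(train), k)
    if use_gold:
        return sampled
    # Random-label condition (assumption): each label is drawn uniformly from the
    # label space, independently of the input, so it may coincide with the gold label.
    return [(x, rng.choice(list(label_space))) for x, _ in sampled]


def format_prompt(demos: Sequence[Example], test_input: str) -> str:
    """Concatenate demonstrations, then the test input (hypothetical template)."""
    blocks = [f"{x}\n{y}" for x, y in demos]
    blocks.append(test_input)
    return "\n\n".join(blocks)
```

With k = 0 the demonstration list is empty, which corresponds to the no-demonstrations baseline. A sweep would call `build_demonstrations` for each k of interest, once with `use_gold=True` and once with `use_gold=False`, and then compare accuracy between the two conditions at each k.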