Intrinsic task performance leaderboard

Intrinsic task datasets:

  • WordSim353 (Finkelstein et al., 2002), together with its split into similarity and relatedness sections (Agirre et al., 2009);
  • RareWords (Luong et al., 2013);
  • MTurk (Radinsky et al., 2011);
  • MEN (Bruni et al., 2014);
  • SimLex999 (Hill et al., 2015);
  • Bigger Analogy Test Set (BATS) (Gladkova et al., 2016), split into sections for inflectional morphology, derivational morphology, lexicographic semantics, and encyclopedic semantics.

Model     Context            MEN   MTurk  RareWords  WS353  WS353 rel.  WS353 sim.  SimLex999  BATS (Infl.)  BATS (Der.)  BATS (Lex.)  BATS (Enc.)  BATS (avg)
CBOW      structured_deps    0.61  0.61   0.22       0.56   0.47        0.68        0.32       69.19         16.53        8.05         19.43        27.9
GloVe     structured_deps    0.52  0.50   0.28       0.52   0.42        0.66        0.34       69.62         5.97         2.66         12.77        22.3
SkipGram  structured_deps    0.67  0.62   0.39       0.61   0.48        0.74        0.41       86.07         24.41        10.31        24.3         35.72
CBOW      structured_linear  0.63  0.56   0.28       0.51   0.43        0.61        0.38       81.49         26.27        6.35         30.21        35.6
GloVe     structured_linear  0.53  0.54   0.27       0.51   0.45        0.60        0.31       74.87         13.63        2.45         17.74        26.64
SkipGram  structured_linear  0.69  0.59   0.43       0.62   0.52        0.72        0.44       88.03         34.57        10.84        33.42        41.17
CBOW      word_deps          0.68  0.58   0.32       0.55   0.44        0.65        0.38       75.44         26.36        6.54         28.56        33.74
GloVe     word_deps          0.64  0.58   0.26       0.51   0.49        0.59        0.26       75.33         18.38        7.51         44.89        35.96
SkipGram  word_deps          0.72  0.62   0.43       0.66   0.55        0.76        0.44       88.27         36.07        12.43        39           43.37
CBOW      word_linear        0.69  0.61   0.35       0.63   0.54        0.70        0.40       83.29         32.68        6.98         34.44        38.78
GloVe     word_linear        0.60  0.58   0.24       0.50   0.50        0.56        0.25       68.63         13.96        3.87         44.3         32.18
SkipGram  word_linear        0.71  0.63   0.42       0.64   0.55        0.72        0.44       87.62         38.4         11.42        39.19        43.61
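
To make the table easier to compare, the following minimal Python sketch picks out the top-scoring model and context combination for a given column. It copies only three of the twelve score columns (MEN, SimLex999, and the BATS average) from the rows above. The names `ROWS`, `COLUMNS`, and `best_rows` are hypothetical and introduced here for illustration, and the sketch assumes that a higher score is better in every column.

```python
# Rows copied from the leaderboard: (model, context, MEN, SimLex999, BATS avg).
# Only three of the twelve score columns are reproduced to keep the sketch short.
ROWS = [
    ("CBOW", "structured_deps", 0.61, 0.32, 27.9),
    ("GloVe", "structured_deps", 0.52, 0.34, 22.3),
    ("SkipGram", "structured_deps", 0.67, 0.41, 35.72),
    ("CBOW", "structured_linear", 0.63, 0.38, 35.6),
    ("GloVe", "structured_linear", 0.53, 0.31, 26.64),
    ("SkipGram", "structured_linear", 0.69, 0.44, 41.17),
    ("CBOW", "word_deps", 0.68, 0.38, 33.74),
    ("GloVe", "word_deps", 0.64, 0.26, 35.96),
    ("SkipGram", "word_deps", 0.72, 0.44, 43.37),
    ("CBOW", "word_linear", 0.69, 0.40, 38.78),
    ("GloVe", "word_linear", 0.60, 0.25, 32.18),
    ("SkipGram", "word_linear", 0.71, 0.44, 43.61),
]

# Hypothetical column-name-to-tuple-index mapping for the columns kept above.
COLUMNS = {"MEN": 2, "SimLex999": 3, "BATS (avg)": 4}


def best_rows(column):
    """Return every (model, context) pair that attains the top score in `column`.

    Assumption: higher scores are better. Ties are all returned, in table order.
    """
    idx = COLUMNS[column]
    top = max(row[idx] for row in ROWS)
    return [(row[0], row[1]) for row in ROWS if row[idx] == top]


if __name__ == "__main__":
    for name in COLUMNS:
        print(name, best_rows(name))
```

Under that assumption, SkipGram leads all three of these columns, with a three-way tie between contexts on SimLex999.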
See this page for the full data for dimensionalities from 25 to 500.