Extrinsic task performance leaderboard

See the LDT paper for implementation details on all tasks.

Extrinsic task datasets:

  • POS-tag.: POS-tagging task from CoNLL 2003 dataset (Tjong Kim Sang and De Meulder, 2003);
  • Chunk.: Chunking task from CoNLL 2003 dataset (Tjong Kim Sang and De Meulder, 2003);
  • NER: Named Entity Recognition task from CoNLL 2003 dataset (Tjong Kim Sang and De Meulder, 2003);
  • Rel. Clas.: semantic relation classification task from SemEval 2010 (Hendrickx et al., 2010);
  • Subj. Clas.: subjectivity classification on the Rotten Tomatoes dataset (Pang and Lee, 2004);
  • MR: short movie review dataset for sentence-level sentiment classification (Pang and Lee, 2005);
  • IMDB: Stanford IMDB movie review dataset (Maas et al., 2011);
  • SNLI: Stanford Natural Language Inference dataset (Bowman et al., 2015).

Model     Context            POS-tag.  Chunk.  NER    Rel. Clas.  Subj. Clas.  MR     IMDB   SNLI
CBOW      structured_deps    88.34     78.03   75     73.92       87.36        73.88  82.11  69.33
GloVe     structured_deps    86.98     75.76   70.05  72.83       87.76        74.8   76.08  68.82
SkipGram  structured_deps    88.44     78.82   75.86  74.2        89.56        75.42  81.71  69.87
CBOW      structured_linear  88.12     78.29   75.24  74.48       88.96        75.12  82.66  69.19
GloVe     structured_linear  87.67     77.45   74.12  72.57       89.28        75.59  82.68  69.11
SkipGram  structured_linear  88.31     79.14   75.54  75.6        90.2         76.91  82.95  69.42
CBOW      word_deps          87.71     77.56   73.58  73.53       89.7         75.46  81.25  69.14
GloVe     word_deps          84.67     68.36   68.76  72.02       88.8         74.69  82.57  69.28
SkipGram  word_deps          87.77     78.05   75.38  73.96       89.8         77.15  82.78  69.93
CBOW      word_linear        87.66     77.53   75.21  74.78       89.8         75.9   82.22  69.29
GloVe     word_linear        83.8      66.1    69.62  71.05       89.16        74.6   82.24  69.51
SkipGram  word_linear        87.86     78.23   75.72  74.8        89.92        76.86  82.73  69.74

See this page for the full data for dimensionality 25-500.
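
For readers who want to rank the configurations themselves, the leaderboard rows above can be parsed and the top-scoring model and context picked out for each task. The following is a minimal sketch, not part of LDT: the helper names `parse_rows` and `best_per_task` are hypothetical, the rows are copied verbatim from the table, and it assumes that a higher score is better on every task.

```python
# Task columns, in the order they appear in the leaderboard table.
TASKS = ["POS-tag.", "Chunk.", "NER", "Rel. Clas.", "Subj. Clas.", "MR", "IMDB", "SNLI"]

# Rows copied from the leaderboard: model, context, then one score per task.
ROWS = """\
CBOW structured_deps 88.34 78.03 75 73.92 87.36 73.88 82.11 69.33
GloVe structured_deps 86.98 75.76 70.05 72.83 87.76 74.8 76.08 68.82
SkipGram structured_deps 88.44 78.82 75.86 74.2 89.56 75.42 81.71 69.87
CBOW structured_linear 88.12 78.29 75.24 74.48 88.96 75.12 82.66 69.19
GloVe structured_linear 87.67 77.45 74.12 72.57 89.28 75.59 82.68 69.11
SkipGram structured_linear 88.31 79.14 75.54 75.6 90.2 76.91 82.95 69.42
CBOW word_deps 87.71 77.56 73.58 73.53 89.7 75.46 81.25 69.14
GloVe word_deps 84.67 68.36 68.76 72.02 88.8 74.69 82.57 69.28
SkipGram word_deps 87.77 78.05 75.38 73.96 89.8 77.15 82.78 69.93
CBOW word_linear 87.66 77.53 75.21 74.78 89.8 75.9 82.22 69.29
GloVe word_linear 83.8 66.1 69.62 71.05 89.16 74.6 82.24 69.51
SkipGram word_linear 87.86 78.23 75.72 74.8 89.92 76.86 82.73 69.74
"""


def parse_rows(text):
    """Turn whitespace-separated leaderboard rows into a list of records."""
    records = []
    for line in text.strip().splitlines():
        model, context, *scores = line.split()
        records.append({
            "model": model,
            "context": context,
            "scores": dict(zip(TASKS, (float(s) for s in scores))),
        })
    return records


def best_per_task(records):
    """Map each task to the (model, context, score) of its top-scoring row.

    Assumes higher is better; ties go to the row listed first.
    """
    best = {}
    for task in TASKS:
        top = max(records, key=lambda r: r["scores"][task])
        best[task] = (top["model"], top["context"], top["scores"][task])
    return best


records = parse_rows(ROWS)
best = best_per_task(records)
for task, (model, context, score) in best.items():
    print(f"{task:12s} {model:9s} {context:18s} {score}")
```

On these rows, SkipGram takes the top score on every task, with the best context varying by task (for example, structured_deps for POS-tag. and NER, word_deps for MR and SNLI).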