Extrinsic task performance leaderboard
See the LDT paper for implementation details on all tasks.
Extrinsic task datasets:
- POS-tag.: POS-tagging task from the CoNLL 2003 dataset (Tjong Kim Sang and De Meulder, 2003);
- Chunk.: chunking task from the CoNLL 2003 dataset (Tjong Kim Sang and De Meulder, 2003);
- NER: Named Entity Recognition task from the CoNLL 2003 dataset (Tjong Kim Sang and De Meulder, 2003);
- Rel. Clas.: semantic relation classification task from SemEval 2010 (Hendrickx et al., 2010);
- Subj. Clas.: subjectivity classification on the Rotten Tomatoes dataset (Pang and Lee, 2004);
- MR: short movie review dataset for sentence-level sentiment classification (Pang and Lee, 2005);
- IMDB: Stanford IMDB movie review dataset (Maas et al., 2011);
- SNLI: Stanford Natural Language Inference dataset (Bowman et al., 2015).
Model | Context | POS-tag. | Chunk. | NER | Rel. Clas. | Subj. Clas. | MR | IMDB | SNLI |
---|---|---|---|---|---|---|---|---|---|
CBOW | structured_deps | 88.34 | 78.03 | 75 | 73.92 | 87.36 | 73.88 | 82.11 | 69.33 |
GloVe | structured_deps | 86.98 | 75.76 | 70.05 | 72.83 | 87.76 | 74.8 | 76.08 | 68.82 |
SkipGram | structured_deps | 88.44 | 78.82 | 75.86 | 74.2 | 89.56 | 75.42 | 81.71 | 69.87 |
CBOW | structured_linear | 88.12 | 78.29 | 75.24 | 74.48 | 88.96 | 75.12 | 82.66 | 69.19 |
GloVe | structured_linear | 87.67 | 77.45 | 74.12 | 72.57 | 89.28 | 75.59 | 82.68 | 69.11 |
SkipGram | structured_linear | 88.31 | 79.14 | 75.54 | 75.6 | 90.2 | 76.91 | 82.95 | 69.42 |
CBOW | word_deps | 87.71 | 77.56 | 73.58 | 73.53 | 89.7 | 75.46 | 81.25 | 69.14 |
GloVe | word_deps | 84.67 | 68.36 | 68.76 | 72.02 | 88.8 | 74.69 | 82.57 | 69.28 |
SkipGram | word_deps | 87.77 | 78.05 | 75.38 | 73.96 | 89.8 | 77.15 | 82.78 | 69.93 |
CBOW | word_linear | 87.66 | 77.53 | 75.21 | 74.78 | 89.8 | 75.9 | 82.22 | 69.29 |
GloVe | word_linear | 83.8 | 66.1 | 69.62 | 71.05 | 89.16 | 74.6 | 82.24 | 69.51 |
SkipGram | word_linear | 87.86 | 78.23 | 75.72 | 74.8 | 89.92 | 76.86 | 82.73 | 69.74 |
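
To read the table programmatically, the sketch below parses the pipe-delimited rows and reports the top-scoring model and context for each task. It assumes that a higher score is better in every column, which this page does not state. The helper names `parse_table` and `best_per_task` are illustrative and are not part of LDT.

```python
# Leaderboard copied verbatim from the table above.
TABLE = """\
Model | Context | POS-tag. | Chunk. | NER | Rel. Clas. | Subj. Clas. | MR | IMDB | SNLI |
---|---|---|---|---|---|---|---|---|---|
CBOW | structured_deps | 88.34 | 78.03 | 75 | 73.92 | 87.36 | 73.88 | 82.11 | 69.33 |
GloVe | structured_deps | 86.98 | 75.76 | 70.05 | 72.83 | 87.76 | 74.8 | 76.08 | 68.82 |
SkipGram | structured_deps | 88.44 | 78.82 | 75.86 | 74.2 | 89.56 | 75.42 | 81.71 | 69.87 |
CBOW | structured_linear | 88.12 | 78.29 | 75.24 | 74.48 | 88.96 | 75.12 | 82.66 | 69.19 |
GloVe | structured_linear | 87.67 | 77.45 | 74.12 | 72.57 | 89.28 | 75.59 | 82.68 | 69.11 |
SkipGram | structured_linear | 88.31 | 79.14 | 75.54 | 75.6 | 90.2 | 76.91 | 82.95 | 69.42 |
CBOW | word_deps | 87.71 | 77.56 | 73.58 | 73.53 | 89.7 | 75.46 | 81.25 | 69.14 |
GloVe | word_deps | 84.67 | 68.36 | 68.76 | 72.02 | 88.8 | 74.69 | 82.57 | 69.28 |
SkipGram | word_deps | 87.77 | 78.05 | 75.38 | 73.96 | 89.8 | 77.15 | 82.78 | 69.93 |
CBOW | word_linear | 87.66 | 77.53 | 75.21 | 74.78 | 89.8 | 75.9 | 82.22 | 69.29 |
GloVe | word_linear | 83.8 | 66.1 | 69.62 | 71.05 | 89.16 | 74.6 | 82.24 | 69.51 |
SkipGram | word_linear | 87.86 | 78.23 | 75.72 | 74.8 | 89.92 | 76.86 | 82.73 | 69.74 |
"""


def split_cells(line):
    """Split one pipe-delimited line into stripped cells, ignoring outer pipes."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table(text):
    """Return (task_names, rows); each row maps column name to str or float."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = split_cells(lines[0])
    rows = []
    for ln in lines[1:]:
        # Skip the Markdown separator row made only of dashes and pipes.
        if set(ln.replace("|", "").strip()) <= {"-"}:
            continue
        cells = split_cells(ln)
        row = dict(zip(header[:2], cells[:2]))  # Model, Context
        row.update({task: float(v) for task, v in zip(header[2:], cells[2:])})
        rows.append(row)
    return header[2:], rows


def best_per_task(tasks, rows):
    """Pick the highest-scoring row per task (assumes higher is better)."""
    return {task: max(rows, key=lambda r: r[task]) for task in tasks}


tasks, rows = parse_table(TABLE)
best = best_per_task(tasks, rows)
for task in tasks:
    r = best[task]
    print(f"{task}: {r['Model']} / {r['Context']} ({r[task]})")
```

Ties are resolved in favor of the row that appears first in the table, since `max` returns the first maximal element.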