One of the strengths of the LD approach is that it is easy to extend to any vocabulary sample (i.e., whatever is relevant for your domain-specific task), and it can be run on any set of word embeddings. That being said, a fair comparison with models published by others must be conducted on the same data: it is only fair to evaluate what a model retained if you know what data it started with.
The training corpus is available in three versions:
(The last link is the version used for the non-dependency-parsed embeddings in our study, so use this one if you would like directly comparable embeddings.)
Since LD relies on the content of vector neighborhoods, it is not fair to compare embeddings with different vocabulary sizes. Our source embeddings were trained on the same corpus but with different context types, so their vocabulary sizes differed significantly. We therefore filtered them down to the vocabulary present in all of the models. The vocabulary list is available here.
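The filtering step described above can be sketched as follows. This is a minimal illustration, not the actual pipeline: it assumes each model is a plain `{word: vector}` dict, and the toy models below are made-up stand-ins for real embedding files.

```python
# Sketch: restrict several embedding models to their shared vocabulary,
# so that neighborhood-based comparisons (like LD) run over the same words.

def shared_vocabulary(*models):
    """Return the set of words present in every model."""
    return set.intersection(*(set(m) for m in models))

def filter_to_shared(models):
    """Restrict each model to the common vocabulary."""
    vocab = shared_vocabulary(*models)
    return [{w: m[w] for w in vocab} for m in models]

# Toy "models" with different vocabulary sizes (as happens with
# different context types trained on the same corpus).
bow = {"cat": [0.1, 0.2], "dog": [0.3, 0.1], "walk": [0.0, 0.5]}
dep = {"cat": [0.2, 0.1], "dog": [0.1, 0.4]}
char = {"cat": [0.5, 0.5], "dog": [0.2, 0.2], "run": [0.9, 0.1]}

filtered = filter_to_shared([bow, dep, char])
# every filtered model now covers exactly the same word set
```

After filtering, each model's nearest-neighbor lists are drawn from an identical candidate pool, which is what makes the neighborhood statistics comparable.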
Our study was performed with ldt909, a sample of 909 common words balanced for part of speech (adjectives, adverbs, nouns, verbs) and for frequency in the Wikipedia corpus. Only common nouns were included. To preserve the POS balance, we also restricted the selection to words that have only one part of speech (according to WordNet). See the paper for details.
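The selection logic can be illustrated with a short sketch. This is not the actual ldt909 construction: the tiny lexicon below is a made-up stand-in for WordNet POS lookups, and real balancing would also stratify by Wikipedia frequency bands.

```python
# Sketch: sample words balanced across POS classes, keeping only
# words with exactly one part of speech (as ldt909 does via WordNet).
import random

# Stand-in lexicon: word -> set of POS tags (assumption, not real data).
lexicon = {
    "quick": {"adj"}, "happy": {"adj"}, "green": {"adj"},
    "slowly": {"adv"}, "often": {"adv"}, "gently": {"adv"},
    "table": {"noun"}, "river": {"noun"}, "music": {"noun"},
    "devour": {"verb"}, "ponder": {"verb"}, "whisper": {"verb"},
    "run": {"noun", "verb"},  # ambiguous POS: excluded from the sample
}

def balanced_sample(lexicon, per_pos, seed=0):
    """Draw up to per_pos single-POS words from each POS class."""
    rng = random.Random(seed)
    pools = {}
    for word, tags in lexicon.items():
        if len(tags) == 1:  # keep words with exactly one part of speech
            pools.setdefault(next(iter(tags)), []).append(word)
    sample = []
    for pos, words in sorted(pools.items()):
        sample.extend(rng.sample(words, min(per_pos, len(words))))
    return sample

sample = balanced_sample(lexicon, per_pos=2)
```

Restricting to single-POS words avoids double-counting one surface form in two POS classes, which would skew the balance the sample is designed to provide.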