Ranking or classifying adjacent words
I came across WordRank, a fresh approach to embedding words that treats it as a ranking problem. In hindsight, this makes sense. In a typical language modeling situation, NN-based or otherwise, we are interested in this: you have a context $c$, and you want to predict which word $\hat{w}$ from your vocabulary $\Sigma$ will follow it. Naturally, this can be set up either as a ranking problem or as a classification problem. If you are coming from the learning-to-rank camp, all sorts of bells might be going off at this point, and you might have several good reasons for favoring the ranking formulation. That's exactly what we see in this paper. By setting up word embedding as a ranking problem, you get a discriminative training regimen and a built-in attention-like capability (more on that later).
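To make the two formulations concrete, here is a minimal sketch that scores every word in a toy vocabulary for one context and then computes a classification loss and a ranking loss on the same scores. Everything in it is an assumption for illustration: the function names are hypothetical, the scores are made up, and the logistic surrogate and the log transform of the rank are simple choices in the spirit of a ranking objective, not necessarily the exact objective used in the paper.

```python
import math


def softmax_cross_entropy(scores, target):
    """Classification view: negative log-probability of the target word
    under a softmax over the whole vocabulary."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def exact_rank(scores, target):
    """Number of other words scored at least as high as the target (0 = best)."""
    return sum(
        1 for i, s in enumerate(scores)
        if i != target and s >= scores[target]
    )


def soft_rank(scores, target):
    """Ranking view: a smooth upper bound on exact_rank.
    Each competing word contributes log2(1 + exp(-margin)), which is
    at least 1 whenever the competitor ties or beats the target."""
    return sum(
        math.log2(1.0 + math.exp(-(scores[target] - s)))
        for i, s in enumerate(scores)
        if i != target
    )


def ranking_loss(scores, target):
    """Concave transform of the soft rank (an illustrative choice), so that
    mistakes near the top of the list matter more than ones far down."""
    return math.log2(1.0 + soft_rank(scores, target))


# Toy scores for a 3-word vocabulary given some context (made-up numbers).
scores = [2.0, 0.5, -1.0]
for target in range(len(scores)):
    print(
        target,
        softmax_cross_entropy(scores, target),
        exact_rank(scores, target),
        ranking_loss(scores, target),
    )
```

Both losses are small when the target word is scored highest and grow as it slides down the list; the difference is that the ranking loss is built from pairwise comparisons against competing words rather than from a normalized probability.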
-- Summary by Delip Rao.