Vaclav Kosar
Software And Machine Learning Blog

StarSpace Embedding - United and universal spaces of vectors

To embed a variety of entities into a single vector space, this paper describes a general-purpose neural embedding model.


  1. Is a general-purpose method to embed entities of different kinds into a single vector space, e.g. words, documents and users can all be embedded into one space.
  2. Requires discrete features, e.g. a user's features are the documents they liked.
  3. Trains by summing bag-of-features embeddings and contrasting each positive pair with k negative samples.
  4. Performs competitively with existing methods in terms of quality.
  5. Is on par with FastText in terms of speed.
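
Item 3 can be sketched as a single SGD step. This is a minimal NumPy illustration, not StarSpace's actual C++ implementation: it assumes dot-product similarity, one hinge term per negative sample, a hypothetical `sgd_step` helper, and negatives passed in explicitly instead of being sampled. Only rows of the embedding table `E` are updated, since there are no other parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, dim = 10, 8
# The embedding table is the only trainable parameter (illustrative sizes).
E = rng.normal(scale=0.1, size=(num_features, dim))


def sgd_step(lhs, rhs, negs, margin=0.2, lr=0.1):
    """One SGD step on a positive pair (lhs, rhs) against k negative bags.

    lhs, rhs and every element of negs are lists of feature indices (bags).
    Returns the hinge loss measured before the update.
    """
    s = E[lhs].sum(axis=0)  # bag-of-features embedding of the sample
    p = E[rhs].sum(axis=0)  # bag-of-features embedding of the positive
    grad_s = np.zeros(dim)
    grad_p = np.zeros(dim)
    neg_updates = []
    loss = 0.0
    for neg in negs:
        n = E[neg].sum(axis=0)
        term = margin - s @ p + s @ n
        if term > 0:  # only negatives that violate the margin contribute
            loss += term
            grad_s += n - p
            grad_p -= s
            neg_updates.append((neg, s))  # d(term)/dn = s
    # A bag embedding is a sum, so each feature in it gets the bag's gradient.
    np.subtract.at(E, lhs, lr * grad_s)
    np.subtract.at(E, rhs, lr * grad_p)
    for neg, grad_n in neg_updates:
        np.subtract.at(E, neg, lr * grad_n)
    return float(loss)


# Hypothetical example: features 0 and 1 should end up close to feature 2
# and far from features 3, 4 and 5.
lhs, rhs = [0, 1], [2]
negs = [[3], [4], [5]]
losses = [sgd_step(lhs, rhs, negs) for _ in range(200)]
```

Once every negative scores at least `margin` below the positive, no term is active and the step stops changing `E`, which is the behaviour a hinge loss is chosen for.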


The referenced paper, “StarSpace: Embed All The Things!”, was published on 2017-11-21. Its authors are Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes and Jason Weston, and the work was funded by Facebook AI Research.


Only the embedding vectors are trained directly; there are no other parameters. Unlike Word2vec and FastText, there is no separate word (input) vector: each feature has a single vector, analogous to the context (output) vector. The method is strongly influenced by FastText; compared to it, StarSpace is much more general, although slightly slower.

For each discrete feature, an embedding vector is fitted with SGD to minimize the loss function described below. Embeddings of composite entities are then constructed as the sum of the embeddings of their sub-entities (bag-of-features). The loss function relies on labels for positive (similar) pairs and non-positive (negative, dissimilar) pairs. Thanks to this very general notion of labels, the embeddings can be constructed in many different scenarios.
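
The bag-of-features construction can be sketched in a few lines of NumPy. The feature names, table size and the `embed` helper are hypothetical; the sketch only shows that every entity type (words, labels, users) looks up rows in one shared table and that a composite entity is the sum of its feature rows.

```python
import numpy as np

# Hypothetical vocabulary of discrete features; words, a label and a user id
# all live in the same embedding table.
features = ["the", "cat", "sat", "mat", "__label__pets", "user_42"]
index = {f: i for i, f in enumerate(features)}

dim = 4
rng = np.random.default_rng(0)
# One embedding row per discrete feature (random here, learned in practice).
E = rng.normal(scale=0.1, size=(len(features), dim))


def embed(bag):
    """Embed a composite entity as the sum of its feature vectors.

    Repeated features are counted once per occurrence.
    """
    return E[[index[f] for f in bag]].sum(axis=0)


doc = embed(["the", "cat", "sat"])
label = embed(["__label__pets"])
score = float(doc @ label)  # dot-product similarity of document and label
```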

The loss is a margin ranking (hinge) loss, summed over the k negative samples:

$$L = \sum_{i=1}^{k} \max\bigl(0,\; m - \mathrm{sim}(s, p) + \mathrm{sim}(s, n_i)\bigr)$$

where $m$ is the margin, $s$ is the sample, $p$ is its positive sample and $n_1, \dots, n_k$ are the negative samples. The similarity function was either the dot product, which performed better with fewer dimensions, or cosine similarity, which was more suitable for higher dimensionality.
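
To make the loss concrete, here is a small NumPy sketch. It assumes one hinge term per negative sample, summed, which is the usual way a margin ranking loss is combined with several negatives; `dot_sim`, `cos_sim` and `margin_ranking_loss` are hypothetical helper names, not StarSpace's API.

```python
import numpy as np


def dot_sim(a, b):
    return float(np.dot(a, b))


def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def margin_ranking_loss(s, pos, negs, margin, sim=dot_sim):
    """Hinge loss summed over negatives: each negative must score at least
    `margin` below the positive, otherwise it adds to the loss."""
    pos_sim = sim(s, pos)
    return sum(max(0.0, margin - pos_sim + sim(s, n)) for n in negs)


s = np.array([1.0, 0.0])
pos = np.array([1.0, 0.0])
negs = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
# Only the second negative (identical to the positive) violates the margin.
print(margin_ranking_loss(s, pos, negs, margin=0.5))  # 0.5
```

Swapping `sim=cos_sim` makes the loss insensitive to vector length, while `dot_sim` lets the norm of the embeddings influence the score.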

Embeddings of entity classes higher in the hierarchy are calculated by summing the bag-of-words representations of their children.
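
The hierarchical summation can be illustrated with a recursive sketch. The word vectors are random placeholders and the nested-list tree (`document`, `user`) is a hypothetical representation; because addition is associative, a document built from sentence embeddings equals the plain sum of all its word vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["cats", "purr", "dogs", "bark", "loudly"]
dim = 3
# Placeholder word vectors; in StarSpace these would be learned.
word_vec = {w: rng.normal(size=dim) for w in vocab}


def embed(node):
    """A leaf is a word; an inner node is a list of children.

    Every inner node is embedded as the sum of its children's embeddings.
    """
    if isinstance(node, str):
        return word_vec[node]
    return np.sum([embed(child) for child in node], axis=0)


sentence_1 = ["cats", "purr"]
sentence_2 = ["dogs", "bark", "loudly"]
document = [sentence_1, sentence_2]
user = [document, ["cats"]]  # a user as the bag of things they liked

doc_vec = embed(document)
user_vec = embed(user)
```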

StarSpace can be applied to:
  • text classification
  • ranking entities
  • collaborative filtering-based recommendation
  • content-based recommendation
  • word, sentence, document and graph embedding
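
Each of these use cases reduces to choosing what goes on the left- and right-hand side of a positive pair. The sketch below builds such pairs for a few of them, roughly following how the paper frames them; the function names are hypothetical, and the pairs would still need to be mapped to feature indices and given negative samples before training.

```python
import random


def classification_pair(words, label):
    # Text classification: the document's bag of words vs. its label.
    return words, [label]


def leave_one_out_pair(liked_items, rng):
    # Collaborative filtering: hold out one liked item as the positive;
    # the remaining items represent the user.
    held_out = rng.randrange(len(liked_items))
    lhs = liked_items[:held_out] + liked_items[held_out + 1:]
    return lhs, [liked_items[held_out]]


def word_context_pair(tokens, position, window=2):
    # Word embedding: the surrounding context window vs. the centre word.
    lo = max(0, position - window)
    context = tokens[lo:position] + tokens[position + 1:position + 1 + window]
    return context, [tokens[position]]


def sentence_pair(document_sentences, rng):
    # Sentence embedding: two different sentences from the same document.
    a, b = rng.sample(range(len(document_sentences)), 2)
    return document_sentences[a], document_sentences[b]


rng = random.Random(0)
user_lhs, user_rhs = leave_one_out_pair(["item_1", "item_2", "item_3"], rng)
```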



08 May 2020
