StarSpace - Embeddings For Documents, Users, and Words

Create vectors of various entities in a single space with this general-purpose embedding model from Facebook AI.

Summary

The paper “StarSpace: Embed All The Things!” was published 2017-11-21 by Facebook AI Research:

  1. A general-purpose method to embed entities of multiple types into a single vector space, e.g. words, documents, and users can all be embedded into the same space.
  2. Requires discrete features, e.g. a user’s features are the documents they liked.
  3. Trains by summing bag-of-features embeddings and contrasting them with k negative samples.
  4. Performs competitively in terms of quality.
  5. Is on par with FastText in terms of speed.

Method

Figure: StarSpace model method - sum of feature embeddings

Only the embedding vectors are trained directly; there are no other parameters. In contrast to Word2vec and FastText, there is no separate word (input) vector, only a context (output) vector. The method is heavily influenced by FastText, compared to which it is much more general, although slightly slower.

For each discrete feature, an embedding vector is fitted such that SGD minimizes the loss function below. Embeddings of composite entities are then constructed as the sum of their sub-entities’ embeddings (bag-of-features). The loss function relies on having labels for positive (close) and negative (distant) pairs. Thanks to this very general notion of labels, the embeddings can be constructed in many different scenarios.

\( v_{\mathrm{document}} = \sum_{w \in \mathrm{document}} v_w \)
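
Below is a minimal sketch of this bag-of-features sum in Python; the vocabulary, dimensionality, and function names are illustrative assumptions, not taken from the paper or the StarSpace implementation.

    import numpy as np

    # Illustrative embedding table: in StarSpace these vectors are the
    # trained parameters; here they are random for the sake of the example.
    rng = np.random.default_rng(0)
    dim = 4
    embeddings = {w: rng.normal(size=dim) for w in ["star", "space", "embed"]}

    def embed_entity(features):
        # A composite entity's vector is the sum of its features' vectors.
        return np.sum([embeddings[f] for f in features], axis=0)

    v_document = embed_entity(["star", "space", "embed"])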

The loss is the margin ranking loss \( \sum_{i=1}^{k} \max(0,\, m - \mathrm{sim}(s, s^{+}) + \mathrm{sim}(s, s^{-}_{i})) \), where \( m \) is the margin, \( s \) is the sample, \( s^{+} \) is a positive sample, and \( s^{-}_{1}, \dots, s^{-}_{k} \) are the k negative samples. The similarity function is either the dot product, which performs better with a lower number of dimensions, or cosine similarity, which is more suitable for higher dimensionality.
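
A minimal sketch of this loss in Python, assuming dot-product similarity; the names and the margin value are illustrative, not from the paper:

    import numpy as np

    def margin_ranking_loss(sample, positive, negatives, margin=0.1, sim=np.dot):
        # Sum of hinge terms: each negative sample is pushed to score at
        # least `margin` below the positive pair's similarity.
        return sum(
            max(0.0, margin - sim(sample, positive) + sim(sample, negative))
            for negative in negatives
        )

    rng = np.random.default_rng(0)
    sample, positive = rng.normal(size=4), rng.normal(size=4)
    negatives = [rng.normal(size=4) for _ in range(5)]  # k = 5 negative samples
    loss = margin_ranking_loss(sample, positive, negatives)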

Embeddings of entities higher in the hierarchy are calculated by summing the bag-of-features representations of their children.

Results

Text classification comparison with FastText:

Figure: StarSpace text classification results compared with fastText

Content-based document recommendation: each user is described by the bag of documents they liked, while each document is described by its bag-of-words.

Figure: StarSpace content-based recommendation results compared with TF-IDF, word2vec, and fastText
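
A sketch of how such embeddings compose for recommendation, under the same illustrative assumptions as above (random vectors standing in for trained ones):

    import numpy as np

    rng = np.random.default_rng(1)
    word_vecs = {w: rng.normal(size=4) for w in ["star", "space", "graph", "embed"]}

    def embed_doc(words):
        # A document is a bag of words.
        return np.sum([word_vecs[w] for w in words], axis=0)

    def embed_user(liked_docs):
        # A user is a bag of the documents they liked.
        return np.sum([embed_doc(doc) for doc in liked_docs], axis=0)

    user = embed_user([["star", "space"], ["graph", "embed"]])
    candidate = embed_doc(["space", "embed"])
    score = user @ candidate  # rank candidate documents by this similarity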

Applications

  • text classification
  • ranking entities e.g. Automatically Expanding Taxonomy
  • collaborative filtering-based recommendation
  • content-based recommendation
  • word, sentence, document, graph embedding

Beyond StarSpace

While StarSpace is computationally and memory-wise cheap, post-2017 the state of the art usually involves Transformer models. If you don’t understand Transformers or self-attention yet, read more about them here.

Quiz

Retain what you have just read by taking a training quiz generated from this article.

StarSpace Quiz

Created on 08 May 2020.