# StarSpace - Embeddings For Documents, Users, and Words

Create vectors of various entities in a single space with this general-purpose embedding model from Facebook AI.

## Summary

“StarSpace: Embed All The Things!”, published 2017-11-21 by Facebook AI Research:

1. Is a general-purpose method for embedding entities of different types into a single vector space, e.g. words, documents, and users can all share one space.
2. Requires discrete features, e.g. a user’s features are the documents they liked.
3. Trains by summing bag-of-features embeddings and contrasting each positive pair with k negative samples.
4. In terms of quality, the method performs competitively.
5. In terms of speed, the method is on par with FastText.
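The negative sampling mentioned in point 3 can be sketched as follows. This is only an illustration: drawing negatives uniformly at random and the helper name `sample_negatives` are assumptions, not details taken from the paper.

```python
import random


def sample_negatives(candidates, positive, k, rng):
    """Draw k negatives uniformly at random (an assumed strategy), excluding the positive."""
    pool = [c for c in candidates if c != positive]
    return [rng.choice(pool) for _ in range(k)]


# Hypothetical label set for a text classification task.
labels = ["sports", "politics", "science", "music"]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
negatives = sample_negatives(labels, positive="science", k=3, rng=rng)
```

Sampling is done with replacement here, so the same negative may appear more than once.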

## Method

Only the embedding vectors are trained; the model has no other parameters. In contrast to Word2vec and FastText, there is no separate word (input) vector, only a context (output) vector. The method is heavily influenced by FastText; compared with it, StarSpace is much more general, although slightly slower.

For each discrete feature, an embedding vector is fitted with SGD to minimize the loss function below. Embeddings of composite entities are then constructed as the sum of the embeddings of their sub-entities (bag-of-features). The loss function relies on having labels for positive (close) and negative (distant) pairs. Thanks to this very general notion of labels, the embeddings can be constructed in many different scenarios.

$$v_{\mathrm{document}} = \sum_{w \in \mathrm{document}} v_w$$
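A minimal sketch of this bag-of-features sum in plain Python. The helper `embed_bag` and the toy two-dimensional vectors are hypothetical; a trained model would supply the real embedding table.

```python
def embed_bag(features, table):
    """Embed a composite entity as the sum of its features' vectors."""
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for feature in features:
        for i, value in enumerate(table[feature]):
            total[i] += value
    return total


# Hand-picked toy word vectors, for illustration only.
word_vectors = {
    "star": [1.0, 0.0],
    "space": [0.0, 2.0],
    "embed": [0.5, 0.5],
}

# A repeated word contributes once per occurrence.
doc_vector = embed_bag(["star", "space", "space"], word_vectors)
print(doc_vector)  # [1.0, 4.0]
```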

The loss is a margin ranking loss:

$$L = \max\Big(0,\ m - \mathrm{sim}(s, p) + \sum_{i=1}^{k} \mathrm{sim}(s, n_i)\Big)$$

where $m$ is the margin, $s$ is the sample, $p$ is the positive sample, and $n_1, \dots, n_k$ are the negative samples. The similarity function is either the dot product, which performed better with fewer dimensions, or cosine similarity, which was more suitable for higher dimensionality.
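The loss can be sketched in a few lines of Python. The code follows the formula literally as written in this post (negative similarities summed inside a single hinge); the function names and toy vectors are assumptions, and an actual implementation may arrange the terms differently.

```python
import math


def dot(a, b):
    """Dot-product similarity."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def margin_ranking_loss(sample, positive, negatives, margin, sim):
    """max(0, margin - sim(sample, positive) + sum of sim(sample, negative))."""
    negative_term = sum(sim(sample, n) for n in negatives)
    return max(0.0, margin - sim(sample, positive) + negative_term)


sample = [1.0, 0.0]
positive = [1.0, 0.0]
negatives = [[0.0, 1.0], [0.5, 0.0]]

# Small margin: the positive already outscores the negatives, so the loss is zero.
loss_small_margin = margin_ranking_loss(sample, positive, negatives, 0.2, dot)
# Larger margin: the same pairs are now penalized.
loss_large_margin = margin_ranking_loss(sample, positive, negatives, 1.0, dot)
```

Passing `cosine` instead of `dot` as `sim` switches the similarity function without changing the loss itself.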

Embeddings for classes of entities higher in the hierarchy are calculated by summing the bag-of-words representations of their children.

## Results

Text classification comparison with FastText:

In content-based document recommendation, each user is described by the bag-of-documents they like, while each document is described by its bag-of-words.
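The two-level representation described here can be illustrated with a small sketch: words sum to documents, and liked documents sum to a user. All vectors, document ids, and function names below are hypothetical, and ranking unseen documents by dot product is an assumption made for illustration.

```python
DIM = 2

# Hand-picked toy word vectors; a trained model would supply these.
word_vectors = {
    "football": [1.0, 0.0],
    "goal": [0.8, 0.1],
    "guitar": [0.0, 1.0],
    "concert": [0.1, 0.9],
}

# Each document is a bag of words.
documents = {
    "d1": ["football", "goal"],
    "d2": ["guitar", "concert"],
    "d3": ["goal", "goal"],
    "d4": ["concert"],
}


def sum_vectors(vectors):
    total = [0.0] * DIM
    for vec in vectors:
        for i, value in enumerate(vec):
            total[i] += value
    return total


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def embed_document(doc_id):
    """Document vector = sum of its word vectors."""
    return sum_vectors([word_vectors[w] for w in documents[doc_id]])


def embed_user(liked_doc_ids):
    """User vector = sum of the vectors of the documents they liked."""
    return sum_vectors([embed_document(d) for d in liked_doc_ids])


def recommend(liked_doc_ids):
    """Rank documents the user has not liked yet by similarity to the user vector."""
    user_vec = embed_user(liked_doc_ids)
    candidates = [d for d in documents if d not in liked_doc_ids]
    return sorted(candidates, key=lambda d: dot(user_vec, embed_document(d)), reverse=True)


ranking = recommend(["d1"])
print(ranking)  # ['d3', 'd2', 'd4']
```

Because users, documents, and words live in the same space, the same `dot` comparison works between any pair of entity types.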

## Applications

- text classification
- ranking entities, e.g. Automatically Expanding Taxonomy
- collaborative filtering-based recommendation
- content-based recommendation
- word, sentence, document, graph embedding

## Beyond StarSpace

While StarSpace is cheap in both computation and memory, since 2017 the state of the art has usually involved Transformer models. If you are not yet familiar with Transformers or self-attention, read more about them here.

Created on 08 May 2020.