Transformer Positional Embeddings and Encodings

How do transformers encode information about token positions?
positional embeddings in BERT architecture

Learned Positional Embeddings

Visualization of position-wise cosine similarity of different position embeddings
  • In BERT, positional embeddings give the first few tens of dimensions of the token embeddings a sense of relative positional closeness within the input sequence.
  • In Perceiver IO, positional embeddings are concatenated to the input embedding sequence instead.
  • In SRU++, the positional embeddings are a learned feature of the RNN.
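
The difference between adding and concatenating a learned positional embedding can be sketched in a few lines of plain Python. This is only an illustration, not the BERT or Perceiver IO implementation: the table sizes, the names `embed_added` and `embed_concatenated`, and the choice to concatenate along the feature axis are all assumptions, and the "learned" tables are just randomly initialised stand-ins with no training step.

```python
import random

random.seed(0)

VOCAB_SIZE, MAX_LEN = 100, 16   # hypothetical sizes, for illustration only
D_MODEL, D_POS = 8, 4

def random_table(rows, cols):
    """Stand-in for a trainable embedding table (randomly initialised, untrained)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

token_table = random_table(VOCAB_SIZE, D_MODEL)
pos_table_add = random_table(MAX_LEN, D_MODEL)  # same width as token embeddings
pos_table_cat = random_table(MAX_LEN, D_POS)    # independent width

def embed_added(token_ids):
    """Additive variant: token vector plus position vector, element-wise."""
    return [
        [tok + pos for tok, pos in zip(token_table[tid], pos_table_add[t])]
        for t, tid in enumerate(token_ids)
    ]

def embed_concatenated(token_ids):
    """Concatenation variant: position features appended to each token vector."""
    return [token_table[tid] + pos_table_cat[t] for t, tid in enumerate(token_ids)]

token_ids = [5, 17, 17, 42]          # token 17 appears at positions 1 and 2
added = embed_added(token_ids)
concatenated = embed_concatenated(token_ids)

print(len(added), len(added[0]))                # 4 8
print(len(concatenated), len(concatenated[0]))  # 4 12
print(added[1] != added[2])                     # True: same token, different positions
```

Adding keeps the model width unchanged but lets position and word information share dimensions. Concatenating keeps them in separate dimensions at the cost of a wider input.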

Fourier (Sinusoid) Positional Encodings in BERT

  • Positional encodings are added to the word embeddings once, before the first layer.
  • Each position \( t \) within the sequence gets a different encoding \( P_t \) of dimension \( d \), with components indexed by \( j \)
    • if \( j = 2i \) is even then \( P_{t, j} := \sin (t / 10^{\frac{8i}{d}}) \)
    • if \( j = 2i + 1 \) is odd then \( P_{t, j} := \cos (t / 10^{\frac{8i}{d}}) \)
  • This is similar to the Fourier expansion of the Dirac delta
  • the dot product of any two positional encodings decays quickly beyond the first 2 nearby words
  • an average sentence has around 15 words, thus only the first dimensions carry positional information
  • the rest of the embedding dimensions can thus function as word embeddings
  • not translation invariant; only the self-attention key-query comparison is
  • impractical for high-resolution inputs
Fourier (Sinusoid) Positional Encodings in BERT
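
A minimal sketch of the sinusoidal encoding, assuming that even dimensions \( 2i \) hold \( \sin (t / 10^{8i/d}) \) and odd dimensions \( 2i + 1 \) hold the matching cosine (note that \( 10^{8i/d} = 10000^{2i/d} \)). The function names and the choice \( d = 64 \) are illustrative assumptions. The loop prints the dot product between position 0 and a few other positions so the fall-off with distance can be inspected directly.

```python
import math

def sinusoidal_encoding(t, d):
    """Encoding of position t with d dimensions (d assumed even).

    Dimension 2i holds sin(t / 10^(8i/d)); dimension 2i + 1 holds the cosine
    of the same angle.
    """
    vec = []
    for i in range(d // 2):
        angle = t / 10 ** (8 * i / d)
        vec.extend([math.sin(angle), math.cos(angle)])
    return vec

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

d = 64  # hypothetical embedding width
p0 = sinusoidal_encoding(0, d)

# Dot product of position 0 with positions at increasing distance.
for t in (0, 1, 2, 5, 20):
    print(t, round(dot(p0, sinusoidal_encoding(t, d)), 2))

# The dot product depends only on the offset between the two positions,
# because sin(a)sin(b) + cos(a)cos(b) = cos(a - b) for every sin/cos pair.
near = dot(sinusoidal_encoding(3, d), sinusoidal_encoding(7, d))
far = dot(sinusoidal_encoding(10, d), sinusoidal_encoding(14, d))
print(abs(near - far) < 1e-9)  # True
```

The encoding vectors themselves change with absolute position, while their pairwise dot products depend only on the distance between positions.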

Rotary Position Embedding (RoPE)

  • introduced in the RoFormer paper
  • goal: relative position information in the query-key dot product
  • uses a multiplicative rotation matrix that mixes pairs of neighboring dimensions
  • improves accuracy on long sequences?
  • poor results also reported: tweet 1, tweet 2
  • used in Google’s $10M model PaLM
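
The rotation idea can be sketched as follows. This is an illustrative plain-Python version, not the RoFormer or PaLM code: the function name `rope`, the example vectors, and the frequency schedule \( \theta_i = 10000^{-2i/d} \) are assumptions that are not stated in the notes above. Each neighboring pair of dimensions is rotated by an angle proportional to the token position, so the dot product of a rotated query and a rotated key depends only on their relative offset.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate each neighboring pair (vec[2i], vec[2i+1]) by position * theta_i."""
    d = len(vec)  # assumed even
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2D rotation of the pair
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

query = [0.3, -1.2, 0.8, 0.5]   # made-up example vectors
key = [1.0, 0.4, -0.7, 0.9]

# Same offset (the key sits 3 positions before the query) at two different
# absolute locations in the sequence.
score_near_start = dot(rope(query, 5), rope(key, 2))
score_far_along = dot(rope(query, 105), rope(key, 102))
print(abs(score_near_start - score_far_along) < 1e-9)  # True
```

Because a rotation preserves vector length, only the angle between query and key changes with position, which is what makes the attention score a function of the relative offset.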

Created on 05 Jun 2022. Updated on: 11 Jun 2022.

Copyright © Vaclav Kosar. All rights reserved. Not investment, financial, medical, or any other advice. No guarantee of information accuracy.