Vaclav Kosar
Software And Machine Learning Blog

Cross-Attention in Transformer Architecture

Cross-attention is a way to add activations from another embedding sequence into transformer layers.

  • An attention mechanism that mixes two embedding sequences, usually from different modalities
  • One of the modalities defines the output dimensions and length by playing the role of the query
  • This is similar to the feed-forward layer, where the other sequence is static
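
To make the claim about output length concrete, here is a short worked formula. It assumes standard single-head scaled dot-product attention; the symbols $n_A$, $n_B$, $d_A$, $d_B$, $d_k$, $d_v$ and the projection matrices $W_Q$, $W_K$, $W_V$ are notation introduced here for illustration and do not come from the original post.

```latex
% Query sequence A and the other sequence B (assumed shapes)
A \in \mathbb{R}^{n_A \times d_A}, \qquad B \in \mathbb{R}^{n_B \times d_B}

% Queries come from A; keys and values come from B
Q = A W_Q \in \mathbb{R}^{n_A \times d_k}, \quad
K = B W_K \in \mathbb{R}^{n_B \times d_k}, \quad
V = B W_V \in \mathbb{R}^{n_B \times d_v}

% The attention matrix has one row per query position
\operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) \in \mathbb{R}^{n_A \times n_B}

% The output therefore has the length of the query sequence
\operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \in \mathbb{R}^{n_A \times d_v}
```

The output has $n_A$ rows regardless of $n_B$. In typical transformer layers a final output projection maps the width $d_v$ back to $d_A$, so that the result can be added to the query sequence's activations.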


  • Let us have sequence A and sequence B
  • The attention matrix, computed from the queries of sequence A and the keys of sequence B, is used to highlight (weight) positions in sequence B
  • Queries come from sequence A
  • Keys and Values come from the other sequence B
  • The lengths of sequences A and B can differ
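
The points above can be sketched in a few lines of NumPy. This is a minimal single-head illustration under assumed shapes and random weights, not the implementation of any particular model; the function name `cross_attention`, the weight matrices `w_q`, `w_k`, `w_v`, and all dimensions are hypothetical choices made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(a, b, w_q, w_k, w_v):
    """Single-head cross-attention: queries from `a`, keys and values from `b`."""
    q = a @ w_q  # (len_a, d_k)
    k = b @ w_k  # (len_b, d_k)
    v = b @ w_v  # (len_b, d_v)
    # Attention matrix: one row per position in A, one column per position in B.
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (len_a, len_b)
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted average of B's value vectors.
    return weights @ v, weights


# Assumed toy sizes: the two sequences differ in both length and width.
rng = np.random.default_rng(0)
len_a, d_a = 4, 8
len_b, d_b = 6, 5
d_k, d_v = 16, 8

a = rng.normal(size=(len_a, d_a))  # sequence A (supplies the queries)
b = rng.normal(size=(len_b, d_b))  # sequence B (supplies keys and values)
w_q = rng.normal(size=(d_a, d_k))
w_k = rng.normal(size=(d_b, d_k))
w_v = rng.normal(size=(d_b, d_v))

out, weights = cross_attention(a, b, w_q, w_k, w_v)
print(out.shape)      # (4, 8): the output length follows sequence A
print(weights.shape)  # (4, 6): the attention matrix spans A by B
```

Each row of `weights` sums to 1, so every position in sequence A receives a weighted average of the value vectors of sequence B. Setting `b = a` would turn the same function into ordinary self-attention.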


28 Dec 2021
