Vaclav Kosar
Software And Machine Learning Blog

Manipulate Item Attributes via Disentangled Representation

Using attribute-specific subspaces for attribute manipulation retrieval, outfit completion, and conditional similarity retrieval.
Manipulate Item Attributes via Disentangled Representation
  • Tasks:
    • Given an image, find the same item in a different color in the dataset.
    • Generate an image of the item with a given attribute modified.
    • Downstream task: fashion outfit completion can benefit from a better representation.
  • What is disentangled representation?
    • Entangled representation = hard to preserve some attributes and change others
    • Disentangled = Attributes have separate dimensions

Unsupervised Disentangling Methods

  • The methods below are generative
    • so instead of searching, they can manipulate the image directly
  • Variational Auto-encoders
    • speculation: some disentanglement thanks to the architecture
      • compressing into low-dimension and small-space (reg. term)
      • high-level factors only
      • items similar in high-level factors are encoded close to each other
    • methods: mutual information between latents, total correlation, e.g. the unsupervised Relevance Factor VAE
  • GANs, e.g. DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images (has an encoder and a decoder)
  • Flow-Based models e.g. OpenAI’s Glow - Flow-Based Model Teardown
    • like VAE but the decoder is reverse of the encoder
    • reversibly encodes into independent gaussian factors
    • the attribute vectors are found using labeled data
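
As a rough illustration of how an attribute vector can be found from labeled data, the sketch below takes the mean latent code of examples with the attribute minus the mean of examples without it, then shifts a latent code along that direction. The toy latents, the function names, and the `alpha` strength parameter are assumptions made for illustration; a real flow model would encode images into latents first and decode the edited latent back into an image.

```python
import numpy as np

def attribute_vector(latents, has_attribute):
    """Mean latent of labeled positives minus mean latent of negatives."""
    latents = np.asarray(latents, dtype=float)
    mask = np.asarray(has_attribute, dtype=bool)
    return latents[mask].mean(axis=0) - latents[~mask].mean(axis=0)

def manipulate(z, direction, alpha):
    """Move a latent code along the attribute direction by strength alpha."""
    return z + alpha * direction

# Toy latents (hypothetical): dimension 0 happens to carry "smiling".
latents = np.array([[2.0, 0.1], [1.8, -0.1], [-2.0, 0.0], [-1.8, 0.0]])
smiling = [True, True, False, False]

v_smile = attribute_vector(latents, smiling)
z_edited = manipulate(latents[2], v_smile, alpha=1.0)
```

Scaling `alpha` up or down would make the edit stronger or weaker, and a negative value would move the code away from the attribute.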

Glow model smiling vector

Unsupervised Disentangled Representations

  • Google ICML 2019 Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
  • A large-scale evaluation of various unsupervised methods (12k models)
  • On the Shapes3D dataset, the models try to separate all attributes of the scene
    • into 10 dimensions: object shape, object size, camera rotation, colors
  • No model reliably disentangled the scene into the factors above
  • Theorem: infinitely many transformations of the true factors produce the same data distribution
    • so the true dimensions can never be found without a guide
    • but could they be found with additional data?
  • Assumptions about the data have to be incorporated into the model (inductive bias)
  • Each unsupervised model has to be specialized
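
The intuition behind the theorem can be shown with a small numerical example. This is only an illustration under simplifying assumptions (a 2D standard Gaussian prior and a 45 degree rotation), not the construction used in the paper: the rotated latents have the same distribution as the original ones, yet every rotated coordinate mixes both true factors.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((50_000, 2))  # "true" independent factors

theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
z_mixed = z @ rotation.T  # each new coordinate mixes both true factors

# Both covariances are close to the identity: the distributions match.
cov_true = np.cov(z, rowvar=False)
cov_mixed = np.cov(z_mixed, rowvar=False)

# Yet the first mixed coordinate is correlated with the first true factor
# only partially (about cos(45 degrees)), so it is entangled.
corr = np.corrcoef(z_mixed[:, 0], z[:, 0])[0, 1]
```

A model that sees only the data has no way to prefer `z` over `z_mixed`, which is why some inductive bias or supervision is needed.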

Shapes3D dataset for disentangling factors: floor color, wall color, object color, object size, camera angle

Multi-Task Learning

  • Multi-task learning may improve performance
    • Google NeurIPS 2021 paper on a method for grouping tasks
    • meta-learning
    • usually the tasks have to be related
    • inter-task affinity:
      • measures how one task’s gradient update affects the other tasks’ losses
      • correlates with overall model performance
  • in the paper below, outfit recommendation improved with the disentangled representation
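
A minimal sketch of the affinity idea, assuming it is defined as the relative decrease of task j's loss after a lookahead gradient step on task i over the shared parameters. The quadratic toy losses, the targets, and the learning rate are hypothetical and stand in for real task heads.

```python
import numpy as np

def loss(theta, target):
    """Toy task loss: squared distance of shared parameters to a task optimum."""
    return float(np.sum((theta - target) ** 2))

def grad(theta, target):
    """Gradient of the toy loss with respect to the shared parameters."""
    return 2.0 * (theta - target)

def affinity(theta, target_i, target_j, lr=0.1):
    """1 - L_j(after a lookahead step on task i) / L_j(before the step)."""
    theta_lookahead = theta - lr * grad(theta, target_i)
    return 1.0 - loss(theta_lookahead, target_j) / loss(theta, target_j)

theta = np.zeros(2)
task_a = np.array([1.0, 1.0])
task_b = np.array([1.0, 0.8])    # optimum close to task_a
task_c = np.array([-1.0, -1.0])  # optimum opposite to task_a

a_to_b = affinity(theta, task_a, task_b)  # positive: the step on a helps b
a_to_c = affinity(theta, task_a, task_c)  # negative: the step on a hurts c
```

Tasks with mutually positive affinity would be candidates for training together, while strongly negative pairs would be split into separate groups.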

inter-task affinity for multi-task learning task grouping

Supervised-Disentangling: Attribute-driven Disentangled Representations

  • Amazon 2021 paper Learning Attribute-driven Disentangled Representations for Interactive Fashion Retrieval
  • SoTA on the fashion tasks (Attribute manipulation retrieval, Conditional similarity retrieval, Outfit completion)
  • supervised disentangled representation learning
    • each attribute has multiple values
    • split embedding into sections corresponding to attributes
    • multi-task training
    • store prototype embeddings of each attribute value in memory module
    • prototypes can then be swapped in for an item’s attribute vector
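
The swap can be sketched as follows. This is a simplified illustration, not the paper's exact formulation: the slice layout, the prototype values, and the helper names are hypothetical, and retrieval is reduced to a plain nearest-neighbor search.

```python
import numpy as np

# Hypothetical layout: two attributes, each owning a 2-dim slice of the embedding.
SLICES = {"color": slice(0, 2), "shape": slice(2, 4)}

# Hypothetical prototype memory: one vector per attribute value.
PROTOTYPES = {
    ("color", "red"): np.array([1.0, 0.0]),
    ("color", "blue"): np.array([0.0, 1.0]),
}

def swap_attribute(embedding, attribute, value):
    """Replace one attribute's slice with the prototype of the target value."""
    out = embedding.copy()
    out[SLICES[attribute]] = PROTOTYPES[(attribute, value)]
    return out

def nearest(query, gallery):
    """Index of the gallery embedding closest to the query."""
    return int(np.argmin(np.linalg.norm(gallery - query, axis=1)))

gallery = np.array([
    [0.9, 0.1, 0.5, 0.5],   # red item, shape A
    [0.1, 0.9, 0.5, 0.5],   # blue item, shape A
    [0.1, 0.9, -0.5, 0.5],  # blue item, shape B
])
query = swap_attribute(gallery[0], "color", "blue")
match = nearest(query, gallery)  # the blue item that keeps shape A
```

Because only the color slice changes, the other attribute slices of the query still match the original item, which is what keeps the retrieved item otherwise similar.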

disentangled representation using attribute-specific encoder


  • image representation (AlexNet, ResNet18)
  • per attribute:
    • fully-connected two-layer network
    • map into attribute-specific subspace
    • producing image’s attribute embedding
  • disentangled representation
  • called Attribute-Driven Disentangled Encoder (ADDE)
  • memory block
    • stores prototype embeddings for all values of the attributes
    • e.g. each color has one prototype embedding
    • stored in a matrix that is forced to have small off-block-diagonal elements
    • trained via triplet loss

Attribute-Driven Disentangled Encoder (ADDE)

Loss Function

  • Label triplet loss
    • representations with the same labels should have similar vectors
  • Consistency triplet loss
    • attribute representations of an image close to corresponding memory vectors
    • align prototype embeddings with representations
  • Compositional triplet loss
    • generate change in attributes
    • create manipulation vector using prototype vectors
    • sample positive and negative samples based on labels
  • Memory block loss
    • pushes off-block-diagonal elements toward zero
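
Two of the building blocks above can be sketched in a few lines: a standard margin-based triplet loss, and a penalty on memory entries outside the diagonal blocks. The margin value, the use of Euclidean distance, and the L1 form of the penalty are assumptions of this sketch; the paper's exact choices may differ.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on (distance to positive) - (distance to negative) + margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, float(d_pos - d_neg + margin))

def off_block_penalty(memory, row_blocks, col_blocks):
    """Sum of absolute values of memory entries outside the diagonal blocks."""
    mask = np.ones_like(memory, dtype=bool)
    for rows, cols in zip(row_blocks, col_blocks):
        mask[rows, cols] = False  # entries inside a block are not penalized
    return float(np.abs(memory[mask]).sum())

anchor = np.array([0.0, 0.0])
close = np.array([0.1, 0.0])
far = np.array([1.0, 0.0])
satisfied = triplet_loss(anchor, close, far)  # positive is already closer
violated = triplet_loss(anchor, far, close)   # positive is farther away

# Toy memory: rows are embedding dimensions, columns are prototypes.
memory = np.array([[1.0, 0.0, 0.0, 0.1],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0]])
blocks = [slice(0, 2), slice(2, 4)]
penalty = off_block_penalty(memory, blocks, blocks)
```

Minimizing the penalty keeps each prototype confined to the subspace of its own attribute.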

Experiments and Results


  • Shopping100k: 100k samples, 12 attributes
  • DeepFashion: 100k samples, 3 attributes: category, texture, shape

Attribute manipulation retrieval examples on Shopping100k and DeepFashion

Attribute Manipulation Retrieval

Attribute manipulation top-k retrieval on Shopping100k and DeepFashion

Outfit Completion

ADDE outfit complementary retrieval

Outfit Ranking Loss

  • operates on entire outfit
  • calculates average distance from all members in the outfit to the proposed addition
  • input these distances into a triplet loss
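
The steps above can be sketched as follows, assuming Euclidean distance and a hinge-style triplet loss with a hypothetical margin. The toy 2D embeddings are for illustration only.

```python
import numpy as np

def outfit_distance(outfit, candidate):
    """Average Euclidean distance from every outfit member to the candidate."""
    return float(np.mean(np.linalg.norm(outfit - candidate, axis=1)))

def outfit_ranking_loss(outfit, positive, negative, margin=0.3):
    """Triplet loss over outfit-to-candidate average distances."""
    d_pos = outfit_distance(outfit, positive)
    d_neg = outfit_distance(outfit, negative)
    return max(0.0, d_pos - d_neg + margin)

outfit = np.array([[0.0, 0.0], [0.2, 0.0]])  # embeddings of current members
compatible = np.array([0.1, 0.0])            # item that fits the outfit
incompatible = np.array([2.0, 0.0])          # item that does not

good_order = outfit_ranking_loss(outfit, compatible, incompatible)
bad_order = outfit_ranking_loss(outfit, incompatible, compatible)
```

The loss is zero once the compatible item is closer to the outfit, on average, than the incompatible one by at least the margin.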

Outfit Ranking Loss

25 Oct 2021
