
DreamCoder: Wake & Sleep Program Learning

Learning to code by growing a function library, fantasising coding tasks, and training a neural search.

Why coding?

Neural networks are missing System 2 reasoning.


How does DreamCoder work?

  • DreamCoder uses:
    • a learned library of functions
    • a learned neural search over the program space
  • to:
    • take input-output examples
    • and produce the best solution program

The best program:

  • is short
  • is likely based on past experience
  • but the program search space is too large to search naively
  • DreamCoder therefore (a minimal sketch follows below):
    • learns functions that compress solved examples
    • learns a neurally guided search
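
A minimal sketch of this setup, with made-up programs and description lengths (not the paper's code): a task is a list of input-output examples, and among the candidate programs that reproduce all of them, the shortest one wins.

```python
# Hypothetical sketch of the DreamCoder problem setup (not the paper's code).
from typing import Callable, List, Tuple

Task = List[Tuple[int, int]]                 # input-output examples

def solves(program: Callable[[int], int], task: Task) -> bool:
    """P(x | rho) in {0, 1}: does the program reproduce every example?"""
    return all(program(x) == y for x, y in task)

# Example: the task "double the input" with two hand-written candidate programs,
# each tagged with a made-up description length.
task = [(1, 2), (3, 6), (5, 10)]
candidates = [("(+ x x)", lambda x: x + x, 3.0),
              ("(if (> x 0) (* x 2) (* x 2))", lambda x: x * 2, 6.0)]

# Keep only consistent programs, then prefer the shortest one.
consistent = [(name, length) for name, prog, length in candidates if solves(prog, task)]
print(min(consistent, key=lambda c: c[1]))   # -> ('(+ x x)', 3.0)
```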

Similarities to AlphaZero:

  • also trains a neural search
  • also does “self-play” (Dream phase)

Example solved tasks:

Solved domains (sourced from the paper)

How does DreamCoder learn?

Training:

  1. provide program primitives (map, fold, if, cons, >)
  2. loop (a code sketch follows below):
    1. wake phase:
      1. search for the best solutions given the current library
      2. store the successful solutions
    2. sleep phase:
      1. abstraction:
        1. replay successful solutions
        2. compress them into new library functions
      2. dreaming:
        1. generate & replay programs
        2. train the neural search
Training phases (sourced from the paper)
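
A compact sketch of this loop; every helper passed in as a parameter (`search`, `compress`, `sample_fantasy`, `train_q`) is a hypothetical stand-in for a DreamCoder component, not the paper's API.

```python
# Hypothetical sketch of the wake/sleep loop; the helper callables stand in for
# DreamCoder's enumerative search, library compression, fantasy sampling, and
# recognition-model training.

def wake_sleep_loop(tasks, library, Q,
                    search, compress, sample_fantasy, train_q,
                    iterations=5):
    solutions = {}                                  # task -> best programs found so far
    for _ in range(iterations):
        # Wake: search for solutions to each task, guided by the recognition model Q.
        for task in tasks:
            found = search(library, task, Q)        # returns the few shortest solutions
            if found:
                solutions[task] = found

        # Sleep, abstraction: refactor stored solutions and grow the library
        # with functions that compress them.
        library = compress(library, solutions)

        # Sleep, dreaming: train Q on generated (fantasy) programs plus replays.
        dreams = [sample_fantasy(library) for _ in range(100)] + list(solutions.items())
        Q = train_q(Q, dreams)
    return library, Q, solutions
```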

Wake Phase

  • a neural network \( Q \) ranks candidate programs
    • given input-output examples and a program, it returns a probability
    • it is called the recognition model
  • the best solution:
    • many programs may solve the task
    • select the 5 smallest programs (selection sketched below)
    • use them in the next sleep phase
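
A small illustrative sketch of this selection step; the helper callables `q_score` and `program_length` are hypothetical stand-ins, not the paper's API.

```python
# Hypothetical wake-phase selection: rank candidates by the recognition model's
# score Q(rho | x), keep only those consistent with the examples, then retain
# the few shortest for the next sleep phase.

def wake_select(candidates, task, q_score, program_length, keep=5):
    # Search order: most promising programs first, according to Q.
    ranked = sorted(candidates, key=q_score, reverse=True)
    # Keep only programs that reproduce every input-output example.
    solutions = [p for p in ranked if all(p(x) == y for x, y in task)]
    # Among the solutions, keep the shortest ones (highest prior under the library).
    return sorted(solutions, key=program_length)[:keep]
```
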
Abstraction phase:

  • propose new library functions (a toy compression sketch follows below)
  • minimizing the size of the new functions
  • minimizing the length of the refactored solutions
  • reduce the number of candidate refactorings (\( \sim 10^{14} \))
    • ideas from:
      • version space algebras
      • equivalence graphs
    • plus a limit to small programs
    • leading to an improvement of \( \sim 10^6 \)
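
A toy sketch of the compression idea only, using nested tuples as programs and symbol counting as description length; the paper's version-space and equivalence-graph machinery is not reproduced here.

```python
# Toy abstraction sketch (not the paper's refactoring machinery): propose the
# repeated sub-expression whose extraction most reduces the total description
# length of the solutions plus the new library entry.

from collections import Counter

def subexpressions(expr):
    """Yield every sub-tree of a nested-tuple expression."""
    yield expr
    if isinstance(expr, tuple):
        for child in expr[1:]:
            yield from subexpressions(child)

def size(expr):
    """Description length measured as the number of symbols."""
    return 1 if not isinstance(expr, tuple) else 1 + sum(size(c) for c in expr[1:])

def best_abstraction(solutions):
    """Pick the sub-expression whose reuse saves the most symbols overall."""
    counts = Counter(s for sol in solutions for s in subexpressions(sol)
                     if isinstance(s, tuple))
    def saving(expr):
        n = counts[expr]
        # Each of n uses shrinks to one symbol; pay size(expr) once for the library.
        return n * (size(expr) - 1) - size(expr)
    return max(counts, key=saving, default=None)

# Example: (+ x (* 2 y)) appears in both solutions, so extracting it saves the most.
solutions = [("f", ("+", "x", ("*", 2, "y"))), ("g", ("+", "x", ("*", 2, "y")), "z")]
print(best_abstraction(solutions))   # -> ('+', 'x', ('*', 2, 'y'))
```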

Dream phase:

  • replay previously solved programs (~100s)
  • mix generated (dreamed) and replayed programs 50/50 (batch construction sketched below)
  • to augment the data while staying representative
  • train the \( Q \) function used by the search
DreamCoder LOGO task (A: tasks, B & C: learned functions, D & E: dreams; sourced from the paper)
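
An illustrative sketch of assembling such a 50/50 training batch; `sample_fantasy` is a hypothetical stand-in for generating a task by running a random library program.

```python
# Hypothetical dream-phase batch: half fantasies (generated tasks), half replays
# of real solved tasks, so Q sees diverse but still representative data.

import random

def dream_batch(solved, sample_fantasy, batch_size=100, fantasy_ratio=0.5):
    """solved: non-empty list of (task, program) replays; sample_fantasy(): generated pair."""
    n_fantasy = int(batch_size * fantasy_ratio)
    fantasies = [sample_fantasy() for _ in range(n_fantasy)]
    replays = random.choices(solved, k=batch_size - n_fantasy)
    batch = fantasies + replays
    random.shuffle(batch)
    return batch   # each element is a (task, program) pair used to train Q(rho | x)
```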

Results

  • list processing
    • primitives: map, fold, cons (prepend), …
    • 109 training tasks
    • created library of 20 routines
      • filter, max, n-th largest
    • learned to sort
  • text editing
    • e.g. “Alan Turing” -> “A.T.”
    • train set: 128 auto-generated text editing tasks
    • test set: the 2017 SyGuS program synthesis competition
    • before training it solved 3.7%
    • after training it solved 84.3% vs 82.4% prior art
      • the prior art is CVC4 from the competition
      • the others used a hand-picked library from SyGuS
      • DreamCoder used only primitives + training samples
      • but is that really better?
  • Ablations:
    • abstraction & dreaming are important

Why DreamCoder?

  • builds on RobustFill and DeepCoder
  • DeepCoder does not predict the probability of a whole program
  • DeepCoder only predicts whether a DSL primitive will be used in the program at least once
  • both DeepCoder’s & DreamCoder’s libraries are
    • modestly-sized sets
    • of generally useful routines
    • that are interpretable

Sources & Details

Paper Notation

  • library: \( L \)
  • program: \( \rho \)
  • length (prior): \( P( \rho | L) \)
  • is solution: \( P(x | \rho) \in \lbrace 0, 1 \rbrace \)
  • search: the recognition model \( Q( \rho | x) \) approximates the posterior (combined below)
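
Reading the notation together (a standard Bayesian factorization consistent with the symbols above, not a formula quoted from the paper): the wake-phase search prefers programs with a high posterior, which the recognition model learns to approximate:

\[ P(\rho | x, L) \;\propto\; P(x | \rho)\, P(\rho | L), \qquad Q(\rho | x) \approx P(\rho | x, L) \]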

Bonus Paper: Inference of Regular Expressions

Solution search

  • regex is represented by a tree
  • nodes are
    • sub-node placeholder *
    • primitives (\d, \w, [*], [^*], look-around, …)
  • the tree structure represents the DNA of a candidate
  • keep only the best-performing candidates
    • (~ the library in DreamCoder)
  • crossover swaps candidates’ sub-trees
  • mutation replaces a subtree with random subtree
  • fitness (a simplified toy sketch of the whole loop follows below) is:
    • “Character Precision”
    • “Character Accuracy”
    • length of the expression
      • ~ DreamCoder’s code length
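
A heavily simplified toy sketch of such an evolutionary loop. Assumptions not taken from the paper: a tiny concatenation-only primitive set, a character-coverage fitness standing in for character precision/accuracy, and made-up example strings.

```python
# Toy genetic-programming sketch in the spirit of the regex-inference paper
# (simplified; the real method uses character precision/accuracy on annotated
# extraction tasks and a richer primitive set).

import random
import re

PRIMITIVES = [r"\d", r"\w", r"[a-z]", r"[A-Z]", r"\.", r"-"]

def random_tree(depth=3):
    """A candidate is either a single-character primitive or a concatenation node."""
    if depth == 0 or random.random() < 0.4:
        return random.choice(PRIMITIVES)
    return ("cat", random_tree(depth - 1), random_tree(depth - 1))

def to_regex(tree):
    return tree if isinstance(tree, str) else "".join(to_regex(c) for c in tree[1:])

def mutate(tree):
    """Replace a randomly chosen sub-tree with a fresh random sub-tree."""
    if isinstance(tree, str) or random.random() < 0.3:
        return random_tree(2)
    i = random.randrange(1, len(tree))
    return tree[:i] + (mutate(tree[i]),) + tree[i + 1:]

def crossover(a, b):
    """Graft a random sub-tree of candidate b into a random position of candidate a."""
    donor = b if isinstance(b, str) else random.choice(b[1:])
    if isinstance(a, str) or random.random() < 0.3:
        return donor
    i = random.randrange(1, len(a))
    return a[:i] + (crossover(a[i], donor),) + a[i + 1:]

def fitness(tree, examples):
    """Average fraction of characters matched, minus a length penalty
    (the length term plays the role of DreamCoder's code length)."""
    pattern = to_regex(tree)
    covered = 0.0
    for s in examples:
        m = re.search(pattern, s)
        covered += len(m.group(0)) / len(s) if m else 0.0
    return covered / len(examples) - 0.01 * len(pattern)

def evolve(examples, population=200, generations=30):
    pool = [random_tree() for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=lambda t: fitness(t, examples), reverse=True)
        survivors = pool[: population // 4]          # keep only the fittest candidates
        children = [crossover(random.choice(survivors), random.choice(survivors))
                    for _ in range(population - len(survivors))]
        pool = survivors + [mutate(c) for c in children]
    return to_regex(max(pool, key=lambda t: fitness(t, examples)))

# Made-up example strings: print the best pattern found for covering the dates.
print(evolve(["2021-04-27", "1999-12-31"]))
```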

Comparison

  • the method outperforms prior approaches, but not on all datasets
  • runtime is comparable to a human, ~10 minutes

27 Apr 2021