How Computers Understood Humans

Catch up with this 7-slide introduction to deep natural language processing as of 2022, featuring TF-IDF, Word2vec, knowledge graphs, and transformers.
  • the ideas have existed since at least the 1700s, but there was not enough compute or computer science
  • current computers do almost what was predicted, but how?
  • How to instruct a computer to perform tasks?
  • How to represent knowledge in computers?
  • How to generate the answers?

"... by his contrivance, the most ignorant person, at a reasonable charge, and with a little bodily labour, might write books in philosophy, poetry, politics, laws, mathematics, and theology, without the least assistance from genius or study. ... to read the several lines softly, as they appeared upon the frame" (Gulliver's Travels by Jonathan Swift, 1726, making fun of Ramon Llull, born c. 1232)

Prompt as an Interface

  • HAL 9000 in 2001: A Space Odyssey
  • the input is textual instructions, e.g. explain a riddle
  • based on its knowledge, the computer generates the answer text

2001 A Space Odyssey HAL-9000 Interface

Simple Document Representations

Latent semantic analysis (LSA) - CC BY-SA 4.0 Christoph Carl Kling

Non-Contextual Word Vectors

  • the document is split into sentence-sized running windows of 10 words
  • each of the 10k sparsely coded vocabulary words is mapped (embedded) to a vector in a 300-dimensional space
  • the embeddings are compressed: 300 dimensions is far fewer than the 10k dimensions of the sparse vocabulary vectors
  • the embeddings are dense: most components are non-zero, and the vector norm is not allowed to grow too large
  • these word vectors are non-contextual (global), so we cannot disambiguate fruit (flowering) from fruit (food)
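
The mapping from sparse codes to dense vectors described above can be sketched as a plain table lookup. This is a minimal illustration only: the table values are random placeholders rather than trained embeddings, and the word ids in `word_to_id` are hypothetical.

```python
import random
from array import array

VOCAB_SIZE = 10_000  # vocabulary size used in the text
EMBED_DIM = 300      # embedding dimensionality used in the text

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical embedding table: one 300-dimensional row per vocabulary word.
# The values are random placeholders; a real model would learn them.
embedding_table = [
    array("f", (rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)))
    for _ in range(VOCAB_SIZE)
]

# Hypothetical word ids, chosen for illustration only.
word_to_id = {"fruit": 42, "tree": 7, "bears": 1999, "eat": 311}


def one_hot(word):
    """Sparse coding: a 10k-long vector with a single non-zero entry."""
    vec = [0] * VOCAB_SIZE
    vec[word_to_id[word]] = 1
    return vec


def embed(word):
    """Dense coding: multiplying a one-hot vector by the table is a row lookup."""
    return embedding_table[word_to_id[word]]


sparse = one_hot("fruit")
dense = embed("fruit")
print(len(sparse), sum(sparse))  # sparse vector length and number of set entries
print(len(dense))                # dense vector length

# Non-contextual: the lookup ignores the sentence, so "fruit" gets
# the same vector in a botanical and in a culinary sentence.
botany = [embed(w) for w in ["tree", "bears", "fruit"]]
food = [embed(w) for w in ["eat", "fruit"]]
print(botany[-1] == food[-1])
```

Because `embed` never looks at the neighbouring words, both senses of "fruit" collapse into one row of the table, which is the limitation the last bullet points out.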


Word2vec Method for Non-contextual Word Vectors

  • word2vec (Mikolov 2013): the embeddings of the 10 surrounding words are trained to sum up close to the middle word's vector
  • an even simpler method, GloVe (Pennington 2014): based on counting co-occurrences in a 10-word window
  • other similar methods: FastText, StarSpace
  • words appearing in similar contexts have similar embedding vectors
  • word disambiguation is not supported
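
To illustrate why words in similar contexts end up with similar vectors, here is a minimal sketch that uses raw co-occurrence counts as word vectors. It is neither word2vec nor GloVe training, only the counting step that such methods build on. The toy corpus is made up, and splitting the 10-word window into 5 words on each side is an assumption.

```python
import math
from collections import Counter, defaultdict

WINDOW = 10  # window size from the text; assumed to mean 5 words on each side

# Made-up toy corpus, for illustration only.
corpus = [
    "i eat an apple for breakfast",
    "i eat a banana for breakfast",
    "she drives a car to work",
    "he drives a truck to work",
]


def cooccurrence_vectors(sentences, window=WINDOW):
    """Count, for every word, which words appear within the window around it."""
    half = window // 2
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - half)
            end = min(len(tokens), i + half + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


vectors = cooccurrence_vectors(corpus)
similar = cosine(vectors["apple"], vectors["banana"])
different = cosine(vectors["apple"], vectors["car"])
print(similar, different)
```

In this toy corpus "apple" and "banana" share most of their neighbours and score much higher than "apple" and "car". A word with two meanings would still get a single count vector, so disambiguation remains unsupported.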

word2vec operation

Knowledge Graph’s Nodes Are Disambiguated

  • knowledge graph (KG), e.g. Wikidata: each node is specific, so fruit (flowering) and fruit (food) are separate nodes
  • a KG is an imperfect tradeoff between a database and training data samples
  • Wikipedia and the internet are something between a knowledge graph and a set of documents
  • random walks over a KG are valid “sentences”, which can be used to train node embeddings, e.g. with Word2vec (see “link prediction”)
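
The random-walk idea can be sketched on a toy graph as follows. The triples are hypothetical and only mimic disambiguated node names; they are not real Wikidata entries. The resulting token sequences are what one would feed to a Word2vec-style trainer.

```python
import random

# Hypothetical toy knowledge graph as (subject, relation, object) triples.
triples = [
    ("apple", "instance of", "fruit (food)"),
    ("apple", "produced by", "apple tree"),
    ("fruit (food)", "subclass of", "food"),
    ("food", "used by", "human"),
    ("apple tree", "instance of", "flowering plant"),
    ("fruit (flowering)", "part of", "flowering plant"),
    ("flowering plant", "subclass of", "plant"),
]

# Adjacency list: node -> list of (relation, neighbour) pairs.
outgoing = {}
for subject, relation, obj in triples:
    outgoing.setdefault(subject, []).append((relation, obj))


def random_walk(start, max_hops, rng):
    """Follow up to max_hops random outgoing edges and return the visited path."""
    walk = [start]
    node = start
    for _ in range(max_hops):
        edges = outgoing.get(node)
        if not edges:  # dead end: the node has no outgoing edges
            break
        relation, node = rng.choice(edges)
        walk.extend([relation, node])
    return walk


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sentences = [random_walk("apple", 3, rng) for _ in range(5)]
for sentence in sentences:
    print(" | ".join(sentence))
```

Each walk alternates node and relation names, so every printed line reads like a short factual "sentence" about the starting node.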

knowledge graph visualization from wikipedia

Contextual Word Vectors

  • imagine a hypothetical knowledge graph with a node for each specific meaning of each word
  • given a word in a text of 100s of words, the specific surrounding words locate our position within the knowledge graph and identify the word’s meaning
  • two popular model architectures incorporate context:

transformer from word2vec
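
As a rough illustration of how surrounding words can pick a meaning, the sketch below chooses the sense node whose neighbour words overlap most with the context. This is a simple word-overlap heuristic swapped in for illustration, not how a transformer computes contextual vectors, and the sense inventory is hypothetical.

```python
# Hypothetical sense inventory: each meaning is a node with typical neighbour words.
senses = {
    "fruit (food)": {"eat", "sweet", "juice", "breakfast", "apple"},
    "fruit (flowering)": {"seed", "flower", "plant", "ovary", "botany"},
}


def disambiguate(context_words, sense_neighbours):
    """Return the sense whose neighbour words overlap most with the context."""
    context = set(context_words)
    return max(sense_neighbours, key=lambda s: len(sense_neighbours[s] & context))


culinary = disambiguate("i eat sweet fruit for breakfast".split(), senses)
botanical = disambiguate(
    "the fruit develops from the flower and holds the seed".split(), senses
)
print(culinary)
print(botanical)
```

The same word "fruit" lands on different nodes depending on its neighbours, which is the behaviour a contextual word vector is meant to capture.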

Big Transformer Models

  • generate text by predicting the continuation of the input text
  • transformers costing on the order of $10M to train, trained on large amounts of text from the internet (as of 2022)
  • can solve a wide variety of problems, like explaining jokes, sometimes with human-level performance
  • examples: PaLM (2022), RETRO (2021), GPT-3, …
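
Generation by continuation can be sketched with a toy stand-in for the model. Here a bigram count table plays the role of the transformer, which is a deliberate simplification: a real model conditions on the whole prompt, not just the last token. The training text and function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up training text; a real model would be trained on internet-scale data.
training_text = "the cat sat on the mat . the cat sat on the sofa ."
tokens = training_text.split()

# Toy "model": how often each token follows each other token.
next_counts = defaultdict(Counter)
for current, following in zip(tokens, tokens[1:]):
    next_counts[current][following] += 1


def predict_next(token):
    """Return the most frequent continuation of a token, or None if unseen."""
    candidates = next_counts.get(token)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def generate(prompt, max_new_tokens):
    """Greedy generation: repeatedly append the predicted next token."""
    output = prompt.split()
    for _ in range(max_new_tokens):
        following = predict_next(output[-1])
        if following is None:
            break
        output.append(following)
    return " ".join(output)


continuation = generate("the cat", 3)
print(continuation)
```

The loop is the same for large transformers: the prompt is the input text, and the answer is whatever continuation the model predicts token by token.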

transformer next token prediction

Created on 18 Apr 2022. Updated on: 21 May 2022.
