
How Computers Understood Humans

Catch up on deep natural language processing as of 2022 with this 7-slide introduction, featuring TF-IDF, Word2vec, knowledge graphs, and transformers.
  • ideas existed at least since the 1700s, but there was not enough compute and computer science
  • Current computers do almost what was predicted, but how?
  • How to instruct a computer to perform tasks?
  • How to represent knowledge in computers?
  • How to generate the answers?

by his contrivance, the most ignorant person, at a reasonable charge, and with a little bodily labour, might write books in philosophy, poetry, politics, laws, mathematics, and theology, without the least assistance from genius or study. ... to read the several lines softly, as they appeared upon the frame (Gulliver's Travels, by Jonathan Swift 1726, making fun of Ramon Llull 1232)

Prompt as an Interface

  • 2001: A Space Odyssey HAL 9000
  • input textual instructions e.g. explain a riddle
  • based on its knowledge, the computer generates the answer text

2001 A Space Odyssey HAL-9000 Interface

How To Represent Knowledge

  • library ~> textual documents in a database
  • search by a list of words (query) ~1970s, find topics ~1980s
  • count word occurrences at the document level into sparse matrices
  • methods: TF-IDF, latent semantic analysis (see the sketch below)

Latent semantic analysis - CC BY-SA 4.0 Christoph Carl Kling
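To make the document-level counting concrete, here is a minimal sketch using scikit-learn's TfidfVectorizer on a made-up three-document corpus and query (the texts are illustrative only):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# toy corpus: each document becomes one sparse row of TF-IDF weights
docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "stock markets fell on monday",
]
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)  # sparse matrix: documents x vocabulary

# rank documents against a word-list query by cosine similarity
query = vectorizer.transform(["cat pets"])
print(cosine_similarity(query, doc_matrix))  # higher scores for more relevant documents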

Non-Contextual Words Vectors

  • document -> a sentence or a small running window of ~10 words
  • a vector is a point in a multidimensional space - an array of numbers
  • each of ~10k words gets one general vector in a 300-dimensional space
  • each vector compresses a word into only 300 dimensions - much fewer than the 10k words
  • global (non-contextual) word vectors - no disambiguation of fruit (flowering) vs fruit (food)

word2vec
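A tiny sketch of the representation itself: a word vector is just an array of numbers, and closeness is measured with cosine similarity (the 4-dimensional vectors below are made up; real models use ~300 dimensions):

import numpy as np

# made-up toy vectors; real word vectors have ~300 dimensions
vectors = {
    "apple":  np.array([0.9, 0.1, 0.0, 0.3]),
    "banana": np.array([0.8, 0.2, 0.1, 0.4]),
    "car":    np.array([0.0, 0.9, 0.8, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["apple"], vectors["banana"]))  # high: words used in similar contexts
print(cosine(vectors["apple"], vectors["car"]))     # low: words used in different contexts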

Word2vec: Word To a Global Vector

  • GloVe (Pennington 2014): count co-occurrences in a 10-word window
  • word2vec (Mikolov 2013): the vectors of the 10 surrounding words sum close to the middle word's vector
  • words appearing in similar contexts are close in the 300-dimensional space (see the sketch below)
  • no disambiguation - a word string should be just a name, not an id!

word2vec operation
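A minimal sketch of training word2vec with the gensim library on a toy corpus (the sentences and parameters are illustrative; real training needs millions of words):

from gensim.models import Word2Vec

# toy corpus: a list of tokenized sentences
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "cat", "chases", "the", "mouse"],
]

# window=10 surrounding words, 300-dimensional vectors as on the slide
model = Word2Vec(sentences, vector_size=300, window=10, min_count=1, epochs=50)

# words appearing in similar contexts end up close in the vector space
print(model.wv.most_similar("king", topn=3))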

Knowledge Graph’s Nodes Are Disambiguated

  • knowledge graph, e.g. Wikidata: each node is specific - fruit (flowering) vs fruit (food)
  • imperfect tradeoff between a database and training data samples
  • Wikipedia and the internet sit between a knowledge graph and a set of documents
  • random walk ~ valid “sentences”, link prediction ~ generating text (see the sketch below)

knowledge graph visualization from wikipedia
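A toy sketch of the random-walk idea on a hand-made graph of disambiguated nodes, where each walk reads like a valid “sentence” of facts (the triples and relation names are invented for illustration):

import random

# tiny hand-made knowledge graph: (subject, relation, object) triples
triples = [
    ("apple", "instance_of", "fruit (food)"),
    ("fruit (food)", "subclass_of", "food"),
    ("fruit (flowering)", "part_of", "flowering plant"),
    ("food", "consumed_by", "human"),
]

# adjacency list: node -> list of (relation, neighbor)
graph = {}
for subject, relation, obj in triples:
    graph.setdefault(subject, []).append((relation, obj))

def random_walk(node, steps=3):
    # follow random outgoing edges; the visited path reads like a sentence of facts
    path = [node]
    for _ in range(steps):
        if node not in graph:
            break
        relation, node = random.choice(graph[node])
        path += [relation, node]
    return " ".join(path)

print(random_walk("apple"))  # e.g. "apple instance_of fruit (food) subclass_of food consumed_by human"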

Transformer: Contextual Word Vectors

transformer from word2vec
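Unlike word2vec, a transformer produces a different vector for the same word string depending on its context. A minimal sketch using Hugging Face's bert-base-uncased (model choice and sentences are illustrative, not from the slide):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    # one contextual vector per token from the transformer's last layer
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

v1 = word_vector("the fruit of the apple tree tastes sweet", "fruit")
v2 = word_vector("years of hard work finally bore fruit", "fruit")
# the two vectors differ because the surrounding context differs
print(torch.cosine_similarity(v1, v2, dim=0))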

Big Transformer Models

  • generate text by predicting the continuation of the input text
  • ~$10M transformers trained on large amounts of text from the internet in 2022
  • can solve a wide variety of problems, like explaining jokes, sometimes with human-level performance (see the sketch below)
  • examples: PaLM (2022), RETRO (2021), GPT-3, …

transformer next token prediction
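A minimal sketch of generation by next-token prediction, using the small public GPT-2 model via Hugging Face's pipeline (far smaller than the $10M-class models above; the prompt is illustrative):

from transformers import pipeline

# small public model, used only to illustrate continuation of the input text
generator = pipeline("text-generation", model="gpt2")

prompt = "Q: Why did the chicken cross the road? A:"
result = generator(prompt, max_new_tokens=30, do_sample=True)
print(result[0]["generated_text"])  # the prompt plus the predicted continuation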

Created on 18 Apr 2022. Updated on 21 May 2022.
